<!DOCTYPE html
  PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-us" xml:lang="en-us">
   <head>
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8"></meta>
      <meta http-equiv="X-UA-Compatible" content="IE=edge"></meta>
      <meta name="copyright" content="(C) Copyright 2005"></meta>
      <meta name="DC.rights.owner" content="(C) Copyright 2005"></meta>
      <meta name="DC.Type" content="concept"></meta>
      <meta name="DC.Title" content="Host API Overview"></meta>
      <meta name="DC.Format" content="XHTML"></meta>
      <meta name="DC.Identifier" content="host-api-overview"></meta>
      <meta name="DC.Language" content="en-us"></meta>
      <link rel="stylesheet" type="text/css" href="../common/formatting/commonltr.css"></link>
      <link rel="stylesheet" type="text/css" href="../common/formatting/site.css"></link>
      <title>cuRAND :: CUDA Toolkit Documentation</title>
      <!--[if lt IE 9]>
      <script src="../common/formatting/html5shiv-printshiv.min.js"></script>
      <![endif]-->
      <script type="text/javascript" charset="utf-8" src="../common/scripts/tynt/tynt.js"></script>
      <script type="text/javascript" charset="utf-8" src="../common/formatting/jquery.min.js"></script>
      <script type="text/javascript" charset="utf-8" src="../common/formatting/jquery.ba-hashchange.min.js"></script>
      <script type="text/javascript" charset="utf-8" src="../common/formatting/jquery.scrollintoview.min.js"></script>
      <script type="text/javascript" src="../search/htmlFileList.js"></script>
      <script type="text/javascript" src="../search/htmlFileInfoList.js"></script>
      <script type="text/javascript" src="../search/nwSearchFnt.min.js"></script>
      <script type="text/javascript" src="../search/stemmers/en_stemmer.min.js"></script>
      <script type="text/javascript" src="../search/index-1.js"></script>
      <script type="text/javascript" src="../search/index-2.js"></script>
      <script type="text/javascript" src="../search/index-3.js"></script>
      <link rel="canonical" href="http://docs.nvidia.com/cuda/curand/index.html"></link>
      <link rel="stylesheet" type="text/css" href="../common/formatting/qwcode.highlight.css"></link>
   </head>
   <body>
      
      <header id="header"><span id="company">NVIDIA</span><span id="site-title">CUDA Toolkit Documentation</span><form id="search" method="get" action="search">
            <input type="text" name="search-text"></input><fieldset id="search-location">
               <legend>Search In:</legend>
               <label><input type="radio" name="search-type" value="site"></input>Entire Site</label>
               <label><input type="radio" name="search-type" value="document"></input>Just This Document</label></fieldset>
            <button type="reset">clear search</button>
            <button id="submit" type="submit">search</button></form>
      </header>
      <div id="site-content">
         <nav id="site-nav">
            <div class="category closed"><a href="../index.html" title="The root of the site.">CUDA Toolkit
                  v6.5</a></div>
            <div class="category"><a href="index.html" title="cuRAND">cuRAND</a></div>
            <ul>
               <li>
                  <div class="section-link"><a href="introduction.html#introduction">Introduction</a></div>
               </li>
               <li>
                  <div class="section-link"><a href="compatibility-and-versioning.html#compatibility-and-versioning">1.&nbsp;Compatibility and Versioning</a></div>
               </li>
               <li>
                  <div class="section-link"><a href="host-api-overview.html#host-api-overview">2.&nbsp;Host API Overview</a></div>
                  <ul>
                     <li>
                        <div class="section-link"><a href="host-api-overview.html#generator-types">2.1.&nbsp;Generator Types</a></div>
                     </li>
                     <li>
                        <div class="section-link"><a href="host-api-overview.html#generator-options">2.2.&nbsp;Generator Options</a></div>
                        <ul>
                           <li>
                              <div class="section-link"><a href="host-api-overview.html#seed">2.2.1.&nbsp;Seed</a></div>
                           </li>
                           <li>
                              <div class="section-link"><a href="host-api-overview.html#offset">2.2.2.&nbsp;Offset</a></div>
                           </li>
                           <li>
                              <div class="section-link"><a href="host-api-overview.html#order">2.2.3.&nbsp;Order</a></div>
                           </li>
                        </ul>
                     </li>
                     <li>
                        <div class="section-link"><a href="host-api-overview.html#return-values">2.3.&nbsp;Return Values</a></div>
                     </li>
                     <li>
                        <div class="section-link"><a href="host-api-overview.html#generation-functions">2.4.&nbsp;Generation Functions</a></div>
                     </li>
                     <li>
                        <div class="section-link"><a href="host-api-overview.html#host-api-example">2.5.&nbsp;Host API Example</a></div>
                     </li>
                     <li>
                        <div class="section-link"><a href="host-api-overview.html#static-library">2.6.&nbsp;Static Library support</a></div>
                     </li>
                     <li>
                        <div class="section-link"><a href="host-api-overview.html#performance-notes2">2.7.&nbsp;Performance Notes</a></div>
                     </li>
                  </ul>
               </li>
               <li>
                  <div class="section-link"><a href="device-api-overview.html#device-api-overview">3.&nbsp;Device API Overview</a></div>
                  <ul>
                     <li>
                        <div class="section-link"><a href="device-api-overview.html#pseudorandom-sequences">3.1.&nbsp;Pseudorandom Sequences</a></div>
                        <ul>
                           <li>
                              <div class="section-link"><a href="device-api-overview.html#bit-generation-1">3.1.1.&nbsp;Bit Generation with XORWOW and MRG32k3a generators</a></div>
                           </li>
                           <li>
                              <div class="section-link"><a href="device-api-overview.html#bit-generation-2">3.1.2.&nbsp;Bit Generation with the MTGP32 generator</a></div>
                           </li>
                           <li>
                              <div class="section-link"><a href="device-api-overview.html#bit-generation-3">3.1.3.&nbsp;Bit Generation with Philox_4x32_10 generator</a></div>
                           </li>
                           <li>
                              <div class="section-link"><a href="device-api-overview.html#distributions">3.1.4.&nbsp;Distributions</a></div>
                           </li>
                        </ul>
                     </li>
                     <li>
                        <div class="section-link"><a href="device-api-overview.html#quasirandom-sequences">3.2.&nbsp;Quasirandom Sequences</a></div>
                     </li>
                     <li>
                        <div class="section-link"><a href="device-api-overview.html#skip-ahead">3.3.&nbsp;Skip-Ahead</a></div>
                     </li>
                     <li>
                        <div class="section-link"><a href="device-api-overview.html#device-api-for-discrete-distributions">3.4.&nbsp;Device API for discrete distributions</a></div>
                     </li>
                     <li>
                        <div class="section-link"><a href="device-api-overview.html#performance-notes">3.5.&nbsp;Performance Notes</a></div>
                     </li>
                     <li>
                        <div class="section-link"><a href="device-api-overview.html#device-api-example">3.6.&nbsp;Device API Examples</a></div>
                     </li>
                     <li>
                        <div class="section-link"><a href="device-api-overview.html#thrust-and-curand-example">3.7.&nbsp;Thrust and cuRAND Example</a></div>
                     </li>
                     <li>
                        <div class="section-link"><a href="device-api-overview.html#poisson-api-example">3.8.&nbsp;Poisson API Example</a></div>
                     </li>
                  </ul>
               </li>
               <li>
                  <div class="section-link"><a href="testing.html#testing">4.&nbsp;Testing</a></div>
               </li>
               <li>
                  <div class="section-link"><a href="modules.html#modules">5.&nbsp;Modules</a></div>
                  <ul>
                     <li>
                        <div class="section-link"><a href="group__HOST.html#group__HOST">5.1.&nbsp;Host API</a></div>
                     </li>
                     <li>
                        <div class="section-link"><a href="group__DEVICE.html#group__DEVICE">5.2.&nbsp;Device API</a></div>
                     </li>
                  </ul>
               </li>
               <li>
                  <div class="section-link"><a href="bibliography.html#bibliography">A.&nbsp;Bibliography</a></div>
               </li>
               <li>
                  <div class="section-link"><a href="acknowledgements.html#acknowledgements">B.&nbsp;Acknowledgements</a></div>
               </li>
               <li>
                  <div class="section-link"><a href="notices-header.html#notices-header">Notices</a></div>
                  <ul></ul>
               </li>
            </ul>
         </nav>
         <div id="resize-nav"></div>
         <nav id="search-results">
            <h2>Search Results</h2>
            <ol></ol>
         </nav>
         
         <div id="contents-container">
            <div id="breadcrumbs-container">
               <div id="eqn-warning">This document includes math equations
                  (highlighted in red) which are best viewed with <a target="_blank" href="https://www.mozilla.org/firefox">Firefox</a> version 4.0
                  or higher, or another <a target="_blank" href="http://www.w3.org/Math/Software/mathml_software_cat_browsers.html">MathML-aware
                     browser</a>. There is also a <a href="../../pdf/CURAND_Library.pdf">PDF version of this document</a>.
                  
               </div>
               <div id="release-info">cuRAND
                  (<a href="../../pdf/CURAND_Library.pdf">PDF</a>)
                  -
                  
                  v6.5
                  (<a href="https://developer.nvidia.com/cuda-toolkit-archive">older</a>)
                  -
                  Last updated August 1, 2014
                  -
                  <a href="mailto:cudatools@nvidia.com?subject=CUDA Toolkit Documentation Feedback: cuRAND">Send Feedback</a>
               </div>
            </div>
            <article id="contents">
               <div class="topic nested1" id="host-api-overview"><a name="host-api-overview" shape="rect">
                     <!-- --></a><h2 class="topictitle2">2.&nbsp;Host API Overview</h2>
                  <div class="body conbody">
                     <p class="p">To use the host API, user code should include the library header file <samp class="ph codeph">curand.h</samp> and dynamically link against the cuRAND library. The library uses the CUDA runtime, so user code must also use the runtime.
                        The CUDA driver API is not supported by cuRAND.
                     </p>
                     <p class="p">Random numbers are produced by generators. A generator in cuRAND encapsulates all the internal state necessary to produce
                        a sequence of pseudorandom or quasirandom numbers. The normal sequence of operations is as follows:
                     </p>
                     <p class="p">1. Create a new generator of the desired type (see <a class="xref" href="host-api-overview.html#generator-types" shape="rect">Generator Types</a>) with <samp class="ph codeph">curandCreateGenerator()</samp>.
                     </p>
                     <p class="p">2. Set the generator options (see <a class="xref" href="host-api-overview.html#generator-options" shape="rect">Generator Options</a>); for example, use <samp class="ph codeph">curandSetPseudoRandomGeneratorSeed()</samp> to set the seed.
                     </p>
                     <p class="p">3. Allocate memory on the device with <samp class="ph codeph">cudaMalloc()</samp>.
                     </p>
                     <p class="p">4. Generate random numbers with <samp class="ph codeph">curandGenerate()</samp> or another generation function.
                     </p>
                     <p class="p">5. Use the results.</p>
                     <p class="p">6. If desired, generate more random numbers with more calls to <samp class="ph codeph">curandGenerate()</samp>.
                     </p>
                     <p class="p">7. Clean up with <samp class="ph codeph">curandDestroyGenerator()</samp>.
                     </p>
                     <p class="p">To generate random numbers on the host CPU, call <samp class="ph codeph">curandCreateGeneratorHost()</samp> in step 1 above, and in step 3 allocate a host memory buffer to receive the results. All other calls work identically whether you are
                        generating random numbers on the device or on the host CPU.
                     </p>
                     <p class="p">It is legal to create several generators at the same time. Each generator encapsulates a separate state and is independent
                        of all other generators. The sequence of numbers produced by each generator is deterministic. Given the same set-up parameters,
                        the same sequence will be generated with every run of the program. Generating random numbers on the device will result in
                        the same sequence as generating them on the host CPU.
                     </p>
                     <p class="p">Note that <samp class="ph codeph">curandGenerate()</samp> in step 4 above launches a kernel and returns asynchronously. If you launch another kernel in a different stream, and that
                        kernel needs to use the results of <samp class="ph codeph">curandGenerate()</samp>, you must either call <samp class="ph codeph">cudaDeviceSynchronize()</samp> (which replaces the deprecated <samp class="ph codeph">cudaThreadSynchronize()</samp>) or use the stream management/event management routines to ensure that the random generation kernel has finished execution
                        before the new kernel is launched.
                     </p>
                     <p class="p">Note that it is not valid to pass a host memory pointer to a generator that is running on the device, and it is not valid
                        to pass a device memory pointer to a generator that is running on the CPU. Behavior in these cases is undefined.
                     </p>
                  </div>
                  <div class="topic concept nested1" xml:lang="en-us" id="generator-types"><a name="generator-types" shape="rect">
                        <!-- --></a><h3 class="topictitle3">2.1.&nbsp;Generator Types</h3>
                     <div class="body conbody">
                        <p class="p">Random number generators are created by passing a type to <samp class="ph codeph">curandCreateGenerator()</samp>. There are nine types of random number generators in cuRAND, which fall into two categories: five pseudorandom and four quasirandom.
                        </p>
                        <p class="p"><samp class="ph codeph">CURAND_RNG_PSEUDO_XORWOW</samp>, <samp class="ph codeph">CURAND_RNG_PSEUDO_MRG32K3A</samp>, <samp class="ph codeph">CURAND_RNG_PSEUDO_MTGP32</samp>, <samp class="ph codeph">CURAND_RNG_PSEUDO_PHILOX4_32_10</samp> and <samp class="ph codeph">CURAND_RNG_PSEUDO_MT19937</samp> are pseudorandom number generators. <samp class="ph codeph">CURAND_RNG_PSEUDO_XORWOW</samp> is implemented using the XORWOW algorithm, a member of the xor-shift family of pseudorandom number generators. <samp class="ph codeph">CURAND_RNG_PSEUDO_MRG32K3A</samp> is a member of the Combined Multiple Recursive family of pseudorandom number generators. <samp class="ph codeph">CURAND_RNG_PSEUDO_MT19937</samp> and <samp class="ph codeph">CURAND_RNG_PSEUDO_MTGP32</samp> are members of the Mersenne Twister family of pseudorandom number generators. <samp class="ph codeph">CURAND_RNG_PSEUDO_MTGP32</samp> has parameters customized for operation on the GPU. <samp class="ph codeph">CURAND_RNG_PSEUDO_MT19937</samp> has the same parameters as the CPU version, but its ordering is different. <samp class="ph codeph">CURAND_RNG_PSEUDO_MT19937</samp> supports only the host API and can be used only on architecture sm_35 or higher. <samp class="ph codeph">CURAND_RNG_PSEUDO_PHILOX4_32_10</samp> is a member of the Philox family, which is one of the three non-cryptographic Counter Based Random Number Generators presented
                           at the SC11 conference by D. E. Shaw Research.
                        </p>
                        <p class="p">There are four variants of the basic Sobol’ quasirandom number generator. All of the
                           variants generate sequences in up to 20,000 dimensions. <samp class="ph codeph">CURAND_RNG_QUASI_SOBOL32</samp>, <samp class="ph codeph">CURAND_RNG_QUASI_SCRAMBLED_SOBOL32</samp>, <samp class="ph codeph">CURAND_RNG_QUASI_SOBOL64</samp>, and <samp class="ph codeph">CURAND_RNG_QUASI_SCRAMBLED_SOBOL64</samp> are quasirandom number generator types. <samp class="ph codeph">CURAND_RNG_QUASI_SOBOL32</samp> is a Sobol’ generator of 32-bit sequences. <samp class="ph codeph">CURAND_RNG_QUASI_SCRAMBLED_SOBOL32</samp> is a scrambled Sobol’ generator of 32-bit sequences. <samp class="ph codeph">CURAND_RNG_QUASI_SOBOL64</samp> is a Sobol’ generator of 64-bit sequences. <samp class="ph codeph">CURAND_RNG_QUASI_SCRAMBLED_SOBOL64</samp> is a scrambled Sobol’ generator of 64-bit sequences.
                        </p>
                     </div>
                  </div>
                  <div class="topic concept nested1" xml:lang="en-us" id="generator-options"><a name="generator-options" shape="rect">
                        <!-- --></a><h3 class="topictitle3">2.2.&nbsp;Generator Options</h3>
                     <div class="body conbody">
                        <p class="p">Once created, random number generators can be configured using the general options seed, offset, and order.</p>
                     </div>
                     <div class="topic concept nested2" xml:lang="en-us" id="seed"><a name="seed" shape="rect">
                           <!-- --></a><h4 class="topictitle4">2.2.1.&nbsp;Seed</h4>
                        <div class="body conbody">
                           <p class="p">The seed parameter is a 64-bit integer that initializes the starting state of a pseudorandom number generator. The same seed
                              always produces the same sequence of results.
                           </p>
                        </div>
                     </div>
                     <div class="topic concept nested2" xml:lang="en-us" id="offset"><a name="offset" shape="rect">
                           <!-- --></a><h4 class="topictitle4">2.2.2.&nbsp;Offset</h4>
                        <div class="body conbody">
                           <p class="p">The offset parameter is used to skip ahead in the sequence. If offset = 100, the first random number generated will be the
                              100th in the sequence. This allows multiple runs of the same program to continue generating results from the same sequence
                              without overlap. Note that the skip ahead function is not available for the <samp class="ph codeph">CURAND_RNG_PSEUDO_MTGP32</samp> and <samp class="ph codeph">CURAND_RNG_PSEUDO_MT19937</samp> generators.
                           </p>
                        </div>
                     </div>
                     <div class="topic concept nested2" xml:lang="en-us" id="order"><a name="order" shape="rect">
                           <!-- --></a><h4 class="topictitle4">2.2.3.&nbsp;Order</h4>
                        <div class="body conbody">
                           <p class="p">The order parameter is used to choose how the results are ordered in global memory. There are three ordering choices for pseudorandom
                              sequences: <samp class="ph codeph">CURAND_ORDERING_PSEUDO_DEFAULT</samp>, <samp class="ph codeph">CURAND_ORDERING_PSEUDO_BEST</samp>, and <samp class="ph codeph">CURAND_ORDERING_PSEUDO_SEEDED</samp>. There is one ordering choice for quasirandom numbers, <samp class="ph codeph">CURAND_ORDERING_QUASI_DEFAULT</samp>. The default ordering for pseudorandom number generators is <samp class="ph codeph">CURAND_ORDERING_PSEUDO_DEFAULT</samp>, while the default ordering for quasirandom number generators is <samp class="ph codeph">CURAND_ORDERING_QUASI_DEFAULT</samp>.
                           </p>
                           <p class="p">The two pseudorandom orderings <samp class="ph codeph">CURAND_ORDERING_PSEUDO_DEFAULT</samp> and <samp class="ph codeph">CURAND_ORDERING_PSEUDO_BEST</samp> produce the same output ordering for all pseudorandom generators except MT19937, for which <samp class="ph codeph">CURAND_ORDERING_PSEUDO_BEST</samp> may generate different output on different models of GPUs. Future releases of cuRAND may change the ordering associated with
                              <samp class="ph codeph">CURAND_ORDERING_PSEUDO_BEST</samp> to improve either performance or the quality of the results. The ordering obtained with <samp class="ph codeph">CURAND_ORDERING_PSEUDO_BEST</samp> will always be deterministic and the same for each run of the program. The ordering returned by <samp class="ph codeph">CURAND_ORDERING_PSEUDO_DEFAULT</samp> is guaranteed to remain the same for all cuRAND releases. In the current release, only the XORWOW and MT19937 generators have
                              more than one ordering.
                           </p>
                           <p class="p">The behavior of the ordering parameters for each generator type is outlined below:</p>
                           <ul class="ul">
                              <li class="li">
                                 <p class="p">XORWOW pseudorandom generator</p>
                                 <ul class="ul">
                                    <li class="li">
                                       <p class="p"><samp class="ph codeph">CURAND_ORDERING_PSEUDO_BEST</samp></p>
                                       <p class="p">The output ordering of <samp class="ph codeph">CURAND_ORDERING_PSEUDO_BEST</samp> is the same as <samp class="ph codeph">CURAND_ORDERING_PSEUDO_DEFAULT</samp> in the current release.
                                       </p>
                                    </li>
                                    <li class="li">
                                       <p class="p"><samp class="ph codeph">CURAND_ORDERING_PSEUDO_DEFAULT</samp></p>
                                       <p class="p">The result at offset 
                                          <math xmlns="http://www.w3.org/1998/Math/MathML">
                                             <mi>n</mi>
                                          </math> in global memory is from position
                                       </p>
                                       <math xmlns="http://www.w3.org/1998/Math/MathML">
                                          <mo stretchy="false">(</mo>
                                          <mi>n</mi>
                                          <mo>mod</mo>
                                          <mn>4096</mn>
                                          <mo stretchy="false">)</mo>
                                          <mo>⋅</mo>
                                          <msup>
                                             <mn>2</mn>
                                             <mn>67</mn>
                                          </msup>
                                          <mo>+</mo>
                                          <mo fence="false" stretchy="false">⌊</mo>
                                          <mi>n</mi>
                                          <mo>/</mo>
                                          <mn>4096</mn>
                                          <mo fence="false" stretchy="false">⌋</mo>
                                       </math>
                                       <p class="p">in the original XORWOW sequence.</p>
                                    </li>
                                    <li class="li">
                                       <p class="p"><samp class="ph codeph">CURAND_ORDERING_PSEUDO_SEEDED</samp></p>
                                       <p class="p">The result at offset 
                                          <math xmlns="http://www.w3.org/1998/Math/MathML">
                                             <mi>n</mi>
                                          </math> in global memory is from position 
                                          <math xmlns="http://www.w3.org/1998/Math/MathML">
                                             <mo fence="false" stretchy="false">⌊</mo>
                                             <mi>n</mi>
                                             <mo>/</mo>
                                             <mn>4096</mn>
                                             <mo fence="false" stretchy="false">⌋</mo>
                                          </math> in the XORWOW sequence seeded with a combination of the user seed and the number 
                                          <math xmlns="http://www.w3.org/1998/Math/MathML">
                                             <mi>n</mi>
                                             <mo>mod</mo>
                                             <mn>4096</mn>
                                          </math>. In other words, each of 4096 threads uses a different seed. This seeding method reduces state setup time but may result
                                          in statistical weaknesses of the pseudorandom output for some user seed values.
                                       </p>
                                    </li>
                                 </ul>
                                 <p class="p">MRG32k3a pseudorandom generator</p>
                                 <ul class="ul">
                                    <li class="li">
                                       <p class="p"><samp class="ph codeph">CURAND_ORDERING_PSEUDO_BEST</samp></p>
                                       <p class="p">The output ordering of <samp class="ph codeph">CURAND_ORDERING_PSEUDO_BEST</samp> is the same as <samp class="ph codeph">CURAND_ORDERING_PSEUDO_DEFAULT</samp> in the current release.
                                       </p>
                                    </li>
                                    <li class="li">
                                       <p class="p"><samp class="ph codeph">CURAND_ORDERING_PSEUDO_DEFAULT</samp></p>
                                       <p class="p">The result at offset 
                                          <math xmlns="http://www.w3.org/1998/Math/MathML">
                                             <mi>n</mi>
                                          </math> in global memory is from position
                                       </p>
                                       <math xmlns="http://www.w3.org/1998/Math/MathML">
                                          <mo stretchy="false">(</mo>
                                          <mi>n</mi>
                                          <mo>mod</mo>
                                          <mn>4096</mn>
                                          <mo stretchy="false">)</mo>
                                          <mo>⋅</mo>
                                          <msup>
                                             <mn>2</mn>
                                             <mn>76</mn>
                                          </msup>
                                          <mo>+</mo>
                                          <mo fence="false" stretchy="false">⌊</mo>
                                          <mi>n</mi>
                                          <mo>/</mo>
                                          <mn>4096</mn>
                                          <mo fence="false" stretchy="false">⌋</mo>
                                       </math>
                                       <p class="p">in the original MRG32k3a sequence. (Note that the stride between subsequent samples for MRG32k3a is not the same as for XORWOW.)</p>
                                    </li>
                                 </ul>
                                 <p class="p">MTGP32 pseudorandom generator</p>
                                 <ul class="ul">
                                    <li class="li">
                                       <p class="p"><samp class="ph codeph">CURAND_ORDERING_PSEUDO_BEST</samp></p>
                                       <p class="p">The output ordering of <samp class="ph codeph">CURAND_ORDERING_PSEUDO_BEST</samp> is the same as <samp class="ph codeph">CURAND_ORDERING_PSEUDO_DEFAULT</samp> in the current release.
                                       </p>
                                    </li>
                                    <li class="li">
                                       <p class="p"><samp class="ph codeph">CURAND_ORDERING_PSEUDO_DEFAULT</samp></p>
                                       <p class="p">The MTGP32 generator actually generates 64 distinct sequences based on different parameter sets for the basic algorithm. Let
                                          
                                          <math xmlns="http://www.w3.org/1998/Math/MathML">
                                             <mi>S</mi>
                                             <mo stretchy="false">(</mo>
                                             <mi>p</mi>
                                             <mo stretchy="false">)</mo>
                                          </math> be the sequence for parameter set 
                                          <math xmlns="http://www.w3.org/1998/Math/MathML">
                                             <mi>p</mi>
                                          </math>.
                                       </p>
                                       <p class="p">The result at offset 
                                          <math xmlns="http://www.w3.org/1998/Math/MathML">
                                             <mi>n</mi>
                                          </math> in global memory is from position 
                                          <math xmlns="http://www.w3.org/1998/Math/MathML">
                                             <mi>n</mi>
                                             <mo>mod</mo>
                                             <mn>256</mn>
                                          </math> from the sequence
                                       </p>
                                       <math xmlns="http://www.w3.org/1998/Math/MathML">
                                          <mi>S</mi>
                                          <mo stretchy="false">(</mo>
                                          <mo fence="false" stretchy="false">⌊</mo>
                                          <mi>n</mi>
                                          <mo>/</mo>
                                          <mn>256</mn>
                                          <mo fence="false" stretchy="false">⌋</mo>
                                          <mo>mod</mo>
                                          <mn>64</mn>
                                          <mo stretchy="false">)</mo>
                                       </math>
                                       <p class="p">In other words, 256 samples from 
                                          <math xmlns="http://www.w3.org/1998/Math/MathML">
                                             <mi>S</mi>
                                             <mo stretchy="false">(</mo>
                                             <mn>0</mn>
                                             <mo stretchy="false">)</mo>
                                          </math> are followed by 256 samples from 
                                          <math xmlns="http://www.w3.org/1998/Math/MathML">
                                             <mi>S</mi>
                                             <mo stretchy="false">(</mo>
                                             <mn>1</mn>
                                             <mo stretchy="false">)</mo>
                                          </math> and so on, up to 
                                          <math xmlns="http://www.w3.org/1998/Math/MathML">
                                             <mi>S</mi>
                                             <mo stretchy="false">(</mo>
                                             <mn>63</mn>
                                             <mo stretchy="false">)</mo>
                                          </math>. This pattern repeats, so the subsequent 256 samples are from 
                                          <math xmlns="http://www.w3.org/1998/Math/MathML">
                                             <mi>S</mi>
                                             <mo stretchy="false">(</mo>
                                             <mn>0</mn>
                                             <mo stretchy="false">)</mo>
                                          </math>, followed by 256 samples from 
                                          <math xmlns="http://www.w3.org/1998/Math/MathML">
                                             <mi>S</mi>
                                             <mo stretchy="false">(</mo>
                                             <mn>1</mn>
                                             <mo stretchy="false">)</mo>
                                          </math>, and so on.
                                       </p>
                                    </li>
                                 </ul>
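                                 <p class="p">As a minimal sketch, the MTGP32 default ordering rule above can be written as a small index calculation. The type <samp class="ph codeph">mtgp32_source_t</samp> and the function <samp class="ph codeph">mtgp32_source()</samp> are hypothetical names used only for illustration; they are not part of the cuRAND API.</p>

```c
#include <stddef.h>

/* Where one output element comes from among the 64 MTGP32 sequences. */
typedef struct {
    unsigned int sequence;  /* parameter set p, in the range 0..63 */
    unsigned int position;  /* n mod 256, as stated in the rule above */
} mtgp32_source_t;

/* Hypothetical helper (not a cuRAND function): applies the documented
 * rule that offset n is taken from position n mod 256 of the sequence
 * S(floor(n / 256) mod 64). */
static mtgp32_source_t mtgp32_source(size_t n)
{
    mtgp32_source_t src;
    src.sequence = (unsigned int)((n / 256) % 64);
    src.position = (unsigned int)(n % 256);
    return src;
}
```

                                 <p class="p">With this helper, offsets 0 to 255 map to sequence 0, offsets 256 to 511 map to sequence 1, and offset 16384 wraps back to sequence 0.</p>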
                                 <p class="p">MT19937 pseudorandom generator</p>
                                 <ul class="ul">
                                    <li class="li">
                                       <p class="p"><samp class="ph codeph">CURAND_ORDERING_PSEUDO_DEFAULT</samp></p>
                                       <p class="p">Ordering is based heavily on the standard MT19937 CPU implementation. Output is generated by 8192 independent generators.
                                          Each generator produces a consecutive subsequence of the original sequence. The length of each subsequence is 
                                          <math xmlns="http://www.w3.org/1998/Math/MathML">
                                             <msup>
                                                <mn>2</mn>
                                                <mn>1000</mn>
                                             </msup>
                                          </math>. Random numbers are generated in groups of eight: the first 8 elements come from the first subsequence, the next 8 elements come from the second
                                          subsequence, and so on.
                                          Results are permuted differently than in the original implementation to achieve higher performance. Ordering is independent of the hardware that
                                          you are using. For more information, see <a class="xref" href="bibliography.html#bibliography__tredak" shape="rect">[18]</a>.
                                          
                                       </p>
                                    </li>
                                    <li class="li">
                                       <p class="p"><samp class="ph codeph">CURAND_ORDERING_PSEUDO_BEST</samp></p>
                                       <p class="p">The output ordering of <samp class="ph codeph">CURAND_ORDERING_PSEUDO_BEST</samp> depends on the number of SMs in your GPU. Random numbers are generated in the same way
                                          as with <samp class="ph codeph">CURAND_ORDERING_PSEUDO_DEFAULT</samp>, but the number of generators may differ to achieve better performance. Generating seeds is much faster using this ordering.
                                       </p>
                                    </li>
                                 </ul>
                                 <p class="p">Philox_4x32_10 pseudorandom generator</p>
                                 <ul class="ul">
                                    <li class="li">
                                       <p class="p"><samp class="ph codeph">CURAND_ORDERING_PSEUDO_BEST</samp></p>
                                       <p class="p">The output ordering of <samp class="ph codeph">CURAND_ORDERING_PSEUDO_BEST</samp> is the same as <samp class="ph codeph">CURAND_ORDERING_PSEUDO_DEFAULT</samp> in the current release.
                                       </p>
                                    </li>
                                    <li class="li">
                                       <p class="p"><samp class="ph codeph">CURAND_ORDERING_PSEUDO_DEFAULT</samp></p>
                                       <p class="p">Each thread in the Philox_4x32_10 generator generates a distinct sequence based on a different parameter set for the basic algorithm.
                                          In the host API there are 8192 different sequences. Each group of four values from one sequence is followed by four values from the next sequence.
                                       </p>
                                    </li>
                                 </ul>
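                                 <p class="p">The interleaving described for Philox_4x32_10 can be sketched as follows. The function <samp class="ph codeph">philox_sequence_for_offset()</samp> and both macros are hypothetical names, not part of cuRAND. The sketch also assumes that after the last of the 8192 sequences the pattern wraps back to the first one, which the text above does not state explicitly.</p>

```c
#include <stddef.h>

#define PHILOX_HOST_SEQUENCES  8192u  /* number of sequences in the host API */
#define PHILOX_VALUES_PER_TURN 4u     /* values taken from a sequence per turn */

/* Hypothetical helper (not a cuRAND function): index of the sequence
 * that supplies output offset n. Assumes round-robin wrap-around once
 * all 8192 sequences have contributed four values each. */
static unsigned int philox_sequence_for_offset(size_t n)
{
    return (unsigned int)((n / PHILOX_VALUES_PER_TURN) % PHILOX_HOST_SEQUENCES);
}
```

                                 <p class="p">Under that assumption, offsets 0 to 3 come from sequence 0, offsets 4 to 7 from sequence 1, and offset 32768 (4 times 8192) returns to sequence 0.</p>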
                                 <p class="p">32-bit and 64-bit SOBOL and Scrambled SOBOL quasirandom generators</p>
                                 <ul class="ul">
                                    <li class="li">
                                       <p class="p"><samp class="ph codeph">CURAND_ORDERING_QUASI_DEFAULT</samp></p>
                                       <p class="p">When generating 
                                          <math xmlns="http://www.w3.org/1998/Math/MathML">
                                             <mi>n</mi>
                                          </math> results in 
                                          <math xmlns="http://www.w3.org/1998/Math/MathML">
                                             <mi>d</mi>
                                          </math> dimensions, the output will consist of 
                                          <math xmlns="http://www.w3.org/1998/Math/MathML">
                                             <mi>n</mi>
                                             <mo>/</mo>
                                             <mi>d</mi>
                                          </math> results from dimension 1, followed by 
                                          <math xmlns="http://www.w3.org/1998/Math/MathML">
                                             <mi>n</mi>
                                             <mo>/</mo>
                                             <mi>d</mi>
                                          </math> results from dimension 2, and so on up to dimension 
                                          <math xmlns="http://www.w3.org/1998/Math/MathML">
                                             <mi>d</mi>
                                          </math>. Only exact multiples of the dimension size may be generated. The dimension parameter 
                                          <math xmlns="http://www.w3.org/1998/Math/MathML">
                                             <mi>d</mi>
                                          </math> is set with <samp class="ph codeph">curandSetQuasiRandomGeneratorDimensions()</samp> and defaults to 1.
                                       </p>
                                    </li>
                                 </ul>
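                                 <p class="p">A short sketch of the quasirandom memory layout described above: with <samp class="ph codeph">n</samp> results in <samp class="ph codeph">d</samp> dimensions, all results for one dimension are stored contiguously. The function <samp class="ph codeph">quasi_offset()</samp> is a hypothetical helper, not a cuRAND function, and it uses 0-based dimension and sample indices.</p>

```c
#include <stddef.h>

/* Hypothetical helper (not a cuRAND function): offset in the output
 * buffer of sample k (0-based) of dimension dim (0-based), when n
 * results are generated in d dimensions. Returns (size_t)-1 when n is
 * not an exact multiple of d or when an index is out of range. */
static size_t quasi_offset(size_t n, size_t d, size_t dim, size_t k)
{
    size_t per_dim;

    if (d == 0 || n % d != 0) {
        return (size_t)-1;   /* only exact multiples of d are allowed */
    }
    per_dim = n / d;         /* n/d results per dimension */
    if (dim >= d || k >= per_dim) {
        return (size_t)-1;
    }
    return dim * per_dim + k;
}
```

                                 <p class="p">For example, with 12 results in 3 dimensions, the first coordinate of the first vector is at offset 0, its second coordinate at offset 4, and its third coordinate at offset 8.</p>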
                              </li>
                           </ul>
                        </div>
                     </div>
                  </div>
                  <div class="topic concept nested1" xml:lang="en-us" id="return-values"><a name="return-values" shape="rect">
                        <!-- --></a><h3 class="topictitle3">2.3.&nbsp;Return Values</h3>
                     <div class="body conbody">
                        <p class="p">All cuRAND host library calls have a return value of <samp class="ph codeph">curandStatus_t</samp>. Calls that succeed without errors return <samp class="ph codeph">CURAND_STATUS_SUCCESS</samp>. If errors occur, other values are returned depending on the error. Because CUDA allows kernels to execute asynchronously
                           from CPU code, it is possible that errors in a non-cuRAND kernel will be detected during a call to a library function. In
                           this case, <samp class="ph codeph">CURAND_STATUS_PREEXISTING_ERROR</samp> is returned.
                        </p>
                     </div>
                  </div>
                  <div class="topic concept nested1" xml:lang="en-us" id="generation-functions"><a name="generation-functions" shape="rect">
                        <!-- --></a><h3 class="topictitle3">2.4.&nbsp;Generation Functions</h3>
                     <div class="body conbody"><pre xml:space="preserve">
curandStatus_t 
curandGenerate(
    curandGenerator_t generator, 
    unsigned int *outputPtr, size_t num)
</pre><p class="p">The <samp class="ph codeph">curandGenerate()</samp> function is used to generate pseudo- or quasirandom bits of output. For XORWOW, MRG32k3a, MTGP32, MT19937, Philox_4x32_10
                           and SOBOL32 generators, each output element is a 32-bit unsigned int where all bits are random. For SOBOL64 generators, each
                           output element is a 64-bit unsigned long long where all bits are random.
                        </p><pre xml:space="preserve">
curandStatus_t 
curandGenerateUniform(
    curandGenerator_t generator, 
    float *outputPtr, size_t num)
</pre><p class="p">The <samp class="ph codeph">curandGenerateUniform()</samp> function is used to generate uniformly distributed floating point values between 0.0 and 1.0, where 0.0 is excluded and 1.0
                           is included.
                        </p><pre xml:space="preserve">
curandStatus_t 
curandGenerateNormal(
    curandGenerator_t generator, 
    float *outputPtr, size_t n, 
    float mean, float stddev)
</pre><p class="p">The <samp class="ph codeph">curandGenerateNormal()</samp> function is used to generate normally distributed floating point values with the given mean and standard deviation.
                        </p><pre xml:space="preserve">
curandStatus_t 
curandGenerateLogNormal(
    curandGenerator_t generator, 
    float *outputPtr, size_t n, 
    float mean, float stddev)
</pre><p class="p">The <samp class="ph codeph">curandGenerateLogNormal()</samp> function is used to generate log-normally distributed floating point values based on a normal distribution with the given
                           mean and standard deviation.
                        </p><pre xml:space="preserve">
curandStatus_t 
curandGeneratePoisson(
    curandGenerator_t generator, 
    unsigned int *outputPtr, size_t n, 
    double lambda)
</pre><p class="p">The <samp class="ph codeph">curandGeneratePoisson()</samp> function is used to generate Poisson-distributed integer values based on a Poisson distribution with the given lambda.
                        </p><pre xml:space="preserve">
curandStatus_t
curandGenerateUniformDouble(
    curandGenerator_t generator, 
    double *outputPtr, size_t num)
</pre><p class="p">The <samp class="ph codeph">curandGenerateUniformDouble()</samp> function generates uniformly distributed random numbers in double precision.
                        </p><pre xml:space="preserve">
curandStatus_t
curandGenerateNormalDouble(
    curandGenerator_t generator,
    double *outputPtr, size_t n, 
    double mean, double stddev)
</pre><p class="p"><samp class="ph codeph">curandGenerateNormalDouble()</samp> generates normally distributed results in double precision with the given mean and standard deviation. Double precision results
                           can only be generated on devices of compute capability 1.3 or above, and the host.
                        </p><pre xml:space="preserve">
curandStatus_t
curandGenerateLogNormalDouble(
    curandGenerator_t generator,
    double *outputPtr, size_t n, 
    double mean, double stddev)
</pre><p class="p"><samp class="ph codeph">curandGenerateLogNormalDouble()</samp> generates log-normally distributed results in double precision, based on a normal distribution with the given mean and standard
                           deviation.
                        </p>
                        <p class="p">For quasirandom generation, the number of results returned must be a multiple of the dimension of the generator.</p>
                        <p class="p">Generation functions can be called multiple times on the same generator to generate successive blocks of results. For pseudorandom
                           generators, multiple calls to generation functions will yield the same result as a single call with a large size. For quasirandom
                           generators, because of the ordering of dimensions in memory, many shorter calls will not produce the same results in memory
                           as one larger call; however the generated 
                           <math xmlns="http://www.w3.org/1998/Math/MathML">
                              <mi>n</mi>
                           </math>-dimensional vectors will be the same.
                        </p>
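                        <p class="p">The difference between one large quasirandom call and several shorter ones can be illustrated with a toy model. The functions <samp class="ph codeph">toy_coord()</samp> and <samp class="ph codeph">toy_generate()</samp> below are hypothetical stand-ins written for this sketch; they do not call cuRAND and do not produce a Sobol sequence. Each value simply encodes its dimension and point index so that the layout can be inspected.</p>

```c
#include <stddef.h>

/* Toy stand-in for a quasirandom point set: coordinate `dim` of point
 * `idx`. The value encodes both indices; it is NOT a Sobol sequence. */
static unsigned int toy_coord(size_t dim, size_t idx)
{
    return (unsigned int)(dim * 1000 + idx);
}

/* Toy model of one generation call: writes n results in d dimensions,
 * one dimension after another, continuing from point *next_point.
 * Returns 0 on success, or -1 if n is not a multiple of d. */
static int toy_generate(unsigned int *out, size_t n, size_t d,
                        size_t *next_point)
{
    size_t per_dim, dim, k;

    if (d == 0 || n % d != 0) {
        return -1;
    }
    per_dim = n / d;
    for (dim = 0; dim < d; dim++) {
        for (k = 0; k < per_dim; k++) {
            out[dim * per_dim + k] = toy_coord(dim, *next_point + k);
        }
    }
    *next_point += per_dim;  /* the next call continues the point set */
    return 0;
}
```

                        <p class="p">In this model, one call with 12 results in 3 dimensions stores points 0 to 3 of dimension 0 first, whereas two calls of 6 results each store points 0 and 1 of every dimension and then points 2 and 3 of every dimension. The buffers differ, but the same four 3-dimensional points are present in both.</p>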
                        <p class="p">Double precision results can only be generated on devices of compute capability 1.3 or above, and the host.</p>
                     </div>
                  </div>
                  <div class="topic concept nested1" xml:lang="en-us" id="host-api-example"><a name="host-api-example" shape="rect">
                        <!-- --></a><h3 class="topictitle3">2.5.&nbsp;Host API Example</h3>
                     <div class="body conbody"><pre xml:space="preserve">

/*
 * This program uses the host CURAND API to generate 100 
 * pseudorandom floats.
 */
#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;cuda_runtime.h&gt;
#include &lt;curand.h&gt;

#define CUDA_CALL(x) do { if((x)!=cudaSuccess) { \
    printf("Error at %s:%d\n",__FILE__,__LINE__);\
    return EXIT_FAILURE;}} while(0)
#define CURAND_CALL(x) do { if((x)!=CURAND_STATUS_SUCCESS) { \
    printf("Error at %s:%d\n",__FILE__,__LINE__);\
    return EXIT_FAILURE;}} while(0)

int main(int argc, char *argv[])
{
    size_t n = 100;
    size_t i;
    curandGenerator_t gen;
    float *devData, *hostData;

    /* Allocate n floats on host */
    hostData = (float *)calloc(n, sizeof(float));
    if (hostData == NULL) {
        printf("Error at %s:%d\n",__FILE__,__LINE__);
        return EXIT_FAILURE;
    }

    /* Allocate n floats on device */
    CUDA_CALL(cudaMalloc((void **)&amp;devData, n*sizeof(float)));

    /* Create pseudo-random number generator */
    CURAND_CALL(curandCreateGenerator(&amp;gen, 
                CURAND_RNG_PSEUDO_DEFAULT));
    
    /* Set seed */
    CURAND_CALL(curandSetPseudoRandomGeneratorSeed(gen, 
                1234ULL));

    /* Generate n floats on device */
    CURAND_CALL(curandGenerateUniform(gen, devData, n));

    /* Copy device memory to host */
    CUDA_CALL(cudaMemcpy(hostData, devData, n * sizeof(float),
        cudaMemcpyDeviceToHost));

    /* Show result */
    for(i = 0; i &lt; n; i++) {
        printf("%1.4f ", hostData[i]);
    }
    printf("\n");

    /* Cleanup */
    CURAND_CALL(curandDestroyGenerator(gen));
    CUDA_CALL(cudaFree(devData));
    free(hostData);    
    return EXIT_SUCCESS;
}

</pre></div>
                  </div>
                  <div class="topic concept nested1" xml:lang="en-us" id="static-library"><a name="static-library" shape="rect">
                        <!-- --></a><h3 class="topictitle3">2.6.&nbsp;Static Library Support</h3>
                     <div class="body conbody">
                        <p class="p">Starting with release 6.5, the cuRAND Library is also delivered in a static form as libcurand_static.a on Linux and Mac and
                           as curand_static.lib on Windows.
                           The static cuRAND library depends on a common thread abstraction layer library called libcuos.a on Linux and Mac and cuos.lib
                           on Windows.
                           
                        </p>
                        <p class="p">For example, on Linux, to compile a small application using cuRAND against the dynamic library, the following command can
                           be used:
                        </p><pre xml:space="preserve">
    gcc myCurandApp.c    -lcurand  -o myCurandApp 
</pre><p class="p">Whereas to compile against the static cuRAND library, the following command has to be used: </p><pre xml:space="preserve">     
    gcc myCurandApp.c    libcurand_static.a   libcuos.a  -o myCurandApp
</pre></div>
                  </div>
                  <div class="topic concept nested1" xml:lang="en-us" id="performance-notes2"><a name="performance-notes2" shape="rect">
                        <!-- --></a><h3 class="topictitle3">2.7.&nbsp;Performance Notes</h3>
                     <div class="body conbody">
                        <p class="p">In general, you will get the best performance from the cuRAND library by generating blocks of random numbers that are as large
                           as possible. Fewer calls that each generate many random numbers are more efficient than many calls that each generate only a few.
                           The default pseudorandom generator, XORWOW, with the default ordering takes some time to set up the first time it is called.
                           Subsequent generation calls do not require this setup. To avoid this setup time, use the <samp class="ph codeph">CURAND_ORDERING_PSEUDO_SEEDED</samp> ordering.
                        </p>
                        <p class="p">The MTGP32 Mersenne Twister algorithm is closely tied to the thread and block count. The state structure for MTGP32 actually
                           contains the state for 256 consecutive samples from a given sequence, as determined by a specific parameter set. Each of 64
                           blocks uses a different parameter set and each of 256 threads generates one sample from the state, and updates the state.
                           Hence the most efficient use of MTGP32 is to generate a multiple of 16384 samples.
                        </p>
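                        <p class="p">As a small sketch of the sizing advice above, a requested sample count can be rounded up to the next multiple of 16384 (64 blocks times 256 threads) before allocating the output buffer. The function <samp class="ph codeph">mtgp32_round_up()</samp> and the macros are hypothetical names, not part of cuRAND, and the sketch ignores overflow for counts close to the maximum value of <samp class="ph codeph">size_t</samp>.</p>

```c
#include <stddef.h>

#define MTGP32_BLOCKS            64u
#define MTGP32_THREADS_PER_BLOCK 256u
#define MTGP32_BATCH (MTGP32_BLOCKS * MTGP32_THREADS_PER_BLOCK) /* 16384 */

/* Hypothetical helper (not a cuRAND function): rounds n up to the next
 * multiple of 16384 samples. Overflow near SIZE_MAX is not handled. */
static size_t mtgp32_round_up(size_t n)
{
    size_t rem = n % MTGP32_BATCH;
    return rem == 0 ? n : n + (MTGP32_BATCH - rem);
}
```

                        <p class="p">An application that needs, say, 20000 samples could generate 32768 and use only the first 20000.</p>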
                        <p class="p">The performance of the MT19937 algorithm depends on the number of samples generated in a single call. Peak performance is reached
                           when generating more than 2GB of data, but 80% of peak performance can be reached when generating only 80MB. See
                           <a class="xref" href="bibliography.html#bibliography__tredak" shape="rect">[18]</a> for reference.
                        </p>
                        <p class="p">The Philox_4x32_10 algorithm is closely tied to the thread and block count. Each thread computes 4 random numbers at a
                           time, so the most efficient use of Philox_4x32_10 is to generate a multiple of 4 times the number of threads. 
                        </p>
                     </div>
                  </div>
               </div>
               
               <hr id="contents-end"></hr>
               
            </article>
         </div>
      </div>
      <script language="JavaScript" type="text/javascript" charset="utf-8" src="../common/formatting/common.min.js"></script>
   </body>
</html>