<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> <title>Tutorial Introduction — PyCUDA 2012.1 documentation</title> <link rel="stylesheet" href="_static/default.css" type="text/css" /> <link rel="stylesheet" href="_static/pygments.css" type="text/css" /> <script type="text/javascript"> var DOCUMENTATION_OPTIONS = { URL_ROOT: '', VERSION: '2012.1', COLLAPSE_INDEX: false, FILE_SUFFIX: '.html', HAS_SOURCE: true }; </script> <script type="text/javascript" src="_static/jquery.js"></script> <script type="text/javascript" src="_static/underscore.js"></script> <script type="text/javascript" src="_static/doctools.js"></script> <link rel="top" title="PyCUDA 2012.1 documentation" href="index.html" /> <link rel="next" title="Device Interface Reference Documentation" href="driver.html" /> <link rel="prev" title="Installation" href="install.html" /> </head> <body> <div class="related"> <h3>Navigation</h3> <ul> <li class="right" style="margin-right: 10px"> <a href="genindex.html" title="General Index" accesskey="I">index</a></li> <li class="right" > <a href="py-modindex.html" title="Python Module Index" >modules</a> |</li> <li class="right" > <a href="driver.html" title="Device Interface Reference Documentation" accesskey="N">next</a> |</li> <li class="right" > <a href="install.html" title="Installation" accesskey="P">previous</a> |</li> <li><a href="index.html">PyCUDA 2012.1 documentation</a> »</li> </ul> </div> <div class="document"> <div class="documentwrapper"> <div class="bodywrapper"> <div class="body"> <div class="section" id="tutorial-introduction"> <h1>Tutorial Introduction<a class="headerlink" href="#tutorial-introduction" title="Permalink to this headline">¶</a></h1> <div class="section" id="getting-started"> <h2>Getting started<a class="headerlink" href="#getting-started" 
title="Permalink to this headline">¶</a></h2> <p>Before you can use PyCuda, you have to import and initialize it:</p> <div class="highlight-python"><div class="highlight"><pre><span class="kn">import</span> <span class="nn">pycuda.driver</span> <span class="kn">as</span> <span class="nn">cuda</span>
<span class="kn">import</span> <span class="nn">pycuda.autoinit</span>
<span class="kn">from</span> <span class="nn">pycuda.compiler</span> <span class="kn">import</span> <span class="n">SourceModule</span>
</pre></div> </div> <p>Note that you do not <em>have</em> to use <a class="reference internal" href="util.html#module-pycuda.autoinit" title="pycuda.autoinit"><tt class="xref py py-mod docutils literal"><span class="pre">pycuda.autoinit</span></tt></a>: initialization, context creation, and cleanup can also be performed manually, if desired.</p> </div> <div class="section" id="transferring-data"> <h2>Transferring Data<a class="headerlink" href="#transferring-data" title="Permalink to this headline">¶</a></h2> <p>The next step in most programs is to transfer data onto the device. In PyCuda, you will mostly transfer data from <a class="reference external" href="http://docs.scipy.org/doc/numpy/reference/index.html#numpy" title="(in NumPy v1.8)"><tt class="xref py py-mod docutils literal"><span class="pre">numpy</span></tt></a> arrays on the host. (In fact, anything that satisfies the Python buffer interface will work, even a <a class="reference external" href="http://docs.python.org/dev/library/stdtypes.html#str" title="(in Python v3.4)"><tt class="xref py py-class docutils literal"><span class="pre">str</span></tt></a>.)
Let’s make a 4x4 array of random numbers:</p> <div class="highlight-python"><div class="highlight"><pre><span class="kn">import</span> <span class="nn">numpy</span>
<span class="n">a</span> <span class="o">=</span> <span class="n">numpy</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">randn</span><span class="p">(</span><span class="mi">4</span><span class="p">,</span><span class="mi">4</span><span class="p">)</span>
</pre></div> </div> <p>But wait: <em>a</em> consists of double precision numbers, while many NVIDIA devices, older ones in particular, only support single precision:</p> <div class="highlight-python"><div class="highlight"><pre><span class="n">a</span> <span class="o">=</span> <span class="n">a</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="n">numpy</span><span class="o">.</span><span class="n">float32</span><span class="p">)</span>
</pre></div> </div> <p>Next, we need somewhere to transfer the data to, so we allocate memory on the device:</p> <div class="highlight-python"><div class="highlight"><pre><span class="n">a_gpu</span> <span class="o">=</span> <span class="n">cuda</span><span class="o">.</span><span class="n">mem_alloc</span><span class="p">(</span><span class="n">a</span><span class="o">.</span><span class="n">nbytes</span><span class="p">)</span>
</pre></div> </div> <p>As a last step, we transfer the data to the GPU:</p> <div class="highlight-python"><div class="highlight"><pre><span class="n">cuda</span><span class="o">.</span><span class="n">memcpy_htod</span><span class="p">(</span><span class="n">a_gpu</span><span class="p">,</span> <span class="n">a</span><span class="p">)</span>
</pre></div> </div> </div> <div class="section" id="executing-a-kernel"> <h2>Executing a Kernel<a class="headerlink" href="#executing-a-kernel" title="Permalink to this headline">¶</a></h2> <p>For this tutorial, we’ll stick to something simple: We will write code to double each
entry in <em>a_gpu</em>. To this end, we write the corresponding CUDA C code, and feed it into the constructor of a <a class="reference internal" href="driver.html#pycuda.compiler.SourceModule" title="pycuda.compiler.SourceModule"><tt class="xref py py-class docutils literal"><span class="pre">pycuda.compiler.SourceModule</span></tt></a>:</p> <div class="highlight-python"><div class="highlight"><pre><span class="n">mod</span> <span class="o">=</span> <span class="n">SourceModule</span><span class="p">(</span><span class="s">"""</span>
<span class="s">  __global__ void doublify(float *a)</span>
<span class="s">  {</span>
<span class="s">    int idx = threadIdx.x + threadIdx.y*4;</span>
<span class="s">    a[idx] *= 2;</span>
<span class="s">  }</span>
<span class="s">  """</span><span class="p">)</span>
</pre></div> </div> <p>If there aren’t any errors, the code is now compiled and loaded onto the device. We find a reference to our <a class="reference internal" href="driver.html#pycuda.driver.Function" title="pycuda.driver.Function"><tt class="xref py py-class docutils literal"><span class="pre">pycuda.driver.Function</span></tt></a> and call it, specifying <em>a_gpu</em> as the argument, and a block size of 4x4:</p> <div class="highlight-python"><div class="highlight"><pre><span class="n">func</span> <span class="o">=</span> <span class="n">mod</span><span class="o">.</span><span class="n">get_function</span><span class="p">(</span><span class="s">"doublify"</span><span class="p">)</span>
<span class="n">func</span><span class="p">(</span><span class="n">a_gpu</span><span class="p">,</span> <span class="n">block</span><span class="o">=</span><span class="p">(</span><span class="mi">4</span><span class="p">,</span><span class="mi">4</span><span class="p">,</span><span class="mi">1</span><span class="p">))</span>
</pre></div> </div> <p>Finally, we fetch the data back from the GPU and display it, together with the original <em>a</em>:</p> <div class="highlight-python"><div
class="highlight"><pre><span class="n">a_doubled</span> <span class="o">=</span> <span class="n">numpy</span><span class="o">.</span><span class="n">empty_like</span><span class="p">(</span><span class="n">a</span><span class="p">)</span>
<span class="n">cuda</span><span class="o">.</span><span class="n">memcpy_dtoh</span><span class="p">(</span><span class="n">a_doubled</span><span class="p">,</span> <span class="n">a_gpu</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="n">a_doubled</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="n">a</span><span class="p">)</span>
</pre></div> </div> <p>This will print something like this:</p> <div class="highlight-python"><pre>[[ 0.51360393  1.40589952  2.25009012  3.02563429]
 [-0.75841576 -1.18757617  2.72269917  3.12156057]
 [ 0.28826082 -2.92448163  1.21624792  2.86353827]
 [ 1.57651746  0.63500965  2.21570683 -0.44537592]]
[[ 0.25680196  0.70294976  1.12504506  1.51281714]
 [-0.37920788 -0.59378809  1.36134958  1.56078029]
 [ 0.14413041 -1.46224082  0.60812396  1.43176913]
 [ 0.78825873  0.31750482  1.10785341 -0.22268796]]</pre> </div> <p>It worked! That completes our walkthrough. Thankfully, PyCuda takes over from here and does all the cleanup for you, so you’re done.
Stick around for some bonus material in the next section, though.</p> <p>(You can find the code for this demo as <tt class="file docutils literal"><span class="pre">examples/demo.py</span></tt> in the PyCuda source distribution.)</p> <div class="section" id="shortcuts-for-explicit-memory-copies"> <h3>Shortcuts for Explicit Memory Copies<a class="headerlink" href="#shortcuts-for-explicit-memory-copies" title="Permalink to this headline">¶</a></h3> <p>The <a class="reference internal" href="driver.html#pycuda.driver.In" title="pycuda.driver.In"><tt class="xref py py-class docutils literal"><span class="pre">pycuda.driver.In</span></tt></a>, <a class="reference internal" href="driver.html#pycuda.driver.Out" title="pycuda.driver.Out"><tt class="xref py py-class docutils literal"><span class="pre">pycuda.driver.Out</span></tt></a>, and <a class="reference internal" href="driver.html#pycuda.driver.InOut" title="pycuda.driver.InOut"><tt class="xref py py-class docutils literal"><span class="pre">pycuda.driver.InOut</span></tt></a> argument handlers can simplify some of the memory transfers. 
For example, if overwriting <em>a</em> in place is acceptable, the following code can be used instead of creating <em>a_gpu</em> and copying explicitly:</p> <div class="highlight-python"><div class="highlight"><pre><span class="n">func</span><span class="p">(</span><span class="n">cuda</span><span class="o">.</span><span class="n">InOut</span><span class="p">(</span><span class="n">a</span><span class="p">),</span> <span class="n">block</span><span class="o">=</span><span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span>
</pre></div> </div> </div> <div class="section" id="prepared-invocations"> <h3>Prepared Invocations<a class="headerlink" href="#prepared-invocations" title="Permalink to this headline">¶</a></h3> <p>Function invocation using the built-in <a class="reference internal" href="driver.html#pycuda.driver.Function.__call__" title="pycuda.driver.Function.__call__"><tt class="xref py py-meth docutils literal"><span class="pre">pycuda.driver.Function.__call__()</span></tt></a> method incurs overhead for type identification (see <a class="reference internal" href="driver.html#reference-doc"><em>Device Interface Reference Documentation</em></a>). To achieve the same effect as above without this overhead, the function is first bound to its argument types (as designated by Python’s standard library <a class="reference external" href="http://docs.python.org/dev/library/struct.html#struct" title="(in Python v3.4)"><tt class="xref py py-mod docutils literal"><span class="pre">struct</span></tt></a> module), and then called.
This also avoids having to assign explicit argument sizes using the <cite>numpy.number</cite> classes. Here, <cite>"P"</cite> declares a single pointer argument, and the grid and block sizes are passed to the call itself:</p> <div class="highlight-python"><div class="highlight"><pre><span class="n">func</span><span class="o">.</span><span class="n">prepare</span><span class="p">(</span><span class="s">"P"</span><span class="p">)</span>
<span class="n">func</span><span class="o">.</span><span class="n">prepared_call</span><span class="p">((</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span> <span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span> <span class="n">a_gpu</span><span class="p">)</span>
</pre></div> </div> </div> </div> <div class="section" id="bonus-abstracting-away-the-complications"> <h2>Bonus: Abstracting Away the Complications<a class="headerlink" href="#bonus-abstracting-away-the-complications" title="Permalink to this headline">¶</a></h2> <p>Using a <a class="reference internal" href="array.html#pycuda.gpuarray.GPUArray" title="pycuda.gpuarray.GPUArray"><tt class="xref py py-class docutils literal"><span class="pre">pycuda.gpuarray.GPUArray</span></tt></a>, the same effect can be achieved with much less writing:</p> <div class="highlight-python"><div class="highlight"><pre><span class="kn">import</span> <span class="nn">pycuda.gpuarray</span> <span class="kn">as</span> <span class="nn">gpuarray</span>
<span class="kn">import</span> <span class="nn">pycuda.driver</span> <span class="kn">as</span> <span class="nn">cuda</span>
<span class="kn">import</span> <span class="nn">pycuda.autoinit</span>
<span class="kn">import</span> <span class="nn">numpy</span>
<span class="n">a_gpu</span> <span class="o">=</span> <span class="n">gpuarray</span><span class="o">.</span><span class="n">to_gpu</span><span class="p">(</span><span
class="n">numpy</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">randn</span><span class="p">(</span><span class="mi">4</span><span class="p">,</span><span class="mi">4</span><span class="p">)</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="n">numpy</span><span class="o">.</span><span class="n">float32</span><span class="p">))</span>
<span class="n">a_doubled</span> <span class="o">=</span> <span class="p">(</span><span class="mi">2</span><span class="o">*</span><span class="n">a_gpu</span><span class="p">)</span><span class="o">.</span><span class="n">get</span><span class="p">()</span>
<span class="k">print</span><span class="p">(</span><span class="n">a_doubled</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="n">a_gpu</span><span class="p">)</span>
</pre></div> </div> </div> <div class="section" id="advanced-topics"> <h2>Advanced Topics<a class="headerlink" href="#advanced-topics" title="Permalink to this headline">¶</a></h2> <div class="section" id="structures"> <h3>Structures<a class="headerlink" href="#structures" title="Permalink to this headline">¶</a></h3> <p>(contributed by Nicholas Tung; find the code in <tt class="file docutils literal"><span class="pre">examples/demo_struct.py</span></tt>)</p> <p>Suppose we have the following structure, for doubling a number of variable-length arrays:</p> <div class="highlight-python"><div class="highlight"><pre><span class="n">mod</span> <span class="o">=</span> <span class="n">SourceModule</span><span class="p">(</span><span class="s">"""</span>
<span class="s">    struct DoubleOperation {</span>
<span class="s">        int datalen, __padding; // so 64-bit ptrs can be aligned</span>
<span class="s">        float *ptr;</span>
<span class="s">    };</span>
<span class="s">    __global__ void double_array(DoubleOperation *a) {</span>
<span class="s">        a = &amp;a[blockIdx.x];</span>
<span class="s">        for (int idx = threadIdx.x; idx &lt; a-&gt;datalen; idx += blockDim.x) {</span>
<span class="s">
            a-&gt;ptr[idx] *= 2;</span>
<span class="s">        }</span>
<span class="s">    }</span>
<span class="s">    """</span><span class="p">)</span>
</pre></div> </div> <p>Each block in the grid (see CUDA documentation) will double one of the arrays. The <cite>for</cite> loop allows more data elements than threads to be doubled, though it is less efficient than necessary if one can guarantee that there will be a sufficient number of threads. Next, a wrapper class for the structure is created, and two arrays are instantiated:</p> <div class="highlight-python"><div class="highlight"><pre><span class="k">class</span> <span class="nc">DoubleOpStruct</span><span class="p">:</span>
    <span class="n">mem_size</span> <span class="o">=</span> <span class="mi">8</span> <span class="o">+</span> <span class="n">numpy</span><span class="o">.</span><span class="n">intp</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span><span class="o">.</span><span class="n">nbytes</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">array</span><span class="p">,</span> <span class="n">struct_arr_ptr</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">data</span> <span class="o">=</span> <span class="n">cuda</span><span class="o">.</span><span class="n">to_device</span><span class="p">(</span><span class="n">array</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">shape</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">dtype</span> <span class="o">=</span> <span class="n">array</span><span class="o">.</span><span class="n">shape</span><span class="p">,</span> <span class="n">array</span><span class="o">.</span><span class="n">dtype</span>
        <span class="n">cuda</span><span class="o">.</span><span class="n">memcpy_htod</span><span class="p">(</span><span
class="nb">int</span><span class="p">(</span><span class="n">struct_arr_ptr</span><span class="p">),</span> <span class="n">numpy</span><span class="o">.</span><span class="n">int32</span><span class="p">(</span><span class="n">array</span><span class="o">.</span><span class="n">size</span><span class="p">))</span>
        <span class="n">cuda</span><span class="o">.</span><span class="n">memcpy_htod</span><span class="p">(</span><span class="nb">int</span><span class="p">(</span><span class="n">struct_arr_ptr</span><span class="p">)</span> <span class="o">+</span> <span class="mi">8</span><span class="p">,</span> <span class="n">numpy</span><span class="o">.</span><span class="n">intp</span><span class="p">(</span><span class="nb">int</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">data</span><span class="p">)))</span>
    <span class="k">def</span> <span class="nf">__str__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="nb">str</span><span class="p">(</span><span class="n">cuda</span><span class="o">.</span><span class="n">from_device</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">data</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">shape</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">dtype</span><span class="p">))</span>
<span class="n">struct_arr</span> <span class="o">=</span> <span class="n">cuda</span><span class="o">.</span><span class="n">mem_alloc</span><span class="p">(</span><span class="mi">2</span> <span class="o">*</span> <span class="n">DoubleOpStruct</span><span class="o">.</span><span class="n">mem_size</span><span class="p">)</span>
<span class="n">do2_ptr</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">struct_arr</span><span
class="p">)</span> <span class="o">+</span> <span class="n">DoubleOpStruct</span><span class="o">.</span><span class="n">mem_size</span>
<span class="n">array1</span> <span class="o">=</span> <span class="n">DoubleOpStruct</span><span class="p">(</span><span class="n">numpy</span><span class="o">.</span><span class="n">array</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">],</span> <span class="n">dtype</span><span class="o">=</span><span class="n">numpy</span><span class="o">.</span><span class="n">float32</span><span class="p">),</span> <span class="n">struct_arr</span><span class="p">)</span>
<span class="n">array2</span> <span class="o">=</span> <span class="n">DoubleOpStruct</span><span class="p">(</span><span class="n">numpy</span><span class="o">.</span><span class="n">array</span><span class="p">([</span><span class="mi">0</span><span class="p">,</span> <span class="mi">4</span><span class="p">],</span> <span class="n">dtype</span><span class="o">=</span><span class="n">numpy</span><span class="o">.</span><span class="n">float32</span><span class="p">),</span> <span class="n">do2_ptr</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">"original arrays"</span><span class="p">,</span> <span class="n">array1</span><span class="p">,</span> <span class="n">array2</span><span class="p">)</span>
</pre></div> </div> <p>This code uses the <a class="reference internal" href="driver.html#pycuda.driver.to_device" title="pycuda.driver.to_device"><tt class="xref py py-func docutils literal"><span class="pre">pycuda.driver.to_device()</span></tt></a> and <a class="reference internal" href="driver.html#pycuda.driver.from_device" title="pycuda.driver.from_device"><tt class="xref py py-func docutils literal"><span class="pre">pycuda.driver.from_device()</span></tt></a> functions to allocate and copy values,
and demonstrates how offsets to an allocated block of memory can be used. Finally, the code can be executed; the following demonstrates doubling both arrays, then only the second:</p> <div class="highlight-python"><div class="highlight"><pre><span class="n">func</span> <span class="o">=</span> <span class="n">mod</span><span class="o">.</span><span class="n">get_function</span><span class="p">(</span><span class="s">"double_array"</span><span class="p">)</span>
<span class="n">func</span><span class="p">(</span><span class="n">struct_arr</span><span class="p">,</span> <span class="n">block</span><span class="o">=</span><span class="p">(</span><span class="mi">32</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span> <span class="n">grid</span><span class="o">=</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span>
<span class="k">print</span><span class="p">(</span><span class="s">"doubled arrays"</span><span class="p">,</span> <span class="n">array1</span><span class="p">,</span> <span class="n">array2</span><span class="p">)</span>
<span class="n">func</span><span class="p">(</span><span class="n">numpy</span><span class="o">.</span><span class="n">intp</span><span class="p">(</span><span class="n">do2_ptr</span><span class="p">),</span> <span class="n">block</span><span class="o">=</span><span class="p">(</span><span class="mi">32</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span> <span class="n">grid</span><span class="o">=</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span>
<span class="k">print</span><span class="p">(</span><span class="s">"doubled second only"</span><span class="p">,</span> <span class="n">array1</span><span class="p">,</span>
<span class="n">array2</span><span class="p">,</span> <span class="s">"</span><span class="se">\n</span><span class="s">"</span><span class="p">)</span>
</pre></div> </div> </div> </div> <div class="section" id="where-to-go-from-here"> <h2>Where to go from here<a class="headerlink" href="#where-to-go-from-here" title="Permalink to this headline">¶</a></h2> <p>Once you feel sufficiently familiar with the basics, feel free to dig into the <a class="reference internal" href="driver.html#reference-doc"><em>Device Interface Reference Documentation</em></a>. For more examples, check the <tt class="file docutils literal"><span class="pre">examples/</span></tt> subdirectory of the distribution. This folder also contains several benchmarks that show the difference between GPU-based and CPU-based calculations. As a reference for how things are done, PyCuda’s test suite in the <tt class="file docutils literal"><span class="pre">test/</span></tt> subdirectory of the distribution may also be of help.</p> </div> </div> </div> </div> </div> <div class="sphinxsidebar"> <div class="sphinxsidebarwrapper"> <h3><a href="index.html">Table Of Contents</a></h3> <ul> <li><a class="reference internal" href="#">Tutorial Introduction</a><ul> <li><a class="reference internal" href="#getting-started">Getting started</a></li> <li><a class="reference internal" href="#transferring-data">Transferring Data</a></li> <li><a class="reference internal" href="#executing-a-kernel">Executing a Kernel</a><ul> <li><a class="reference internal" href="#shortcuts-for-explicit-memory-copies">Shortcuts for Explicit Memory Copies</a></li> <li><a class="reference internal" href="#prepared-invocations">Prepared Invocations</a></li> </ul> </li> <li><a class="reference internal" href="#bonus-abstracting-away-the-complications">Bonus: Abstracting Away the Complications</a></li> <li><a class="reference internal" href="#advanced-topics">Advanced Topics</a><ul> <li><a class="reference internal"
href="#structures">Structures</a></li> </ul> </li> <li><a class="reference internal" href="#where-to-go-from-here">Where to go from here</a></li> </ul> </li> </ul> <h4>Previous topic</h4> <p class="topless"><a href="install.html" title="previous chapter">Installation</a></p> <h4>Next topic</h4> <p class="topless"><a href="driver.html" title="next chapter">Device Interface Reference Documentation</a></p> <h3>This Page</h3> <ul class="this-page-menu"> <li><a href="_sources/tutorial.txt" rel="nofollow">Show Source</a></li> </ul> <div id="searchbox" style="display: none"> <h3>Quick search</h3> <form class="search" action="search.html" method="get"> <input type="text" name="q" /> <input type="submit" value="Go" /> <input type="hidden" name="check_keywords" value="yes" /> <input type="hidden" name="area" value="default" /> </form> <p class="searchtip" style="font-size: 90%"> Enter search terms or a module, class or function name. </p> </div> <script type="text/javascript">$('#searchbox').show(0);</script> </div> </div> <div class="clearer"></div> </div> <div class="related"> <h3>Navigation</h3> <ul> <li class="right" style="margin-right: 10px"> <a href="genindex.html" title="General Index" >index</a></li> <li class="right" > <a href="py-modindex.html" title="Python Module Index" >modules</a> |</li> <li class="right" > <a href="driver.html" title="Device Interface Reference Documentation" >next</a> |</li> <li class="right" > <a href="install.html" title="Installation" >previous</a> |</li> <li><a href="index.html">PyCUDA 2012.1 documentation</a> »</li> </ul> </div> <div class="footer"> © Copyright 2008, Andreas Kloeckner. Last updated on Dec 09, 2013. Created using <a href="http://sphinx.pocoo.org/">Sphinx</a> 1.1.3. </div> </body> </html>