<!DOCTYPE html
  PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-us" xml:lang="en-us">
   <head>
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8"></meta>
      <meta http-equiv="X-UA-Compatible" content="IE=edge"></meta>
      <meta name="copyright" content="(C) Copyright 2005"></meta>
      <meta name="DC.rights.owner" content="(C) Copyright 2005"></meta>
      <meta name="DC.Type" content="concept"></meta>
      <meta name="DC.Title" content="CUDA-GDB"></meta>
      <meta name="abstract" content="The user manual for CUDA-GDB, the NVIDIA tool for debugging CUDA applications on Linux and Mac systems."></meta>
      <meta name="description" content="The user manual for CUDA-GDB, the NVIDIA tool for debugging CUDA applications on Linux and Mac systems."></meta>
      <meta name="DC.Coverage" content="Tools"></meta>
      <meta name="DC.subject" content="CUDA GDB, CUDA GDB debug, CUDA GDB installation, CUDA GDB compiling, CUDA GDB extensions, CUDA GDB kernel focus, CUDA GDB program execution, CUDA GDB breakpoints, CUDA GDB inspection, CUDA GDB memory errors"></meta>
      <meta name="keywords" content="CUDA GDB, CUDA GDB debug, CUDA GDB installation, CUDA GDB compiling, CUDA GDB extensions, CUDA GDB kernel focus, CUDA GDB program execution, CUDA GDB breakpoints, CUDA GDB inspection, CUDA GDB memory errors"></meta>
      <meta name="DC.Format" content="XHTML"></meta>
      <meta name="DC.Identifier" content="abstract"></meta>
      <link rel="stylesheet" type="text/css" href="../common/formatting/commonltr.css"></link>
      <link rel="stylesheet" type="text/css" href="../common/formatting/site.css"></link>
      <title>CUDA-GDB :: CUDA Toolkit Documentation</title>
      <!--[if lt IE 9]>
      <script src="../common/formatting/html5shiv-printshiv.min.js"></script>
      <![endif]-->
      <script type="text/javascript" charset="utf-8" src="../common/formatting/jquery.min.js"></script>
      <script type="text/javascript" charset="utf-8" src="../common/formatting/jquery.ba-hashchange.min.js"></script>
      <link rel="canonical" href="http://docs.nvidia.com/cuda/cuda-gdb/index.html"></link>
      <link rel="stylesheet" type="text/css" href="../common/formatting/qwcode.highlight.css"></link>
   </head>
   <body>
      
      <article id="contents">
         <div id="release-info">CUDA-GDB
            (<a href="../../pdf/cuda-gdb.pdf">PDF</a>)
            -
            CUDA Toolkit v5.5
            (<a href="https://developer.nvidia.com/cuda-toolkit-archive">older</a>)
            -
            Last updated 
            July 19, 2013
            -
            <a href="mailto:cudatools@nvidia.com?subject=CUDA Tools Documentation Feedback: cuda-gdb">Send Feedback</a></div>
         <div class="topic nested0" id="abstract"><a name="abstract" shape="rect">
               <!-- --></a><h2 class="title topictitle1"><a href="#abstract" name="abstract" shape="rect">CUDA-GDB</a></h2>
            <div class="body conbody"></div>
         </div>
         <div class="topic concept nested0" id="introduction"><a name="introduction" shape="rect">
               <!-- --></a><h2 class="title topictitle1"><a href="#introduction" name="introduction" shape="rect">1.&nbsp;Introduction</a></h2>
            <div class="body conbody">
               <p class="p">This document introduces CUDA-GDB, the NVIDIA<sup>®</sup> CUDA™ debugger for Linux and Mac OS.
               </p>
            </div>
            <div class="topic concept nested1" id="what-is-cuda-gdb"><a name="what-is-cuda-gdb" shape="rect">
                  <!-- --></a><h3 class="title topictitle2"><a href="#what-is-cuda-gdb" name="what-is-cuda-gdb" shape="rect">1.1.&nbsp;What is CUDA-GDB?</a></h3>
               <div class="body conbody">
                  <p class="p">CUDA-GDB is the NVIDIA tool for debugging CUDA applications running on Linux and Mac.
                     CUDA-GDB is an extension to the x86-64 port of GDB, the GNU Project debugger. The tool
                     provides developers with a mechanism for debugging CUDA applications running on actual
                     hardware. This enables developers to debug applications without the potential variations
                     introduced by simulation and emulation environments.
                  </p>
                  <p class="p">CUDA-GDB runs on Linux and Mac OS X, 32-bit and 64-bit. CUDA-GDB is based on GDB 7.2 on
                     both Linux and Mac OS X.
                  </p>
               </div>
            </div>
            <div class="topic concept nested1" id="supported-features"><a name="supported-features" shape="rect">
                  <!-- --></a><h3 class="title topictitle2"><a href="#supported-features" name="supported-features" shape="rect">1.2.&nbsp;Supported Features</a></h3>
               <div class="body conbody">
                  <p class="p">CUDA-GDB is designed to present the user with a seamless debugging environment that
                     allows simultaneous debugging of both GPU and CPU code within the same application. Just
                     as programming in CUDA C is an extension to C programming, debugging with CUDA-GDB is a
                     natural extension to debugging with GDB. The existing GDB debugging features are
                     inherently present for debugging the host code, and additional features have been
                     provided to support debugging CUDA device code.
                  </p>
                  <p class="p">CUDA-GDB supports C and C++ CUDA applications. All the C++ features supported by the NVCC
                     compiler can be debugged by CUDA-GDB.
                  </p>
                  <p class="p">CUDA-GDB allows the user to set breakpoints, to single-step CUDA applications, and also
                     to inspect and modify the memory and variables of any given thread running on the
                     hardware.
                  </p>
                  <p class="p">CUDA-GDB supports debugging all CUDA applications, whether they use the CUDA driver API,
                     the CUDA runtime API, or both.
                  </p>
                  <p class="p">CUDA-GDB supports debugging kernels that have been compiled for specific CUDA
                     architectures, such as <samp class="ph codeph">sm_10</samp> or <samp class="ph codeph">sm_20</samp>, but also
                     supports debugging kernels compiled at runtime, referred to as just-in-time compilation,
                     or JIT compilation for short.
                  </p>
               </div>
            </div>
            <div class="topic concept nested1" id="about-this-document"><a name="about-this-document" shape="rect">
                  <!-- --></a><h3 class="title topictitle2"><a href="#about-this-document" name="about-this-document" shape="rect">1.3.&nbsp;About This Document</a></h3>
               <div class="body conbody">
                  <p class="p">This document is the main documentation for CUDA-GDB and is organized more as a user
                     manual than a reference manual. The rest of the document will describe how to install
                     and use CUDA-GDB to debug CUDA kernels and how to use the new CUDA commands that have
                     been added to GDB. Some walk-through examples are also provided. It is assumed that the
                     user already knows the basic GDB commands used to debug host applications.
                  </p>
               </div>
            </div>
         </div>
         <div class="topic reference nested0" id="release-notes"><a name="release-notes" shape="rect">
               <!-- --></a><h2 class="title topictitle1"><a href="#release-notes" name="release-notes" shape="rect">2.&nbsp;Release Notes</a></h2>
            <div class="body refbody">
               <div class="section">
                  <h2 class="title sectiontitle">5.5 Release</h2>
                  <dl class="dl">
                     <dt class="dt dlterm">Kernel Launch Stack</dt>
                      <dd class="dd">Two new commands, <samp class="ph codeph">info cuda launch stack</samp> and <samp class="ph codeph">info cuda launch children</samp>, are introduced to display the kernel launch stack and the child kernels of a given kernel when Dynamic Parallelism is
                         used.
                        <p class="p"></p>
                     </dd>
                     <dt class="dt dlterm">Single-GPU Debugging (BETA)</dt>
                     <dd class="dd">CUDA-GDB can now be used to debug a CUDA application on the same GPU that is rendering the desktop GUI.  This feature also
                        enables debugging of long-running or indefinite CUDA kernels that would otherwise encounter a launch timeout.  In addition,
                        multiple CUDA-GDB sessions can debug CUDA applications context-switching on the same GPU.  This feature is available on Linux
                        with SM3.5 devices.  For information on enabling this, please see <a class="xref" href="index.html#single-gpu-debugging-with-desktop-manager-running" shape="rect">Single-GPU Debugging with the Desktop Manager Running</a> and <a class="xref" href="index.html#multiple-debuggers" shape="rect">Multiple Debuggers</a>.
                        <p class="p"></p>
                     </dd>
                     <dt class="dt dlterm">Remote GPU Debugging</dt>
                      <dd class="dd">CUDA-GDB, in conjunction with CUDA-GDBSERVER, can now be used to debug a CUDA application running on a remote host.
                        <p class="p"></p>
                     </dd>
                  </dl>
               </div>
               <div class="section">
                  <h2 class="title sectiontitle">5.0 Release</h2>
                  <dl class="dl">
                     <dt class="dt dlterm">Dynamic Parallelism Support</dt>
                     <dd class="dd">
                         <p class="p">CUDA-GDB fully supports Dynamic Parallelism, a new feature introduced with the 5.0 toolkit. The debugger is able to track
                            kernels launched from another kernel and to inspect and modify their variables as with any CPU-launched kernel.
                        </p>
                     </dd>
                     <dt class="dt dlterm">Attach/Detach</dt>
                     <dd class="dd">
                        <p class="p">It is now possible to attach to a CUDA application that is already running. It is also possible to detach from the application
                           before letting it run to completion. When attached, all the usual features of the debugger are available to the user, as if
                           the application had been launched from the debugger.  This feature is also supported with applications using Dynamic Parallelism.
                        </p>
                     </dd>
                     <dt class="dt dlterm">Attach on exception</dt>
                     <dd class="dd">
                         <p class="p">When the environment variable <samp class="ph codeph">CUDA_DEVICE_WAITS_ON_EXCEPTION</samp> is set, the application will run normally until a device exception occurs, then wait for the debugger to attach
                            itself for further debugging.
                        </p>
                     </dd>
                     <dt class="dt dlterm">API Error Reporting</dt>
                     <dd class="dd">
                         <p class="p">Checking the error code of all the CUDA driver API and CUDA runtime API function calls is vital to ensure the correctness
                            of a CUDA application. The debugger is now able to report, and even stop, when any API call returns an error. See <samp class="ph codeph">set cuda api_failures</samp> for more information.
                        </p>
                     </dd>
                     <dt class="dt dlterm">Inlined Subroutine Support</dt>
                     <dd class="dd">
                        <p class="p">Inlined subroutines are now accessible from the debugger on SM 2.0 and above.  The user can inspect the local variables of
                           those subroutines and visit the call frame stack as if the routines were not inlined.
                        </p>
                     </dd>
                  </dl>
               </div>
               <div class="section">
                  <h2 class="title sectiontitle">4.2 Release</h2>
                  <dl class="dl">
                     <dt class="dt dlterm">Kepler Support</dt>
                     <dd class="dd">
                        <p class="p">The primary change in Release 4.2 of CUDA-GDB is the addition of support for the new Kepler architecture. There are no other
                           user-visible changes in this release.
                        </p>
                     </dd>
                  </dl>
               </div>
               <div class="section">
                  <h2 class="title sectiontitle">4.1 Release</h2>
                  <dl class="dl">
                     <dt class="dt dlterm">Source Base Upgraded to GDB 7.2</dt>
                     <dd class="dd">
                        <p class="p">Until now, CUDA-GDB was based on GDB 6.6 on Linux, and GDB 6.3.5 on Darwin (the
                           Apple branch). Now, both versions of CUDA-GDB are using the same 7.2 source base.
                        </p>
                         <p class="p">CUDA-GDB now supports newer versions of GCC (tested up to GCC 4.5), has better
                            support for DWARF3 debug information, and offers improved C++ debugging support.
                        </p>
                     </dd>
                     <dt class="dt dlterm">Simultaneous Sessions Support</dt>
                     <dd class="dd">
                         <p class="p">With the 4.1 release, the single CUDA-GDB process restriction is lifted. Now,
                            multiple CUDA-GDB sessions are allowed to co-exist as long as the GPUs are not
                            shared between the applications being debugged. For instance, one CUDA-GDB
                            process can debug process foo using GPU 0 while another CUDA-GDB process debugs
                            process bar using GPU 1. Exclusive use of the GPUs can be enforced with the
                            CUDA_VISIBLE_DEVICES environment variable.
                        </p>
                     </dd>
                     <dt class="dt dlterm">New Autostep Command</dt>
                     <dd class="dd">
                        <p class="p">A new 'autostep' command was added. The command increases the precision of CUDA
                           exceptions by automatically single-stepping through portions of code.
                        </p>
                        <p class="p">Under normal execution, the thread and instruction where an exception occurred
                           may be imprecisely reported. However, the exact instruction that generates the
                           exception can be determined if the program is being single-stepped when the
                           exception occurs.
                        </p>
                        <p class="p">Manually single-stepping through a program is a slow and tedious process.
                            Therefore 'autostep' aids the user by allowing them to specify sections of
                           code where they suspect an exception could occur. These sections are
                           automatically single-stepped through when the program is running, and any
                           exception that occurs within these sections is precisely reported.  
                        </p>
                        <p class="p">Type 'help autostep' from CUDA-GDB for the syntax and usage of the command.</p>
                     </dd>
                     <dt class="dt dlterm">Multiple Context Support</dt>
                     <dd class="dd">
                         <p class="p">On GPUs with compute capability 2.0 (SM 2.0) or higher, debugging multiple contexts
                            on the same GPU is now supported. This was previously a known limitation.
                        </p>
                     </dd>
                     <dt class="dt dlterm">Device Assertions Support</dt>
                     <dd class="dd">
                         <p class="p">The R285 driver released with the 4.1 version of the toolkit supports device
                            assertions. CUDA-GDB supports the assertion call and stops the execution of the
                            application when the assertion is hit. Then the variables and memory can be
                            inspected as usual. The application can also be resumed past the assertion if
                            needed. Use the 'set cuda hide_internal_frames' option to expose or hide the
                            system call frames (hidden by default).
                        </p>
                     </dd>
                     <dt class="dt dlterm">Temporary Directory</dt>
                     <dd class="dd">
                        <p class="p">By default, the debugger API will use /tmp as the directory to store temporary
                           files. To select a different directory, the $TMPDIR environment variable and
                           the API CUDBG_APICLIENT_PID variable must be set. 
                        </p>
                     </dd>
                  </dl>
               </div>
            </div>
         </div>
         <div class="topic concept nested0" id="getting-started"><a name="getting-started" shape="rect">
               <!-- --></a><h2 class="title topictitle1"><a href="#getting-started" name="getting-started" shape="rect">3.&nbsp;Getting Started</a></h2>
            <div class="body conbody">
               <p class="p">Included in this chapter are instructions for installing CUDA-GDB and for using NVCC, the
                  NVIDIA CUDA compiler driver, to compile CUDA programs for debugging.
               </p>
            </div>
            <div class="topic task nested1" id="installation-instructions"><a name="installation-instructions" shape="rect">
                  <!-- --></a><h3 class="title topictitle2"><a href="#installation-instructions" name="installation-instructions" shape="rect">3.1.&nbsp;Installation Instructions</a></h3>
               <div class="body taskbody">
                  <div class="section context">
                     <p class="p">Follow these steps to install CUDA-GDB.</p>
                  </div>
                  <ol class="ol steps">
                     <li class="li step"><span class="ph cmd">Visit the NVIDIA CUDA Zone download page:</span><a class="xref" href="http://www.nvidia.com/object/cuda_get.html" target="_blank" shape="rect">http://www.nvidia.com/object/cuda_get.html</a></li>
                      <li class="li step"><span class="ph cmd">Select the appropriate operating system: Mac OS X or Linux.</span>
                        (See <a class="xref" href="index.html#host-platform-requirements" shape="rect">Host Platform Requirements</a>.)
                        
                     </li>
                     <li class="li step"><span class="ph cmd">Download and install the CUDA Driver.</span></li>
                     <li class="li step"><span class="ph cmd">Download and install the CUDA Toolkit.</span></li>
                  </ol>
               </div>
            </div>
            <div class="topic concept nested1" id="setting-up-the-debugger-environment"><a name="setting-up-the-debugger-environment" shape="rect">
                  <!-- --></a><h3 class="title topictitle2"><a href="#setting-up-the-debugger-environment" name="setting-up-the-debugger-environment" shape="rect">3.2.&nbsp;Setting Up the Debugger Environment</a></h3>
               <div class="topic concept nested2" id="linux"><a name="linux" shape="rect">
                     <!-- --></a><h3 class="title topictitle2"><a href="#linux" name="linux" shape="rect">3.2.1.&nbsp;Linux</a></h3>
                  <div class="body conbody">
                     <p class="p">Set up the PATH and LD_LIBRARY_PATH environment variables:</p><pre class="pre screen" xml:space="preserve"><strong class="ph b">export PATH=/usr/local/cuda-<span class="ph">5.5</span>/bin:$PATH

export LD_LIBRARY_PATH=/usr/local/cuda-<span class="ph">5.5</span>/lib64:/usr/local/cuda-<span class="ph">5.5</span>/
lib:$LD_LIBRARY_PATH</strong></pre></div>
               </div>
               <div class="topic concept nested2" id="mac-os-x"><a name="mac-os-x" shape="rect">
                     <!-- --></a><h3 class="title topictitle2"><a href="#mac-os-x" name="mac-os-x" shape="rect">3.2.2.&nbsp;Mac OS X</a></h3>
                  <div class="body conbody">
                     <div class="section">
                        <h4 class="title sectiontitle">Set up environment variables</h4><pre class="pre screen" xml:space="preserve"><strong class="ph b">$</strong> export PATH=/Developer/NVIDIA/CUDA-<span class="ph">5.5</span>/bin:$PATH
<strong class="ph b">$</strong> export DYLD_LIBRARY_PATH=/Developer/NVIDIA/CUDA-<span class="ph">5.5</span>/lib:$DYLD_LIBRARY_PATH</pre></div>
                     <div class="section">
                        <h4 class="title sectiontitle">Set permissions</h4>
                        <p class="p">
                            The first time <samp class="ph codeph">cuda-gdb</samp> is executed, a pop-up dialog window will appear, asking for permission to allow the debugger to take control of another process.
                            The user must have Administrator privileges to allow it.
                            It is a required step.
                           
                        </p>
                        <p class="p">
                            Another solution, used in the past, is to add <samp class="ph codeph">cuda-binary-gdb</samp> to the <samp class="ph codeph">procmod</samp> group and to configure the <samp class="ph codeph">taskgated</samp> daemon to let processes in that group take control of other processes.
                            This used to be the fix for the <samp class="ph codeph">Unable to find Mach task port for processid</samp> error.
                           
                        </p><pre class="pre screen" xml:space="preserve"><strong class="ph b">$</strong> sudo chgrp procmod /Developer/NVIDIA/CUDA-<span class="ph">5.5</span>/bin/cuda-binary-gdb 
<strong class="ph b">$</strong> sudo chmod 2755 /Developer/NVIDIA/CUDA-<span class="ph">5.5</span>/bin/cuda-binary-gdb 
<strong class="ph b">$</strong> sudo chmod 755 /Developer/NVIDIA/CUDA-<span class="ph">5.5</span>/bin/cuda-gdb</pre><p class="p">To set the <samp class="ph codeph">taskgated</samp> daemon to allow the processes in the <samp class="ph codeph">procmod</samp> group to access Task Ports, <samp class="ph codeph">taskgated</samp> must be launched with the -p option.
                           To make it a permanent option, edit <samp class="ph codeph">/System/Library/LaunchDaemons/com.apple.taskgated.plist</samp>.
                           See <samp class="ph codeph">man taskgated</samp> for more information.
                           Here is an example:
                           
                        </p><pre xml:space="preserve"><span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-directive">&lt;?xml version="1.0" encoding="UTF-8"?&gt;</span>
<span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-doctype">&lt;!DOCTYPE plist PUBLIC "-//Apple Inc.//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd"&gt;</span>
<span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-tag">&lt;plist</span> <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-attribute">version</span>=<span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-value">"1.0"</span><span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-tag">&gt;</span>
<span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-tag">&lt;dict&gt;</span>
        <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-tag">&lt;key&gt;</span>Label<span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-tag">&lt;/key&gt;</span>
        <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-tag">&lt;string&gt;</span>com.apple.taskgated<span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-tag">&lt;/string&gt;</span>
        <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-tag">&lt;key&gt;</span>MachServices<span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-tag">&lt;/key&gt;</span>
        <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-tag">&lt;dict&gt;</span>
                <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-tag">&lt;key&gt;</span>com.apple.taskgated<span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-tag">&lt;/key&gt;</span>
                <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-tag">&lt;dict&gt;</span>
                        <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-tag">&lt;key&gt;</span>TaskSpecialPort<span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-tag">&lt;/key&gt;</span>
                        <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-tag">&lt;integer&gt;</span>9<span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-tag">&lt;/integer&gt;</span>
                <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-tag">&lt;/dict&gt;</span>
        <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-tag">&lt;/dict&gt;</span>
        <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-tag">&lt;key&gt;</span>ProgramArguments<span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-tag">&lt;/key&gt;</span>
        <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-tag">&lt;array&gt;</span>
                <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-tag">&lt;string&gt;</span>/usr/libexec/taskgated<span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-tag">&lt;/string&gt;</span>
                <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-tag">&lt;string&gt;</span>-p<span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-tag">&lt;/string&gt;</span>
                <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-tag">&lt;string&gt;</span>-s<span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-tag">&lt;/string&gt;</span>
        <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-tag">&lt;/array&gt;</span>
<span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-tag">&lt;/dict&gt;</span>
<span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-tag">&lt;/plist&gt;</span>
</pre><p class="p">
                           After editing the file, the system must be rebooted or the daemon stopped and relaunched for the change to take effect.
                           
                        </p>
                        <div class="note note"><span class="notetitle">Note:</span> 
                            Because every application in the <samp class="ph codeph">procmod</samp> group will have higher privileges when <samp class="ph codeph">taskgated</samp> runs with the -p option, adding this option to the taskgated daemon is
                            a possible security risk.
                           
                        </div>
                     </div>
                     <div class="section">
                         <h4 class="title sectiontitle">Debugging in console mode</h4>
                        <div class="p">
                           While debugging the application in console mode, it is not uncommon to encounter kernel warnings about unnesting DYLD shared
                           regions for a debugger or a debugged process that look as follows:
                           <pre class="pre screen" xml:space="preserve">cuda-binary-gdb (map: 0xffffff8038644658) triggered DYLD shared region unnest for map: 0xffffff8038644bc8, region 0x7fff95e00000-&gt;0x7fff96000000. While not abnormal for debuggers, this increases system memory footprint until the target exits.
</pre>
                           
                           To prevent such messages from appearing, make sure that the <samp class="ph codeph">vm.shared_region_unnest_logging</samp> kernel parameter is set to zero, for example, by using the following command:
                           <pre class="pre screen" xml:space="preserve"><strong class="ph b">$</strong> sudo sysctl -w vm.shared_region_unnest_logging=0</pre></div>
                     </div>
                  </div>
               </div>
               <div class="topic concept nested2" id="temporary-directory"><a name="temporary-directory" shape="rect">
                     <!-- --></a><h3 class="title topictitle2"><a href="#temporary-directory" name="temporary-directory" shape="rect">3.2.3.&nbsp;Temporary Directory</a></h3>
                  <div class="body conbody">
                     <p class="p">
                        By default, CUDA-GDB uses <samp class="ph codeph">/tmp</samp> as the directory to store temporary files.
                        To select a different directory, set the <samp class="ph codeph">$TMPDIR</samp> environment variable.
                        
                     </p>
                     <div class="note note"><span class="notetitle">Note:</span> 
                        The user must have write and execute permission to the temporary directory used by CUDA-GDB.
                        Otherwise, the debugger will fail with an internal error.
                        
                     </div>
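                      <p class="p">As a quick sketch of the setup above, the temporary directory and its permissions can be prepared and checked from the shell before launching <samp class="ph codeph">cuda-gdb</samp> (the directory path below is only an illustrative example, not from this manual):</p>

```shell
# Point CUDA-GDB at an alternate temporary directory (example path).
export TMPDIR="$HOME/cuda-gdb-tmp"

# Create it if needed and grant write and execute permissions,
# which CUDA-GDB requires on its temporary directory.
mkdir -p "$TMPDIR"
chmod u+wx "$TMPDIR"

# Confirm the permissions before starting the debugger.
test -w "$TMPDIR" && test -x "$TMPDIR" && echo "TMPDIR ok: $TMPDIR"
```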
                  </div>
               </div>
            </div>
            <div class="topic concept nested1" id="compiling-application"><a name="compiling-application" shape="rect">
                  <!-- --></a><h3 class="title topictitle2"><a href="#compiling-application" name="compiling-application" shape="rect">3.3.&nbsp;Compiling the Application</a></h3>
               <div class="topic concept nested2" id="debug-compilation"><a name="debug-compilation" shape="rect">
                     <!-- --></a><h3 class="title topictitle2"><a href="#debug-compilation" name="debug-compilation" shape="rect">3.3.1.&nbsp;Debug Compilation</a></h3>
                  <div class="body conbody">
                     <div class="p">NVCC, the NVIDIA CUDA compiler driver, provides a mechanism for generating the debugging information necessary for CUDA-GDB
                        to work properly. The <samp class="ph codeph">-g -G</samp> option pair must be passed to NVCC when an application is compiled in order to debug with CUDA-GDB; for example,
                        <pre class="pre screen" xml:space="preserve"><strong class="ph b">nvcc -g -G foo.cu -o foo</strong></pre></div>
                     <div class="p">Using this line to compile the CUDA application <samp class="ph codeph">foo.cu</samp><ul class="ul">
                           <li class="li">forces <samp class="ph codeph">-O0</samp> compilation, with the exception of very limited dead-code eliminations and register-spilling optimizations.
                           </li>
                           <li class="li">makes the compiler include debug information in the executable</li>
                        </ul>
                     </div>
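                      <p class="p">For reference, a minimal <samp class="ph codeph">foo.cu</samp> that could be compiled with the command above might look as follows; the kernel, array size, and names here are purely illustrative, not part of this manual:</p>

```cuda
#include <cstdio>

// Illustrative kernel: each thread writes its global index into out[].
__global__ void fill(int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i;
}

int main() {
    const int n = 256;
    int *d_out = nullptr;
    int h_out[n];
    cudaMalloc(&d_out, n * sizeof(int));
    // Launch enough 128-thread blocks to cover n elements.
    fill<<<(n + 127) / 128, 128>>>(d_out, n);
    cudaMemcpy(h_out, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    printf("h_out[n-1] = %d\n", h_out[n - 1]);
    cudaFree(d_out);
    return 0;
}
```

                      <p class="p">Compiling this file with <samp class="ph codeph">nvcc -g -G foo.cu -o foo</samp> produces an executable carrying both host and device debug information, which can then be loaded with <samp class="ph codeph">cuda-gdb foo</samp>.</p>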
                  </div>
               </div>
               <div class="topic concept nested2" id="compiling-for-specific-gpus"><a name="compiling-for-specific-gpus" shape="rect">
                      <!-- --></a><h3 class="title topictitle2"><a href="#compiling-for-specific-gpus" name="compiling-for-specific-gpus" shape="rect">3.3.2.&nbsp;Compiling for Specific GPU Architectures</a></h3>
                  <div class="body conbody">
                      <p class="p">By default, the compiler generates PTX code only for the compute_10
                         virtual architecture. At runtime, the kernels are then recompiled for the
                         architecture of the target GPU(s). Compiling for a specific virtual
                         architecture guarantees that the application will work on any GPU
                         architecture of that compute capability or later, at a cost in performance.
                         This is done for forward compatibility.
                      </p>
                      <p class="p"> It is highly recommended to compile the application once for all the
                         GPU architectures targeted by the application, and to also generate PTX
                         code for the latest virtual architecture for forward compatibility.
                      </p>
                      <p class="p">A GPU architecture is defined by its compute capability. For the list of GPUs
                         and their respective compute capabilities, see <a class="xref" href="https://developer.nvidia.com/cuda-gpus" shape="rect">https://developer.nvidia.com/cuda-gpus</a>.
                         The same application can be compiled for multiple GPU architectures. Use the
                         <samp class="ph codeph">-gencode</samp> compilation option to dictate which GPU architecture
                         to compile for. The option can be specified multiple times.
                      </p>
                     <div class="p">For instance, to compile an application for a GPU with compute capability
                        3.0, add the following flag to the compilation command:
                        <pre class="pre screen" xml:space="preserve"><strong class="ph b">-gencode arch=compute_30,code=sm_30</strong></pre></div>
                     <div class="p">To compile PTX code for any future architecture past the compute capability
                        3.5, add the following flag to the compilation command:
                        <pre class="pre screen" xml:space="preserve"><strong class="ph b">-gencode arch=compute_35,code=compute_35</strong></pre></div>
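                      <div class="p">The flags above can be combined with debug compilation into a single command. The
                         following is an illustrative sketch (the file name <samp class="ph codeph">foo.cu</samp> and output name
                         are hypothetical) that embeds debuggable code for compute capability 3.0 and 3.5 devices plus PTX for
                         future architectures:
                         <pre class="pre screen" xml:space="preserve"><strong class="ph b">nvcc -g -G -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_35,code=compute_35 foo.cu -o foo</strong></pre></div>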
                     <p class="p">
                        For additional information, please consult the compiler documentation at 
                        <a class="xref" href="http://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#extended-notation" shape="rect">http://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#extended-notation
                           </a></p>
                  </div>
               </div>
            </div>
            <div class="topic concept nested1" id="using-debugger"><a name="using-debugger" shape="rect">
                  <!-- --></a><h3 class="title topictitle2"><a href="#using-debugger" name="using-debugger" shape="rect">3.4.&nbsp;Using the Debugger</a></h3>
               <div class="body conbody">
                   <p class="p">Debugging a CUDA GPU involves pausing that GPU. When the graphics desktop manager is running on the same GPU, debugging
                      that GPU freezes the GUI and makes the desktop unusable. To avoid this, use CUDA-GDB in the following system configurations:
                   </p>
               </div>
               <div class="topic concept nested2" id="single-gpu-debugging"><a name="single-gpu-debugging" shape="rect">
                     <!-- --></a><h3 class="title topictitle2"><a href="#single-gpu-debugging" name="single-gpu-debugging" shape="rect">3.4.1.&nbsp;Single-GPU Debugging</a></h3>
                  <div class="body conbody">
                     <p class="p">In a single GPU system, CUDA-GDB can be used to debug CUDA applications only if no X11 server (on Linux) or no Aqua desktop
                        manager (on Mac OS X) is running on that system.
                     </p>
                     <div class="section">
                        <h4 class="title sectiontitle">On Linux</h4>
                        <p class="p">On Linux you can stop the X11 server by stopping the <samp class="ph codeph">gdm</samp> service.
                        </p>
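                         <p class="p">On a distribution that uses init-style service scripts, stopping the service might look like the following
                            (the exact command varies by distribution and init system):</p><pre class="pre screen" xml:space="preserve"><strong class="ph b">$</strong> sudo service gdm stop</pre>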
                     </div>
                     <div class="section">
                        <h4 class="title sectiontitle">On Mac OS X</h4>
                        <p class="p">
                           On Mac OS X you can log in with <samp class="ph codeph">&gt;console</samp> as the user name in the desktop UI login screen.
                           
                        </p>
                         <p class="p">
                            To enable the console login option, open the System Preferences-&gt;Users &amp; Groups-&gt;Login Options tab,
                            set the automatic login option to Off, 
                            and set <samp class="ph codeph">Display login window as</samp> to <samp class="ph codeph">Name and password</samp>.
                         </p>
                         <p class="p">
                            To launch or debug CUDA applications in console mode on systems with an integrated GPU and a discrete GPU,
                            also make sure that the <samp class="ph codeph">Automatic Graphics Switching</samp> option in the System Preferences-&gt;Energy Saver tab is unchecked.
                         </p>
                     </div>
                  </div>
               </div>
               <div class="topic concept nested2" id="single-gpu-debugging-with-desktop-manager-running"><a name="single-gpu-debugging-with-desktop-manager-running" shape="rect">
                     <!-- --></a><h3 class="title topictitle2"><a href="#single-gpu-debugging-with-desktop-manager-running" name="single-gpu-debugging-with-desktop-manager-running" shape="rect">3.4.2.&nbsp;Single-GPU Debugging with the Desktop Manager Running</a></h3>
                  <div class="body conbody">
                     <div class="section">
                        <p class="p">CUDA-GDB can be used to debug CUDA applications on the same GPU that is running the desktop GUI.</p>
                        <div class="note note"><span class="notetitle">Note:</span> This is a BETA feature available on Linux and supports devices with SM3.5 compute capability.
                        </div>
                     </div>
                     <div class="section">
                        <p class="p">There are two ways to enable this functionality:</p>
                        <ul class="ul">
                           <li class="li"> Use the following command:  <pre class="pre screen" xml:space="preserve"><samp class="ph codeph">set cuda software_preemption on</samp></pre></li>
                           <li class="li"> Export the following environment variable:  <pre class="pre screen" xml:space="preserve"><samp class="ph codeph"><strong class="ph b">CUDA_DEBUGGER_SOFTWARE_PREEMPTION=1</strong></samp></pre></li>
                        </ul>
                        <p class="p">Either of the options above will activate software preemption.  These options must be set <strong class="ph b">prior</strong> to running the application.  When the GPU hits a breakpoint or any other event that would normally cause the GPU to freeze,
                           CUDA-GDB releases the GPU for use by the desktop or other applications.  This enables CUDA-GDB to debug a CUDA application
                           on the same GPU that is running the desktop GUI, and also enables debugging of multiple CUDA applications context-switching
                           on the same GPU.
                        </p>
                        <div class="note note"><span class="notetitle">Note:</span> The options listed above are ignored for GPUs with less than SM3.5 compute capability.
                        </div>
                     </div>
                  </div>
               </div>
               <div class="topic concept nested2" id="multi-gpu-debugging"><a name="multi-gpu-debugging" shape="rect">
                     <!-- --></a><h3 class="title topictitle2"><a href="#multi-gpu-debugging" name="multi-gpu-debugging" shape="rect">3.4.3.&nbsp;Multi-GPU Debugging</a></h3>
                  <div class="body conbody">
                      <p class="p">
                         Multi-GPU debugging designates the scenario where the application is running on more than one CUDA-capable device.
                         Multi-GPU debugging is not very different from single-GPU debugging, except for a few additional CUDA-GDB commands that let
                         you switch between the GPUs.
                      </p>
                     <p class="p">Any GPU hitting a breakpoint will pause all the GPUs running CUDA on that system. Once paused, you can use <samp class="ph codeph">info cuda kernels</samp> to view all the active kernels and the GPUs they are running on. When any GPU is resumed, all the GPUs are resumed.
                     </p>
                      <div class="note note"><span class="notetitle">Note:</span> If the <samp class="ph codeph">CUDA_VISIBLE_DEVICES</samp> environment variable is used, only the specified devices are suspended and resumed.
                      </div>
                     <p class="p">All CUDA-capable GPUs may run one or more kernels. To switch to an active kernel, use <samp class="ph codeph">cuda kernel &lt;n&gt;</samp>, where <samp class="ph codeph">n</samp> is the ID of the kernel retrieved from <samp class="ph codeph">info cuda kernels</samp>.
                     </p>
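                      <div class="p">For example, a session that switches focus to a kernel with ID 1 (the ID shown is illustrative; use the IDs
                         reported by <samp class="ph codeph">info cuda kernels</samp> on your system) would look like:
                         <pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">info cuda kernels</strong>
(cuda-gdb) <strong class="ph b">cuda kernel 1</strong></pre></div>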
                     <div class="p">
                        <div class="note note"><span class="notetitle">Note:</span> The same kernel can be loaded and used by different contexts and devices at the same time. When a breakpoint is set in such
                           a kernel, by either name or file name and line number, it will be resolved arbitrarily to only one instance of that kernel.
                           With the runtime API, the exact instance to which the breakpoint will be resolved cannot be controlled. With the driver API,
                            the user can control the instance to which the breakpoint will be resolved by setting the breakpoint <em class="ph i">right after</em> its module is loaded.
                        </div>
                     </div>
                  </div>
               </div>
               <div class="topic concept nested2" id="multi-gpu-debugging-in-console-mode"><a name="multi-gpu-debugging-in-console-mode" shape="rect">
                     <!-- --></a><h3 class="title topictitle2"><a href="#multi-gpu-debugging-in-console-mode" name="multi-gpu-debugging-in-console-mode" shape="rect">3.4.4.&nbsp;Multi-GPU Debugging in Console Mode</a></h3>
                  <div class="body conbody">
                      <p class="p">CUDA-GDB allows simultaneous debugging of applications running CUDA kernels on multiple GPUs. In console mode, CUDA-GDB can
                         be used to pause and debug every GPU in the system. Console mode can be enabled as described above for single-GPU debugging.
                      </p>
                  </div>
               </div>
               <div class="topic concept nested2" id="multi-gpu-debugging-with-desktop-manager-running"><a name="multi-gpu-debugging-with-desktop-manager-running" shape="rect">
                     <!-- --></a><h3 class="title topictitle2"><a href="#multi-gpu-debugging-with-desktop-manager-running" name="multi-gpu-debugging-with-desktop-manager-running" shape="rect">3.4.5.&nbsp;Multi-GPU Debugging with the Desktop Manager Running</a></h3>
                  <div class="body conbody">
                     <div class="section">
                         <p class="p">Hanging the desktop GUI can be avoided by running the desktop GUI on one GPU and CUDA on the other GPU.</p>
                     </div>
                     <div class="section">
                        <h4 class="title sectiontitle">On Linux</h4>
                        <p class="p">The CUDA driver automatically excludes the GPU used by X11 from being visible to the application being debugged. This might
                           alter the behavior of the application since, if there are <em class="ph i">n</em> GPUs in the system, then only <em class="ph i">n-1</em> GPUs will be visible to the application.
                        </p>
                     </div>
                     <div class="section">
                        <h4 class="title sectiontitle">On Mac OS X</h4>
                        <p class="p">The CUDA driver exposes every CUDA-capable GPU in the system, including the one used by the Aqua desktop manager. To determine
                           which GPU should be used for CUDA, run the <samp class="ph codeph">1_Utilities/deviceQuery</samp> CUDA sample. A truncated example output of <samp class="ph codeph">deviceQuery</samp> is shown below. 
                        </p><pre class="pre screen" xml:space="preserve">
Detected 2 CUDA Capable device(s)

Device 0: "GeForce GT 330M"
  CUDA Driver Version / Runtime Version          5.5 / 5.5
  CUDA Capability Major/Minor version number:    1.2
  Total amount of global memory:                 512 MBytes (536543232 bytes)
  ( 6) Multiprocessors x (  8) CUDA Cores/MP:    48 CUDA Cores
  [... truncated output ...]
 
Device 1: "Quadro K5000"
  CUDA Driver Version / Runtime Version          5.5 / 5.5
  CUDA Capability Major/Minor version number:    3.0
  Total amount of global memory:                 4096 MBytes (4294508544 bytes)
  ( 8) Multiprocessors x (192) CUDA Cores/MP:    1536 CUDA Cores
  [... truncated output ...]

deviceQuery, CUDA Driver = CUDART, \
   CUDA Driver Version = 5.5, CUDA Runtime Version = 5.5, \
   NumDevs = 2, Device0 = GeForce GT 330M, Device1 = Quadro K5000
</pre><p class="p">If Device 0 is rendering the desktop, then Device 1 must be selected for running and debugging the CUDA application. This
                           exclusion of a device can be achieved by setting the <samp class="ph codeph">CUDA_VISIBLE_DEVICES</samp> environment variable to the index of the device that will be used for CUDA. In this particular example, the value would be
                           1:
                        </p><pre class="pre screen" xml:space="preserve">export CUDA_VISIBLE_DEVICES=1</pre><p class="p">
                            As a safeguard mechanism, cuda-gdb will detect if a visible device is also used for display and return an error.
                            To turn off the safeguard mechanism, the <samp class="ph codeph">set cuda gpu_busy_check</samp> option should be set to <samp class="ph codeph">off</samp>.
                           
                        </p><pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">set cuda gpu_busy_check off</strong></pre></div>
                  </div>
               </div>
               <div class="topic concept nested2" id="remote-debugging"><a name="remote-debugging" shape="rect">
                     <!-- --></a><h3 class="title topictitle2"><a href="#remote-debugging" name="remote-debugging" shape="rect">3.4.6.&nbsp;Remote Debugging</a></h3>
                  <div class="body conbody">
                     <p class="p">
                         There are multiple methods to remotely debug an application with CUDA-GDB.
                        In addition to using SSH or VNC from the host system to connect to the target system, it is also possible to use the <samp class="ph codeph">target remote</samp> GDB feature.
                        Using this option, the local <samp class="ph codeph">cuda-gdb</samp> (client) connects to the <samp class="ph codeph">cuda-gdbserver</samp> process (the server) running on the target system.
                        This option is supported with a Linux or Mac OS X client and a Linux server. It is not possible to remotely debug a CUDA application
                        running on Mac OS X.
                        
                     </p>
                     <p class="p">
                         Setting up remote debugging this way is a two-step process:
                        
                     </p>
                     <div class="section">
                        <h4 class="title sectiontitle">Launch the cuda-gdbserver on the remote host</h4>
                        <p class="p">cuda-gdbserver can be launched on the remote host in different operation modes.</p>
                        <ul class="ul">
                           <li class="li">Option 1: Launch a new application in debug mode.
                              
                               <p class="p">To launch a new application in debug mode, invoke cuda-gdbserver as follows:</p><pre class="pre screen" xml:space="preserve"><strong class="ph b">$</strong> cuda-gdbserver :1234 app_invocation</pre><p class="p">Where <samp class="ph codeph">1234</samp> is the TCP port number that
                                 <samp class="ph codeph">cuda-gdbserver</samp> will listen to for incoming connections from
                                  <samp class="ph codeph">cuda-gdb</samp>, and <samp class="ph codeph">app_invocation</samp> is the
                                 invocation command to launch the application, arguments included.
                              </p>
                           </li>
                           <li class="li"> Option 2: Attach <samp class="ph codeph">cuda-gdbserver</samp> to the running process
                              
                              <p class="p">To attach cuda-gdbserver to an already running process, the
                                 <samp class="ph codeph">--attach</samp> option followed by process identification number
                                 (PID) must be used:
                              </p><pre class="pre screen" xml:space="preserve"><strong class="ph b">$</strong> cuda-gdbserver :1234 --attach 5678</pre><p class="p">Where <samp class="ph codeph">1234</samp> is the TCP port number and
                                  <samp class="ph codeph">5678</samp> is the process identifier of the application cuda-gdbserver
                                 must be attached to.
                              </p>
                           </li>
                        </ul>
                        <p class="p">When debugging a 32-bit application on a 64-bit server, <samp class="ph codeph">cuda-gdbserver</samp> must also be 32-bit. 
                        </p>
                     </div>
                     <div class="section">
                        <h4 class="title sectiontitle">Launch <samp class="ph codeph">cuda-gdb</samp> on the client
                        </h4>
                        <p class="p">Configure <samp class="ph codeph">cuda-gdb</samp> to connect to the remote target using either:
                         </p><pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">target remote</strong></pre><p class="p">or</p><pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">target extended-remote</strong></pre><p class="p">It is recommended to use the <samp class="ph codeph">set sysroot</samp> command if libraries installed on the debug target might differ from those installed on the debug host.
                            For example, cuda-gdb could be configured to connect to the remote target as follows:
                        </p><pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">target remote 192.168.0.2:1234</strong>
(cuda-gdb) <strong class="ph b">set sysroot remote://</strong></pre><p class="p">Where <samp class="ph codeph">192.168.0.2</samp> is the IP address or domain name of the remote target, and <samp class="ph codeph">1234</samp> is the TCP port previously opened by <samp class="ph codeph">cuda-gdbserver</samp>.  
                        </p>
                     </div>
                  </div>
               </div>
               <div class="topic concept nested2" id="multiple-debuggers"><a name="multiple-debuggers" shape="rect">
                     <!-- --></a><h3 class="title topictitle2"><a href="#multiple-debuggers" name="multiple-debuggers" shape="rect">3.4.7.&nbsp;Multiple Debuggers</a></h3>
                  <div class="body conbody">
                     <p class="p">In a multi-GPU environment, several debugging sessions may take place simultaneously as long as the CUDA devices are used
                        exclusively. For instance, one instance of CUDA-GDB can debug a first application that uses the first GPU while another instance
                        of CUDA-GDB debugs a second application that uses the second GPU. The exclusive use of a GPU is achieved by specifying which
                        GPU is visible to the application by using the <samp class="ph codeph">CUDA_VISIBLE_DEVICES</samp> environment variable.
                     </p><pre class="pre screen" xml:space="preserve"><strong class="ph b">$</strong> CUDA_VISIBLE_DEVICES=1 cuda-gdb my_app</pre><p class="p">With software preemption enabled (<samp class="ph codeph">set cuda software_preemption on</samp>), multiple CUDA-GDB instances can be used to debug CUDA applications context-switching on the same GPU.  The <samp class="ph codeph">--cuda-use-lockfile=0</samp> option must be used when starting each debug session, as mentioned in <a class="xref" href="index.html#lock-file" shape="rect">Lock File</a>.
                     </p><pre class="pre screen" xml:space="preserve"><strong class="ph b">$</strong> cuda-gdb --cuda-use-lockfile=0 my_app</pre></div>
               </div>
               <div class="topic concept nested2" id="attaching"><a name="attaching" shape="rect">
                     <!-- --></a><h3 class="title topictitle2"><a href="#attaching" name="attaching" shape="rect">3.4.8.&nbsp;Attaching/Detaching</a></h3>
                  <div class="body conbody">
                     <p class="p">CUDA-GDB can attach to and detach from a CUDA application running on GPUs with compute capability 2.0 and beyond, using GDB's
                        built-in commands for attaching to or detaching from a process.
                     </p>
                     <p class="p"> Additionally, if the environment variable CUDA_DEVICE_WAITS_ON_EXCEPTION is set to 1 prior to running the CUDA application,
                        the application will run normally until a device exception occurs. The application will then wait for CUDA-GDB to attach itself
                        to it for further debugging.
                     </p>
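                      <div class="p">For example, a hypothetical session (the application name <samp class="ph codeph">my_app</samp> and PID <samp class="ph codeph">5678</samp> are illustrative) might look like:
                         <pre class="pre screen" xml:space="preserve"><strong class="ph b">$</strong> CUDA_DEVICE_WAITS_ON_EXCEPTION=1 ./my_app &amp;
<strong class="ph b">$</strong> cuda-gdb my_app
(cuda-gdb) <strong class="ph b">attach 5678</strong></pre></div>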
                  </div>
               </div>
               <div class="topic task nested2" id="cuda-opengl-interop-applications-on-linux"><a name="cuda-opengl-interop-applications-on-linux" shape="rect">
                     <!-- --></a><h3 class="title topictitle2"><a href="#cuda-opengl-interop-applications-on-linux" name="cuda-opengl-interop-applications-on-linux" shape="rect">3.4.9.&nbsp;CUDA/OpenGL Interop Applications on Linux</a></h3>
                  <div class="body taskbody">
                     <div class="section context">
                         <p class="p">Any CUDA application that uses OpenGL interoperability requires an active window server.  Such applications will fail to
                            run under console mode debugging on both Linux and Mac OS X. However, if the X server is running on Linux, the render GPU
                            will not be enumerated when debugging, so the application could still fail unless it uses OpenGL device
                            enumeration to access the render GPU. If, instead, the X session is running in non-interactive mode while using the debugger, the
                            render GPU will be enumerated correctly.
                         </p>
                     </div>
                     <ol class="ol steps">
                        <li class="li step"><span class="ph cmd">Launch your X session in non-interactive mode.</span><ol type="a" class="ol substeps">
                              <li class="li substep"><span class="ph cmd">Stop your X server.</span></li>
                              <li class="li substep"><span class="ph cmd">Edit <samp class="ph codeph">/etc/X11/xorg.conf</samp> to contain the following line in the Device section corresponding to your display:</span><pre class="pre screen" xml:space="preserve"><strong class="ph b">Option	"Interactive" "off"</strong></pre></li>
                              <li class="li substep"><span class="ph cmd">Restart your X server.</span></li>
                           </ol>
                        </li>
                        <li class="li step"><span class="ph cmd">Log in remotely (SSH, etc.) and launch your application under CUDA-GDB.</span>
                           This setup works properly for single-GPU and multi-GPU configurations.
                           
                        </li>
                        <li class="li step"><span class="ph cmd">Ensure your <samp class="ph codeph">DISPLAY</samp> environment variable is set appropriately.</span>
                           For example: <pre class="pre screen" xml:space="preserve"><strong class="ph b">export DISPLAY=:0.0</strong></pre></li>
                     </ol>
                     <div class="section postreq">
                        <p class="p">While X is in non-interactive mode, interacting with the X session can cause your debugging session to stall or terminate.</p>
                     </div>
                  </div>
               </div>
            </div>
         </div>
         <div class="topic concept nested0" id="cuda-gdb-extensions"><a name="cuda-gdb-extensions" shape="rect">
               <!-- --></a><h2 class="title topictitle1"><a href="#cuda-gdb-extensions" name="cuda-gdb-extensions" shape="rect">4.&nbsp;CUDA-GDB Extensions</a></h2>
            <div class="body conbody">
               <p class="p"></p>
            </div>
            <div class="topic concept nested1" id="command-naming-convention"><a name="command-naming-convention" shape="rect">
                  <!-- --></a><h3 class="title topictitle2"><a href="#command-naming-convention" name="command-naming-convention" shape="rect">4.1.&nbsp;Command Naming Convention</a></h3>
               <div class="body conbody">
                  <div class="p">The existing GDB commands are unchanged. Every new CUDA command or option is prefixed with the CUDA keyword. As much as possible,
                     CUDA-GDB command names will be similar to the equivalent GDB commands used for debugging host code. For instance, the GDB
                      commands to display the host threads and to switch to host thread 1 are, respectively:
                     <pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">info threads</strong>
(cuda-gdb) <strong class="ph b">thread 1</strong></pre></div>
                   <div class="p">To display the CUDA threads and switch to CUDA thread 1, the user only has to type:
                     <pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">info cuda threads</strong>
(cuda-gdb) <strong class="ph b">cuda thread 1</strong></pre></div>
               </div>
            </div>
            <div class="topic concept nested1" id="getting-help"><a name="getting-help" shape="rect">
                  <!-- --></a><h3 class="title topictitle2"><a href="#getting-help" name="getting-help" shape="rect">4.2.&nbsp;Getting Help</a></h3>
               <div class="body conbody">
                  <p class="p">As with GDB commands, the built-in help for the CUDA commands is accessible from the <samp class="ph codeph">cuda-gdb</samp> command line by using the help command: 
                  </p><pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">help cuda name_of_the_cuda_command</strong>
(cuda-gdb) <strong class="ph b">help set cuda name_of_the_cuda_option</strong>
(cuda-gdb) <strong class="ph b">help info cuda name_of_the_info_cuda_command</strong></pre><p class="p">Moreover, all the CUDA commands can be auto-completed by pressing the TAB key, as with any other GDB command.</p>
               </div>
            </div>
            <div class="topic concept nested1" id="initialization-file"><a name="initialization-file" shape="rect">
                  <!-- --></a><h3 class="title topictitle2"><a href="#initialization-file" name="initialization-file" shape="rect">4.3.&nbsp;Initialization File</a></h3>
               <div class="body conbody">
                   <p class="p">The initialization file for CUDA-GDB is named <samp class="ph codeph">.cuda-gdbinit</samp> and follows the same rules as the standard <samp class="ph codeph">.gdbinit</samp> file used by GDB. The initialization file may contain any CUDA-GDB command. Those commands will be processed in order when
                      CUDA-GDB is launched.
                   </p>
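                   <div class="p">For example, a hypothetical <samp class="ph codeph">.cuda-gdbinit</samp> file that enables the CUDA memory checker and sets a breakpoint on <samp class="ph codeph">main</samp> could contain:
                      <pre class="pre screen" xml:space="preserve">set cuda memcheck on
break main</pre></div>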
               </div>
            </div>
            <div class="topic concept nested1" id="gui-integration"><a name="gui-integration" shape="rect">
                  <!-- --></a><h3 class="title topictitle2"><a href="#gui-integration" name="gui-integration" shape="rect">4.4.&nbsp;GUI Integration</a></h3>
               <div class="body conbody">
                  <div class="section">
                     <h3 class="title sectiontitle">Emacs</h3>
                     <p class="p">CUDA-GDB works with GUD in Emacs and XEmacs. No extra step is required other than pointing to the right binary.</p>
                     <p class="p">To use CUDA-GDB, the <samp class="ph codeph">gud-gdb-command-name</samp> variable must be set to <samp class="ph codeph">cuda-gdb annotate=3</samp>. Use <samp class="ph codeph">M-x customize-variable</samp> to set the variable.
                     </p>
                     <p class="p">Ensure that <samp class="ph codeph">cuda-gdb</samp> is present in the Emacs/XEmacs <samp class="ph codeph">$PATH</samp>.
                     </p>
                  </div>
                  <div class="section">
                     <h3 class="title sectiontitle">DDD</h3>
                     <div class="p">CUDA-GDB works with DDD. To use DDD with CUDA-GDB, launch DDD with the following command:
                        <pre class="pre screen" xml:space="preserve"><strong class="ph b">ddd --debugger cuda-gdb</strong></pre><samp class="ph codeph">cuda-gdb</samp> must be in your <samp class="ph codeph">$PATH</samp>. 
                     </div>
                  </div>
               </div>
            </div>
         </div>
         <div class="topic concept nested0" id="kernel-focus"><a name="kernel-focus" shape="rect">
               <!-- --></a><h2 class="title topictitle1"><a href="#kernel-focus" name="kernel-focus" shape="rect">5.&nbsp;Kernel Focus</a></h2>
            <div class="body conbody">
                <p class="p">A CUDA application may be running several host threads and many device threads. To
                   simplify the visualization of information about the state of the application, commands are
                   applied to the entity in focus.
                </p>
                <p class="p">When the focus is set to a host thread, the commands will apply only to that host thread
                   (unless the application is fully resumed, for instance). On the device side, the focus
                   is always set to the lowest granularity level: the device thread.
                </p>
            </div>
            <div class="topic concept nested1" id="software-coordinates-vs-hardware-coordinates"><a name="software-coordinates-vs-hardware-coordinates" shape="rect">
                  <!-- --></a><h3 class="title topictitle2"><a href="#software-coordinates-vs-hardware-coordinates" name="software-coordinates-vs-hardware-coordinates" shape="rect">5.1.&nbsp;Software Coordinates vs. Hardware Coordinates</a></h3>
               <div class="body conbody">
                  <p class="p">A device thread belongs to a block, which in turn belongs to a kernel. Thread, block, and kernel are the software coordinates
                     of the focus. A device thread runs on a lane. A lane belongs to a warp, which belongs to an SM, which in turn belongs to a
                     device. Lane, warp, SM, and device are the hardware coordinates of the focus. Software and hardware coordinates can be used
                     interchangeably and simultaneously as long as they remain coherent.
                  </p>
                  <p class="p">Another software coordinate is sometimes used: the grid. The difference between a grid and a kernel is the scope. The grid
                     ID is unique per GPU whereas the kernel ID is unique across all GPUs. Therefore there is a 1:1 mapping between a kernel and
                     a (grid,device) tuple.
                  </p>
                  <p class="p"><strong class="ph b">Note:  If software preemption is enabled (<samp class="ph codeph">set cuda software_preemption on</samp>), hardware coordinates corresponding to a device thread are likely to change upon resuming execution on the device.  However,
                        software coordinates will remain intact and will not change for the lifetime of the device thread.</strong></p>
               </div>
            </div>
            <div class="topic concept nested1" id="current-focus"><a name="current-focus" shape="rect">
                  <!-- --></a><h3 class="title topictitle2"><a href="#current-focus" name="current-focus" shape="rect">5.2.&nbsp;Current Focus</a></h3>
               <div class="body conbody">
                  <p class="p">To inspect the current focus, use the cuda command followed by the coordinates of interest:</p><pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">cuda device sm warp lane block thread</strong>
block (0,0,0), thread (0,0,0), device 0, sm 0, warp 0, lane 0 
(cuda-gdb) <strong class="ph b">cuda kernel block thread</strong>
kernel 1, block (0,0,0), thread (0,0,0)
(cuda-gdb) <strong class="ph b">cuda kernel</strong>
kernel 1</pre></div>
            </div>
            <div class="topic concept nested1" id="switching-focus"><a name="switching-focus" shape="rect">
                  <!-- --></a><h3 class="title topictitle2"><a href="#switching-focus" name="switching-focus" shape="rect">5.3.&nbsp;Switching Focus</a></h3>
               <div class="body conbody">
                  <div class="p">To switch the current focus, use the cuda command followed by the coordinates to be changed:
                     <pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">cuda device 0 sm 1 warp 2 lane 3</strong>
[Switching focus to CUDA kernel 1, grid 2, block (8,0,0), thread
(67,0,0), device 0, sm 1, warp 2, lane 3]
374 int totalThreads = gridDim.x * blockDim.x;</pre></div>
                  <div class="p">If the specified focus is not fully defined by the command, the debugger will assume that the omitted coordinates are set
                     to the coordinates in the current focus, including the subcoordinates of the block and thread.
                     <pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">cuda thread (15)</strong>
[Switching focus to CUDA kernel 1, grid 2, block (8,0,0), thread
(15,0,0), device 0, sm 1, warp 0, lane 15]
374 int totalThreads = gridDim.x * blockDim.x;</pre></div>
                  <div class="p">The parentheses for the block and thread arguments are optional.
                     <pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">cuda block 1 thread 3</strong>
[Switching focus to CUDA kernel 1, grid 2, block (1,0,0), thread (3,0,0), 
device 0, sm 3, warp 0, lane 3]
374 int totalThreads = gridDim.x * blockDim.x;</pre></div>
               </div>
            </div>
         </div>
         <div class="topic concept nested0" id="program-execution"><a name="program-execution" shape="rect">
               <!-- --></a><h2 class="title topictitle1"><a href="#program-execution" name="program-execution" shape="rect">6.&nbsp;Program Execution</a></h2>
            <div class="body conbody">
               <p class="p">Applications are launched the same way in CUDA-GDB as they are with GDB, by using the run command. This chapter describes how
                  to interrupt and single-step CUDA applications.
               </p>
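               <div class="p">As an illustration (the program arguments below are hypothetical), a CUDA application is started with the standard GDB commands:
                  <pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">set args 256 1024</strong>
(cuda-gdb) <strong class="ph b">run</strong></pre>Arguments can also be passed directly, as in <samp class="ph codeph">run 256 1024</samp>, exactly as in GDB.
               </div>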
            </div>
            <div class="topic concept nested1" id="interrupting-application"><a name="interrupting-application" shape="rect">
                  <!-- --></a><h3 class="title topictitle2"><a href="#interrupting-application" name="interrupting-application" shape="rect">6.1.&nbsp;Interrupting the Application</a></h3>
               <div class="body conbody">
                  <p class="p">If the CUDA application appears to be hanging or stuck in an infinite loop, it is possible to manually interrupt the application
                     by pressing <span class="ph uicontrol">CTRL+C</span>. When the signal is received, the GPUs are suspended and the <samp class="ph codeph">cuda-gdb</samp> prompt will appear.
                  </p>
                  <p class="p">At that point, the program can be inspected, modified, single-stepped, resumed, or terminated at the user's discretion.</p>
                  <p class="p">This feature is limited to applications running within the debugger. It is not possible to break into and debug applications
                     that have been launched outside the debugger.
                  </p>
               </div>
            </div>
            <div class="topic concept nested1" id="single-stepping"><a name="single-stepping" shape="rect">
                  <!-- --></a><h3 class="title topictitle2"><a href="#single-stepping" name="single-stepping" shape="rect">6.2.&nbsp;Single Stepping</a></h3>
               <div class="body conbody">
                  <p class="p">Single-stepping device code is supported. However, unlike host code single-stepping, device code single-stepping works at
                     the warp level. This means that single-stepping a device kernel advances all the active threads in the warp currently in focus.
                     The divergent threads in the warp are not single-stepped.
                  </p>
                  <p class="p">In order to advance the execution of more than one warp, a breakpoint must be set at the desired location and then the application
                     must be fully resumed.
                  </p>
                  <p class="p">A special case is single-stepping over a thread barrier call: <samp class="ph codeph">__syncthreads()</samp>. In this case, an implicit temporary breakpoint is set immediately after the barrier and all threads are resumed until the
                     temporary breakpoint is hit.
                  </p>
                  <p class="p">On GPUs with <samp class="ph codeph">sm_type</samp> lower than <samp class="ph codeph">sm_20</samp>, it is not possible to step over a subroutine in the device code. Instead, CUDA-GDB always steps into the device function.
                     On GPUs with <samp class="ph codeph">sm_type</samp> <samp class="ph codeph">sm_20</samp> and higher, you can step in, over, or out of the device functions as long as they are not inlined. To prevent the compiler from
                     inlining a function, add the <samp class="ph codeph">__noinline__</samp> keyword to the function declaration.
                  </p>
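                  <div class="p">As an illustrative sketch (not taken from this manual; the function names are hypothetical), the <samp class="ph codeph">__noinline__</samp> keyword is placed on the device function declaration:
                     <pre class="pre screen" xml:space="preserve">__noinline__ __device__ int scale(int v)
{
    return 2 * v;   /* 'step' enters here; 'next' steps over on sm_20 and higher */
}

__global__ void kernel(int *out)
{
    out[threadIdx.x] = scale(threadIdx.x);
}</pre></div>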
                  <div class="p"> With Dynamic Parallelism on <samp class="ph codeph">sm_35</samp>, several CUDA APIs can now be instantiated from the device.  The following list defines single-step behavior when encountering
                     these APIs:
                     
                     <ul class="ul">
                        <li class="li">When encountering device side kernel launches (denoted by the <samp class="ph codeph">&lt;&lt;&lt;&gt;&gt;&gt;</samp> launch syntax), the <samp class="ph codeph">step</samp> and <samp class="ph codeph">next</samp> commands will have the same behavior, and both will <strong class="ph b">step over</strong> the launch call.  
                        </li>
                        <li class="li">When encountering <samp class="ph codeph">cudaDeviceSynchronize</samp>, the launch synchronization routine, the <samp class="ph codeph">step</samp> and <samp class="ph codeph">next</samp> commands will have the same behavior, and both will <strong class="ph b">step over</strong> the call.  When stepping over the call, the <strong class="ph b">entire device</strong> is resumed until the call has completed, at which point the device is suspended (without user intervention).
                        </li>
                        <li class="li">When stepping a device grid launch to completion, focus will automatically switch back to the CPU.  The <samp class="ph codeph">cuda kernel</samp> focus switching command must be used to switch to another grid of interest (if one is still resident).
                        </li>
                     </ul>
                  </div>
                  <div class="note note"><span class="notetitle">Note:</span> It is not possible to <strong class="ph b">step into</strong> a device launch call (nor the routine launched by the call).
                  </div>
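                  <div class="p">As a hedged illustration of the rules above (hypothetical kernels, <samp class="ph codeph">sm_35</samp> assumed), <samp class="ph codeph">step</samp> and <samp class="ph codeph">next</samp> behave identically on both marked lines:
                     <pre class="pre screen" xml:space="preserve">__global__ void child(void) { }

__global__ void parent(void)
{
    child&lt;&lt;&lt;1,32&gt;&gt;&gt;();        /* step/next both step over the launch     */
    cudaDeviceSynchronize();  /* stepping over resumes the entire device */
}</pre></div>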
               </div>
            </div>
         </div>
         <div class="topic concept nested0" id="breakpoints"><a name="breakpoints" shape="rect">
               <!-- --></a><h2 class="title topictitle1"><a href="#breakpoints" name="breakpoints" shape="rect">7.&nbsp;Breakpoints &amp; Watchpoints</a></h2>
            <div class="body conbody">
               <p class="p">There are multiple ways to set a breakpoint on a CUDA application. Those methods are
                  described below. The commands to set a breakpoint on the device code are the same as the
                  commands used to set a breakpoint on the host code.
               </p>
               <p class="p">If the breakpoint is set on device code, the breakpoint will be marked pending until the
                  ELF image of the kernel is loaded. At that point, the breakpoint will be resolved and
                  its address will be updated.
               </p>
               <p class="p">When a breakpoint is set, it forces all resident GPU threads to stop at this location
                  when they reach the corresponding PC.
               </p>
               <p class="p">When a breakpoint is hit by one thread, there is no guarantee that the other threads will
                  hit the breakpoint at the same time. Therefore the same breakpoint may be hit several
                  times, and the user must be careful to check which thread(s) actually hit the
                  breakpoint.
               </p>
            </div>
            <div class="topic concept nested1" id="symbolic-breakpoints"><a name="symbolic-breakpoints" shape="rect">
                  <!-- --></a><h3 class="title topictitle2"><a href="#symbolic-breakpoints" name="symbolic-breakpoints" shape="rect">7.1.&nbsp;Symbolic Breakpoints</a></h3>
               <div class="body conbody">
                  <div class="p">To set a breakpoint at the entry of a function, use the <samp class="ph codeph">break</samp> command followed by the name of the function or method:
                     <pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">break my_function</strong>
(cuda-gdb) <strong class="ph b">break my_class::my_method</strong></pre></div>
                  <div class="p">For templatized functions and methods, the full signature must be given:
                     <pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">break int my_templatized_function&lt;int&gt;(int)</strong></pre></div>
                  <div class="p">The mangled name of the function can also be used. To find the mangled name of a function, you can use the following command:
                     <pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">set demangle-style none</strong>
(cuda-gdb) <strong class="ph b">info function my_function_name</strong>
(cuda-gdb) <strong class="ph b">set demangle-style auto</strong></pre></div>
               </div>
            </div>
            <div class="topic concept nested1" id="line-breakpoints"><a name="line-breakpoints" shape="rect">
                  <!-- --></a><h3 class="title topictitle2"><a href="#line-breakpoints" name="line-breakpoints" shape="rect">7.2.&nbsp;Line Breakpoints</a></h3>
               <div class="body conbody">
                  <div class="p">To set a breakpoint on a specific line number, use the following syntax:
                     <pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">break my_file.cu:185</strong></pre></div>
                  <p class="p">If the specified line corresponds to an instruction within templatized code, multiple breakpoints will be created, one for
                     each instance of the templatized code.
                  </p>
               </div>
            </div>
            <div class="topic concept nested1" id="unique_1087814378"><a name="unique_1087814378" shape="rect">
                  <!-- --></a><h3 class="title topictitle2"><a href="#unique_1087814378" name="unique_1087814378" shape="rect">7.3.&nbsp;Address Breakpoints</a></h3>
               <div class="body conbody">
                  <div class="p">To set a breakpoint at a specific address, use the <samp class="ph codeph">break</samp> command with the address as argument:
                     <pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">break *0x1afe34d0</strong></pre></div>
                  <p class="p">The address can be any address on the device or the host.</p>
               </div>
            </div>
            <div class="topic concept nested1" id="kernel-entry-breakpoints"><a name="kernel-entry-breakpoints" shape="rect">
                  <!-- --></a><h3 class="title topictitle2"><a href="#kernel-entry-breakpoints" name="kernel-entry-breakpoints" shape="rect">7.4.&nbsp;Kernel Entry Breakpoints</a></h3>
               <div class="body conbody">
                  <div class="p">To break on the first instruction of every launched kernel, set the <samp class="ph codeph">break_on_launch</samp> option to <samp class="ph codeph">application</samp>:
                     <pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">set cuda break_on_launch application</strong></pre></div>
                  <div class="p">Possible options are:
                     
                     <dl class="dl">
                        <dt class="dt dlterm"><samp class="ph codeph">application</samp></dt>
                        <dd class="dd">kernel launched by the user application</dd>
                        <dt class="dt dlterm"><samp class="ph codeph">system</samp></dt>
                        <dd class="dd">any kernel launched by the driver, such as memset</dd>
                        <dt class="dt dlterm"><samp class="ph codeph">all</samp></dt>
                        <dd class="dd">any kernel, application and system</dd>
                        <dt class="dt dlterm"><samp class="ph codeph">none</samp></dt>
                        <dd class="dd">no kernel, application or system</dd>
                     </dl>
                  </div>
                  <p class="p">These automatic breakpoints are not displayed by the <samp class="ph codeph">info breakpoints</samp> command and are managed separately from individual breakpoints.
                     Turning off the option will not delete individual breakpoints set at the same address, and vice versa.
                  </p>
                  <p class="p">Setting the <samp class="ph codeph">break_on_launch</samp> option to any value other than <samp class="ph codeph">none</samp> forces the <a class="xref" href="index.html#kernel-events" shape="rect">kernel_events</a> option to be set to <samp class="ph codeph">show</samp>.
                  </p>
               </div>
            </div>
            <div class="topic concept nested1" id="conditional-breakpoints"><a name="conditional-breakpoints" shape="rect">
                  <!-- --></a><h3 class="title topictitle2"><a href="#conditional-breakpoints" name="conditional-breakpoints" shape="rect">7.5.&nbsp;Conditional Breakpoints</a></h3>
               <div class="body conbody">
                  <div class="p">To make the breakpoint conditional, use the optional <samp class="ph codeph">if</samp> keyword or the <samp class="ph codeph">cond</samp> command:
                     <pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">break foo.cu:23 if threadIdx.x == 1 &amp;&amp; i &lt; 5</strong>
(cuda-gdb) <strong class="ph b">cond 3 threadIdx.x == 1 &amp;&amp; i &lt; 5</strong></pre></div>
                  <p class="p">Conditional expressions may refer to any variable, including built-in variables such as <samp class="ph codeph">threadIdx</samp> and <samp class="ph codeph">blockIdx</samp>. Function calls are not allowed in conditional expressions.
                  </p>
                  <p class="p">Note that conditional breakpoints are always hit and evaluated, but the debugger reports the breakpoint as being hit only
                     if the conditional expression evaluates to TRUE. The process of hitting the breakpoint and evaluating the corresponding
                     conditional expression is time-consuming. Therefore, running applications while using conditional breakpoints may slow down
                     the debugging session. Moreover, if the conditional expression always evaluates to FALSE, the debugger may appear to be
                     hanging or stuck, although that is not the case. You can interrupt the application with <span class="ph uicontrol">CTRL-C</span> to verify that progress is being made.
                  </p>
                  <p class="p">
                     Conditional breakpoints can be set on code from CUDA modules that are not already loaded.
                     The condition is then verified only once the ELF image of that module is loaded.
                     Therefore any error in the conditional expression is deferred from the instantiation of the conditional breakpoint to the
                     moment the CUDA module is loaded.
                     If unsure, first set an unconditional breakpoint at the desired location and add the conditional statement the first time
                     the breakpoint is hit by using the <samp class="ph codeph">cond</samp> command.
                     
                  </p>
               </div>
            </div>
            <div class="topic concept nested1" id="watchpoints"><a name="watchpoints" shape="rect">
                  <!-- --></a><h3 class="title topictitle2"><a href="#watchpoints" name="watchpoints" shape="rect">7.6.&nbsp;Watchpoints</a></h3>
               <div class="body conbody">
                  <p class="p">
                     Watchpoints on CUDA code are not supported.
                     
                  </p>
                  <p class="p">
                     Watchpoints on host code are supported.
                     The user is invited to read the GDB documentation for a tutorial on how to set watchpoints on host code.
                     
                  </p>
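                  <div class="p">For host code, the usual GDB syntax applies. A minimal sketch, assuming a hypothetical host variable <samp class="ph codeph">result</samp>:
                     <pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">watch result</strong></pre>Execution then stops whenever host code modifies <samp class="ph codeph">result</samp>.
                  </div>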
               </div>
            </div>
         </div>
         <div class="topic concept nested0" id="inspecting-program-state"><a name="inspecting-program-state" shape="rect">
               <!-- --></a><h2 class="title topictitle1"><a href="#inspecting-program-state" name="inspecting-program-state" shape="rect">8.&nbsp;Inspecting Program State</a></h2>
            <div class="body conbody">
               <p class="p"></p>
            </div>
            <div class="topic concept nested1" id="memory-and-variables"><a name="memory-and-variables" shape="rect">
                  <!-- --></a><h3 class="title topictitle2"><a href="#memory-and-variables" name="memory-and-variables" shape="rect">8.1.&nbsp;Memory and Variables</a></h3>
               <div class="body conbody">
                  <div class="p">The GDB print command has been extended to decipher the location of any program variable
                     and can be used to display the contents of any CUDA program variable including: 
                     <ul class="ul">
                        <li class="li">data allocated via <samp class="ph codeph">cudaMalloc()</samp></li>
                        <li class="li">data that resides in various GPU memory regions, such as shared, local, and
                           global memory
                        </li>
                        <li class="li">special CUDA runtime variables, such as <samp class="ph codeph">threadIdx</samp></li>
                     </ul>
                  </div>
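                  <div class="p">A minimal sketch of such inspections, assuming a hypothetical device pointer <samp class="ph codeph">d_data</samp> allocated with <samp class="ph codeph">cudaMalloc()</samp>:
                     <pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">print threadIdx</strong>
(cuda-gdb) <strong class="ph b">print d_data[0]@4</strong></pre></div>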
               </div>
            </div>
            <div class="topic concept nested1" id="variable-storage-and-accessibility"><a name="variable-storage-and-accessibility" shape="rect">
                  <!-- --></a><h3 class="title topictitle2"><a href="#variable-storage-and-accessibility" name="variable-storage-and-accessibility" shape="rect">8.2.&nbsp;Variable Storage and Accessibility</a></h3>
               <div class="body conbody">
                  <p class="p">Depending on the variable type and usage, variables can be stored either in registers or in <samp class="ph codeph">local</samp>, <samp class="ph codeph">shared</samp>, <samp class="ph codeph">const</samp> or <samp class="ph codeph">global</samp> memory. You can print the address of any variable to find out where it is stored and directly access the associated memory.
                  </p>
                  <div class="p">The example below shows how the variable <samp class="ph codeph">array</samp>, which is of type <samp class="ph codeph">shared int *</samp>, can be directly accessed in order to see what the stored values are in the array.
                     <pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">print &amp;array</strong>
$1 = (@shared int (*)[0]) 0x20 
(cuda-gdb) <strong class="ph b">print array[0]@4</strong>
$2 = {0, 128, 64, 192}</pre></div>
                  <div class="p">You can also access the shared memory indexed into the starting offset to see what the stored values are:
                     <pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">print *(@shared int*)0x20</strong>
$3 = 0
(cuda-gdb) <strong class="ph b">print *(@shared int*)0x24</strong>
$4 = 128
(cuda-gdb) <strong class="ph b">print *(@shared int*)0x28</strong>
$5 = 64</pre></div>
                  <div class="p">The example below shows how to access the starting address of the input parameter to the kernel.
                     <pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">print &amp;data</strong>
$6 = (const @global void * const @parameter *) 0x10
(cuda-gdb) <strong class="ph b">print *(@global void * const @parameter *) 0x10</strong>
$7 = (@global void * const @parameter) 0x110000</pre></div>
               </div>
            </div>
            <div class="topic concept nested1" id="inspecting-textures"><a name="inspecting-textures" shape="rect">
                  <!-- --></a><h3 class="title topictitle2"><a href="#inspecting-textures" name="inspecting-textures" shape="rect">8.3.&nbsp;Inspecting Textures</a></h3>
               <div class="body conbody">
                  <div class="note note"><span class="notetitle">Note:</span><strong class="ph b">The debugger can always read/write the source variables when the PC is on the first assembly instruction of a source instruction.
                        When doing assembly-level debugging, the value of source variables is not always accessible.</strong></div>
                  <div class="p">To inspect a texture, use the print command while dereferencing the texture recast to the type of the array it is bound to.
                     For instance, if texture <samp class="ph codeph">tex</samp> is bound to array <samp class="ph codeph">A</samp> of type <samp class="ph codeph">float*</samp>, use:
                     <pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">print *(@texture float *)tex</strong></pre></div>
                  <div class="p">All the array operators, such as <samp class="ph codeph">[]</samp>, can be applied to<samp class="ph codeph"> (@texture float *)tex</samp>:
                     <pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">print ((@texture float *)tex)[2]</strong>
(cuda-gdb) <strong class="ph b">print ((@texture float *)tex)[2]@4</strong></pre></div>
               </div>
            </div>
            <div class="topic concept nested1" id="info-cuda-commands"><a name="info-cuda-commands" shape="rect">
                  <!-- --></a><h3 class="title topictitle2"><a href="#info-cuda-commands" name="info-cuda-commands" shape="rect">8.4.&nbsp;Info CUDA Commands</a></h3>
               <div class="body conbody">
                  <div class="p">These are commands that display information about the GPU and the application's CUDA state. The available options are:
                     
                     <dl class="dl">
                        <dt class="dt dlterm"><samp class="ph codeph">devices</samp></dt>
                        <dd class="dd">information about all the devices</dd>
                        <dt class="dt dlterm"><samp class="ph codeph">sms</samp></dt>
                        <dd class="dd">information about all the SMs in the current device</dd>
                        <dt class="dt dlterm"><samp class="ph codeph">warps</samp></dt>
                        <dd class="dd">information about all the warps in the current SM</dd>
                        <dt class="dt dlterm"><samp class="ph codeph">lanes</samp></dt>
                        <dd class="dd">information about all the lanes in the current warp</dd>
                        <dt class="dt dlterm"><samp class="ph codeph">kernels</samp></dt>
                        <dd class="dd">information about all the active kernels</dd>
                        <dt class="dt dlterm"><samp class="ph codeph">blocks</samp></dt>
                        <dd class="dd">information about all the active blocks in the current kernel</dd>
                        <dt class="dt dlterm"><samp class="ph codeph">threads</samp></dt>
                        <dd class="dd">information about all the active threads in the current kernel</dd>
                        <dt class="dt dlterm"><samp class="ph codeph">launch trace</samp></dt>
                        <dd class="dd">information about the parent kernels of the kernel in focus</dd>
                        <dt class="dt dlterm"><samp class="ph codeph">launch children</samp></dt>
                        <dd class="dd">information about the kernels launched by the kernels in focus</dd>
                        <dt class="dt dlterm"><samp class="ph codeph">contexts</samp></dt>
                        <dd class="dd">information about all the contexts</dd>
                     </dl>
                  </div>
                  <div class="p">A filter can be applied to every <samp class="ph codeph">info cuda</samp> command. The filter restricts the scope of the command. A filter is composed of one or more restrictions. A restriction can
                     be any of the following:
                     
                     <ul class="ul">
                        <li class="li"><samp class="ph codeph">device n</samp></li>
                        <li class="li"><samp class="ph codeph">sm n</samp></li>
                        <li class="li"><samp class="ph codeph">warp n</samp></li>
                        <li class="li"><samp class="ph codeph">lane n</samp></li>
                        <li class="li"><samp class="ph codeph">kernel n</samp></li>
                        <li class="li"><samp class="ph codeph">grid n</samp></li>
                        <li class="li"><samp class="ph codeph">block x[,y]</samp> or <samp class="ph codeph">block (x[,y])</samp></li>
                        <li class="li"><samp class="ph codeph">thread x[,y[,z]]</samp> or <samp class="ph codeph">thread (x[,y[,z]])</samp></li>
                        <li class="li"><samp class="ph codeph">breakpoint all</samp> and <samp class="ph codeph">breakpoint n</samp></li>
                     </ul>
                     
                     where <samp class="ph codeph">n</samp>, <samp class="ph codeph">x</samp>, <samp class="ph codeph">y</samp>, <samp class="ph codeph">z</samp> are integers, or one of the following special keywords: <samp class="ph codeph">current</samp>, <samp class="ph codeph">any</samp>, and <samp class="ph codeph">all</samp>. <samp class="ph codeph">current</samp> indicates that the corresponding value in the current focus should be used. <samp class="ph codeph">any</samp> and <samp class="ph codeph">all</samp> indicate that any value is acceptable. 
                  </div>
                  <div class="note note"><span class="notetitle">Note:</span> The <samp class="ph codeph">breakpoint all</samp> and <samp class="ph codeph">breakpoint n</samp> filter are only effective for the <samp class="ph codeph">info cuda threads</samp> command.
                  </div>
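                  <div class="p">For example, a hedged sketch of commands combined with filters (the coordinate values are illustrative):
                     <pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">info cuda threads block (0,0,0) thread all</strong>
(cuda-gdb) <strong class="ph b">info cuda warps device 0 sm current</strong></pre></div>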
               </div>
               <div class="topic reference nested2" id="info-cuda-devices"><a name="info-cuda-devices" shape="rect">
                     <!-- --></a><h3 class="title topictitle2"><a href="#info-cuda-devices" name="info-cuda-devices" shape="rect">8.4.1.&nbsp;info cuda devices</a></h3>
                  <div class="body refbody">
                     <div class="section">
                        <p class="p">This command enumerates all the GPUs in the system sorted by device index. A <samp class="ph codeph">*</samp> indicates the device currently in focus. This command supports filters. The default is <samp class="ph codeph">device all</samp>. This command prints <samp class="ph codeph">No CUDA Devices</samp> if no GPUs are found.
                        </p><pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">info cuda devices</strong>
  Dev  Description  SM Type  SMs  Warps/SM  Lanes/Warp  Max Regs/Lane  Active SMs Mask
*   0        gt200    sm_13   24        32          32            128       0x00ffffff</pre></div>
                  </div>
               </div>
               <div class="topic reference nested2" id="info-cuda-sms"><a name="info-cuda-sms" shape="rect">
                     <!-- --></a><h3 class="title topictitle2"><a href="#info-cuda-sms" name="info-cuda-sms" shape="rect">8.4.2.&nbsp;info cuda sms</a></h3>
                  <div class="body refbody">
                     <div class="section">
                        <p class="p">This command shows all the SMs for the device and the associated active warps on the SMs. This command supports filters and
                           the default is <samp class="ph codeph">device current sm all</samp>. A <samp class="ph codeph">*</samp> indicates the SM in focus. The results are grouped per device.
                        </p><pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">info cuda sms</strong>
 SM  Active Warps Mask
Device 0
* 0 0xffffffffffffffff
1 0xffffffffffffffff
2 0xffffffffffffffff
3 0xffffffffffffffff
4 0xffffffffffffffff
5 0xffffffffffffffff
6 0xffffffffffffffff
7 0xffffffffffffffff
8 0xffffffffffffffff
...</pre></div>
                  </div>
               </div>
               <div class="topic reference nested2" id="info-cuda-warps"><a name="info-cuda-warps" shape="rect">
                     <!-- --></a><h3 class="title topictitle2"><a href="#info-cuda-warps" name="info-cuda-warps" shape="rect">8.4.3.&nbsp;info cuda warps</a></h3>
                  <div class="body refbody">
                     <div class="section">
                        <p class="p">This command takes you one level deeper and prints all the warps information for the SM in focus. This command supports filters
                           and the default is <samp class="ph codeph">device current sm current warp all</samp>. The command can be used to display which warp executes what block.
                        </p><pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">info cuda warps</strong>
Wp /Active Lanes Mask/ Divergent Lanes Mask/Active Physical PC/Kernel/BlockIdx
Device 0 SM 0
* 0    0xffffffff    0x00000000 0x000000000000001c    0    (0,0,0)
  1    0xffffffff    0x00000000 0x0000000000000000    0    (0,0,0)
  2    0xffffffff    0x00000000 0x0000000000000000    0    (0,0,0)
  3    0xffffffff    0x00000000 0x0000000000000000    0    (0,0,0)
  4    0xffffffff    0x00000000 0x0000000000000000    0    (0,0,0)
  5    0xffffffff    0x00000000 0x0000000000000000    0    (0,0,0)
  6    0xffffffff    0x00000000 0x0000000000000000    0    (0,0,0)
  7    0xffffffff    0x00000000 0x0000000000000000    0    (0,0,0)
 ...
</pre></div>
                  </div>
               </div>
               <div class="topic reference nested2" id="info-cuda-lanes"><a name="info-cuda-lanes" shape="rect">
                     <!-- --></a><h3 class="title topictitle2"><a href="#info-cuda-lanes" name="info-cuda-lanes" shape="rect">8.4.4.&nbsp;info cuda lanes</a></h3>
                  <div class="body refbody">
                     <div class="section">
                        <p class="p">This command displays all the lanes (threads) for the warp in focus. This command supports filters and the default is <samp class="ph codeph">device current sm current warp current lane all</samp>. In the example below you can see that all the lanes are at the same physical PC. The command can be used to display which
                           lane executes what thread.
                        </p><pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">info cuda lanes</strong>
  Ln    State  Physical PC        ThreadIdx
Device 0 SM 0 Warp 0
*  0    active 0x000000000000008c   (0,0,0)
   1    active 0x000000000000008c   (1,0,0)
   2    active 0x000000000000008c   (2,0,0)
   3    active 0x000000000000008c   (3,0,0)
   4    active 0x000000000000008c   (4,0,0)
   5    active 0x000000000000008c   (5,0,0)
   6    active 0x000000000000008c   (6,0,0)
   7    active 0x000000000000008c   (7,0,0)
   8    active 0x000000000000008c   (8,0,0)
   9    active 0x000000000000008c   (9,0,0)
  10    active 0x000000000000008c  (10,0,0)
  11    active 0x000000000000008c  (11,0,0)
  12    active 0x000000000000008c  (12,0,0)
  13    active 0x000000000000008c  (13,0,0)
  14    active 0x000000000000008c  (14,0,0)
  15    active 0x000000000000008c  (15,0,0)
  16    active 0x000000000000008c  (16,0,0)
 ...		</pre></div>
                  </div>
               </div>
               <div class="topic reference nested2" id="info-cuda-kernels"><a name="info-cuda-kernels" shape="rect">
                     <!-- --></a><h3 class="title topictitle2"><a href="#info-cuda-kernels" name="info-cuda-kernels" shape="rect">8.4.5.&nbsp;info cuda kernels</a></h3>
                  <div class="body refbody">
                     <div class="section">
                        <p class="p">
                           This command displays all the active kernels on the GPU in focus.
                           It prints the SM mask, kernel ID, and the grid ID for each kernel with the associated dimensions and arguments.
                           The kernel ID is unique across all GPUs whereas the grid ID is unique per GPU.
                           The <samp class="ph codeph">Parent</samp> column shows the kernel ID of the parent grid.
                           This command supports filters and the default is <samp class="ph codeph">kernel all</samp>.
                           
                        </p><pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">info cuda kernels</strong>
  Kernel Parent Dev Grid Status   SMs Mask   GridDim  BlockDim      Name Args 
*      1      -   0    2 Active 0x00ffffff (240,1,1) (128,1,1) acos_main parms=...</pre><p class="p">
                           This command will also show grids that have been launched on the GPU with Dynamic Parallelism.
                           Kernels with a negative grid ID have been launched from the GPU, while kernels with a positive grid ID have been launched
                           from the CPU.
                           
                        </p>
                        <div class="note note"><span class="notetitle">Note:</span> With the <samp class="ph codeph">cudaDeviceSynchronize</samp> routine, it is possible to see grid launches disappear from the device and then resume later after all child launches have
                           completed.
                        </div>
                     </div>
                  </div>
               </div>
               <div class="topic reference nested2" id="info-cuda-blocks"><a name="info-cuda-blocks" shape="rect">
                     <!-- --></a><h3 class="title topictitle2"><a href="#info-cuda-blocks" name="info-cuda-blocks" shape="rect">8.4.6.&nbsp;info cuda blocks</a></h3>
                  <div class="body refbody">
                     <div class="section">
                        <p class="p">This command displays all the active or running blocks for the kernel in focus. The results are grouped per kernel. This command
                           supports filters and the default is <samp class="ph codeph">kernel current block all</samp>. The outputs are coalesced by default.
                        </p><pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">info cuda blocks</strong>
   BlockIdx   To BlockIdx  Count  State
Kernel 1
*  (0,0,0)    (191,0,0)    192    running</pre><p class="p">Coalescing can be turned off as follows, in which case more information about the device and the SM is displayed:</p><pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">set cuda coalescing off</strong></pre><p class="p">The following is the output of the same command when coalescing is turned off.</p><pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">info cuda blocks</strong>
  BlockIdx   State    Dev SM
Kernel 1
*   (0,0,0)   running   0   0
    (1,0,0)   running   0   3
    (2,0,0)   running   0   6
    (3,0,0)   running   0   9
    (4,0,0)   running   0  12
    (5,0,0)   running   0  15
    (6,0,0)   running   0  18
    (7,0,0)   running   0  21
    (8,0,0)   running   0   1
 ...</pre></div>
                  </div>
               </div>
               <div class="topic reference nested2" id="info-cuda-threads"><a name="info-cuda-threads" shape="rect">
                     <!-- --></a><h3 class="title topictitle2"><a href="#info-cuda-threads" name="info-cuda-threads" shape="rect">8.4.7.&nbsp;info cuda threads</a></h3>
                  <div class="body refbody">
                     <div class="section">
                        <p class="p">
                           This command displays the application's active CUDA blocks and threads with the total count of threads in those blocks.
                           Also displayed are the virtual PC and the associated source file and line number.
                           The results are grouped per kernel. The command supports filters, with the default being <samp class="ph codeph">kernel current block all thread all</samp>.
                           The outputs are coalesced by default as follows:
                           
                        </p><pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">info cuda threads</strong>
  BlockIdx ThreadIdx To BlockIdx ThreadIdx Count   Virtual PC    Filename   Line
Device 0 SM 0
* (0,0,0)  (0,0,0)    (0,0,0)  (31,0,0)    32  0x000000000088f88c   acos.cu   376
  (0,0,0)(32,0,0)  (191,0,0) (127,0,0) 24544  0x000000000088f800   acos.cu   374
 ...</pre><p class="p">Coalescing can be turned off as follows in which case more information is displayed with the output.</p><pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">info cuda threads</strong>
   BlockIdx  ThreadIdx  Virtual PC         Dev SM Wp Ln   Filename  Line
Kernel 1
*  (0,0,0)    (0,0,0)  0x000000000088f88c   0  0  0  0    acos.cu    376
   (0,0,0)    (1,0,0)  0x000000000088f88c   0  0  0  1    acos.cu    376
   (0,0,0)    (2,0,0)  0x000000000088f88c   0  0  0  2    acos.cu    376
   (0,0,0)    (3,0,0)  0x000000000088f88c   0  0  0  3    acos.cu    376
   (0,0,0)    (4,0,0)  0x000000000088f88c   0  0  0  4    acos.cu    376
   (0,0,0)    (5,0,0)  0x000000000088f88c   0  0  0  5    acos.cu    376
   (0,0,0)    (6,0,0)  0x000000000088f88c   0  0  0  6    acos.cu    376
   (0,0,0)    (7,0,0)  0x000000000088f88c   0  0  0  7    acos.cu    376
   (0,0,0)    (8,0,0)  0x000000000088f88c   0  0  0  8    acos.cu    376
   (0,0,0)    (9,0,0)  0x000000000088f88c   0  0  0  9    acos.cu    376
 ...</pre><div class="note note"><span class="notetitle">Note:</span> 
                           To be coalesced, threads must be contiguous.
                           If some threads are not currently running on the hardware, they create <dfn class="term">holes</dfn> in the thread ranges.
                           For instance, if a kernel consists of 2 blocks of 16 threads, and only the 8 lowest threads are active, then 2 coalesced ranges
                           will be printed: one range for block 0 threads 0 through 7, and one range for block 1 threads 0 through 7.
                           Because threads 8-15 of block 0 are not running, the 2 ranges cannot be coalesced.
                           
                        </div>
                        <p class="p">
                           The command also supports <samp class="ph codeph">breakpoint all</samp> and <samp class="ph codeph">breakpoint <em class="ph i">breakpoint_number</em></samp> as filters.
                           The former displays the threads that hit all CUDA breakpoints set by the user.
                           The latter displays the threads that hit the CUDA breakpoint <em class="ph i">breakpoint_number</em>.
                           
                        </p><pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">info cuda threads breakpoint all</strong>
  BlockIdx ThreadIdx         Virtual PC Dev SM Wp Ln        Filename  Line 
Kernel 0
   (1,0,0)   (0,0,0) 0x0000000000948e58   0 11  0  0 infoCommands.cu    12 
   (1,0,0)   (1,0,0) 0x0000000000948e58   0 11  0  1 infoCommands.cu    12 
   (1,0,0)   (2,0,0) 0x0000000000948e58   0 11  0  2 infoCommands.cu    12 
   (1,0,0)   (3,0,0) 0x0000000000948e58   0 11  0  3 infoCommands.cu    12 
   (1,0,0)   (4,0,0) 0x0000000000948e58   0 11  0  4 infoCommands.cu    12 
   (1,0,0)   (5,0,0) 0x0000000000948e58   0 11  0  5 infoCommands.cu    12 

(cuda-gdb) <strong class="ph b">info cuda threads breakpoint 2 lane 1</strong>
  BlockIdx ThreadIdx         Virtual PC Dev SM Wp Ln        Filename  Line 
Kernel 0
   (1,0,0)   (1,0,0) 0x0000000000948e58   0 11  0  1 infoCommands.cu    12</pre></div>
                  </div>
               </div>
               <div class="topic reference nested2" id="info-cuda-launch-trace"><a name="info-cuda-launch-trace" shape="rect">
                     <!-- --></a><h3 class="title topictitle2"><a href="#info-cuda-launch-trace" name="info-cuda-launch-trace" shape="rect">8.4.8.&nbsp;info cuda launch trace </a></h3>
                  <div class="body refbody">
                     <div class="section">
                        <p class="p">
                           This command displays the kernel launch trace for the kernel in focus. The
                           first element in the trace is the kernel in focus. The next element is the
                           kernel that launched this kernel. The trace continues until a kernel with
                           no parent is reached; that kernel was launched from the CPU.
                           
                        </p>
                        <p class="p">
                           For each kernel in the trace, the command prints the level of the kernel in the
                           trace, the kernel ID, the device ID, the grid ID, the status, the kernel
                           dimensions, the kernel name, and the kernel arguments.
                           
                        </p><pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">info cuda launch trace</strong>
  Lvl Kernel Dev Grid     Status   GridDim  BlockDim Invocation
*   0      3   0   -7     Active  (32,1,1)  (16,1,1) kernel3(c=5)
    1      2   0   -5 Terminated (240,1,1) (128,1,1) kernel2(b=3)
    2      1   0    2     Active (240,1,1) (128,1,1) kernel1(a=1)</pre><p class="p">
                           A kernel that has been launched but that is not running on the GPU will have a
                           <samp class="ph codeph">Pending</samp> status. A kernel currently running on the GPU will be
                           marked as <samp class="ph codeph">Active</samp>. A kernel waiting to become active again will
                           be displayed as <samp class="ph codeph">Sleeping</samp>. When a kernel has terminated, it is
                           marked as <samp class="ph codeph">Terminated</samp>. In the few cases when the debugger
                           cannot determine whether a kernel is pending or terminated, the status is set to
                           <samp class="ph codeph">Undetermined</samp>.
                           
                        </p>
                        <p class="p">
                           This command supports filters and the default is <samp class="ph codeph">kernel all</samp>.
                           
                        </p>
                     </div>
                  </div>
               </div>
               <div class="topic reference nested2" id="info-cuda-launch-children"><a name="info-cuda-launch-children" shape="rect">
                     <!-- --></a><h3 class="title topictitle2"><a href="#info-cuda-launch-children" name="info-cuda-launch-children" shape="rect">8.4.9.&nbsp;info cuda launch children</a></h3>
                  <div class="body refbody">
                     <div class="section">
                        <p class="p">
                           This command displays the list of non-terminated kernels launched by the kernel in focus.
                           For each kernel, the kernel ID, the device ID, the grid ID, the kernel dimensions, the kernel name, and the kernel parameters
                           are displayed.
                           
                        </p><pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">info cuda launch children</strong>
  Kernel Dev Grid GridDim BlockDim Invocation
*      3   0   -7 (1,1,1)  (1,1,1) kernel5(a=3)
      18   0   -8 (1,1,1) (32,1,1) kernel4(b=5)</pre><p class="p">
                           This command supports filters and the default is <samp class="ph codeph">kernel all</samp>.
                           
                        </p>
                     </div>
                  </div>
               </div>
               <div class="topic reference nested2" id="info-cuda-contexts"><a name="info-cuda-contexts" shape="rect">
                     <!-- --></a><h3 class="title topictitle2"><a href="#info-cuda-contexts" name="info-cuda-contexts" shape="rect">8.4.10.&nbsp;info cuda contexts</a></h3>
                  <div class="body refbody">
                     <div class="section">
                        <p class="p">This command enumerates all the CUDA contexts running on all GPUs. A <samp class="ph codeph">*</samp> indicates the context currently in focus. This command shows whether a context is currently active on a device or not.
                        </p><pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">info cuda contexts</strong>
     Context Dev    State 
  0x080b9518   0 inactive 
* 0x08067948   0   active</pre></div>
                  </div>
               </div>
            </div>
            <div class="topic reference nested1" id="disassembly"><a name="disassembly" shape="rect">
                  <!-- --></a><h3 class="title topictitle2"><a href="#disassembly" name="disassembly" shape="rect">8.5.&nbsp;Disassembly</a></h3>
               <div class="body refbody">
                  <div class="section">
                     <p class="p">The device SASS code can be disassembled using the standard GDB
                        disassembly instructions such as <samp class="ph codeph">x/i</samp> and
                        <samp class="ph codeph">display/i</samp>.
                     </p><pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">x/4 $pc</strong>
=&gt; 0x7a5cf0 &lt;_Z9foo10Params(Params)+752&gt;: IMUL R2, R0, R3
   0x7a5cf8 &lt;_Z9foo10Params(Params)+760&gt;: MOV R3, R4
   0x7a5d00 &lt;_Z9foo10Params(Params)+768&gt;: IMUL R0, R0, R3
   0x7a5d08 &lt;_Z9foo10Params(Params)+776&gt;: IADD R18, R0, R3</pre><div class="note note"><span class="notetitle">Note:</span> For the disassembly instructions to work properly,
                        <samp class="ph codeph">cuobjdump</samp> must be installed and present in your
                        <samp class="ph codeph">$PATH</samp>.
                     </div>
                  </div>
               </div>
            </div>
         </div>
         <div class="topic concept nested0" id="event-notifications"><a name="event-notifications" shape="rect">
               <!-- --></a><h2 class="title topictitle1"><a href="#event-notifications" name="event-notifications" shape="rect">9.&nbsp;Event Notifications</a></h2>
            <div class="body conbody">
               <p class="p">
                   As the application makes forward progress, CUDA-GDB notifies the user about kernel events and context events.
                   Within CUDA-GDB, <dfn class="term">kernel</dfn> refers to the device code that executes on the GPU, while <dfn class="term">context</dfn> refers to the virtual address space on the GPU for the kernel.
                   You can turn the display of CUDA context and kernel events on or off to review the flow of the active contexts and kernels.
                  
               </p>
            </div>
            <div class="topic reference nested1" id="context-events"><a name="context-events" shape="rect">
                  <!-- --></a><h3 class="title topictitle2"><a href="#context-events" name="context-events" shape="rect">9.1.&nbsp;Context Events</a></h3>
               <div class="body refbody">
                  <div class="section">
                     <p class="p">
                        By default, any time a CUDA context is created, pushed, popped, or destroyed by the application, CUDA-GDB displays a notification
                        message.
                        The message includes the context id and the device id to which the context belongs.
                        
                     </p><pre class="pre screen" xml:space="preserve">    [Context Create of context 0xad2fe60 on Device 0]
    [Context Destroy of context 0xad2fe60 on Device 0]</pre><p class="p">
                        The context event notification policy is controlled with the <samp class="ph codeph">context_events</samp> option.
                        
                     </p>
                     <ul class="ul">
                        <li class="li"><pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">set cuda context_events on</strong></pre>
                           
                           CUDA-GDB displays the context event notification messages (default).
                           </li>
                        <li class="li"><pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">set cuda context_events off</strong></pre>
                           
                           CUDA-GDB does not display the context event notification messages.
                           </li>
                     </ul>
                  </div>
               </div>
            </div>
            <div class="topic reference nested1" id="kernel-events"><a name="kernel-events" shape="rect">
                  <!-- --></a><h3 class="title topictitle2"><a href="#kernel-events" name="kernel-events" shape="rect">9.2.&nbsp;Kernel Events</a></h3>
               <div class="body refbody">
                  <div class="section">
                     <p class="p">
                        By default, when CUDA-GDB is made aware of the launch or the termination of a CUDA kernel launched from the host, a notification
                        message is displayed.
                        The message includes the kernel id, the kernel name, and the device to which the kernel belongs.
                        
                     </p><pre class="pre screen" xml:space="preserve">    [Launch of CUDA Kernel 1 (kernel3) on Device 0]
    [Termination of CUDA Kernel 1 (kernel3) on Device 0]</pre><p class="p">
                        No notification is sent for the kernels launched from the GPU.
                        
                     </p>
                     <div class="p">
                        The kernel event notification policy is controlled with the <samp class="ph codeph">kernel_events</samp> option.
                        
                        <ul class="ul">
                           <li class="li"><pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">set cuda kernel_events on</strong></pre><p class="p">CUDA-GDB displays the kernel events (default).</p>
                           </li>
                           <li class="li"><pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">set cuda kernel_events off</strong></pre><p class="p">CUDA-GDB does not display the kernel events.</p>
                           </li>
                        </ul>
                     </div>
                     <p class="p">
                         In addition to displaying kernel events, the underlying policy used to notify the debugger about kernel launches can be changed.
                         By default, kernel launches cause events that CUDA-GDB will process.
                         If the application launches a large number of kernels, it is preferable to defer sending kernel launch notifications until
                         the debugger stops the application. At that point, only the kernel launch notifications for kernels that are valid
                         on the stopped devices will be displayed. In this mode, the debugging session runs significantly faster.
                        
                     </p>
                     <div class="p">
                        The deferral of such notifications can be controlled with the <samp class="ph codeph">defer_kernel_launch_notifications</samp> option.
                        
                        <ul class="ul">
                           <li class="li"><pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">set cuda defer_kernel_launch_notifications off</strong></pre><p class="p">CUDA_GDB receives events on kernel launches (default).</p>
                           </li>
                           <li class="li"><pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">set cuda defer_kernel_launch_notifications on</strong></pre><p class="p">CUDA-GDB defers receiving information about kernel launches</p>
                           </li>
                        </ul>
                     </div>
                     <div class="note note"><span class="notetitle">Note:</span> 
                        If the <a class="xref" href="index.html#kernel-entry-breakpoints" shape="rect">break_on_launch</a> option is set to any value other than <samp class="ph codeph">none</samp>, the deferred kernel launch notifications are disabled.
                        
                     </div>
                  </div>
               </div>
            </div>
         </div>
         <div class="topic concept nested0" id="checking-memory-errors"><a name="checking-memory-errors" shape="rect">
               <!-- --></a><h2 class="title topictitle1"><a href="#checking-memory-errors" name="checking-memory-errors" shape="rect">10.&nbsp;Checking Memory Errors</a></h2>
            <div class="body conbody">
               <p class="p">The CUDA memcheck feature detects global memory violations and mis-aligned global memory accesses. This feature is off by
                   default and can be enabled with the following command in CUDA-GDB before the application is run.
               </p><pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">set cuda memcheck on</strong></pre><p class="p">Once CUDA memcheck is enabled, any detection of global memory violations and mis-aligned global memory accesses will be reported.</p>
               <p class="p">When CUDA memcheck is enabled, all the kernel launches are made blocking, as if the environment variable <samp class="ph codeph">CUDA_LAUNCH_BLOCKING</samp> was set to 1. The host thread launching a kernel will therefore wait until the kernel has completed before proceeding.  This
                  may change the behavior of your application.
               </p>
               <p class="p">You can also run the CUDA memory checker as a standalone tool named CUDA-MEMCHECK. This tool is also part of the toolkit.
                  Please read the related documentation for more information.
               </p>
               <p class="p">By default, CUDA-GDB will report any memory error. See <a class="xref" href="index.html#increasing-precision-of-memory-errors-with-autostep" shape="rect">Increasing the Precision of Memory Errors With Autostep</a> for a list of the memory errors. To increase the number of memory errors being reported and to increase the precision of
                  the memory errors, CUDA memcheck must be turned on.
               </p>
            </div>
            <div class="topic concept nested1" id="increasing-precision-of-memory-errors-with-autostep"><a name="increasing-precision-of-memory-errors-with-autostep" shape="rect">
                  <!-- --></a><h3 class="title topictitle2"><a href="#increasing-precision-of-memory-errors-with-autostep" name="increasing-precision-of-memory-errors-with-autostep" shape="rect">10.1.&nbsp;Increasing the Precision of Memory Errors With Autostep </a></h3>
               <div class="body conbody">
                  <p class="p">Autostep is a command to increase the precision of CUDA exceptions to the exact lane and
                     instruction, when they would not have been otherwise.
                  </p>
                  <p class="p">Under normal execution, an exception may be reported several instructions after the
                     exception occurred, or the exact thread where an exception occurred may not be known
                     unless the exception is a lane error. However, the precise origin of the exception can
                     be determined if the program is being single-stepped when the exception occurs. Single-
                     stepping manually is a slow and tedious process; stepping takes much longer than normal
                     execution and the user has to single-step each warp individually.
                  </p>
                  <p class="p">Autostep aides the user by allowing them to specify sections of code where they suspect
                     an exception could occur, and these sections are automatically and transparently single-
                     stepped the program is running. The rest of the program is executed normally to minimize
                     the slow-down caused by single-stepping. The precise origin of an exception will be
                     reported if the exception occurs within these sections. Thus the exact instruction and
                     thread where an exception occurred can be found quickly and with much less effort by
                     using autostep.
                  </p>
               </div>
               <div class="topic reference nested2" id="usage"><a name="usage" shape="rect">
                     <!-- --></a><h3 class="title topictitle2"><a href="#usage" name="usage" shape="rect">10.1.1.&nbsp;Usage</a></h3>
                  <div class="body refbody">
                     <div class="section"><pre class="pre screen" xml:space="preserve"><strong class="ph b">autostep [LOCATION]
autostep [LOCATION] for LENGTH [lines|instructions]</strong></pre><ul class="ul">
                           <li class="li"><samp class="ph codeph">LOCATION</samp> may be anything that you use to specify the location of a breakpoint, such as a line number, function name, or an instruction
                              address preceded by an asterisk. If no <samp class="ph codeph">LOCATION</samp> is specified, then the current instruction address is used.
                           </li>
                           <li class="li"><samp class="ph codeph">LENGTH</samp> specifies the size of the autostep window in number of lines or instructions (<em class="ph i">lines</em> and <em class="ph i">instructions</em> can be shortened, e.g., <em class="ph i">l</em> or <em class="ph i">i</em>). If the length type is not specified, then <em class="ph i">lines</em> is the default. If the <samp class="ph codeph">for</samp> clause is omitted, then the default is 1 line.
                           </li>
                           <li class="li"><samp class="ph codeph">astep</samp> can be used as an alias for the <samp class="ph codeph">autostep</samp> command.
                           </li>
                           <li class="li">Calls to functions made during an autostep will be stepped over.</li>
                           <li class="li">
                              <p class="p">In case of divergence, the length of the autostep window is determined by the number of lines or instructions the first active
                                 lane in each warp executes.
                              </p>
                              <p class="p">Divergent lanes are also single stepped, but the instructions they execute do not count towards the length of the autostep
                                 window.
                              </p>
                           </li>
                           <li class="li">If a breakpoint occurs while inside an autostep window, the warp where the breakpoint was hit will not continue autostepping
                              when the program is resumed. However, other warps may continue autostepping.
                           </li>
                           <li class="li">Overlapping autosteps are not supported.</li>
                        </ul>
                        <p class="p">If an autostep is encountered while another autostep is being executed, then the second autostep is ignored.</p>
                         <p class="p">If an autostep is set before the location of a memory error and no memory error is hit, then it is possible that the chosen
                            window is too small. This may be caused by the presence of function calls between the address of the autostep location and
                            the instruction that triggers the memory error. In that situation, either increase the size of the window to make sure that
                            the faulty instruction is included, or move the autostep location to an instruction that will be executed closer in time
                            to the faulty instruction.
                         </p>
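                         <p class="p">For example, assuming a hypothetical kernel source file <samp class="ph codeph">example.cu</samp> with a suspected memory error around line 25, autostep windows could be set as follows (the location and lengths shown are illustrative only):</p><pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">autostep example.cu:25 for 20 lines</strong>
(cuda-gdb) <strong class="ph b">autostep *$pc for 12 instructions</strong></pre>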
                        <div class="note note"><span class="notetitle">Note:</span><strong class="ph b">Autostep requires Fermi GPUs or above.</strong></div>
                     </div>
                  </div>
               </div>
               <div class="topic concept nested2" id="related-commands"><a name="related-commands" shape="rect">
                     <!-- --></a><h3 class="title topictitle2"><a href="#related-commands" name="related-commands" shape="rect">10.1.2.&nbsp;Related Commands</a></h3>
                  <div class="body conbody">
                      <p class="p">Autosteps and breakpoints share the same numbering, so most commands that work with breakpoints also work with autosteps.</p>
                  </div>
                  <div class="topic reference nested3" id="info-autosteps"><a name="info-autosteps" shape="rect">
                        <!-- --></a><h3 class="title topictitle2"><a href="#info-autosteps" name="info-autosteps" shape="rect">10.1.2.1.&nbsp;info autosteps</a></h3>
                     <div class="body refbody">
                        <div class="section">
                           <p class="p">Shows all breakpoints and autosteps. Similar to <samp class="ph codeph">info breakpoints</samp>.
                           </p><pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">info autosteps</strong>
Num  Type      Disp Enb Address            What
1    autostep  keep y   0x0000000000401234 in merge at sort.cu:30 for 49 instructions
3    autostep  keep y   0x0000000000489913 in bubble at sort.cu:94 for 11 lines</pre></div>
                     </div>
                  </div>
                  <div class="topic reference nested3" id="disable-autosteps-n"><a name="disable-autosteps-n" shape="rect">
                        <!-- --></a><h3 class="title topictitle2"><a href="#disable-autosteps-n" name="disable-autosteps-n" shape="rect">10.1.2.2.&nbsp;disable autosteps n</a></h3>
                     <div class="body refbody">
                        <div class="section">
                           <p class="p">Disables an autostep. Equivalent to <samp class="ph codeph">disable breakpoints n</samp>.
                           </p>
                        </div>
                     </div>
                  </div>
                  <div class="topic reference nested3" id="delete-autosteps-n"><a name="delete-autosteps-n" shape="rect">
                        <!-- --></a><h3 class="title topictitle2"><a href="#delete-autosteps-n" name="delete-autosteps-n" shape="rect">10.1.2.3.&nbsp;delete autosteps n</a></h3>
                     <div class="body refbody">
                        <div class="section">
                           <p class="p">Deletes an autostep. Equivalent to <samp class="ph codeph">delete breakpoints n</samp>.
                           </p>
                        </div>
                     </div>
                  </div>
                  <div class="topic reference nested3" id="ignore-n-i"><a name="ignore-n-i" shape="rect">
                        <!-- --></a><h3 class="title topictitle2"><a href="#ignore-n-i" name="ignore-n-i" shape="rect">10.1.2.4.&nbsp;ignore n i</a></h3>
                     <div class="body refbody">
                        <div class="section">
                           <p class="p">Do not single-step the next <em class="ph i">i</em> times the debugger enters the window for autostep <em class="ph i">n</em>. This command already exists for breakpoints.
                           </p>
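                            <p class="p">For example, assuming an autostep numbered 1, the following tells the debugger to pass through its window twice before single-stepping it again:</p><pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">ignore 1 2</strong>
Will ignore next 2 crossings of autostep 1.</pre>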
                        </div>
                     </div>
                  </div>
               </div>
            </div>
            <div class="topic reference nested1" id="gpu-error-reporting"><a name="gpu-error-reporting" shape="rect">
                  <!-- --></a><h3 class="title topictitle2"><a href="#gpu-error-reporting" name="gpu-error-reporting" shape="rect">10.2.&nbsp;GPU Error Reporting</a></h3>
               <div class="body refbody">
                  <div class="section">
                      <p class="p">With improved GPU error reporting in CUDA-GDB, application bugs are now easier to identify and fix. The following
                         table shows the errors that are reported on GPUs with compute capability <samp class="ph codeph">sm_20</samp> and higher.
                      </p>
                     <div class="note note"><span class="notetitle">Note:</span><strong class="ph b">Continuing the execution of your application after these errors are found can lead to application termination or indeterminate
                           results.</strong></div>
                     <div class="tablenoborder">
                        <table cellpadding="4" cellspacing="0" summary="" class="table" frame="border" border="1" rules="all">
                           <caption><span class="tablecap">Table 1. CUDA Exception Codes</span></caption>
                           <thead class="thead" align="left">
                              <tr class="row" valign="top">
                                 <th class="entry" valign="top" width="33.33333333333333%" id="d54e3047" rowspan="1" colspan="1">Exception Code</th>
                                 <th class="entry" valign="top" width="13.333333333333334%" id="d54e3050" rowspan="1" colspan="1">Precision of the Error</th>
                                 <th class="entry" valign="top" width="13.333333333333334%" id="d54e3053" rowspan="1" colspan="1">Scope of the Error</th>
                                 <th class="entry" valign="top" width="40%" id="d54e3056" rowspan="1" colspan="1">Description</th>
                              </tr>
                           </thead>
                           <tbody class="tbody">
                              <tr class="row">
                                 <td class="entry" valign="top" width="33.33333333333333%" headers="d54e3047" rowspan="1" colspan="1"><samp class="ph codeph">CUDA_EXCEPTION_0 : "Device Unknown Exception"</samp></td>
                                 <td class="entry" valign="top" width="13.333333333333334%" headers="d54e3050" rowspan="1" colspan="1">Not precise</td>
                                 <td class="entry" valign="top" width="13.333333333333334%" headers="d54e3053" rowspan="1" colspan="1">Global error on the GPU</td>
                                  <td class="entry" valign="top" width="40%" headers="d54e3056" rowspan="1" colspan="1">This is a global GPU error caused by the application that does not match any of the error codes listed below. This should
                                     be a rare occurrence. Potentially, this may be due to <samp class="ph codeph">Device Hardware Stack</samp> overflows or a kernel generating an exception very close to its termination.
                                  </td>
                              </tr>
                              <tr class="row">
                                 <td class="entry" valign="top" width="33.33333333333333%" headers="d54e3047" rowspan="1" colspan="1"><samp class="ph codeph">CUDA_EXCEPTION_1 : "Lane Illegal Address"</samp></td>
                                 <td class="entry" valign="top" width="13.333333333333334%" headers="d54e3050" rowspan="1" colspan="1">Precise (Requires <samp class="ph codeph">memcheck on</samp>)
                                 </td>
                                 <td class="entry" valign="top" width="13.333333333333334%" headers="d54e3053" rowspan="1" colspan="1">Per lane/thread error </td>
                                  <td class="entry" valign="top" width="40%" headers="d54e3056" rowspan="1" colspan="1">This occurs when a thread accesses an illegal (out-of-bounds) global address.</td>
                              </tr>
                              <tr class="row">
                                 <td class="entry" valign="top" width="33.33333333333333%" headers="d54e3047" rowspan="1" colspan="1"><samp class="ph codeph">CUDA_EXCEPTION_2 : "Lane User Stack Overflow"</samp></td>
                                 <td class="entry" valign="top" width="13.333333333333334%" headers="d54e3050" rowspan="1" colspan="1">Precise</td>
                                 <td class="entry" valign="top" width="13.333333333333334%" headers="d54e3053" rowspan="1" colspan="1">Per lane/thread error</td>
                                 <td class="entry" valign="top" width="40%" headers="d54e3056" rowspan="1" colspan="1">This occurs when a thread exceeds its stack memory limit.</td>
                              </tr>
                              <tr class="row">
                                 <td class="entry" valign="top" width="33.33333333333333%" headers="d54e3047" rowspan="1" colspan="1"><samp class="ph codeph">CUDA_EXCEPTION_3 : "Device Hardware Stack Overflow"</samp></td>
                                 <td class="entry" valign="top" width="13.333333333333334%" headers="d54e3050" rowspan="1" colspan="1">Not precise</td>
                                 <td class="entry" valign="top" width="13.333333333333334%" headers="d54e3053" rowspan="1" colspan="1">Global error on the GPU</td>
                                 <td class="entry" valign="top" width="40%" headers="d54e3056" rowspan="1" colspan="1">This occurs when the application triggers a global hardware stack overflow. The main cause of this error is large amounts
                                    of divergence in the presence of function calls.
                                 </td>
                              </tr>
                              <tr class="row">
                                 <td class="entry" valign="top" width="33.33333333333333%" headers="d54e3047" rowspan="1" colspan="1"><samp class="ph codeph">CUDA_EXCEPTION_4 : "Warp Illegal Instruction"</samp></td>
                                 <td class="entry" valign="top" width="13.333333333333334%" headers="d54e3050" rowspan="1" colspan="1">Not precise</td>
                                 <td class="entry" valign="top" width="13.333333333333334%" headers="d54e3053" rowspan="1" colspan="1">Warp error</td>
                                 <td class="entry" valign="top" width="40%" headers="d54e3056" rowspan="1" colspan="1">This occurs when any thread within a warp has executed an illegal instruction.</td>
                              </tr>
                              <tr class="row">
                                 <td class="entry" valign="top" width="33.33333333333333%" headers="d54e3047" rowspan="1" colspan="1"><samp class="ph codeph">CUDA_EXCEPTION_5 : "Warp Out-of-range Address"</samp></td>
                                 <td class="entry" valign="top" width="13.333333333333334%" headers="d54e3050" rowspan="1" colspan="1">Not precise</td>
                                 <td class="entry" valign="top" width="13.333333333333334%" headers="d54e3053" rowspan="1" colspan="1">Warp error</td>
                                 <td class="entry" valign="top" width="40%" headers="d54e3056" rowspan="1" colspan="1">This occurs when any thread within a warp accesses an address that is outside the valid range of local or shared memory regions.</td>
                              </tr>
                              <tr class="row">
                                 <td class="entry" valign="top" width="33.33333333333333%" headers="d54e3047" rowspan="1" colspan="1"><samp class="ph codeph">CUDA_EXCEPTION_6 : "Warp Misaligned Address"</samp></td>
                                 <td class="entry" valign="top" width="13.333333333333334%" headers="d54e3050" rowspan="1" colspan="1">Not precise</td>
                                 <td class="entry" valign="top" width="13.333333333333334%" headers="d54e3053" rowspan="1" colspan="1">Warp error</td>
                                 <td class="entry" valign="top" width="40%" headers="d54e3056" rowspan="1" colspan="1">This occurs when any thread within a warp accesses an address in the local or shared memory segments that is not correctly
                                    aligned.
                                 </td>
                              </tr>
                              <tr class="row">
                                 <td class="entry" valign="top" width="33.33333333333333%" headers="d54e3047" rowspan="1" colspan="1"><samp class="ph codeph">CUDA_EXCEPTION_7 : "Warp Invalid Address Space"</samp></td>
                                 <td class="entry" valign="top" width="13.333333333333334%" headers="d54e3050" rowspan="1" colspan="1">Not precise</td>
                                 <td class="entry" valign="top" width="13.333333333333334%" headers="d54e3053" rowspan="1" colspan="1">Warp error</td>
                                 <td class="entry" valign="top" width="40%" headers="d54e3056" rowspan="1" colspan="1">This occurs when any thread within a warp executes an instruction that accesses a memory space not permitted for that instruction.</td>
                              </tr>
                              <tr class="row">
                                 <td class="entry" valign="top" width="33.33333333333333%" headers="d54e3047" rowspan="1" colspan="1"><samp class="ph codeph">CUDA_EXCEPTION_8 : "Warp Invalid PC"</samp></td>
                                 <td class="entry" valign="top" width="13.333333333333334%" headers="d54e3050" rowspan="1" colspan="1">Not precise</td>
                                 <td class="entry" valign="top" width="13.333333333333334%" headers="d54e3053" rowspan="1" colspan="1">Warp error</td>
                                 <td class="entry" valign="top" width="40%" headers="d54e3056" rowspan="1" colspan="1">This occurs when any thread within a warp advances its PC beyond the 40-bit address space.</td>
                              </tr>
                              <tr class="row">
                                 <td class="entry" valign="top" width="33.33333333333333%" headers="d54e3047" rowspan="1" colspan="1"><samp class="ph codeph">CUDA_EXCEPTION_9 : "Warp Hardware Stack Overflow"</samp></td>
                                 <td class="entry" valign="top" width="13.333333333333334%" headers="d54e3050" rowspan="1" colspan="1">Not precise</td>
                                 <td class="entry" valign="top" width="13.333333333333334%" headers="d54e3053" rowspan="1" colspan="1">Warp error</td>
                                 <td class="entry" valign="top" width="40%" headers="d54e3056" rowspan="1" colspan="1">This occurs when any thread in a warp triggers a hardware stack overflow. This should be a rare occurrence.</td>
                              </tr>
                              <tr class="row">
                                 <td class="entry" valign="top" width="33.33333333333333%" headers="d54e3047" rowspan="1" colspan="1"><samp class="ph codeph">CUDA_EXCEPTION_10 : "Device Illegal Address"</samp></td>
                                 <td class="entry" valign="top" width="13.333333333333334%" headers="d54e3050" rowspan="1" colspan="1">Not precise</td>
                                 <td class="entry" valign="top" width="13.333333333333334%" headers="d54e3053" rowspan="1" colspan="1">Global error</td>
                                  <td class="entry" valign="top" width="40%" headers="d54e3056" rowspan="1" colspan="1">This occurs when a thread accesses an illegal (out-of-bounds) global address. For increased precision, use the CUDA memcheck
                                     feature.
                                  </td>
                              </tr>
                              <tr class="row">
                                 <td class="entry" valign="top" width="33.33333333333333%" headers="d54e3047" rowspan="1" colspan="1"><samp class="ph codeph">CUDA_EXCEPTION_11 : "Lane Misaligned Address"</samp></td>
                                  <td class="entry" valign="top" width="13.333333333333334%" headers="d54e3050" rowspan="1" colspan="1">Precise (Requires <samp class="ph codeph">memcheck on</samp>)
                                  </td>
                                 <td class="entry" valign="top" width="13.333333333333334%" headers="d54e3053" rowspan="1" colspan="1">Per lane/thread error</td>
                                 <td class="entry" valign="top" width="40%" headers="d54e3056" rowspan="1" colspan="1">This occurs when a thread accesses a global address that is not correctly aligned.</td>
                              </tr>
                              <tr class="row">
                                 <td class="entry" valign="top" width="33.33333333333333%" headers="d54e3047" rowspan="1" colspan="1"><samp class="ph codeph">CUDA_EXCEPTION_12 : "Warp Assert"</samp></td>
                                 <td class="entry" valign="top" width="13.333333333333334%" headers="d54e3050" rowspan="1" colspan="1">Precise</td>
                                 <td class="entry" valign="top" width="13.333333333333334%" headers="d54e3053" rowspan="1" colspan="1">Per warp</td>
                                 <td class="entry" valign="top" width="40%" headers="d54e3056" rowspan="1" colspan="1">
                                    <p class="p">This occurs when any thread in the warp hits a device side assertion.</p>
                                 </td>
                              </tr>
                           </tbody>
                        </table>
                     </div>
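                      <p class="p">As an illustration, the following contrived kernel (not part of this manual's samples) writes past the end of a global buffer. On <samp class="ph codeph">sm_20</samp> and higher, an access like this is typically reported as <samp class="ph codeph">CUDA_EXCEPTION_10</samp>, or as the precise per-lane <samp class="ph codeph">CUDA_EXCEPTION_1</samp> when <samp class="ph codeph">memcheck</samp> is on:</p><pre xml:space="preserve">__global__ void oob_write(int *buf, int n) {
    // Each thread writes one element past the end of buf:
    // an illegal (out-of-bounds) global address.
    buf[n + threadIdx.x] = threadIdx.x;
}</pre>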
                  </div>
               </div>
            </div>
         </div>
         <div class="topic concept nested0" id="checking-api-errors"><a name="checking-api-errors" shape="rect">
               <!-- --></a><h2 class="title topictitle1"><a href="#checking-api-errors" name="checking-api-errors" shape="rect">11.&nbsp;Checking API Errors</a></h2>
            <div class="body conbody">
                <p class="p">CUDA-GDB can automatically check the return code of any driver API or runtime API call. If the return code indicates an error,
                   the debugger will warn the user or stop the application, depending on the selected mode.
                </p>
               <div class="p">The behavior is controlled with the <samp class="ph codeph">set cuda api_failures</samp> option. Three modes are supported:
                  
                  <ul class="ul">
                     <li class="li"><samp class="ph codeph">hide</samp> will not report any error of any kind
                     </li>
                     <li class="li"><samp class="ph codeph">ignore</samp> will emit a warning but continue the execution of the application (default)
                     </li>
                     <li class="li"><samp class="ph codeph">stop</samp> will emit an error and stop the application
                     </li>
                  </ul>
               </div>
               <div class="note note"><span class="notetitle">Note:</span> The success return code and other non-error return codes are ignored. For the driver API, those are: <samp class="ph codeph">CUDA_SUCCESS</samp> and <samp class="ph codeph">CUDA_ERROR_NOT_READY</samp>. For the runtime API, they are <samp class="ph codeph">cudaSuccess</samp> and <samp class="ph codeph">cudaErrorNotReady</samp>.
               </div>
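                <p class="p">For example, to make the debugger stop as soon as an API call returns an error:</p><pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">set cuda api_failures stop</strong></pre>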
            </div>
         </div>
         <div class="topic concept nested0" id="walk-through-examples"><a name="walk-through-examples" shape="rect">
               <!-- --></a><h2 class="title topictitle1"><a href="#walk-through-examples" name="walk-through-examples" shape="rect">12.&nbsp;Walk-Through Examples</a></h2>
            <div class="body conbody">
                <p class="p">This chapter contains three CUDA-GDB walk-through examples:</p>
               <ul class="ul">
                  <li class="li"><a class="xref" href="index.html#example-1-bitreverse" shape="rect">Example 1: bitreverse</a></li>
                  <li class="li"><a class="xref" href="index.html#example-2-autostep" shape="rect">Example 2: autostep</a></li>
                  <li class="li"><a class="xref" href="index.html#debugging-with-mpi" shape="rect">Example 3: Debugging an MPI CUDA Application</a></li>
               </ul>
            </div>
            <div class="topic reference nested1" id="example-1-bitreverse"><a name="example-1-bitreverse" shape="rect">
                  <!-- --></a><h3 class="title topictitle2"><a href="#example-1-bitreverse" name="example-1-bitreverse" shape="rect">12.1.&nbsp;Example 1: bitreverse</a></h3>
               <div class="body refbody">
                  <div class="section">
                      <p class="p">This section presents a walk-through of CUDA-GDB by debugging a sample application, called <samp class="ph codeph">bitreverse</samp>, that performs a simple 8-bit reversal on a data set.
                      </p>
                  </div>
                  <div class="example">
                     <h3 class="title sectiontitle">Source Code</h3><pre xml:space="preserve">1  #include &lt;stdio.h&gt;
2  #include &lt;stdlib.h&gt;
3
4  <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-comment">// Simple 8-bit bit reversal Compute test</span>
5
6  #define N 256
7
8  __global__&nbsp;<span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-keyword">void</span> bitreverse(<span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-keyword">void</span> *data) {
9     <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-keyword">unsigned</span> <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-keyword">int</span> *idata = (<span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-keyword">unsigned</span> <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-keyword">int</span>*)data;
10    <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-keyword">extern</span>&nbsp;__shared__&nbsp;<span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-keyword">int</span> array[];
11
12    array[<span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-attribute">threadIdx</span>.x] = idata[<span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-attribute">threadIdx</span>.x];
13
14    array[<span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-attribute">threadIdx</span>.x] = ((0xf0f0f0f0 &amp; array[<span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-attribute">threadIdx</span>.x]) &gt;&gt; 4) |
15                        ((0x0f0f0f0f &amp; array[<span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-attribute">threadIdx</span>.x]) &lt;&lt; 4);
16    array[<span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-attribute">threadIdx</span>.x] = ((0xcccccccc &amp; array[<span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-attribute">threadIdx</span>.x]) &gt;&gt; 2) |
17                        ((0x33333333 &amp; array[<span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-attribute">threadIdx</span>.x]) &lt;&lt; 2);
18    array[<span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-attribute">threadIdx</span>.x] = ((0xaaaaaaaa &amp; array[<span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-attribute">threadIdx</span>.x]) &gt;&gt; 1) |
19                         ((0x55555555 &amp; array[<span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-attribute">threadIdx</span>.x]) &lt;&lt; 1);
20
21    idata[<span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-attribute">threadIdx</span>.x] = array[<span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-attribute">threadIdx</span>.x];
22 }
23
24 <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-keyword">int</span> main(<span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-keyword">void</span>) {
25     <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-keyword">void</span> *d = NULL; <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-keyword">int</span> i;
26     <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-keyword">unsigned</span> <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-keyword">int</span> idata[N], odata[N];
27
28     <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-keyword">for</span> (i = 0; i &lt; N; i++)
29         idata[i] = (<span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-keyword">unsigned</span> <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-keyword">int</span>)i;
30
31     cudaMalloc((<span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-keyword">void</span>**)&amp;d, <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-keyword">sizeof</span>(<span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-keyword">int</span>)*N);
32     cudaMemcpy(d, idata, <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-keyword">sizeof</span>(<span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-keyword">int</span>)*N,
33                cudaMemcpyHostToDevice);
34
35     bitreverse<span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-attribute">&lt;&lt;&lt;</span>1, N, N*<span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-keyword">sizeof</span>(<span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-keyword">int</span>)<span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-attribute">&gt;&gt;&gt;</span>(d);
36
37     cudaMemcpy(odata, d, <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-keyword">sizeof</span>(<span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-keyword">int</span>)*N,
38                cudaMemcpyDeviceToHost);
39
40     <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-keyword">for</span> (i = 0; i &lt; N; i++)
41        printf(<span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-string">"%u -&gt; %u\n"</span>, idata[i], odata[i]);
42
43     cudaFree((<span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-keyword">void</span>*)d);
44     <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-keyword">return</span> 0;
45 }</pre></div>
               </div>
               <div class="topic task nested2" id="walking-through-code"><a name="walking-through-code" shape="rect">
                     <!-- --></a><h3 class="title topictitle2"><a href="#walking-through-code" name="walking-through-code" shape="rect">12.1.1.&nbsp;Walking through the Code</a></h3>
                  <div class="body taskbody">
                     <ol class="ol steps">
                        <li class="li step"><span class="ph cmd">Begin by compiling the <samp class="ph codeph">bitreverse.cu</samp> CUDA application for debugging by entering the following command at a shell prompt:</span><pre class="pre screen" xml:space="preserve"><strong class="ph b">$ nvcc -g -G bitreverse.cu -o bitreverse</strong></pre>
                           
                           
                            This command assumes that the source file name is <samp class="ph codeph">bitreverse.cu</samp> and that no additional compiler flags are required for compilation. See also <a class="xref" href="index.html#debug-compilation" shape="rect">Debug Compilation</a>.</li>
                        <li class="li step"><span class="ph cmd">Start the CUDA debugger by entering the following command at a shell prompt:</span><pre class="pre screen" xml:space="preserve">$ <strong class="ph b">cuda-gdb bitreverse</strong></pre></li>
                        <li class="li step" id="walking-through-code__set-breakpoints"><a name="walking-through-code__set-breakpoints" shape="rect">
                               <!-- --></a><span class="ph cmd">Set breakpoints. Set both the host (<samp class="ph codeph">main</samp>) and GPU (<samp class="ph codeph">bitreverse</samp>) breakpoints here. Also, set a breakpoint at a particular line in the device function (<samp class="ph codeph">bitreverse.cu:21</samp>).</span><pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">break main</strong>
Breakpoint 1 at 0x18e1: file bitreverse.cu, line 25.
(cuda-gdb) <strong class="ph b">break bitreverse</strong>
Breakpoint 2 at 0x18a1: file bitreverse.cu, line 8.
(cuda-gdb) <strong class="ph b">break 21</strong>
Breakpoint 3 at 0x18ac: file bitreverse.cu, line 21.</pre></li>
                        <li class="li step"><span class="ph cmd">Run the CUDA application, and it executes until it reaches the first breakpoint (<samp class="ph codeph">main</samp>) set in <a class="xref" href="index.html#walking-through-code__set-breakpoints" shape="rect">3</a>.</span><pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">run</strong>
Starting program: /Users/CUDA_User1/docs/bitreverse
Reading symbols for shared libraries
..++........................................................... done

Breakpoint 1, main () at bitreverse.cu:25
25 	void *d = NULL; int i;</pre></li>
                        <li class="li step"><span class="ph cmd">At this point, commands can be entered to advance execution or to print the program state. For this walkthrough, let's continue
                              until the device kernel is launched.</span><pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">continue</strong>
Continuing.
Reading symbols for shared libraries .. done 
Reading symbols for shared libraries .. done 
[Context Create of context 0x80f200 on Device 0]
[Launch of CUDA Kernel 0 (bitreverse&lt;&lt;&lt;(1,1,1),(256,1,1)&gt;&gt;&gt;) on Device 0] 
Breakpoint 3 at 0x8667b8: file bitreverse.cu, line 21.
[Switching focus to CUDA kernel 0, grid 1, block (0,0,0), thread (0,0,0), device 0, sm 0, warp 0, lane 0]

Breakpoint 2, bitreverse&lt;&lt;&lt;(1,1,1),(256,1,1)&gt;&gt;&gt; (data=0x110000) at bitreverse.cu:9
9 	unsigned int *idata = (unsigned int*)data;</pre>
                           
                           
                            CUDA-GDB has detected that a CUDA device kernel has been reached. The debugger prints the current CUDA thread of focus.
                           </li>
                        <li class="li step"><span class="ph cmd">Verify the CUDA thread of focus with the <samp class="ph codeph">info cuda threads</samp> command and switch between host thread and the CUDA threads:</span><pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">info cuda threads</strong>
  BlockIdx ThreadIdx To BlockIdx ThreadIdx Count            Virtual PC      Filename  Line

Kernel 0
*  (0,0,0)    (0,0,0)    (0,0,0) (255,0,0)    256 0x0000000000866400 bitreverse.cu     9
(cuda-gdb) <strong class="ph b">thread</strong>
[Current thread is 1 (process 16738)]
(cuda-gdb) <strong class="ph b">thread 1</strong>
[Switching to thread 1 (process 16738)]
#0  0x000019d5 in main () at bitreverse.cu:34
34    bitreverse&lt;&lt;&lt;1, N, N*sizeof(int)&gt;&gt;&gt;(d);
(cuda-gdb) <strong class="ph b">backtrace</strong>
#0  0x000019d5 in main () at bitreverse.cu:34
(cuda-gdb) <strong class="ph b">info cuda kernels</strong>
Kernel Dev Grid   SMs Mask GridDim  BlockDim        Name Args
     0   0    1 0x00000001 (1,1,1) (256,1,1) bitreverse data=0x110000
(cuda-gdb) <strong class="ph b">cuda kernel 0</strong>
[Switching focus to CUDA kernel 0, grid 1, block (0,0,0), thread (0,0,0), device 0, sm 0, warp 0, lane 0]
9    unsigned int *idata = (unsigned int*)data;
(cuda-gdb) <strong class="ph b">backtrace</strong>
#0   bitreverse&lt;&lt;&lt;(1,1,1),(256,1,1)&gt;&gt;&gt; (data=0x110000) at bitreverse.cu:9</pre></li>
                        <li class="li step"><span class="ph cmd">Corroborate this information by printing the block and thread indexes:</span><pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">print blockIdx</strong>
$1 = {x = 0, y = 0}
(cuda-gdb) <strong class="ph b">print threadIdx</strong>
$2 = {x = 0, y = 0, z = 0}</pre></li>
                        <li class="li step"><span class="ph cmd">The grid and block dimensions can also be printed:</span><pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">print gridDim</strong>
$3 = {x = 1, y = 1}
(cuda-gdb) <strong class="ph b">print blockDim</strong>
$4 = {x = 256, y = 1, z = 1}</pre></li>
                        <li class="li step"><span class="ph cmd">Advance kernel execution and verify some data:</span><pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">next</strong>
12       array[threadIdx.x] = idata[threadIdx.x];
(cuda-gdb) <strong class="ph b">next</strong>
14       array[threadIdx.x] = ((0xf0f0f0f0 &amp; array[threadIdx.x]) &gt;&gt; 4) | 
(cuda-gdb) <strong class="ph b">next</strong>
16       array[threadIdx.x] = ((0xcccccccc &amp; array[threadIdx.x]) &gt;&gt; 2) | 
(cuda-gdb) <strong class="ph b">next</strong>
18       array[threadIdx.x] = ((0xaaaaaaaa &amp; array[threadIdx.x]) &gt;&gt; 1) |
(cuda-gdb) <strong class="ph b">next</strong>

Breakpoint 3, bitreverse &lt;&lt;&lt;(1,1,1),(256,1,1)&gt;&gt;&gt; (data=0x100000) at bitreverse.cu:21
21             idata[threadIdx.x] = array[threadIdx.x]; 
(cuda-gdb) <strong class="ph b">print array[0]@12</strong>
$7 = {0, 128, 64, 192, 32, 160, 96, 224, 16, 144, 80, 208} 
(cuda-gdb) <strong class="ph b">print/x array[0]@12</strong>
$8 = {0x0, 0x80, 0x40, 0xc0, 0x20, 0xa0, 0x60, 0xe0, 0x10, 0x90, 0x50,
0xd0}

(cuda-gdb) <strong class="ph b">print &amp;data</strong>
$9 = (@global void * @parameter *) 0x10
(cuda-gdb) <strong class="ph b">print *(@global void * @parameter *) 0x10</strong>
$10 = (@global void * @parameter) 0x100000</pre>
                           
                           
                           The resulting output depends on the current content of the memory location.
                           </li>
                        <li class="li step"><span class="ph cmd">Since thread (<samp class="ph codeph">0,0,0</samp>) reverses the value of <samp class="ph codeph">0</samp>, switch to a different thread to show more interesting data:</span><pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">cuda thread 170</strong>
[Switching focus to CUDA kernel 0, grid 1, block (0,0,0), thread
(170,0,0), device 0, sm 0, warp 5, lane 10]</pre></li>
                        <li class="li step"><span class="ph cmd">Delete the breakpoints and continue the program to completion:</span><pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">delete breakpoints</strong> 
Delete all breakpoints? (y or n) <strong class="ph b">y</strong> 
(cuda-gdb) <strong class="ph b">continue</strong>
Continuing.

Program exited normally. 
(cuda-gdb)
</pre></li>
                     </ol>
                  </div>
               </div>
            </div>
            <div class="topic reference nested1" id="example-2-autostep"><a name="example-2-autostep" shape="rect">
                  <!-- --></a><h3 class="title topictitle2"><a href="#example-2-autostep" name="example-2-autostep" shape="rect">12.2.&nbsp;Example 2: autostep</a></h3>
               <div class="body refbody">
                  <div class="section">
                     <p class="p">This section shows how to use the autostep command and demonstrates how it helps increase the precision of memory error reporting.</p>
                  </div>
                  <div class="example">
                     <h3 class="title sectiontitle">Source Code</h3><pre xml:space="preserve">1  #define NUM_BLOCKS 8
2  #define THREADS_PER_BLOCK 64
3
4  <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-attribute">__global__</span> <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-keyword">void</span> example(<span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-keyword">int</span> **data) {
5    <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-keyword">int</span> value1, value2, value3, value4, value5;
6    <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-keyword">int</span> idx1, idx2, idx3;
7
8    idx1 = <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-attribute">blockIdx</span>.x * <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-attribute">blockDim</span>.x;
9    idx2 = <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-attribute">threadIdx</span>.x;
10   idx3 = idx1 + idx2;
11   value1 = *(data[idx1]);
12   value2 = *(data[idx2]);
13   value3 = value1 + value2;
14   value4 = value1 * value2;
15   value5 = value3 + value4;
16   *(data[idx3]) = value5;
17   *(data[idx1]) = value3;
18   *(data[idx2]) = value4;
19   idx1 = idx2 = idx3 = 0;
20 }
21
22 <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-keyword">int</span> main(<span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-keyword">int</span> argc, <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-keyword">char</span> *argv[]) {
23   <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-keyword">int</span> *host_data[NUM_BLOCKS * THREADS_PER_BLOCK];
24   <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-keyword">int</span> **dev_data;
25   <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-keyword">const</span> <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-keyword">int</span> zero = 0;
26
27   <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-comment">/* Allocate an integer for each thread in each block */</span>
28   <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-keyword">for</span> (<span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-keyword">int</span> block = 0; block &lt; NUM_BLOCKS; block++) {
29     <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-keyword">for</span> (<span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-keyword">int</span> thread = 0; thread &lt; THREADS_PER_BLOCK; thread++) {
30       <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-keyword">int</span> idx = thread + block * THREADS_PER_BLOCK;
31       cudaMalloc(&amp;host_data[idx], <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-keyword">sizeof</span>(<span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-keyword">int</span>));
32       cudaMemcpy(host_data[idx], &amp;zero, <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-keyword">sizeof</span>(<span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-keyword">int</span>),
33                  cudaMemcpyHostToDevice);
34     }
35   }
36
37   <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-comment">/* This inserts an error into block 3, thread 39*/</span>
38   host_data[3*THREADS_PER_BLOCK  + 39] = NULL;
39
40   <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-comment">/* Copy the array of pointers to the device */</span>
41   cudaMalloc((<span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-keyword">void</span>**)&amp;dev_data,  <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-keyword">sizeof</span>(host_data));
42   cudaMemcpy(dev_data, host_data, <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-keyword">sizeof</span>(host_data), cudaMemcpyHostToDevice);
43
44   <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-comment">/* Execute example */</span>
45   example &lt;&lt;&lt; NUM_BLOCKS, THREADS_PER_BLOCK <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-attribute">&gt;&gt;&gt;</span> (dev_data);
46   cudaThreadSynchronize();
47 }</pre></div>
                  <div class="section">
                     <p class="p">In this small example, we have an array of pointers to integers, and we want to perform some operations on the integers. Suppose,
                        however, that one of the pointers is NULL, as shown in line 38. This will cause <samp class="ph codeph">CUDA_EXCEPTION_10 "Device Illegal Address"</samp> to be thrown when we try to access the integer that corresponds to block 3, thread 39. This exception should occur at line
                        16 when we try to write to that value.
                     </p>
                  </div>
               </div>
               <div class="topic task nested2" id="debugging-with-autosteps"><a name="debugging-with-autosteps" shape="rect">
                     <!-- --></a><h3 class="title topictitle2"><a href="#debugging-with-autosteps" name="debugging-with-autosteps" shape="rect">12.2.1.&nbsp;Debugging with Autosteps</a></h3>
                  <div class="body taskbody">
                     <ol class="ol steps">
                        <li class="li step"><span class="ph cmd">Compile the example and start CUDA-GDB as normal. We begin by running the program:</span><pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">run</strong>
Starting program: /home/jitud/cudagdb_test/autostep_ex/example
[Thread debugging using libthread_db enabled] [New Thread 0x7ffff5688700 (LWP 9083)]
[Context Create of context 0x617270 on Device 0]
[Launch of CUDA Kernel 0 (example&lt;&lt;&lt;(8,1,1),(64,1,1)&gt;&gt;&gt;) on Device 0]

Program received signal CUDA_EXCEPTION_10, Device Illegal Address.
[Switching focus to CUDA kernel 0, grid 1, block (1,0,0), thread (0,0,0), device 0, sm 1, warp 0, lane 0]
0x0000000000796f60 in example (data=0x200300000) at example.cu:17
17        *(data[idx1]) = value3;</pre>
                           
                           
                           As expected, we received a <samp class="ph codeph">CUDA_EXCEPTION_10</samp>. However, the reported location is block 1, thread 0, at line 17. Since <samp class="ph codeph">CUDA_EXCEPTION_10</samp> is a global error, no precise thread information is reported, so we would have to inspect all 512 threads manually.
                           
                        </li>
                        <li class="li step"><span class="ph cmd">Set <samp class="ph codeph">autosteps</samp>. To get more accurate information, we reason that since <samp class="ph codeph">CUDA_EXCEPTION_10</samp> is a memory access error, it must occur in code that accesses memory. Memory is accessed on lines 11, 12, 16, 17, and 18, so we
                              set two autostep windows covering those lines:</span><pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">autostep 11 for 2 lines</strong>
Breakpoint 1 at 0x796d18: file example.cu, line 11. 
Created autostep of length 2 lines
(cuda-gdb) <strong class="ph b">autostep 16 for 3 lines</strong>
Breakpoint 2 at 0x796e90: file example.cu, line 16. 
Created autostep of length 3 lines</pre></li>
                        <li class="li step"><span class="ph cmd">Finally, we run the program again with these autosteps:</span><pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">run</strong>
The program being debugged has been started already. 
Start it from the beginning? (y or n) <strong class="ph b">y</strong>
[Termination of CUDA Kernel 0 (example&lt;&lt;&lt;(8,1,1),(64,1,1)&gt;&gt;&gt;) on Device 0]
Starting program: /home/jitud/cudagdb_test/autostep_ex/example
[Thread debugging using libthread_db enabled] 
[New Thread 0x7ffff5688700 (LWP 9089)]
[Context Create of context 0x617270 on Device 0]
[Launch of CUDA Kernel 1 (example&lt;&lt;&lt;(8,1,1),(64,1,1)&gt;&gt;&gt;) on Device 0] 
[Switching focus to CUDA kernel 1, grid 1, block (0,0,0), thread (0,0,0),
device 0, sm 0, warp 0, lane 0]

Program received signal CUDA_EXCEPTION_10, Device Illegal Address.
[Current focus set to CUDA kernel 1, grid 1, block (3,0,0), thread
(32,0,0), device 0, sm 1, warp 3, lane 0]
Autostep precisely caught exception at example.cu:16 (0x796e90)</pre>
                           
                           
                           This time we correctly caught the exception at line 16. Even though <samp class="ph codeph">CUDA_EXCEPTION_10</samp> is a global error, we have narrowed it down to a warp error, so we know that the thread that threw the exception
                           must have been in the same warp as block 3, thread 32.
                           
                        </li>
                     </ol>
                     <div class="section result">In this example, we have narrowed the scope of the error from 512 threads to 32 threads just by setting two <samp class="ph codeph">autosteps</samp> and re-running the program. 
                     </div>
                  </div>
               </div>
            </div>
            <div class="topic reference nested1" id="debugging-with-mpi"><a name="debugging-with-mpi" shape="rect">
                  <!-- --></a><h3 class="title topictitle2"><a href="#debugging-with-mpi" name="debugging-with-mpi" shape="rect">12.3.&nbsp;Example 3: Debugging an MPI CUDA Application</a></h3>
               <div class="body refbody">
                  <div class="section">
                     <p class="p">For debugging large MPI CUDA applications, NVIDIA recommends using the parallel debuggers supplied by our partners Allinea
                        and Totalview.  Both make excellent parallel debuggers with extended support for CUDA.  However, for debugging smaller applications,
                        or for debugging just a few processes in a large application, CUDA-GDB can easily be used.
                     </p>
                     <p class="p">If the cluster nodes have xterm support, then it is quite easy to use CUDA-GDB.  Just launch CUDA-GDB in the same way you
                        would have launched gdb.
                     </p><pre class="pre screen" xml:space="preserve"><strong class="ph b">$ mpirun -np 4 -host nv1,nv2 xterm -e cuda-gdb a.out</strong></pre><p class="p">You may have to export the DISPLAY variable to make sure that the xterm finds its way back to your display.  For example,
                        with Open MPI you would do something like this.
                        	
                     </p><pre class="pre screen" xml:space="preserve"><strong class="ph b">$ mpirun -np 4 -host nv1,nv2 -x DISPLAY=host.nvidia.com:0 xterm -e cuda-gdb a.out</strong></pre><p class="p">Different MPI implementations have different ways of exporting environment variables to the cluster nodes so check your documentation.</p>
                     <p class="p">In the case where you cannot get xterm support, you can insert a spin loop inside your program.  This works in just the same
                        way as when using gdb on a host only program.  Somewhere near the start of your program, add a code snippet like the following.
                     </p><pre xml:space="preserve">
{
    <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-keyword">int</span> i = 0;
    <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-keyword">char</span> host[256];
    gethostname(host, <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-keyword">sizeof</span>(host));
    printf(<span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-string">"PID %d on node %s is ready for attach\n"</span>,
            getpid(), host);
    fflush(stdout);
    <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-keyword">while</span> (0 == i) {
        sleep(5);
    }
}
    </pre><p class="p">Then recompile and run the program.  After it starts, ssh to the nodes of interest and attach to the process.  Set the variable
                        i to 1 to break out of the loop.
                        
                     </p><pre class="pre screen" xml:space="preserve"><strong class="ph b">$ mpirun -np 2 -host nv1,nv2 a.out</strong>
PID 20060 on node nv1 is ready for attach
PID 5488 on node nv2 is ready for attach
    </pre><pre class="pre screen" xml:space="preserve"><strong class="ph b">[nv1]$ cuda-gdb --pid 20060</strong></pre><pre class="pre screen" xml:space="preserve"><strong class="ph b">[nv2]$ cuda-gdb --pid 5488</strong></pre><p class="p">For larger applications, where you may want to attach to only a few of the processes, you can conditionalize the
                        spin loop based on the rank.  Most MPI implementations set an environment variable holding the rank of the process: for Open MPI it is OMPI_COMM_WORLD_RANK,
                        and for MVAPICH it is MV2_COMM_WORLD_RANK.  Assuming you want to attach to rank 42, you could add a spin loop like this.
                     </p><pre xml:space="preserve">
{
    <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-keyword">char</span> *stoprank;
    stoprank = getenv(<span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-string">"OMPI_COMM_WORLD_RANK"</span>);
    <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-keyword">if</span> (stoprank &amp;&amp; 42 == atoi(stoprank)) {
        <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-keyword">int</span> i = 0;
        <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-keyword">char</span> hostname[256];
        gethostname(hostname, <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-keyword">sizeof</span>(hostname));
        printf(<span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-string">"PID %d on %s ready for attach\n"</span>,
               getpid(), hostname);
        fflush(stdout);
        <span xmlns:xslthl="http://xslthl.sf.net" class="xslthl-keyword">while</span> (0 == i) {
            sleep(5);
        }
    }
}
   </pre><p class="p">Note that by default CUDA-GDB allows debugging a single process per node.  The workaround described in <a class="xref" href="index.html#multiple-debuggers" shape="rect">Multiple Debuggers</a> does not work with MPI applications.  If CUDA_VISIBLE_DEVICES is set, it may cause problems with the GPU selection logic
                        in the MPI application.  It may also prevent CUDA IPC from working between GPUs on a node.
                        
                        In order to start multiple CUDA-GDB sessions to debug individual MPI processes on the same node, use the <samp class="ph codeph">--cuda-use-lockfile=0</samp> option when starting CUDA-GDB, as described in <a class="xref" href="index.html#lock-file" shape="rect">Lock File</a>.  Each MPI process must guarantee it targets a unique GPU for this to work properly.
                     </p>
                  </div>
               </div>
            </div>
         </div>
         <div class="topic concept nested0" id="advanced-settings"><a name="advanced-settings" shape="rect">
               <!-- --></a><h2 class="title topictitle1"><a href="#advanced-settings" name="advanced-settings" shape="rect">13.&nbsp;Advanced Settings</a></h2>
            <div class="body conbody"></div>
            <div class="topic reference nested1" id="set-cuda-notify"><a name="set-cuda-notify" shape="rect">
                  <!-- --></a><h3 class="title topictitle2"><a href="#set-cuda-notify" name="set-cuda-notify" shape="rect">13.1.&nbsp;set cuda notify</a></h3>
               <div class="body refbody">
                  <div class="section">
                     <div class="p">
                        Any time a CUDA event occurs, the debugger needs to be notified.
                        The notification takes place in the form of a signal being sent to a host thread.
                        The host thread to receive that special signal is determined with the <samp class="ph codeph">set cuda notify</samp> option.
                        
                        
                        <ul class="ul">
                           <li class="li"><pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">set cuda notify youngest</strong></pre><p class="p">The host thread with the smallest thread id will receive the notification signal (default).</p>
                           </li>
                           <li class="li"><pre class="pre screen" xml:space="preserve">(cuda-gdb) <strong class="ph b">set cuda notify random</strong></pre><p class="p">An arbitrary host thread will receive the notification signal.</p>
                           </li>
                        </ul>
                     </div>
                  </div>
               </div>
            </div>
            <div class="topic reference nested1" id="lock-file"><a name="lock-file" shape="rect">
                  <!-- --></a><h3 class="title topictitle2"><a href="#lock-file" name="lock-file" shape="rect">13.2.&nbsp;Lock File</a></h3>
               <div class="body refbody">
                  <div class="section">
                     <p class="p">
                        When debugging an application, CUDA-GDB will suspend all the visible CUDA-capable devices.
                        To avoid any resource conflict, only one CUDA-GDB session is allowed at a time.
                        To enforce this restriction, CUDA-GDB uses a locking mechanism, implemented with a lock file.
                        That lock file prevents two CUDA-GDB processes from running simultaneously.
                        
                     </p>
                     <div class="p">
                        However, if the user desires to debug two applications simultaneously through two separate CUDA-GDB sessions, the following
                        solutions exist:
                        
                        <ul class="ul">
                           <li class="li">Use the <samp class="ph codeph">CUDA_VISIBLE_DEVICES</samp> environment variable to target unique GPUs for each CUDA-GDB session.  This is described in more detail in <a class="xref" href="index.html#multiple-debuggers" shape="rect">Multiple Debuggers</a>.
                           </li>
                           <li class="li">Lift the lockfile restriction by using the <samp class="ph codeph">--cuda-use-lockfile</samp> command-line option.
                              
                              <pre class="pre screen" xml:space="preserve"><strong class="ph b">$</strong> cuda-gdb --cuda-use-lockfile=0 my_app</pre>
                              
                              
                              This option is the recommended solution when debugging multiple ranks of an MPI application that uses separate GPUs for each
                              rank.  It is also required when using software preemption (<samp class="ph codeph">set cuda software_preemption on</samp>) to debug multiple CUDA applications context-switching on the same GPU.
                           </li>
                        </ul>
                     </div>
                  </div>
               </div>
            </div>
         </div>
         <div class="topic concept nested0" id="supported-platforms"><a name="supported-platforms" shape="rect">
               <!-- --></a><h2 class="title topictitle1"><a href="#supported-platforms" name="supported-platforms" shape="rect">A.&nbsp;Supported Platforms</a></h2>
            <div class="body conbody">
               <p class="p">The general platform and GPU requirements for running NVIDIA CUDA-GDB are described in
                  this section.
               </p>
            </div>
            <div class="topic reference nested1" id="host-platform-requirements"><a name="host-platform-requirements" shape="rect">
                  <!-- --></a><h3 class="title topictitle2"><a href="#host-platform-requirements" name="host-platform-requirements" shape="rect">A.1.&nbsp;Host Platform Requirements</a></h3>
               <div class="body refbody">
                  <div class="section">
                     <h3 class="title sectiontitle">Mac OS</h3>
                     <p class="p">CUDA-GDB is supported on both 32-bit and 64-bit editions of the following Mac OS versions:</p>
                     <ul class="ul">
                        <li class="li">Mac OS X 10.7</li>
                        <li class="li">Mac OS X 10.8</li>
                     </ul>
                  </div>
                  <div class="section">
                     <h3 class="title sectiontitle">Linux</h3>
                     <p class="p">CUDA-GDB is supported on both 32-bit and 64-bit editions of the following Linux distributions:</p>
                     <ul class="ul">
                        <li class="li">Red Hat Enterprise Linux 5.5+ (64-bit only)</li>
                        <li class="li">Red Hat Enterprise Linux 6.x</li>
                        <li class="li">Ubuntu 12.04 and 12.10</li>
                        <li class="li">Fedora 18</li>
                        <li class="li">OpenSuse 12.2</li>
                        <li class="li">Suse Linux Enterprise Server 11.1 and 11 SP2</li>
                     </ul>
                  </div>
                  <div class="section">
                     <h3 class="title sectiontitle">GPU Requirements</h3>
                     <p class="p">Debugging is supported on all CUDA-capable GPUs with a compute capability of 1.1 or later. <dfn class="term">Compute capability</dfn> is a device attribute that a CUDA application can query; for more information, see the latest <cite class="cite">NVIDIA CUDA Programming Guide</cite> on the NVIDIA CUDA Zone Web site: <a class="xref" href="http://developer.nvidia.com/object/gpucomputing.html" target="_blank" shape="rect">http://developer.nvidia.com/object/gpucomputing.html</a>.
                     </p>
                     <p class="p">These GPUs have a compute capability of 1.0 and are <em class="ph i">not supported</em>:
                     </p>
                     <ul class="ul">
                        <li class="li">GeForce 8800 GTS</li>
                        <li class="li">GeForce 8800 GTX</li>
                        <li class="li">GeForce 8800 Ultra</li>
                        <li class="li">Quadro Plex 1000 Model IV</li>
                        <li class="li">Quadro Plex 2100 Model S4</li>
                        <li class="li">Quadro FX 4600</li>
                        <li class="li">Quadro FX 5600</li>
                        <li class="li">Tesla C870</li>
                        <li class="li">Tesla D870</li>
                        <li class="li">Tesla S870</li>
                     </ul>
                  </div>
               </div>
            </div>
         </div>
         <div class="topic reference nested0" id="known-issues"><a name="known-issues" shape="rect">
               <!-- --></a><h2 class="title topictitle1"><a href="#known-issues" name="known-issues" shape="rect">B.&nbsp;Known Issues</a></h2>
            <div class="body refbody">
               <div class="section">
                  <p class="p">The following are known issues with the current release.</p>
                  <ul class="ul">
                     <li class="li">Setting the <samp class="ph codeph">cuda memcheck</samp> option ON will make all the launches blocking.
                     </li>
                     <li class="li">Device memory allocated via <samp class="ph codeph">cudaMalloc()</samp> is not visible outside of the kernel function.
                     </li>
                     <li class="li">On GPUs with <samp class="ph codeph">sm_type</samp> lower than <samp class="ph codeph">sm_20</samp> it is not possible to step over a subroutine in the device code.
                     </li>
                     <li class="li">Requesting to read or write GPU memory may be unsuccessful if the size is larger than 100MB on Tesla GPUs and larger than
                        32MB on Fermi GPUs.
                     </li>
                     <li class="li">On GPUs with <samp class="ph codeph">sm_20</samp>, if you are debugging code in device functions that get called by multiple kernels, then setting a breakpoint in the device
                        function will insert the breakpoint in only one of the kernels.
                     </li>
                     <li class="li">In a multi-GPU debugging environment on Mac OS X with Aqua running, you may experience some visible delay while single-stepping
                        the application.
                     </li>
                     <li class="li">Setting a breakpoint on a line within a <samp class="ph codeph">__device__</samp> or <samp class="ph codeph">__global__</samp> function before its module is loaded may result in the breakpoint being temporarily set on the first line of a function below
                        in the source code. As soon as the module for the targeted function is loaded, the breakpoint will be reset properly. In the
                        meantime, the breakpoint may be hit, depending on the application. In those situations, the breakpoint can be safely ignored,
                        and the application can be resumed.
                     </li>
                     <li class="li">The <dfn class="term">scheduler-locking</dfn> option cannot be set to <em class="ph i">on</em>.
                     </li>
                     <li class="li">Stepping again after stepping out of a kernel results in undetermined behavior.  It is recommended to use the 'continue' command
                        instead.
                     </li>
                      <li class="li">To debug a CUDA application that uses OpenGL, the X server may need to be launched in non-interactive mode. See <a class="xref" href="index.html#cuda-opengl-interop-applications-on-linux" shape="rect">CUDA/OpenGL Interop Applications on Linux</a> for details.
                     </li>
                     <li class="li">Pretty-printing is not supported.</li>
                     <li class="li">When remotely debugging 32-bit applications on a 64-bit server, gdbserver must be 32-bit.</li>
                     <li class="li">Attaching to a CUDA application with Software Preemption enabled in cuda-gdb is not supported.</li>
                     <li class="li">Attaching to CUDA application running in MPS client mode is not supported.</li>
                     <li class="li">Attaching to the MPS server process (nvidia-cuda-mps-server) using cuda-gdb, or starting the MPS server with cuda-gdb is not
                        supported.
                     </li>
                      <li class="li">If a CUDA application is started in MPS client mode with cuda-gdb, the MPS client will wait until all other MPS clients
                         have terminated, and will then run as a non-MPS application.
                     </li>
                  </ul>
               </div>
            </div>
         </div>
         <div class="topic concept nested0" id="notices-header"><a name="notices-header" shape="rect">
               <!-- --></a><h2 class="title topictitle1"><a href="#notices-header" name="notices-header" shape="rect">Notices</a></h2>
            <div class="topic reference nested1" id="notice"><a name="notice" shape="rect">
                  <!-- --></a><h3 class="title topictitle2"><a href="#notice" name="notice" shape="rect"></a></h3>
               <div class="body refbody">
                  <div class="section">
                     <h3 class="title sectiontitle">Notice</h3>
                     <p class="p">ALL NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS, DIAGNOSTICS, LISTS, AND OTHER DOCUMENTS (TOGETHER AND
                        SEPARATELY, "MATERIALS") ARE BEING PROVIDED "AS IS." NVIDIA MAKES NO WARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE
                        WITH RESPECT TO THE MATERIALS, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS
                        FOR A PARTICULAR PURPOSE. 
                     </p>
                     <p class="p">Information furnished is believed to be accurate and reliable. However, NVIDIA Corporation assumes no responsibility for the
                        consequences of use of such information or for any infringement of patents or other rights of third parties that may result
                         from its use. No license is granted by implication or otherwise under any patent rights of NVIDIA Corporation. Specifications
                        mentioned in this publication are subject to change without notice. This publication supersedes and replaces all other information
                        previously supplied. NVIDIA Corporation products are not authorized as critical components in life support devices or systems
                        without express written approval of NVIDIA Corporation.
                     </p>
                  </div>
               </div>
            </div>
            <div class="topic reference nested1" id="trademarks"><a name="trademarks" shape="rect">
                  <!-- --></a><h3 class="title topictitle2"><a href="#trademarks" name="trademarks" shape="rect"></a></h3>
               <div class="body refbody">
                  <div class="section">
                     <h3 class="title sectiontitle">Trademarks</h3>
                     <p class="p">NVIDIA and the NVIDIA logo are trademarks or registered trademarks of NVIDIA Corporation
                        in the U.S. and other countries.  Other company and product names may be trademarks of
                        the respective companies with which they are associated.
                     </p>
                  </div>
               </div>
            </div>
            <div class="topic reference nested1" id="copyright"><a name="copyright" shape="rect">
                  <!-- --></a><h3 class="title topictitle2"><a href="#copyright" name="copyright" shape="rect"></a></h3>
               <div class="body refbody">
                  <div class="section">
                     <h3 class="title sectiontitle">Copyright</h3>
                     <p class="p">© 2007-<span class="ph">2013</span> NVIDIA Corporation. All rights reserved.
                     </p>
                  </div>
               </div>
            </div>
         </div>
         
         <hr id="contents-end"></hr>
         <div id="release-info">CUDA-GDB
            (<a href="../../pdf/cuda-gdb.pdf">PDF</a>)
            -
            CUDA Toolkit v5.5
            (<a href="https://developer.nvidia.com/cuda-toolkit-archive">older</a>)
            -
            Last updated 
            July 19, 2013
            -
            <a href="mailto:cudatools@nvidia.com?subject=CUDA Tools Documentation Feedback: cuda-gdb">Send Feedback</a></div>
         
      </article>
      
      <header id="header"><span id="company">NVIDIA</span><span id="site-title">CUDA Toolkit Documentation</span><form id="search" method="get" action="search">
            <input type="text" name="search-text"></input><fieldset id="search-location">
               <legend>Search In:</legend>
               <label><input type="radio" name="search-type" value="site"></input>Entire Site</label>
               <label><input type="radio" name="search-type" value="document"></input>Just This Document</label></fieldset>
            <button type="reset">clear search</button>
            <button id="submit" type="submit">search</button></form>
      </header>
      <nav id="site-nav">
         <div class="category closed"><span class="twiddle">▷</span><a href="../index.html" title="The root of the site.">CUDA Toolkit</a></div>
         <ul class="closed">
            <li><a href="../cuda-toolkit-release-notes/index.html" title="The Release Notes for the CUDA Toolkit from v4.0 to today.">Release Notes</a></li>
            <li><a href="../eula/index.html" title="The End User License Agreements for the NVIDIA CUDA Toolkit, the NVIDIA CUDA Samples, the NVIDIA Display Driver, and NVIDIA NSight (Visual Studio Edition).">EULA</a></li>
            <li><a href="../cuda-getting-started-guide-for-linux/index.html" title="This guide discusses how to install and check for correct operation of the CUDA Development Tools on GNU/Linux systems.">Getting Started Linux</a></li>
            <li><a href="../cuda-getting-started-guide-for-mac-os-x/index.html" title="This guide discusses how to install and check for correct operation of the CUDA Development Tools on Mac OS X systems.">Getting Started Mac OS X</a></li>
            <li><a href="../cuda-getting-started-guide-for-microsoft-windows/index.html" title="This guide discusses how to install and check for correct operation of the CUDA Development Tools on Microsoft Windows systems.">Getting Started Windows</a></li>
            <li><a href="../cuda-c-programming-guide/index.html" title="This guide provides a detailed discussion of the CUDA programming model and programming interface. It then describes the hardware implementation, and provides guidance on how to achieve maximum performance. The Appendixes include a list of all CUDA-enabled devices, detailed description of all extensions to the C language, listings of supported mathematical functions, C++ features supported in host and device code, details on texture fetching, technical specifications of various devices, and concludes by introducing the low-level driver API.">Programming Guide</a></li>
            <li><a href="../cuda-c-best-practices-guide/index.html" title="This guide presents established parallelization and optimization techniques and explains coding metaphors and idioms that can greatly simplify programming for CUDA-capable GPU architectures. The intent is to provide guidelines for obtaining the best performance from NVIDIA GPUs using the CUDA Toolkit.">Best Practices Guide</a></li>
            <li><a href="../kepler-compatibility-guide/index.html" title="This application note is intended to help developers ensure that their NVIDIA CUDA applications will run effectively on GPUs based on the NVIDIA Kepler Architecture. This document provides guidance to ensure that your software applications are compatible with Kepler.">Kepler Compatibility Guide</a></li>
            <li><a href="../kepler-tuning-guide/index.html" title="Kepler is NVIDIA's next-generation architecture for CUDA compute applications. Applications that follow the best practices for the Fermi architecture should typically see speedups on the Kepler architecture without any code changes. This guide summarizes the ways that an application can be fine-tuned to gain additional speedups by leveraging Kepler architectural features.">Kepler Tuning Guide</a></li>
            <li><a href="../parallel-thread-execution/index.html" title="This guide provides detailed instructions on the use of PTX, a low-level parallel thread execution virtual machine and instruction set architecture (ISA). PTX exposes the GPU as a data-parallel computing device.">PTX ISA</a></li>
            <li><a href="../optimus-developer-guide/index.html" title="This document explains how CUDA APIs can be used to query for GPU capabilities in NVIDIA Optimus systems.">Developer Guide for Optimus</a></li>
            <li><a href="../video-decoder/index.html" title="This document provides the video decoder API specification and the format conversion and display using DirectX or OpenGL following decode.">Video Decoder</a></li>
            <li><a href="../video-encoder/index.html" title="This document provides the CUDA video encoder specifications, including the C-library API functions and encoder query parameters.">Video Encoder</a></li>
            <li><a href="../inline-ptx-assembly/index.html" title="This document shows how to inline PTX (parallel thread execution) assembly language statements into CUDA code. It describes available assembler statement parameters and constraints, and the document also provides a list of some pitfalls that you may encounter.">Inline PTX Assembly</a></li>
            <li><a href="../cuda-runtime-api/index.html" title="The CUDA runtime API.">CUDA Runtime API</a></li>
            <li><a href="../cuda-driver-api/index.html" title="The CUDA driver API.">CUDA Driver API</a></li>
            <li><a href="../cuda-math-api/index.html" title="The CUDA math API.">CUDA Math API</a></li>
            <li><a href="../cublas/index.html" title="The CUBLAS library is an implementation of BLAS (Basic Linear Algebra Subprograms) on top of the NVIDIA CUDA runtime. It allows the user to access the computational resources of NVIDIA Graphical Processing Unit (GPU), but does not auto-parallelize across multiple GPUs.">CUBLAS</a></li>
            <li><a href="../cufft/index.html" title="The CUFFT library user guide.">CUFFT</a></li>
            <li><a href="../curand/index.html" title="The CURAND library user guide.">CURAND</a></li>
            <li><a href="../cusparse/index.html" title="The CUSPARSE library user guide.">CUSPARSE</a></li>
            <li><a href="../npp/index.html" title="NVIDIA NPP is a library of functions for performing CUDA accelerated processing. The initial set of functionality in the library focuses on imaging and video processing and is widely applicable for developers in these areas. NPP will evolve over time to encompass more of the compute heavy tasks in a variety of problem domains. The NPP library is written to maximize flexibility, while maintaining high performance.">NPP</a></li>
            <li><a href="../thrust/index.html" title="The Thrust getting started guide.">Thrust</a></li>
            <li><a href="../cuda-samples/index.html" title="This document contains a complete listing of the code samples that are included with the NVIDIA CUDA Toolkit. It describes each code sample, lists the minimum GPU specification, and provides links to the source code and white papers if available.">CUDA Samples</a></li>
            <li><a href="../cuda-compiler-driver-nvcc/index.html" title="This document is a reference guide on the use of the CUDA compiler driver nvcc. Instead of being a specific CUDA compilation driver, nvcc mimics the behavior of the GNU compiler gcc, accepting a range of conventional compiler options, such as for defining macros and include/library paths, and for steering the compilation process.">NVCC</a></li>
            <li><a href="../cuda-gdb/index.html" title="The NVIDIA tool for debugging CUDA applications running on Linux and Mac, providing developers with a mechanism for debugging CUDA applications running on actual hardware. CUDA-GDB is an extension to the x86-64 port of GDB, the GNU Project debugger.">CUDA-GDB</a></li>
            <li><a href="../cuda-memcheck/index.html" title="CUDA-MEMCHECK is a suite of run time tools capable of precisely detecting out of bounds and misaligned memory access errors, checking device allocation leaks, reporting hardware errors and identifying shared memory data access hazards.">CUDA-MEMCHECK</a></li>
            <li><a href="../nsight-eclipse-edition-getting-started-guide/index.html" title="Nsight Eclipse Edition getting started guide">Nsight Eclipse Edition</a></li>
            <li><a href="../profiler-users-guide/index.html" title="This is the guide to the Profiler.">Profiler</a></li>
            <li><a href="../cuda-binary-utilities/index.html" title="The application notes for cuobjdump and nvdisasm.">CUDA Binary Utilities</a></li>
            <li><a href="../floating-point/index.html" title="A number of issues related to floating point accuracy and compliance are a frequent source of confusion on both CPUs and GPUs. The purpose of this white paper is to discuss the most common issues related to NVIDIA GPUs and to supplement the documentation in the CUDA C Programming Guide.">Floating Point and IEEE 754</a></li>
            <li><a href="../incomplete-lu-cholesky/index.html" title="In this white paper we show how to use the CUSPARSE and CUBLAS libraries to achieve a 2x speedup over CPU in the incomplete-LU and Cholesky preconditioned iterative methods. We focus on the Bi-Conjugate Gradient Stabilized and Conjugate Gradient iterative methods, that can be used to solve large sparse nonsymmetric and symmetric positive definite linear systems, respectively. Also, we comment on the parallel sparse triangular solve, which is an essential building block in these algorithms.">Incomplete-LU and Cholesky Preconditioned Iterative Methods</a></li>
            <li><a href="../libnvvm-api/index.html" title="The libNVVM API.">libNVVM API</a></li>
            <li><a href="../libdevice-users-guide/index.html" title="The libdevice library is an LLVM bitcode library that implements common functions for GPU kernels.">libdevice User's Guide</a></li>
            <li><a href="../nvvm-ir-spec/index.html" title="NVVM IR is a compiler IR (internal representation) based on the LLVM IR. The NVVM IR is designed to represent GPU compute kernels (for example, CUDA kernels). High-level language front-ends, like the CUDA C compiler front-end, can generate NVVM IR.">NVVM IR</a></li>
            <li><a href="../cupti/index.html" title="The CUPTI API.">CUPTI</a></li>
            <li><a href="../debugger-api/index.html" title="The CUDA debugger API.">Debugger API</a></li>
            <li><a href="../gpudirect-rdma/index.html" title="A tool for Kepler-class GPUs and CUDA 5.0 enabling a direct path for communication between the GPU and a peer device on the PCI Express bus when the devices share the same upstream root complex using standard features of PCI Express. This document introduces the technology and describes the steps necessary to enable a RDMA for GPUDirect connection to NVIDIA GPUs within the Linux device driver model.">RDMA for GPUDirect</a></li>
         </ul>
         <div class="category"><span class="twiddle">▼</span><a href="index.html" title="CUDA-GDB">CUDA-GDB</a></div>
         <ul>
            <li><a href="#introduction">1.&nbsp;Introduction</a><ul>
                  <li><a href="#what-is-cuda-gdb">1.1.&nbsp;What is CUDA-GDB?</a></li>
                  <li><a href="#supported-features">1.2.&nbsp;Supported Features</a></li>
                  <li><a href="#about-this-document">1.3.&nbsp;About This Document</a></li>
               </ul>
            </li>
            <li><a href="#release-notes">2.&nbsp;Release Notes</a></li>
            <li><a href="#getting-started">3.&nbsp;Getting Started</a><ul>
                  <li><a href="#installation-instructions">3.1.&nbsp;Installation Instructions</a></li>
                  <li><a href="#setting-up-the-debugger-environment">3.2.&nbsp;Setting Up the Debugger Environment</a><ul>
                        <li><a href="#linux">3.2.1.&nbsp;Linux</a></li>
                        <li><a href="#mac-os-x">3.2.2.&nbsp;Mac OS X</a></li>
                        <li><a href="#temporary-directory">3.2.3.&nbsp;Temporary Directory</a></li>
                     </ul>
                  </li>
                  <li><a href="#compiling-application">3.3.&nbsp;Compiling the Application</a><ul>
                        <li><a href="#debug-compilation">3.3.1.&nbsp;Debug Compilation</a></li>
                        <li><a href="#compiling-for-specific-gpus">3.3.2.&nbsp;Compiling For Specific GPU architectures</a></li>
                     </ul>
                  </li>
                  <li><a href="#using-debugger">3.4.&nbsp;Using the Debugger</a><ul>
                        <li><a href="#single-gpu-debugging">3.4.1.&nbsp;Single-GPU Debugging</a></li>
                        <li><a href="#single-gpu-debugging-with-desktop-manager-running">3.4.2.&nbsp;Single-GPU Debugging with the Desktop Manager Running</a></li>
                        <li><a href="#multi-gpu-debugging">3.4.3.&nbsp;Multi-GPU Debugging</a></li>
                        <li><a href="#multi-gpu-debugging-in-console-mode">3.4.4.&nbsp;Multi-GPU Debugging in Console Mode</a></li>
                        <li><a href="#multi-gpu-debugging-with-desktop-manager-running">3.4.5.&nbsp;Multi-GPU Debugging with the Desktop Manager Running</a></li>
                        <li><a href="#remote-debugging">3.4.6.&nbsp;Remote Debugging</a></li>
                        <li><a href="#multiple-debuggers">3.4.7.&nbsp;Multiple Debuggers</a></li>
                        <li><a href="#attaching">3.4.8.&nbsp;Attaching/Detaching</a></li>
                        <li><a href="#cuda-opengl-interop-applications-on-linux">3.4.9.&nbsp;CUDA/OpenGL Interop Applications on Linux</a></li>
                     </ul>
                  </li>
               </ul>
            </li>
            <li><a href="#cuda-gdb-extensions">4.&nbsp;CUDA-GDB Extensions</a><ul>
                  <li><a href="#command-naming-convention">4.1.&nbsp;Command Naming Convention</a></li>
                  <li><a href="#getting-help">4.2.&nbsp;Getting Help</a></li>
                  <li><a href="#initialization-file">4.3.&nbsp;Initialization File</a></li>
                  <li><a href="#gui-integration">4.4.&nbsp;GUI Integration</a></li>
               </ul>
            </li>
            <li><a href="#kernel-focus">5.&nbsp;Kernel Focus</a><ul>
                  <li><a href="#software-coordinates-vs-hardware-coordinates">5.1.&nbsp;Software Coordinates vs. Hardware Coordinates</a></li>
                  <li><a href="#current-focus">5.2.&nbsp;Current Focus</a></li>
                  <li><a href="#switching-focus">5.3.&nbsp;Switching Focus</a></li>
               </ul>
            </li>
            <li><a href="#program-execution">6.&nbsp;Program Execution</a><ul>
                  <li><a href="#interrupting-application">6.1.&nbsp;Interrupting the Application</a></li>
                  <li><a href="#single-stepping">6.2.&nbsp;Single Stepping</a></li>
               </ul>
            </li>
            <li><a href="#breakpoints">7.&nbsp;Breakpoints &amp; Watchpoints</a><ul>
                  <li><a href="#symbolic-breakpoints">7.1.&nbsp;Symbolic Breakpoints</a></li>
                  <li><a href="#line-breakpoints">7.2.&nbsp;Line Breakpoints</a></li>
                  <li><a href="#unique_1087814378">7.3.&nbsp;Address Breakpoints</a></li>
                  <li><a href="#kernel-entry-breakpoints">7.4.&nbsp;Kernel Entry Breakpoints</a></li>
                  <li><a href="#conditional-breakpoints">7.5.&nbsp;Conditional Breakpoints</a></li>
                  <li><a href="#watchpoints">7.6.&nbsp;Watchpoints</a></li>
               </ul>
            </li>
            <li><a href="#inspecting-program-state">8.&nbsp;Inspecting Program State</a><ul>
                  <li><a href="#memory-and-variables">8.1.&nbsp;Memory and Variables</a></li>
                  <li><a href="#variable-storage-and-accessibility">8.2.&nbsp;Variable Storage and Accessibility</a></li>
                  <li><a href="#inspecting-textures">8.3.&nbsp;Inspecting Textures</a></li>
                  <li><a href="#info-cuda-commands">8.4.&nbsp;Info CUDA Commands</a><ul>
                        <li><a href="#info-cuda-devices">8.4.1.&nbsp;info cuda devices</a></li>
                        <li><a href="#info-cuda-sms">8.4.2.&nbsp;info cuda sms</a></li>
                        <li><a href="#info-cuda-warps">8.4.3.&nbsp;info cuda warps</a></li>
                        <li><a href="#info-cuda-lanes">8.4.4.&nbsp;info cuda lanes</a></li>
                        <li><a href="#info-cuda-kernels">8.4.5.&nbsp;info cuda kernels</a></li>
                        <li><a href="#info-cuda-blocks">8.4.6.&nbsp;info cuda blocks</a></li>
                        <li><a href="#info-cuda-threads">8.4.7.&nbsp;info cuda threads</a></li>
                        <li><a href="#info-cuda-launch-trace">8.4.8.&nbsp;info cuda launch trace </a></li>
                        <li><a href="#info-cuda-launch-children">8.4.9.&nbsp;info cuda launch children</a></li>
                        <li><a href="#info-cuda-contexts">8.4.10.&nbsp;info cuda contexts</a></li>
                     </ul>
                  </li>
                  <li><a href="#disassembly">8.5.&nbsp;Disassembly</a></li>
               </ul>
            </li>
            <li><a href="#event-notifications">9.&nbsp;Event Notifications</a><ul>
                  <li><a href="#context-events">9.1.&nbsp;Context Events</a></li>
                  <li><a href="#kernel-events">9.2.&nbsp;Kernel Events</a></li>
               </ul>
            </li>
            <li><a href="#checking-memory-errors">10.&nbsp;Checking Memory Errors</a><ul>
                  <li><a href="#increasing-precision-of-memory-errors-with-autostep">10.1.&nbsp;Increasing the Precision of Memory Errors With Autostep </a><ul>
                        <li><a href="#usage">10.1.1.&nbsp;Usage</a></li>
                        <li><a href="#related-commands">10.1.2.&nbsp;Related Commands</a><ul>
                              <li><a href="#info-autosteps">10.1.2.1.&nbsp;info autosteps</a></li>
                              <li><a href="#disable-autosteps-n">10.1.2.2.&nbsp;disable autosteps n</a></li>
                              <li><a href="#delete-autosteps-n">10.1.2.3.&nbsp;delete autosteps n</a></li>
                              <li><a href="#ignore-n-i">10.1.2.4.&nbsp;ignore n i</a></li>
                           </ul>
                        </li>
                     </ul>
                  </li>
                  <li><a href="#gpu-error-reporting">10.2.&nbsp;GPU Error Reporting</a></li>
               </ul>
            </li>
            <li><a href="#checking-api-errors">11.&nbsp;Checking API Errors</a></li>
            <li><a href="#walk-through-examples">12.&nbsp;Walk-Through Examples</a><ul>
                  <li><a href="#example-1-bitreverse">12.1.&nbsp;Example 1: bitreverse</a><ul>
                        <li><a href="#walking-through-code">12.1.1.&nbsp;Walking through the Code</a></li>
                     </ul>
                  </li>
                  <li><a href="#example-2-autostep">12.2.&nbsp;Example 2: autostep</a><ul>
                        <li><a href="#debugging-with-autosteps">12.2.1.&nbsp;Debugging with Autosteps</a></li>
                     </ul>
                  </li>
                  <li><a href="#debugging-with-mpi">12.3.&nbsp;Example 3: Debugging an MPI CUDA Application</a></li>
               </ul>
            </li>
            <li><a href="#advanced-settings">13.&nbsp;Advanced Settings</a><ul>
                  <li><a href="#set-cuda-notify">13.1.&nbsp;set cuda notify</a></li>
                  <li><a href="#lock-file">13.2.&nbsp;Lock File</a></li>
               </ul>
            </li>
            <li><a href="#supported-platforms">A.&nbsp;Supported Platforms</a><ul>
                  <li><a href="#host-platform-requirements">A.1.&nbsp;Host Platform Requirements</a></li>
               </ul>
            </li>
            <li><a href="#known-issues">B.&nbsp;Known Issues</a></li>
         </ul>
      </nav>
      <nav id="search-results">
         <h2>Search Results</h2>
         <ol></ol>
      </nav>
      <script language="JavaScript" type="text/javascript" charset="utf-8" src="../common/formatting/common.min.js"></script>
      <script language="JavaScript" type="text/javascript" charset="utf-8" src="../common/scripts/omniture/s_code_us_dev_aut1-nolinktrackin.js"></script>
      <script language="JavaScript" type="text/javascript" charset="utf-8" src="../common/scripts/omniture/omniture.js"></script>
      <noscript><a href="http://www.omniture.com" title="Web Analytics"><img src="http://omniture.nvidia.com/b/ss/nvidiacudadocs/1/H.17--NS/0" height="1" width="1" border="0" alt=""></img></a></noscript>
      <script language="JavaScript" type="text/javascript" charset="utf-8" src="../common/scripts/google-analytics/google-analytics-write.js"></script>
      <script language="JavaScript" type="text/javascript" charset="utf-8" src="../common/scripts/google-analytics/google-analytics-tracker.js"></script>
      </body>
</html>