.TH "Occupancy" 3 "24 Apr 2019" "Version 6.0" "Doxygen" \" -*- nroff -*-
.ad l
.nh
.SH NAME
Occupancy \- Occupancy calculation functions of the low-level CUDA driver API
.SS "Functions"

.in +1c
.ti -1c
.RI "\fBCUresult\fP \fBcuOccupancyMaxActiveBlocksPerMultiprocessor\fP (int *numBlocks, \fBCUfunction\fP func, int blockSize, size_t dynamicSMemSize)"
.br
.RI "\fIReturns occupancy of a function. \fP"
.ti -1c
.RI "\fBCUresult\fP \fBcuOccupancyMaxActiveBlocksPerMultiprocessorWithFlags\fP (int *numBlocks, \fBCUfunction\fP func, int blockSize, size_t dynamicSMemSize, unsigned int flags)"
.br
.RI "\fIReturns occupancy of a function. \fP"
.ti -1c
.RI "\fBCUresult\fP \fBcuOccupancyMaxPotentialBlockSize\fP (int *minGridSize, int *blockSize, \fBCUfunction\fP func, \fBCUoccupancyB2DSize\fP blockSizeToDynamicSMemSize, size_t dynamicSMemSize, int blockSizeLimit)"
.br
.RI "\fISuggest a launch configuration with reasonable occupancy. \fP"
.ti -1c
.RI "\fBCUresult\fP \fBcuOccupancyMaxPotentialBlockSizeWithFlags\fP (int *minGridSize, int *blockSize, \fBCUfunction\fP func, \fBCUoccupancyB2DSize\fP blockSizeToDynamicSMemSize, size_t dynamicSMemSize, int blockSizeLimit, unsigned int flags)"
.br
.RI "\fISuggest a launch configuration with reasonable occupancy. \fP"
.in -1c
.SH "Detailed Description"
.PP 
Occupancy calculation functions of the low-level CUDA driver API (\fBcuda.h\fP).
.PP
This section describes the occupancy calculation functions of the low-level CUDA driver application programming interface. 
.SH "Function Documentation"
.PP 
.SS "\fBCUresult\fP cuOccupancyMaxActiveBlocksPerMultiprocessor (int * numBlocks, \fBCUfunction\fP func, int blockSize, size_t dynamicSMemSize)"
.PP
Returns in \fC*numBlocks\fP the maximum number of active blocks per streaming multiprocessor.
.PP
\fBParameters:\fP
.RS 4
\fInumBlocks\fP - Returned occupancy 
.br
\fIfunc\fP - Kernel for which occupancy is calculated 
.br
\fIblockSize\fP - Block size the kernel is intended to be launched with 
.br
\fIdynamicSMemSize\fP - Per-block dynamic shared memory usage intended, in bytes
.RE
.PP
\fBReturns:\fP
.RS 4
\fBCUDA_SUCCESS\fP, \fBCUDA_ERROR_DEINITIALIZED\fP, \fBCUDA_ERROR_NOT_INITIALIZED\fP, \fBCUDA_ERROR_INVALID_CONTEXT\fP, \fBCUDA_ERROR_INVALID_VALUE\fP, \fBCUDA_ERROR_UNKNOWN\fP 
.RE
.PP
\fBNote:\fP
.RS 4
This function may also return error codes from previous, asynchronous launches.
.RE
.PP
\fBSee also:\fP
.RS 4
cudaOccupancyMaxActiveBlocksPerMultiprocessor 
.RE
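.PP
The value returned in \fC*numBlocks\fP is a block count, not a ratio. A minimal sketch of turning it into an occupancy fraction follows. The helper \fCoccupancy_fraction\fP and its parameter \fCmaxThreadsPerSM\fP are hypothetical names, not part of the driver API; the sketch assumes that \fCblockSize\fP is a multiple of the warp size and that the per-multiprocessor thread limit has been queried separately, for example with \fBcuDeviceGetAttribute\fP and \fBCU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_MULTIPROCESSOR\fP.
.PP
.nf
    /* Hypothetical helper, not part of the driver API.
       numBlocks:       value returned in *numBlocks
       blockSize:       block size passed to the occupancy query
       maxThreadsPerSM: device limit on resident threads per multiprocessor */
    double occupancy_fraction(int numBlocks, int blockSize, int maxThreadsPerSM)
    {
        if (maxThreadsPerSM <= 0)
            return 0.0;  /* avoid dividing by a missing or invalid limit */
        return (double)numBlocks * (double)blockSize / (double)maxThreadsPerSM;
    }
.fi
.PP
Under these assumptions, 8 active blocks of 256 threads on a multiprocessor that can hold 2048 resident threads would correspond to full occupancy.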
.PP

.SS "\fBCUresult\fP cuOccupancyMaxActiveBlocksPerMultiprocessorWithFlags (int * numBlocks, \fBCUfunction\fP func, int blockSize, size_t dynamicSMemSize, unsigned int flags)"
.PP
Returns in \fC*numBlocks\fP the maximum number of active blocks per streaming multiprocessor.
.PP
The \fCflags\fP parameter controls how special cases are handled. The valid flags are:
.PP
.IP "\(bu" 2
\fBCU_OCCUPANCY_DEFAULT\fP, which keeps the same default behavior as \fBcuOccupancyMaxActiveBlocksPerMultiprocessor\fP;
.PP
.PP
.IP "\(bu" 2
\fBCU_OCCUPANCY_DISABLE_CACHING_OVERRIDE\fP, which suppresses the default behavior on platforms where global caching affects occupancy. On such platforms, if caching is enabled but per-block SM resource usage would result in zero occupancy, the occupancy calculator by default calculates the occupancy as if caching were disabled. Setting \fBCU_OCCUPANCY_DISABLE_CACHING_OVERRIDE\fP makes the occupancy calculator return 0 in such cases. More information about this feature can be found in the 'Unified L1/Texture Cache' section of the Maxwell tuning guide.
.PP
.PP
\fBParameters:\fP
.RS 4
\fInumBlocks\fP - Returned occupancy 
.br
\fIfunc\fP - Kernel for which occupancy is calculated 
.br
\fIblockSize\fP - Block size the kernel is intended to be launched with 
.br
\fIdynamicSMemSize\fP - Per-block dynamic shared memory usage intended, in bytes 
.br
\fIflags\fP - Requested behavior for the occupancy calculator
.RE
.PP
\fBReturns:\fP
.RS 4
\fBCUDA_SUCCESS\fP, \fBCUDA_ERROR_DEINITIALIZED\fP, \fBCUDA_ERROR_NOT_INITIALIZED\fP, \fBCUDA_ERROR_INVALID_CONTEXT\fP, \fBCUDA_ERROR_INVALID_VALUE\fP, \fBCUDA_ERROR_UNKNOWN\fP 
.RE
.PP
\fBNote:\fP
.RS 4
This function may also return error codes from previous, asynchronous launches.
.RE
.PP
\fBSee also:\fP
.RS 4
cudaOccupancyMaxActiveBlocksPerMultiprocessorWithFlags 
.RE
.PP

.SS "\fBCUresult\fP cuOccupancyMaxPotentialBlockSize (int * minGridSize, int * blockSize, \fBCUfunction\fP func, \fBCUoccupancyB2DSize\fP blockSizeToDynamicSMemSize, size_t dynamicSMemSize, int blockSizeLimit)"
.PP
Returns in \fC*blockSize\fP a reasonable block size that can achieve the maximum occupancy (that is, the maximum number of active warps with the fewest blocks per multiprocessor), and in \fC*minGridSize\fP the minimum grid size needed to achieve the maximum occupancy.
.PP
If \fCblockSizeLimit\fP is 0, the configurator will use the maximum block size permitted by the device / function instead.
.PP
If per-block dynamic shared memory allocation is not needed, the user should leave both \fCblockSizeToDynamicSMemSize\fP and \fCdynamicSMemSize\fP as 0.
.PP
If per-block dynamic shared memory allocation is needed, then if the dynamic shared memory size is constant regardless of block size, the size should be passed through \fCdynamicSMemSize\fP, and \fCblockSizeToDynamicSMemSize\fP should be NULL.
.PP
Otherwise, if the per-block dynamic shared memory size varies with the block size, the user needs to provide a unary function through \fCblockSizeToDynamicSMemSize\fP that computes the dynamic shared memory needed by \fCfunc\fP for any given block size. In that case, \fCdynamicSMemSize\fP is ignored. An example signature is:
.PP
.PP
.nf
    // Takes the block size, returns the dynamic shared memory needed in bytes
    size_t blockToSmem(int blockSize);
.fi
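.PP
As a minimal sketch of a function with that signature, assume a hypothetical kernel that needs one \fCfloat\fP of dynamic shared memory per thread in the block. The per-thread requirement is an illustration only; a real callback must mirror what \fCfunc\fP actually allocates.
.PP
.nf
    #include <stddef.h>

    /* Hypothetical callback: assumes the kernel uses one float of
       dynamic shared memory per thread in the block. */
    size_t blockToSmem(int blockSize)
    {
        return (size_t)blockSize * sizeof(float);
    }
.fi
.PP
Such a function would be passed as \fCblockSizeToDynamicSMemSize\fP. On platforms where \fBcuda.h\fP defines a calling convention for driver callbacks, the declaration may also need that annotation to match \fBCUoccupancyB2DSize\fP.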
.PP
.PP
\fBParameters:\fP
.RS 4
\fIminGridSize\fP - Returned minimum grid size needed to achieve the maximum occupancy 
.br
\fIblockSize\fP - Returned maximum block size that can achieve the maximum occupancy 
.br
\fIfunc\fP - Kernel for which launch configuration is calculated 
.br
\fIblockSizeToDynamicSMemSize\fP - A function that calculates how much per-block dynamic shared memory \fCfunc\fP uses based on the block size 
.br
\fIdynamicSMemSize\fP - Dynamic shared memory usage intended, in bytes 
.br
\fIblockSizeLimit\fP - The maximum block size \fCfunc\fP is designed to handle
.RE
.PP
\fBReturns:\fP
.RS 4
\fBCUDA_SUCCESS\fP, \fBCUDA_ERROR_DEINITIALIZED\fP, \fBCUDA_ERROR_NOT_INITIALIZED\fP, \fBCUDA_ERROR_INVALID_CONTEXT\fP, \fBCUDA_ERROR_INVALID_VALUE\fP, \fBCUDA_ERROR_UNKNOWN\fP 
.RE
.PP
\fBNote:\fP
.RS 4
This function may also return error codes from previous, asynchronous launches.
.RE
.PP
\fBSee also:\fP
.RS 4
cudaOccupancyMaxPotentialBlockSize 
.RE
.PP

.SS "\fBCUresult\fP cuOccupancyMaxPotentialBlockSizeWithFlags (int * minGridSize, int * blockSize, \fBCUfunction\fP func, \fBCUoccupancyB2DSize\fP blockSizeToDynamicSMemSize, size_t dynamicSMemSize, int blockSizeLimit, unsigned int flags)"
.PP
An extended version of \fBcuOccupancyMaxPotentialBlockSize\fP. In addition to the arguments passed to \fBcuOccupancyMaxPotentialBlockSize\fP, \fBcuOccupancyMaxPotentialBlockSizeWithFlags\fP also takes a \fCflags\fP parameter.
.PP
The \fCflags\fP parameter controls how special cases are handled. The valid flags are:
.PP
.IP "\(bu" 2
\fBCU_OCCUPANCY_DEFAULT\fP, which keeps the same default behavior as \fBcuOccupancyMaxPotentialBlockSize\fP;
.PP
.PP
.IP "\(bu" 2
\fBCU_OCCUPANCY_DISABLE_CACHING_OVERRIDE\fP, which suppresses the default behavior on platforms where global caching affects occupancy. On such platforms, the launch configuration that produces maximal occupancy might not support global caching. Setting \fBCU_OCCUPANCY_DISABLE_CACHING_OVERRIDE\fP guarantees that the produced launch configuration is compatible with global caching, at a potential cost in occupancy. More information about this feature can be found in the 'Unified L1/Texture Cache' section of the Maxwell tuning guide.
.PP
.PP
\fBParameters:\fP
.RS 4
\fIminGridSize\fP - Returned minimum grid size needed to achieve the maximum occupancy 
.br
\fIblockSize\fP - Returned maximum block size that can achieve the maximum occupancy 
.br
\fIfunc\fP - Kernel for which launch configuration is calculated 
.br
\fIblockSizeToDynamicSMemSize\fP - A function that calculates how much per-block dynamic shared memory \fCfunc\fP uses based on the block size 
.br
\fIdynamicSMemSize\fP - Dynamic shared memory usage intended, in bytes 
.br
\fIblockSizeLimit\fP - The maximum block size \fCfunc\fP is designed to handle 
.br
\fIflags\fP - Requested behavior for the occupancy calculator
.RE
.PP
\fBReturns:\fP
.RS 4
\fBCUDA_SUCCESS\fP, \fBCUDA_ERROR_DEINITIALIZED\fP, \fBCUDA_ERROR_NOT_INITIALIZED\fP, \fBCUDA_ERROR_INVALID_CONTEXT\fP, \fBCUDA_ERROR_INVALID_VALUE\fP, \fBCUDA_ERROR_UNKNOWN\fP 
.RE
.PP
\fBNote:\fP
.RS 4
This function may also return error codes from previous, asynchronous launches.
.RE
.PP
\fBSee also:\fP
.RS 4
cudaOccupancyMaxPotentialBlockSizeWithFlags 
.RE
.PP

.SH "Author"
.PP 
Generated automatically by Doxygen from the source code.