.TH "Context Management" 3 "7 Aug 2014" "Version 6.0" "Doxygen" \" -*- nroff -*-
.ad l
.nh
.SH NAME
Context Management \- Context management functions of the low-level CUDA driver API (\fBcuda.h\fP)
.SS "Functions"

.in +1c
.ti -1c
.RI "\fBCUresult\fP \fBcuCtxCreate\fP (\fBCUcontext\fP *pctx, unsigned int flags, \fBCUdevice\fP dev)"
.br
.RI "\fICreate a CUDA context. \fP"
.ti -1c
.RI "\fBCUresult\fP \fBcuCtxDestroy\fP (\fBCUcontext\fP ctx)"
.br
.RI "\fIDestroy a CUDA context. \fP"
.ti -1c
.RI "\fBCUresult\fP \fBcuCtxGetApiVersion\fP (\fBCUcontext\fP ctx, unsigned int *version)"
.br
.RI "\fIGets the context's API version. \fP"
.ti -1c
.RI "\fBCUresult\fP \fBcuCtxGetCacheConfig\fP (\fBCUfunc_cache\fP *pconfig)"
.br
.RI "\fIReturns the preferred cache configuration for the current context. \fP"
.ti -1c
.RI "\fBCUresult\fP \fBcuCtxGetCurrent\fP (\fBCUcontext\fP *pctx)"
.br
.RI "\fIReturns the CUDA context bound to the calling CPU thread. \fP"
.ti -1c
.RI "\fBCUresult\fP \fBcuCtxGetDevice\fP (\fBCUdevice\fP *device)"
.br
.RI "\fIReturns the device ID for the current context. \fP"
.ti -1c
.RI "\fBCUresult\fP \fBcuCtxGetLimit\fP (size_t *pvalue, \fBCUlimit\fP limit)"
.br
.RI "\fIReturns resource limits. \fP"
.ti -1c
.RI "\fBCUresult\fP \fBcuCtxGetSharedMemConfig\fP (\fBCUsharedconfig\fP *pConfig)"
.br
.RI "\fIReturns the current shared memory configuration for the current context. \fP"
.ti -1c
.RI "\fBCUresult\fP \fBcuCtxGetStreamPriorityRange\fP (int *leastPriority, int *greatestPriority)"
.br
.RI "\fIReturns numerical values that correspond to the least and greatest stream priorities. \fP"
.ti -1c
.RI "\fBCUresult\fP \fBcuCtxPopCurrent\fP (\fBCUcontext\fP *pctx)"
.br
.RI "\fIPops the current CUDA context from the current CPU thread. \fP"
.ti -1c
.RI "\fBCUresult\fP \fBcuCtxPushCurrent\fP (\fBCUcontext\fP ctx)"
.br
.RI "\fIPushes a context on the current CPU thread. \fP"
.ti -1c
.RI "\fBCUresult\fP \fBcuCtxSetCacheConfig\fP (\fBCUfunc_cache\fP config)"
.br
.RI "\fISets the preferred cache configuration for the current context. \fP"
.ti -1c
.RI "\fBCUresult\fP \fBcuCtxSetCurrent\fP (\fBCUcontext\fP ctx)"
.br
.RI "\fIBinds the specified CUDA context to the calling CPU thread. \fP"
.ti -1c
.RI "\fBCUresult\fP \fBcuCtxSetLimit\fP (\fBCUlimit\fP limit, size_t value)"
.br
.RI "\fISet resource limits. \fP"
.ti -1c
.RI "\fBCUresult\fP \fBcuCtxSetSharedMemConfig\fP (\fBCUsharedconfig\fP config)"
.br
.RI "\fISets the shared memory configuration for the current context. \fP"
.ti -1c
.RI "\fBCUresult\fP \fBcuCtxSynchronize\fP (void)"
.br
.RI "\fIBlock for a context's tasks to complete. \fP"
.in -1c
.SH "Detailed Description"
.PP 
Context management functions of the low-level CUDA driver API (\fBcuda.h\fP)
.PP
This section describes the context management functions of the low-level CUDA driver application programming interface. 
.SH "Function Documentation"
.PP 
.SS "\fBCUresult\fP cuCtxCreate (\fBCUcontext\fP * pctx, unsigned int flags, \fBCUdevice\fP dev)"
.PP
Creates a new CUDA context and associates it with the calling thread. The \fCflags\fP parameter is described below. The context is created with a usage count of 1, and the caller of \fBcuCtxCreate()\fP must call \fBcuCtxDestroy()\fP when done using the context. If a context is already current to the thread, it is supplanted by the newly created context and may be restored by a subsequent call to \fBcuCtxPopCurrent()\fP.
.PP
The three LSBs of the \fCflags\fP parameter can be used to control how the OS thread, which owns the CUDA context at the time of an API call, interacts with the OS scheduler when waiting for results from the GPU. Only one of the scheduling flags can be set when creating a context.
.PP
.IP "\(bu" 2
\fBCU_CTX_SCHED_AUTO\fP: The default value if the \fCflags\fP parameter is zero. CUDA uses a heuristic based on the number of active CUDA contexts in the process \fIC\fP and the number of logical processors in the system \fIP\fP: if \fIC\fP > \fIP\fP, CUDA will yield to other OS threads when waiting for the GPU; otherwise, CUDA will not yield while waiting for results and will actively spin on the processor.
.PP
.PP
.IP "\(bu" 2
\fBCU_CTX_SCHED_SPIN\fP: Instruct CUDA to actively spin when waiting for results from the GPU. This can decrease latency when waiting for the GPU, but may lower the performance of CPU threads if they are performing work in parallel with the CUDA thread.
.PP
.PP
.IP "\(bu" 2
\fBCU_CTX_SCHED_YIELD\fP: Instruct CUDA to yield its thread when waiting for results from the GPU. This can increase latency when waiting for the GPU, but can increase the performance of CPU threads performing work in parallel with the GPU.
.PP
.PP
.IP "\(bu" 2
\fBCU_CTX_SCHED_BLOCKING_SYNC\fP: Instruct CUDA to block the CPU thread on a synchronization primitive when waiting for the GPU to finish work.
.PP
.PP
.IP "\(bu" 2
\fBCU_CTX_BLOCKING_SYNC\fP: Instruct CUDA to block the CPU thread on a synchronization primitive when waiting for the GPU to finish work. 
.br
 \fBDeprecated:\fP This flag was deprecated as of CUDA 4.0 and was replaced with \fBCU_CTX_SCHED_BLOCKING_SYNC\fP.
.PP
.PP
.IP "\(bu" 2
\fBCU_CTX_MAP_HOST\fP: Instruct CUDA to support mapped pinned allocations. This flag must be set in order to allocate pinned host memory that is accessible to the GPU.
.PP
.PP
.IP "\(bu" 2
\fBCU_CTX_LMEM_RESIZE_TO_MAX\fP: Instruct CUDA to not reduce local memory after resizing local memory for a kernel. This can prevent thrashing by local memory allocations when launching many kernels with high local memory usage at the cost of potentially increased memory usage.
.PP
.PP
Context creation will fail with \fBCUDA_ERROR_UNKNOWN\fP if the compute mode of the device is \fBCU_COMPUTEMODE_PROHIBITED\fP. Similarly, context creation will also fail with \fBCUDA_ERROR_UNKNOWN\fP if the compute mode for the device is set to \fBCU_COMPUTEMODE_EXCLUSIVE\fP and there is already an active context on the device. The function \fBcuDeviceGetAttribute()\fP can be used with \fBCU_DEVICE_ATTRIBUTE_COMPUTE_MODE\fP to determine the compute mode of the device. The \fInvidia-smi\fP tool can be used to set the compute mode for devices. Documentation for \fInvidia-smi\fP can be obtained by passing a -h option to it.
.PP
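\fBExample:\fP
.RS 4
The sequence below is an illustrative sketch, not part of the upstream reference: it creates a context on device 0 with the \fBCU_CTX_SCHED_BLOCKING_SYNC\fP and \fBCU_CTX_MAP_HOST\fP flags and destroys it when done. Error handling is reduced to early returns.
.RE
.PP
.nf
#include <cuda.h>

int main(void)
{
    CUdevice dev;
    CUcontext ctx;

    /* The driver API must be initialized before any other call. */
    if (cuInit(0) != CUDA_SUCCESS) return 1;
    if (cuDeviceGet(&dev, 0) != CUDA_SUCCESS) return 1;

    /* Block the CPU thread on a synchronization primitive while waiting
       for the GPU, and allow mapped pinned host allocations. */
    if (cuCtxCreate(&ctx, CU_CTX_SCHED_BLOCKING_SYNC | CU_CTX_MAP_HOST, dev)
            != CUDA_SUCCESS)
        return 1;

    /* ... module loads, memory allocations, kernel launches ... */

    cuCtxDestroy(ctx);   /* the creator owns the context */
    return 0;
}
.fi
.PP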
\fBParameters:\fP
.RS 4
\fIpctx\fP - Returned context handle of the new context 
.br
\fIflags\fP - Context creation flags 
.br
\fIdev\fP - Device to create context on
.RE
.PP
\fBReturns:\fP
.RS 4
\fBCUDA_SUCCESS\fP, \fBCUDA_ERROR_DEINITIALIZED\fP, \fBCUDA_ERROR_NOT_INITIALIZED\fP, \fBCUDA_ERROR_INVALID_CONTEXT\fP, \fBCUDA_ERROR_INVALID_DEVICE\fP, \fBCUDA_ERROR_INVALID_VALUE\fP, \fBCUDA_ERROR_OUT_OF_MEMORY\fP, \fBCUDA_ERROR_UNKNOWN\fP 
.RE
.PP
\fBNote:\fP
.RS 4
Note that this function may also return error codes from previous, asynchronous launches.
.RE
.PP
\fBSee also:\fP
.RS 4
\fBcuCtxDestroy\fP, \fBcuCtxGetApiVersion\fP, \fBcuCtxGetCacheConfig\fP, \fBcuCtxGetDevice\fP, \fBcuCtxGetLimit\fP, \fBcuCtxPopCurrent\fP, \fBcuCtxPushCurrent\fP, \fBcuCtxSetCacheConfig\fP, \fBcuCtxSetLimit\fP, \fBcuCtxSynchronize\fP 
.RE
.PP

.SS "\fBCUresult\fP cuCtxDestroy (\fBCUcontext\fP ctx)"
.PP
Destroys the CUDA context specified by \fCctx\fP. The context \fCctx\fP will be destroyed regardless of how many threads it is current to. It is the responsibility of the calling function to ensure that no API calls are issued using \fCctx\fP while \fBcuCtxDestroy()\fP is executing.
.PP
If \fCctx\fP is current to the calling thread then \fCctx\fP will also be popped from the current thread's context stack (as though \fBcuCtxPopCurrent()\fP were called). If \fCctx\fP is current to other threads, then \fCctx\fP will remain current to those threads, and attempting to access \fCctx\fP from those threads will result in the error \fBCUDA_ERROR_CONTEXT_IS_DESTROYED\fP.
.PP
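\fBExample:\fP
.RS 4
A minimal teardown sketch (not taken from the upstream reference); the helper name \fCdestroy_current_context\fP is illustrative, and the caller is assumed to guarantee that no other thread still issues calls using the context.
.RE
.PP
.nf
#include <stddef.h>
#include <cuda.h>

static CUresult destroy_current_context(void)
{
    CUcontext ctx = NULL;
    CUresult rc = cuCtxGetCurrent(&ctx);
    if (rc != CUDA_SUCCESS || ctx == NULL)
        return rc;                 /* nothing bound to this thread */
    (void)cuCtxSynchronize();      /* drain outstanding work first */
    return cuCtxDestroy(ctx);      /* also pops ctx from this thread */
}
.fi
.PP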
\fBParameters:\fP
.RS 4
\fIctx\fP - Context to destroy
.RE
.PP
\fBReturns:\fP
.RS 4
\fBCUDA_SUCCESS\fP, \fBCUDA_ERROR_DEINITIALIZED\fP, \fBCUDA_ERROR_NOT_INITIALIZED\fP, \fBCUDA_ERROR_INVALID_CONTEXT\fP, \fBCUDA_ERROR_INVALID_VALUE\fP 
.RE
.PP
\fBNote:\fP
.RS 4
Note that this function may also return error codes from previous, asynchronous launches.
.RE
.PP
\fBSee also:\fP
.RS 4
\fBcuCtxCreate\fP, \fBcuCtxGetApiVersion\fP, \fBcuCtxGetCacheConfig\fP, \fBcuCtxGetDevice\fP, \fBcuCtxGetLimit\fP, \fBcuCtxPopCurrent\fP, \fBcuCtxPushCurrent\fP, \fBcuCtxSetCacheConfig\fP, \fBcuCtxSetLimit\fP, \fBcuCtxSynchronize\fP 
.RE
.PP

.SS "\fBCUresult\fP cuCtxGetApiVersion (\fBCUcontext\fP ctx, unsigned int * version)"
.PP
Returns a version number in \fCversion\fP corresponding to the capabilities of the context (e.g. 3010 or 3020), which library developers can use to direct callers to a specific API version. If \fCctx\fP is NULL, returns the API version used to create the currently bound context.
.PP
Note that new API versions are only introduced when context capabilities are changed that break binary compatibility, so the API version and driver version may be different. For example, it is valid for the API version to be 3020 while the driver version is 4020.
.PP
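\fBExample:\fP
.RS 4
An illustrative sketch (not part of the upstream reference) that queries the API version of the context bound to the calling thread; the helper name is an assumption.
.RE
.PP
.nf
#include <stddef.h>
#include <cuda.h>

static unsigned int current_ctx_api_version(void)
{
    unsigned int ver = 0;
    /* Passing NULL requests the version of the currently bound context. */
    if (cuCtxGetApiVersion(NULL, &ver) != CUDA_SUCCESS)
        return 0;                  /* e.g. no context is bound */
    return ver;                    /* e.g. 3010, 3020, ... */
}
.fi
.PP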
\fBParameters:\fP
.RS 4
\fIctx\fP - Context to check 
.br
\fIversion\fP - Pointer to version
.RE
.PP
\fBReturns:\fP
.RS 4
\fBCUDA_SUCCESS\fP, \fBCUDA_ERROR_DEINITIALIZED\fP, \fBCUDA_ERROR_NOT_INITIALIZED\fP, \fBCUDA_ERROR_INVALID_CONTEXT\fP, \fBCUDA_ERROR_UNKNOWN\fP 
.RE
.PP
\fBNote:\fP
.RS 4
Note that this function may also return error codes from previous, asynchronous launches.
.RE
.PP
\fBSee also:\fP
.RS 4
\fBcuCtxCreate\fP, \fBcuCtxDestroy\fP, \fBcuCtxGetDevice\fP, \fBcuCtxGetLimit\fP, \fBcuCtxPopCurrent\fP, \fBcuCtxPushCurrent\fP, \fBcuCtxSetCacheConfig\fP, \fBcuCtxSetLimit\fP, \fBcuCtxSynchronize\fP 
.RE
.PP

.SS "\fBCUresult\fP cuCtxGetCacheConfig (\fBCUfunc_cache\fP * pconfig)"
.PP
On devices where the L1 cache and shared memory use the same hardware resources, this function returns through \fCpconfig\fP the preferred cache configuration for the current context. This is only a preference. The driver will use the requested configuration if possible, but it is free to choose a different configuration if required to execute functions.
.PP
This will return a \fCpconfig\fP of \fBCU_FUNC_CACHE_PREFER_NONE\fP on devices where the sizes of the L1 cache and shared memory are fixed.
.PP
The supported cache configurations are:
.IP "\(bu" 2
\fBCU_FUNC_CACHE_PREFER_NONE\fP: no preference for shared memory or L1 (default)
.IP "\(bu" 2
\fBCU_FUNC_CACHE_PREFER_SHARED\fP: prefer larger shared memory and smaller L1 cache
.IP "\(bu" 2
\fBCU_FUNC_CACHE_PREFER_L1\fP: prefer larger L1 cache and smaller shared memory
.IP "\(bu" 2
\fBCU_FUNC_CACHE_PREFER_EQUAL\fP: prefer equal sized L1 cache and shared memory
.PP
.PP
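\fBExample:\fP
.RS 4
An illustrative sketch (not from the upstream reference) that maps the queried preference to a human-readable string; the helper name and strings are assumptions.
.RE
.PP
.nf
#include <cuda.h>

static const char *cache_preference_name(void)
{
    CUfunc_cache cfg;
    if (cuCtxGetCacheConfig(&cfg) != CUDA_SUCCESS)
        return "unknown";
    switch (cfg) {
    case CU_FUNC_CACHE_PREFER_SHARED: return "prefer shared memory";
    case CU_FUNC_CACHE_PREFER_L1:     return "prefer L1 cache";
    case CU_FUNC_CACHE_PREFER_EQUAL:  return "prefer equal split";
    default:                          return "no preference";
    }
}
.fi
.PP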
\fBParameters:\fP
.RS 4
\fIpconfig\fP - Returned cache configuration
.RE
.PP
\fBReturns:\fP
.RS 4
\fBCUDA_SUCCESS\fP, \fBCUDA_ERROR_DEINITIALIZED\fP, \fBCUDA_ERROR_NOT_INITIALIZED\fP, \fBCUDA_ERROR_INVALID_CONTEXT\fP, \fBCUDA_ERROR_INVALID_VALUE\fP 
.RE
.PP
\fBNote:\fP
.RS 4
Note that this function may also return error codes from previous, asynchronous launches.
.RE
.PP
\fBSee also:\fP
.RS 4
\fBcuCtxCreate\fP, \fBcuCtxDestroy\fP, \fBcuCtxGetApiVersion\fP, \fBcuCtxGetDevice\fP, \fBcuCtxGetLimit\fP, \fBcuCtxPopCurrent\fP, \fBcuCtxPushCurrent\fP, \fBcuCtxSetCacheConfig\fP, \fBcuCtxSetLimit\fP, \fBcuCtxSynchronize\fP, \fBcuFuncSetCacheConfig\fP 
.RE
.PP

.SS "\fBCUresult\fP cuCtxGetCurrent (\fBCUcontext\fP * pctx)"
.PP
Returns in \fC*pctx\fP the CUDA context bound to the calling CPU thread. If no context is bound to the calling CPU thread then \fC*pctx\fP is set to NULL and \fBCUDA_SUCCESS\fP is returned.
.PP
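\fBExample:\fP
.RS 4
A minimal sketch (not part of the upstream reference) that checks whether the calling thread has a bound context; the helper name is illustrative.
.RE
.PP
.nf
#include <stddef.h>
#include <cuda.h>

/* Returns 1 if a CUDA context is bound to the calling thread, 0 otherwise. */
static int thread_has_context(void)
{
    CUcontext ctx = NULL;
    return cuCtxGetCurrent(&ctx) == CUDA_SUCCESS && ctx != NULL;
}
.fi
.PP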
\fBParameters:\fP
.RS 4
\fIpctx\fP - Returned context handle
.RE
.PP
\fBReturns:\fP
.RS 4
\fBCUDA_SUCCESS\fP, \fBCUDA_ERROR_DEINITIALIZED\fP, \fBCUDA_ERROR_NOT_INITIALIZED\fP 
.RE
.PP
\fBNote:\fP
.RS 4
Note that this function may also return error codes from previous, asynchronous launches.
.RE
.PP
\fBSee also:\fP
.RS 4
\fBcuCtxSetCurrent\fP, \fBcuCtxCreate\fP, \fBcuCtxDestroy\fP 
.RE
.PP

.SS "\fBCUresult\fP cuCtxGetDevice (\fBCUdevice\fP * device)"
.PP
Returns in \fC*device\fP the ordinal of the current context's device.
.PP
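\fBExample:\fP
.RS 4
An illustrative sketch (not from the upstream reference) that resolves the current context's device to its name via \fBcuDeviceGetName\fP; the helper name is an assumption.
.RE
.PP
.nf
#include <cuda.h>

static CUresult current_device_name(char *name, int len)
{
    CUdevice dev;
    CUresult rc = cuCtxGetDevice(&dev);
    if (rc != CUDA_SUCCESS)
        return rc;
    return cuDeviceGetName(name, len, dev);
}
.fi
.PP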
\fBParameters:\fP
.RS 4
\fIdevice\fP - Returned device ID for the current context
.RE
.PP
\fBReturns:\fP
.RS 4
\fBCUDA_SUCCESS\fP, \fBCUDA_ERROR_DEINITIALIZED\fP, \fBCUDA_ERROR_NOT_INITIALIZED\fP, \fBCUDA_ERROR_INVALID_CONTEXT\fP, \fBCUDA_ERROR_INVALID_VALUE\fP 
.RE
.PP
\fBNote:\fP
.RS 4
Note that this function may also return error codes from previous, asynchronous launches.
.RE
.PP
\fBSee also:\fP
.RS 4
\fBcuCtxCreate\fP, \fBcuCtxDestroy\fP, \fBcuCtxGetApiVersion\fP, \fBcuCtxGetCacheConfig\fP, \fBcuCtxGetLimit\fP, \fBcuCtxPopCurrent\fP, \fBcuCtxPushCurrent\fP, \fBcuCtxSetCacheConfig\fP, \fBcuCtxSetLimit\fP, \fBcuCtxSynchronize\fP 
.RE
.PP

.SS "\fBCUresult\fP cuCtxGetLimit (size_t * pvalue, \fBCUlimit\fP limit)"
.PP
Returns in \fC*pvalue\fP the current size of \fClimit\fP. The supported \fBCUlimit\fP values are:
.IP "\(bu" 2
\fBCU_LIMIT_STACK_SIZE\fP: stack size in bytes of each GPU thread.
.IP "\(bu" 2
\fBCU_LIMIT_PRINTF_FIFO_SIZE\fP: size in bytes of the FIFO used by the printf() device system call.
.IP "\(bu" 2
\fBCU_LIMIT_MALLOC_HEAP_SIZE\fP: size in bytes of the heap used by the malloc() and free() device system calls.
.IP "\(bu" 2
\fBCU_LIMIT_DEV_RUNTIME_SYNC_DEPTH\fP: maximum grid depth at which a thread can issue the device runtime call cudaDeviceSynchronize() to wait on child grid launches to complete.
.IP "\(bu" 2
\fBCU_LIMIT_DEV_RUNTIME_PENDING_LAUNCH_COUNT\fP: maximum number of outstanding device runtime launches that can be made from this context.
.PP
.PP
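\fBExample:\fP
.RS 4
A minimal sketch (not part of the upstream reference) that reads two limits for the current context; the helper name and parameter names are illustrative.
.RE
.PP
.nf
#include <stddef.h>
#include <cuda.h>

static CUresult query_limits(size_t *stack_bytes, size_t *printf_fifo_bytes)
{
    CUresult rc = cuCtxGetLimit(stack_bytes, CU_LIMIT_STACK_SIZE);
    if (rc != CUDA_SUCCESS)
        return rc;
    return cuCtxGetLimit(printf_fifo_bytes, CU_LIMIT_PRINTF_FIFO_SIZE);
}
.fi
.PP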
\fBParameters:\fP
.RS 4
\fIlimit\fP - Limit to query 
.br
\fIpvalue\fP - Returned size of limit
.RE
.PP
\fBReturns:\fP
.RS 4
\fBCUDA_SUCCESS\fP, \fBCUDA_ERROR_INVALID_VALUE\fP, \fBCUDA_ERROR_UNSUPPORTED_LIMIT\fP 
.RE
.PP
\fBNote:\fP
.RS 4
Note that this function may also return error codes from previous, asynchronous launches.
.RE
.PP
\fBSee also:\fP
.RS 4
\fBcuCtxCreate\fP, \fBcuCtxDestroy\fP, \fBcuCtxGetApiVersion\fP, \fBcuCtxGetCacheConfig\fP, \fBcuCtxGetDevice\fP, \fBcuCtxPopCurrent\fP, \fBcuCtxPushCurrent\fP, \fBcuCtxSetCacheConfig\fP, \fBcuCtxSetLimit\fP, \fBcuCtxSynchronize\fP 
.RE
.PP

.SS "\fBCUresult\fP cuCtxGetSharedMemConfig (\fBCUsharedconfig\fP * pConfig)"
.PP
This function will return in \fCpConfig\fP the current size of shared memory banks in the current context. On devices with configurable shared memory banks, \fBcuCtxSetSharedMemConfig\fP can be used to change this setting, so that all subsequent kernel launches will by default use the new bank size. When \fBcuCtxGetSharedMemConfig\fP is called on devices without configurable shared memory, it will return the fixed bank size of the hardware.
.PP
The returned bank configurations can be either:
.IP "\(bu" 2
\fBCU_SHARED_MEM_CONFIG_FOUR_BYTE_BANK_SIZE\fP: shared memory bank width is four bytes.
.IP "\(bu" 2
\fBCU_SHARED_MEM_CONFIG_EIGHT_BYTE_BANK_SIZE\fP: shared memory bank width is eight bytes.
.PP
.PP
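\fBExample:\fP
.RS 4
An illustrative sketch (not from the upstream reference) that reports the current bank width in bytes; the helper name is an assumption.
.RE
.PP
.nf
#include <cuda.h>

static int shared_mem_bank_width(void)
{
    CUsharedconfig cfg;
    if (cuCtxGetSharedMemConfig(&cfg) != CUDA_SUCCESS)
        return -1;
    return (cfg == CU_SHARED_MEM_CONFIG_EIGHT_BYTE_BANK_SIZE) ? 8 : 4;
}
.fi
.PP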
\fBParameters:\fP
.RS 4
\fIpConfig\fP - returned shared memory configuration 
.RE
.PP
\fBReturns:\fP
.RS 4
\fBCUDA_SUCCESS\fP, \fBCUDA_ERROR_DEINITIALIZED\fP, \fBCUDA_ERROR_NOT_INITIALIZED\fP, \fBCUDA_ERROR_INVALID_CONTEXT\fP, \fBCUDA_ERROR_INVALID_VALUE\fP 
.RE
.PP
\fBNote:\fP
.RS 4
Note that this function may also return error codes from previous, asynchronous launches.
.RE
.PP
\fBSee also:\fP
.RS 4
\fBcuCtxCreate\fP, \fBcuCtxDestroy\fP, \fBcuCtxGetApiVersion\fP, \fBcuCtxGetCacheConfig\fP, \fBcuCtxGetDevice\fP, \fBcuCtxGetLimit\fP, \fBcuCtxPopCurrent\fP, \fBcuCtxPushCurrent\fP, \fBcuCtxSetLimit\fP, \fBcuCtxSynchronize\fP, \fBcuCtxSetSharedMemConfig\fP, \fBcuFuncSetCacheConfig\fP 
.RE
.PP

.SS "\fBCUresult\fP cuCtxGetStreamPriorityRange (int * leastPriority, int * greatestPriority)"
.PP
Returns in \fC*leastPriority\fP and \fC*greatestPriority\fP the numerical values that correspond to the least and greatest stream priorities respectively. Stream priorities follow a convention where lower numbers imply greater priorities. The range of meaningful stream priorities is given by [\fC*greatestPriority\fP, \fC*leastPriority\fP]. If the user attempts to create a stream with a priority value that is outside the meaningful range as specified by this API, the priority is automatically clamped down or up to either \fC*leastPriority\fP or \fC*greatestPriority\fP respectively. See \fBcuStreamCreateWithPriority\fP for details on creating a priority stream. A NULL may be passed in for \fC*leastPriority\fP or \fC*greatestPriority\fP if the value is not desired.
.PP
This function will return '0' in both \fC*leastPriority\fP and \fC*greatestPriority\fP if the current context's device does not support stream priorities (see \fBcuDeviceGetAttribute\fP).
.PP
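\fBExample:\fP
.RS 4
An illustrative sketch (not part of the upstream reference) that creates a stream with the greatest priority the current context supports; the helper name and the \fBCU_STREAM_NON_BLOCKING\fP flag choice are assumptions.
.RE
.PP
.nf
#include <cuda.h>

static CUresult create_high_priority_stream(CUstream *stream)
{
    int least = 0, greatest = 0;
    CUresult rc = cuCtxGetStreamPriorityRange(&least, &greatest);
    if (rc != CUDA_SUCCESS)
        return rc;
    /* Numerically lower values mean higher priority. */
    return cuStreamCreateWithPriority(stream, CU_STREAM_NON_BLOCKING, greatest);
}
.fi
.PP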
\fBParameters:\fP
.RS 4
\fIleastPriority\fP - Pointer to an int in which the numerical value for least stream priority is returned 
.br
\fIgreatestPriority\fP - Pointer to an int in which the numerical value for greatest stream priority is returned
.RE
.PP
\fBReturns:\fP
.RS 4
\fBCUDA_SUCCESS\fP, \fBCUDA_ERROR_INVALID_VALUE\fP 
.RE
.PP
\fBNote:\fP
.RS 4
Note that this function may also return error codes from previous, asynchronous launches.
.RE
.PP
\fBSee also:\fP
.RS 4
\fBcuStreamCreateWithPriority\fP, \fBcuStreamGetPriority\fP, \fBcuCtxGetDevice\fP, \fBcuCtxSetLimit\fP, \fBcuCtxSynchronize\fP 
.RE
.PP

.SS "\fBCUresult\fP cuCtxPopCurrent (\fBCUcontext\fP * pctx)"
.PP
Pops the current CUDA context from the CPU thread and passes back the old context handle in \fC*pctx\fP. That context may then be made current to a different CPU thread by calling \fBcuCtxPushCurrent()\fP.
.PP
If a context was current to the CPU thread before \fBcuCtxCreate()\fP or \fBcuCtxPushCurrent()\fP was called, this function makes that context current to the CPU thread again.
.PP
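\fBExample:\fP
.RS 4
A minimal sketch (not from the upstream reference) that creates a context and immediately pops it, leaving a floating handle that another thread can later adopt with \fBcuCtxPushCurrent()\fP or \fBcuCtxSetCurrent()\fP; the helper name is illustrative.
.RE
.PP
.nf
#include <cuda.h>

static CUresult create_floating_context(CUcontext *out, CUdevice dev)
{
    CUresult rc = cuCtxCreate(out, 0, dev);
    if (rc != CUDA_SUCCESS)
        return rc;
    return cuCtxPopCurrent(out);   /* pops the context just created */
}
.fi
.PP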
\fBParameters:\fP
.RS 4
\fIpctx\fP - Returned new context handle
.RE
.PP
\fBReturns:\fP
.RS 4
\fBCUDA_SUCCESS\fP, \fBCUDA_ERROR_DEINITIALIZED\fP, \fBCUDA_ERROR_NOT_INITIALIZED\fP, \fBCUDA_ERROR_INVALID_CONTEXT\fP 
.RE
.PP
\fBNote:\fP
.RS 4
Note that this function may also return error codes from previous, asynchronous launches.
.RE
.PP
\fBSee also:\fP
.RS 4
\fBcuCtxCreate\fP, \fBcuCtxDestroy\fP, \fBcuCtxGetApiVersion\fP, \fBcuCtxGetCacheConfig\fP, \fBcuCtxGetDevice\fP, \fBcuCtxGetLimit\fP, \fBcuCtxPushCurrent\fP, \fBcuCtxSetCacheConfig\fP, \fBcuCtxSetLimit\fP, \fBcuCtxSynchronize\fP 
.RE
.PP

.SS "\fBCUresult\fP cuCtxPushCurrent (\fBCUcontext\fP ctx)"
.PP
Pushes the given context \fCctx\fP onto the CPU thread's stack of current contexts. The specified context becomes the CPU thread's current context, so all CUDA functions that operate on the current context are affected.
.PP
The previous current context may be made current again by calling \fBcuCtxDestroy()\fP or \fBcuCtxPopCurrent()\fP.
.PP
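\fBExample:\fP
.RS 4
An illustrative sketch (not part of the upstream reference) that temporarily makes a context current, then restores the previous one by popping; the helper name is an assumption.
.RE
.PP
.nf
#include <cuda.h>

static CUresult with_context(CUcontext ctx)
{
    CUcontext popped;
    CUresult rc = cuCtxPushCurrent(ctx);
    if (rc != CUDA_SUCCESS)
        return rc;

    /* ... driver API calls issued here target ctx ... */

    return cuCtxPopCurrent(&popped);   /* popped == ctx on success */
}
.fi
.PP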
\fBParameters:\fP
.RS 4
\fIctx\fP - Context to push
.RE
.PP
\fBReturns:\fP
.RS 4
\fBCUDA_SUCCESS\fP, \fBCUDA_ERROR_DEINITIALIZED\fP, \fBCUDA_ERROR_NOT_INITIALIZED\fP, \fBCUDA_ERROR_INVALID_CONTEXT\fP, \fBCUDA_ERROR_INVALID_VALUE\fP 
.RE
.PP
\fBNote:\fP
.RS 4
Note that this function may also return error codes from previous, asynchronous launches.
.RE
.PP
\fBSee also:\fP
.RS 4
\fBcuCtxCreate\fP, \fBcuCtxDestroy\fP, \fBcuCtxGetApiVersion\fP, \fBcuCtxGetCacheConfig\fP, \fBcuCtxGetDevice\fP, \fBcuCtxGetLimit\fP, \fBcuCtxPopCurrent\fP, \fBcuCtxSetCacheConfig\fP, \fBcuCtxSetLimit\fP, \fBcuCtxSynchronize\fP 
.RE
.PP

.SS "\fBCUresult\fP cuCtxSetCacheConfig (\fBCUfunc_cache\fP config)"
.PP
On devices where the L1 cache and shared memory use the same hardware resources, this sets through \fCconfig\fP the preferred cache configuration for the current context. This is only a preference. The driver will use the requested configuration if possible, but it is free to choose a different configuration if required to execute the function. Any function preference set via \fBcuFuncSetCacheConfig()\fP will be preferred over this context-wide setting. Setting the context-wide cache configuration to \fBCU_FUNC_CACHE_PREFER_NONE\fP will cause subsequent kernel launches to prefer to not change the cache configuration unless required to launch the kernel.
.PP
This setting does nothing on devices where the sizes of the L1 cache and shared memory are fixed.
.PP
Launching a kernel with a different preference than the most recent preference setting may insert a device-side synchronization point.
.PP
The supported cache configurations are:
.IP "\(bu" 2
\fBCU_FUNC_CACHE_PREFER_NONE\fP: no preference for shared memory or L1 (default)
.IP "\(bu" 2
\fBCU_FUNC_CACHE_PREFER_SHARED\fP: prefer larger shared memory and smaller L1 cache
.IP "\(bu" 2
\fBCU_FUNC_CACHE_PREFER_L1\fP: prefer larger L1 cache and smaller shared memory
.IP "\(bu" 2
\fBCU_FUNC_CACHE_PREFER_EQUAL\fP: prefer equal sized L1 cache and shared memory
.PP
.PP
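\fBExample:\fP
.RS 4
A one-line sketch (not from the upstream reference) that hints the driver to favor shared memory for subsequent launches in the current context; the helper name is illustrative.
.RE
.PP
.nf
#include <cuda.h>

static CUresult prefer_shared_memory(void)
{
    /* A preference only; the driver may still pick another configuration. */
    return cuCtxSetCacheConfig(CU_FUNC_CACHE_PREFER_SHARED);
}
.fi
.PP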
\fBParameters:\fP
.RS 4
\fIconfig\fP - Requested cache configuration
.RE
.PP
\fBReturns:\fP
.RS 4
\fBCUDA_SUCCESS\fP, \fBCUDA_ERROR_DEINITIALIZED\fP, \fBCUDA_ERROR_NOT_INITIALIZED\fP, \fBCUDA_ERROR_INVALID_CONTEXT\fP, \fBCUDA_ERROR_INVALID_VALUE\fP 
.RE
.PP
\fBNote:\fP
.RS 4
Note that this function may also return error codes from previous, asynchronous launches.
.RE
.PP
\fBSee also:\fP
.RS 4
\fBcuCtxCreate\fP, \fBcuCtxDestroy\fP, \fBcuCtxGetApiVersion\fP, \fBcuCtxGetCacheConfig\fP, \fBcuCtxGetDevice\fP, \fBcuCtxGetLimit\fP, \fBcuCtxPopCurrent\fP, \fBcuCtxPushCurrent\fP, \fBcuCtxSetLimit\fP, \fBcuCtxSynchronize\fP, \fBcuFuncSetCacheConfig\fP 
.RE
.PP

.SS "\fBCUresult\fP cuCtxSetCurrent (\fBCUcontext\fP ctx)"
.PP
Binds the specified CUDA context to the calling CPU thread. If \fCctx\fP is NULL then the CUDA context previously bound to the calling CPU thread is unbound and \fBCUDA_SUCCESS\fP is returned.
.PP
If there exists a CUDA context stack on the calling CPU thread, this will replace the top of that stack with \fCctx\fP. If \fCctx\fP is NULL then this will be equivalent to popping the top of the calling CPU thread's CUDA context stack (or a no-op if the calling CPU thread's CUDA context stack is empty).
.PP
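\fBExample:\fP
.RS 4
An illustrative sketch (not part of the upstream reference) for a worker thread that binds a context created elsewhere, uses it, and then unbinds it by passing NULL; the helper name is an assumption.
.RE
.PP
.nf
#include <stddef.h>
#include <cuda.h>

static CUresult worker_use_context(CUcontext ctx)
{
    CUresult rc = cuCtxSetCurrent(ctx);
    if (rc != CUDA_SUCCESS)
        return rc;

    /* ... driver API calls issued here operate on ctx ... */

    return cuCtxSetCurrent(NULL);   /* leave the thread with no context */
}
.fi
.PP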
\fBParameters:\fP
.RS 4
\fIctx\fP - Context to bind to the calling CPU thread
.RE
.PP
\fBReturns:\fP
.RS 4
\fBCUDA_SUCCESS\fP, \fBCUDA_ERROR_DEINITIALIZED\fP, \fBCUDA_ERROR_NOT_INITIALIZED\fP, \fBCUDA_ERROR_INVALID_CONTEXT\fP 
.RE
.PP
\fBNote:\fP
.RS 4
Note that this function may also return error codes from previous, asynchronous launches.
.RE
.PP
\fBSee also:\fP
.RS 4
\fBcuCtxGetCurrent\fP, \fBcuCtxCreate\fP, \fBcuCtxDestroy\fP 
.RE
.PP

.SS "\fBCUresult\fP cuCtxSetLimit (\fBCUlimit\fP limit, size_t value)"
.PP
Setting \fClimit\fP to \fCvalue\fP is a request by the application to update the current limit maintained by the context. The driver is free to modify the requested value to meet hardware requirements (this could be clamping to minimum or maximum values, rounding up to the nearest element size, etc.). The application can use \fBcuCtxGetLimit()\fP to find out exactly what the limit has been set to.
.PP
Setting each \fBCUlimit\fP has its own specific restrictions, so each is discussed here.
.PP
.IP "\(bu" 2
\fBCU_LIMIT_STACK_SIZE\fP controls the stack size in bytes of each GPU thread. This limit is only applicable to devices of compute capability 2.0 and higher. Attempting to set this limit on devices of compute capability less than 2.0 will result in the error \fBCUDA_ERROR_UNSUPPORTED_LIMIT\fP being returned.
.PP
.PP
.IP "\(bu" 2
\fBCU_LIMIT_PRINTF_FIFO_SIZE\fP controls the size in bytes of the FIFO used by the printf() device system call. Setting \fBCU_LIMIT_PRINTF_FIFO_SIZE\fP must be performed before launching any kernel that uses the printf() device system call, otherwise \fBCUDA_ERROR_INVALID_VALUE\fP will be returned. This limit is only applicable to devices of compute capability 2.0 and higher. Attempting to set this limit on devices of compute capability less than 2.0 will result in the error \fBCUDA_ERROR_UNSUPPORTED_LIMIT\fP being returned.
.PP
.PP
.IP "\(bu" 2
\fBCU_LIMIT_MALLOC_HEAP_SIZE\fP controls the size in bytes of the heap used by the malloc() and free() device system calls. Setting \fBCU_LIMIT_MALLOC_HEAP_SIZE\fP must be performed before launching any kernel that uses the malloc() or free() device system calls, otherwise \fBCUDA_ERROR_INVALID_VALUE\fP will be returned. This limit is only applicable to devices of compute capability 2.0 and higher. Attempting to set this limit on devices of compute capability less than 2.0 will result in the error \fBCUDA_ERROR_UNSUPPORTED_LIMIT\fP being returned.
.PP
.PP
.IP "\(bu" 2
\fBCU_LIMIT_DEV_RUNTIME_SYNC_DEPTH\fP controls the maximum nesting depth of a grid at which a thread can safely call cudaDeviceSynchronize(). Setting this limit must be performed before any launch of a kernel that uses the device runtime and calls cudaDeviceSynchronize() above the default sync depth, two levels of grids. Calls to cudaDeviceSynchronize() will fail with error code cudaErrorSyncDepthExceeded if the limitation is violated. This limit can be set smaller than the default or up to the maximum launch depth of 24. When setting this limit, keep in mind that additional levels of sync depth require the driver to reserve large amounts of device memory which can no longer be used for user allocations. If these reservations of device memory fail, \fBcuCtxSetLimit\fP will return \fBCUDA_ERROR_OUT_OF_MEMORY\fP, and the limit can be reset to a lower value. This limit is only applicable to devices of compute capability 3.5 and higher. Attempting to set this limit on devices of compute capability less than 3.5 will result in the error \fBCUDA_ERROR_UNSUPPORTED_LIMIT\fP being returned.
.PP
.PP
.IP "\(bu" 2
\fBCU_LIMIT_DEV_RUNTIME_PENDING_LAUNCH_COUNT\fP controls the maximum number of outstanding device runtime launches that can be made from the current context. A grid is outstanding from the point of launch up until the grid is known to have been completed. Device runtime launches which violate this limitation fail and return cudaErrorLaunchPendingCountExceeded when cudaGetLastError() is called after launch. If more pending launches than the default (2048 launches) are needed for a module using the device runtime, this limit can be increased. Keep in mind that being able to sustain additional pending launches will require the driver to reserve larger amounts of device memory upfront which can no longer be used for allocations. If these reservations fail, \fBcuCtxSetLimit\fP will return \fBCUDA_ERROR_OUT_OF_MEMORY\fP, and the limit can be reset to a lower value. This limit is only applicable to devices of compute capability 3.5 and higher. Attempting to set this limit on devices of compute capability less than 3.5 will result in the error \fBCUDA_ERROR_UNSUPPORTED_LIMIT\fP being returned.
.PP
.PP
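\fBExample:\fP
.RS 4
An illustrative sketch (not from the upstream reference) that requests a 16 MiB device malloc() heap and reads back the value the driver actually applied, since the request may be rounded or clamped; the helper name and the 16 MiB figure are assumptions.
.RE
.PP
.nf
#include <stddef.h>
#include <cuda.h>

static size_t set_malloc_heap(void)
{
    size_t applied = 0;
    /* Must be done before launching any kernel that calls malloc()/free(). */
    if (cuCtxSetLimit(CU_LIMIT_MALLOC_HEAP_SIZE, (size_t)16 << 20) != CUDA_SUCCESS)
        return 0;
    if (cuCtxGetLimit(&applied, CU_LIMIT_MALLOC_HEAP_SIZE) != CUDA_SUCCESS)
        return 0;
    return applied;   /* the limit as applied by the driver */
}
.fi
.PP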
\fBParameters:\fP
.RS 4
\fIlimit\fP - Limit to set 
.br
\fIvalue\fP - Size of limit
.RE
.PP
\fBReturns:\fP
.RS 4
\fBCUDA_SUCCESS\fP, \fBCUDA_ERROR_INVALID_VALUE\fP, \fBCUDA_ERROR_UNSUPPORTED_LIMIT\fP, \fBCUDA_ERROR_OUT_OF_MEMORY\fP 
.RE
.PP
\fBNote:\fP
.RS 4
Note that this function may also return error codes from previous, asynchronous launches.
.RE
.PP
\fBSee also:\fP
.RS 4
\fBcuCtxCreate\fP, \fBcuCtxDestroy\fP, \fBcuCtxGetApiVersion\fP, \fBcuCtxGetCacheConfig\fP, \fBcuCtxGetDevice\fP, \fBcuCtxGetLimit\fP, \fBcuCtxPopCurrent\fP, \fBcuCtxPushCurrent\fP, \fBcuCtxSetCacheConfig\fP, \fBcuCtxSynchronize\fP 
.RE
.PP

.SS "\fBCUresult\fP cuCtxSetSharedMemConfig (\fBCUsharedconfig\fP config)"
.PP
On devices with configurable shared memory banks, this function will set the context's shared memory bank size which is used for subsequent kernel launches.
.PP
Changing the shared memory configuration between launches may insert a device-side synchronization point between those launches.
.PP
Changing the shared memory bank size will not increase shared memory usage or affect occupancy of kernels, but may have major effects on performance. Larger bank sizes will allow for greater potential bandwidth to shared memory, but will change what kinds of accesses to shared memory will result in bank conflicts.
.PP
This function will do nothing on devices with fixed shared memory bank size.
.PP
The supported bank configurations are:
.IP "\(bu" 2
\fBCU_SHARED_MEM_CONFIG_DEFAULT_BANK_SIZE\fP: set bank width to the default initial setting (currently, four bytes).
.IP "\(bu" 2
\fBCU_SHARED_MEM_CONFIG_FOUR_BYTE_BANK_SIZE\fP: set shared memory bank width to be natively four bytes.
.IP "\(bu" 2
\fBCU_SHARED_MEM_CONFIG_EIGHT_BYTE_BANK_SIZE\fP: set shared memory bank width to be natively eight bytes.
.PP
.PP
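\fBExample:\fP
.RS 4
A one-line sketch (not part of the upstream reference) that switches the current context to eight-byte shared memory banks; the helper name is illustrative.
.RE
.PP
.nf
#include <cuda.h>

static CUresult use_eight_byte_banks(void)
{
    /* Takes effect for subsequent kernel launches in this context. */
    return cuCtxSetSharedMemConfig(CU_SHARED_MEM_CONFIG_EIGHT_BYTE_BANK_SIZE);
}
.fi
.PP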
\fBParameters:\fP
.RS 4
\fIconfig\fP - requested shared memory configuration
.RE
.PP
\fBReturns:\fP
.RS 4
\fBCUDA_SUCCESS\fP, \fBCUDA_ERROR_DEINITIALIZED\fP, \fBCUDA_ERROR_NOT_INITIALIZED\fP, \fBCUDA_ERROR_INVALID_CONTEXT\fP, \fBCUDA_ERROR_INVALID_VALUE\fP 
.RE
.PP
\fBNote:\fP
.RS 4
Note that this function may also return error codes from previous, asynchronous launches.
.RE
.PP
\fBSee also:\fP
.RS 4
\fBcuCtxCreate\fP, \fBcuCtxDestroy\fP, \fBcuCtxGetApiVersion\fP, \fBcuCtxGetCacheConfig\fP, \fBcuCtxGetDevice\fP, \fBcuCtxGetLimit\fP, \fBcuCtxPopCurrent\fP, \fBcuCtxPushCurrent\fP, \fBcuCtxSetLimit\fP, \fBcuCtxSynchronize\fP, \fBcuCtxGetSharedMemConfig\fP, \fBcuFuncSetCacheConfig\fP 
.RE
.PP

.SS "\fBCUresult\fP cuCtxSynchronize (void)"
.PP
Blocks until the device has completed all preceding requested tasks. \fBcuCtxSynchronize()\fP returns an error if one of the preceding tasks failed. If the context was created with the \fBCU_CTX_SCHED_BLOCKING_SYNC\fP flag, the CPU thread will block until the GPU context has finished its work.
.PP
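\fBExample:\fP
.RS 4
A minimal sketch (not from the upstream reference) that waits for all previously queued work in the current context and surfaces any asynchronous error as the return value; the helper name is an assumption.
.RE
.PP
.nf
#include <cuda.h>

static CUresult drain_context(void)
{
    /* ... asynchronous copies and kernel launches queued earlier ... */
    return cuCtxSynchronize();
}
.fi
.PP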
\fBReturns:\fP
.RS 4
\fBCUDA_SUCCESS\fP, \fBCUDA_ERROR_DEINITIALIZED\fP, \fBCUDA_ERROR_NOT_INITIALIZED\fP, \fBCUDA_ERROR_INVALID_CONTEXT\fP 
.RE
.PP
\fBNote:\fP
.RS 4
Note that this function may also return error codes from previous, asynchronous launches.
.RE
.PP
\fBSee also:\fP
.RS 4
\fBcuCtxCreate\fP, \fBcuCtxDestroy\fP, \fBcuCtxGetApiVersion\fP, \fBcuCtxGetCacheConfig\fP, \fBcuCtxGetDevice\fP, \fBcuCtxGetLimit\fP, \fBcuCtxPopCurrent\fP, \fBcuCtxPushCurrent\fP, \fBcuCtxSetCacheConfig\fP, \fBcuCtxSetLimit\fP 
.RE
.PP

.SH "Author"
.PP 
Generated automatically by Doxygen from the source code.