.TH "Device Management" 3 "7 Aug 2014" "Version 6.0" "Doxygen" \" -*- nroff -*-
.ad l
.nh
.SH NAME
Device Management \- 
.SS "Functions"

.in +1c
.ti -1c
.RI "\fBcudaError_t\fP \fBcudaChooseDevice\fP (int *device, const struct \fBcudaDeviceProp\fP *prop)"
.br
.RI "\fISelect compute-device which best matches criteria. \fP"
.ti -1c
.RI "__cudart_builtin__ \fBcudaError_t\fP \fBcudaDeviceGetAttribute\fP (int *value, enum \fBcudaDeviceAttr\fP attr, int device)"
.br
.RI "\fIReturns information about the device. \fP"
.ti -1c
.RI "\fBcudaError_t\fP \fBcudaDeviceGetByPCIBusId\fP (int *device, const char *pciBusId)"
.br
.RI "\fIReturns a handle to a compute device. \fP"
.ti -1c
.RI "__cudart_builtin__ \fBcudaError_t\fP \fBcudaDeviceGetCacheConfig\fP (enum \fBcudaFuncCache\fP *pCacheConfig)"
.br
.RI "\fIReturns the preferred cache configuration for the current device. \fP"
.ti -1c
.RI "__cudart_builtin__ \fBcudaError_t\fP \fBcudaDeviceGetLimit\fP (size_t *pValue, enum \fBcudaLimit\fP limit)"
.br
.RI "\fIReturns resource limits. \fP"
.ti -1c
.RI "\fBcudaError_t\fP \fBcudaDeviceGetPCIBusId\fP (char *pciBusId, int len, int device)"
.br
.RI "\fIReturns a PCI Bus Id string for the device. \fP"
.ti -1c
.RI "__cudart_builtin__ \fBcudaError_t\fP \fBcudaDeviceGetSharedMemConfig\fP (enum \fBcudaSharedMemConfig\fP *pConfig)"
.br
.RI "\fIReturns the shared memory configuration for the current device. \fP"
.ti -1c
.RI "__cudart_builtin__ \fBcudaError_t\fP \fBcudaDeviceGetStreamPriorityRange\fP (int *leastPriority, int *greatestPriority)"
.br
.RI "\fIReturns numerical values that correspond to the least and greatest stream priorities. \fP"
.ti -1c
.RI "\fBcudaError_t\fP \fBcudaDeviceReset\fP (void)"
.br
.RI "\fIDestroy all allocations and reset all state on the current device in the current process. \fP"
.ti -1c
.RI "\fBcudaError_t\fP \fBcudaDeviceSetCacheConfig\fP (enum \fBcudaFuncCache\fP cacheConfig)"
.br
.RI "\fISets the preferred cache configuration for the current device. \fP"
.ti -1c
.RI "\fBcudaError_t\fP \fBcudaDeviceSetLimit\fP (enum \fBcudaLimit\fP limit, size_t value)"
.br
.RI "\fISet resource limits. \fP"
.ti -1c
.RI "\fBcudaError_t\fP \fBcudaDeviceSetSharedMemConfig\fP (enum \fBcudaSharedMemConfig\fP config)"
.br
.RI "\fISets the shared memory configuration for the current device. \fP"
.ti -1c
.RI "__cudart_builtin__ \fBcudaError_t\fP \fBcudaDeviceSynchronize\fP (void)"
.br
.RI "\fIWait for compute device to finish. \fP"
.ti -1c
.RI "__cudart_builtin__ \fBcudaError_t\fP \fBcudaGetDevice\fP (int *device)"
.br
.RI "\fIReturns which device is currently being used. \fP"
.ti -1c
.RI "__cudart_builtin__ \fBcudaError_t\fP \fBcudaGetDeviceCount\fP (int *count)"
.br
.RI "\fIReturns the number of compute-capable devices. \fP"
.ti -1c
.RI "__cudart_builtin__ \fBcudaError_t\fP \fBcudaGetDeviceProperties\fP (struct \fBcudaDeviceProp\fP *prop, int device)"
.br
.RI "\fIReturns information about the compute-device. \fP"
.ti -1c
.RI "\fBcudaError_t\fP \fBcudaIpcCloseMemHandle\fP (void *devPtr)"
.br
.RI "\fIClose memory mapped with cudaIpcOpenMemHandle. \fP"
.ti -1c
.RI "\fBcudaError_t\fP \fBcudaIpcGetEventHandle\fP (\fBcudaIpcEventHandle_t\fP *handle, \fBcudaEvent_t\fP event)"
.br
.RI "\fIGets an interprocess handle for a previously allocated event. \fP"
.ti -1c
.RI "\fBcudaError_t\fP \fBcudaIpcGetMemHandle\fP (\fBcudaIpcMemHandle_t\fP *handle, void *devPtr)"
.br
.RI "\fIGets an interprocess memory handle for an existing device memory allocation. \fP"
.ti -1c
.RI "\fBcudaError_t\fP \fBcudaIpcOpenEventHandle\fP (\fBcudaEvent_t\fP *event, \fBcudaIpcEventHandle_t\fP handle)"
.br
.RI "\fIOpens an interprocess event handle for use in the current process. \fP"
.ti -1c
.RI "\fBcudaError_t\fP \fBcudaIpcOpenMemHandle\fP (void **devPtr, \fBcudaIpcMemHandle_t\fP handle, unsigned int flags)"
.br
.RI "\fIOpens an interprocess memory handle exported from another process and returns a device pointer usable in the local process. \fP"
.ti -1c
.RI "\fBcudaError_t\fP \fBcudaSetDevice\fP (int device)"
.br
.RI "\fISet device to be used for GPU executions. \fP"
.ti -1c
.RI "\fBcudaError_t\fP \fBcudaSetDeviceFlags\fP (unsigned int flags)"
.br
.RI "\fISets flags to be used for device executions. \fP"
.ti -1c
.RI "\fBcudaError_t\fP \fBcudaSetValidDevices\fP (int *device_arr, int len)"
.br
.RI "\fISet a list of devices that can be used for CUDA. \fP"
.in -1c
.SH "Detailed Description"
.PP 
This section describes the device management functions of the CUDA runtime application programming interface (cuda_runtime_api.h). 
.SH "Function Documentation"
.PP 
.SS "\fBcudaError_t\fP cudaChooseDevice (int * device, const struct \fBcudaDeviceProp\fP * prop)"
.PP
Returns in \fC*device\fP the device which has properties that best match \fC*prop\fP.
.PP
\fBParameters:\fP
.RS 4
\fIdevice\fP - Device with best match 
.br
\fIprop\fP - Desired device properties
.RE
.PP
\fBReturns:\fP
.RS 4
\fBcudaSuccess\fP, \fBcudaErrorInvalidValue\fP 
.RE
.PP
\fBNote:\fP
.RS 4
Note that this function may also return error codes from previous, asynchronous launches.
.RE
.PP
\fBSee also:\fP
.RS 4
\fBcudaGetDeviceCount\fP, \fBcudaGetDevice\fP, \fBcudaSetDevice\fP, \fBcudaGetDeviceProperties\fP 
.RE
.PP
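For example, a device can be selected by filling in only the properties that matter and letting the runtime pick the best match (an illustrative, untested sketch; assumes a CUDA-capable device and the runtime headers):
.PP
```cuda
#include <cuda_runtime.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    /* Describe the desired device: here, compute capability >= 3.5.
       Zeroed fields are "don't care". */
    struct cudaDeviceProp prop;
    memset(&prop, 0, sizeof(prop));
    prop.major = 3;
    prop.minor = 5;

    int dev = 0;
    cudaError_t err = cudaChooseDevice(&dev, &prop);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaChooseDevice: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("best match: device %d\n", dev);
    return cudaSetDevice(dev) == cudaSuccess ? 0 : 1;
}
```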

.SS "__cudart_builtin__ \fBcudaError_t\fP cudaDeviceGetAttribute (int * value, enum \fBcudaDeviceAttr\fP attr, int device)"
.PP
Returns in \fC*value\fP the integer value of the attribute \fCattr\fP on device \fCdevice\fP. The supported attributes are:
.IP "\(bu" 2
\fBcudaDevAttrMaxThreadsPerBlock\fP: Maximum number of threads per block;
.IP "\(bu" 2
\fBcudaDevAttrMaxBlockDimX\fP: Maximum x-dimension of a block;
.IP "\(bu" 2
\fBcudaDevAttrMaxBlockDimY\fP: Maximum y-dimension of a block;
.IP "\(bu" 2
\fBcudaDevAttrMaxBlockDimZ\fP: Maximum z-dimension of a block;
.IP "\(bu" 2
\fBcudaDevAttrMaxGridDimX\fP: Maximum x-dimension of a grid;
.IP "\(bu" 2
\fBcudaDevAttrMaxGridDimY\fP: Maximum y-dimension of a grid;
.IP "\(bu" 2
\fBcudaDevAttrMaxGridDimZ\fP: Maximum z-dimension of a grid;
.IP "\(bu" 2
\fBcudaDevAttrMaxSharedMemoryPerBlock\fP: Maximum amount of shared memory available to a thread block in bytes;
.IP "\(bu" 2
\fBcudaDevAttrTotalConstantMemory\fP: Memory available on device for __constant__ variables in a CUDA C kernel in bytes;
.IP "\(bu" 2
\fBcudaDevAttrWarpSize\fP: Warp size in threads;
.IP "\(bu" 2
\fBcudaDevAttrMaxPitch\fP: Maximum pitch in bytes allowed by the memory copy functions that involve memory regions allocated through \fBcudaMallocPitch()\fP;
.IP "\(bu" 2
\fBcudaDevAttrMaxTexture1DWidth\fP: Maximum 1D texture width;
.IP "\(bu" 2
\fBcudaDevAttrMaxTexture1DLinearWidth\fP: Maximum width for a 1D texture bound to linear memory;
.IP "\(bu" 2
\fBcudaDevAttrMaxTexture1DMipmappedWidth\fP: Maximum mipmapped 1D texture width;
.IP "\(bu" 2
\fBcudaDevAttrMaxTexture2DWidth\fP: Maximum 2D texture width;
.IP "\(bu" 2
\fBcudaDevAttrMaxTexture2DHeight\fP: Maximum 2D texture height;
.IP "\(bu" 2
\fBcudaDevAttrMaxTexture2DLinearWidth\fP: Maximum width for a 2D texture bound to linear memory;
.IP "\(bu" 2
\fBcudaDevAttrMaxTexture2DLinearHeight\fP: Maximum height for a 2D texture bound to linear memory;
.IP "\(bu" 2
\fBcudaDevAttrMaxTexture2DLinearPitch\fP: Maximum pitch in bytes for a 2D texture bound to linear memory;
.IP "\(bu" 2
\fBcudaDevAttrMaxTexture2DMipmappedWidth\fP: Maximum mipmapped 2D texture width;
.IP "\(bu" 2
\fBcudaDevAttrMaxTexture2DMipmappedHeight\fP: Maximum mipmapped 2D texture height;
.IP "\(bu" 2
\fBcudaDevAttrMaxTexture3DWidth\fP: Maximum 3D texture width;
.IP "\(bu" 2
\fBcudaDevAttrMaxTexture3DHeight\fP: Maximum 3D texture height;
.IP "\(bu" 2
\fBcudaDevAttrMaxTexture3DDepth\fP: Maximum 3D texture depth;
.IP "\(bu" 2
\fBcudaDevAttrMaxTexture3DWidthAlt\fP: Alternate maximum 3D texture width, 0 if no alternate maximum 3D texture size is supported;
.IP "\(bu" 2
\fBcudaDevAttrMaxTexture3DHeightAlt\fP: Alternate maximum 3D texture height, 0 if no alternate maximum 3D texture size is supported;
.IP "\(bu" 2
\fBcudaDevAttrMaxTexture3DDepthAlt\fP: Alternate maximum 3D texture depth, 0 if no alternate maximum 3D texture size is supported;
.IP "\(bu" 2
\fBcudaDevAttrMaxTextureCubemapWidth\fP: Maximum cubemap texture width or height;
.IP "\(bu" 2
\fBcudaDevAttrMaxTexture1DLayeredWidth\fP: Maximum 1D layered texture width;
.IP "\(bu" 2
\fBcudaDevAttrMaxTexture1DLayeredLayers\fP: Maximum layers in a 1D layered texture;
.IP "\(bu" 2
\fBcudaDevAttrMaxTexture2DLayeredWidth\fP: Maximum 2D layered texture width;
.IP "\(bu" 2
\fBcudaDevAttrMaxTexture2DLayeredHeight\fP: Maximum 2D layered texture height;
.IP "\(bu" 2
\fBcudaDevAttrMaxTexture2DLayeredLayers\fP: Maximum layers in a 2D layered texture;
.IP "\(bu" 2
\fBcudaDevAttrMaxTextureCubemapLayeredWidth\fP: Maximum cubemap layered texture width or height;
.IP "\(bu" 2
\fBcudaDevAttrMaxTextureCubemapLayeredLayers\fP: Maximum layers in a cubemap layered texture;
.IP "\(bu" 2
\fBcudaDevAttrMaxSurface1DWidth\fP: Maximum 1D surface width;
.IP "\(bu" 2
\fBcudaDevAttrMaxSurface2DWidth\fP: Maximum 2D surface width;
.IP "\(bu" 2
\fBcudaDevAttrMaxSurface2DHeight\fP: Maximum 2D surface height;
.IP "\(bu" 2
\fBcudaDevAttrMaxSurface3DWidth\fP: Maximum 3D surface width;
.IP "\(bu" 2
\fBcudaDevAttrMaxSurface3DHeight\fP: Maximum 3D surface height;
.IP "\(bu" 2
\fBcudaDevAttrMaxSurface3DDepth\fP: Maximum 3D surface depth;
.IP "\(bu" 2
\fBcudaDevAttrMaxSurface1DLayeredWidth\fP: Maximum 1D layered surface width;
.IP "\(bu" 2
\fBcudaDevAttrMaxSurface1DLayeredLayers\fP: Maximum layers in a 1D layered surface;
.IP "\(bu" 2
\fBcudaDevAttrMaxSurface2DLayeredWidth\fP: Maximum 2D layered surface width;
.IP "\(bu" 2
\fBcudaDevAttrMaxSurface2DLayeredHeight\fP: Maximum 2D layered surface height;
.IP "\(bu" 2
\fBcudaDevAttrMaxSurface2DLayeredLayers\fP: Maximum layers in a 2D layered surface;
.IP "\(bu" 2
\fBcudaDevAttrMaxSurfaceCubemapWidth\fP: Maximum cubemap surface width;
.IP "\(bu" 2
\fBcudaDevAttrMaxSurfaceCubemapLayeredWidth\fP: Maximum cubemap layered surface width;
.IP "\(bu" 2
\fBcudaDevAttrMaxSurfaceCubemapLayeredLayers\fP: Maximum layers in a cubemap layered surface;
.IP "\(bu" 2
\fBcudaDevAttrMaxRegistersPerBlock\fP: Maximum number of 32-bit registers available to a thread block;
.IP "\(bu" 2
\fBcudaDevAttrClockRate\fP: Peak clock frequency in kilohertz;
.IP "\(bu" 2
\fBcudaDevAttrTextureAlignment\fP: Alignment requirement; texture base addresses aligned to textureAlign bytes do not need an offset applied to texture fetches;
.IP "\(bu" 2
\fBcudaDevAttrTexturePitchAlignment\fP: Pitch alignment requirement for 2D texture references bound to pitched memory;
.IP "\(bu" 2
\fBcudaDevAttrGpuOverlap\fP: 1 if the device can concurrently copy memory between host and device while executing a kernel, or 0 if not;
.IP "\(bu" 2
\fBcudaDevAttrMultiProcessorCount\fP: Number of multiprocessors on the device;
.IP "\(bu" 2
\fBcudaDevAttrKernelExecTimeout\fP: 1 if there is a run time limit for kernels executed on the device, or 0 if not;
.IP "\(bu" 2
\fBcudaDevAttrIntegrated\fP: 1 if the device is integrated with the memory subsystem, or 0 if not;
.IP "\(bu" 2
\fBcudaDevAttrCanMapHostMemory\fP: 1 if the device can map host memory into the CUDA address space, or 0 if not;
.IP "\(bu" 2
\fBcudaDevAttrComputeMode\fP: Compute mode is the compute mode that the device is currently in. Available modes are as follows:
.IP "  \(bu" 4
\fBcudaComputeModeDefault\fP: Default mode - Device is not restricted and multiple threads can use \fBcudaSetDevice()\fP with this device.
.IP "  \(bu" 4
\fBcudaComputeModeExclusive\fP: Compute-exclusive mode - Only one thread will be able to use \fBcudaSetDevice()\fP with this device.
.IP "  \(bu" 4
\fBcudaComputeModeProhibited\fP: Compute-prohibited mode - No threads can use \fBcudaSetDevice()\fP with this device.
.IP "  \(bu" 4
\fBcudaComputeModeExclusiveProcess\fP: Compute-exclusive-process mode - Many threads in one process will be able to use \fBcudaSetDevice()\fP with this device.
.PP

.IP "\(bu" 2
\fBcudaDevAttrConcurrentKernels\fP: 1 if the device supports executing multiple kernels within the same context simultaneously, or 0 if not. It is not guaranteed that multiple kernels will be resident on the device concurrently so this feature should not be relied upon for correctness;
.IP "\(bu" 2
\fBcudaDevAttrEccEnabled\fP: 1 if error correction is enabled on the device, 0 if error correction is disabled or not supported by the device;
.IP "\(bu" 2
\fBcudaDevAttrPciBusId\fP: PCI bus identifier of the device;
.IP "\(bu" 2
\fBcudaDevAttrPciDeviceId\fP: PCI device (also known as slot) identifier of the device;
.IP "\(bu" 2
\fBcudaDevAttrTccDriver\fP: 1 if the device is using a TCC driver. TCC is only available on Tesla hardware running Windows Vista or later;
.IP "\(bu" 2
\fBcudaDevAttrMemoryClockRate\fP: Peak memory clock frequency in kilohertz;
.IP "\(bu" 2
\fBcudaDevAttrGlobalMemoryBusWidth\fP: Global memory bus width in bits;
.IP "\(bu" 2
\fBcudaDevAttrL2CacheSize\fP: Size of L2 cache in bytes. 0 if the device doesn't have L2 cache;
.IP "\(bu" 2
\fBcudaDevAttrMaxThreadsPerMultiProcessor\fP: Maximum resident threads per multiprocessor;
.IP "\(bu" 2
\fBcudaDevAttrUnifiedAddressing\fP: 1 if the device shares a unified address space with the host, or 0 if not;
.IP "\(bu" 2
\fBcudaDevAttrComputeCapabilityMajor\fP: Major compute capability version number;
.IP "\(bu" 2
\fBcudaDevAttrComputeCapabilityMinor\fP: Minor compute capability version number;
.IP "\(bu" 2
\fBcudaDevAttrStreamPrioritiesSupported\fP: 1 if the device supports stream priorities, or 0 if not;
.IP "\(bu" 2
\fBcudaDevAttrGlobalL1CacheSupported\fP: 1 if device supports caching globals in L1 cache, 0 if not;
.IP "\(bu" 2
\fBcudaDevAttrLocalL1CacheSupported\fP: 1 if device supports caching locals in L1 cache, 0 if not;
.IP "\(bu" 2
\fBcudaDevAttrMaxSharedMemoryPerMultiprocessor\fP: Maximum amount of shared memory available to a multiprocessor in bytes; this amount is shared by all thread blocks simultaneously resident on a multiprocessor;
.IP "\(bu" 2
\fBcudaDevAttrMaxRegistersPerMultiprocessor\fP: Maximum number of 32-bit registers available to a multiprocessor; this number is shared by all thread blocks simultaneously resident on a multiprocessor;
.IP "\(bu" 2
\fBcudaDevAttrManagedMemory\fP: 1 if device supports allocating managed memory, 0 if not;
.IP "\(bu" 2
\fBcudaDevAttrIsMultiGpuBoard\fP: 1 if device is on a multi-GPU board, 0 if not;
.IP "\(bu" 2
\fBcudaDevAttrMultiGpuBoardGroupID\fP: Unique identifier for a group of devices on the same multi-GPU board;
.PP
.PP
\fBParameters:\fP
.RS 4
\fIvalue\fP - Returned device attribute value 
.br
\fIattr\fP - Device attribute to query 
.br
\fIdevice\fP - Device number to query
.RE
.PP
\fBReturns:\fP
.RS 4
\fBcudaSuccess\fP, \fBcudaErrorInvalidDevice\fP, \fBcudaErrorInvalidValue\fP 
.RE
.PP
\fBNote:\fP
.RS 4
Note that this function may also return error codes from previous, asynchronous launches.
.RE
.PP
\fBSee also:\fP
.RS 4
\fBcudaGetDeviceCount\fP, \fBcudaGetDevice\fP, \fBcudaSetDevice\fP, \fBcudaChooseDevice\fP, \fBcudaGetDeviceProperties\fP 
.RE
.PP
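Individual attributes can be queried without fetching the full \fBcudaDeviceProp\fP structure. A minimal host-side sketch (untested; assumes device 0 exists):
.PP
```cuda
#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    int dev = 0, warp = 0, smCount = 0;

    /* Each call returns one integer attribute for the given device. */
    if (cudaDeviceGetAttribute(&warp, cudaDevAttrWarpSize, dev) != cudaSuccess ||
        cudaDeviceGetAttribute(&smCount, cudaDevAttrMultiProcessorCount, dev) != cudaSuccess) {
        fprintf(stderr, "attribute query failed\n");
        return 1;
    }
    printf("device %d: warp size %d, %d multiprocessors\n", dev, warp, smCount);
    return 0;
}
```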

.SS "\fBcudaError_t\fP cudaDeviceGetByPCIBusId (int * device, const char * pciBusId)"
.PP
Returns in \fC*device\fP a device ordinal given a PCI bus ID string.
.PP
\fBParameters:\fP
.RS 4
\fIdevice\fP - Returned device ordinal
.br
\fIpciBusId\fP - String in one of the following forms: [domain]:[bus]:[device].[function], [domain]:[bus]:[device], or [bus]:[device].[function], where \fCdomain\fP, \fCbus\fP, \fCdevice\fP, and \fCfunction\fP are all hexadecimal values
.RE
.PP
\fBReturns:\fP
.RS 4
\fBcudaSuccess\fP, \fBcudaErrorInvalidValue\fP, \fBcudaErrorInvalidDevice\fP 
.RE
.PP
\fBNote:\fP
.RS 4
Note that this function may also return error codes from previous, asynchronous launches.
.RE
.PP
\fBSee also:\fP
.RS 4
\fBcudaDeviceGetPCIBusId\fP 
.RE
.PP

.SS "__cudart_builtin__ \fBcudaError_t\fP cudaDeviceGetCacheConfig (enum \fBcudaFuncCache\fP * pCacheConfig)"
.PP
On devices where the L1 cache and shared memory use the same hardware resources, this returns through \fCpCacheConfig\fP the preferred cache configuration for the current device. This is only a preference. The runtime will use the requested configuration if possible, but it is free to choose a different configuration if required to execute functions.
.PP
This will return a \fCpCacheConfig\fP of \fBcudaFuncCachePreferNone\fP on devices where the size of the L1 cache and shared memory are fixed.
.PP
The supported cache configurations are:
.IP "\(bu" 2
\fBcudaFuncCachePreferNone\fP: no preference for shared memory or L1 (default)
.IP "\(bu" 2
\fBcudaFuncCachePreferShared\fP: prefer larger shared memory and smaller L1 cache
.IP "\(bu" 2
\fBcudaFuncCachePreferL1\fP: prefer larger L1 cache and smaller shared memory
.IP "\(bu" 2
\fBcudaFuncCachePreferEqual\fP: prefer equal size L1 cache and shared memory
.PP
.PP
\fBParameters:\fP
.RS 4
\fIpCacheConfig\fP - Returned cache configuration
.RE
.PP
\fBReturns:\fP
.RS 4
\fBcudaSuccess\fP, \fBcudaErrorInitializationError\fP 
.RE
.PP
\fBNote:\fP
.RS 4
Note that this function may also return error codes from previous, asynchronous launches.
.RE
.PP
\fBSee also:\fP
.RS 4
\fBcudaDeviceSetCacheConfig\fP, \fBcudaFuncSetCacheConfig (C API)\fP, \fBcudaFuncSetCacheConfig (C++ API)\fP 
.RE
.PP

.SS "__cudart_builtin__ \fBcudaError_t\fP cudaDeviceGetLimit (size_t * pValue, enum \fBcudaLimit\fP limit)"
.PP
Returns in \fC*pValue\fP the current size of \fClimit\fP. The supported \fBcudaLimit\fP values are:
.IP "\(bu" 2
\fBcudaLimitStackSize\fP: stack size in bytes of each GPU thread;
.IP "\(bu" 2
\fBcudaLimitPrintfFifoSize\fP: size in bytes of the shared FIFO used by the printf() and fprintf() device system calls.
.IP "\(bu" 2
\fBcudaLimitMallocHeapSize\fP: size in bytes of the heap used by the malloc() and free() device system calls;
.IP "\(bu" 2
\fBcudaLimitDevRuntimeSyncDepth\fP: maximum grid depth at which a thread can issue the device runtime call \fBcudaDeviceSynchronize()\fP to wait on child grid launches to complete.
.IP "\(bu" 2
\fBcudaLimitDevRuntimePendingLaunchCount\fP: maximum number of outstanding device runtime launches.
.PP
.PP
\fBParameters:\fP
.RS 4
\fIlimit\fP - Limit to query 
.br
\fIpValue\fP - Returned size of the limit
.RE
.PP
\fBReturns:\fP
.RS 4
\fBcudaSuccess\fP, \fBcudaErrorUnsupportedLimit\fP, \fBcudaErrorInvalidValue\fP 
.RE
.PP
\fBNote:\fP
.RS 4
Note that this function may also return error codes from previous, asynchronous launches.
.RE
.PP
\fBSee also:\fP
.RS 4
\fBcudaDeviceSetLimit\fP 
.RE
.PP
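The current limits can be read back at any time; a short sketch (untested; assumes a current device has been established):
.PP
```cuda
#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    size_t stackSize = 0, heapSize = 0;

    /* Query two of the supported cudaLimit values. */
    if (cudaDeviceGetLimit(&stackSize, cudaLimitStackSize) != cudaSuccess ||
        cudaDeviceGetLimit(&heapSize, cudaLimitMallocHeapSize) != cudaSuccess) {
        fprintf(stderr, "limit query failed\n");
        return 1;
    }
    printf("per-thread stack: %zu bytes, device malloc heap: %zu bytes\n",
           stackSize, heapSize);
    return 0;
}
```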

.SS "\fBcudaError_t\fP cudaDeviceGetPCIBusId (char * pciBusId, int len, int device)"
.PP
Returns an ASCII string identifying the device \fCdevice\fP in the NULL-terminated string pointed to by \fCpciBusId\fP. \fClen\fP specifies the maximum length of the string that may be returned.
.PP
\fBParameters:\fP
.RS 4
\fIpciBusId\fP - Returned identifier string for the device in the following format [domain]:[bus]:[device].[function] where \fCdomain\fP, \fCbus\fP, \fCdevice\fP, and \fCfunction\fP are all hexadecimal values. pciBusId should be large enough to store 13 characters including the NULL-terminator.
.br
\fIlen\fP - Maximum length of string to store in \fCpciBusId\fP 
.br
\fIdevice\fP - Device to get identifier string for
.RE
.PP
\fBReturns:\fP
.RS 4
\fBcudaSuccess\fP, \fBcudaErrorInvalidValue\fP, \fBcudaErrorInvalidDevice\fP 
.RE
.PP
\fBNote:\fP
.RS 4
Note that this function may also return error codes from previous, asynchronous launches.
.RE
.PP
\fBSee also:\fP
.RS 4
\fBcudaDeviceGetByPCIBusId\fP 
.RE
.PP
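\fBcudaDeviceGetPCIBusId\fP and \fBcudaDeviceGetByPCIBusId\fP form a round trip between a device ordinal and its bus-ID string. An illustrative, untested sketch:
.PP
```cuda
#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    char busId[13];            /* 13 bytes: e.g. "0000:01:00.0" + NUL */
    int dev = 0, found = -1;

    if (cudaDeviceGetPCIBusId(busId, sizeof(busId), dev) != cudaSuccess)
        return 1;
    printf("device %d is at %s\n", dev, busId);

    /* Map the string back to an ordinal; this recovers the same device. */
    if (cudaDeviceGetByPCIBusId(&found, busId) != cudaSuccess)
        return 1;
    printf("bus id %s resolves to device %d\n", busId, found);
    return 0;
}
```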

.SS "__cudart_builtin__ \fBcudaError_t\fP cudaDeviceGetSharedMemConfig (enum \fBcudaSharedMemConfig\fP * pConfig)"
.PP
This function will return in \fCpConfig\fP the current size of shared memory banks on the current device. On devices with configurable shared memory banks, \fBcudaDeviceSetSharedMemConfig\fP can be used to change this setting, so that all subsequent kernel launches will by default use the new bank size. When \fBcudaDeviceGetSharedMemConfig\fP is called on devices without configurable shared memory, it will return the fixed bank size of the hardware.
.PP
The returned bank configurations can be either:
.IP "\(bu" 2
cudaSharedMemBankSizeFourByte - shared memory bank width is four bytes.
.IP "\(bu" 2
cudaSharedMemBankSizeEightByte - shared memory bank width is eight bytes.
.PP
.PP
\fBParameters:\fP
.RS 4
\fIpConfig\fP - Returned cache configuration
.RE
.PP
\fBReturns:\fP
.RS 4
\fBcudaSuccess\fP, \fBcudaErrorInvalidValue\fP, \fBcudaErrorInitializationError\fP 
.RE
.PP
\fBNote:\fP
.RS 4
Note that this function may also return error codes from previous, asynchronous launches.
.RE
.PP
\fBSee also:\fP
.RS 4
\fBcudaDeviceSetCacheConfig\fP, \fBcudaDeviceGetCacheConfig\fP, \fBcudaDeviceSetSharedMemConfig\fP, \fBcudaFuncSetCacheConfig\fP 
.RE
.PP

.SS "__cudart_builtin__ \fBcudaError_t\fP cudaDeviceGetStreamPriorityRange (int * leastPriority, int * greatestPriority)"
.PP
Returns in \fC*leastPriority\fP and \fC*greatestPriority\fP the numerical values that correspond to the least and greatest stream priorities respectively. Stream priorities follow a convention where lower numbers imply greater priorities. The range of meaningful stream priorities is given by [\fC*greatestPriority\fP, \fC*leastPriority\fP]. If the user attempts to create a stream with a priority value that is outside the meaningful range as specified by this API, the priority is automatically clamped down or up to either \fC*leastPriority\fP or \fC*greatestPriority\fP respectively. See \fBcudaStreamCreateWithPriority\fP for details on creating a priority stream. A NULL may be passed in for \fC*leastPriority\fP or \fC*greatestPriority\fP if the value is not desired.
.PP
This function will return '0' in both \fC*leastPriority\fP and \fC*greatestPriority\fP if the current context's device does not support stream priorities (see \fBcudaDeviceGetAttribute\fP).
.PP
\fBParameters:\fP
.RS 4
\fIleastPriority\fP - Pointer to an int in which the numerical value for least stream priority is returned 
.br
\fIgreatestPriority\fP - Pointer to an int in which the numerical value for greatest stream priority is returned
.RE
.PP
\fBReturns:\fP
.RS 4
\fBcudaSuccess\fP, \fBcudaErrorInvalidValue\fP 
.RE
.PP
\fBNote:\fP
.RS 4
Note that this function may also return error codes from previous, asynchronous launches.
.RE
.PP
\fBSee also:\fP
.RS 4
\fBcudaStreamCreateWithPriority\fP, \fBcudaStreamGetPriority\fP 
.RE
.PP
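The returned range feeds directly into \fBcudaStreamCreateWithPriority\fP. A sketch (untested; on devices without stream priority support both ends of the range are 0 and the two streams behave identically):
.PP
```cuda
#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    int least = 0, greatest = 0;
    cudaDeviceGetStreamPriorityRange(&least, &greatest);
    printf("priorities: greatest=%d ... least=%d\n", greatest, least);

    /* Lower numbers mean higher priority; create one stream at each end. */
    cudaStream_t hi, lo;
    cudaStreamCreateWithPriority(&hi, cudaStreamNonBlocking, greatest);
    cudaStreamCreateWithPriority(&lo, cudaStreamNonBlocking, least);

    /* ... enqueue work, with the hi stream preferred by the scheduler ... */

    cudaStreamDestroy(hi);
    cudaStreamDestroy(lo);
    return 0;
}
```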

.SS "\fBcudaError_t\fP cudaDeviceReset (void)"
.PP
Explicitly destroys and cleans up all resources associated with the current device in the current process. Any subsequent API call to this device will reinitialize the device.
.PP
Note that this function will reset the device immediately. It is the caller's responsibility to ensure that the device is not being accessed by any other host threads from the process when this function is called.
.PP
\fBReturns:\fP
.RS 4
\fBcudaSuccess\fP 
.RE
.PP
\fBNote:\fP
.RS 4
Note that this function may also return error codes from previous, asynchronous launches.
.RE
.PP
\fBSee also:\fP
.RS 4
\fBcudaDeviceSynchronize\fP 
.RE
.PP

.SS "\fBcudaError_t\fP cudaDeviceSetCacheConfig (enum \fBcudaFuncCache\fP cacheConfig)"
.PP
On devices where the L1 cache and shared memory use the same hardware resources, this sets through \fCcacheConfig\fP the preferred cache configuration for the current device. This is only a preference. The runtime will use the requested configuration if possible, but it is free to choose a different configuration if required to execute the function. Any function preference set via \fBcudaFuncSetCacheConfig (C API)\fP or \fBcudaFuncSetCacheConfig (C++ API)\fP will be preferred over this device-wide setting. Setting the device-wide cache configuration to \fBcudaFuncCachePreferNone\fP will cause subsequent kernel launches to prefer to not change the cache configuration unless required to launch the kernel.
.PP
This setting does nothing on devices where the size of the L1 cache and shared memory are fixed.
.PP
Launching a kernel with a different preference than the most recent preference setting may insert a device-side synchronization point.
.PP
The supported cache configurations are:
.IP "\(bu" 2
\fBcudaFuncCachePreferNone\fP: no preference for shared memory or L1 (default)
.IP "\(bu" 2
\fBcudaFuncCachePreferShared\fP: prefer larger shared memory and smaller L1 cache
.IP "\(bu" 2
\fBcudaFuncCachePreferL1\fP: prefer larger L1 cache and smaller shared memory
.IP "\(bu" 2
\fBcudaFuncCachePreferEqual\fP: prefer equal size L1 cache and shared memory
.PP
.PP
\fBParameters:\fP
.RS 4
\fIcacheConfig\fP - Requested cache configuration
.RE
.PP
\fBReturns:\fP
.RS 4
\fBcudaSuccess\fP, \fBcudaErrorInitializationError\fP 
.RE
.PP
\fBNote:\fP
.RS 4
Note that this function may also return error codes from previous, asynchronous launches.
.RE
.PP
\fBSee also:\fP
.RS 4
\fBcudaDeviceGetCacheConfig\fP, \fBcudaFuncSetCacheConfig (C API)\fP, \fBcudaFuncSetCacheConfig (C++ API)\fP 
.RE
.PP
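Setting and reading back the device-wide preference is a two-call pattern; a sketch (untested; on devices with a fixed L1/shared split the set call is accepted but the readback reports \fBcudaFuncCachePreferNone\fP):
.PP
```cuda
#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    /* Express a device-wide preference for a larger L1 cache. */
    cudaDeviceSetCacheConfig(cudaFuncCachePreferL1);

    /* Read back what the runtime recorded for the current device. */
    enum cudaFuncCache cfg;
    cudaDeviceGetCacheConfig(&cfg);
    printf("preference honored: %s\n",
           cfg == cudaFuncCachePreferL1 ? "yes" : "no (fixed-size caches)");
    return 0;
}
```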

.SS "\fBcudaError_t\fP cudaDeviceSetLimit (enum \fBcudaLimit\fP limit, size_t value)"
.PP
Setting \fClimit\fP to \fCvalue\fP is a request by the application to update the current limit maintained by the device. The driver is free to modify the requested value to meet h/w requirements (this could be clamping to minimum or maximum values, rounding up to nearest element size, etc). The application can use \fBcudaDeviceGetLimit()\fP to find out exactly what the limit has been set to.
.PP
Setting each \fBcudaLimit\fP has its own specific restrictions, so each is discussed here.
.PP
.IP "\(bu" 2
\fBcudaLimitStackSize\fP controls the stack size in bytes of each GPU thread. This limit is only applicable to devices of compute capability 2.0 and higher. Attempting to set this limit on devices of compute capability less than 2.0 will result in the error \fBcudaErrorUnsupportedLimit\fP being returned.
.PP
.PP
.IP "\(bu" 2
\fBcudaLimitPrintfFifoSize\fP controls the size in bytes of the shared FIFO used by the printf() and fprintf() device system calls. Setting \fBcudaLimitPrintfFifoSize\fP must be performed before launching any kernel that uses the printf() or fprintf() device system calls, otherwise \fBcudaErrorInvalidValue\fP will be returned. This limit is only applicable to devices of compute capability 2.0 and higher. Attempting to set this limit on devices of compute capability less than 2.0 will result in the error \fBcudaErrorUnsupportedLimit\fP being returned.
.PP
.PP
.IP "\(bu" 2
\fBcudaLimitMallocHeapSize\fP controls the size in bytes of the heap used by the malloc() and free() device system calls. Setting \fBcudaLimitMallocHeapSize\fP must be performed before launching any kernel that uses the malloc() or free() device system calls, otherwise \fBcudaErrorInvalidValue\fP will be returned. This limit is only applicable to devices of compute capability 2.0 and higher. Attempting to set this limit on devices of compute capability less than 2.0 will result in the error \fBcudaErrorUnsupportedLimit\fP being returned.
.PP
.PP
.IP "\(bu" 2
\fBcudaLimitDevRuntimeSyncDepth\fP controls the maximum nesting depth of a grid at which a thread can safely call \fBcudaDeviceSynchronize()\fP. Setting this limit must be performed before any launch of a kernel that uses the device runtime and calls \fBcudaDeviceSynchronize()\fP above the default sync depth, two levels of grids. Calls to \fBcudaDeviceSynchronize()\fP will fail with error code \fBcudaErrorSyncDepthExceeded\fP if the limitation is violated. This limit can be set smaller than the default or up to the maximum launch depth of 24. When setting this limit, keep in mind that additional levels of sync depth require the runtime to reserve large amounts of device memory which can no longer be used for user allocations. If these reservations of device memory fail, \fBcudaDeviceSetLimit\fP will return \fBcudaErrorMemoryAllocation\fP, and the limit can be reset to a lower value. This limit is only applicable to devices of compute capability 3.5 and higher. Attempting to set this limit on devices of compute capability less than 3.5 will result in the error \fBcudaErrorUnsupportedLimit\fP being returned.
.PP
.PP
.IP "\(bu" 2
\fBcudaLimitDevRuntimePendingLaunchCount\fP controls the maximum number of outstanding device runtime launches that can be made from the current device. A grid is outstanding from the point of launch up until the grid is known to have been completed. Device runtime launches which violate this limitation fail and return \fBcudaErrorLaunchPendingCountExceeded\fP when \fBcudaGetLastError()\fP is called after launch. If more pending launches than the default (2048 launches) are needed for a module using the device runtime, this limit can be increased. Keep in mind that being able to sustain additional pending launches will require the runtime to reserve larger amounts of device memory upfront which can no longer be used for allocations. If these reservations fail, \fBcudaDeviceSetLimit\fP will return \fBcudaErrorMemoryAllocation\fP, and the limit can be reset to a lower value. This limit is only applicable to devices of compute capability 3.5 and higher. Attempting to set this limit on devices of compute capability less than 3.5 will result in the error \fBcudaErrorUnsupportedLimit\fP being returned.
.PP
.PP
\fBParameters:\fP
.RS 4
\fIlimit\fP - Limit to set 
.br
\fIvalue\fP - Size of limit
.RE
.PP
\fBReturns:\fP
.RS 4
\fBcudaSuccess\fP, \fBcudaErrorUnsupportedLimit\fP, \fBcudaErrorInvalidValue\fP, \fBcudaErrorMemoryAllocation\fP 
.RE
.PP
\fBNote:\fP
.RS 4
Note that this function may also return error codes from previous, asynchronous launches.
.RE
.PP
\fBSee also:\fP
.RS 4
\fBcudaDeviceGetLimit\fP 
.RE
.PP
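Because the driver may round or clamp the requested value, a set should be followed by a readback. A sketch (untested; the 8 MB figure is an arbitrary example, and the call must precede any kernel that uses device-side malloc()/free()):
.PP
```cuda
#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    /* Request an 8 MB device malloc() heap before any such kernel runs. */
    cudaError_t err = cudaDeviceSetLimit(cudaLimitMallocHeapSize, 8u << 20);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaDeviceSetLimit: %s\n", cudaGetErrorString(err));
        return 1;
    }

    /* The driver may have rounded the request; read back the actual value. */
    size_t actual = 0;
    cudaDeviceGetLimit(&actual, cudaLimitMallocHeapSize);
    printf("device malloc heap is now %zu bytes\n", actual);
    return 0;
}
```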

.SS "\fBcudaError_t\fP cudaDeviceSetSharedMemConfig (enum \fBcudaSharedMemConfig\fP config)"
.PP
On devices with configurable shared memory banks, this function will set the shared memory bank size which is used for all subsequent kernel launches. Any per-function setting of shared memory set via \fBcudaFuncSetSharedMemConfig\fP will override the device wide setting.
.PP
Changing the shared memory configuration between launches may introduce a device side synchronization point.
.PP
Changing the shared memory bank size will not increase shared memory usage or affect occupancy of kernels, but may have major effects on performance. Larger bank sizes will allow for greater potential bandwidth to shared memory, but will change what kinds of accesses to shared memory will result in bank conflicts.
.PP
This function will do nothing on devices with fixed shared memory bank size.
.PP
The supported bank configurations are:
.IP "\(bu" 2
cudaSharedMemBankSizeDefault: set bank width to the device default (currently, four bytes)
.IP "\(bu" 2
cudaSharedMemBankSizeFourByte: set shared memory bank width to be four bytes natively.
.IP "\(bu" 2
cudaSharedMemBankSizeEightByte: set shared memory bank width to be eight bytes natively.
.PP
.PP
\fBParameters:\fP
.RS 4
\fIconfig\fP - Requested cache configuration
.RE
.PP
\fBReturns:\fP
.RS 4
\fBcudaSuccess\fP, \fBcudaErrorInvalidValue\fP, \fBcudaErrorInitializationError\fP 
.RE
.PP
\fBNote:\fP
.RS 4
Note that this function may also return error codes from previous, asynchronous launches.
.RE
.PP
\fBSee also:\fP
.RS 4
\fBcudaDeviceSetCacheConfig\fP, \fBcudaDeviceGetCacheConfig\fP, \fBcudaDeviceGetSharedMemConfig\fP, \fBcudaFuncSetCacheConfig\fP 
.RE
.PP
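A typical use is widening the banks before launching kernels dominated by double-precision shared-memory traffic. A sketch (untested; on fixed-bank-size devices the set call is a no-op and the readback reports the hardware's fixed width):
.PP
```cuda
#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    /* Prefer 8-byte banks so each 64-bit shared-memory access maps to a
       single bank; takes effect for subsequent kernel launches. */
    cudaDeviceSetSharedMemConfig(cudaSharedMemBankSizeEightByte);

    enum cudaSharedMemConfig cfg;
    cudaDeviceGetSharedMemConfig(&cfg);
    printf("shared memory bank width: %s\n",
           cfg == cudaSharedMemBankSizeEightByte ? "eight bytes" : "four bytes");
    return 0;
}
```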

.SS "__cudart_builtin__ \fBcudaError_t\fP cudaDeviceSynchronize (void)"
.PP
Blocks until the device has completed all preceding requested tasks. \fBcudaDeviceSynchronize()\fP returns an error if one of the preceding tasks has failed. If the \fBcudaDeviceScheduleBlockingSync\fP flag was set for this device, the host thread will block until the device has finished its work.
.PP
\fBReturns:\fP
.RS 4
\fBcudaSuccess\fP 
.RE
.PP
\fBNote:\fP
.RS 4
Note that this function may also return error codes from previous, asynchronous launches.
.RE
.PP
\fBSee also:\fP
.RS 4
\fBcudaDeviceReset\fP 
.RE
.PP
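A typical pattern is to launch work asynchronously and then synchronize once to surface any execution error (a sketch; \fCmyKernel\fP, its launch configuration and \fCdata\fP are hypothetical):
.PP
.nf
    myKernel<<<grid, block>>>(data);           /* returns immediately */
    cudaError_t err = cudaDeviceSynchronize(); /* blocks until done   */
    if (err != cudaSuccess)
        printf("kernel failed: %s\en", cudaGetErrorString(err));

.fi
.PP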

.SS "__cudart_builtin__ \fBcudaError_t\fP cudaGetDevice (int * device)"
.PP
Returns in \fC*device\fP the current device for the calling host thread.
.PP
\fBParameters:\fP
.RS 4
\fIdevice\fP - Returns the device on which the active host thread executes the device code.
.RE
.PP
\fBReturns:\fP
.RS 4
\fBcudaSuccess\fP 
.RE
.PP
\fBNote:\fP
.RS 4
Note that this function may also return error codes from previous, asynchronous launches.
.RE
.PP
\fBSee also:\fP
.RS 4
\fBcudaGetDeviceCount\fP, \fBcudaSetDevice\fP, \fBcudaGetDeviceProperties\fP, \fBcudaChooseDevice\fP 
.RE
.PP
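A common use is to save and later restore the caller's current device around work on another device (a sketch; device 1 is assumed to exist):
.PP
.nf
    int prev;
    cudaGetDevice(&prev);   /* remember the caller's device     */
    cudaSetDevice(1);       /* temporarily work on device 1     */
    /* ... allocate, launch ... */
    cudaSetDevice(prev);    /* restore the original device      */

.fi
.PP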

.SS "__cudart_builtin__ \fBcudaError_t\fP cudaGetDeviceCount (int * count)"
.PP
Returns in \fC*count\fP the number of devices with compute capability greater or equal to 1.0 that are available for execution. If there is no such device then \fBcudaGetDeviceCount()\fP will return \fBcudaErrorNoDevice\fP. If no driver can be loaded to determine if any such devices exist then \fBcudaGetDeviceCount()\fP will return \fBcudaErrorInsufficientDriver\fP.
.PP
\fBParameters:\fP
.RS 4
\fIcount\fP - Returns the number of devices with compute capability greater or equal to 1.0
.RE
.PP
\fBReturns:\fP
.RS 4
\fBcudaSuccess\fP, \fBcudaErrorNoDevice\fP, \fBcudaErrorInsufficientDriver\fP 
.RE
.PP
\fBNote:\fP
.RS 4
Note that this function may also return error codes from previous, asynchronous launches.
.RE
.PP
\fBSee also:\fP
.RS 4
\fBcudaGetDevice\fP, \fBcudaSetDevice\fP, \fBcudaGetDeviceProperties\fP, \fBcudaChooseDevice\fP 
.RE
.PP
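Device enumeration typically iterates from 0 to \fC*count\fP - 1 (a sketch):
.PP
.nf
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess)
        count = 0;   /* cudaErrorNoDevice or cudaErrorInsufficientDriver */
    for (int d = 0; d < count; ++d) {
        /* query or select device d */
    }

.fi
.PP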

.SS "__cudart_builtin__ \fBcudaError_t\fP cudaGetDeviceProperties (struct \fBcudaDeviceProp\fP * prop, int device)"
.PP
Returns in \fC*prop\fP the properties of device \fCdevice\fP. The \fBcudaDeviceProp\fP structure is defined as: 
.PP
.nf
    struct cudaDeviceProp {
        char name[256];
        size_t totalGlobalMem;
        size_t sharedMemPerBlock;
        int regsPerBlock;
        int warpSize;
        size_t memPitch;
        int maxThreadsPerBlock;
        int maxThreadsDim[3];
        int maxGridSize[3];
        int clockRate;
        size_t totalConstMem;
        int major;
        int minor;
        size_t textureAlignment;
        size_t texturePitchAlignment;
        int deviceOverlap;
        int multiProcessorCount;
        int kernelExecTimeoutEnabled;
        int integrated;
        int canMapHostMemory;
        int computeMode;
        int maxTexture1D;
        int maxTexture1DMipmap;
        int maxTexture1DLinear;
        int maxTexture2D[2];
        int maxTexture2DMipmap[2];
        int maxTexture2DLinear[3];
        int maxTexture2DGather[2];
        int maxTexture3D[3];
        int maxTexture3DAlt[3];
        int maxTextureCubemap;
        int maxTexture1DLayered[2];
        int maxTexture2DLayered[3];
        int maxTextureCubemapLayered[2];
        int maxSurface1D;
        int maxSurface2D[2];
        int maxSurface3D[3];
        int maxSurface1DLayered[2];
        int maxSurface2DLayered[3];
        int maxSurfaceCubemap;
        int maxSurfaceCubemapLayered[2];
        size_t surfaceAlignment;
        int concurrentKernels;
        int ECCEnabled;
        int pciBusID;
        int pciDeviceID;
        int pciDomainID;
        int tccDriver;
        int asyncEngineCount;
        int unifiedAddressing;
        int memoryClockRate;
        int memoryBusWidth;
        int l2CacheSize;
        int maxThreadsPerMultiProcessor;
        int streamPrioritiesSupported;
        int globalL1CacheSupported;
        int localL1CacheSupported;
        size_t sharedMemPerMultiprocessor;
        int regsPerMultiprocessor;
        int managedMemSupported;
        int isMultiGpuBoard;
        int multiGpuBoardGroupID;
    }

.fi
.PP
 where:
.IP "\(bu" 2
\fBname[256]\fP is an ASCII string identifying the device;
.IP "\(bu" 2
\fBtotalGlobalMem\fP is the total amount of global memory available on the device in bytes;
.IP "\(bu" 2
\fBsharedMemPerBlock\fP is the maximum amount of shared memory available to a thread block in bytes;
.IP "\(bu" 2
\fBregsPerBlock\fP is the maximum number of 32-bit registers available to a thread block;
.IP "\(bu" 2
\fBwarpSize\fP is the warp size in threads;
.IP "\(bu" 2
\fBmemPitch\fP is the maximum pitch in bytes allowed by the memory copy functions that involve memory regions allocated through \fBcudaMallocPitch()\fP;
.IP "\(bu" 2
\fBmaxThreadsPerBlock\fP is the maximum number of threads per block;
.IP "\(bu" 2
\fBmaxThreadsDim[3]\fP contains the maximum size of each dimension of a block;
.IP "\(bu" 2
\fBmaxGridSize[3]\fP contains the maximum size of each dimension of a grid;
.IP "\(bu" 2
\fBclockRate\fP is the clock frequency in kilohertz;
.IP "\(bu" 2
\fBtotalConstMem\fP is the total amount of constant memory available on the device in bytes;
.IP "\(bu" 2
\fBmajor\fP, \fBminor\fP are the major and minor revision numbers defining the device's compute capability;
.IP "\(bu" 2
\fBtextureAlignment\fP is the alignment requirement; texture base addresses that are aligned to \fBtextureAlignment\fP bytes do not need an offset applied to texture fetches;
.IP "\(bu" 2
\fBtexturePitchAlignment\fP is the pitch alignment requirement for 2D texture references that are bound to pitched memory;
.IP "\(bu" 2
\fBdeviceOverlap\fP is 1 if the device can concurrently copy memory between host and device while executing a kernel, or 0 if not. Deprecated; use \fBasyncEngineCount\fP instead.
.IP "\(bu" 2
\fBmultiProcessorCount\fP is the number of multiprocessors on the device;
.IP "\(bu" 2
\fBkernelExecTimeoutEnabled\fP is 1 if there is a run time limit for kernels executed on the device, or 0 if not.
.IP "\(bu" 2
\fBintegrated\fP is 1 if the device is an integrated (motherboard) GPU and 0 if it is a discrete (card) component.
.IP "\(bu" 2
\fBcanMapHostMemory\fP is 1 if the device can map host memory into the CUDA address space for use with \fBcudaHostAlloc()\fP/\fBcudaHostGetDevicePointer()\fP, or 0 if not;
.IP "\(bu" 2
\fBcomputeMode\fP is the compute mode that the device is currently in. Available modes are as follows:
.IP "  \(bu" 4
cudaComputeModeDefault: Default mode - Device is not restricted and multiple threads can use \fBcudaSetDevice()\fP with this device.
.IP "  \(bu" 4
cudaComputeModeExclusive: Compute-exclusive mode - Only one thread will be able to use \fBcudaSetDevice()\fP with this device.
.IP "  \(bu" 4
cudaComputeModeProhibited: Compute-prohibited mode - No threads can use \fBcudaSetDevice()\fP with this device.
.IP "  \(bu" 4
cudaComputeModeExclusiveProcess: Compute-exclusive-process mode - Many threads in one process will be able to use \fBcudaSetDevice()\fP with this device. 
.br
 If \fBcudaSetDevice()\fP is called on an already occupied \fCdevice\fP with computeMode \fBcudaComputeModeExclusive\fP, \fBcudaErrorDeviceAlreadyInUse\fP will be immediately returned indicating the device cannot be used. When an occupied exclusive mode device is chosen with \fBcudaSetDevice\fP, all subsequent non-device management runtime functions will return \fBcudaErrorDevicesUnavailable\fP.
.PP

.IP "\(bu" 2
\fBmaxTexture1D\fP is the maximum 1D texture size.
.IP "\(bu" 2
\fBmaxTexture1DMipmap\fP is the maximum 1D mipmapped texture size.
.IP "\(bu" 2
\fBmaxTexture1DLinear\fP is the maximum 1D texture size for textures bound to linear memory.
.IP "\(bu" 2
\fBmaxTexture2D[2]\fP contains the maximum 2D texture dimensions.
.IP "\(bu" 2
\fBmaxTexture2DMipmap[2]\fP contains the maximum 2D mipmapped texture dimensions.
.IP "\(bu" 2
\fBmaxTexture2DLinear[3]\fP contains the maximum 2D texture dimensions for 2D textures bound to pitch linear memory.
.IP "\(bu" 2
\fBmaxTexture2DGather[2]\fP contains the maximum 2D texture dimensions if texture gather operations have to be performed.
.IP "\(bu" 2
\fBmaxTexture3D[3]\fP contains the maximum 3D texture dimensions.
.IP "\(bu" 2
\fBmaxTexture3DAlt[3]\fP contains the maximum alternate 3D texture dimensions.
.IP "\(bu" 2
\fBmaxTextureCubemap\fP is the maximum cubemap texture width or height.
.IP "\(bu" 2
\fBmaxTexture1DLayered[2]\fP contains the maximum 1D layered texture dimensions.
.IP "\(bu" 2
\fBmaxTexture2DLayered[3]\fP contains the maximum 2D layered texture dimensions.
.IP "\(bu" 2
\fBmaxTextureCubemapLayered[2]\fP contains the maximum cubemap layered texture dimensions.
.IP "\(bu" 2
\fBmaxSurface1D\fP is the maximum 1D surface size.
.IP "\(bu" 2
\fBmaxSurface2D[2]\fP contains the maximum 2D surface dimensions.
.IP "\(bu" 2
\fBmaxSurface3D[3]\fP contains the maximum 3D surface dimensions.
.IP "\(bu" 2
\fBmaxSurface1DLayered[2]\fP contains the maximum 1D layered surface dimensions.
.IP "\(bu" 2
\fBmaxSurface2DLayered[3]\fP contains the maximum 2D layered surface dimensions.
.IP "\(bu" 2
\fBmaxSurfaceCubemap\fP is the maximum cubemap surface width or height.
.IP "\(bu" 2
\fBmaxSurfaceCubemapLayered[2]\fP contains the maximum cubemap layered surface dimensions.
.IP "\(bu" 2
\fBsurfaceAlignment\fP specifies the alignment requirements for surfaces.
.IP "\(bu" 2
\fBconcurrentKernels\fP is 1 if the device supports executing multiple kernels within the same context simultaneously, or 0 if not. It is not guaranteed that multiple kernels will be resident on the device concurrently so this feature should not be relied upon for correctness;
.IP "\(bu" 2
\fBECCEnabled\fP is 1 if the device has ECC support turned on, or 0 if not.
.IP "\(bu" 2
\fBpciBusID\fP is the PCI bus identifier of the device.
.IP "\(bu" 2
\fBpciDeviceID\fP is the PCI device (sometimes called slot) identifier of the device.
.IP "\(bu" 2
\fBpciDomainID\fP is the PCI domain identifier of the device.
.IP "\(bu" 2
\fBtccDriver\fP is 1 if the device is using a TCC driver or 0 if not.
.IP "\(bu" 2
\fBasyncEngineCount\fP is 1 when the device can concurrently copy memory between host and device while executing a kernel. It is 2 when the device can concurrently copy memory between host and device in both directions and execute a kernel at the same time. It is 0 if neither of these is supported.
.IP "\(bu" 2
\fBunifiedAddressing\fP is 1 if the device shares a unified address space with the host and 0 otherwise.
.IP "\(bu" 2
\fBmemoryClockRate\fP is the peak memory clock frequency in kilohertz.
.IP "\(bu" 2
\fBmemoryBusWidth\fP is the memory bus width in bits.
.IP "\(bu" 2
\fBl2CacheSize\fP is L2 cache size in bytes.
.IP "\(bu" 2
\fBmaxThreadsPerMultiProcessor\fP is the maximum number of resident threads per multiprocessor.
.IP "\(bu" 2
\fBstreamPrioritiesSupported\fP is 1 if the device supports stream priorities, or 0 if it is not supported.
.IP "\(bu" 2
\fBglobalL1CacheSupported\fP is 1 if the device supports caching of globals in L1 cache, or 0 if it is not supported.
.IP "\(bu" 2
\fBlocalL1CacheSupported\fP is 1 if the device supports caching of locals in L1 cache, or 0 if it is not supported.
.IP "\(bu" 2
\fBsharedMemPerMultiprocessor\fP is the maximum amount of shared memory available to a multiprocessor in bytes; this amount is shared by all thread blocks simultaneously resident on a multiprocessor;
.IP "\(bu" 2
\fBregsPerMultiprocessor\fP is the maximum number of 32-bit registers available to a multiprocessor; this number is shared by all thread blocks simultaneously resident on a multiprocessor;
.IP "\(bu" 2
\fBmanagedMemSupported\fP is 1 if the device supports allocating managed memory, or 0 if it is not supported.
.IP "\(bu" 2
\fBisMultiGpuBoard\fP is 1 if the device is on a multi-GPU board (e.g. Gemini cards), and 0 if not;
.IP "\(bu" 2
\fBmultiGpuBoardGroupID\fP is a unique identifier for a group of devices associated with the same board. Devices on the same multi-GPU board will share the same identifier;
.PP
.PP
\fBParameters:\fP
.RS 4
\fIprop\fP - Properties for the specified device 
.br
\fIdevice\fP - Device number to get properties for
.RE
.PP
\fBReturns:\fP
.RS 4
\fBcudaSuccess\fP, \fBcudaErrorInvalidDevice\fP
.RE
.PP
\fBSee also:\fP
.RS 4
\fBcudaGetDeviceCount\fP, \fBcudaGetDevice\fP, \fBcudaSetDevice\fP, \fBcudaChooseDevice\fP, \fBcudaDeviceGetAttribute\fP 
.RE
.PP
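For example, a few commonly inspected fields can be printed for device 0 (a sketch):
.PP
.nf
    struct cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) == cudaSuccess)
        printf("%s: compute %d.%d, %d multiprocessors\en",
               prop.name, prop.major, prop.minor,
               prop.multiProcessorCount);

.fi
.PP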

.SS "\fBcudaError_t\fP cudaIpcCloseMemHandle (void * devPtr)"
.PP
Unmaps memory returned by \fBcudaIpcOpenMemHandle\fP. The original allocation in the exporting process as well as imported mappings in other processes will be unaffected.
.PP
Any resources used to enable peer access will be freed if this is the last mapping using them.
.PP
IPC functionality is restricted to devices with support for unified addressing on Linux operating systems.
.PP
\fBParameters:\fP
.RS 4
\fIdevPtr\fP - Device pointer returned by \fBcudaIpcOpenMemHandle\fP
.RE
.PP
\fBReturns:\fP
.RS 4
\fBcudaSuccess\fP, \fBcudaErrorMapBufferObjectFailed\fP, \fBcudaErrorInvalidResourceHandle\fP
.RE
.PP
\fBSee also:\fP
.RS 4
\fBcudaMalloc\fP, \fBcudaFree\fP, \fBcudaIpcGetEventHandle\fP, \fBcudaIpcOpenEventHandle\fP, \fBcudaIpcGetMemHandle\fP, \fBcudaIpcOpenMemHandle\fP 
.RE
.PP

.SS "\fBcudaError_t\fP cudaIpcGetEventHandle (\fBcudaIpcEventHandle_t\fP * handle, \fBcudaEvent_t\fP event)"
.PP
Takes as input a previously allocated event. This event must have been created with the \fBcudaEventInterprocess\fP and \fBcudaEventDisableTiming\fP flags set. This opaque handle may be copied into other processes and opened with \fBcudaIpcOpenEventHandle\fP to allow efficient hardware synchronization between GPU work in different processes.
.PP
After the event has been opened in the importing process, \fBcudaEventRecord\fP, \fBcudaEventSynchronize\fP, \fBcudaStreamWaitEvent\fP and \fBcudaEventQuery\fP may be used in either process. Performing operations on the imported event after the exported event has been freed with \fBcudaEventDestroy\fP will result in undefined behavior.
.PP
IPC functionality is restricted to devices with support for unified addressing on Linux operating systems.
.PP
\fBParameters:\fP
.RS 4
\fIhandle\fP - Pointer to a user allocated cudaIpcEventHandle in which to return the opaque event handle 
.br
\fIevent\fP - Event allocated with \fBcudaEventInterprocess\fP and \fBcudaEventDisableTiming\fP flags.
.RE
.PP
\fBReturns:\fP
.RS 4
\fBcudaSuccess\fP, \fBcudaErrorInvalidResourceHandle\fP, \fBcudaErrorMemoryAllocation\fP, \fBcudaErrorMapBufferObjectFailed\fP
.RE
.PP
\fBSee also:\fP
.RS 4
\fBcudaEventCreate\fP, \fBcudaEventDestroy\fP, \fBcudaEventSynchronize\fP, \fBcudaEventQuery\fP, \fBcudaStreamWaitEvent\fP, \fBcudaIpcOpenEventHandle\fP, \fBcudaIpcGetMemHandle\fP, \fBcudaIpcOpenMemHandle\fP, \fBcudaIpcCloseMemHandle\fP 
.RE
.PP
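The exporting and importing sides together look roughly as follows (a sketch; how the handle travels between processes, e.g. over a pipe, is up to the application, and \fCstream\fP is a hypothetical stream in the importing process):
.PP
.nf
    /* exporting process */
    cudaEvent_t ev;
    cudaEventCreateWithFlags(&ev,
        cudaEventDisableTiming | cudaEventInterprocess);
    cudaIpcEventHandle_t handle;
    cudaIpcGetEventHandle(&handle, ev);
    /* ... send handle to the other process ... */

    /* importing process */
    cudaEvent_t imported;
    cudaIpcOpenEventHandle(&imported, handle);
    cudaStreamWaitEvent(stream, imported, 0);

.fi
.PP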

.SS "\fBcudaError_t\fP cudaIpcGetMemHandle (\fBcudaIpcMemHandle_t\fP * handle, void * devPtr)"
.PP
Takes a pointer to the base of an existing device memory allocation created with \fBcudaMalloc\fP and exports it for use in another process. This is a lightweight operation and may be called multiple times on an allocation without adverse effects.
.PP
If a region of memory is freed with \fBcudaFree\fP and a subsequent call to \fBcudaMalloc\fP returns memory with the same device address, \fBcudaIpcGetMemHandle\fP will return a unique handle for the new memory.
.PP
IPC functionality is restricted to devices with support for unified addressing on Linux operating systems.
.PP
\fBParameters:\fP
.RS 4
\fIhandle\fP - Pointer to user allocated cudaIpcMemHandle to return the handle in. 
.br
\fIdevPtr\fP - Base pointer to previously allocated device memory
.RE
.PP
\fBReturns:\fP
.RS 4
\fBcudaSuccess\fP, \fBcudaErrorInvalidResourceHandle\fP, \fBcudaErrorMemoryAllocation\fP, \fBcudaErrorMapBufferObjectFailed\fP
.RE
.PP
\fBSee also:\fP
.RS 4
\fBcudaMalloc\fP, \fBcudaFree\fP, \fBcudaIpcGetEventHandle\fP, \fBcudaIpcOpenEventHandle\fP, \fBcudaIpcOpenMemHandle\fP, \fBcudaIpcCloseMemHandle\fP 
.RE
.PP

.SS "\fBcudaError_t\fP cudaIpcOpenEventHandle (\fBcudaEvent_t\fP * event, \fBcudaIpcEventHandle_t\fP handle)"
.PP
Opens an interprocess event handle exported from another process with \fBcudaIpcGetEventHandle\fP. This function returns a \fBcudaEvent_t\fP that behaves like a locally created event with the \fBcudaEventDisableTiming\fP flag specified. This event must be freed with \fBcudaEventDestroy\fP.
.PP
Performing operations on the imported event after the exported event has been freed with \fBcudaEventDestroy\fP will result in undefined behavior.
.PP
IPC functionality is restricted to devices with support for unified addressing on Linux operating systems.
.PP
\fBParameters:\fP
.RS 4
\fIevent\fP - Returns the imported event 
.br
\fIhandle\fP - Interprocess handle to open
.RE
.PP
\fBReturns:\fP
.RS 4
\fBcudaSuccess\fP, \fBcudaErrorMapBufferObjectFailed\fP, \fBcudaErrorInvalidResourceHandle\fP
.RE
.PP
\fBSee also:\fP
.RS 4
\fBcudaEventCreate\fP, \fBcudaEventDestroy\fP, \fBcudaEventSynchronize\fP, \fBcudaEventQuery\fP, \fBcudaStreamWaitEvent\fP, \fBcudaIpcGetEventHandle\fP, \fBcudaIpcGetMemHandle\fP, \fBcudaIpcOpenMemHandle\fP, \fBcudaIpcCloseMemHandle\fP 
.RE
.PP

.SS "\fBcudaError_t\fP cudaIpcOpenMemHandle (void ** devPtr, \fBcudaIpcMemHandle_t\fP handle, unsigned int flags)"
.PP
Maps memory exported from another process with \fBcudaIpcGetMemHandle\fP into the current device address space. For contexts on different devices \fBcudaIpcOpenMemHandle\fP can attempt to enable peer access between the devices as if the user called \fBcudaDeviceEnablePeerAccess\fP. This behavior is controlled by the \fBcudaIpcMemLazyEnablePeerAccess\fP flag. \fBcudaDeviceCanAccessPeer\fP can determine if a mapping is possible.
.PP
Contexts that may open cudaIpcMemHandles are restricted as follows: cudaIpcMemHandles from each device in a given process may be opened by at most one context per device in each other process.
.PP
Memory returned from \fBcudaIpcOpenMemHandle\fP must be freed with \fBcudaIpcCloseMemHandle\fP.
.PP
Calling \fBcudaFree\fP on an exported memory region before calling \fBcudaIpcCloseMemHandle\fP in the importing context will result in undefined behavior.
.PP
IPC functionality is restricted to devices with support for unified addressing on Linux operating systems.
.PP
\fBParameters:\fP
.RS 4
\fIdevPtr\fP - Returned device pointer 
.br
\fIhandle\fP - cudaIpcMemHandle to open 
.br
\fIflags\fP - Flags for this operation. Must be specified as \fBcudaIpcMemLazyEnablePeerAccess\fP
.RE
.PP
\fBReturns:\fP
.RS 4
\fBcudaSuccess\fP, \fBcudaErrorMapBufferObjectFailed\fP, \fBcudaErrorInvalidResourceHandle\fP, \fBcudaErrorTooManyPeers\fP
.RE
.PP
\fBNote:\fP
.RS 4
No guarantees are made about the address returned in \fC*devPtr\fP. In particular, multiple processes may not receive the same address for the same \fChandle\fP.
.RE
.PP
\fBSee also:\fP
.RS 4
\fBcudaMalloc\fP, \fBcudaFree\fP, \fBcudaIpcGetEventHandle\fP, \fBcudaIpcOpenEventHandle\fP, \fBcudaIpcGetMemHandle\fP, \fBcudaIpcCloseMemHandle\fP, \fBcudaDeviceEnablePeerAccess\fP, \fBcudaDeviceCanAccessPeer\fP 
.RE
.PP
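The full export/import lifetime looks roughly as follows (a sketch; transporting the handle between the two processes is up to the application):
.PP
.nf
    /* process A: export */
    float *d_buf;
    cudaMalloc((void**)&d_buf, 1024 * sizeof(float));
    cudaIpcMemHandle_t handle;
    cudaIpcGetMemHandle(&handle, d_buf);
    /* ... send handle to process B ... */

    /* process B: import, use, release */
    void *d_mapped;
    cudaIpcOpenMemHandle(&d_mapped, handle,
                         cudaIpcMemLazyEnablePeerAccess);
    /* ... read or write d_mapped ... */
    cudaIpcCloseMemHandle(d_mapped);

.fi
.PP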

.SS "\fBcudaError_t\fP cudaSetDevice (int device)"
.PP
Sets \fCdevice\fP as the current device for the calling host thread. Valid device IDs are 0 to (\fBcudaGetDeviceCount()\fP - 1).
.PP
Any device memory subsequently allocated from this host thread using \fBcudaMalloc()\fP, \fBcudaMallocPitch()\fP or \fBcudaMallocArray()\fP will be physically resident on \fCdevice\fP. Any host memory allocated from this host thread using \fBcudaMallocHost()\fP or \fBcudaHostAlloc()\fP or \fBcudaHostRegister()\fP will have its lifetime associated with \fCdevice\fP. Any streams or events created from this host thread will be associated with \fCdevice\fP. Any kernels launched from this host thread using the <<<>>> operator or \fBcudaLaunch()\fP will be executed on \fCdevice\fP.
.PP
This call may be made from any host thread, to any device, and at any time. This function will do no synchronization with the previous or new device, and should be considered a very low overhead call.
.PP
\fBParameters:\fP
.RS 4
\fIdevice\fP - Device on which the active host thread should execute the device code.
.RE
.PP
\fBReturns:\fP
.RS 4
\fBcudaSuccess\fP, \fBcudaErrorInvalidDevice\fP, \fBcudaErrorDeviceAlreadyInUse\fP 
.RE
.PP
\fBNote:\fP
.RS 4
Note that this function may also return error codes from previous, asynchronous launches.
.RE
.PP
\fBSee also:\fP
.RS 4
\fBcudaGetDeviceCount\fP, \fBcudaGetDevice\fP, \fBcudaGetDeviceProperties\fP, \fBcudaChooseDevice\fP 
.RE
.PP
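For example, per-device resources in a multi-GPU program are created by selecting each device in turn (a sketch):
.PP
.nf
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int d = 0; d < count; ++d) {
        cudaSetDevice(d);                  /* low overhead, no sync */
        float *buf;
        cudaMalloc((void**)&buf, 1 << 20); /* resident on device d  */
    }

.fi
.PP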

.SS "\fBcudaError_t\fP cudaSetDeviceFlags (unsigned int flags)"
.PP
Records \fCflags\fP as the flags to use when initializing the current device. If no device has been made current to the calling thread then \fCflags\fP will be applied to the initialization of any device initialized by the calling host thread, unless that device has had its initialization flags set explicitly by this or any host thread.
.PP
If the current device has been set and that device has already been initialized then this call will fail with the error \fBcudaErrorSetOnActiveProcess\fP. In this case it is necessary to reset the device using \fBcudaDeviceReset()\fP before its initialization flags may be set.
.PP
The two LSBs of the \fCflags\fP parameter can be used to control how the CPU thread interacts with the OS scheduler when waiting for results from the device.
.PP
.IP "\(bu" 2
\fBcudaDeviceScheduleAuto\fP: The default value if the \fCflags\fP parameter is zero. Uses a heuristic based on the number of active CUDA contexts in the process \fCC\fP and the number of logical processors in the system \fCP\fP. If \fCC\fP > \fCP\fP, then CUDA will yield to other OS threads when waiting for the device; otherwise CUDA will not yield while waiting for results and will actively spin on the processor.
.IP "\(bu" 2
\fBcudaDeviceScheduleSpin\fP: Instruct CUDA to actively spin when waiting for results from the device. This can decrease latency when waiting for the device, but may lower the performance of CPU threads if they are performing work in parallel with the CUDA thread.
.IP "\(bu" 2
\fBcudaDeviceScheduleYield\fP: Instruct CUDA to yield its thread when waiting for results from the device. This can increase latency when waiting for the device, but can increase the performance of CPU threads performing work in parallel with the device.
.IP "\(bu" 2
\fBcudaDeviceScheduleBlockingSync\fP: Instruct CUDA to block the CPU thread on a synchronization primitive when waiting for the device to finish work.
.IP "\(bu" 2
\fBcudaDeviceBlockingSync\fP: Instruct CUDA to block the CPU thread on a synchronization primitive when waiting for the device to finish work. 
.br
 \fBDeprecated:\fP This flag was deprecated as of CUDA 4.0 and replaced with \fBcudaDeviceScheduleBlockingSync\fP.
.IP "\(bu" 2
\fBcudaDeviceMapHost\fP: This flag must be set in order to allocate pinned host memory that is accessible to the device. If this flag is not set, \fBcudaHostGetDevicePointer()\fP will always return a failure code.
.IP "\(bu" 2
\fBcudaDeviceLmemResizeToMax\fP: Instruct CUDA to not reduce local memory after resizing local memory for a kernel. This can prevent thrashing by local memory allocations when launching many kernels with high local memory usage at the cost of potentially increased memory usage.
.PP
.PP
\fBParameters:\fP
.RS 4
\fIflags\fP - Parameters for device operation
.RE
.PP
\fBReturns:\fP
.RS 4
\fBcudaSuccess\fP, \fBcudaErrorInvalidDevice\fP, \fBcudaErrorSetOnActiveProcess\fP
.RE
.PP
\fBSee also:\fP
.RS 4
\fBcudaGetDeviceCount\fP, \fBcudaGetDevice\fP, \fBcudaGetDeviceProperties\fP, \fBcudaSetDevice\fP, \fBcudaSetValidDevices\fP, \fBcudaChooseDevice\fP 
.RE
.PP
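Because the flags only take effect at device initialization, they are typically set as one of the first CUDA calls in the process (a sketch):
.PP
.nf
    /* must precede device initialization in this process */
    cudaError_t err =
        cudaSetDeviceFlags(cudaDeviceScheduleBlockingSync |
                           cudaDeviceMapHost);
    if (err == cudaErrorSetOnActiveProcess) {
        /* the device was already initialized; reset it first */
    }
    cudaSetDevice(0);
    cudaFree(0);  /* force initialization with the recorded flags */

.fi
.PP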

.SS "\fBcudaError_t\fP cudaSetValidDevices (int * device_arr, int len)"
.PP
Sets a list of devices for CUDA execution in priority order using \fCdevice_arr\fP. The parameter \fClen\fP specifies the number of elements in the list. CUDA will try devices from the list sequentially until it finds one that works. If this function is not called, or if it is called with a \fClen\fP of 0, then CUDA will go back to its default behavior of trying devices sequentially from a default list containing all of the available CUDA devices in the system. If a specified device ID in the list does not exist, this function will return \fBcudaErrorInvalidDevice\fP. If \fClen\fP is not 0 and \fCdevice_arr\fP is NULL or if \fClen\fP exceeds the number of devices in the system, then \fBcudaErrorInvalidValue\fP is returned.
.PP
\fBParameters:\fP
.RS 4
\fIdevice_arr\fP - List of devices to try 
.br
\fIlen\fP - Number of devices in specified list
.RE
.PP
\fBReturns:\fP
.RS 4
\fBcudaSuccess\fP, \fBcudaErrorInvalidValue\fP, \fBcudaErrorInvalidDevice\fP 
.RE
.PP
\fBNote:\fP
.RS 4
Note that this function may also return error codes from previous, asynchronous launches.
.RE
.PP
\fBSee also:\fP
.RS 4
\fBcudaGetDeviceCount\fP, \fBcudaSetDevice\fP, \fBcudaGetDeviceProperties\fP, \fBcudaSetDeviceFlags\fP, \fBcudaChooseDevice\fP 
.RE
.PP
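For example, an application that prefers a particular device but can fall back to another might record a priority list (a sketch; devices 2 and 0 are assumed to exist):
.PP
.nf
    int order[2] = { 2, 0 };   /* try device 2 first, then device 0 */
    cudaSetValidDevices(order, 2);
    /* subsequent implicit device selection follows this order */

.fi
.PP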

.SH "Author"
.PP 
Generated automatically by Doxygen from the source code.