NVIDIA CUDA Toolkit Release Notes
---------------------------------


1. CUDA Toolkit Overview
------------------------

This section provides an overview of the system requirements
and major components of the CUDA Toolkit and points to
component locations after installation.

System Requirements
      The CUDA Toolkit is supported for Linux, Mac OS X, and
      Microsoft Windows. Specific system requirements are
      referenced below.

        * Linux: The latest information about support for the
          Linux platform can be found online at
          http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-linux/index.html.

        * Mac OS: The latest information about support for Mac
          OS X can be found online at
          http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-mac-os-x/index.html.

        * Windows: The latest information about support for
          Microsoft Windows can be found online at
          http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-microsoft-windows/index.html.

Compiler
      The CUDA-C and CUDA-C++ compiler, nvcc, is found in the
      bin/ directory. It is built on top of the NVVM
      optimizer, which is itself built on top of the LLVM
      compiler infrastructure. Developers who want to target
      NVVM directly can do so using the Compiler SDK, which is
      available in the nvvm/ directory.

Tools 
      The following development tools are available in the
      bin/ directory (except for Nsight Visual Studio Edition
      (VSE), which is installed as a plug-in to Microsoft
      Visual Studio).

        * IDEs: nsight (Linux, Mac OS), Nsight VSE (Windows)

        * Debuggers: cuda-memcheck, cuda-gdb (Linux, Mac OS),
          Nsight VSE (Windows)

        * Profilers: nvprof, nvvp, Nsight VSE (Windows)

        * Utilities: cuobjdump, nvdisasm, nvprune

Libraries
      The scientific and utility libraries listed below are
      available in the lib/ directory (DLLs on Windows are in
      bin/), and their interfaces are available in the
      include/ directory.

        * cublas (BLAS)

        * cublas_device (BLAS Kernel Interface)

        * cuda_occupancy (Kernel Occupancy Calculation [header
          file implementation])

        * cudadevrt (CUDA Device Runtime)

        * cudart (CUDA Runtime)

        * cufft (Fast Fourier Transform [FFT])

        * cupti (Profiling Tools Interface)

        * curand (Random Number Generation)

        * cusparse (Sparse Matrix)

        * npp (NVIDIA Performance Primitives [image and signal
          processing])

        * nvblas ("Drop-in" BLAS)

        * nvcuvid (CUDA Video Decoder [Windows, Linux])

        * thrust (Parallel Algorithm Library [header file
          implementation])

CUDA Samples
      Code samples that illustrate how to use various CUDA and
      library APIs are available in the samples/ directory on
      Linux and Mac OS, and are installed to
      C:\ProgramData\NVIDIA Corporation\CUDA Samples on
      Windows. On Linux and Mac OS, the samples/ directory is
      read-only and the samples must be copied to another
      location if they are to be modified. Further
      instructions can be found in the Getting Started Guides
      for Linux and Mac OS.

Documentation
      The most current version of these release notes can be
      found online at
      http://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html.

      Documentation, including Getting Started Guides,
      Programming Guides, API References, and Tools Guides,
      can be found in PDF form in the doc/pdf/ directory, or
      in HTML form at doc/html/index.html and online at
      http://docs.nvidia.com/cuda/index.html.

Other 
      The Open64 source files are controlled under terms of
      the GPL license. Current and previously released
      versions are located at
      ftp://download.nvidia.com/CUDAOpen64/.

      The CUDA-GDB source files are controlled under terms of
      the GPL license.

        * The source code for CUDA-GDB that shipped with CUDA
          5.5 and subsequent versions is located at
          https://github.com/NVIDIA/cuda-gdb.

        * The source code for CUDA-GDB that shipped with CUDA
          5.0 and previous versions is located at
          ftp://download.nvidia.com/CUDAOpen64/.


2. New Features
---------------


2.1. General CUDA

  * Added support for using __shfl intrinsics with all
    first-class types. User source code that already
    implements this feature should be guarded with
    (CUDA_VERSION <= 6000) so that it still compiles against
    CUDA 6.5.

  * On Linux, Xid 13 dmesg error reporting has been improved
    to provide more detail and also to indicate which of the
    various potential causes of the Xid 13 error was to blame.

  * The Linux .run installation now comes with an
    uninstallation script, uninstall_cuda_6.5.pl, to help with
    uninstalling the toolkit during conversions to Debian/RPM
    installations.

  * On Linux, stubs that applications can link against at
    build time have been added for each library. This removes
    the need to have the full library installed when building
    an application. In addition to the CUDA Toolkit libraries,
    a stub has been provided for the CUDA Driver library (libcuda.so).
    See the NVIDIA CUDA Getting Started Guide for Linux for
    details on how to use these stubs.
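
    As a minimal sketch of the CUDA_VERSION guard described in
    the first item above: CUDA_VERSION is defined by cuda.h
    (6000 for CUDA 6.0, 6050 for CUDA 6.5). The fallback
    definition and the names NEED_CUSTOM_TYPED_SHFL and
    kUseCustomTypedShfl are hypothetical, added only so that
    the sketch builds without the toolkit installed.

```cpp
// In a real project CUDA_VERSION comes from <cuda.h>. The fallback
// below is an assumption so this sketch compiles without the toolkit.
#ifndef CUDA_VERSION
#define CUDA_VERSION 6050  // value that cuda.h defines for CUDA 6.5
#endif

// Hypothetical project macro: 1 when the project's own typed shuffle
// overloads are still needed, 0 when the toolkit already provides them.
#if (CUDA_VERSION <= 6000)
#define NEED_CUSTOM_TYPED_SHFL 1
#else
#define NEED_CUSTOM_TYPED_SHFL 0
#endif

// Compile-time flag that the rest of the (hypothetical) code can test.
constexpr bool kUseCustomTypedShfl = (NEED_CUSTOM_TYPED_SHFL == 1);
```

    User-provided shuffle overloads would then be wrapped in
    #if NEED_CUSTOM_TYPED_SHFL ... #endif so that they are
    compiled only against CUDA 6.0 and earlier.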


2.2. CUDA Tools


2.2.1. General CUDA Tools

  * Improved support for CUDA FORTRAN in the command-line
    debugging and profiling tools in the CUDA Toolkit,
    including new debugging support for FORTRAN arrays (in
    Linux only), improved source-to-assembly code correlation,
    and improved documentation. This improved support is
    available with PGI compiler version 14.4 and higher. CUDA
    FORTRAN support is a beta feature in the CUDA 6.5 release.


2.2.2. CUDA Compiler

  * (Windows) Support has been added for the C++ compiler (VC
    12) in Microsoft Visual Studio 2013 for Windows.

  * The default target GPU architecture (-arch) for nvcc has
    changed from sm_10 in previous releases to sm_20 in this
    release. Note that sm_20 is not the minimum target
    architecture supported by nvcc, since sm_11, sm_12, and
    sm_13 are still valid target GPU architectures if
    specified explicitly.

  * A new tool in CUDA 6.5, nvprune, prunes an object to only
    contain the compiled code for the specified architectures
    (for example, selects only the sm_35 code for
    libcublas_static.a). See the CUDA Binary Utilities
    document for more information.

  * (Linux) The cuobjdump utility for examining CUDA binaries
    is now available on Linux distributions running natively
    on the ARM architecture; this includes Android OS.


2.2.3. CUDA Occupancy Calculator

  * Added CUDA occupancy calculator and occupancy-based launch
    configuration API interfaces. These functions help set up
    execution configurations with reasonable occupancy.

    The stand-alone programmatic occupancy calculator
    implementation, cuda_occupancy.h, is rewritten and out of
    beta. Note that the API has changed significantly from the
    beta version included with CUDA 6.0. This file includes
    stand-alone implementations of both the occupancy
    calculator and the occupancy-based launch configuration
    functions, so applications can use them without depending
    on the entire CUDA software stack.


2.2.4. CUDA Profiling Tools Interface (CUPTI)

  * Instruction classification is done for the
    source-correlated instruction execution activity
    CUpti_ActivityInstructionExecution. See
    CUpti_ActivityInstructionClass for the instruction
    classes.

  * Two new device attributes were added to the activity
    CUpti_DeviceAttribute:

      * CUPTI_DEVICE_ATTR_FLOP_SP_PER_CYCLE. Peak
        single-precision floating-point operations that can be
        performed in one cycle by the device.

      * CUPTI_DEVICE_ATTR_FLOP_DP_PER_CYCLE. Peak
        double-precision floating-point operations that can be
        performed in one cycle by the device.

  * Two new metric device properties were added:

      * CUPTI_METRIC_PROPERTY_FLOP_SP. Peak single-precision
        floating-point operations that can be performed in one
        cycle by the device.

      * CUPTI_METRIC_PROPERTY_FLOP_DP. Peak double-precision
        floating-point operations that can be performed in one
        cycle by the device.

  * Activity record CUpti_ActivityGlobalAccess2 for
    source-level global access information replaces
    CUpti_ActivityGlobalAccess, which has been deprecated. The
    new record adds information needed to map SASS assembly
    instructions to CUDA C source code; it also provides ideal
    L2 transaction counts based on access patterns.

  * Activity record CUpti_ActivityBranch2 for source-level
    branch information replaces CUpti_ActivityBranch, which
    has been deprecated. The new record adds information
    needed to map SASS assembly instructions to CUDA C source
    code.

  * Added a new sample to show how to map SASS assembly
    instructions to CUDA C source lines.


2.2.5. NVIDIA Visual Profiler

  * Visual Profiler now displays peak single-precision flops
    and peak double-precision flops for a GPU under Device
    properties.

  * The Visual Profiler Kernel profile analysis view has been
    updated with several enhancements.

      * Initially, the instruction with the maximum execution
        count is highlighted.

      * A bar is shown in the background of the counter value
        for the Exec Count column to make it easier to
        identify instructions with high execution counts.

      * The current assembly instruction block is highlighted
        using two horizontal lines around the block. Also,
        next and previous buttons have been added to move to
        the next or previous block of assembly instructions.

      * Syntax highlighting is done for the CUDA C source.

      * A tooltip describing each column has been added.

  * The Visual Profiler Kernel memory analysis view has been
    updated with several enhancements.

      * Added ECC overhead, which provides a count of memory
        transactions required for ECC.

      * For L2 cache, added a split of transactions for L1
        reads, L1 writes, texture reads, atomic reads, and
        noncoherent reads.

      * For L1 cache, added a count of atomic transactions.

  * Visual Profiler and nvprof now support a new application
    replay mode for collecting multiple events and metrics. In
    this mode, the application is run multiple times instead
    of using kernel replay. This is useful for cases when the
    kernel uses a large amount of device memory and the use of
    kernel replay is slow due to the high overhead of saving
    and restoring device memory for each kernel replay run. In
    Visual Profiler, this new application replay mode is
    enabled in the New Session dialog.


2.3. CUDA Libraries


2.3.1. General CUDA Libraries

  * Starting with the CUDA 6.5 release on Linux and Mac OS,
    the cuBLAS, cuSPARSE, cuFFT, cuRAND, and NPP libraries are
    provided as static libraries in addition to being provided
    as shared libraries. These new static libraries depend on
    a common thread abstraction layer library cuLIBOS (libculibos.a)
    that is now distributed as part of the toolkit.
    Consequently, cuLIBOS must be provided to the linker when
    at least one of these static libraries is being linked
    against. For example, on Linux, to compile an application
    using cuBLAS and cuRAND against the static versions of
    these libraries, the following command should be used:

    gcc myApp.c libcublas_static.a libcurand_static.a libculibos.a -o myApp

    Note that libculibos.a is not needed when the shared
    version of these libraries is used.


2.3.2. cuBLAS Library

  * The cublas<T>trsmBatched() routine no longer limits the m
    and n dimensions to 32. However, the routine is still
    intended to be used for matrices of relatively small size,
    for which the performance of calling cublas<T>trsm()
    multiple times would be limited by kernel launch overhead.
    Performance has also been significantly improved for
    n > 1.

  * The cuBLAS Library now offers the batched routines
    cublas<T>geqrfBatched() and cublas<T>gelsBatched(), which
    are respectively a batched QR factorization and a batched
    least-squares solver for over-determined systems.


2.3.3. cuFFT Library

  * User-specified routines can now operate directly on cuFFT
    input or output data. The new cufftXt*Callback() APIs are
    used to specify which user-defined routines will be called
    when each data point is loaded or stored by the cuFFT
    kernels, potentially reducing the overall number of
    accesses to device memory.

  * Starting with cuFFT in CUDA 6.5, single 2D or 3D FFTs on
    multiple GPUs can be performed without the need for
    transposing data between successive FFTs. In prior
    releases it was necessary to transpose the data before
    performing a second FFT on multiple GPUs.


2.3.4. cuSPARSE Library

  * The cuSPARSE Library added two new routines that support
    the BSR format, cusparse<T>bsrmm() and cusparse<T>bsrsm(),
    which are respectively the multiplication of a matrix in
    BSR format by a dense matrix, and the solve of a
    triangular matrix in BSR format against multiple
    right-hand sides.


3. Unsupported Features
-----------------------

The following features are officially unsupported in the
current release. Developers must employ alternative solutions
to these features in their software.

Windows XP 64-bit Edition Support
      With this release, CUDA no longer supports the 64-bit
      version of the Windows XP operating system, although
      CUDA on the 32-bit version of Windows XP is still
      supported. We recommend that developers and users of the
      64-bit version of Windows XP migrate to Windows 7 or
      Windows 8.1, which are supported in the current and
      future CUDA releases.

Windows Vista Support
      This CUDA release no longer supports the Windows Vista
      operating system. We recommend that users and developers
      migrate to Windows 7 or Windows 8.1, which are supported
      in the current and future releases.

Windows Server 2012 Support
      CUDA on the Windows Server 2012 operating system is not
      supported in this CUDA release. We recommend that users
      and developers migrate to Windows Server 2012 R2, which
      is supported in the current and future releases.

(Linux) Support for 32-bit Applications on x86-based Linux
Distributions
      Several portions of the CUDA Toolkit are no longer
      available for developing 32-bit applications on
      x86-based Linux distributions:

        * .deb installer packages for 32-bit CUDA Toolkit
          components

        * CUDA Toolkit scientific libraries, including cuBLAS,
          cuSPARSE, cuFFT, cuRAND, and NPP

        * Thrust

        * Quadro and Tesla products

        * Tesla (sm_1x) and Fermi (sm_2x) architectures

        * CUDA Samples

      The above list also applies to 32-bit components and
      32-bit rpm/deb packages on 64-bit x86-based Linux
      distributions. The 64-bit components are unaffected by
      these changes.

(Mac OS X) Support for 32-bit CUDA and OpenCL Applications on
Mac OS X
      Developing and running 32-bit CUDA and OpenCL
      applications on Mac OS X platforms is no longer
      supported in the CUDA Toolkit and in the CUDA Driver.
      Legacy 32-bit CUDA and OpenCL applications will not run
      on this version of the CUDA Driver on Mac OS X
      platforms.

Targeting sm_10 (G80) for CUDA Applications
      The CUDA Toolkit no longer supports the sm_10 target
      architecture (the G80 architecture) for CUDA and OpenCL
      applications.

CUDA Video Encoder (NVCUVENC)
      Building applications with the CUDA Video Encoder
      interface is no longer supported; however, the driver
      will continue to run applications built against this
      interface. We recommend using the NVIDIA Encoder API
      (NVENC), a newer video encoding interface that is
      available at
      https://developer.nvidia.com/nvidia-video-codec-sdk.


4. Deprecated Features
----------------------

The following features are deprecated in the current release
of the CUDA software. The features still work in the current
release, but their documentation may have been removed, and
they will become officially unsupported in a future release.
We recommend that developers employ alternative solutions to
these features in their software.

Tesla and Quadro Products and CUDA Toolkit on 32-bit Windows
Platforms
      Support for the CUDA Toolkit on 32-bit Windows platforms
      is deprecated, as is support for Tesla and Quadro
      products for the CUDA driver on 32-bit Windows
      platforms. Additionally, on 64-bit Windows platforms,
      support for the following features for 32-bit CUDA and
      OpenCL applications is deprecated from the CUDA driver
      and CUDA toolkit, as appropriate:

        * Tesla and Quadro products

        * CUDA Toolkit scientific libraries, including cuBLAS,
          cuSPARSE, cuFFT, cuRAND, and NPP

        * Thrust

        * CUDA samples

      This deprecation notice doesn't impact any 64-bit
      components.

Interop with IDirect3D9 objects on Microsoft Windows 7 and
Later 
      This release deprecates support for interop with
      IDirect3D9 objects on Windows 7 and later Microsoft
      operating systems. This applies to the cuD3D9*() and
      cuGraphicsD3D9RegisterResource() routines in the Driver
      API, as well as the corresponding cudaD3D9*() and
      cudaGraphicsD3D9RegisterResource() routines in the
      Runtime API. We recommend using IDirect3D9ex objects,
      which will work with these same routines, instead.

Linux RHEL 5 and CentOS 5 Support
      Support for CUDA on the RHEL 5 and CentOS 5 Linux
      distributions is deprecated in this CUDA release and
      will be dropped in a future release. We recommend that
      users and developers migrate to RHEL 6, which is
      supported in the current and future releases.

Support for sm_10, sm_11, sm_12, and sm_13 Architectures
      The sm_10 architecture is deprecated within the CUDA
      Driver, and the sm_11, sm_12, and sm_13 architectures
      are deprecated within the CUDA Toolkit and the CUDA
      Driver. Support for these architectures will be removed
      in the next major version of the CUDA Toolkit and
      Driver. Note that support for the sm_10 architecture has
      already been removed from the CUDA Toolkit.

Developing and Running 32-bit CUDA and OpenCL Applications on
x86 Linux Platforms
      Support for developing and running 32-bit CUDA and
      OpenCL applications on x86 Linux platforms is
      deprecated. This implies the following:

        * Support is currently still available in the toolkit
          and driver.

        * Support may be dropped from the toolkit in a future
          release, and similarly from the driver.

        * New features may not have support for 32-bit x86
          Linux applications.

        * This notice applies to running applications on a
          32-bit Linux kernel, and also to running 32-bit
          applications on a 64-bit Linux kernel.

        * This notice applies to x86 architectures only;
          32-bit Linux applications are still officially
          supported and are not deprecated on the ARM
          architecture.

CUPTI Activity Records
      Activity record CUpti_ActivityGlobalAccess for
      source-level global access information has been
      deprecated and replaced by the new activity record
      CUpti_ActivityGlobalAccess2. Activity record
      CUpti_ActivityBranch for source-level branch information
      has been deprecated and replaced by the new activity
      record CUpti_ActivityBranch2.


5. Performance Improvements
---------------------------


5.1. General CUDA

  * MPS performance has been improved: launch latency has
    been reduced from 7 to 5 microseconds, and
    launch-and-synchronize latency has been reduced from 35 to
    15 microseconds.


5.2. CUDA Libraries


5.2.1. CUDA Math Library

  * Performance has been increased for these single-precision
    functions in CUDA 6.5: acoshf(), asinhf(), atanf(),
    atan2f(), atanhf(), cyl_bessel_i0f(), cyl_bessel_i1f(),
    cbrtf(), coshf(), erfcf(), erfcinvf(), erfcxf(), erfinvf(),
    expf(), exp10f(), expm1f(), __fdiv_rd(), __fdiv_rn(),
    __fdiv_ru(), __fdiv_rz(), fmodf(), __frcp_rd(), __frcp_rn(),
    __frcp_ru(), __frcp_rz(), __frsqrt_rn(), hypotf(), logf(),
    log10f(), log1pf(), log2f(), normcdff(), normcdfinvf(),
    powf(), remainderf(), remquof(), rhypotf(), sincospif(),
    sinhf(), sinpif(), and tanhf(). Of these, atanf(), expf(),
    exp10f(), expm1f(), hypotf(), and rhypotf() show
    especially marked improvement.

  * Performance has been increased for these double-precision
    functions in CUDA 6.5: acosh(), asin(), asinh(), atan(),
    atanh(), cyl_bessel_i0(), cyl_bessel_i1(), cbrt(), cospi(),
    div(), erfc(), erfcx(), erfinv(), exp2(), fmod(), hypot(),
    log(), log10(), log1p(), log2(), normcdf(), pow(), rcbrt(),
    remainder(), remquo(), rhypot(), sincospi(), sinpi(), and
    tan(). Of these, acosh(), atan(), cbrt(), hypot(), and
    rhypot() show especially marked improvement.

  * Performance of the double-precision square root function,
    sqrt(), was significantly improved for GPUs with compute
    capability 2.0 and above.

  * Performance of the double-precision reciprocal square root
    function, rsqrt(), was significantly improved for GPUs
    with compute capability 2.0 and above.


6. Resolved Issues
------------------


6.1. General CUDA

  * (Linux) A driver packaging issue that forced users on
    Red Hat and Fedora to ensure that the
    xorg-x11-drv-nvidia-devel package was installed has been
    resolved.

  * The device memory heap size, set using
    cudaDeviceSetLimit(cudaLimitMallocHeapSize, *) or
    cuCtxSetLimit(CU_LIMIT_MALLOC_HEAP_SIZE, *), is no longer
    limited to a size of 4,294,967,296 (4 GB).


7. Known Issues
---------------


7.1. Linux on ARMv7 Specific Issues

  * Mapping host memory allocated outside of CUDA to device
    memory is not allowed on ARM; because of this,
    cudaHostRegister() is not supported by the CUDA driver on
    ARM platforms. If required, cudaHostAlloc() with the flag
    cudaHostAllocMapped can be used to allocate device-mapped
    host-accessible memory.


7.2. General CUDA

  * The cuda and gpu-deployment-kit packages must be installed
    by separate executions of yum. See the Linux Getting
    Started Guide for more details.

  * On openSUSE and SLES, X will fail to load if the CUDA
    Toolkit RPM packages are installed using relocation
    immediately following an installation of the cuda-drivers
    package (and its dependencies). Users should reboot in
    between the driver and toolkit installations. Executing
    nvidia-xconfig may rescue a system where X has failed to
    load in this situation.

  * The CUDA drivers may fail to install if the RPMFusion
    repository is enabled at the same time as the CUDA
    repository. When installing CUDA, the
    --disablerepo="rpmfusion-nonfree*" option should be used.
    For example, to install the cuda package:

    yum --disablerepo="rpmfusion-nonfree*" install cuda

  * (Mac OS) When CUDA applications are run on 2012 MacBook
    Pro models, allowing or forcing the system to go to sleep
    causes a system crash (kernel panic). To prevent the
    computer from automatically going to sleep, set the
    Computer Sleep option slider to Never in the Energy Saver
    pane of the System Preferences.

  * The CUDA reference manual incorrectly describes the type
    of CUdeviceptr as an unsigned int on all platforms. On
    64-bit platforms, a CUdeviceptr is an unsigned long long,
    not an unsigned int.

  * Peer access is disabled between two devices if either of
    them is in SLI mode.

  * On multi-GPU configurations without P2P support between
    any pair of devices that support Unified Memory, managed
    memory allocations are placed in zero-copy memory. When
    data is migrated, this results in lower performance than
    the default managed memory behavior. In certain cases, the
    environment variable CUDA_MANAGED_FORCE_DEVICE_ALLOC can
    be set to force managed allocations to be in device memory
    and to enable migration on these hardware configurations.
    Normally, using the environment variable
    CUDA_VISIBLE_DEVICES is recommended to restrict CUDA to
    only use those GPUs that have P2P support. Please refer to
    the environment variables section in the CUDA C
    Programming Guide for further details.
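
    To illustrate the CUdeviceptr item above, here is a minimal
    host-only sketch of why the width matters. The typedef
    names DevPtrAsDocumented and DevPtrOn64Bit and the helper
    functions are hypothetical stand-ins; the real CUdeviceptr
    typedef is provided by cuda.h. The sketch assumes the
    common case of a 32-bit unsigned int.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the two descriptions of CUdeviceptr.
typedef unsigned int       DevPtrAsDocumented;  // what the manual states
typedef unsigned long long DevPtrOn64Bit;       // actual 64-bit definition

// True when a 64-bit address survives storage in the 32-bit type.
constexpr bool round_trips_u32(std::uint64_t addr) {
    return static_cast<std::uint64_t>(
               static_cast<DevPtrAsDocumented>(addr)) == addr;
}

// True when a 64-bit address survives storage in the 64-bit type.
constexpr bool round_trips_u64(std::uint64_t addr) {
    return static_cast<std::uint64_t>(
               static_cast<DevPtrOn64Bit>(addr)) == addr;
}

static_assert(sizeof(DevPtrOn64Bit) >= 8,
              "unsigned long long holds at least 64 bits");
```

    Code that stores a device pointer in an unsigned int on a
    64-bit platform would silently lose the upper bits of any
    address at or above 4 GB, so the handle should always be
    declared as CUdeviceptr rather than as a hand-picked
    integer type.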


7.3. CUDA Tools


7.3.1. CUDA Compiler

  * (Mac OS) When Clang is used as the host compiler, 32-bit
    target compilation on OS X is not supported. This is
    because the Clang compiler doesn't support the
    -malign-double switch that the NVCC compiler needs to
    properly align double-precision structure fields when
    compiling for a 32-bit target (GCC does support this
    switch). Note that GCC is the default host compiler used
    by NVCC on OS X 10.8 and Clang is the default on OS X
    10.9.

  * The NVCC compiler doesn't accept Unicode characters in
    any filename or path provided as a command-line parameter.

  * A CUDA program may not compile correctly if a type or
    typedef T is private to a class or a structure, and at
    least one of the following is satisfied:

      * T is a parameter type for a __global__ function.

      * T is an argument type for a template instantiation of
        a __global__ function.

    This restriction will be fixed in a future release.

  * (Mac OS) The documentation on the -malign-double flag
    suggests using it to make the struct size the same
    between host and device code. This flag is now known to
    cause problems with other host libraries. The CUDA
    documentation will be updated to reflect this. The
    workaround for this issue is to manually add padding so
    that struct layouts are consistent between the host
    compiler and CUDA.
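
    The manual padding workaround might look like the
    following host-side sketch. The struct and member names
    are hypothetical. It assumes the usual 32-bit x86 GCC
    behavior of aligning double struct members to 4 bytes,
    whereas CUDA device code aligns them to 8 bytes.

```cpp
#include <cstddef>  // offsetof

// Without padding, a 32-bit x86 host compiler may place `value` at
// offset 4 (size 12), whereas device code places it at offset 8
// (size 16).
struct SampleUnpadded {
    int    id;
    double value;
};

// With one explicit 4-byte pad, `value` lands at offset 8 under either
// alignment rule, so both sides agree without -malign-double.
struct SamplePadded {
    int    id;
    int    pad_;   // manual padding, never read
    double value;
};

// Compile-time checks that hold on both 32-bit and 64-bit hosts.
static_assert(offsetof(SamplePadded, value) == 8,
              "value must sit at offset 8");
static_assert(sizeof(SamplePadded) == 16,
              "layout must be 16 bytes on host and device");
```

    Keeping the static_assert checks next to the struct makes
    any layout drift a build error instead of a silent data
    corruption at run time.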


7.3.2. CUDA-GDB

  * There can be a significant performance degradation for
    large routines when the debugger steps over inlined
    routines. This happens because inlined code blocks may
    have multiple exit points under the hood, and the debugger
    steps every single instruction until an exit point is
    reached.


7.3.3. Nsight Eclipse Edition

  * On Linux, the NVIDIA Visual Profiler (nvvp) and the Nsight
    IDE (nsight) do not run properly when the oxygen-gtk theme
    is used. If you experience crashes, please uninstall the
    oxygen-gtk theme. The command to do this on openSUSE is

    sudo zypper rm gtk2-theme-oxygen

    and on Ubuntu is

    sudo apt-get remove gtk2-engines-oxygen


7.3.4. NVIDIA Visual Profiler

  * (Windows) Using the mouse wheel button to scroll does not
    work within the Visual Profiler on Windows.

  * (Mac OS) Visual Profiler events and metrics do not work
    correctly on Mac OS X 10.9.3. Mac OS X 10.9.2 can be used
    as a workaround.


7.4. CUDA Libraries


7.4.1. cuFFT Library

  * In the CUDA 6.5 Early Access release, there are some
    limitations in the cuFFT callback implementation.

      * The static version of the cuFFT library is not
        supported on 32-bit Windows systems; consequently, the
        callback feature is not supported there either.

      * If the size of any dimension cannot be factored into a
        combination of powers of 2, 3, 5, and 7 (that is, the
        size has a prime factor of 11 or greater), the
        callback routine cannot safely call __syncthreads().

      * For 2D and 3D transforms, if the size of any dimension
        has a prime factor of 131 or greater,
        cufftUnsetCallback() does not function correctly.

      * For 2D and 3D C2C transforms, if any dimension has a
        prime factor of 131 or greater, the store() callback
        does not function correctly.

      * For multi-GPU C2R and R2C plans, callbacks are not
        supported if the batch size is greater than one and
        any dimension has a prime factor of 131 or greater.
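
    The prime-factor conditions in the list above can be
    checked ahead of time with a small host-side helper. This
    is a sketch, not part of cuFFT: the function names are
    hypothetical, and the thresholds (7 and 131) are taken
    from the limitations listed above. It assumes a C++14
    compiler and a positive transform size.

```cpp
// Largest prime factor of n by trial division; returns 1 for n <= 1.
constexpr unsigned long long largest_prime_factor(unsigned long long n) {
    unsigned long long largest = 1;
    for (unsigned long long p = 2; p * p <= n; ++p) {
        while (n % p == 0) {  // strip every copy of the factor p
            largest = p;
            n /= p;
        }
    }
    // Whatever remains above 1 is a prime larger than any factor found.
    return n > 1 ? n : largest;
}

// True when the size factors into powers of 2, 3, 5, and 7 only, the
// case in which a callback may call __syncthreads().
constexpr bool callback_may_syncthreads(unsigned long long size) {
    return largest_prime_factor(size) <= 7;
}

// True when the size has a prime factor of 131 or greater, the case
// covered by the remaining callback limitations.
constexpr bool has_large_prime_factor(unsigned long long size) {
    return largest_prime_factor(size) >= 131;
}
```

    An application could run these checks on each dimension
    before registering a callback and fall back to a separate
    kernel for sizes that hit one of the limitations.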


7.4.2. Thrust Library

  * (Linux) There is a known issue that causes the
    TestGetTemporaryBufferDispatchExplicit and
    TestGetTemporaryBufferDispatchImplicit unit tests provided
    with the Thrust library to fail on the SLES 11 Linux
    distribution.

  * (Linux) There is a known issue that causes the
    segmentationTreeThrust CUDA sample in the 6_Advanced
    directory to fail on the SLES 11 Linux distribution.


7.4.3. CUDA Samples

  * On 32-bit Windows systems, certain samples may fail to
    compile due to the compiler exhausting available memory,
    especially if compiling is done in Debug mode or if the
    sample is using Dynamic Parallelism.


Notices
-------


Notice

ALL NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES,
DRAWINGS, DIAGNOSTICS, LISTS, AND OTHER DOCUMENTS (TOGETHER
AND SEPARATELY, "MATERIALS") ARE BEING PROVIDED "AS IS."
NVIDIA MAKES NO WARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OR
OTHERWISE WITH RESPECT TO THE MATERIALS, AND EXPRESSLY
DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT,
MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE.

Information furnished is believed to be accurate and reliable.
However, NVIDIA Corporation assumes no responsibility for the
consequences of use of such information or for any
infringement of patents or other rights of third parties that
may result from its use. No license is granted by implication
or otherwise under any patent rights of NVIDIA Corporation.
Specifications mentioned in this publication are subject to
change without notice. This publication supersedes and
replaces all other information previously supplied. NVIDIA
Corporation products are not authorized as critical components
in life support devices or systems without express written
approval of NVIDIA Corporation.


Trademarks

NVIDIA and the NVIDIA logo are trademarks or registered
trademarks of NVIDIA Corporation in the U.S. and other
countries. Other company and product names may be trademarks
of the respective companies with which they are associated.


Copyright

© 2007-2014 NVIDIA Corporation. All rights reserved.


-------------------------------------------------------------