NVIDIA cuTENSOR
Project description
cuTENSOR is a high-performance CUDA library for tensor primitives.
Key Features
- Extensive mixed-precision support:
  - FP64 inputs with FP32 compute.
  - FP32 inputs with FP16, BF16, or TF32 compute.
  - Complex-times-real operations.
  - Conjugate (without transpose) support.
- Support for up to 64-dimensional tensors.
- Arbitrary data layouts.
- Trivially serializable data structures.
- Main computational routines:
  - Direct (i.e., transpose-free) tensor contractions.
    - Support for just-in-time compilation of dedicated kernels.
  - Tensor reductions (including partial reductions).
  - Element-wise tensor operations:
    - Support for various activation functions.
    - Support for padding of the output tensor.
    - Arbitrary tensor permutations.
    - Conversion between different data types.
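To make "direct (transpose-free) contraction" and "arbitrary data layouts" concrete, here is a minimal conceptual sketch in plain Python. It is not the cuTENSOR API: the helpers `offset` and `contract`, the mode labels, and the stride dictionaries are all hypothetical names chosen for illustration. Each tensor is a flat buffer plus per-mode strides, so the contraction reads elements in place instead of first transposing an operand into a matrix-friendly layout.

```python
from itertools import product


def offset(index, strides):
    """Linear offset of a multi-index in a flat buffer, given per-mode strides."""
    return sum(index[mode] * stride for mode, stride in strides.items())


def contract(a, a_strides, b, b_strides, c_strides, extents):
    """Compute C = sum over contracted modes of A * B (illustrative only).

    Every tensor is a flat list plus a dict {mode label: stride}. Modes that
    appear in `extents` but not in `c_strides` are summed over.
    """
    c_modes = list(c_strides)
    k_modes = [m for m in extents if m not in c_strides]
    # Size of the smallest flat buffer that can hold C with these strides.
    c_size = 1 + sum((extents[m] - 1) * c_strides[m] for m in c_modes)
    c = [0.0] * c_size
    for c_idx in product(*(range(extents[m]) for m in c_modes)):
        index = dict(zip(c_modes, c_idx))
        total = 0.0
        for k_idx in product(*(range(extents[m]) for m in k_modes)):
            index.update(zip(k_modes, k_idx))
            total += a[offset(index, a_strides)] * b[offset(index, b_strides)]
        c[offset(index, c_strides)] = total
    return c


# A[m,k] is 2x3 and row-major; B[k,n] is 3x2 but stored column-major.
extents = {"m": 2, "n": 2, "k": 3}
a = [1, 2, 3, 4, 5, 6]          # [[1, 2, 3], [4, 5, 6]]
b_col_major = [1, 3, 5, 2, 4, 6]  # logical [[1, 2], [3, 4], [5, 6]]
c = contract(a, {"m": 3, "k": 1},
             b_col_major, {"k": 1, "n": 3},
             {"m": 2, "n": 1}, extents)
print(c)
```

Only the stride dictionary changes when an operand uses a different layout; the data is never copied or reordered. A GPU library would implement this very differently, so treat the loop nest purely as a statement of what is being computed.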
Documentation
Please refer to https://docs.nvidia.com/cuda/cutensor/index.html for the cuTENSOR documentation.
Installation
The cuTENSOR wheel can be installed as follows:
pip install cutensor-cuXX
where XX is the CUDA major version (CUDA 11 and 12 are currently supported). The package cutensor (without the -cuXX suffix) is deprecated. If you have cutensor installed, remove it before installing cutensor-cuXX.
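Before installing, it can help to check whether the deprecated cutensor distribution is still present in the current environment. The sketch below uses the standard-library `importlib.metadata` module; the helper names `installed_cutensor_packages` and `needs_cleanup` are hypothetical, and the list of candidate names simply mirrors the CUDA 11 and 12 variants mentioned above.

```python
from importlib import metadata

# Distribution names to look for: the deprecated package and the two
# CUDA-specific variants named in the installation instructions.
CANDIDATES = ("cutensor", "cutensor-cu11", "cutensor-cu12")


def installed_cutensor_packages(names=CANDIDATES):
    """Return {distribution name: version} for each candidate that is installed."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass  # not installed in this environment
    return found


def needs_cleanup(found):
    """True if the deprecated, unsuffixed package is among the installed ones."""
    return "cutensor" in found


if __name__ == "__main__":
    found = installed_cutensor_packages()
    print("Installed:", found or "none")
    if needs_cleanup(found):
        print("Run 'pip uninstall cutensor' before installing cutensor-cuXX.")
```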
Download files
Built Distributions
Hashes for cutensor_cu11-2.0.1-py3-none-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | be1d277febf2d507d06349fd5455c56a11d9177906050969b41802795c991ba8
MD5 | d64ea710ff5b5e0ebc622a18735ffc2b
BLAKE2b-256 | 4249ebeeb12e73eab21aff1249922a32e4153aae5bcccfab718ef463591a3b53
Hashes for cutensor_cu11-2.0.1-py3-none-manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | a1777d1dba54fbab69df515717cb658f61338d550aec7cf868eb99bd091bee57
MD5 | e92503e1a7a70026218d59ba2c925a52
BLAKE2b-256 | ac360374fd785658de9e96258575554eeabf5606607ca18bf9439a9250db981f
Hashes for cutensor_cu11-2.0.1-py3-none-manylinux2014_aarch64.whl
Algorithm | Hash digest
---|---
SHA256 | c39ca2a10b541c7c1dfedd03103d145f4fdacaebfa1ed77e62307c2757390e0f
MD5 | bc524bd69850cf15f8ca10dd38d76b96
BLAKE2b-256 | 4308629a92a86b5a9c2d9587ca0de5f82f9bf5eceb3780df427ef86efd253858
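If you download a wheel manually, the SHA256 digests listed above can be checked locally. The following is a minimal sketch using the standard-library `hashlib` module; the function names `sha256_of` and `verify_wheel` are hypothetical, and the file path is assumed to point at a wheel you have already downloaded.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel(path, expected_sha256):
    """True if the file's SHA256 digest matches the published one."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

For example, for the x86_64 Linux wheel one would call `verify_wheel("cutensor_cu11-2.0.1-py3-none-manylinux2014_x86_64.whl", "a1777d1dba54fbab69df515717cb658f61338d550aec7cf868eb99bd091bee57")` and expect `True` for an intact download.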