NVIDIA cuTENSOR
Project description
cuTENSOR is a high-performance CUDA library for tensor primitives.
Key Features
- Extensive mixed-precision support:
  - FP64 inputs with FP32 compute.
  - FP32 inputs with FP16, BF16, or TF32 compute.
  - Complex-times-real operations.
  - Conjugate (without transpose) support.
- Support for up to 64-dimensional tensors.
- Arbitrary data layouts.
- Trivially serializable data structures.
- Main computational routines:
  - Direct (i.e., transpose-free) tensor contractions.
    - Support for just-in-time (JIT) compilation of dedicated kernels.
  - Tensor reductions (including partial reductions).
  - Element-wise tensor operations:
    - Support for various activation functions.
    - Support for padding of the output tensor.
    - Arbitrary tensor permutations.
    - Conversion between different data types.
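To make the contraction and data-layout items concrete: cuTENSOR describes each tensor by its extents, its strides, and one mode label per dimension, and a contraction computes D = alpha * A * B + beta * C, summing over the modes that A and B share but C lacks. The pure-Python code below is a slow CPU reference for those semantics only, meant for checking tiny cases by hand. It does not call cuTENSOR, and the Tensor class and contract function are hypothetical names, not part of the library's API. Because every element is addressed through its strides, A can be stored row-major and B column-major with no explicit transpose, which is the idea behind the transpose-free contractions and arbitrary layouts listed above.

```python
from itertools import product


class Tensor:
    """Hypothetical host-side stand-in for a strided tensor (not a cuTENSOR type)."""

    def __init__(self, data, extents, strides, modes):
        assert len(extents) == len(strides) == len(modes)
        self.data = list(data)          # flat storage
        self.extents = tuple(extents)   # size of each dimension
        self.strides = tuple(strides)   # element step of each dimension
        self.modes = modes              # one character per dimension, e.g. "mk"

    def offset(self, index):
        """Flat offset of the element at index (a dict: mode label -> coordinate)."""
        return sum(index[m] * s for m, s in zip(self.modes, self.strides))


def contract(alpha, A, B, beta, C):
    """Reference D = alpha * sum_{contracted modes} A * B + beta * C.

    Contracted modes appear in A and B but not in C. Every other mode of A and B
    must appear in C (a mode in A, B and C acts as a batch mode). D uses C's layout.
    """
    extent = {}
    for t in (A, B, C):
        for m, e in zip(t.modes, t.extents):
            if extent.setdefault(m, e) != e:
                raise ValueError(f"extent mismatch for mode {m!r}")
    contracted = [m for m in A.modes if m in B.modes and m not in C.modes]
    for m in A.modes + B.modes:
        if m not in C.modes and m not in contracted:
            raise ValueError(f"mode {m!r} is neither contracted nor in C")

    D = list(C.data)
    # Loop over every output coordinate, then accumulate over contracted modes.
    for free in product(*(range(extent[m]) for m in C.modes)):
        index = dict(zip(C.modes, free))
        acc = 0
        for k in product(*(range(extent[m]) for m in contracted)):
            index.update(zip(contracted, k))
            acc += A.data[A.offset(index)] * B.data[B.offset(index)]
        pos = C.offset(index)
        D[pos] = alpha * acc + beta * C.data[pos]
    return D


# A: 2x3 row-major, B: 3x2 column-major, C: 2x2 row-major; no transpose needed.
A = Tensor([1, 2, 3, 4, 5, 6], extents=(2, 3), strides=(3, 1), modes="mk")
B = Tensor([7, 9, 11, 8, 10, 12], extents=(3, 2), strides=(1, 3), modes="kn")
C = Tensor([0, 0, 0, 0], extents=(2, 2), strides=(2, 1), modes="mn")
print(contract(1, A, B, 0, C))  # [58, 64, 139, 154]
```

In this sketch, a mode present in A, B, and C behaves as a batch mode, and any mode of A or B that is neither contracted nor present in C is rejected.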
Documentation
Please refer to https://docs.nvidia.com/cuda/cutensor/index.html for the cuTENSOR documentation.
Installation
The cuTENSOR wheel can be installed as follows:
pip install cutensor-cuXX
where XX is the CUDA major version (CUDA 11 and 12 are currently supported); for example, pip install cutensor-cu12 for CUDA 12. The package cutensor (without the -cuXX suffix) is deprecated. If you have cutensor installed, remove it (pip uninstall cutensor) before installing cutensor-cuXX.
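The package-naming rule and the deprecation note can be folded into a small helper. This is a sketch under two assumptions: you already know your CUDA version as a string such as "12.4" (for example from nvcc --version), and you want the pip commands returned rather than executed. The helper names are hypothetical; importlib.metadata is the standard-library module used to check whether the old cutensor distribution is present.

```python
import importlib.metadata

# CUDA major versions with cutensor-cuXX wheels, per the installation notes.
SUPPORTED_CUDA_MAJORS = (11, 12)


def wheel_for_cuda(cuda_version):
    """Map a CUDA version string such as "12.4" to the cutensor-cuXX package name."""
    major = int(cuda_version.split(".")[0])
    if major not in SUPPORTED_CUDA_MAJORS:
        raise ValueError(f"CUDA {major} is not covered by a cutensor-cuXX wheel")
    return f"cutensor-cu{major}"


def deprecated_cutensor_installed():
    """True if the deprecated 'cutensor' distribution (no -cuXX suffix) is installed."""
    try:
        importlib.metadata.version("cutensor")
    except importlib.metadata.PackageNotFoundError:
        return False
    return True


def install_commands(cuda_version):
    """Return the pip commands to run, removing the deprecated package first if present."""
    cmds = []
    if deprecated_cutensor_installed():
        cmds.append("pip uninstall cutensor")
    cmds.append(f"pip install {wheel_for_cuda(cuda_version)}")
    return cmds
```

For example, install_commands("12.4") returns ["pip install cutensor-cu12"] on a machine without the deprecated package, and prepends "pip uninstall cutensor" otherwise.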
Download files
Built Distributions
Hashes for cutensor_cu12-2.0.1-py3-none-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | b5ae232d4c4a422a6c17864a13ef5e99a1134b056dcaf9ad9e38afead8791e0b
MD5 | 93ae5963383bcabd5bc4d6d153ce7d49
BLAKE2b-256 | 66691351b43555c26c42d799228ffe5a2f2350b098e724ee9a6f8cce7d78aff7
Hashes for cutensor_cu12-2.0.1-py3-none-manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | c4345d99b3dba3cef0b44199b094e40c24f4cc14ae8f259addb9288d1cba5023
MD5 | b1160e3261ff3117de7ee785eeb2705e
BLAKE2b-256 | d1c5e5a0616154e03f72ed2e641cdea479bf246fb2e5fc66967c70d1c1493dd2
Hashes for cutensor_cu12-2.0.1-py3-none-manylinux2014_aarch64.whl
Algorithm | Hash digest
---|---
SHA256 | eceef4c91e4fd9d95bd4859de300074534cc1997c21b132446fc4be771f3e4fd
MD5 | 2f632b11b0f1f71d7a86b200fdcce9d9
BLAKE2b-256 | 447333bf1dfddf31ae8419d2af48537fcca8ac172e355f6a0e9be9b282c39e42
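The published digests let you check a wheel downloaded by hand before installing it. The sketch below recomputes a file's SHA256 with Python's hashlib and compares it with the value published for that file name. The function names are hypothetical, and only the three SHA256 digests from the hash tables for cutensor-cu12 2.0.1 are included; BLAKE2b-256 could be checked the same way with hashlib.blake2b(digest_size=32).

```python
import hashlib
import os

# SHA256 digests copied from the hash tables for cutensor-cu12 2.0.1.
EXPECTED_SHA256 = {
    "cutensor_cu12-2.0.1-py3-none-win_amd64.whl":
        "b5ae232d4c4a422a6c17864a13ef5e99a1134b056dcaf9ad9e38afead8791e0b",
    "cutensor_cu12-2.0.1-py3-none-manylinux2014_x86_64.whl":
        "c4345d99b3dba3cef0b44199b094e40c24f4cc14ae8f259addb9288d1cba5023",
    "cutensor_cu12-2.0.1-py3-none-manylinux2014_aarch64.whl":
        "eceef4c91e4fd9d95bd4859de300074534cc1997c21b132446fc4be771f3e4fd",
}


def sha256_of(path, chunk_size=1 << 20):
    """Hash the file in chunks so a large wheel is never read into memory at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_wheel(path):
    """Return True if the file's SHA256 matches the published digest for its name."""
    name = os.path.basename(path)
    expected = EXPECTED_SHA256.get(name)
    if expected is None:
        raise KeyError(f"no published digest for {name}")
    return sha256_of(path) == expected
```

pip can enforce the same check at install time: put cutensor-cu12==2.0.1 in a requirements file with one --hash=sha256:<digest> option per platform wheel, then run pip install --require-hashes -r requirements.txt.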