Arraymancer - An n-dimensional tensor (ndarray) library.

Arraymancer is a tensor (N-dimensional array) project in Nim. The main focus is providing a fast and ergonomic CPU and GPU ndarray library on which to build a scientific computing and in particular a deep learning ecosystem.

The library is inspired by Numpy and PyTorch. The library provides ergonomics very similar to Numpy, Julia and Matlab but is fully parallel and significantly faster than those libraries. It is also faster than C-based Torch.

Note: While Nim is compiled and does not offer an interactive REPL yet (like Jupyter), it allows much faster prototyping than C++ due to extremely fast compilation times. Arraymancer compiles in about 5 seconds on my dual-core MacBook.

Why Arraymancer

The Python community is struggling to bring Numpy up-to-speed

Why not use a single language that has all the building blocks needed for the most efficient scientific computing library, with Python ergonomics?

OpenMP batteries included.

A researcher workflow is a fight against inefficiencies

Researchers in a heavy scientific computing domain often have the following workflow: Mathematica/Matlab/Python/R (prototyping) -> C/C++/Fortran (speed, memory)

Why not use a language as productive as Python and as fast as C? Code once, and don’t spend months redoing the same thing at a lower level.

Tools available in labs are not available in production:

Nim is compiled, so there is no need to worry about version conflicts or about the whole toolchain needing fixes after a Python version update. Furthermore, as long as your platform supports C, Arraymancer will run on it, from Raspberry Pi to mobile phones and drones.

Note: Arraymancer Cuda/Cudnn backend is shaping up and convolutional neural nets are just around the corner.

Bridging the gap between deep learning research and production

The deep learning frameworks are currently in two camps:

  • Research: Theano, Tensorflow, Keras, Torch, PyTorch

  • Production: Caffe, Darknet, (Tensorflow)

Furthermore, Python preprocessing steps, unless they use OpenCV, often need a custom implementation in production (think text/speech preprocessing on phones).

So why Arraymancer?

Addressing all those pain points may seem like a huge undertaking. However, thanks to the Nim language, Arraymancer can offer:

  • Speed on par with C

  • Accelerated routines with Intel MKL/OpenBLAS or even NNPACK

  • Access to CUDA and CuDNN, plus custom CUDA kernels generated on the fly via metaprogramming

  • A Python-like syntax with custom operators: a * b for tensor multiplication, instead of the method-call syntax used by Numpy/Tensorflow or Torch

  • Numpy-like slicing ergonomics: t[0..4, 2..10|2]

  • For everything that Nim doesn’t have yet, Nim bindings to C, C++, Objective-C or Javascript to bring it to Nim. Nim also has unofficial Python->Nim and Nim->Python wrappers.
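As a rough illustration of the operator and slicing syntax listed above, here is a minimal sketch. It assumes a recent Arraymancer release in which toTensor() takes no backend argument; the variable names and values are made up for the example.

```nim
import arraymancer

# Two small 2x2 matrices built from nested arrays.
let a = [[1.0, 2.0], [3.0, 4.0]].toTensor()
let b = [[5.0, 6.0], [7.0, 8.0]].toTensor()

# `*` on two rank-2 tensors is a matrix product, not an element-wise product.
let c = a * b
echo c

# A 3x4 integer tensor to slice.
let t = [[0, 1, 2, 3],
         [4, 5, 6, 7],
         [8, 9, 10, 11]].toTensor()

# Rows 0 to 1 (inclusive), columns 0 to 3 taking every second column.
let s = t[0..1, 0..3|2]
echo s
```

Note that Nim ranges are inclusive on both ends, so 0..1 selects two rows, and the |2 suffix is the step, much like the third field of a Numpy slice.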

Future ambitions

Because apparently to be successful you need a vision, I would like Arraymancer to be:

Support (Types, OS, Hardware)

Arraymancer’s tensors support arbitrary types (floats, strings, objects …).

Arraymancer runs anywhere you can compile C code. Linux and MacOS are supported. Windows should work too, as Appveyor (Continuous Integration for Windows) never flashes red. Optionally, you can compile Arraymancer with Cuda support.

Note: Arraymancer Tensors and CudaTensors are tensors in the machine learning sense (multidimensional arrays), not in the mathematical sense (objects that obey transformation laws).


EXPERIMENTAL: Arraymancer may summon Ragnarok and cause the heat death of the Universe.

  1. Display of 5-dimensional or more tensors is not implemented. (To be honest Christopher Nolan had the same issue in Interstellar)


Nim is available in some Linux repositories and on Homebrew for macOS.

However, I recommend installing Nim in your user profile via choosenim. Once choosenim has installed Nim, you can run nimble install arraymancer, which will pull Arraymancer and all its dependencies.

Tensors on CPU and on Cuda

Tensors and CudaTensors do not have the same features implemented yet. Also, CudaTensors can only hold float32 or float64 values, while Cpu Tensors can hold integers, strings, booleans or any custom object.

Here is a comparative table.

Action                                            | Tensor | CudaTensor                 | ClTensor
--------------------------------------------------|--------|----------------------------|---------------------------
Accessing tensor properties                       | [x]    | [x]                        | [x]
Tensor creation                                   | [x]    | by converting a cpu Tensor | by converting a cpu Tensor
Accessing or modifying a single value             | [x]    | []                         | []
Iterating on a Tensor                             | [x]    | []                         | []
Slicing a Tensor                                  | [x]    | [x]                        | [x]
Slice mutation a[1,_] = 10                        | [x]    | []                         | []
Comparison ==                                     | [x]    | []                         | []
Element-wise basic operations                     | [x]    | [x]                        | [x]
Universal functions                               | [x]    | []                         | []
Automatically broadcasted operations              | [x]    | [x]                        | [x]
Matrix-Matrix and Matrix vector multiplication    | [x]    | [x]                        | [x]
Displaying a tensor                               | [x]    | [x]                        | [x]
Higher-order functions (map, apply, reduce, fold) | [x]    | internal only              | internal only
Converting to contiguous                          | [x]    | [x]                        | []
Explicit broadcast                                | [x]    | [x]                        | [x]
Permuting dimensions                              | [x]    | []                         | []
Concatenating along existing dimensions           | [x]    | []                         | []
Squeezing singleton dimensions                    | [x]    | [x]                        | []
Slicing + squeezing in one operation              | [x]    | []                         | []
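
A few of the CPU-only rows in the table (single-value modification, slice mutation and comparison) could look like the sketch below. It assumes a recent Arraymancer release and a mutable (var) Cpu Tensor; the names a and expected are placeholders chosen for the example.

```nim
import arraymancer

# A mutable 3x3 Cpu Tensor of integers.
var a = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]].toTensor()

# Modify a single value at row 0, column 0.
a[0, 0] = 42

# Slice mutation: set every element of row 1 to 10.
# `_` spans the whole dimension.
a[1, _] = 10

# Comparison with == checks the two tensors element by element
# and yields a single bool.
let expected = [[42, 2, 3],
                [10, 10, 10],
                [7, 8, 9]].toTensor()
echo a == expected
```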