
src/arraymancer/laser/strided_iteration/foreach


Macros

macro forEach(args: varargs[untyped]): untyped

Parallel iteration over one or more tensors.

Format: forEach x in a, y in b, z in c: x += y * z

The iteration strategy is selected at runtime depending on the tensors' memory layout. If you know at compile time that the tensors are contiguous or strided, use forEachContiguous or forEachStrided instead. Runtime selection requires duplicating the code body.

In the contiguous case: the default threshold for parallelization is OMP_MEMORY_BOUND_GRAIN_SIZE = 1024 elementwise operations to process per core.

The compiler is also hinted to unroll the loop for SIMD vectorization.

Otherwise, if a tensor is strided: the default threshold for parallelization is OMP_MEMORY_BOUND_GRAIN_SIZE div OMP_NON_CONTIGUOUS_SCALE_FACTOR = 1024/4 = 256 elementwise operations to process per core.

Use forEachStaged to fine-tune this default.
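
As a minimal sketch of the format above, assuming that `import arraymancer` exposes both `forEach` and `toTensor` (if it does not in your version, the module path shown at the top of this page would need to be imported directly), the following updates `a` in place from `b` and `c`. The tensor names and values are illustrative only.

```nim
import arraymancer

# Illustrative inputs: three tensors of identical shape.
var a = [1.0, 2.0, 3.0, 4.0].toTensor
let b = [10.0, 20.0, 30.0, 40.0].toTensor
let c = [2.0, 2.0, 2.0, 2.0].toTensor

# x, y and z name the current element of a, b and c respectively.
# The body runs once per element; assignments to x are stored in a.
forEach x in a, y in b, z in c:
  x += y * z

echo a
```

With only four elements this is far below the 1024-operation default threshold, so the loop would be expected to run on a single thread even when OpenMP is enabled.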

macro forEachContiguous(args: varargs[untyped]): untyped

Parallel iteration over one or more contiguous tensors.

Format: forEachContiguous x in a, y in b, z in c: x += y * z

The default threshold for parallelization is OMP_MEMORY_BOUND_GRAIN_SIZE = 1024 elementwise operations to process per core.

The compiler is also hinted to unroll the loop for SIMD vectorization.

Use forEachStaged to fine-tune those defaults.

macro forEachContiguousSerial(args: varargs[untyped]): untyped

Serial iteration over one or more contiguous tensors.

Format: forEachContiguousSerial x in a, y in b, z in c: x += y * z

macro forEachSerial(args: varargs[untyped]): untyped

Serial iteration over one or more tensors.

Format: forEachSerial x in a, y in b, z in c: x += y * z

OpenMP parameters are ignored.

The iteration strategy is selected at runtime depending on the tensors' memory layout. If you know at compile time that the tensors are contiguous or strided, use forEachContiguousSerial or forEachStridedSerial instead. Runtime selection requires duplicating the code body.
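
One situation where a serial variant may be preferable is a loop body that updates state declared outside the loop, which would not be safe to share across threads. The sketch below assumes that `import arraymancer` exposes `forEachSerial` and `toTensor`; the names `t` and `total` are illustrative.

```nim
import arraymancer

let t = [1.0, 2.0, 3.0, 4.0].toTensor
var total = 0.0

# Serial iteration: the body updates a variable declared outside
# the loop, which is safe here because no threads are involved.
forEachSerial x in t:
  total += x

echo total
```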

macro forEachStrided(args: varargs[untyped]): untyped

Parallel iteration over one or more tensors of unknown strides, for example those resulting from most slices.

Format: forEachStrided x in a, y in b, z in c: x += y * z

The default threshold for parallelization is OMP_MEMORY_BOUND_GRAIN_SIZE div OMP_NON_CONTIGUOUS_SCALE_FACTOR = 1024/4 = 256 elementwise operations to process per core.

Use forEachStaged to fine-tune this default.
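
A minimal sketch of the strided case, assuming that `import arraymancer` exposes `forEachStrided`, `toTensor` and the `_` slicing syntax. The slice is bound to a name (`col`, illustrative) before being passed to the macro.

```nim
import arraymancer

var m = [[1, 2], [3, 4], [5, 6]].toTensor

# Taking the second column yields a tensor whose elements are not
# adjacent in memory, so the strided variant is used for it.
var col = m[_, 1]

forEachStrided x in col:
  x *= 10

echo col
```

Whether `m` itself observes the change depends on the slice sharing storage with `m`, which is assumed here rather than confirmed by this page.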

macro forEachStridedSerial(args: varargs[untyped]): untyped

Serial iteration over one or more tensors of unknown strides, for example those resulting from most slices.

Format: forEachStridedSerial x in a, y in b, z in c: x += y * z
