Macros
macro forEachStaged(args: varargs[untyped]): untyped
Staged, optionally parallel iteration over one or more tensors.
This is useful if you need thread-local initialization before the parallel loop or thread-local cleanup after it.
Example usage for a reduction:
forEachStaged xi in x, yi in y:
  openmp_config:
    use_openmp: true
    use_simd: false
    nowait: true
    omp_grain_size: OMP_MEMORY_BOUND_GRAIN_SIZE
  iteration_kind:
    {contiguous, strided} # Default, "contiguous", "strided" are also possible
  before_loop:
    var local_sum = 0.T
  in_loop:
    local_sum += xi + yi
  after_loop:
    omp_critical:
      result += local_sum
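To make the three stages concrete, the following is a minimal sequential sketch of the same reduction in plain Nim. It does not use the macro. The proc name `stagedSum` and the `numChunks` parameter are hypothetical, and each chunk merely stands in for one thread, so this only illustrates the order in which `before_loop`, `in_loop` and `after_loop` are assumed to run, not how the macro is implemented.

```nim
# Sequential emulation of a staged reduction (illustrative only).
# `stagedSum` and `numChunks` are hypothetical names, not part of the library.
proc stagedSum(x, y: seq[float], numChunks: int): float =
  doAssert x.len == y.len, "inputs must have the same length"
  doAssert numChunks > 0, "need at least one chunk"
  let n = x.len
  for c in 0 ..< numChunks:
    # before_loop: per-"thread" initialization of a local accumulator
    var localSum = 0.0
    let lo = c * n div numChunks
    let hi = (c + 1) * n div numChunks
    # in_loop: work done for each pair of elements in this chunk
    for i in lo ..< hi:
      localSum += x[i] + y[i]
    # after_loop: merge the local result into the shared one
    # (the parallel version guards this step with omp_critical)
    result += localSum

echo stagedSum(@[1.0, 2.0, 3.0, 4.0], @[10.0, 20.0, 30.0, 40.0], 2)
```

Accumulating into a local variable and merging once per chunk is what keeps the critical section short in the parallel version: the lock is taken once per thread rather than once per element.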