Imports

compiler_optim_hints, openmp, align_unroller, gemm_tiling, gemm_utils, gemm_packing, gemm_ukernel_dispatch, gemm

Procs

proc gemm_packed[T: SomeNumber](M, N, K: int; alpha: T; packedA: ptr (T or UncheckedArray[T]); packedB: ptr (T or UncheckedArray[T]); beta: T; C: ptr (T or UncheckedArray[T]); rowStrideC, colStrideC: int)
proc gemm_prepackA[T](dst_packedA: ptr (T or UncheckedArray[T]); M, N, K: int; src_A: ptr T; rowStrideA, colStrideA: int): Prepack matrix A of shape MxK, with strides rowStrideA and colStrideA, for matrix multiplication. A must be 64-bit aligned.

To get optimal performance, the packed layout is machine- and architecture-dependent: it depends on detected features such as AVX and the number of cores, and may depend on the machine's cache sizes in the future. It is therefore unsafe to store or serialize the packed buffer.
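The shape-and-strides description used by the prepack procs can be illustrated with a small standalone sketch. It assumes the common convention that strides are counted in elements (not bytes), which this page does not state explicitly; the helper `at` is hypothetical and not part of this module.

```nim
# Hypothetical helper for illustration only; not part of the documented module.
# Assumes strides are measured in elements of T.
proc at[T](data: ptr UncheckedArray[T]; i, j, rowStride, colStride: int): T =
  ## Reads element (i, j) of a strided matrix view.
  data[i * rowStride + j * colStride]

# A 2x3 matrix stored row-major in a flat buffer:
#   1 2 3
#   4 5 6
var buf = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
let p = cast[ptr UncheckedArray[float]](buf[0].addr)

# Row-major 2x3 view: one step down a column skips 3 elements,
# one step along a row skips 1 element.
let a12 = at(p, 1, 2, rowStride = 3, colStride = 1)

# The same buffer viewed as the 3x2 transpose: swap the strides, no copy.
let t21 = at(p, 2, 1, rowStride = 1, colStride = 3)

echo a12, " ", t21
```

Under this convention, a transposed operand can be described by swapping its two strides instead of copying the data.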
func gemm_prepackA_mem_required(T: typedesc; M, N, K: int): int: Returns the amount of memory that needs to be preallocated to pack matrix A.
func gemm_prepackA_mem_required_impl(ukernel: static MicroKernel; T: typedesc; M, N, K: int): int
proc gemm_prepackB[T](dst_packedB: ptr (T or UncheckedArray[T]); M, N, K: int; src_B: ptr T; rowStrideB, colStrideB: int): Prepack matrix B of shape KxN, with strides rowStrideB and colStrideB, for matrix multiplication. B must be 64-bit aligned.

To get optimal performance, the packed layout is machine- and architecture-dependent: it depends on detected features such as AVX and the number of cores, and may depend on the machine's cache sizes in the future. It is therefore unsafe to store or serialize the packed buffer.
func gemm_prepackB_mem_required(T: type; M, N, K: int): int: Returns the amount of memory that needs to be preallocated to pack matrix B.
func gemm_prepackB_mem_required_impl(ukernel: static MicroKernel; T: typedesc; M, N, K: int): int