
src/arraymancer/nn/init


Procs

proc kaiming_normal(shape: varargs[int]; T: type): Tensor[T]

Kaiming He initialisation for trainable layers preceding a ReLU activation (the recommended initialisation for ReLU-activated layers).

Weights are sampled from a normal distribution of mean 0 and standard deviation √(2/fan_in), with fan_in the number of input units in the forward pass.

This preserves the magnitude of the variance of the weights during the forward pass.

Paper: Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, He et al. (2015)
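
A minimal usage sketch, assuming these procs are reachable through the top-level arraymancer import, that a 2D weight uses the [fan_out, fan_in] layout so fan_in is the trailing dimension, and that the mean/std tensor aggregates are used to inspect the result; the layer sizes are hypothetical:

  import arraymancer

  # Weight of a hypothetical 784 -> 256 linear layer laid out as [fan_out, fan_in],
  # so fan_in = 784 and the target standard deviation is sqrt(2/784) ≈ 0.0505.
  let w = kaiming_normal(256, 784, float32)

  echo w.shape   # @[256, 784]
  echo w.mean()  # close to 0
  echo w.std()   # close to 0.0505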

proc kaiming_uniform(shape: varargs[int]; T: type): Tensor[T]

Kaiming He initialisation for trainable layers preceding a ReLU activation (the recommended initialisation for ReLU-activated layers).

Weights are sampled from a uniform distribution over the range [-√3 * √(2/fan_in), √3 * √(2/fan_in)], with fan_in the number of input units in the forward pass.

This preserves the magnitude of the variance of the weights during the forward pass.

Paper: Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, He et al. (2015)
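
The √3 factor comes from matching the uniform distribution's variance to that of the normal variant; a uniform distribution on [-a, a] has variance a²/3, so

  \mathrm{Var}\big(\,\mathcal{U}(-a,a)\,\big) = \frac{a^2}{3},
  \qquad
  a = \sqrt{3}\,\sqrt{\frac{2}{\mathrm{fan\_in}}}
  \quad\Rightarrow\quad
  \mathrm{Var} = \frac{2}{\mathrm{fan\_in}}

which is the same variance used by kaiming_normal.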

proc xavier_normal(shape: varargs[int]; T: type): Tensor[T]

Xavier Glorot initialisation for trainable layers preceding a linear activation (the recommended initialisation for sigmoid, tanh and softsign activated layers).

Weights are sampled from a normal distribution of mean 0 and standard deviation √(2/(fan_in+fan_out)), with fan_in the number of input units in the forward pass and fan_out the number of input units during the backward pass (not the number of output units during the forward pass).

This balances preserving the magnitude of the variance of the weights between the forward pass and the backward pass.

Paper: Understanding the difficulty of training deep feedforward neural networks, Glorot and Bengio (2010)
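
A usage sketch under the same assumptions as the kaiming_normal example above (top-level import, [fan_out, fan_in] layout); the layer sizes are hypothetical:

  import arraymancer

  # Weight of a hypothetical 128 -> 64 tanh-activated layer: fan_in = 128, fan_out = 64,
  # so the target standard deviation is sqrt(2 / (128 + 64)) ≈ 0.102.
  let w = xavier_normal(64, 128, float64)
  echo w.std()   # close to 0.102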

proc xavier_uniform(shape: varargs[int]; T: type): Tensor[T]

Xavier Glorot initialisation for trainable layers preceding a linear activation (the recommended initialisation for sigmoid, tanh and softsign activated layers).

Weights are sampled from a uniform distribution over the range [-√3 * √(2/(fan_in+fan_out)), √3 * √(2/(fan_in+fan_out))], with fan_in the number of input units in the forward pass and fan_out the number of input units during the backward pass (not the number of output units during the forward pass).

This balances preserving the magnitude of the variance of the weights between the forward pass and the backward pass.

Paper: Understanding the difficulty of training deep feedforward neural networks, Glorot and Bengio (2010)
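
As with kaiming_uniform, the bound is the normal variant's standard deviation scaled by √3, which simplifies to the commonly quoted form

  \sqrt{3}\,\sqrt{\frac{2}{\mathrm{fan\_in}+\mathrm{fan\_out}}}
  = \sqrt{\frac{6}{\mathrm{fan\_in}+\mathrm{fan\_out}}}

so the sampled values again have variance 2/(fan_in+fan_out).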

proc yann_normal(shape: varargs[int]; T: type): Tensor[T]

Yann LeCun initialisation for trainable layers.

Weights are sampled from a normal distribution of mean 0 and standard deviation √(1/fan_in), with fan_in the number of input units in the forward pass.

This preserves the magnitude of the variance of the weights during the forward pass.

Paper: Efficient BackProp, LeCun et al. (1998)
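
A hypothetical helper (not part of this module) summarising which initialisation the entries above recommend for a given activation, with yann_normal as the default; the name initWeight, the string-based switch and the [fan_out, fan_in] layout are assumptions for illustration:

  import arraymancer

  proc initWeight(fan_out, fan_in: int, activation: string): Tensor[float32] =
    ## Kaiming for ReLU, Xavier for sigmoid/tanh/softsign, LeCun (yann_normal) otherwise.
    case activation
    of "relu":
      result = kaiming_normal(fan_out, fan_in, float32)
    of "sigmoid", "tanh", "softsign":
      result = xavier_normal(fan_out, fan_in, float32)
    else:
      result = yann_normal(fan_out, fan_in, float32)

  let w = initWeight(64, 128, "relu")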

proc yann_uniform(shape: varargs[int]; T: type): Tensor[T]

Yann LeCun initialisation for trainable layers.

Weights are sampled from a uniform distribution over the range [-√(3/fan_in), √(3/fan_in)], with fan_in the number of input units in the forward pass.

This preserves the magnitude of the variance of the weights during the forward pass.

Paper: Efficient BackProp, LeCun et al. (1998)
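
The bound mirrors the normal variant: a uniform distribution on [-√(3/fan_in), √(3/fan_in)] has variance (3/fan_in)/3 = 1/fan_in, matching the √(1/fan_in) standard deviation of yann_normal.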
