Procs
proc kaiming_normal(shape: varargs[int]; T: type): Tensor[T]
-
Kaiming He initialisation for trainable layers preceding a ReLU activation; recommended for ReLU-activated layers.
Weights are sampled from a normal distribution of mean 0 and standard deviation √(2/fan_in), where fan_in is the number of input units in the forward pass.
This preserves the magnitude of the variance of the weights during the forward pass.
Paper: Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification (He et al., 2015).
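A minimal usage sketch, assuming these procs come from Arraymancer's nn/init module, that std is the library's tensor standard-deviation reduction, and that fan_in is taken from the trailing dimension of the shape (as for a [out_features, in_features] linear-layer weight):

  import arraymancer, std/math

  # Hypothetical linear layer with 784 inputs and 256 outputs.
  # Assumption: fan_in = 784 (the trailing dimension of the shape).
  let w = kaiming_normal([256, 784], float32)

  echo "expected std: ", sqrt(2.0 / 784.0)  # ≈ 0.0505
  echo "observed std: ", w.std()            # should be close to the expected value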
proc kaiming_uniform(shape: varargs[int]; T: type): Tensor[T]
-
Kaiming He initialisation for trainable layers preceding a ReLU activation; recommended for ReLU-activated layers.
Weights are sampled from a uniform distribution over the range [-√3 * √(2/fan_in), √3 * √(2/fan_in)], where fan_in is the number of input units in the forward pass.
This preserves the magnitude of the variance of the weights during the forward pass.
Paper: Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification (He et al., 2015).
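The √3 factor is what makes the uniform variant match the variance of the normal one: a uniform distribution on [-a, a] has variance a²/3, so setting the bound to √3 times the target standard deviation recovers the same variance. A short derivation (the same reasoning applies to xavier_uniform and yann_uniform below):

  \operatorname{Var}\bigl(\mathcal{U}(-a, a)\bigr) = \frac{a^2}{3},
  \qquad a = \sqrt{3}\,\sigma \implies \operatorname{Var} = \sigma^2,
  \qquad \sigma = \sqrt{2/\mathrm{fan\_in}} \implies a = \sqrt{6/\mathrm{fan\_in}}.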
proc xavier_normal(shape: varargs[int]; T: type): Tensor[T]
-
Xavier Glorot initialisation for trainable layers preceding an activation that is approximately linear around zero; recommended for sigmoid-, tanh- and softsign-activated layers.
Weights are sampled from a normal distribution of mean 0 and standard deviation √(2/(fan_in+fan_out)), where fan_in is the number of input units in the forward pass and fan_out is the number of input units in the backward pass (not the number of output units in the forward pass).
This provides a balance between preserving the magnitude of the variance of the weights during the forward pass and during the backward pass.
Paper: Understanding the difficulty of training deep feedforward neural networks (Glorot and Bengio, 2010).
proc xavier_uniform(shape: varargs[int]; T: type): Tensor[T]
-
Xavier Glorot initialisation for trainable layers preceding an activation that is approximately linear around zero; recommended for sigmoid-, tanh- and softsign-activated layers.
Weights are sampled from a uniform distribution over the range [-√3 * √(2/(fan_in+fan_out)), √3 * √(2/(fan_in+fan_out))], where fan_in is the number of input units in the forward pass and fan_out is the number of input units in the backward pass (not the number of output units in the forward pass).
This provides a balance between preserving the magnitude of the variance of the weights during the forward pass and during the backward pass.
Paper: Understanding the difficulty of training deep feedforward neural networks (Glorot and Bengio, 2010).
proc yann_normal(shape: varargs[int]; T: type): Tensor[T]
-
Yann LeCun initialisation for trainable layers.
Weights are sampled from a normal distribution of mean 0 and standard deviation √(1/fan_in), where fan_in is the number of input units in the forward pass.
This preserves the magnitude of the variance of the weights during the forward pass.
Paper: Efficient BackProp (LeCun et al., 1998).
proc yann_uniform(shape: varargs[int]; T: type): Tensor[T]
-
Yann LeCun initialisation for trainable layers.
Weights are sampled from a uniform distribution over the range [-√(3/fan_in), √(3/fan_in)], where fan_in is the number of input units in the forward pass.
This preserves the magnitude of the variance of the weights during the forward pass.
Paper: Efficient BackProp (LeCun et al., 1998).
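To see the variance-preservation claim numerically, the sketch below pushes a unit-variance input through a LeCun-initialised weight matrix; the output standard deviation should remain close to 1. randomNormalTensor is assumed here as the library's Gaussian sampler, and the [fan_out, fan_in] weight layout is the same assumption as above:

  import arraymancer

  let
    fanIn = 512
    batch = 1000
    w = yann_uniform([256, fanIn], float32)               # entries have std ≈ √(1/fan_in)
    x = randomNormalTensor([fanIn, batch], 0'f32, 1'f32)  # unit-variance input (assumed sampler)

  # Each output element is a sum of fan_in terms, each of variance 1/fan_in,
  # so the output variance stays ≈ 1: the magnitude is preserved.
  let y = w * x
  echo "output std: ", y.std()  # expected ≈ 1.0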