Procs
proc gru_cell_backward[T: SomeFloat](dx, dh, dW3, dU3, dbW3, dbU3: var Tensor[T]; dnext: Tensor[T]; x, h, W3, U3: Tensor[T]; r, z, n, Uh: Tensor[T])
Input:
  - dnext: gradient flowing back from the next layer
  - x, h, W3, U3: inputs saved from the forward pass
  - r, z, n, Uh: intermediate results saved from the forward pass, of shape [batch_size, hidden_size]
Output (in-place):
  - dx, dh, dW3, dU3: respectively the gradients of
    - x, the input tensor during the forward pass. Shape [batch_size, features]
    - h, the hidden state during the forward pass. Shape [batch_size, hidden_size]
    - W3, the gate input weights (multiplied by x) during the forward pass. Shape [3 * hidden_size, features]
    - U3, the recurrent weights (multiplied by h) during the forward pass. Shape [3 * hidden_size, hidden_size]
  - dbW3, dbU3: gradients of the biases for the W3 and U3 weights
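A minimal usage sketch, not part of the library docs: it assumes these primitives are reachable via the top-level `import arraymancer`, that the gradient outputs can be preallocated with `newTensor`, and uses toy sizes. A forward step is run first to produce the saved tensors.

```nim
import arraymancer

let
  batch_size  = 2
  features    = 4
  hidden_size = 3

let # forward-pass inputs (random toy data)
  x   = randomTensor[float32]([batch_size, features], 1'f32)
  h   = randomTensor[float32]([batch_size, hidden_size], 1'f32)
  W3  = randomTensor[float32]([3 * hidden_size, features], 1'f32)
  U3  = randomTensor[float32]([3 * hidden_size, hidden_size], 1'f32)
  bW3 = randomTensor[float32]([1, 3 * hidden_size], 1'f32)
  bU3 = randomTensor[float32]([1, 3 * hidden_size], 1'f32)

var # intermediate results saved by the forward pass
  r      = newTensor[float32](batch_size, hidden_size)
  z      = newTensor[float32](batch_size, hidden_size)
  n      = newTensor[float32](batch_size, hidden_size)
  Uh     = newTensor[float32](batch_size, hidden_size)
  hidden = h.clone() # clone: gru_cell_forward overwrites h(t) with h'(t)

gru_cell_forward(x, W3, U3, bW3, bU3, r, z, n, Uh, hidden)

var # gradient outputs, preallocated here to the shapes listed above
  dx   = newTensor[float32](batch_size, features)
  dh   = newTensor[float32](batch_size, hidden_size)
  dW3  = newTensor[float32](3 * hidden_size, features)
  dU3  = newTensor[float32](3 * hidden_size, hidden_size)
  dbW3 = newTensor[float32](1, 3 * hidden_size)
  dbU3 = newTensor[float32](1, 3 * hidden_size)

# dnext: gradient flowing back from the next layer (all ones as a stand-in)
let dnext = ones[float32](batch_size, hidden_size)

gru_cell_backward(dx, dh, dW3, dU3, dbW3, dbU3,
                  dnext, x, h, W3, U3, r, z, n, Uh)
```

Note that `h` passed to the backward pass is the pre-update hidden state, which is why the sketch clones it before the forward call.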
proc gru_cell_forward[T: SomeFloat](input, W3, U3, bW3, bU3: Tensor[T]; r, z, n, Uh, hidden: var Tensor[T])
Input:
  - input: input tensor of shape [batch_size, features]
  - hidden: hidden state of shape [batch_size, hidden_size]
  - W3: gate input weights of shape [3 * hidden_size, features]
  - U3: recurrent weights of the hidden state, of shape [3 * hidden_size, hidden_size]
  - bW3, bU3: biases of the input and hidden state, of shape [1, 3 * hidden_size]
Output:
  - r, z, n, Uh: intermediate tensors saved for backpropagation, of shape [batch_size, hidden_size]
  - y == h'(t): the next hidden state of the GRU cell (the GRU output and the next hidden state are the same)
⚠️ Input/Output updated in-place:
  - h(t) -> h'(t): the hidden state of shape [batch_size, hidden_size] is both an input and an output
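A minimal forward-step sketch under the same assumptions as above (procs exported by `import arraymancer`, preallocated intermediates, illustrative sizes):

```nim
import arraymancer

let
  batch_size  = 2
  features    = 4
  hidden_size = 3
  input = randomTensor[float32]([batch_size, features], 1'f32)
  W3    = randomTensor[float32]([3 * hidden_size, features], 1'f32)
  U3    = randomTensor[float32]([3 * hidden_size, hidden_size], 1'f32)
  bW3   = randomTensor[float32]([1, 3 * hidden_size], 1'f32)
  bU3   = randomTensor[float32]([1, 3 * hidden_size], 1'f32)

var
  # intermediate tensors kept for a later backward pass
  r  = newTensor[float32](batch_size, hidden_size)
  z  = newTensor[float32](batch_size, hidden_size)
  n  = newTensor[float32](batch_size, hidden_size)
  Uh = newTensor[float32](batch_size, hidden_size)
  # h(t), overwritten in place with h'(t)
  hidden = zeros[float32](batch_size, hidden_size)

gru_cell_forward(input, W3, U3, bW3, bU3, r, z, n, Uh, hidden)
# hidden now holds h'(t), which is also the cell's output y
```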
proc gru_cell_inference[T: SomeFloat](input: Tensor[T]; W3, U3, bW3, bU3: Tensor[T]; hidden: var Tensor[T])
Input:
  - input: input tensor of shape [batch_size, features]
  - W3: gate input weights of shape [3 * hidden_size, features]
  - U3: recurrent weights of the hidden state, of shape [3 * hidden_size, hidden_size]
  - bW3, bU3: biases of the input and hidden state, of shape [1, 3 * hidden_size]
Output (in-place):
  - y == h'(t): the next hidden state of the GRU cell (the GRU output and the next hidden state are the same)
⚠️ Input/Output updated in-place:
  - h(t) -> h'(t): the hidden state of shape [batch_size, hidden_size] is both an input and an output
This is an optimized proc for when backpropagation is not needed.
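A minimal inference sketch under the same assumptions; since no intermediates are saved, only the hidden-state buffer is needed:

```nim
import arraymancer

let
  batch_size  = 2
  features    = 4
  hidden_size = 3
  input = randomTensor[float32]([batch_size, features], 1'f32)
  W3    = randomTensor[float32]([3 * hidden_size, features], 1'f32)
  U3    = randomTensor[float32]([3 * hidden_size, hidden_size], 1'f32)
  bW3   = randomTensor[float32]([1, 3 * hidden_size], 1'f32)
  bU3   = randomTensor[float32]([1, 3 * hidden_size], 1'f32)

var hidden = zeros[float32](batch_size, hidden_size) # h(t), overwritten with h'(t)

gru_cell_inference(input, W3, U3, bW3, bU3, hidden)
# no r/z/n/Uh tensors are produced, which saves allocations at inference time
```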
proc gru_forward[T: SomeFloat](input: Tensor[T]; W3s0, W3sN: Tensor[T]; U3s, bW3s, bU3s: Tensor[T]; rs, zs, ns, Uhs: var Tensor[T]; output, hidden: var Tensor[T]; cached_inputs: var seq[Tensor[T]]; cached_hiddens: var seq[seq[Tensor[T]]])
⚠️ API subject to change to match CuDNN's
Bidirectional support is not implemented
Inputs:
  - input: input tensor of shape [sequence/timesteps, batch, features]
  - Input weights W3s, of shapes:
    - W3s0: [3 * hidden_size, features] for the first layer
    - W3sN: [num_stacked_layers - 1, 3 * hidden_size, num_directions * hidden_size] for the following layers
  - A series of hidden state weights U3s of shape [num_stacked_layers, 3 * hidden_size, hidden_size]
  - A series of biases for the input and hidden state weights, of shape [num_stacked_layers, 1, 3 * hidden_size]
Outputs:
  - rs, zs, ns, Uhs: intermediate tensors saved for backpropagation, of shape [num_stacked_layers, timesteps, batch_size, hidden_size]. They must be preallocated (their buffers may be uninitialized).
  - output, of shape [sequence/timesteps, batch, num_directions * hidden_size]. output contains the output features hidden_t for each timestep t.
  - hidden, of shape [num_stacked_layers * num_directions, batch, hidden_size]. hidden contains the hidden state for the final timestep T == the sequence/timesteps length of the input.
  - cached_inputs: a sequence of length num_stacked_layers containing
    - the first layer's input, of shape [sequence/timesteps, batch, features]
    - the following layers' inputs, of shape [sequence/timesteps, batch, num_directions * hidden_size]
  - cached_hiddens: a sequence of sequences of length [num_stacked_layers, sequence/timesteps], containing all intermediate hidden states for each timestep of each stacked layer. Hidden states are tensors of shape [batch_size, hidden_size].
⚠️ Input/Output updated in-place:
  - h(t) -> h'(t): the hidden state of shape [num_stacked_layers * num_directions, batch, hidden_size] is both an input and an output
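A minimal sketch of a stacked, unidirectional (num_directions == 1) forward pass under the same assumptions; defensively sizing `cached_inputs`/`cached_hiddens` before the call is an assumption here, not a documented requirement:

```nim
import arraymancer

let
  timesteps   = 5
  batch_size  = 2
  features    = 4
  hidden_size = 3
  layers      = 2 # num_stacked_layers

let
  input = randomTensor[float32]([timesteps, batch_size, features], 1'f32)
  W3s0  = randomTensor[float32]([3 * hidden_size, features], 1'f32)
  W3sN  = randomTensor[float32]([layers - 1, 3 * hidden_size, hidden_size], 1'f32)
  U3s   = randomTensor[float32]([layers, 3 * hidden_size, hidden_size], 1'f32)
  bW3s  = randomTensor[float32]([layers, 1, 3 * hidden_size], 1'f32)
  bU3s  = randomTensor[float32]([layers, 1, 3 * hidden_size], 1'f32)

var
  # intermediates: must be preallocated (contents may be uninitialized)
  rs  = newTensor[float32](layers, timesteps, batch_size, hidden_size)
  zs  = newTensor[float32](layers, timesteps, batch_size, hidden_size)
  ns  = newTensor[float32](layers, timesteps, batch_size, hidden_size)
  Uhs = newTensor[float32](layers, timesteps, batch_size, hidden_size)
  output = newTensor[float32](timesteps, batch_size, hidden_size)
  hidden = zeros[float32](layers, batch_size, hidden_size) # h(t), updated in place
  cached_inputs  = newSeq[Tensor[float32]](layers)
  cached_hiddens = newSeq[seq[Tensor[float32]]](layers)

for layer in 0 ..< layers:
  # one cached hidden state per timestep per layer
  cached_hiddens[layer] = newSeq[Tensor[float32]](timesteps)

gru_forward(input, W3s0, W3sN, U3s, bW3s, bU3s,
            rs, zs, ns, Uhs,
            output, hidden, cached_inputs, cached_hiddens)
```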
proc gru_inference[T: SomeFloat](input: Tensor[T]; W3s0, W3sN: Tensor[T]; U3s, bW3s, bU3s: Tensor[T]; output, hidden: var Tensor[T])
Bidirectional support is not implemented
Inputs:
  - input: input tensor of shape [sequence/timesteps, batch, features]
  - Input weights W3s, of shapes:
    - W3s0: [3 * hidden_size, features] for the first layer
    - W3sN: [num_stacked_layers - 1, 3 * hidden_size, num_directions * hidden_size] for the following layers
  - A series of hidden state weights U3s of shape [num_stacked_layers, 3 * hidden_size, hidden_size]
  - A series of biases for the input and hidden state weights, of shape [num_stacked_layers, 1, 3 * hidden_size]
Outputs:
  - output, of shape [sequence/timesteps, batch, num_directions * hidden_size]. output contains the output features hidden_t for each timestep t.
  - hidden, of shape [num_stacked_layers * num_directions, batch, hidden_size]. hidden contains the hidden state for the final timestep T == the sequence/timesteps length of the input.
⚠️ Input/Output updated in-place:
  - h(t) -> h'(t): the hidden state of shape [num_stacked_layers * num_directions, batch, hidden_size] is both an input and an output
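A minimal inference sketch over the same toy shapes and assumptions as the gru_forward example; no backpropagation buffers are needed:

```nim
import arraymancer

let
  timesteps   = 5
  batch_size  = 2
  features    = 4
  hidden_size = 3
  layers      = 2 # num_stacked_layers; num_directions == 1

let
  input = randomTensor[float32]([timesteps, batch_size, features], 1'f32)
  W3s0  = randomTensor[float32]([3 * hidden_size, features], 1'f32)
  W3sN  = randomTensor[float32]([layers - 1, 3 * hidden_size, hidden_size], 1'f32)
  U3s   = randomTensor[float32]([layers, 3 * hidden_size, hidden_size], 1'f32)
  bW3s  = randomTensor[float32]([layers, 1, 3 * hidden_size], 1'f32)
  bU3s  = randomTensor[float32]([layers, 1, 3 * hidden_size], 1'f32)

var
  output = newTensor[float32](timesteps, batch_size, hidden_size)
  hidden = zeros[float32](layers, batch_size, hidden_size) # h(t), updated in place

gru_inference(input, W3s0, W3sN, U3s, bW3s, bU3s, output, hidden)
# output holds hidden_t for each timestep t;
# hidden holds the final hidden state of every stacked layer
```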