Procs
proc gru_cell_backward[T: SomeFloat](dx, dh, dW3, dU3, dbW3, dbU3: var Tensor[T]; dnext: Tensor[T]; x, h, W3, U3: Tensor[T]; r, z, n, Uh: Tensor[T])
Input:
  - dnext: gradient flowing back from the next layer
  - x, h, W3, U3: inputs saved from the forward pass
  - r, z, n, Uh: intermediate results saved from the forward pass, of shape [batch_size, hidden_size]
Output (in-place):
  - dx, dh, dW3, dU3: respectively the gradients of
    - x, the input tensor during the forward pass. Shape [batch_size, features]
    - h, the hidden state during the forward pass. Shape [batch_size, hidden_size]
    - W3, the gate input weights (multiplied by x) during the forward pass. Shape [3 * hidden_size, features]
    - U3, the recurrent weights (multiplied by h) during the forward pass. Shape [3 * hidden_size, hidden_size]
  - dbW3, dbU3: gradients of the biases for the W3 and U3 weights
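A minimal usage sketch, not part of the library docs: it assumes these primitives are reachable via the top-level `import arraymancer`, that the gradient outputs can be preallocated with `newTensor`, and uses toy sizes. A forward step is run first to produce the saved tensors.

```nim
import arraymancer

let
  batch_size  = 2
  features    = 4
  hidden_size = 3

let # forward-pass inputs (random toy data)
  x   = randomTensor[float32]([batch_size, features], 1'f32)
  h   = randomTensor[float32]([batch_size, hidden_size], 1'f32)
  W3  = randomTensor[float32]([3 * hidden_size, features], 1'f32)
  U3  = randomTensor[float32]([3 * hidden_size, hidden_size], 1'f32)
  bW3 = randomTensor[float32]([1, 3 * hidden_size], 1'f32)
  bU3 = randomTensor[float32]([1, 3 * hidden_size], 1'f32)

var # intermediate results saved by the forward pass
  r      = newTensor[float32](batch_size, hidden_size)
  z      = newTensor[float32](batch_size, hidden_size)
  n      = newTensor[float32](batch_size, hidden_size)
  Uh     = newTensor[float32](batch_size, hidden_size)
  hidden = h.clone() # clone: gru_cell_forward overwrites h(t) with h'(t)

gru_cell_forward(x, W3, U3, bW3, bU3, r, z, n, Uh, hidden)

var # gradient outputs, preallocated here to the shapes listed above
  dx   = newTensor[float32](batch_size, features)
  dh   = newTensor[float32](batch_size, hidden_size)
  dW3  = newTensor[float32](3 * hidden_size, features)
  dU3  = newTensor[float32](3 * hidden_size, hidden_size)
  dbW3 = newTensor[float32](1, 3 * hidden_size)
  dbU3 = newTensor[float32](1, 3 * hidden_size)

# dnext: gradient flowing back from the next layer (all ones as a stand-in)
let dnext = ones[float32](batch_size, hidden_size)

gru_cell_backward(dx, dh, dW3, dU3, dbW3, dbU3,
                  dnext, x, h, W3, U3, r, z, n, Uh)
```

Note that `h` passed to the backward pass is the pre-update hidden state, which is why the sketch clones it before the forward call.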
proc gru_cell_forward[T: SomeFloat](input, W3, U3, bW3, bU3: Tensor[T]; r, z, n, Uh, hidden: var Tensor[T])
Input:
  - input: input tensor of shape [batch_size, features]
  - hidden: hidden state of shape [batch_size, hidden_size]
  - W3: gate input weights of shape [3 * hidden_size, features]
  - U3: recurrent weights of the hidden state, of shape [3 * hidden_size, hidden_size]
  - bW3, bU3: biases of the input and hidden state, of shape [1, 3 * hidden_size]
Output:
  - r, z, n, Uh: intermediate tensors saved for backpropagation, of shape [batch_size, hidden_size]
  - y == h'(t): the next hidden state of the GRU cell (the GRU output and the next hidden state are the same)
⚠️ Input/Output updated in-place:
  - h(t) -> h'(t): the hidden state of shape [batch_size, hidden_size] is both an input and an output
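A minimal forward-step sketch under the same assumptions as above (procs exported by `import arraymancer`, preallocated intermediates, illustrative sizes):

```nim
import arraymancer

let
  batch_size  = 2
  features    = 4
  hidden_size = 3
  input = randomTensor[float32]([batch_size, features], 1'f32)
  W3    = randomTensor[float32]([3 * hidden_size, features], 1'f32)
  U3    = randomTensor[float32]([3 * hidden_size, hidden_size], 1'f32)
  bW3   = randomTensor[float32]([1, 3 * hidden_size], 1'f32)
  bU3   = randomTensor[float32]([1, 3 * hidden_size], 1'f32)

var
  # intermediate tensors kept for a later backward pass
  r  = newTensor[float32](batch_size, hidden_size)
  z  = newTensor[float32](batch_size, hidden_size)
  n  = newTensor[float32](batch_size, hidden_size)
  Uh = newTensor[float32](batch_size, hidden_size)
  # h(t), overwritten in place with h'(t)
  hidden = zeros[float32](batch_size, hidden_size)

gru_cell_forward(input, W3, U3, bW3, bU3, r, z, n, Uh, hidden)
# hidden now holds h'(t), which is also the cell's output y
```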
proc gru_cell_inference[T: SomeFloat](input: Tensor[T]; W3, U3, bW3, bU3: Tensor[T]; hidden: var Tensor[T])
Input:
  - input: input tensor of shape [batch_size, features]
  - W3: gate input weights of shape [3 * hidden_size, features]
  - U3: recurrent weights of the hidden state, of shape [3 * hidden_size, hidden_size]
  - bW3, bU3: biases of the input and hidden state, of shape [1, 3 * hidden_size]
Output (in-place):
  - y == h'(t): the next hidden state of the GRU cell (the GRU output and the next hidden state are the same)
⚠️ Input/Output updated in-place:
  - h(t) -> h'(t): the hidden state of shape [batch_size, hidden_size] is both an input and an output
This is an optimized proc for when backpropagation is not needed.
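A minimal inference sketch under the same assumptions; since no intermediates are saved, only the hidden-state buffer is needed:

```nim
import arraymancer

let
  batch_size  = 2
  features    = 4
  hidden_size = 3
  input = randomTensor[float32]([batch_size, features], 1'f32)
  W3    = randomTensor[float32]([3 * hidden_size, features], 1'f32)
  U3    = randomTensor[float32]([3 * hidden_size, hidden_size], 1'f32)
  bW3   = randomTensor[float32]([1, 3 * hidden_size], 1'f32)
  bU3   = randomTensor[float32]([1, 3 * hidden_size], 1'f32)

var hidden = zeros[float32](batch_size, hidden_size) # h(t), overwritten with h'(t)

gru_cell_inference(input, W3, U3, bW3, bU3, hidden)
# no r/z/n/Uh tensors are produced, which saves allocations at inference time
```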
proc gru_forward[T: SomeFloat](input: Tensor[T]; W3s0, W3sN: Tensor[T]; U3s, bW3s, bU3s: Tensor[T]; rs, zs, ns, Uhs: var Tensor[T]; output, hidden: var Tensor[T]; cached_inputs: var seq[Tensor[T]]; cached_hiddens: var seq[seq[Tensor[T]]])
⚠️ API subject to change to match CuDNN's
Bidirectional support is not implemented
Inputs:
  - input: input tensor of shape [sequence/timesteps, batch, features]
  - Input weights W3s, of shapes:
    - W3s0: [3 * hidden_size, features] for the first layer
    - W3sN: [num_stacked_layers - 1, 3 * hidden_size, num_directions * hidden_size] for the following layers
  - A series of hidden state weights U3s of shape [num_stacked_layers, 3 * hidden_size, hidden_size]
  - A series of biases for the input and hidden state weights, of shape [num_stacked_layers, 1, 3 * hidden_size]
Outputs:
  - rs, zs, ns, Uhs: intermediate tensors saved for backpropagation, of shape [num_stacked_layers, timesteps, batch_size, hidden_size]. They must be preallocated (their buffers may be uninitialized).
  - output, of shape [sequence/timesteps, batch, num_directions * hidden_size]. output contains the output features hidden_t for each timestep t.
  - hidden, of shape [num_stacked_layers * num_directions, batch, hidden_size]. hidden contains the hidden state for the final timestep T == the sequence/timesteps length of the input.
  - cached_inputs: a sequence of length num_stacked_layers containing
    - the first layer's input, of shape [sequence/timesteps, batch, features]
    - the following layers' inputs, of shape [sequence/timesteps, batch, num_directions * hidden_size]
  - cached_hiddens: a sequence of sequences of length [num_stacked_layers, sequence/timesteps], containing all intermediate hidden states for each timestep of each stacked layer. Hidden states are tensors of shape [batch_size, hidden_size].
⚠️ Input/Output updated in-place:
  - h(t) -> h'(t): the hidden state of shape [num_stacked_layers * num_directions, batch, hidden_size] is both an input and an output
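A minimal sketch of a stacked, unidirectional (num_directions == 1) forward pass under the same assumptions; defensively sizing `cached_inputs`/`cached_hiddens` before the call is an assumption here, not a documented requirement:

```nim
import arraymancer

let
  timesteps   = 5
  batch_size  = 2
  features    = 4
  hidden_size = 3
  layers      = 2 # num_stacked_layers

let
  input = randomTensor[float32]([timesteps, batch_size, features], 1'f32)
  W3s0  = randomTensor[float32]([3 * hidden_size, features], 1'f32)
  W3sN  = randomTensor[float32]([layers - 1, 3 * hidden_size, hidden_size], 1'f32)
  U3s   = randomTensor[float32]([layers, 3 * hidden_size, hidden_size], 1'f32)
  bW3s  = randomTensor[float32]([layers, 1, 3 * hidden_size], 1'f32)
  bU3s  = randomTensor[float32]([layers, 1, 3 * hidden_size], 1'f32)

var
  # intermediates: must be preallocated (contents may be uninitialized)
  rs  = newTensor[float32](layers, timesteps, batch_size, hidden_size)
  zs  = newTensor[float32](layers, timesteps, batch_size, hidden_size)
  ns  = newTensor[float32](layers, timesteps, batch_size, hidden_size)
  Uhs = newTensor[float32](layers, timesteps, batch_size, hidden_size)
  output = newTensor[float32](timesteps, batch_size, hidden_size)
  hidden = zeros[float32](layers, batch_size, hidden_size) # h(t), updated in place
  cached_inputs  = newSeq[Tensor[float32]](layers)
  cached_hiddens = newSeq[seq[Tensor[float32]]](layers)

for layer in 0 ..< layers:
  # one cached hidden state per timestep per layer
  cached_hiddens[layer] = newSeq[Tensor[float32]](timesteps)

gru_forward(input, W3s0, W3sN, U3s, bW3s, bU3s,
            rs, zs, ns, Uhs,
            output, hidden, cached_inputs, cached_hiddens)
```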
proc gru_inference[T: SomeFloat](input: Tensor[T]; W3s0, W3sN: Tensor[T]; U3s, bW3s, bU3s: Tensor[T]; output, hidden: var Tensor[T])
Bidirectional support is not implemented
Inputs:
  - input: input tensor of shape [sequence/timesteps, batch, features]
  - Input weights W3s, of shapes:
    - W3s0: [3 * hidden_size, features] for the first layer
    - W3sN: [num_stacked_layers - 1, 3 * hidden_size, num_directions * hidden_size] for the following layers
  - A series of hidden state weights U3s of shape [num_stacked_layers, 3 * hidden_size, hidden_size]
  - A series of biases for the input and hidden state weights, of shape [num_stacked_layers, 1, 3 * hidden_size]
Outputs:
  - output, of shape [sequence/timesteps, batch, num_directions * hidden_size]. output contains the output features hidden_t for each timestep t.
  - hidden, of shape [num_stacked_layers * num_directions, batch, hidden_size]. hidden contains the hidden state for the final timestep T == the sequence/timesteps length of the input.
⚠️ Input/Output updated in-place:
  - h(t) -> h'(t): the hidden state of shape [num_stacked_layers * num_directions, batch, hidden_size] is both an input and an output
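A minimal inference sketch over the same toy shapes and assumptions as the gru_forward example; no backpropagation buffers are needed:

```nim
import arraymancer

let
  timesteps   = 5
  batch_size  = 2
  features    = 4
  hidden_size = 3
  layers      = 2 # num_stacked_layers; num_directions == 1

let
  input = randomTensor[float32]([timesteps, batch_size, features], 1'f32)
  W3s0  = randomTensor[float32]([3 * hidden_size, features], 1'f32)
  W3sN  = randomTensor[float32]([layers - 1, 3 * hidden_size, hidden_size], 1'f32)
  U3s   = randomTensor[float32]([layers, 3 * hidden_size, hidden_size], 1'f32)
  bW3s  = randomTensor[float32]([layers, 1, 3 * hidden_size], 1'f32)
  bU3s  = randomTensor[float32]([layers, 1, 3 * hidden_size], 1'f32)

var
  output = newTensor[float32](timesteps, batch_size, hidden_size)
  hidden = zeros[float32](layers, batch_size, hidden_size) # h(t), updated in place

gru_inference(input, W3s0, W3sN, U3s, bW3s, bU3s, output, hidden)
# output holds hidden_t for each timestep t;
# hidden holds the final hidden state of every stacked layer
```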