
Module nnp_softmax_cross_entropy

Procs

proc softmax_cross_entropy[T](input, target: Tensor[T]): T
Softmax function + Cross-Entropy loss fused in one layer.
Input:
  • A Tensor of shape [batchsize, predicted_labels_probabilities]
  • The target values of shape [batchsize, truth_labels_probabilities]
Returns:
  • Applies a softmax activation and returns the cross-entropy loss.

softmax_cross_entropy measures the cross-entropy error for multiclass classification. Classes are mutually exclusive (only 1 label is true), but the truth labels (target) need not be one-hot: they may be soft probabilities.

Note: if your labels are one-hot encoded, it is more efficient to pass the label ids to sparse_softmax_cross_entropy than to feed one-hot vectors to softmax_cross_entropy.

For example, if your true probabilities are (car: 0.10, airplane: 0.60, bike: 0.05, bus: 0.25), you have to use softmax_cross_entropy.

However, if your true probabilities are (car: 0, airplane: 1, bike: 0, bus: 0) (a one-hot-encoded vector), you should prefer sparse_softmax_cross_entropy.

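A minimal usage sketch (not taken from the Arraymancer docs; tensor names and values are illustrative) showing the expected shapes, assuming the library is imported as arraymancer:

import arraymancer

# Raw scores for 2 samples over 4 classes (car, airplane, bike, bus)
let pred = [[1.2, 3.4, 0.1, 0.8],
            [0.5, 0.2, 2.1, 1.0]].toTensor

# True probabilities for each sample, shape [batchsize, truth_labels_probabilities]
let truth = [[0.10, 0.60, 0.05, 0.25],
             [0.00, 0.00, 0.90, 0.10]].toTensor

# Fused softmax + cross-entropy, returns a scalar loss
let loss = softmax_cross_entropy(pred, truth)
echo loss
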
proc sparse_softmax_cross_entropy[T](input: Tensor[T]; target: Tensor[int]): T
Softmax function + Cross-Entropy loss fused in one layer.
Input:
  • A Tensor of shape [batchsize, predicted_labels_probabilities]
  • The target values of shape [batchsize] containing the truth label id
Returns:
  • Applies a softmax activation and returns the cross-entropy loss.

sparse_softmax_cross_entropy measures the cross-entropy error for multiclass classification. Classes are mutually exclusive (only 1 label is true).

Important: in one-hot form, [0, 0, 1] means label 2 is true, i.e. labels start at 0.

Note: if your labels are one-hot encoded, it is more efficient to pass the label ids to sparse_softmax_cross_entropy than to feed one-hot vectors to softmax_cross_entropy.

For example, if your true probabilities are (car: 0.10, airplane: 0.60, bike: 0.05, bus: 0.25), you have to use softmax_cross_entropy.

However, if your true probabilities are (car: 0, airplane: 1, bike: 0, bus: 0) (a one-hot-encoded vector), you should prefer sparse_softmax_cross_entropy.

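A minimal sketch with sparse labels (names and values illustrative); note the target is a Tensor[int] of shape [batchsize]:

import arraymancer

# Raw scores for 2 samples over 4 classes (car, airplane, bike, bus)
let pred = [[1.2, 3.4, 0.1, 0.8],
            [0.5, 0.2, 2.1, 1.0]].toTensor

# One integer label per sample: 1 = airplane, 2 = bike (labels start at 0)
let labels = [1, 2].toTensor

let loss = sparse_softmax_cross_entropy(pred, labels)
echo loss
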
proc softmax_cross_entropy_backward[T](gradient: Tensor[T] or T;
                                       cached_tensor: Tensor[T];
                                       target: Tensor[T]): Tensor[T] {.noInit.}
Derivatives of softmax_cross_entropy
Input:
  • The input gradient as a scalar or a Tensor
  • A cache tensor that contains data from before the forward pass
  • The target values
Shape:
  • Both the cache and target shape should be [batchsize, features] i.e. number of samples as first dimension
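A hedged sketch of calling the backward proc directly (outside any autograd machinery); it assumes the upstream gradient is the scalar 1.0 and that cached_tensor is the [batchsize, features] input saved from the forward pass. Names and values are illustrative:

import arraymancer

# Assumed to be the input that was fed to the forward pass, shape [batchsize, features]
let cached = [[1.2, 3.4, 0.1, 0.8],
              [0.5, 0.2, 2.1, 1.0]].toTensor

# Target probabilities, same [batchsize, features] shape
let truth = [[0.10, 0.60, 0.05, 0.25],
             [0.00, 0.00, 0.90, 0.10]].toTensor

# Upstream gradient taken as the scalar 1.0; returns the gradient
# with respect to the cached input, same shape as `cached`
let grad = softmax_cross_entropy_backward(1.0, cached, truth)
echo grad.shape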
proc sparse_softmax_cross_entropy_backward[T](gradient: Tensor[T] or T;
                                              cached_tensor: Tensor[T];
                                              target: Tensor[int]): Tensor[T] {.noInit.}
Derivatives of sparse_softmax_cross_entropy
Input:
  • The input gradient as a scalar or a Tensor
  • A cache tensor that contains data from before the forward pass
  • The target values
Shape:
  • The cache shape should be [features, batchsize], i.e. number of samples as last dimension
  • target shape should be [batchsize]
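A hedged sketch following the shape note above: the cache is assumed to be laid out [features, batchsize] (samples along the last dimension), the target holds one label id per sample, and the upstream gradient is taken as the scalar 1.0. Names and values are illustrative:

import arraymancer

# Assumed forward-pass data, shape [features = 4, batchsize = 2]
let cached = [[1.2, 0.5],
              [3.4, 0.2],
              [0.1, 2.1],
              [0.8, 1.0]].toTensor

# One label id per sample, shape [batchsize]
let labels = [1, 2].toTensor

let grad = sparse_softmax_cross_entropy_backward(1.0, cached, labels)
echo grad.shape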