Procs
proc softmax_cross_entropy[T](input, target: Tensor[T]): T
Softmax function + Cross-Entropy loss fused in one layer.
Input:
- A Tensor of shape [batch_size, predicted_labels_probabilities]
- The target values of shape [batch_size, truth_labels_probability]
Returns:
- Applies a softmax activation and returns the cross-entropy loss.
Softmax_cross_entropy measures the cross-entropy error for multiclass classification. Classes are mutually exclusive (only one label is true), but the truth labels (target) need not be: they may be soft probability distributions.
Note: with one-hot-encoded labels it is more efficient to use sparse_softmax_cross_entropy than to feed the one-hot vectors to softmax_cross_entropy.
For example, if your true probabilities are (car: 0.10, airplane: 0.60, bike: 0.05, bus: 0.25), you have to use softmax_cross_entropy.
However, if your true probabilities are (car: 0, airplane: 1, bike: 0, bus: 0) (a one-hot-encoded vector), you should prefer sparse_softmax_cross_entropy.
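A minimal usage sketch, assuming these procs come from Arraymancer and are available via import arraymancer; the tensor values are purely illustrative:

```nim
import arraymancer

# Logits for a batch of 2 samples over 4 classes (car, airplane, bike, bus)
let predicted = [[1.0, 3.0, 0.5, 2.0],
                 [0.2, 4.0, 0.1, 0.3]].toTensor

# Soft truth labels: each row is a probability distribution over the classes
let target = [[0.10, 0.60, 0.05, 0.25],
              [0.25, 0.25, 0.25, 0.25]].toTensor

# Fused softmax + cross-entropy: returns a single scalar loss for the batch
let loss = softmax_cross_entropy(predicted, target)
echo loss
```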
proc softmax_cross_entropy_backward[T](gradient: Tensor[T] or T; cached_tensor: Tensor[T]; target: Tensor[T]): Tensor[T] {.noinit.}
Derivatives of softmax_cross_entropy
Input:
- The input gradient as a scalar or a Tensor
- A cache tensor that contains data from the forward pass
- The target values
Shape:
- Both the cache and target shapes should be [batch_size, features], i.e. number of samples as the first dimension
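For reference, the backward pass of the fused layer is cheap because the derivative of softmax followed by cross-entropy collapses to a subtraction. This is the standard textbook identity, not quoted from the doc above; the proc additionally applies the incoming gradient:

$$\frac{\partial L}{\partial x_i} = \mathrm{softmax}(x)_i - t_i$$

where $x$ are the logits and $t$ is the target probability distribution.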
proc sparse_softmax_cross_entropy[T; Idx: SomeNumber or byte or char or enum](input: Tensor[T]; target: Tensor[Idx]): T
Softmax function + Cross-Entropy loss fused in one layer.
Input:
- A Tensor of shape [batch_size, predicted_labels_probabilities]
- The target values of shape [batch_size] containing the truth label ids
Returns:
- Applies a softmax activation and returns the cross-entropy loss.
sparse_softmax_cross_entropy measures the cross-entropy error for multiclass classification. Classes are mutually exclusive (only 1 label is true).
Important: a one-hot vector [0, 0, 1] means label 2 is true, i.e. label ids start at 0.
Note: with one-hot-encoded labels it is more efficient to use sparse_softmax_cross_entropy than to feed the one-hot vectors to softmax_cross_entropy.
For example, if your true probabilities are (car: 0.10, airplane: 0.60, bike: 0.05, bus: 0.25), you have to use softmax_cross_entropy.
However, if your true probabilities are (car: 0, airplane: 1, bike: 0, bus: 0) (a one-hot-encoded vector), you should prefer sparse_softmax_cross_entropy.
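A matching sketch for the sparse variant, under the same assumptions as above (Arraymancer, illustrative values); note the target is now a rank-1 tensor of label ids rather than a probability matrix:

```nim
import arraymancer

# Same logits as before: 2 samples, 4 classes (car, airplane, bike, bus)
let predicted = [[1.0, 3.0, 0.5, 2.0],
                 [0.2, 4.0, 0.1, 0.3]].toTensor

# One true label id per sample; ids start at 0, so 1 = airplane, 2 = bike
let target = [1, 2].toTensor

let loss = sparse_softmax_cross_entropy(predicted, target)
echo loss
```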
proc sparse_softmax_cross_entropy_backward[T; Idx: SomeNumber or byte or char or enum](gradient: Tensor[T] or T; cached_tensor: Tensor[T]; target: Tensor[Idx]): Tensor[T] {.noinit.}
Derivatives of sparse_softmax_cross_entropy
Input:
- The input gradient as a scalar or a Tensor
- A cache tensor that contains data from the forward pass
- The target values
Shape:
- The cache shape should be [features, batch_size], i.e. number of samples as the last dimension
- The target shape should be [batch_size]
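The gradient identity here is the same as for the dense backward pass, with the target distribution replaced by an indicator on the true label id $y$ (again the standard result, not quoted from the doc above):

$$\frac{\partial L}{\partial x_i} = \mathrm{softmax}(x)_i - \mathbb{1}[i = y]$$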