Procs
- proc softmax_cross_entropy[T](input, target: Tensor[T]): T 
    Softmax function and cross-entropy loss fused in one layer.

    Input:
    - A Tensor of shape [batch_size, predicted_labels_probabilities]
    - The target values of shape [batch_size, truth_labels_probability]

    Returns:
    - The cross-entropy loss, computed after applying a softmax activation to the input.

    softmax_cross_entropy measures the cross-entropy error for multiclass classification. Classes are mutually exclusive (only 1 label is true), but the truth labels (target) may be a probability distribution over the classes rather than a one-hot vector.

    Note: if your labels are one-hot encoded, it is more efficient to use sparse_softmax_cross_entropy than to feed them to softmax_cross_entropy.

    For example, if your true probabilities are (car: 0.10, airplane: 0.60, bike: 0.05, bus: 0.25), you have to use softmax_cross_entropy. However, if your true probabilities are (car: 0, airplane: 1, bike: 0, bus: 0) (a one-hot-encoded vector), you should prefer sparse_softmax_cross_entropy.
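The computation described for softmax_cross_entropy can be sketched in plain Python. This is only an illustration of the math, not the library's implementation: the helper `softmax`, the nested-list layout (one row per sample), and the choice to average the loss over the batch are assumptions of this sketch.

```python
import math

def softmax(row):
    # Subtract the row maximum before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_cross_entropy(inputs, target):
    # inputs, target: lists of rows, shape [batch_size][num_classes].
    # Assumption of this sketch: the loss is averaged over the batch.
    batch_size = len(inputs)
    loss = 0.0
    for row, truth in zip(inputs, target):
        probs = softmax(row)
        # Skip zero targets so that 0 * log(p) never has to be evaluated.
        loss -= sum(t * math.log(p) for t, p in zip(truth, probs) if t > 0)
    return loss / batch_size

# Soft targets from the example above: (car, airplane, bike, bus).
inputs = [[0.0, 0.0, 0.0, 0.0]]
target = [[0.10, 0.60, 0.05, 0.25]]
loss = softmax_cross_entropy(inputs, target)
```

With all-zero inputs the softmax is uniform over the four classes, so the loss reduces to log(4) whatever the target distribution is.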
- proc softmax_cross_entropy_backward[T](gradient: Tensor[T] or T; cached_tensor: Tensor[T]; target: Tensor[T]): Tensor[T] {.noinit.} 
    Derivatives of softmax_cross_entropy.

    Input:
    - The input gradient as a scalar or a Tensor
    - A cache tensor that contains the data from before the forward pass
    - The target values

    Shape:
    - Both the cache and the target should have shape [batch_size, features], i.e. the number of samples is the first dimension.
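As a rough illustration of what this derivative looks like, the sketch below uses the standard result that the gradient of the fused softmax and cross-entropy with respect to its input is `softmax(input) - target`. It is not the library's code: the helper names, the nested-list layout, the scalar-only `gradient`, and the division by the batch size (matching a batch-averaged loss) are assumptions, and the formula assumes each target row sums to 1.

```python
import math

def softmax(row):
    # Subtract the row maximum before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_cross_entropy_backward(gradient, cached_input, target):
    # gradient: scalar gradient flowing back from the loss (assumed scalar here).
    # cached_input, target: lists of rows, shape [batch_size][features].
    batch_size = len(cached_input)
    grad_input = []
    for row, truth in zip(cached_input, target):
        probs = softmax(row)
        grad_input.append(
            [gradient * (p - t) / batch_size for p, t in zip(probs, truth)]
        )
    return grad_input

cached_input = [[0.0, 0.0, 0.0, 0.0]]
target = [[0.0, 1.0, 0.0, 0.0]]
grad = softmax_cross_entropy_backward(1.0, cached_input, target)
```

Each row of the result sums to zero, because both the softmax output and the target row sum to 1.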
 
- proc sparse_softmax_cross_entropy[T; Idx: SomeNumber or byte or char or enum]( input: Tensor[T]; target: Tensor[Idx]): T 
    Softmax function and cross-entropy loss fused in one layer.

    Input:
    - A Tensor of shape [batch_size, predicted_labels_probabilities]
    - The target values of shape [batch_size] containing the truth label id

    Returns:
    - The cross-entropy loss, computed after applying a softmax activation to the input.

    sparse_softmax_cross_entropy measures the cross-entropy error for multiclass classification. Classes are mutually exclusive (only 1 label is true).

    Important: labels start at 0, so the one-hot vector [0, 0, 1] corresponds to label 2.

    Note: if your labels are one-hot encoded, it is more efficient to use sparse_softmax_cross_entropy than to feed them to softmax_cross_entropy.

    For example, if your true probabilities are (car: 0.10, airplane: 0.60, bike: 0.05, bus: 0.25), you have to use softmax_cross_entropy. However, if your true probabilities are (car: 0, airplane: 1, bike: 0, bus: 0) (a one-hot-encoded vector), you should prefer sparse_softmax_cross_entropy.
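To show why label ids are cheaper than one-hot vectors, here is a minimal Python sketch of the sparse variant. With a single true class per sample, the loss for that sample is `log(sum(exp(input))) - input[label]`, so only one entry per row has to be looked up. The function name mirrors the proc for readability only; the nested-list layout and the batch averaging are assumptions of this sketch, not a description of the library's internals.

```python
import math

def sparse_softmax_cross_entropy(inputs, labels):
    # inputs: list of rows, shape [batch_size][num_classes].
    # labels: list of 0-based class ids, shape [batch_size].
    loss = 0.0
    for row, label in zip(inputs, labels):
        # Stable log-sum-exp: shift by the row maximum before exponentiating.
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        # Only the entry at the true label is needed, no one-hot vector.
        loss += log_sum_exp - row[label]
    return loss / len(inputs)

inputs = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
labels = [2, 0]  # label 2 corresponds to the one-hot vector [0, 0, 1]
loss = sparse_softmax_cross_entropy(inputs, labels)
```

Feeding the equivalent one-hot rows `[0, 0, 1]` and `[1, 0, 0]` to a dense cross-entropy would give the same value while multiplying most entries by zero.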
- proc sparse_softmax_cross_entropy_backward[T; Idx: SomeNumber or byte or char or enum](gradient: Tensor[T] or T; cached_tensor: Tensor[T]; target: Tensor[Idx]): Tensor[T] {.noinit.} 
    Derivatives of sparse_softmax_cross_entropy.

    Input:
    - The input gradient as a scalar or a Tensor
    - A cache tensor that contains the data from before the forward pass
    - The target values

    Shape:
    - The cache should have shape [features, batch_size], i.e. the number of samples is the last dimension.
    - The target should have shape [batch_size].
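The derivative for the sparse variant can be sketched the same way: it is the softmax of the cached input with 1 subtracted at the true label. This is an illustration of the math only. The helper names, the scalar-only `gradient`, and the division by the batch size are assumptions, and for readability the sketch stores one row per sample, which is a choice of the sketch rather than a statement about the layout the proc expects.

```python
import math

def softmax(row):
    # Subtract the row maximum before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_softmax_cross_entropy_backward(gradient, cached_input, labels):
    # gradient: scalar gradient flowing back from the loss (assumed scalar here).
    # cached_input: list of rows, one per sample (layout chosen for this sketch).
    # labels: list of 0-based class ids, one per sample.
    batch_size = len(cached_input)
    grad_input = []
    for row, label in zip(cached_input, labels):
        probs = softmax(row)
        probs[label] -= 1.0  # equivalent to subtracting the one-hot target
        grad_input.append([gradient * p / batch_size for p in probs])
    return grad_input

cached_input = [[0.0, 0.0, 0.0]]
labels = [2]
grad = sparse_softmax_cross_entropy_backward(1.0, cached_input, labels)
```

Only the entry at the true label is negative; the remaining entries equal the predicted probabilities scaled by the incoming gradient.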
 
