
src/arraymancer/nn_primitives/nnp_embedding


Procs

proc embedding[T; Idx: byte or char or SomeInteger](vocab_id: Tensor[Idx];
    weight: Tensor[T]): Tensor[T]

Returns the embeddings for the ids in vocab_id, looked up as rows of the weight embedding matrix, i.e. the part of the global vocabulary that is present.

The main use-case is natural language processing. Words (or characters, or groups of words) are first encoded as arbitrary integers, which are then used to index the weight embedding matrix.

During training, words that are related will become close in some dimensions of the embedding.

For example, to encode a text with a vocabulary of 10000 different words into 300-dimensional vectors, we need a 10000 x 300 embedding matrix.

Make sure to add an index to represent <UNKNOWN> words (words encountered at test time that did not exist in the training vocabulary).

When working with variable-length sequences, <START>, <STOP> and <PAD> "words" are also useful.

In summary, it is a trainable lookup table that maps words to meanings (vectors) in a high-dimensional space.

Input:

  - vocab_id: a tensor of vocabulary ids (byte, char or any integer type)
  - weight: the embedding matrix, of shape [vocabulary_size, embedding_dim]

Result:

  - a tensor of the embedding vectors (rows of weight) selected by each id in vocab_id
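
A minimal usage sketch, assuming arraymancer is imported; the toy vocabulary, the word-to-id mapping and the 4-dimensional zero-initialized weight matrix below are illustrative choices, not part of the API:

  import arraymancer, std/tables

  # Illustrative vocabulary: word -> integer id.
  # Id 0 is reserved for <PAD>, id 1 for <UNKNOWN>.
  let vocab = {"<PAD>": 0, "<UNKNOWN>": 1, "the": 2, "cat": 3, "sat": 4}.toTable

  # Embedding matrix: one row per vocabulary entry, 4 dimensions per word.
  # In practice this would be e.g. 10000 x 300 and trained, not zero-initialized.
  let weight = zeros[float32](vocab.len, 4)

  # Encode a sentence as integer ids; words unseen in training map to <UNKNOWN>.
  let sentence = @["the", "cat", "sat", "purred"]
  var ids = newSeq[int](sentence.len)
  for i, word in sentence:
    ids[i] = vocab.getOrDefault(word, vocab["<UNKNOWN>"])
  let vocab_id = ids.toTensor()

  # Look up the embedding row for each id.
  let embedded = embedding(vocab_id, weight)
  echo embedded.shape   # one 4-dimensional vector per input word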

proc embedding_backward[T; Idx: byte or char or SomeInteger](
    dWeight: var Tensor[T]; vocab_id: Tensor[Idx]; dOutput: Tensor[T];
    padding_idx: Idx; scale_grad_by_freq: static[bool] = false)
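
This proc has no doc comment; the following reading is inferred from the signature and should be treated as an assumption: embedding_backward accumulates the gradient of the loss with respect to the embedding matrix into dWeight, using the upstream gradient dOutput and the ids vocab_id from the forward pass. Ids equal to padding_idx contribute no gradient, and scale_grad_by_freq presumably rescales each row's gradient by the inverse frequency of its id in vocab_id. Continuing the forward sketch above (the shapes and the choice of padding_idx = 0 are illustrative):

  # Gradient buffer, same shape as the weight matrix.
  var dWeight = zeros[float32](vocab.len, 4)
  # Upstream gradient, same shape as the embedding output.
  let dOutput = ones[float32](embedded.shape[0], embedded.shape[1])
  embedding_backward(dWeight, vocab_id, dOutput, padding_idx = 0)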