
Word Embeddings


An embedding is:

  • a matrix obtained by transforming words into vectors that encode their semantics in a meaningful way.

  • a dense representation of a one-hot-encoded matrix (see the sketch after this list).

  • computed using a neural network (note: the training is said to be unsupervised since the algorithm constructs the labels itself from the text).
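To make the dense-representation point concrete, here is a minimal NumPy sketch (the vocabulary, embedding dimension, and values are made-up assumptions for illustration): multiplying a one-hot row by an embedding matrix simply selects the corresponding dense row.

```python
import numpy as np

vocab = ["king", "queen", "apple"]    # toy vocabulary (illustrative)
one_hot = np.eye(len(vocab))          # one-hot-encoded matrix, shape (3, 3)

rng = np.random.default_rng(0)
W = rng.normal(size=(len(vocab), 2))  # embedding matrix: 3 words -> 2 dimensions

# Multiplying a one-hot row by the embedding matrix selects that word's dense vector.
print(one_hot[0] @ W)                 # dense embedding of "king"
print(np.allclose(one_hot @ W, W))    # True: the dense representation is W itself
```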

Word2Vec

What follows is heavily inspired by this tutorial.

Word2Vec is one of the most famous embedding algorithms. We construct the $X$ matrix with one-hot encoding, where the number of repetitions of each word equals the window size we want to consider. The larger the window size, the more context is taken into account.

The $Y$ matrix contains the labels. It can be seen as a “shifted” version of $X$ that accounts for the previous and next words.
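As an illustration, here is one way such $X$ and $Y$ matrices could be built from a toy sentence (the corpus, window size, and pairing convention are assumptions, not necessarily those of the tutorial):

```python
import numpy as np

sentence = ["the", "cat", "sat", "on", "the", "mat"]
vocab = sorted(set(sentence))
idx = {w: i for i, w in enumerate(vocab)}
window = 2  # context words considered on each side

def one_hot(i, size):
    v = np.zeros(size)
    v[i] = 1.0
    return v

X_rows, Y_rows = [], []
for center, word in enumerate(sentence):
    # Pair the center word with every word inside its window.
    for ctx in range(max(0, center - window), min(len(sentence), center + window + 1)):
        if ctx != center:
            X_rows.append(one_hot(idx[word], len(vocab)))          # repeated center word
            Y_rows.append(one_hot(idx[sentence[ctx]], len(vocab))) # "shifted" label

X, Y = np.array(X_rows), np.array(Y_rows)
print(X.shape, Y.shape)  # one row per (center, context) pair
```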

Word2Vec is a 2-layer (shallow) neural network:

\[softmax((XW_1)W_2) = \hat Y\]

where $W_1$ is the embedding matrix.

Learning is done through classical backpropagation. Here is a simple implementation.
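As a complement, a minimal NumPy version of the forward pass and gradient updates could look like the following (the shapes, learning rate, epoch count, and random stand-in data are illustrative assumptions, not the linked implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

V, D = 6, 2          # vocabulary size and embedding dimension (illustrative)
lr, epochs = 0.05, 500

# Random one-hot pairs stand in for real (center, context) rows built as above.
X = np.eye(V)[rng.integers(0, V, size=20)]
Y = np.eye(V)[rng.integers(0, V, size=20)]

W1 = rng.normal(scale=0.1, size=(V, D))  # embedding matrix
W2 = rng.normal(scale=0.1, size=(D, V))  # output matrix

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

for _ in range(epochs):
    H = X @ W1                    # hidden layer: one embedding per row
    Y_hat = softmax(H @ W2)       # predicted context-word distribution
    dZ = (Y_hat - Y) / len(X)     # cross-entropy gradient w.r.t. the logits
    dW2 = H.T @ dZ                # backpropagate to the output matrix
    dW1 = X.T @ (dZ @ W2.T)       # ...and to the embedding matrix
    W1 -= lr * dW1
    W2 -= lr * dW2

embeddings = W1  # row i is the learned dense vector of word i
```

After training, each row of $W_1$ is the dense embedding of the corresponding vocabulary word.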