Tokens and Embeddings
How text is split into tokens, then converted into numeric vectors a model can work with.
Attention Is All You Need · 2026 A1
Required concept
How a single active slot can represent a category or label.
One-hot encoding is a way to represent one choice from a small set using a vector of zeros with exactly one 1.
Example mapping
The position of the `1` is the identity. Everything else stays `0`.
| Category | slot 0 | slot 1 | slot 2 | slot 3 |
|---|---|---|---|---|
| cat | 1 | 0 | 0 | 0 |
| dog | 0 | 1 | 0 | 0 |
| bird | 0 | 0 | 1 | 0 |
| fish | 0 | 0 | 0 | 1 |
One-hot encoding gives each category its own slot. The vector is sparse on purpose: the category is encoded by where the 1 appears, not by the size of the number.
For example, if you had three categories, you could write them as:
cat -> (1, 0, 0)
dog -> (0, 1, 0)
bird -> (0, 0, 1)

The important idea is that the position of the 1 carries the identity. The vector is not trying to be smooth or semantic. It is just a direct numeric label.
In a model, one-hot vectors are often used when the system needs a discrete choice in numeric form. They are a simple bridge between categories and vectors, and they are easy to read because only one slot is active.
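As a minimal sketch of the idea above, here is one way to build one-hot vectors in Python. The category names and their ordering are taken from the example table; the function name `one_hot` is just illustrative.

```python
# Categories in a fixed order: the slot index is the identity.
categories = ["cat", "dog", "bird", "fish"]

def one_hot(category: str) -> list[int]:
    """Return a vector of zeros with a single 1 at the category's slot."""
    vec = [0] * len(categories)
    vec[categories.index(category)] = 1
    return vec

print(one_hot("bird"))  # [0, 0, 1, 0]
```

Note that only the position of the 1 changes between categories; the vector length stays fixed at the size of the category set.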
A one-hot vector is a very sparse representation. It says "this item is category k" and leaves the rest of the slots empty.
An embedding is different: it usually has many non-zero values and can encode richer relationships between items. In that sense, a one-hot vector is the simplest possible kind of vector representation.
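One way to see the connection between the two is that multiplying a one-hot vector by an embedding matrix simply selects one row of that matrix, which is why embedding lookup is ordinarily implemented as row indexing. The sketch below uses a made-up 4x3 embedding matrix purely for illustration; the specific values carry no meaning.

```python
# Hypothetical embedding matrix: one dense row per category,
# in the same order as the one-hot slots (cat, dog, bird, fish).
embedding_matrix = [
    [0.2, -0.1, 0.5],   # "cat"
    [0.3,  0.4, -0.2],  # "dog"
    [-0.5, 0.1, 0.0],   # "bird"
    [0.7, -0.3, 0.1],   # "fish"
]

one_hot_bird = [0, 0, 1, 0]

# Vector-matrix product: each output entry is a sum over rows,
# weighted by the one-hot entries (all zero except one).
dense = [
    sum(one_hot_bird[i] * embedding_matrix[i][j] for i in range(4))
    for j in range(3)
]
print(dense)                         # [-0.5, 0.1, 0.0]
print(dense == embedding_matrix[2])  # True: same as direct row lookup
```

Because only one weight is non-zero, the product collapses to a single row, so real systems skip the multiplication and index the row directly.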
The toy input generator uses a small entity one-hot region in its structured mode so repeated noun-like mentions can share a simple identity slot. That is a debugging aid, not something you have to implement in the assignment itself.
If you want the broader “tokens become vectors” picture, read Tokens and Embeddings.