Tokens and Embeddings

How text becomes ordered rows of numbers before attention begins.

Also called: tokenisation, prompt embeddings

LLMs do not work directly on text. They work on tokens, which are pieces of text in a fixed order, and on embeddings, which are vectors of numbers attached to those tokens.

Text

the small cat

Tokens in order

position 0 → the, position 1 → small, position 2 → cat

Embedding rows

one row of numbers per token, in the same order

The order is preserved all the way through: token 0 becomes position i = 0, token 1 becomes i = 1, and so on. Attention later uses those positions when it decides what each position may look back at.
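A minimal sketch of this in plain Python (the token list is illustrative, not the assignment's actual data): the position index i is nothing more than the token's place in the list.

```python
# Token order determines the position index i used later by attention.
tokens = ["the", "small", "cat"]

for i, token in enumerate(tokens):
    print(i, token)  # position i is fixed by list order: 0 the, 1 small, 2 cat
```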

In this assignment, you first practise the representation idea with a simplified embedding step.

Stage 1 builds one-hot vectors for the tokens as a simple stand-in for embeddings. That is the representation exercise students do directly. After that, the later attention stages use richer embedding rows as the prompt matrix that gets projected into $\mathbf{Q}$, $\mathbf{K}$, and $\mathbf{V}$.
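One-hot rows like those in Stage 1 can be sketched as follows. The vocabulary and the helper name are illustrative assumptions, not the scaffold's actual names:

```python
# Illustrative one-hot construction; the vocabulary order is assumed fixed.
vocab = ["the", "small", "cat"]

def one_hot(token, vocab):
    """Return a vector with 1.0 at the token's vocabulary index, 0.0 elsewhere."""
    vec = [0.0] * len(vocab)
    vec[vocab.index(token)] = 1.0
    return vec

# Each row is one token's stand-in embedding, in prompt order.
rows = [one_hot(t, vocab) for t in ["the", "small", "cat"]]
```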

The prompt matrix can be read as:

Prompt matrix shape

$n$ rows by $d$ columns: rows track token positions, and columns are the components of each embedding vector.

This is why the scaffold naturally uses a 2D array: one axis for token positions and one axis for embedding components.
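A sketch of that two-axis layout, assuming plain Python lists of lists (the real scaffold's array type and sizes may differ):

```python
# prompt[i][j]: row i is token position i, column j is embedding component j.
n, d = 3, 4  # assumed sizes for illustration only
prompt = [[0.0] * d for _ in range(n)]

for i in range(n):       # loop over token positions (rows)
    for j in range(d):   # loop over embedding components (columns)
        prompt[i][j] = i + 0.1 * j  # placeholder values
```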

It is also why the rest of the assignment is mostly about array loops and matrix-style computations rather than text processing.

Why the spec says token instead of word

Models do not literally operate on words. A token might be:

  • a whole word
  • part of a word
  • punctuation
  • even whitespace in some systems

For this assignment, you can safely pretend tokens are words because the important fact is just that token positions have an order and the attention rules depend on that order.
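Under that simplification, a whitespace split is a fair mental model. A real tokenizer (e.g. a subword scheme such as BPE) would split text differently; this sketch is illustrative only:

```python
# Naive word-level "tokenizer" -- a stand-in for real subword tokenizers.
def tokenize(text):
    return text.split()

tokens = tokenize("the small cat")  # ["the", "small", "cat"], order preserved
```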