Attention Is All You Need · 2026 A1
COMP10002 Foundations of Algorithms
The core ideas you need for the assignment, plus optional demos, tooling, and background if you want extra help.
A mechanism that lets one position decide how much other positions should influence it.
How a single active slot can represent a category or label.
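That idea fits in a few lines of C. A minimal sketch; the helper name `one_hot` is illustrative, not from the assignment code:

```c
#include <string.h>

/* Fill vec[0..n-1] with zeros, then turn on the single slot for `label`.
   Which slot is active is what encodes the category. */
void one_hot(float *vec, int n, int label)
{
    memset(vec, 0, (size_t)n * sizeof *vec);
    if (label >= 0 && label < n)
        vec[label] = 1.0f;
}
```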
How one vector is transformed into query, key, and value versions used by attention.
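Each of the three versions is just the same input vector multiplied by a different weight matrix. A hedged sketch of that single building block (`matvec` and the row-major layout are assumptions of this example):

```c
/* out = W * x, where W has d_out rows and d_in columns, stored row-major.
   Applying three different matrices W_q, W_k, W_v to the same x gives
   its query, key, and value vectors. */
void matvec(const float *W, const float *x, float *out, int d_out, int d_in)
{
    for (int i = 0; i < d_out; i++) {
        float sum = 0.0f;
        for (int j = 0; j < d_in; j++)
            sum += W[i * d_in + j] * x[j];
        out[i] = sum;
    }
}
```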
A way to measure how strongly two vectors line up, often used to build attention scores.
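The standard such measure is the dot product, which is all of a few lines in C:

```c
/* Dot product of two n-vectors: large when the vectors line up,
   near zero when they are unrelated. Raw attention scores are
   typically built from exactly this quantity. */
float dot(const float *a, const float *b, int n)
{
    float sum = 0.0f;
    for (int i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}
```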
A function that turns raw scores into non-negative weights that add up to 1.
Rules that prevent certain positions from being used, such as future tokens or padding.
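One common way to implement such a rule, sketched here under the assumption that a softmax runs afterwards: set the score of every forbidden position to negative infinity, so it receives exactly zero weight.

```c
#include <math.h>

/* mask[i] == 0 marks a forbidden position (a future token, or padding).
   expf(-INFINITY) is 0, so after softmax a masked position gets zero
   weight and cannot influence the result. */
void apply_mask(float *scores, const int *mask, int n)
{
    for (int i = 0; i < n; i++)
        if (!mask[i])
            scores[i] = -INFINITY;
}
```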
A store of previously computed keys and values so generation can reuse past work.
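A minimal version of such a cache can be sketched as a struct holding the key and value vectors seen so far; the capacity, dimension, and names below are assumptions of this example, not the assignment's layout:

```c
#include <string.h>

#define KV_MAX 64   /* capacity: assumed for this sketch */
#define KV_DIM 4    /* vector dimension: assumed for this sketch */

/* Keys and values for positions 0..len-1, computed once and then
   reused on every later generation step instead of being recomputed. */
typedef struct {
    float keys[KV_MAX][KV_DIM];
    float vals[KV_MAX][KV_DIM];
    int len;
} kv_cache_t;

/* Append the key/value pair for the newest position.
   Returns 1 on success, 0 if the cache is full. */
int kv_append(kv_cache_t *c, const float *k, const float *v)
{
    if (c->len >= KV_MAX)
        return 0;
    memcpy(c->keys[c->len], k, sizeof c->keys[0]);
    memcpy(c->vals[c->len], v, sizeof c->vals[0]);
    c->len++;
    return 1;
}
```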
A guided reading of the provided `a1.c`, with direct links from the main spec and tooltip-style explanations for the names, arrays, and helper functions you keep seeing.
A small concrete example of the assignment input format, showing what a real input file looks like and how to read each line.
How to compile with stronger warnings, run useful checking tools, and catch bugs in your own `a1.c` before submission.
The small amount of C string handling you may need for Stage 1: storing tokens, comparing them, sorting them, and copying them safely.
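The copying step is the one that most often goes wrong. A hedged sketch of a safe fixed-buffer copy (the limit `MAX_TOK` is illustrative, not the assignment's):

```c
#include <string.h>

#define MAX_TOK 30   /* illustrative token-length limit */

/* Copy src into a fixed-size buffer without overflowing it, always
   leaving dest '\0'-terminated even when src is too long. */
void copy_token(char dest[MAX_TOK + 1], const char *src)
{
    strncpy(dest, src, MAX_TOK);
    dest[MAX_TOK] = '\0';
}
```

Comparing two stored tokens is then just `strcmp(a, b)`: negative, zero, or positive for before, equal, or after.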
A practical guide to using `qsort(...)` for Stage 1 and understanding the comparison function it calls.
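The part that trips most people up is the comparison function's types. `qsort` passes the comparator pointers to two *elements*; if the array holds `char *`, each `void *` argument is really a `char **`:

```c
#include <stdlib.h>
#include <string.h>

/* Comparator for qsort over an array of strings (char *). Each void*
   points at an element of the array, so it must be dereferenced once
   before strcmp sees the actual strings. */
int cmp_str(const void *a, const void *b)
{
    const char *sa = *(const char *const *)a;
    const char *sb = *(const char *const *)b;
    return strcmp(sa, sb);
}
```

A call then looks like `qsort(tokens, n, sizeof tokens[0], cmp_str);` for an array `char *tokens[]` of `n` entries.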
A concrete numeric walkthrough that connects the assignment data back to token positions and attention weights.
How text is split into tokens, then converted into numeric vectors a model can work with.
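The splitting half can be sketched with `strtok` over a writable buffer (the whitespace-only separator set is an assumption of this example; real tokenisers use subword schemes):

```c
#include <string.h>

/* Split a writable line into whitespace-separated tokens in place.
   strtok overwrites each separator with '\0' and returns pointers
   into the original buffer. Returns the number of tokens found. */
int tokenize(char *line, char *tokens[], int max_tokens)
{
    int n = 0;
    for (char *t = strtok(line, " \t\n");
         t != NULL && n < max_tokens;
         t = strtok(NULL, " \t\n"))
        tokens[n++] = t;
    return n;
}
```

The second half, turning each token into a numeric vector, is then a lookup of the token in a vocabulary table.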
How to read the compact notation used in the original Transformer paper and later ones.
A one-page view of the larger model architecture so you can see where this assignment fits.
How to read the boxes, arrows, repeated stacks, and omissions in the architecture figures that appear in papers.
Why attention mattered historically, including the 2017 paper and why the impact took time to become obvious.
What terms like “7B parameters” mean, where those numbers live, and what they imply for memory.
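The memory side of that is plain arithmetic: parameter count times bytes per parameter. A hedged sketch (2 bytes per parameter assumes 16-bit weights; other formats change the factor, and activations add more on top):

```c
/* GiB needed just to hold the weights: n_params parameters at
   bytes_per_param bytes each. Ignores activations and overhead. */
double weights_gib(double n_params, double bytes_per_param)
{
    return n_params * bytes_per_param / (1024.0 * 1024.0 * 1024.0);
}
```

For example, 7 billion parameters at 2 bytes each is roughly 13 GiB of weights alone.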
Why the assignment was developed as a human-AI collaboration, what stayed human, and what AI helped with.
The main assignment page links the needed material inline at the point where it matters, so you can read straight through this page or jump in from the spec as needed.