COMP10002 Foundations of Algorithms

Reading LLM Architecture Diagrams

How to read the block diagrams that often appear in Transformer and LLM papers, without getting lost in all the boxes and arrows.

Short answer

When you see an LLM architecture figure in a paper, do not try to understand every box at once.

Read it as a data-flow diagram:

  1. what goes in
  2. what gets repeated
  3. what the main sub-blocks are
  4. what comes out

Most architecture figures are trying to show the shape of the computation, not the full implementation detail.
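The four-step reading order above can be sketched as a toy forward pass. Everything here is illustrative: the function names, sizes, and numbers are made up for the sketch and do not come from any real model.

```python
# Toy sketch of the pipeline shape most LLM figures describe:
# input -> repeated block -> output. Not a real model.

def embed(tokens, dim=4):
    # 1. what goes in: turn token ids into fixed-size vectors
    return [[float((t + i) % 3) for i in range(dim)] for t in tokens]

def block(vectors):
    # 2/3. one repeated sub-block; real models put attention and a
    # feed-forward network here, this just nudges each value
    return [[v + 0.1 for v in vec] for vec in vectors]

def output_head(vectors):
    # 4. what comes out: here, one score per input position
    return [sum(vec) for vec in vectors]

def forward(tokens, n_layers=12):
    x = embed(tokens)
    for _ in range(n_layers):   # the "x12" written next to a box
        x = block(x)
    return output_head(x)

scores = forward([5, 1, 7], n_layers=12)
```

Reading the figure as this kind of pipeline first makes it much easier to decide which sub-block to zoom into later.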

How to read them

Start with the arrows.

They usually tell you the highest-level story: what enters the model, which direction data flows, and where it ends up.

Then identify what kind of diagram it is.

Most paper figures are doing one of these jobs:

  1. showing the end-to-end pipeline from input to output
  2. showing which blocks are repeated, and how many times
  3. zooming into the internals of one sub-block

If you know which of those jobs the figure is doing, it becomes much easier to read.

A typical Transformer-style paper figure compresses a large computation into a few labelled boxes and repeated stacks. Read it first as a pipeline, then zoom into the sub-block you care about. Figure from “Transformer, full architecture” by dvgodoy, licensed CC BY 4.0, via Wikimedia Commons.

Look for the input and output first

Find the leftmost or bottommost input, then trace where the arrows eventually lead. That gives you the big picture before you worry about details.

Notice what is repeated

If a diagram says something like “×12”, “×24”, or “N layers”, it usually means “this same block is stacked many times”.

Read box labels as roles

Labels such as “self-attention”, “feed-forward”, “add & norm”, or “MLP” are usually naming the role of a sub-computation, not giving all the loop details.
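Reading labels as roles can be made concrete with stub functions, one per box. These bodies are placeholders invented for the sketch; the real versions are tensor operations with learned weights.

```python
# Hypothetical sketch: each box label names a role, realised here
# as a stub function on a plain list of numbers.

def self_attention(x):
    # "self-attention" box: mix information across positions
    avg = sum(x) / len(x)
    return [v + avg for v in x]

def feed_forward(x):
    # "feed-forward" / "MLP" box: transform each position independently
    return [max(0.0, 2 * v) for v in x]

def add_and_norm(x, residual):
    # "add & norm" box: add the residual back, then rescale
    mixed = [a + b for a, b in zip(x, residual)]
    scale = max(abs(v) for v in mixed) or 1.0
    return [v / scale for v in mixed]

def transformer_block(x):
    # the labels, read as roles, give you the wiring of one block
    x = add_and_norm(self_attention(x), x)
    x = add_and_norm(feed_forward(x), x)
    return x
```

The point is the wiring, not the arithmetic: once you see the labels as roles, the arrows between boxes tell you which role feeds which.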

Repeated blocks

A common source of confusion is that one box in a paper figure may really stand for a large repeated stack.

For example, if you see a single box labelled “Transformer block” with “× N” written beside it, that usually means the model applies the same kind of block many times in sequence.

The figure is not claiming there is literally only one attention computation. It is compressing many similar layers into one readable visual unit.

That is the same reason papers often draw one attention block, one feed-forward block, and one output head even though the real implementation may involve:

  1. dozens of stacked layers, each with its own parameters
  2. several attention heads inside every attention box
  3. extra wiring such as masking, caching, and dropout
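One way to picture the compression is that the single drawn box corresponds to a list of blocks in code, each with its own parameters. The sketch below is illustrative only; `make_block` and its random "weight" are made up here.

```python
# Sketch: one drawn box often stands for a stack of N layers,
# each with its own parameters.
import random

def make_block(seed):
    random.seed(seed)
    w = random.random()          # this layer's own "weights"
    return lambda x: [v * w + 1 for v in x]

# the figure draws one box; the implementation builds N of them
layers = [make_block(i) for i in range(24)]

x = [1.0, 2.0]
for layer in layers:             # same kind of block, in sequence
    x = layer(x)
```

Every layer has the same shape of computation, which is exactly why the figure can get away with drawing it once.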

What diagrams often leave out

Paper diagrams are helpful, but they often omit the details you would need to code the model directly.

What commonly gets suppressed:

  1. exact tensor shapes and dimensions
  2. where normalisation and residual connections sit
  3. masking, positional information, and other wiring details
  4. training-time machinery such as dropout and the loss

Why figures feel simpler than code

This is why a paper figure can feel easy to look at but hard to implement from directly. The figure tells you the broad structure; the equations and prose carry the operational detail.
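As one small concrete case, assume a decoder-style model: the single "attention" box in a figure rarely shows the causal mask, yet the code has to build it explicitly. The function below is a toy sketch of just that one omitted detail.

```python
# Illustrative sketch of a detail figures usually leave out: the
# causal mask in decoder self-attention. Position i may only attend
# to positions 0..i, which a single attention box never shows.

def causal_mask(n):
    # True where attention is allowed: a lower-triangular pattern
    return [[j <= i for j in range(n)] for i in range(n)]

mask = causal_mask(4)
# the first token sees only itself; the last token sees everything
```

A diagram that drew this mask for every attention box would be unreadable, which is exactly why the prose and equations carry it instead.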

If you want the equation-side version of the same problem, read Reading Transformer paper notation.

How this maps to the assignment

The architecture diagrams in papers usually contain much more than this assignment asks you to implement.

In a typical LLM figure, your assignment corresponds to only a small middle slice of the whole diagram.

That is why the Transformer explainer is useful before or after this page: it shows the larger block structure, while this page is about how to read the style of figure that papers tend to use.