COMP10002 Foundations of Algorithms

Annotated Scaffold

A guided reading of the provided `a1.c`, with direct links from the main spec and tooltip-style explanations for the names, arrays, and helper functions you keep seeing.

Overview

This page is meant to be open beside `a1.c`.

The main assignment page tells you what to implement. This page is about how to read the scaffold you were given without getting lost in the arrays, the stage order, or the helper names.

Why this page exists

The scaffold does some important work for you already:

  • it allocates the main arrays
  • it calls the stages in order
  • it prints the stage headers and formatted rows
  • it provides the printing helpers you should normally leave alone

That means your job is mostly to fill in the TODO functions correctly and keep the existing output contract intact.

File map

Top of file

Comment block, includes, and constants

The header identifies the file, the complexity-analysis block is where students write their Big-O notes, and the preprocessor section defines the fixed array bounds such as MAX_TOKENS, MAX_D, and MAX_GEN.

Interface

Function prototypes

This section tells you the shape of every function before you reach the implementations. When you are coding a stage from the main page, the prototype is the fastest way to remind yourself which arrays are inputs and which arrays you are expected to fill.

Execution flow

main() is the roadmap

The `main` function is worth reading carefully once, even if you do not edit it much. It shows the exact order in which the scaffold expects the stages to happen, which arrays are reused later, and where the printing already happens.

Your code

TODO functions

The core work lives in the TODO block: `read_input`, `create_embeddings`, `compute_projection`, the combined prompt score/weight function, the prompt output function, and the generation-with-cache function. Those are the places you should spend most of your implementation effort.

Leave these alone

Provided helpers

The printing helpers already implement the expected formatting, including clamping near-zero values. This is one of the easiest places to accidentally break the autograder if you start “cleaning things up”.

Walking Through main()

The easiest way to understand the scaffold is to read main() as a pipeline:

/* Some details of the main function, such as printing and variable
   declarations, are omitted here for brevity, but are included in the
   skeleton code */
int main(void) {
    /* allocate all the main arrays */
    read_input(&n, &d, &g, &text_len, embedding_table, mask, prompt, gen, wq, wk, wv);

    /* Stage 1: sort and print the one-hot table */
    create_embeddings(embedding_table, text_len);

    /* build Q, K, V */
    compute_projection(n, d, prompt, wq, q_matrix);
    compute_projection(n, d, prompt, wk, k_matrix);
    compute_projection(n, d, prompt, wv, v_matrix);

    /* scores -> weights -> outputs */
    compute_attention_scores_or_weights_prompt(
        n, d, mask, q_matrix, k_matrix, scores, NO_SOFTMAX);
    compute_attention_scores_or_weights_prompt(
        n, d, mask, q_matrix, k_matrix, weights, APPLY_SOFTMAX);
    compute_attention_output_prompt(n, d, weights, v_matrix, out_prompt);

    /* generation with cache */
    compute_generation_with_cache(..., t, ..., output);
}

What to notice first

The scaffold already decides the stage order for you. If your output is wrong, one of the earlier filled arrays is usually the real cause.

Why there are separate arrays

The scaffold does not overwrite the prompt matrix in place. It keeps prompt, then separate q_matrix, k_matrix, v_matrix, then separate scores, weights, and out_prompt so each stage can be inspected independently.

What printing means for you

Notice where printf already happens. That is why the spec keeps warning you not to add extra output and not to change the existing formatting helpers.

read_input(...)

Stage 0

Load the whole snapshot into memory

This function is the bridge between the input file and every later computation. If this stage is wrong, nothing downstream is trustworthy.

void read_input(int *n, int *d, int *g, int *text_len,
                char embedding_table[MAX_TEXT_SIZE][MAX_TOKEN_LENGTH + 1],
                int mask[MAX_TOKENS],
                double prompt[MAX_TOKENS][MAX_D],
                double gen[MAX_GEN][MAX_D],
                double wq[MAX_D][MAX_D], double wk[MAX_D][MAX_D],
                double wv[MAX_D][MAX_D]);

What this function should do

Use scanf to read the values in the exact order the spec gives: sizes, Stage 1 token text, prompt mask, prompt rows, generated rows, then Wq, Wk, and Wv. This is a storage stage, not a computation stage.

Why the scalar values are pointers

n, d, g, and text_len are passed by address so read_input can write them back into the variables owned by main().

Common mistake

Mixing up loop bounds. The prompt loops run over n rows, gen runs over g, and each matrix Wq/Wk/Wv runs over d × d entries.
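The two ideas above can be sketched in miniature. This is not the real `read_input` (which reads many more arrays from stdin with `scanf`); it reads from a `FILE *` so it is easy to test, and the `MAX_D` value here is an illustrative assumption. What it shows is why the scalars are pointers, and how the matrix loop bounds come from the sizes read earlier:

```c
#include <stdio.h>

#define MAX_D 8   /* illustrative bound, not the scaffold's value */

/* Sketch only: read two sizes, then a d-by-d matrix.  The sizes are
   written back through pointers so the caller's variables are filled,
   and the matrix loops are bounded by the d we just read. */
int read_sizes_and_matrix(FILE *fp, int *n, int *d,
                          double w[MAX_D][MAX_D]) {
    if (fscanf(fp, "%d %d", n, d) != 2) {
        return 0;
    }
    for (int i = 0; i < *d; i++) {          /* d rows ...          */
        for (int j = 0; j < *d; j++) {      /* ... of d entries each */
            if (fscanf(fp, "%lf", &w[i][j]) != 1) {
                return 0;
            }
        }
    }
    return 1;
}
```

The same shape repeats in the real function: prompt rows loop to n, generated rows loop to g, and each of Wq, Wk, Wv loops over d × d entries.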

create_embeddings(...)

Stage 1

Turn the token list into a sorted one-hot table

This is the warm-up representation stage. It is separate from the later numeric prompt embeddings, and its whole job is to sort the unique token strings and print the one-hot rows.

int compare_tokens(const void *a, const void *b);
void create_embeddings(char embedding_table[MAX_TEXT_SIZE][MAX_TOKEN_LENGTH + 1],
                       int text_len);

What this function should do

Sort the token strings, keep each unique token once, and print one one-hot vector per token in sorted order.

What the vector length is

The Stage 1 vector length is the number of unique tokens you ended up with after sorting and deduplicating, not the later prompt embedding dimension d.

Common mistake

Mixing up “first seen” with the final printed order. First seen helps define the unique set; the printed order is the sorted order.
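The sort-then-dedupe half of this stage can be sketched as follows. The bounds and the function name `sort_and_dedupe` are assumptions for illustration; the real `create_embeddings` would also print one one-hot row per unique token, which is omitted here:

```c
#include <stdlib.h>
#include <string.h>

#define MAX_TOKEN_LENGTH 20   /* illustrative bounds */
#define MAX_TEXT_SIZE 100

/* qsort comparator for fixed-size token strings */
static int cmp_tokens(const void *a, const void *b) {
    return strcmp((const char *)a, (const char *)b);
}

/* Sketch only: sort the tokens, then keep each string once.  The
   returned count is the Stage 1 one-hot vector length. */
int sort_and_dedupe(char tokens[MAX_TEXT_SIZE][MAX_TOKEN_LENGTH + 1],
                    int text_len) {
    qsort(tokens, text_len, sizeof tokens[0], cmp_tokens);
    int unique = 0;
    for (int i = 0; i < text_len; i++) {
        if (unique == 0 || strcmp(tokens[unique - 1], tokens[i]) != 0) {
            strcpy(tokens[unique], tokens[i]);
            unique++;
        }
    }
    return unique;
}
```

Because the array is sorted first, duplicates are adjacent, so a single comparison with the previous kept token is enough to deduplicate.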

compute_projection(...)

Stage 2

Apply the same matrix-multiply pattern three times

This is the reusable projection helper. The scaffold calls it three times so the same loop structure builds Q, K, and V.

void compute_projection(int count, int d, double src[MAX_TOKENS][MAX_D],
                        double w[MAX_D][MAX_D],
                        double dest[MAX_TOKENS][MAX_D]);

Loop shape

One loop over source rows, one loop over output columns, and an inner accumulation loop over the shared dimension d.

Mental model

Each output component is “one source row dotted with one column of the weight matrix”. The projection explainer shows that visually.

Common mistake

Forgetting to reset the accumulator to 0.0 for each destination component before the inner loop starts.
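The loop shape and the accumulator reset look like this in a sketch (the bounds here are illustrative, and the function name `project` is mine, not the scaffold's):

```c
#define MAX_TOKENS 8   /* illustrative bounds */
#define MAX_D 8

/* Sketch of the projection loop shape: dest[i][j] is source row i
   dotted with column j of the weight matrix. */
void project(int count, int d, double src[MAX_TOKENS][MAX_D],
             double w[MAX_D][MAX_D], double dest[MAX_TOKENS][MAX_D]) {
    for (int i = 0; i < count; i++) {       /* each source row      */
        for (int j = 0; j < d; j++) {       /* each output column   */
            double sum = 0.0;               /* reset per component! */
            for (int k = 0; k < d; k++) {   /* shared dimension d   */
                sum += src[i][k] * w[k][j];
            }
            dest[i][j] = sum;
        }
    }
}
```

Calling this three times with Wq, Wk, and Wv on the same prompt rows is exactly how the scaffold builds Q, K, and V.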

compute_attention_scores_or_weights_prompt(...)

Stages 3 and 4

Use one function twice: first for scores, then for weights

The scaffold combines raw score computation and stable-softmax weight computation into one function. The first call fills the square prompt score matrix; the second call repeats the same masking logic and turns each row into attention weights. The only reason the scores are computed separately at all is as a check that your program's output is correct.

void compute_attention_scores_or_weights_prompt(
    int n, int d, const int mask[MAX_TOKENS],
    double q[MAX_TOKENS][MAX_D], double k[MAX_TOKENS][MAX_D],
    double scores_or_weights[MAX_TOKENS][MAX_TOKENS],
    int apply_softmax);

What the first call does

When apply_softmax == NO_SOFTMAX, compute the scaled dot products, apply both masks, and store -INFINITY in the masked positions. If mask[i] == 0, the whole row is masked.

What the second call does

When apply_softmax == APPLY_SOFTMAX, use the same allowed positions but turn each row into a stable softmax over the unmasked columns. If a row has no valid columns, the whole output row should be zeros.

Why the output is square

Each prompt row is compared with every prompt row position, so the output is n × n even though q and k are n × d.

compute_attention_output_prompt(...)

Stage 5

Use the weights to blend value rows into output rows

This is the weighted-sum stage. The previous stage decided how much each earlier row matters; this stage uses those coefficients to build the new output vector.

void compute_attention_output_prompt(
    int n, int d,
    double weights[MAX_TOKENS][MAX_TOKENS],
    double v[MAX_TOKENS][MAX_D], double out[MAX_TOKENS][MAX_D]);

Why the dimensions fit

A weight row has length n, and each column of v also has n entries. That is why you can think of each output component as “one weight row dotted with one column of v”.

What changes conceptually

This is the point where a prompt row stops being “just itself” and becomes a mixture of the value rows from the positions it attended to.

Common mistake

Looping over the wrong dimension in the inner sum. The inner sum here runs over prompt positions j, not over embedding components.
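A sketch of the weighted-sum loop makes the inner-dimension point concrete (bounds are illustrative, and the function name differs from the scaffold's):

```c
#define MAX_TOKENS 8   /* illustrative bounds */
#define MAX_D 8

/* Sketch of Stage 5: each output row is a weighted blend of the value
   rows.  The inner sum runs over prompt positions j, not over the d
   embedding components. */
void attention_output(int n, int d,
                      double weights[MAX_TOKENS][MAX_TOKENS],
                      double v[MAX_TOKENS][MAX_D],
                      double out[MAX_TOKENS][MAX_D]) {
    for (int i = 0; i < n; i++) {           /* each output row       */
        for (int c = 0; c < d; c++) {       /* each component        */
            double sum = 0.0;
            for (int j = 0; j < n; j++) {   /* over positions, not d */
                sum += weights[i][j] * v[j][c];
            }
            out[i][c] = sum;
        }
    }
}
```

With weights of 0.5 on two value rows, each output component is simply the average of those rows' components, which is the "mixture" described above.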

compute_generation_with_cache(...)

Stage 6

Do the same attention logic again, but reuse history

This function is the most crowded because it has to compute the new generated token's projections, append to the caches, score against the cache, produce the output, and return the dot-product count.

long compute_generation_with_cache(
    int n, int d, int g, int t, const int mask[MAX_TOKENS],
    double gen[MAX_GEN][MAX_D], double wq[MAX_D][MAX_D],
    double wk[MAX_D][MAX_D], double wv[MAX_D][MAX_D],
    double k_cache[MAX_TOKENS + MAX_GEN][MAX_D],
    double v_cache[MAX_TOKENS + MAX_GEN][MAX_D],
    double output[MAX_D]);

What makes this different

Prompt attention compares prompt rows to other prompt rows. Generation compares one fresh generated query against the already-built cache of older keys and values.

Why the cache arrays are bigger

The cache has to hold both the original prompt rows and the later generated rows, so the row capacity is MAX_TOKENS + MAX_GEN.

What the return value means

The function returns a long because the stage also has to report how many dot products were computed for that generated token.

Provided Helpers

These are the functions you should read once, understand, and usually leave alone:

static double clamp_near_zero(double value) { ... }
void print_vector(int d, const double row[]) { ... }
void print_attention_matrix(int n,
    const double matrix[MAX_TOKENS][MAX_TOKENS]) { ... }
void print_embedding_matrix(int n, int d,
    const double matrix[MAX_TOKENS][MAX_D]) { ... }
int compare_tokens(const void *a, const void *b) { ... }

Do not casually rewrite the printing helpers

Why they matter:

  • clamp_near_zero(...) rounds tiny floating-point values to 0.0
  • print_vector(...) prints a length-d vector rounded to 3 decimal places, separated by spaces
  • print_embedding_matrix(...) prints an n-by-d embedding matrix, one row per line
  • print_attention_matrix(...) prints an n-by-n attention matrix, one row per line
  • compare_tokens(...) is an optional helper for sorting an array of strings with qsort

These helpers are part of the expected output behaviour, so changing them is risky unless the spec explicitly tells you to.
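For intuition, the clamp idea can be sketched like this. The epsilon value and the function name here are assumptions, not the scaffold's actual constants, so read the real helper for the exact threshold:

```c
#include <math.h>

/* Sketch only: anything within a tiny band of zero prints as exactly
   0.0, so output like "-0.000" never appears. */
static double clamp_near_zero_sketch(double value) {
    const double EPS = 1e-9;   /* assumed threshold, not the scaffold's */
    return (fabs(value) < EPS) ? 0.0 : value;
}
```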