COMP10002 Foundations of Algorithms
What An Input File Looks Like
A concrete example of the assignment input format, using a small sample file and a line-by-line explanation.
Raw sample input
Here is the visible test_data/test0.txt input file with the same numbers and line breaks as the real file. The colours are added here only to make the sections easier to spot:
3 2 2 9
the subject foundations of algorithms is the best subject
1 1 0
1 0
0 1
1 1
1 1
0.5 -0.5
1 0
0 1
1 0
0 1
1 0
0 1
If you only want the big picture, read it like this:
- 3 2 2 9 means n = 3, d = 2, g = 2, and text_len = 9
- the next line is the token list used only for Stage 1 embedding creation
- 1 1 0 is the prompt mask
- the next n = 3 lines are prompt embeddings
- the next g = 2 lines are generated embeddings
- the last 3d = 6 lines are the matrices Wq, Wk, and Wv, each with d = 2 rows
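To make that reading order concrete, here is a minimal C sketch that reads the sections in the order listed above. The struct input_t, its field names, the helper read_doubles, and the 63-character token limit are all assumptions made for this illustration; they are not taken from the assignment skeleton, and your own program may organise its data differently.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical container for one parsed input file; the field names are
   illustrative and not taken from the assignment skeleton. */
typedef struct {
    int n, d, g, text_len;
    int *mask;             /* n entries, 1 = real, 0 = padding */
    double *prompt;        /* n * d values, row-major */
    double *generated;     /* g * d values, row-major */
    double *wq, *wk, *wv;  /* d * d values each, row-major */
} input_t;

/* Read `count` doubles from fp; returns a malloc'd array, or NULL on failure. */
static double *read_doubles(FILE *fp, int count) {
    double *values = malloc(sizeof *values * (size_t)count);
    if (values == NULL) return NULL;
    for (int i = 0; i < count; i++) {
        if (fscanf(fp, "%lf", &values[i]) != 1) {
            free(values);
            return NULL;
        }
    }
    return values;
}

/* Returns 1 if every section was read, 0 otherwise.
   For brevity, arrays that were already read are not freed on failure. */
int read_input(FILE *fp, input_t *in) {
    char token[64];  /* assumed limit of 63 characters per token */

    /* Line 1: n d g text_len */
    if (fscanf(fp, "%d%d%d%d", &in->n, &in->d, &in->g, &in->text_len) != 4)
        return 0;
    if (in->n <= 0 || in->d <= 0 || in->g <= 0 || in->text_len < 0)
        return 0;

    /* Line 2: the Stage 1 tokens are read and discarded in this sketch */
    for (int i = 0; i < in->text_len; i++)
        if (fscanf(fp, "%63s", token) != 1) return 0;

    /* Line 3: prompt mask, one integer per prompt position */
    in->mask = malloc(sizeof *in->mask * (size_t)in->n);
    if (in->mask == NULL) return 0;
    for (int i = 0; i < in->n; i++)
        if (fscanf(fp, "%d", &in->mask[i]) != 1) return 0;

    /* Remaining lines: n + g embedding rows, then Wq, Wk, Wv (d rows each) */
    in->prompt = read_doubles(fp, in->n * in->d);
    in->generated = read_doubles(fp, in->g * in->d);
    in->wq = read_doubles(fp, in->d * in->d);
    in->wk = read_doubles(fp, in->d * in->d);
    in->wv = read_doubles(fp, in->d * in->d);
    return in->prompt && in->generated && in->wq && in->wk && in->wv;
}
```

Calling read_input(stdin, &in) on the sample file should leave in.n == 3 and in.d == 2, with the second generated row (0.5 -0.5) stored at in.generated[2] and in.generated[3].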
Line-by-line breakdown
Each part of the file means:
Line 1
n d g text_len
3 2 2 9 means n = 3 prompt tokens, d = 2 components per vector, g = 2 generated tokens, and text_len = 9 input tokens for Stage 1.
Line 2
Stage 1 token list
This whitespace-separated text is what Stage 1 turns into a sorted unique-token list and then into one-hot embeddings.
Line 3
mask
1 1 0 is the prompt padding mask. Prompt positions 0 and 1 are real, and position 2 is padding.
Lines 4 to 6
prompt embeddings
These are the 3 prompt embedding rows, each of length d = 2.
Lines 7 to 8
generated embeddings
These are the 2 generated embedding rows, also each of length d = 2.
Lines 9 to 10
Wq
These two rows form the query projection matrix Wq.
Lines 11 to 12
Wk
These two rows form the key projection matrix Wk.
Lines 13 to 14
Wv
These two rows form the value projection matrix Wv.
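Going back to Line 2, the "sorted unique-token list" step can be sketched in C as below. This is only an illustration under assumptions: the function names sort_unique and token_index, the MAX_TOKEN_LEN bound, and the use of strcmp ordering are choices made here, and the assignment specification may define the ordering or the storage differently. The position of a token in the sorted unique list is what a one-hot embedding would use as the index of its single 1.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_TOKEN_LEN 63  /* assumed upper bound, not taken from the spec */

/* Compare two fixed-size token slots by their string contents. */
static int cmp_tokens(const void *a, const void *b) {
    return strcmp((const char *)a, (const char *)b);
}

/* Sort the tokens in place (strcmp order) and squeeze out duplicates.
   Returns the number of unique tokens now at the front of the array. */
int sort_unique(char tokens[][MAX_TOKEN_LEN + 1], int count) {
    if (count <= 0) return 0;
    qsort(tokens, (size_t)count, sizeof tokens[0], cmp_tokens);
    int unique = 1;
    for (int i = 1; i < count; i++) {
        if (strcmp(tokens[i], tokens[unique - 1]) != 0) {
            /* skip the copy when source and destination are the same slot */
            if (i != unique) strcpy(tokens[unique], tokens[i]);
            unique++;
        }
    }
    return unique;
}

/* Binary search for `word` in the sorted unique list.
   Returns its index (the position of the 1 in a one-hot vector), or -1. */
int token_index(char tokens[][MAX_TOKEN_LEN + 1], int unique, const char *word) {
    char (*hit)[MAX_TOKEN_LEN + 1] =
        bsearch(word, tokens, (size_t)unique, sizeof tokens[0], cmp_tokens);
    return hit == NULL ? -1 : (int)(hit - tokens);
}
```

For the nine sample tokens, this sketch leaves seven unique entries in the order algorithms, best, foundations, is, of, subject, the, so token_index returns 5 for "subject" and 6 for "the".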
Two details matter:
- the actual input file does not contain labels like mask or Wq; only this page adds labels and colours to make the structure easier to read
- input is whitespace-separated, so scanf does not care whether values are separated by spaces or newlines
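The whitespace point is easy to check for yourself. In the small sketch below, parse_header is a hypothetical helper written for this page; it uses sscanf so the same header can be parsed from two differently spaced strings. The %d conversion skips any leading whitespace (spaces, tabs, newlines), which is why both layouts give the same values. The same holds for %lf and %s, but not for %c.

```c
#include <stdio.h>

/* Parse the four header integers from a string.
   Returns the number of values successfully converted (4 on success). */
int parse_header(const char *text, int *n, int *d, int *g, int *text_len) {
    /* each %d skips leading spaces, tabs, and newlines before the number */
    return sscanf(text, "%d%d%d%d", n, d, g, text_len);
}
```

Calling parse_header on "3 2 2 9" and on "3\n2\n\t2   9" should fill in the same four values each time.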
If you want a more human-friendly example generated from real text instead of a tiny numeric test case, use the Toy demo.