Interactive · Guide
How LLMs Work
4 steps from raw text to output — click and interact to learn
Tokenization
Split text into small "tokens" with numeric IDs
A model doesn't read text character by character — it splits it into tokens, which can be words, parts of words, or punctuation. Each token gets a numeric ID for lookup in the embedding table.
Example · click a token to see its ID
Numbers below = token IDs in vocabulary
"tokenization" splits into token +
ization —
this subword approach lets models handle unseen words by recognising familiar pieces.
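The splitting above can be sketched as a greedy longest-match lookup. This is a toy illustration with a hypothetical five-piece vocabulary and made-up IDs; real tokenizers (e.g. BPE) learn their pieces and merge rules from data:

```python
# Hypothetical subword vocabulary with made-up token IDs.
VOCAB = {"token": 1001, "ization": 1002, "ize": 1003, "s": 1004, "un": 1005}

def tokenize(word):
    """Greedily split a word into the longest known pieces, left to right."""
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest remaining substring first.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in VOCAB:
                tokens.append((piece, VOCAB[piece]))
                i = j
                break
        else:
            # Unknown character: emit it alone with a placeholder ID.
            tokens.append((word[i], 0))
            i += 1
    return tokens

print(tokenize("tokenization"))  # [('token', 1001), ('ization', 1002)]
```

Even though "tokenization" is not in the vocabulary as a whole word, the tokenizer still covers it with familiar pieces — the same trick that lets a model handle words it never saw during training.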
Embeddings
Map each token ID to a high-dimensional numeric vector
Each token ID is looked up in an embedding table to retrieve a vector of hundreds of numbers. Words with similar meanings end up geometrically close — that's how the model "knows" king and queen are related.
vector space (2D projection) · hover to explore
Vector arithmetic works: king − man + woman ≈ queen — the dashed arrows show this relationship. The model learns it entirely from data, nothing is hard-coded.
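The king − man + woman ≈ queen relationship can be checked with a tiny sketch. The 2-D vectors below are hand-picked for illustration only; real embeddings have hundreds of dimensions and are learned, not chosen by hand:

```python
import math

# Hand-picked toy 2-D "embeddings" (real ones are learned and much larger).
emb = {
    "king":  [0.9, 0.8],
    "queen": [0.9, 0.2],
    "man":   [0.5, 0.8],
    "woman": [0.5, 0.2],
}

def add(a, b): return [x + y for x, y in zip(a, b)]
def sub(a, b): return [x - y for x, y in zip(a, b)]

def cosine(a, b):
    """Cosine similarity: 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# king − man + woman lands exactly on queen in this toy space.
result = add(sub(emb["king"], emb["man"]), emb["woman"])
nearest = max(emb, key=lambda w: cosine(emb[w], result))
print(nearest)  # queen
```

Nearest-neighbour search by cosine similarity is also how "geometrically close" is usually measured in practice.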
Self-Attention
Each token "looks at" other tokens to gather context
Attention lets the model understand context — when processing one token, the model scores every other token. A high attention weight means that token is important for understanding the current one.
Click a token to see its attention weights
A Transformer runs multiple attention heads in parallel — each learns different patterns: some track syntax, some track coreference (pronouns → nouns), some track semantic similarity. Their outputs are concatenated and passed to the next layer.
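A single attention head boils down to "score every token against every other, softmax the scores, take a weighted sum". A minimal pure-Python sketch of scaled dot-product attention, using three made-up 2-D token vectors (a real model derives Q, K, and V through learned projection matrices):

```python
import math

def softmax(xs):
    """Turn raw scores into weights that sum to 1."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over a tiny sequence."""
    d = len(keys[0])
    out = []
    for q in queries:
        # Score this query against every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)  # high weight = that token matters here
        # Output is the attention-weighted average of the value vectors.
        out.append([sum(w * v[i] for w, v in zip(weights, values))
                    for i in range(len(values[0]))])
    return out

# Three toy 2-D token vectors, used directly as Q, K, and V for simplicity.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(attention(x, x, x))
```

Each output row is a blend of all the value vectors, weighted by how strongly that token attends to each position — exactly the weights the interactive view visualises.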
Token Generation
Pick the next token from a probability distribution
After all attention layers, the model produces a probability distribution over the entire vocabulary (~50,000 tokens) and samples from it, controlled by a parameter called temperature.
context
The weather today is
top 5 candidate tokens
Low temperature (0.1) → sharpens the distribution so the highest-probability token is picked almost every time. Great for code and facts.
High temperature (1.5+) → flattens the distribution, producing more varied output. Better for creative writing.
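Temperature works by dividing the logits before the softmax. A small sketch with hypothetical logits for five candidate tokens (the values are invented for illustration):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by temperature, then softmax: low T sharpens, high T flattens."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for five candidate next tokens.
logits = [5.0, 4.0, 3.0, 2.0, 1.0]

for t in (0.1, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At temperature 0.1 nearly all the probability mass piles onto the top candidate; at 1.5 it spreads across the list, which is why higher temperatures yield more varied samples.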
🎉
All 4 steps complete!
Text → Tokens → Embeddings → Attention → Generation
That's the core pipeline every LLM runs.