Buildup to GPT2

Tags: llm, gpt2, transformer, machine learning, deep learning, neural network

Published: July 7, 2025

I am embarking on a month-long journey, getting myself locked in for this.

I am going to code, record videos, and push to GitHub my versions of:

  1. count-based next-character prediction (a toy version is sketched after this list)
  2. neural-network-based next-character prediction
  3. WaveNet for next-character prediction
  4. GPT for next-char / token prediction (attention sketched below)
  5. building up the tokenizer for GPT-2 (BPE merges sketched below)
  6. improving attention for faster training and inference
  7. fine-tuning on downstream tasks using CS 224N: Default Final Project: Build GPT-2
  8. implementing SOTA fine-tuning methods, or putting those to some use
  9. trying quantization approaches, whatever is possible at small scale (sketched below)
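
To give step 1 some shape before I start: a minimal count-based bigram sketch in the spirit of makemore. The toy corpus and the `.` boundary token are illustrative choices for this sketch, not necessarily what I will end up using.

```python
import random

corpus = ["emma", "olivia", "ava", "isabella", "sophia"]  # toy stand-in data

# Count bigram transitions, with '.' marking word start and end.
counts = {}
for word in corpus:
    chars = ["."] + list(word) + ["."]
    for a, b in zip(chars, chars[1:]):
        counts.setdefault(a, {})
        counts[a][b] = counts[a].get(b, 0) + 1

def sample_word():
    """Walk the bigram table, sampling proportionally to counts, until '.'."""
    out, ch = [], "."
    while True:
        options, freqs = zip(*counts[ch].items())
        ch = random.choices(options, weights=freqs, k=1)[0]
        if ch == ".":
            return "".join(out)
        out.append(ch)

print(sample_word())
```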
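
Step 4 revolves around causal self-attention, so here is a rough single-head sketch in PyTorch. All dimensions and layer names are placeholders, nothing final:

```python
import torch
import torch.nn.functional as F

B, T, C = 2, 8, 32          # batch, sequence length, embedding dim (made up)
head_size = 16

x = torch.randn(B, T, C)
key   = torch.nn.Linear(C, head_size, bias=False)
query = torch.nn.Linear(C, head_size, bias=False)
value = torch.nn.Linear(C, head_size, bias=False)

k, q, v = key(x), query(x), value(x)               # each (B, T, head_size)
att = q @ k.transpose(-2, -1) * head_size ** -0.5  # (B, T, T) scaled scores
mask = torch.tril(torch.ones(T, T))                # causal mask: no peeking ahead
att = att.masked_fill(mask == 0, float("-inf"))
att = F.softmax(att, dim=-1)
out = att @ v                                      # (B, T, head_size)
print(out.shape)                                   # torch.Size([2, 8, 16])
```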
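
Step 5 boils down to BPE-style merges: repeatedly fuse the most frequent adjacent pair into a new token id. The real GPT-2 tokenizer works on bytes with a regex pre-split and learned merge ranks; this toy loop, on a made-up input string, only shows the core merge idea:

```python
from collections import Counter

def most_frequent_pair(ids):
    """Return the adjacent pair that occurs most often in ids."""
    return Counter(zip(ids, ids[1:])).most_common(1)[0][0]

def merge(ids, pair, new_id):
    """Replace every occurrence of `pair` in `ids` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

ids = list("aaabdaaabac".encode("utf-8"))  # start from raw bytes
for new_id in range(256, 259):             # three merges for the demo
    pair = most_frequent_pair(ids)
    ids = merge(ids, pair, new_id)
    print(f"merged {pair} -> {new_id}: {ids}")
```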
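
And for step 9, the simplest first experiment would be symmetric int8 quantization of a weight tensor. This back-of-the-envelope sketch uses a single per-tensor scale; per-channel scales and fancier methods (GPTQ and friends) come later, if at all:

```python
import torch

w = torch.randn(4, 4)
scale = w.abs().max() / 127.0    # map the largest weight to +/-127
w_q = torch.clamp((w / scale).round(), -127, 127).to(torch.int8)
w_hat = w_q.float() * scale      # dequantize for use in matmuls

print("max abs error:", (w - w_hat).abs().max().item())
```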

Steps 1-4 will look like Karpathy's tutorials, because they are obviously inspired by his series. I coded along with his videos over a year back; now I am going to redo them with as little help as possible.

While doing all this, I will also go through the gist of seminal papers as and when they become relevant.

Let’s begin, cheers!