Buildup to GPT2

Tags: llm, gpt2, transformer, machine learning, deep learning, neural network

Published: July 7, 2025

I am embarking on a month-long journey, getting myself locked in for this.

I am going to code, record videos, and push to GitHub my versions of:

  1. count-based next-character prediction (a toy version is sketched after this list)
  2. neural-network-based next-character prediction
  3. WaveNet for next-character prediction
  4. GPT for next-char / token prediction (attention sketched below)
  5. building up the tokenizer for GPT-2 (BPE merges sketched below)
  6. improving attention for faster training and inference
  7. fine-tuning on downstream tasks using CS 224N: Default Final Project: Build GPT-2
  8. implementing SOTA fine-tuning methods, or putting those to some use
  9. trying quantization approaches, whatever is possible at small scale (sketched below)
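
To give step 1 some shape before I start: a minimal count-based bigram sketch in the spirit of makemore. The toy corpus and the `.` boundary token are illustrative choices for this sketch, not necessarily what I will end up using.

```python
import random

corpus = ["emma", "olivia", "ava", "isabella", "sophia"]  # toy stand-in data

# Count bigram transitions, with '.' marking word start and end.
counts = {}
for word in corpus:
    chars = ["."] + list(word) + ["."]
    for a, b in zip(chars, chars[1:]):
        counts.setdefault(a, {})
        counts[a][b] = counts[a].get(b, 0) + 1

def sample_word():
    """Walk the bigram table, sampling proportionally to counts, until '.'."""
    out, ch = [], "."
    while True:
        options, freqs = zip(*counts[ch].items())
        ch = random.choices(options, weights=freqs, k=1)[0]
        if ch == ".":
            return "".join(out)
        out.append(ch)

print(sample_word())
```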
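
Step 4 revolves around causal self-attention, so here is a rough single-head sketch in PyTorch. All dimensions and layer names are placeholders, nothing final:

```python
import torch
import torch.nn.functional as F

B, T, C = 2, 8, 32          # batch, sequence length, embedding dim (made up)
head_size = 16

x = torch.randn(B, T, C)
key   = torch.nn.Linear(C, head_size, bias=False)
query = torch.nn.Linear(C, head_size, bias=False)
value = torch.nn.Linear(C, head_size, bias=False)

k, q, v = key(x), query(x), value(x)               # each (B, T, head_size)
att = q @ k.transpose(-2, -1) * head_size ** -0.5  # (B, T, T) scaled scores
mask = torch.tril(torch.ones(T, T))                # causal mask: no peeking ahead
att = att.masked_fill(mask == 0, float("-inf"))
att = F.softmax(att, dim=-1)
out = att @ v                                      # (B, T, head_size)
print(out.shape)                                   # torch.Size([2, 8, 16])
```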
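
Step 5 boils down to BPE-style merges: repeatedly fuse the most frequent adjacent pair into a new token id. The real GPT-2 tokenizer works on bytes with a regex pre-split and learned merge ranks; this toy loop, on a made-up input string, only shows the core merge idea:

```python
from collections import Counter

def most_frequent_pair(ids):
    """Return the adjacent pair that occurs most often in ids."""
    return Counter(zip(ids, ids[1:])).most_common(1)[0][0]

def merge(ids, pair, new_id):
    """Replace every occurrence of `pair` in `ids` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

ids = list("aaabdaaabac".encode("utf-8"))  # start from raw bytes
for new_id in range(256, 259):             # three merges for the demo
    pair = most_frequent_pair(ids)
    ids = merge(ids, pair, new_id)
    print(f"merged {pair} -> {new_id}: {ids}")
```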
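
And for step 9, the simplest first experiment would be symmetric int8 quantization of a weight tensor. This back-of-the-envelope sketch uses a single per-tensor scale; per-channel scales and fancier methods (GPTQ and friends) come later, if at all:

```python
import torch

w = torch.randn(4, 4)
scale = w.abs().max() / 127.0    # map the largest weight to +/-127
w_q = torch.clamp((w / scale).round(), -127, 127).to(torch.int8)
w_hat = w_q.float() * scale      # dequantize for use in matmuls

print("max abs error:", (w - w_hat).abs().max().item())
```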

Steps 1-4 will look like Karpathy's tutorials, because they are obviously inspired by his series. I coded along with his videos over a year back; now I am going to redo them with as little help as possible.

While doing all this, I will also go through the gist of seminal papers as and when they become relevant.

Let’s begin, cheers!