Buildup to GPT2
llm
gpt2
transformer
machine learning
deep learning
neural network
I am embarking on a month-long journey, getting myself locked in for this.
I am going to code, record videos, and push to GitHub my version of:
- count-based next-character prediction (a minimal sketch of this step follows the list)
- neural-network-based next-character prediction
- a WaveNet-style model for next-character prediction
- GPT for next-character / next-token prediction
- building up a tokenizer for GPT2
- improving attention for faster training and inference
- fine-tuning on downstream tasks using CS 224N: Default Final Project: Build GPT-2
- implementing SOTA fine-tuning methods, or putting them to some use
- trying quantization approaches, whatever is possible at small scale
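To give a flavour of where step 1 starts, here is a rough sketch of a count-based bigram character model, in the spirit of Karpathy's makemore. Treat it as a sketch only: the toy `words` list, the `.` boundary marker, and the add-one smoothing are placeholder choices of mine, not the code I will actually push.

```python
import torch

# Toy corpus; in practice this would be a real dataset of names/words.
words = ["emma", "olivia", "ava", "isabella", "sophia"]

# Character vocabulary, with '.' as a start/end-of-word marker at index 0.
chars = sorted(set("".join(words)))
stoi = {ch: i + 1 for i, ch in enumerate(chars)}
stoi["."] = 0
itos = {i: ch for ch, i in stoi.items()}
V = len(stoi)

# Count bigrams: N[i, j] = how often character j follows character i.
N = torch.zeros((V, V), dtype=torch.int32)
for w in words:
    seq = ["."] + list(w) + ["."]
    for c1, c2 in zip(seq, seq[1:]):
        N[stoi[c1], stoi[c2]] += 1

# Counts -> probabilities, with add-one smoothing so nothing has zero mass.
P = (N + 1).float()
P = P / P.sum(dim=1, keepdim=True)

# Sample a new "word" one character at a time until we hit the end marker.
g = torch.Generator().manual_seed(42)
ix, out = 0, []
while True:
    ix = torch.multinomial(P[ix], num_samples=1, generator=g).item()
    if ix == 0:
        break
    out.append(itos[ix])
print("".join(out))
```

Steps 2-4 essentially replace this count table with learned parameters (a neural net, then a deeper hierarchical model, then a transformer), while the sampling loop stays much the same.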
Steps 1-4 will look like Karpathy’s tutorials, because they are obviously inspired by his series. I coded along with the videos over a year back; now I am going to code them with as little help as possible.
While doing all of this, I will also be going through the gist of seminal papers as and when they become relevant.
Let’s begin, cheers!