Neural network overview
Before Everything Else:
So one day, out of nowhere, I got to thinking: how does a language model actually generate text? I searched around, found a neural network course by Andrej Karpathy, and immediately dived in.
Structure/History:
The different stages of neural networks covered are the following:
- Bigram (one character predicts the next one with a lookup table of counts; see the sketch after this list)
- MLP, following Bengio et al. 2003
- CNN, following DeepMind WaveNet 2016
- RNN, following Mikolov et al. 2010
- LSTM, following Graves et al. 2014
- GRU, following Kyunghyun Cho et al. 2014
- Transformer, following Vaswani et al. 2017
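To make the first item concrete, here is a minimal sketch of the count-based bigram idea. The word list and variable names are made up for illustration; this is not the actual makemore code.

```python
import torch

words = ["emma", "olivia", "ava"]  # hypothetical toy dataset
chars = ["."] + sorted(set("".join(words)))  # "." marks the start/end of a word
stoi = {ch: i for i, ch in enumerate(chars)}

# Count how often each character follows each other character.
N = torch.zeros((len(chars), len(chars)), dtype=torch.int32)
for w in words:
    padded = ["."] + list(w) + ["."]
    for ch1, ch2 in zip(padded, padded[1:]):
        N[stoi[ch1], stoi[ch2]] += 1

# Turn each row of counts into a probability distribution over the next character.
P = N.float()
P = P / P.sum(dim=1, keepdim=True)

# Sample a new "name" one character at a time from the lookup table.
g = torch.Generator().manual_seed(0)
ix, out = 0, []
while True:
    ix = torch.multinomial(P[ix], num_samples=1, generator=g).item()
    if ix == 0:
        break
    out.append(chars[ix])
print("".join(out))
```

That is the entire "model": a table of counts, normalized into probabilities, sampled one character at a time.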
The series mainly focused on bigram models, MLPs, and modern Transformers.
Brief Summary:
A neural network first tokenizes its input, transforming it into tokens the machine can work with. It then feeds those tokens through its layers, each of which applies a mathematical function and passes the result on to the next. During training, the outputs are scored with a loss function; we then backpropagate through the loss to get the gradient of each individual parameter, and update each parameter so that the loss decreases.
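As a rough illustration of that loop, here is a minimal sketch of tokenize → forward → loss → backward → update in PyTorch. It loosely follows the spirit of the series' gradient-trained bigram model, but the toy text, learning rate, and variable names are my own assumptions, not the actual makemore code.

```python
import torch
import torch.nn.functional as F

# Toy "tokenizer": map each character to an integer token.
text = "hello world"
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}
tokens = torch.tensor([stoi[ch] for ch in text])

# Training pairs: each token predicts the next one (bigram-style).
xs, ys = tokens[:-1], tokens[1:]

# Parameters: a single weight matrix producing logits for the next character.
g = torch.Generator().manual_seed(42)
W = torch.randn((len(chars), len(chars)), generator=g, requires_grad=True)

for step in range(100):
    # Forward pass: one-hot encode the inputs and apply the layer.
    logits = F.one_hot(xs, num_classes=len(chars)).float() @ W
    # Loss: how badly do the predicted distributions match the true next tokens?
    loss = F.cross_entropy(logits, ys)
    # Backward pass: compute the gradient of the loss w.r.t. each parameter.
    W.grad = None
    loss.backward()
    # Update: nudge each parameter against its gradient to decrease the loss.
    W.data += -0.5 * W.grad

print(f"final loss: {loss.item():.4f}")
```

Running it, the printed loss falls over the steps, which is exactly the "update each parameter so that the loss decreases" part of the summary.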
What I did:
I followed the series and built a clone of Andrej Karpathy's makemore neural network. A lot of the other concepts I still find puzzling, but I will get to the bottom of them.