A research paper from DeepMind showing that most large language models of its time were over-sized and under-trained. It found that, for a fixed compute budget, training a smaller model on more data works better than training a bigger model on less data. The result changed how the industry thinks about scaling and influenced LLaMA's design.
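To make the trade-off concrete, here is a rough sketch of how a fixed compute budget might be split between model size and data. It rests on two commonly quoted approximations that are assumptions here, not statements from this glossary: training cost of about 6 FLOPs per parameter per token, and a compute-optimal ratio of roughly 20 training tokens per parameter. The function name `compute_optimal_split` is hypothetical.

```python
import math
from typing import Tuple

# Assumption: training cost is roughly C = 6 * N * D FLOPs
# (N = parameters, D = training tokens).
FLOPS_PER_PARAM_TOKEN = 6
# Assumption: rule-of-thumb compute-optimal ratio of about 20 tokens per parameter.
TOKENS_PER_PARAM = 20


def compute_optimal_split(compute_flops: float) -> Tuple[float, float]:
    """Return (params, tokens) for a FLOP budget under the two assumptions above."""
    # Substituting D = 20 * N into C = 6 * N * D gives C = 120 * N**2.
    params = math.sqrt(compute_flops / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    tokens = TOKENS_PER_PARAM * params
    return params, tokens


budget = 5.88e23  # illustrative FLOP budget
params, tokens = compute_optimal_split(budget)
print(f"{params:.2e} parameters, {tokens:.2e} tokens")
```

With this illustrative budget the sketch lands near 70 billion parameters and 1.4 trillion tokens, which is about the size of the Chinchilla model itself. Treat the constants as ballpark figures; the best ratio depends on the data and the training setup.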
Mathematical relationships showing how AI model performance improves predictably with more data, compute, and parameters.
The initial, expensive phase of training where a model learns general patterns from a massive dataset.
The processing power needed to train and run AI models.
A mathematical function applied to a neuron's output that introduces non-linearity into the network.
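As a minimal illustration, the sketch below defines two widely used activation functions, ReLU and sigmoid, and applies one to the output of a single neuron. The `neuron` helper and the example weights are made up for illustration.

```python
import math


def relu(x: float) -> float:
    """Rectified linear unit: passes positive values through, zeroes out negatives."""
    return max(0.0, x)


def sigmoid(x: float) -> float:
    """Squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    pre_activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(pre_activation)


# Hypothetical inputs and weights: 0.5*1.0 + 0.25*(-2.0) + 0.1 = 0.1
out_relu = neuron([1.0, -2.0], [0.5, 0.25], 0.1, relu)
out_sigmoid = neuron([1.0, -2.0], [0.5, 0.25], 0.1, sigmoid)

# Non-linearity in action: relu(a) + relu(b) is not always relu(a + b).
print(relu(-1.0) + relu(1.0), relu(-1.0 + 1.0))
```

Without an activation like this, stacking layers would collapse into a single linear transformation, however many layers the network had.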
An optimization algorithm that combines the best parts of two other methods — AdaGrad and RMSProp.
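The update rule can be sketched for a single scalar parameter as follows. This is a minimal version of the standard Adam update with its usual default hyperparameters, not any particular library's implementation. The helper name `adam_step` and the toy objective f(x) = (x - 3)^2 are chosen only for illustration.

```python
import math


def adam_step(param, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction for zero initialization
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Toy problem (illustrative): minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x, m, v = 0.0, 0.0, 0.0
first_x = None
for t in range(1, 1001):
    grad = 2.0 * (x - 3.0)
    x, m, v = adam_step(x, grad, m, v, t)
    if t == 1:
        first_x = x  # after bias correction, the first step has magnitude close to lr

print(f"x after 1000 steps: {x:.4f}")
```

Dividing by the square root of the squared-gradient average gives each parameter its own effective step size, the idea shared with AdaGrad and RMSProp. The gradient average adds momentum.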
Artificial General Intelligence.