A training approach where the model creates its own labels from the data itself. Masking words and predicting them (BERT) or predicting the next word (GPT) are self-supervised tasks. Enables training on massive unlabeled datasets, which is why it powers most modern AI — labeled data is expensive and scarce.
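As a rough sketch of how the labels come "for free" from the data, the snippet below builds GPT-style next-word prediction pairs from a toy sentence; the sentence and whitespace tokenization are illustrative assumptions, not any particular model's pipeline:

```python
# Toy example: deriving training labels from raw text with no human annotation.
corpus = "the cat sat on the mat"
tokens = corpus.split()

# GPT-style objective: the label for each position is simply the next token.
next_token_pairs = [(tokens[i], tokens[i + 1]) for i in range(len(tokens) - 1)]
print(next_token_pairs)
# [('the', 'cat'), ('cat', 'sat'), ('sat', 'on'), ('on', 'the'), ('the', 'mat')]
```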
The initial, expensive phase of training where a model learns general patterns from a massive dataset.
A pre-training technique where random words in text are hidden (masked) and the model learns to predict them from context.
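A minimal sketch of the masking idea, assuming whitespace tokens, a roughly 15% mask rate, and a literal "[MASK]" placeholder rather than a real tokenizer:

```python
import random

# Toy BERT-style masking: hide ~15% of tokens and keep the originals as labels.
random.seed(1)
tokens = "the quick brown fox jumps over the lazy dog".split()

masked, labels = [], []
for tok in tokens:
    if random.random() < 0.15:
        masked.append("[MASK]")   # the model sees a placeholder here
        labels.append(tok)        # ...and must predict the original word back
    else:
        masked.append(tok)
        labels.append(None)       # no prediction target at unmasked positions

print(masked)
print(labels)
```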
A self-supervised learning approach where the model learns by comparing similar and dissimilar pairs of examples.
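The sketch below illustrates the idea with a toy InfoNCE-style loss in NumPy; the embeddings, temperature, and use of cosine similarity are illustrative assumptions rather than any specific method's exact recipe:

```python
import numpy as np

# Toy contrastive loss: pull an anchor toward its positive (similar) example,
# push it away from negatives (dissimilar examples).
def info_nce(anchor, positive, negatives, temperature=0.1):
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    logits = np.array([cos(anchor, positive)] + [cos(anchor, n) for n in negatives])
    logits /= temperature
    # Softmax cross-entropy where the positive pair is the "correct class".
    return -logits[0] + np.log(np.exp(logits).sum())

anchor    = np.array([1.0, 0.0])
positive  = np.array([0.9, 0.1])                      # similar pair
negatives = [np.array([0.0, 1.0]), np.array([-1.0, 0.2])]  # dissimilar pairs
print(info_nce(anchor, positive, negatives))
```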
A mathematical function applied to a neuron's output that introduces non-linearity into the network.
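For illustration, here are two widely used activation functions implemented in NumPy; without a non-linearity like these, stacked layers would collapse into a single linear transformation:

```python
import numpy as np

# Applied element-wise to each neuron's weighted sum z.
def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(z))     # negative inputs clipped to 0
print(sigmoid(z))  # all inputs squashed into (0, 1)
```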
An optimization algorithm that combines a momentum-style running average of the gradient (first moment) with per-parameter adaptive learning rates (second moment), drawing on the strengths of two earlier methods, AdaGrad and RMSProp.
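As an illustrative sketch, here is a single update step following the standard published Adam rule; the hyperparameters are the commonly cited defaults, used here as example values rather than a tuned configuration:

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad        # momentum-like first moment
    v = beta2 * v + (1 - beta2) * grad**2     # RMSProp-like second moment
    m_hat = m / (1 - beta1**t)                # bias correction for early steps
    v_hat = v / (1 - beta2**t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

param = np.array([1.0, -2.0])
m = np.zeros_like(param)
v = np.zeros_like(param)
param, m, v = adam_step(param, grad=np.array([0.5, -1.0]), m=m, v=v, t=1)
print(param)
```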
Artificial General Intelligence: a hypothetical AI system that matches or exceeds human-level ability across a wide range of tasks, rather than excelling at a single narrow domain.