Reinforcement Learning from Human Feedback (RLHF) is the technique that makes language models useful as assistants. Humans rank model outputs by quality, a reward model learns to predict those preferences, and the language model is then fine-tuned to maximize the predicted reward. This is how ChatGPT, Claude, and other assistants learned to be helpful.
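The reward-model step above is usually trained with a pairwise (Bradley-Terry) preference loss. A minimal sketch in plain Python, where the function name and the reward scores are illustrative assumptions rather than any particular library's API:

```python
import math

def preference_loss(reward_chosen, reward_rejected):
    """Pairwise preference loss: -log sigmoid(r_chosen - r_rejected).

    Small when the reward model scores the human-preferred response
    above the rejected one, large when it gets the ranking backwards.
    """
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Illustrative scores (assumed, not from a real model):
# ranking the preferred response higher gives a small loss...
low = preference_loss(reward_chosen=2.0, reward_rejected=-1.0)
# ...while preferring the rejected response is penalized heavily.
high = preference_loss(reward_chosen=-1.0, reward_rejected=2.0)
assert low < high
```

Minimizing this loss over many human-ranked pairs is what teaches the reward model to score responses the way the annotators would.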
Reinforcement Learning: A learning approach where an agent learns by interacting with an environment and receiving rewards or penalties.
Reward Model: A model trained to predict how helpful, harmless, and honest a response is, based on human preferences.
Instruction Tuning: Fine-tuning a language model on datasets of instructions paired with appropriate responses.
Activation Function: A mathematical function applied to a neuron's output that introduces non-linearity into the network.
Adam: An optimization algorithm that combines the best parts of two other methods, AdaGrad and RMSProp.
AGI: Artificial General Intelligence, a hypothetical system able to match or surpass human performance across a wide range of cognitive tasks.
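The Adam optimizer defined above keeps two running averages, one of the gradients and one of their squares, and uses both to scale each update. A minimal single-parameter sketch in plain Python, using Adam's common default hyperparameters (the function and variable names here are illustrative):

```python
import math

def adam_step(param, grad, m, v, t, lr=0.001,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter."""
    m = beta1 * m + (1 - beta1) * grad       # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2  # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)             # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# Toy usage: minimize f(x) = x^2, whose gradient is 2x, starting at x = 1.
x, m, v = 1.0, 0.0, 0.0
for t in range(1, 2001):
    x, m, v = adam_step(x, 2 * x, m, v, t)
assert abs(x) < 0.1  # x has moved close to the minimum at 0
```

Dividing by the square root of the second moment gives each parameter its own effective step size, which is the adaptive behavior Adam inherits from AdaGrad and RMSProp.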