Rotary Position Embedding (RoPE). A positional encoding method that injects position information by rotating the query and key vectors in attention, with rotation angles that grow with token position. Used in LLaMA, Mistral, and many other modern LLMs. It scales well to long sequences and underpins techniques like YaRN that extend the effective context window beyond what the model was trained on.
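A minimal sketch of the rotation in NumPy; the function name, shapes, and the choice of base are illustrative assumptions rather than any particular library's API:

```python
import numpy as np

def rotary_embed(x, positions, base=10000):
    """Rotate pairs of features by position-dependent angles (RoPE).

    x: (seq_len, dim) query or key vectors, dim even.
    positions: (seq_len,) integer token positions.
    """
    dim = x.shape[-1]
    freqs = 1.0 / (base ** (np.arange(0, dim, 2) / dim))   # one frequency per feature pair
    angles = positions[:, None] * freqs[None, :]            # (seq_len, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]                          # split features into pairs
    # Standard 2-D rotation applied to every (x1, x2) pair.
    rotated = np.stack([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)
    return rotated.reshape(x.shape)

# Queries and keys rotated this way give attention scores that depend on the
# relative distance between tokens rather than their absolute positions.
q = rotary_embed(np.random.randn(8, 64), np.arange(8))
```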
Positional Encoding. Information added to token embeddings to tell a transformer the order of elements in a sequence.
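One common realization is the sinusoidal scheme from the original Transformer paper; the sketch below (function name assumed for illustration) builds the table and adds it to a batch of embeddings:

```python
import numpy as np

def sinusoidal_positions(seq_len, dim, base=10000):
    """Build the (seq_len, dim) sinusoidal position table from 'Attention Is All You Need'."""
    positions = np.arange(seq_len)[:, None]                  # (seq_len, 1)
    freqs = 1.0 / (base ** (np.arange(0, dim, 2) / dim))     # (dim/2,)
    table = np.zeros((seq_len, dim))
    table[:, 0::2] = np.sin(positions * freqs)               # even features: sine
    table[:, 1::2] = np.cos(positions * freqs)               # odd features: cosine
    return table

# Each token embedding gets a unique, smoothly varying positional signature added to it.
embeddings = np.random.randn(128, 512)
embeddings = embeddings + sinusoidal_positions(128, 512)
```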
Transformer. The neural network architecture, built around the self-attention mechanism, behind virtually all modern AI language models.
Context Window. The maximum amount of text a language model can process at once, measured in tokens.
Activation Function. A mathematical function applied to a neuron's output that introduces non-linearity into the network.
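For example, two widely used activations sketched in NumPy (the GELU constants follow the commonly used tanh approximation):

```python
import numpy as np

def relu(x):
    # Zeroes out negative inputs; the simplest widely used non-linearity.
    return np.maximum(0.0, x)

def gelu(x):
    # Smooth non-linearity used in most transformer feed-forward layers (tanh approximation).
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

hidden = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(hidden))   # [0.  0.  0.  0.5 2. ]
print(gelu(hidden))   # negative inputs are damped rather than cut to zero
```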
Adam. An optimization algorithm that combines momentum-style averaging of gradients with per-parameter adaptive learning rates, drawing on ideas from two earlier methods, AdaGrad and RMSProp.
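A sketch of a single Adam update in NumPy, using the usual default hyperparameters; the function name and the toy call are illustrative:

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: a momentum-like first moment plus an RMSProp-like second moment."""
    m = beta1 * m + (1 - beta1) * grad               # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2          # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                     # bias correction (t starts at 1)
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

w, m, v = np.array([1.0, -2.0]), np.zeros(2), np.zeros(2)
w, m, v = adam_step(w, grad=np.array([0.1, -0.3]), m=m, v=v, t=1)
```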
AGI. Artificial General Intelligence: a hypothetical AI system with human-level competence across a broad range of cognitive tasks, rather than a single narrow domain.