An optimized attention algorithm that is mathematically equivalent to standard attention but runs much faster and uses less GPU memory. It achieves this through careful memory management: rather than materializing the full attention matrix, it computes attention in small blocks that fit in fast on-chip memory. Now standard in most LLM implementations, it is a landmark example of a hardware-aware algorithm.
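To make the "never materialize the full matrix" idea concrete, here is a minimal NumPy sketch of the core trick (an online softmax computed over blocks of keys), not the actual fused GPU kernel. The function names and block size are illustrative choices, not part of any library.

```python
import numpy as np

def naive_attention(Q, K, V):
    """Standard attention: builds the full (n, n) score matrix in memory."""
    S = Q @ K.T / np.sqrt(Q.shape[-1])
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V

def tiled_attention(Q, K, V, block=4):
    """Processes keys/values in blocks with a running (online) softmax,
    so only a (n, block) slice of scores exists at any moment."""
    n, d = Q.shape
    out = np.zeros_like(Q)
    m = np.full(n, -np.inf)   # running row-wise max of scores seen so far
    l = np.zeros(n)           # running softmax normalizer
    for j in range(0, n, block):
        Kj, Vj = K[j:j + block], V[j:j + block]
        S = Q @ Kj.T / np.sqrt(d)            # scores for this block only
        m_new = np.maximum(m, S.max(axis=-1))
        scale = np.exp(m - m_new)            # rescale earlier partial sums
        P = np.exp(S - m_new[:, None])
        out = out * scale[:, None] + P @ Vj
        l = l * scale + P.sum(axis=-1)
        m = m_new
    return out / l[:, None]

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((8, 16)) for _ in range(3))
assert np.allclose(naive_attention(Q, K, V), tiled_attention(Q, K, V))
```

The assertion at the end checks the key claim in the definition: the blockwise computation is mathematically equivalent to the standard one; the speed and memory wins come from keeping each block in fast on-chip memory on real hardware.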
A mechanism that lets neural networks focus on the most relevant parts of their input when producing output.
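The "focus on the most relevant parts" behavior can be shown with a tiny scaled dot-product attention sketch; the toy keys, values, and query below are made-up illustrative numbers, not from any model.

```python
import numpy as np

def attention(query, keys, values):
    """Scaled dot-product attention: weight each value by how well
    its key matches the query, then return the weighted average."""
    d = query.shape[-1]
    scores = keys @ query / np.sqrt(d)   # similarity of the query to each key
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()             # softmax: weights are positive, sum to 1
    return weights @ values, weights

# Three input "tokens". The query closely matches the second key,
# so the output is dominated by the second value.
keys = np.array([[1.0, 0.0], [0.0, 4.0], [0.5, 0.5]])
values = np.array([[10.0], [20.0], [30.0]])
query = np.array([0.0, 4.0])
out, w = attention(query, keys, values)
assert w.argmax() == 1   # nearly all attention falls on token 2
```

The softmax weights are the "focus": inputs whose keys match the query get large weights and dominate the output, while the rest are mostly ignored.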
The neural network architecture behind virtually all modern AI language models.
Graphics Processing Unit. Specialized hardware originally built for rendering graphics; its massively parallel design makes it the standard workhorse for training and running neural networks.
A mathematical function applied to a neuron's output that introduces non-linearity into the network.
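Two of the most common activation functions, sketched below, show what "introduces non-linearity" means in practice; without such a function, stacked layers would collapse into a single linear transformation.

```python
import numpy as np

def relu(x):
    """ReLU: zero out negative inputs, pass positive inputs through."""
    return np.maximum(0.0, x)

def sigmoid(x):
    """Sigmoid: squash any real input into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-2.0, 0.0, 3.0])
print(relu(x))       # [0. 0. 3.]
print(sigmoid(0.0))  # 0.5
```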
An optimization algorithm that combines the advantages of two earlier methods, AdaGrad and RMSProp: a momentum-like running average of gradients plus a per-parameter step size scaled by a running average of squared gradients.
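A minimal sketch of one Adam update step, using the standard default hyperparameters; the toy minimization of f(x) = x² at the end is an illustrative example, not a benchmark.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update. m is an exponential average of gradients
    (momentum-like); v is an exponential average of squared gradients,
    which adapts the step size per parameter."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)   # bias correction for the zero initialization
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize f(x) = x^2 starting from x = 5; the gradient is 2x.
theta = np.array([5.0])
m = v = np.zeros(1)
for t in range(1, 2001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t, lr=0.05)
assert abs(theta[0]) < 0.5   # the iterate settles near the minimum at 0
```

Note the bias-correction terms: because m and v start at zero, early averages would otherwise be biased toward zero and the first steps would be too small.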
Artificial General Intelligence. A hypothetical AI system that matches or exceeds human capability across a broad range of tasks, rather than excelling only at a single narrow domain.