AI Glossary

Your guide to understanding AI and machine learning terminology. From transformers and attention to RLHF and fine-tuning — every term explained in plain language.

A

Activation Function

A mathematical function applied to a neuron's output that introduces non-linearity into the network.

Adam Optimizer

An optimization algorithm that combines the best parts of two other methods — AdaGrad and RMSProp.

AGI

Artificial General Intelligence.

AI Agent

An autonomous AI system that can perceive its environment, make decisions, and take actions to achieve goals.

AI Alignment

The research field focused on making sure AI systems do what humans actually want them to do.

AI Safety

The broad field studying how to build AI systems that are safe, reliable, and beneficial.

Anthropic

An AI safety company founded in 2021 by former OpenAI researchers, including Dario and Daniela Amodei.

Artificial Intelligence

The science of creating machines that can perform tasks requiring human-like intelligence — reasoning, learning, perception, language understanding, and decision-making.

ASI

Artificial Superintelligence.

Attention

A mechanism that lets neural networks focus on the most relevant parts of their input when producing output.

Autoencoder

A neural network trained to compress input data into a smaller representation and then reconstruct it.

Autonomous AI

AI systems capable of operating independently for extended periods without human intervention.

Autoregressive Model

A model that generates output one piece at a time, with each new piece depending on all the previous ones.

B

Backpropagation

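The algorithm that works out how much each weight contributed to the error by propagating gradients backward through the network, which is what makes neural network training practical.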

Batch Normalization

A technique that normalizes the inputs to each layer in a neural network, making training faster and more stable.

Batch Size

The number of training examples processed together before the model updates its weights.

Beam Search

A decoding strategy that keeps track of multiple candidate sequences at each step instead of just picking the single best option.
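A minimal sketch of the idea, assuming a hypothetical toy "model" (the `NEXT` table below is invented for illustration and is not from any real system). It keeps the `beam_width` best partial sequences at every step, scored by cumulative log-probability.

```python
import math

# Hypothetical toy model: next-token probabilities given the previous token.
NEXT = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"x": 0.5, "y": 0.5},
    "b": {"x": 0.9, "y": 0.1},
}

def beam_search(start, steps, beam_width):
    """Return the surviving (sequence, log_probability) pairs, best first."""
    beams = [([start], 0.0)]
    for _ in range(steps):
        candidates = []
        for seq, score in beams:
            # Extend every kept sequence by every possible next token.
            for token, p in NEXT[seq[-1]].items():
                candidates.append((seq + [token], score + math.log(p)))
        # Keep only the highest-scoring candidates for the next step.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams

greedy = beam_search("<s>", steps=2, beam_width=1)
wider = beam_search("<s>", steps=2, beam_width=2)
```

With this table, a width of 1 (greedy decoding) commits to "a" first and ends with a sequence of probability 0.30, while a width of 2 also keeps "b" alive and finds "b x" with probability 0.36.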

Benchmark

A standardized test used to measure and compare AI model performance.

BERT

Bidirectional Encoder Representations from Transformers.

Bias

In AI, bias has two meanings.

BPE

Byte Pair Encoding.

C

Catastrophic Forgetting

When a neural network trained on new data suddenly loses its ability to perform well on previously learned tasks.

Chain of Thought

A prompting technique where you ask an AI model to show its reasoning step by step before giving a final answer.

Chatbot

An AI system designed to have conversations with humans through text or voice.

Chinchilla

A 2022 DeepMind model and accompanying research paper showing that many large language models of the time were oversized and undertrained for their compute budget.

Classification

A machine learning task where the model assigns input data to predefined categories.

Claude

Anthropic's family of AI assistants, including Claude Haiku, Sonnet, and Opus.

CLIP

Contrastive Language-Image Pre-training.

CNN

Convolutional Neural Network.

Compute

The processing power needed to train and run AI models.

Computer Vision

The field of AI focused on enabling machines to interpret and understand visual information from images and video.

Constitutional AI

An approach developed by Anthropic where an AI system is trained to follow a set of principles (a 'constitution') rather than relying solely on human feedback for every decision.

Context Window

The maximum amount of text a language model can process at once, measured in tokens.

Contrastive Learning

A self-supervised learning approach where the model learns by comparing similar and dissimilar pairs of examples.

Conversational AI

AI systems designed for natural, multi-turn dialogue with humans.

Cross-Attention

An attention mechanism where one sequence attends to a different sequence.

CUDA

NVIDIA's parallel computing platform that lets developers use GPUs for general-purpose computing.

D

DALL-E

OpenAI's text-to-image generation model.

Data Augmentation

Techniques for artificially expanding training datasets by creating modified versions of existing data.

Data Poisoning

Deliberately corrupting training data to manipulate a model's behavior.

Decoder

The part of a neural network that generates output from an internal representation.

Deep Learning

A subset of machine learning that uses neural networks with many layers (hence 'deep') to learn complex patterns from large amounts of data.

Deepfake

AI-generated media that realistically depicts a person saying or doing something they never actually did.

DeepMind

A leading AI research lab, now part of Google.

Diffusion Model

A generative AI model that creates data by learning to reverse a gradual noising process.

Distillation

A technique where a smaller 'student' model learns to mimic a larger 'teacher' model.

DPO

Direct Preference Optimization.

Dropout

A regularization technique that randomly deactivates a percentage of neurons during training.
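A short sketch of one common variant, usually called inverted dropout. The rescaling of surviving values by 1 / (1 - rate) is an assumption taken from common practice, not something stated in the definition above, and the function name is illustrative.

```python
import random

def dropout(activations, rate, training=True, rng=random):
    """Zero each activation with probability `rate` (requires 0 <= rate < 1).

    Survivors are scaled by 1 / (1 - rate) so the expected value of each
    activation is unchanged. At inference time the input passes through as is.
    """
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

train_out = dropout([1.0] * 1000, rate=0.5, rng=random.Random(0))
eval_out = dropout([1.0, 2.0], rate=0.5, training=False)
```

During training roughly half of the values come back as 0.0 and the rest as 2.0; with `training=False` the input is returned unchanged.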

E

Edge AI

Running AI models directly on local devices (phones, laptops, IoT devices) instead of in the cloud.

Embedding

A dense numerical representation of data (words, images, etc.).

Emergent Abilities

Capabilities that appear suddenly as language models reach certain sizes.

Emergent Behavior

Capabilities that appear in AI models at scale without being explicitly trained for.

Encoder

The part of a neural network that processes input data into an internal representation.

Encoder-Decoder

A neural network architecture with two parts: an encoder that processes the input into a representation, and a decoder that generates the output from that representation.

Epoch

One complete pass through the entire training dataset.

Ethical AI

The practice of developing AI systems that are fair, transparent, accountable, and respect human rights.

Evaluation

The process of measuring how well an AI model performs on its intended task.

Explainability

The ability to understand and explain why an AI model made a particular decision.

F

Feature Extraction

The process of identifying and pulling out the most important characteristics from raw data.

Federated Learning

A training approach where the model learns from data spread across many devices without that data ever leaving those devices.

Few-Shot Learning

The ability of a model to learn a new task from just a handful of examples, often provided in the prompt itself.

Fine-Tuning

The process of taking a pre-trained model and continuing to train it on a smaller, specific dataset to adapt it for a particular task or domain.

Flash Attention

An optimized attention algorithm that's mathematically equivalent to standard attention but runs much faster and uses less GPU memory.

Foundation Model

A large AI model trained on broad data that can be adapted for many different tasks.

Function Calling

A capability that lets language models interact with external tools and APIs by generating structured function calls.

G

GAN

Generative Adversarial Network.

GELU

Gaussian Error Linear Unit.

Gemini

Google's flagship multimodal AI model family, developed by Google DeepMind.

Generative AI

AI systems that create new content — text, images, audio, video, or code — rather than just analyzing or classifying existing data.

GPT

Generative Pre-trained Transformer.

GPU

Graphics Processing Unit.

Gradient Accumulation

A technique that simulates larger batch sizes by accumulating gradients over several smaller forward and backward passes before updating the weights.
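To see why this works, here is a small sketch using a hypothetical one-parameter model `y_hat = w * x` with mean squared error (the model and function names are chosen for illustration only). Summing the micro-batch gradients, each weighted by its share of the examples, gives the same gradient as one large batch.

```python
def grad_mse(w, xs, ys):
    """Gradient of mean squared error with respect to w for y_hat = w * x."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def accumulated_grad(w, xs, ys, micro_batch_size):
    """Accumulate gradients over micro-batches without updating w in between."""
    n = len(xs)
    total = 0.0
    for i in range(0, n, micro_batch_size):
        mx = xs[i:i + micro_batch_size]
        my = ys[i:i + micro_batch_size]
        # Weight by the micro-batch's share of examples so the sum equals
        # the full-batch mean gradient, even if the last batch is smaller.
        total += grad_mse(w, mx, my) * len(mx) / n
    return total

xs, ys = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]
full = grad_mse(0.5, xs, ys)
accumulated = accumulated_grad(0.5, xs, ys, micro_batch_size=2)
```

The weights are updated once, using `accumulated`, after all micro-batches have been processed, so only one micro-batch needs to fit in memory at a time.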

Gradient Descent

The fundamental optimization algorithm used to train neural networks.

Grounding

Connecting an AI model's outputs to verified, factual information sources.

Guardrails

Safety measures built into AI systems to prevent harmful, inappropriate, or off-topic outputs.

H

Hallucination

When an AI model generates confident-sounding but factually incorrect or completely fabricated information.

Hallucination Detection

Methods for identifying when an AI model generates false or unsupported claims.

Hugging Face

The leading platform for sharing and collaborating on AI models, datasets, and applications.

Hyperparameter

A setting you choose before training begins, as opposed to parameters the model learns during training.

I

Image Classification

The task of assigning a label to an image from a set of predefined categories.

ImageNet

A massive image dataset containing over 14 million labeled images across 20,000+ categories.

In-Context Learning

A model's ability to learn new tasks simply from examples provided in the prompt, without any weight updates.

Inference

Running a trained model to make predictions on new data.

Instruction Tuning

Fine-tuning a language model on datasets of instructions paired with appropriate responses.

J

Jailbreak

A technique for bypassing an AI model's safety restrictions and guardrails.

K

Knowledge Distillation

Training a smaller model to replicate the behavior of a larger one.

Knowledge Graph

A structured representation of information as a network of entities and their relationships.

L

Language Model

An AI model that understands and generates human language.

Large Language Model

An AI model with billions of parameters trained on massive text datasets.

Latent Space

The compressed, internal representation space where a model encodes data.

Layer Normalization

A technique that normalizes activations across the features of each training example, rather than across the batch.
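A minimal sketch of the normalization step for a single example. It omits the learnable scale and shift parameters that real implementations usually add, and the `eps` value is an assumed default for numerical stability.

```python
import math

def layer_norm(features, eps=1e-5):
    """Normalize one example's features to zero mean and (nearly) unit variance."""
    mean = sum(features) / len(features)
    var = sum((f - mean) ** 2 for f in features) / len(features)
    return [(f - mean) / math.sqrt(var + eps) for f in features]

normalized = layer_norm([1.0, 2.0, 3.0, 4.0])
```

Because the statistics come from the features of one example, the result does not depend on what else is in the batch.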

Learning Rate

A hyperparameter that controls how much the model's weights change in response to each update.

LLaMA

Meta's family of open-weight large language models.

LLM

Large Language Model.

LoRA

Low-Rank Adaptation.

Loss Function

A mathematical function that measures how far the model's predictions are from the correct answers.

LSTM

Long Short-Term Memory.

M

Machine Learning

A branch of AI where systems learn patterns from data instead of following explicitly programmed rules.

Masked Language Modeling

A pre-training technique where random words in text are hidden (masked) and the model learns to predict them from context.

Meta-Learning

Training models that learn how to learn — after training on many tasks, they can quickly adapt to new tasks with very little data.

Midjourney

A popular AI image generation service known for its distinctive artistic style.

Mistral

A French AI company that builds efficient, high-performance language models.

Mixture of Experts

An architecture in which a model contains multiple specialized sub-networks (experts), but only a few of them activate for each input.

MMLU

Massive Multitask Language Understanding.

Model Collapse

A degradation that happens when AI models are trained on data generated by other AI models.

Multi-Head Attention

An extension of the attention mechanism that runs multiple attention operations in parallel, each with different learned projections.

Multimodal

AI models that can understand and generate multiple types of data — text, images, audio, video.

N

Narrow AI

AI systems designed for a specific task, as opposed to general intelligence.

Natural Language Processing

The field of AI focused on enabling computers to understand, interpret, and generate human language.

Neural Network

A computing system loosely inspired by biological brains, consisting of interconnected nodes (neurons) organized in layers.

Next-Token Prediction

The fundamental task that language models are trained on: given a sequence of tokens, predict what comes next.

NLP

Natural Language Processing.

NVIDIA

The dominant provider of AI hardware.

O

Object Detection

A computer vision task that identifies and locates objects within an image, drawing bounding boxes around each one.

Open Source AI

AI models whose weights, code, and sometimes training data are publicly released for anyone to use, modify, and build upon.

OpenAI

The AI company behind ChatGPT, GPT-4, DALL-E, and Whisper.

Optimization

The process of finding the best set of model parameters by minimizing a loss function.

Overfitting

When a model memorizes the training data so well that it performs poorly on new, unseen data.

P

Parameter

A value the model learns during training — specifically, the weights and biases in neural network layers.

Perplexity

A measurement of how well a language model predicts text.
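Concretely, perplexity is usually computed as the exponential of the average negative log-probability the model assigned to the tokens that actually occurred. A small sketch, where `token_probs` is a hypothetical list of those per-token probabilities:

```python
import math

def perplexity(token_probs):
    """exp(mean negative log-probability) of the observed tokens."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

uniform_guess = perplexity([0.25, 0.25, 0.25, 0.25])
confident = perplexity([0.9, 0.8, 0.95])
```

Lower is better: a model that assigns probability 0.25 to every correct token has a perplexity of 4, as if it were choosing uniformly among four options at each step.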

Positional Encoding

Information added to token embeddings to tell a transformer the order of elements in a sequence.

Pre-Training

The initial, expensive phase of training where a model learns general patterns from a massive dataset.

Prompt Engineering

The art and science of crafting inputs to AI models to get the best possible outputs.

Prompting

Giving an AI model text input (a prompt) to direct its behavior.

PyTorch

A widely used open-source deep learning framework, originally developed by Meta.

Q

Quantization

Reducing the precision of a model's numerical values — for example, from 32-bit to 4-bit numbers.
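A rough sketch of one simple scheme, symmetric per-tensor quantization to signed integers. This is an assumed scheme picked for illustration; production methods are typically more elaborate (per-channel scales, calibration, and so on).

```python
def quantize(values, bits=4):
    """Map floats to signed integers in [-qmax, qmax] using one shared scale."""
    qmax = 2 ** (bits - 1) - 1                      # 7 for signed 4-bit
    scale = (max(abs(v) for v in values) / qmax) or 1.0  # avoid a zero scale
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from the integers and the scale."""
    return [i * scale for i in q]

weights = [-1.0, 0.0, 0.3, 1.0]
q, scale = quantize(weights, bits=4)
restored = dequantize(q, scale)
```

Each restored value differs from the original by at most half of `scale`, which is the precision given up in exchange for storing small integers instead of 32-bit floats.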

R

RAG

Retrieval-Augmented Generation.

Reasoning

The ability of AI models to draw conclusions, solve problems logically, and work through multi-step challenges.

Recurrent Neural Network

A neural network architecture where connections form loops, letting the network maintain a form of memory across sequences.

Red Teaming

Systematically testing an AI system by trying to make it produce harmful, biased, or incorrect outputs.

Regression

A machine learning task where the model predicts a continuous numerical value.

Regularization

Techniques that prevent a model from overfitting by adding constraints during training.

Reinforcement Learning

A learning approach where an agent learns by interacting with an environment and receiving rewards or penalties.

ReLU

Rectified Linear Unit.

Representation Learning

The idea that useful AI comes from learning good internal representations of data.

Responsible AI

The practice of developing and deploying AI systems with careful attention to fairness, transparency, safety, privacy, and social impact.

Reward Model

A model trained to predict how helpful, harmless, and honest a response is, based on human preferences.

RLHF

Reinforcement Learning from Human Feedback.

RNN

Recurrent Neural Network.

RoPE

Rotary Position Embedding.

S

Sampling

The process of selecting the next token from the model's predicted probability distribution during text generation.

Scaling Laws

Mathematical relationships showing how AI model performance improves predictably with more data, compute, and parameters.

Self-Attention

An attention mechanism where a sequence attends to itself — each element looks at all other elements to understand relationships.
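A simplified sketch of scaled dot-product self-attention. It assumes the input vectors serve directly as queries, keys, and values; real transformer layers apply learned projections first, which are left out here.

```python
import math

def self_attention(x):
    """x is a list of equal-length vectors; returns one output vector per input."""
    d = len(x[0])
    outputs = []
    for q in x:
        # Similarity of this element to every element (itself included).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        # Softmax turns the scores into weights that sum to 1.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Output is the weighted average of all value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return outputs

out = self_attention([[1.0, 0.0], [0.0, 1.0]])
```

Each output row is a blend of every input row, weighted by how similar that row is to the element doing the looking.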

Self-Supervised Learning

A training approach where the model creates its own labels from the data itself.

Semantic Search

Search that understands meaning and intent rather than just matching keywords.

Sentiment Analysis

Automatically determining whether a piece of text expresses positive, negative, or neutral sentiment.

Softmax

A function that converts a vector of numbers into a probability distribution — all values between 0 and 1 that sum to 1.
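A minimal sketch in plain Python. Subtracting the maximum before exponentiating is a standard numerical-stability trick and does not change the result.

```python
import math

def softmax(logits):
    """Convert arbitrary real numbers into probabilities that sum to 1."""
    m = max(logits)                         # keeps exp() from overflowing
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])
```

Larger inputs receive larger probabilities, and the ordering of the inputs is preserved in the outputs.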

Speech Recognition

Converting spoken audio into written text.

Stable Diffusion

An open-source image generation model released by Stability AI.

Structured Output

Getting a language model to generate output in a specific format like JSON, XML, or a database schema.

Supervised Learning

The most common machine learning approach: training a model on labeled data where each example comes with the correct answer.

Synthetic Data

Artificially generated data used for training AI models.

System Prompt

Instructions given to an AI model that define its role, personality, constraints, and behavior rules.

T

Temperature

A parameter that controls the randomness of a language model's output.
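In most implementations, temperature works by dividing the model's raw scores (logits) by the temperature value before softmax is applied. The sketch below assumes that formulation; the function name is illustrative.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over logits / temperature (temperature must be > 0)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [1.0, 2.0, 3.0]
sharp = softmax_with_temperature(logits, temperature=0.5)
neutral = softmax_with_temperature(logits, temperature=1.0)
flat = softmax_with_temperature(logits, temperature=2.0)
```

Values below 1 concentrate probability on the top-scoring token, making output more predictable; values above 1 flatten the distribution, making output more varied.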

TensorFlow

Google's open-source deep learning framework.

Text-to-Image

AI models that generate images from text descriptions.

Text-to-Speech

AI systems that convert written text into natural-sounding spoken audio.

Token

The basic unit of text that language models work with.

Tokenizer

The component that converts raw text into tokens that a language model can process.

Tool Use

The ability of AI models to interact with external tools and systems — browsing the web, running code, querying APIs, reading files.

Top-P Sampling

A text generation method (also called nucleus sampling) that samples only from the smallest set of most-likely tokens whose cumulative probability reaches a threshold P.
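A small sketch of the filtering step, assuming the model's output is available as a hypothetical dictionary mapping tokens to probabilities. Tokens are ranked from most to least likely and kept until their running total reaches P; the kept probabilities are then renormalized so a token can be sampled from them.

```python
def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose total probability reaches p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    # Renormalize so the kept probabilities sum to 1 again.
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}

nucleus = top_p_filter({"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}, p=0.75)
```

Here "a" alone covers 0.5, which is below the threshold, so "b" is added to reach 0.8 and the unlikely tail ("c" and "d") is discarded.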

TPU

Tensor Processing Unit.

Training

The process of teaching an AI model by exposing it to data and adjusting its parameters to minimize errors.

Transfer Learning

Using knowledge learned from one task to improve performance on a different but related task.

Transformer

The neural network architecture behind virtually all modern AI language models.

Turing Test

A test proposed by Alan Turing in 1950: if a human can't reliably tell whether they're talking to a machine or another human, the machine passes.

U

Underfitting

When a model is too simple to capture the patterns in the data, performing poorly on both training and test sets.

Unsupervised Learning

Machine learning on data without labels — the model finds patterns and structure on its own.

V

VAE

Variational Autoencoder.

Vector Database

A database optimized for storing and searching high-dimensional vectors (embeddings).

Vision Transformer

A transformer architecture adapted for image processing.

Voice Cloning

Using AI to create a synthetic copy of someone's voice from a small sample of their speech.

W

Weight

A numerical value in a neural network that determines the strength of the connection between neurons.

Whisper

OpenAI's open-source speech recognition model.

Word2Vec

One of the earliest successful word embedding models, from Google in 2013.

World Model

An AI system's internal representation of how the world works — understanding physics, cause and effect, and spatial relationships.

Y

YOLO

You Only Look Once.

Z

Zero-Shot Learning

A model's ability to perform a task it was never explicitly trained on, with no examples provided.

2026 Machine Brief. All rights reserved.