# The Open Source AI Definition War: Who Gets to Decide What 'Open' Means?

*Meta calls Llama open source. The Open Source Initiative says it isn't. This isn't a semantic debate. It's a fight that will determine who controls AI's future, who gets locked out, and whether 'open' means anything at all.*

The word "open" is doing more work in AI than any other word in technology right now. And nobody agrees on what it means.
Meta says Llama is open source. They've released the model weights. You can download them. You can fine-tune them. You can run them on your own hardware. By any intuitive definition, that sounds open. Millions of developers treat it that way. The Llama family has racked up hundreds of millions of downloads. The models power startups, research projects, and enterprise deployments on every continent.
But the Open Source Initiative, the organization that's maintained the definition of "open source" since 1998, says Meta is wrong. And in October 2024, they published the Open Source AI Definition (OSAID) version 1.0 to prove it.
Here's where it gets interesting. And contentious.
## What the OSI Actually Requires
The OSAID doesn't mess around. To qualify as open source AI, a system needs to provide four freedoms: use it for any purpose without permission, study how it works, modify it for any purpose, and share it with others. These mirror the four freedoms of the Free Software Definition that Richard Stallman wrote decades ago.
But the kicker is what OSI calls the "preferred form to make modifications." For a machine learning system, that means three things. First, sufficiently detailed information about the training data, including the complete description of all data used, where it came from, how it was processed, and how to obtain it. Second, the complete source code used to train and run the system. Third, the model parameters including weights.
Meta releases the weights. Meta releases inference code. Meta does not release detailed training data information. Neither does almost anyone else.
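To make the gap concrete, here's a toy checklist in Python. It's nothing official, just the three OSAID components from above expressed as booleans, with the Llama-style entries filled in from Meta's public releases:

```python
from dataclasses import dataclass

@dataclass
class Release:
    """Toy OSAID checklist; not OSI's actual tooling."""
    parameters: bool        # model weights available
    source_code: bool       # complete code to train AND run the system
    data_information: bool  # sufficiently detailed training-data disclosure

    def qualifies_under_osaid(self) -> bool:
        # OSAID 1.0 requires all three components of the
        # "preferred form to make modifications."
        return self.parameters and self.source_code and self.data_information

# A Llama-style release: weights yes, inference code yes, but the full
# training pipeline and detailed data disclosure are withheld.
llama_style = Release(parameters=True, source_code=False, data_information=False)
print(llama_style.qualifies_under_osaid())  # False
```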
By OSI's definition, Llama isn't open source. Neither is Google's Gemma. Neither are Mistral's models. Nor, arguably, is any major model released in the last two years that calls itself "open."
The term OSI prefers for what Meta does? "Open weight." Not open source. The difference matters.
## Why Meta Disagrees
Meta's position is straightforward: the traditional open source definition was written for software, not for AI. Software is deterministic: compile the same source code with the same toolchain and you get the same binary. AI isn't like that. Training data is consumed during the process, transformed into weights through optimization. You can't "decompile" a model back into its training data.
Mark Zuckerberg has been vocal about this. In a July 2024 open letter, he argued that open source AI benefits from sharing weights and code, even without full training data transparency. Meta's Llama license (a custom license, not an OSI-approved one) allows broad use with some restrictions: companies with over 700 million monthly active users need a separate license, and you can't use Llama outputs to train competing models.
Meta's practical argument is hard to dismiss. Llama 3.1, released in July 2024 with 405 billion parameters, was the most capable openly available model at the time. Thousands of developers built on it. Hundreds of companies deployed it. The community treated it as open source because it functioned like open source for their purposes. Do they care whether the OSI approves?
Yann LeCun, Meta's chief AI scientist, has been even more blunt. He's argued repeatedly on social media that requiring training data disclosure is impractical and would effectively prevent any large model from being open source, since training datasets contain copyrighted material, licensed content, and data that can't legally be redistributed.
He's got a point. And the OSI knows it.
## The Data Problem Nobody Can Solve
This is where the entire debate runs aground. Every major language model is trained on data that the creators probably don't have the right to redistribute. Web scrapes. Books. Academic papers. News articles. Code repositories with various licenses. Social media posts. The legal status of using this data for training is still being litigated in courts around the world.
If you require full training data disclosure and access as a condition of "open source," you're requiring something that might be illegal in many jurisdictions. The New York Times is suing OpenAI. Getty Images sued Stability AI. Authors' guilds have filed class actions. These cases haven't been fully resolved. Nobody knows yet what's legal.
The OSI tried to thread this needle. The OSAID says you need "sufficiently detailed information about the data" but acknowledges that some data may be "unshareable." In those cases, you must disclose the data's provenance, scope, and characteristics even if you can't share the data itself. Publicly available training data must be listed along with where to get it; data obtainable from third parties must be listed too, "including for fee."
Critics call this a compromise that satisfies nobody. You can meet the OSAID's data requirement without actually enabling someone to reproduce your model. So what's the point?
## The Real Stakes
This isn't just academics arguing about definitions. There's real money and real power at play.
If "open source AI" gets defined narrowly, requiring full data transparency, code, and weights, then very few models qualify. That means the "open source AI" label becomes aspirational, something small research labs might achieve but major companies won't. Meta loses its marketing advantage. The term becomes meaningless for most practical purposes.
If "open source AI" gets defined broadly, covering any model that releases weights, then the term loses its teeth. A company could release weights with a restrictive license, call it "open source," and benefit from the goodwill without providing the transparency that the open source movement was built to ensure. That's arguably what's already happening.
The European Union's AI Act treats open source models differently, exempting them from certain compliance requirements. Which definition of "open source" the EU adopts has direct regulatory consequences. If Meta's models qualify as open source under the AI Act, they get a lighter regulatory burden. If they don't, Meta faces the same compliance costs as closed-model providers.
France has been lobbying hard for a broad definition that would benefit Mistral, its national AI champion. Germany and the Nordics have pushed for stricter definitions that align more with OSI's approach. The European Commission hasn't fully resolved this yet.
In the US, the National Institute of Standards and Technology (NIST) has been developing its own frameworks for AI transparency, and the definition of "open" factors into how those frameworks get applied.
## The Compromise Nobody Likes
Some researchers have proposed a spectrum approach. Instead of a binary open/closed distinction, rate models on a transparency scale. EleutherAI proposed a framework in 2024 that evaluated openness across multiple dimensions: training code, inference code, model weights, training data, data preprocessing code, and technical documentation.
Under this framework, Meta's Llama would score high on weights, inference code, and documentation, but low on training data. That's informative. It tells developers exactly what they're getting.
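Here's a minimal sketch of what such a scorecard could look like. The dimensions are the ones the framework names; the scoring function and the numbers for the Llama-style example are mine, purely illustrative:

```python
# Hypothetical openness scorecard in the spirit of the spectrum approach.
# Dimension list follows the framework above; scores are illustrative.
DIMENSIONS = [
    "training_code", "inference_code", "model_weights",
    "training_data", "preprocessing_code", "documentation",
]

def openness_score(release: dict[str, float]) -> float:
    """Average the per-dimension scores (0.0 = fully closed, 1.0 = fully open)."""
    return sum(release.get(d, 0.0) for d in DIMENSIONS) / len(DIMENSIONS)

llama_style = {
    "model_weights": 1.0, "inference_code": 1.0, "documentation": 0.9,
    "training_code": 0.2, "training_data": 0.1, "preprocessing_code": 0.1,
}
print(f"openness: {openness_score(llama_style):.0%}")  # openness: 55%
```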
The problem is that spectrums don't fit neatly into licenses. You can't write a license that says "this is 60% open source." Procurement officers at enterprises need a clear yes or no. Regulatory frameworks need a clear definition. Courts need a clear standard.
So we're stuck. And the people stuck in the middle are the developers actually building things.
## What Developers Actually Care About
I've talked to dozens of engineers who work with "open" models daily. Here's what they tell me.
They want to download the weights and run inference locally. They want to fine-tune for their use case without asking permission. They want commercial use rights. They want some confidence the model won't be yanked or relicensed without warning. And they'd like to know, in general terms, what the model was trained on.
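That first item really is a few lines of code now. A minimal sketch using the Hugging Face transformers library, assuming you've installed transformers, torch, and accelerate, and accepted Meta's license to access the gated checkpoint:

```python
# Local inference with an open-weight model.
# Assumes: pip install transformers torch accelerate, plus license
# acceptance for the gated meta-llama repo on the Hugging Face Hub.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

prompt = "In one sentence: what does 'open source' mean?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=60)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```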
Most of them don't need to reproduce the training run from scratch. Most of them don't need the actual training data. Most of them don't care about the OSI's opinion. They care about whether the license lets them ship a product.
By that standard, Meta's Llama, Google's Gemma, Mistral's models, and many others are "open enough." They're not open source by the historical definition. They're something new. And maybe we need a new term for it.
"Open weight" is gaining traction. So is "source-available," borrowed from the software world. Neither has the marketing power of "open source," which is exactly why Meta keeps using that phrase.
## Where This Goes
The OSI published OSAID 1.0 in October 2024. It's been endorsed by Mozilla, Creative Commons, the Electronic Frontier Foundation, and several academic institutions. It hasn't been endorsed by Meta, Google, Mistral, or any major model provider. That tells you everything.
In December 2025, Anthropic and others helped establish the Agentic AI Foundation under the Linux Foundation. That foundation is focused on interoperability protocols like MCP, not on defining "open source." But the Linux Foundation's involvement signals that the governance of open AI standards is fragmenting across multiple institutions.
The most likely outcome? Two parallel definitions coexist for years. The OSI definition remains the idealistic standard that few models meet. The industry definition, "we released the weights, call it open," becomes the practical standard that everyone uses. Policy makers muddle through, applying different definitions in different contexts.
It's messy. But so was the original open source definition debate in the late 1990s. After Bruce Perens and Eric Raymond founded the OSI, the community spent years arguing over what "open source" meant for software. It took a decade for the definition to stabilize and for businesses to build around it.
AI's definition war is just getting started. The question isn't who's right. It's whose definition wins in practice. And right now, Meta's version, imperfect and controversial, is the one developers are building on. The OSI can publish all the definitions it wants. The community votes with its downloads.
That's 200 million downloads for Llama and counting. The market has spoken. Whether the definition follows is a different question entirely.