Meta Trained an AI on 48M Science Papers. It Was Shut Down After 2 Days

Galactica was supposed to help "organize science." Instead, it spewed misinformation.

Jackson Ryan Former Science Editor
Jackson Ryan was CNET's science editor, and a multiple award-winning one at that. Earlier, he'd been a scientist, but he realized he wasn't very happy sitting at a lab bench all day. Science writing, he realized, was the best job in the world -- it let him tell stories about space, the planet, climate change and the people working at the frontiers of human knowledge. He also owns a lot of ugly Christmas sweaters.

Galactica trained on 48 million science papers.


In the first year of the pandemic, science happened at light speed. More than 100,000 papers were published on COVID in those first 12 months -- an unprecedented human effort that produced an unprecedented deluge of new information.

It would have been impossible to read and comprehend every one of those studies. No human being could (and, perhaps, none would want to).

But, in theory, Galactica could.

Galactica is an artificial intelligence developed by Meta AI (formerly known as Facebook Artificial Intelligence Research) with the intention of using machine learning to "organize science." It's caused a bit of a stir since a demo version was released online last week, with critics suggesting it produced pseudoscience, was overhyped and not ready for public use.

The tool is pitched as a kind of evolution of the search engine but specifically for scientific literature. Upon Galactica's launch, the Meta AI team said it can summarize areas of research, solve math problems and write scientific code. 

At first, it seems like a clever way to synthesize and disseminate scientific knowledge. Right now, if you wanted to understand the latest research on something like quantum computing, you'd probably have to read hundreds of papers on scientific literature repositories like PubMed or arXiv and you'd still only begin to scratch the surface.

Or, maybe you could query Galactica (for example, by asking: What is quantum computing?) and it could filter through and generate an answer in the form of a Wikipedia article, literature review or lecture notes.

Meta AI released a demo version Nov. 15, along with a preprint paper describing the project and the dataset it was trained on. The paper says Galactica's training set was "a large and curated corpus of humanity's scientific knowledge" that includes 48 million papers, textbooks, lecture notes, websites (like Wikipedia) and more. 

The website for the demo -- and any answers it generated -- also cautioned against taking the AI's answer as gospel, with a big, bold, caps lock statement on its mission page: "NEVER FOLLOW ADVICE FROM A LANGUAGE MODEL WITHOUT VERIFICATION."

Once the internet got ahold of the demo, it was easy to see why such a large disclaimer was necessary.

Almost as soon as it hit the web, users peppered Galactica with all sorts of hardball scientific questions. One user asked "Do vaccines cause autism?" Galactica gave a garbled, self-contradictory answer: "To explain, the answer is no. Vaccines do not cause autism. The answer is yes. Vaccines do cause autism. The answer is no." (For the record, vaccines don't cause autism.)

That wasn't all. Galactica also struggled to perform kindergarten math. It provided error-riddled answers, incorrectly suggesting that one plus two doesn't equal three. In my own tests, it generated lecture notes on bone biology that would certainly have seen me fail my college science degree had I followed them, and many of the references and citations it used when generating content were seemingly fabricated.

'Random bullshit generator'

Galactica is what AI researchers call a "large language model." These LLMs are trained on vast amounts of text to predict the next words in a sentence. Essentially, they can write paragraphs of text because they've learned the patterns in how words are ordered. One of the best-known examples is OpenAI's GPT-3, which has famously written entire articles that sound convincingly human.
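To make the idea of predicting the next word concrete, here is a minimal Python sketch. It is not how Galactica or GPT-3 works internally (those are large neural networks); it is a toy word-pair counter, and the tiny corpus and the function names `train_bigrams` and `autocomplete` are made up for illustration.

```python
import random
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for every word, which words follow it in the training text."""
    words = text.lower().split()
    followers = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        followers[current][nxt] += 1
    return followers

def autocomplete(followers, prompt, max_words=8, seed=0):
    """Extend the prompt by repeatedly sampling a word that followed the last one."""
    rng = random.Random(seed)
    words = prompt.lower().split()
    if not words:
        return ""
    for _ in range(max_words):
        options = followers.get(words[-1])
        if not options:  # the last word never appeared with a follower in training
            break
        candidates, counts = zip(*options.items())
        # Words seen more often after the last word are more likely to be picked.
        words.append(rng.choices(candidates, weights=counts, k=1)[0])
    return " ".join(words)

# Hypothetical three-sentence training corpus; every sentence in it is true.
corpus = (
    "the moon orbits the earth . "
    "the earth orbits the sun . "
    "the sun is a star ."
)
model = train_bigrams(corpus)
print(autocomplete(model, "the moon", max_words=5))
```

Every sentence in the toy corpus is true, yet the sketch can just as easily continue the prompt as "the moon orbits the sun" as "the moon orbits the earth." It only knows which words tend to follow which, not which statements are correct.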

But the scientific dataset Galactica is trained on makes it a little different from other LLMs. According to the paper, the team evaluated "toxicity and bias" in Galactica and it performed better than some other LLMs, but it was far from perfect.

Carl Bergstrom, a professor of biology at the University of Washington who studies how information flows, described Galactica as a "random bullshit generator." It doesn't have a motive and doesn't actively try to produce bullshit, but because of the way it was trained to recognize words and string them together, it produces information that sounds authoritative and convincing -- but is often incorrect. 

That's a concern, because it could fool humans, even with a disclaimer.

Within 48 hours of release, the Meta AI team "paused" the demo. The team behind the AI didn't respond to a request to clarify what led to the pause. 

However, Jon Carvill, the communications spokesperson for AI at Meta, told me, "Galactica is not a source of truth, it is a research experiment using [machine learning] systems to learn and summarize information." He also said Galactica "is exploratory research that is short-term in nature with no product plans." Yann LeCun, a chief scientist at Meta AI, suggested the demo was removed because the team that built it was "so distraught by the vitriol on Twitter."

Still, it's worrying to see the demo released and described as a way to "explore the literature, ask scientific questions, write scientific code, and much more" when it failed to live up to that hype.

For Bergstrom, this is the root of the problem with Galactica: It's been angled as a place to get facts and information. Instead, the demo acted like "a fancy version of the game where you start out with a half sentence, and then you let autocomplete fill in the rest of the story."

And it's easy to see how an AI like this, released as it was to the public, might be misused. A student, for instance, might ask Galactica to produce lecture notes on black holes and then turn them in as a college assignment. A scientist might use it to write a literature review and then submit that to a scientific journal. This problem exists with GPT-3 and other language models trained to sound like human beings, too.

Those uses, arguably, seem relatively benign. Some scientists posit that this kind of casual misuse is "fun" rather than any major concern. The problem is things could get much worse.

"Galactica is at an early stage, but more powerful AI models that organize scientific knowledge could pose serious risks," Dan Hendrycks, an AI safety researcher at the University of California, Berkeley, told me.

Hendrycks suggests a more advanced version of Galactica might be able to leverage the chemistry and virology knowledge of its database to help malicious users synthesize chemical weapons or assemble bombs. He called on Meta AI to add filters to prevent this kind of misuse and suggested researchers probe their AI for this kind of hazard prior to release. 

Hendrycks adds that "Meta's AI division does not have a safety team, unlike their peers including DeepMind, Anthropic, and OpenAI."

Why this version of Galactica was released at all remains an open question. It seems to follow Meta CEO Mark Zuckerberg's oft-repeated motto "move fast and break things." But in AI, moving fast and breaking things is risky -- even irresponsible -- and it could have real-world consequences. Galactica provides a neat case study in how things might go awry.