Science Is Drowning in AI Slop
On a frigid Norwegian afternoon earlier this month, Dan Quintana, a psychology professor at the University of Oslo, decided to stay in and complete a tedious task that he had been putting off for weeks. An editor from a well-known journal in his field had asked him to review a paper that they were considering for publication. It seemed like a straightforward piece of science. Nothing set off any alarm bells, until Quintana looked at the references and saw his own name. The citation of his work looked correct—it contained a plausible title and included authors whom he’d worked with in the past—but the paper it referred to did not exist.
Every day, on Bluesky and LinkedIn, Quintana had seen academics posting about finding these “phantom citations” in scientific papers. (The initial version of the Trump administration’s “MAHA Report” on children’s health, released last spring, contained more than half a dozen of them.) But until Quintana found a fake “Quintana” paper cited in a manuscript he was refereeing, he’d figured that the problem was limited to publications with lower standards. “When it happens at a journal that you respect, you realize how widespread this problem is,” he told me.
For more than a century, scientific journals have been the pipes through which knowledge of the natural world flows into our culture. Now they’re being clogged with AI slop.
Scientific publishing has always had its plumbing problems. Even before ChatGPT, journal editors struggled to control the quantity and quality of submitted work. Alex Csiszar, a historian of science at Harvard, told me that he has found letters from editors going all the way back to the early 19th century in which they complain about receiving unmanageable volumes of manuscripts. This glut was part of the reason that peer review arose in the first place. Editors would ease their workload by sending articles to outside experts. The practice first became widespread when journals proliferated during the Cold War spike in science funding. Today it’s nearly universal.
But the editors and unpaid reviewers who act as guardians of the scientific literature are newly besieged. Almost immediately after large language models went mainstream, manuscripts started pouring into journal inboxes in unprecedented numbers. Some portion of this effect can be chalked up to AI’s ability to juice productivity, especially among non-English-speaking scientists who need help presenting their research. But ChatGPT and its ilk are also being used to give fraudulent or shoddy work a new veneer of plausibility, according to Mandy Hill, the managing director of academic publishing at Cambridge University Press & Assessment. That makes the task of sorting wheat from chaff much more time-consuming for editors and referees, and also more technically difficult. “From here on, it’s going to be a constant arms race,” Hill told me.
[Read: Scientific publishing is a joke]
Adam Day runs a company in the United Kingdom called Clear Skies that uses AI to help scientific publishers stay ahead of scammers. He told me that he has a considerable advantage over investigators of, say, financial fraud because the people he’s after publish the evidence of their wrongdoing where lots of people can see it. Day knows that individual scientists might go rogue and have ChatGPT generate a paper or two, but he’s not that interested in these cases. Like a narcotics detective who wants to take down a cartel, he focuses on companies that engage in industrialized cheating by selling papers in large quantities to scientist customers.
These “paper mills” have to work at scale, and so they tend to recycle their own materials, even to the point of putting out multiple papers built on the same template, with closely matching text. Day told me that he finds these templates by looking through papers that scientific publishers have flagged as fraudulent. When he sees a high rate of retractions on a particular template, he trains his tool to look for other, unflagged papers that might have been produced the same way.
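Day didn’t describe the internals of his tool, but the general approach he outlines, comparing unflagged papers against a known fraudulent template, can be sketched with off-the-shelf text-similarity methods. Here is a minimal illustration in Python; the file names and the 0.8 similarity threshold are hypothetical, not details of Clear Skies’ actual system.

```python
# A minimal sketch of template matching, not Clear Skies' actual tool: score
# how closely candidate manuscripts resemble a known paper-mill template using
# TF-IDF cosine similarity. File names and the 0.8 cutoff are placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def flag_template_matches(template_text, candidate_texts, threshold=0.8):
    """Return indices of candidates whose wording closely tracks the template."""
    vectorizer = TfidfVectorizer(stop_words="english", ngram_range=(1, 3))
    matrix = vectorizer.fit_transform([template_text] + candidate_texts)
    scores = cosine_similarity(matrix[0:1], matrix[1:]).ravel()
    return [i for i, score in enumerate(scores) if score >= threshold]

# Compare one retracted "template" paper against unflagged manuscripts.
template = open("retracted_template.txt").read()
candidates = [open(path).read() for path in ["paper_a.txt", "paper_b.txt"]]
print(flag_template_matches(template, candidates))
```

A real detector would need far more than lexical overlap, since mills can paraphrase, but the principle is the same: a known-bad paper becomes a probe for finding its unflagged siblings.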
Some scientific disciplines have become hotbeds for slop. Publishers are sharing intelligence about the most egregious ones, according to Jennifer Wright, the head of research integrity and publication ethics at Cambridge University Press. Unfortunately, many are fields that society would very much like to be populated with genuinely qualified scientists—cancer research, for one. The mills have hit on a very effective template for a cancer paper, Day told me. Someone can claim to have tested the interactions between a tumor cell and just one protein of the many thousands that exist, and as long as they aren’t reporting a dramatic finding, no one will have much reason to replicate their results.
AI can also generate the images for a fake paper. A now-retracted 2024 review paper in Frontiers in Cell and Developmental Biology featured an AI-generated illustration of a rat with hilariously disproportionate testicles; the figure passed peer review and was published before anyone noticed. As embarrassing as this was for the journal, little harm was done. Much more worrying is generative AI’s ability to conjure up convincing pictures of thinly sliced tissue, microscope fields, and electrophoresis gels, the kinds of images commonly used as evidence in biomedical research.
Day told me that waves of LLM-assisted fraud have recently hit faddish tech-related fields in academia, including blockchain research. Now, somewhat ironically, the problem is affecting AI research itself. It’s easy to see why: The job market for people who can credibly claim to have published original research in machine learning or robotics is as strong as, if not stronger than, the one for cancer biologists. There’s also a fraud template for AI researchers: All they have to do is claim to have run a machine-learning algorithm on some kind of data, and say that it produced an interesting outcome. Again, so long as the outcome isn’t too interesting, few people, if any, will bother to vet it.
[Read: Science is becoming less human]
Conference proceedings are the main publishing venue for articles in AI and other computer sciences, and in recent years they’ve been overrun with submissions. NeurIPS, one of the top AI conferences, has seen submissions double in five years. ICLR, the leading conference for deep learning, has also experienced an increase, and its submissions appear to include a fair amount of slop: An LLM-detection start-up analyzed papers submitted to its upcoming meeting in Brazil and found more than 50 that included hallucinated citations. Most had not been caught during peer review.
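The start-up didn’t publish its method alongside that count, but the simplest version of hallucinated-citation screening is just a lookup: check whether each cited title corresponds to any real record in a bibliographic database. A rough sketch using the public Crossref REST API follows; the fuzzy-matching cutoff and the example title are assumptions for illustration, not the company’s actual pipeline.

```python
# A rough sketch of phantom-citation screening: look up a cited title in the
# public Crossref API and flag it if no indexed record is a close match.
# The 0.9 fuzzy-match cutoff is an arbitrary assumption.
import difflib
import requests

def looks_phantom(cited_title, cutoff=0.9):
    """Return True if no Crossref record closely matches the cited title."""
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"query.bibliographic": cited_title, "rows": 5},
        timeout=30,
    )
    resp.raise_for_status()
    for item in resp.json()["message"]["items"]:
        for indexed_title in item.get("title", []):
            similarity = difflib.SequenceMatcher(
                None, cited_title.lower(), indexed_title.lower()
            ).ratio()
            if similarity >= cutoff:
                return False  # a real paper with this title exists
    return True

print(looks_phantom("Title of a reference pulled from a submission"))
```

A screen like this catches only the clumsiest fabrications, an invented title attached to real-sounding authors, and says nothing about whether a genuine citation actually supports the claim it’s attached to.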
That might be because many of the peer reviews were themselves done by AI. Pangram Labs recently analyzed thousands of peer reviews that were submitted to ICLR, and found that more than half of them were written with help from an LLM, and about a fifth of them were wholly AI-generated. Across the academic sciences, paper authors have even started using tiny white fonts to embed secret messages to LLM reviewers. They urge the AIs to rave about the paper they’re reading, to describe it as “groundbreaking” and “transformative,” and to save them the trouble of a tough revision by suggesting only easy fixes.
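The hidden prompts are often recoverable, because text set in a microscopic font or in white on a white page is still text to a PDF parser. Here is a quick sketch using the PyMuPDF library; the size and color cutoffs, and the file name, are illustrative assumptions about what counts as hidden, not a description of any conference’s screening process.

```python
# A quick sketch of hidden-prompt detection: extract every text span in a PDF
# and flag ones rendered in a near-invisible font size or in pure white.
# The cutoffs and the file name below are illustrative assumptions.
import fitz  # PyMuPDF

WHITE = 0xFFFFFF  # sRGB integer for pure white text
TINY = 2.0        # font size in points below which text is suspicious

def find_hidden_text(pdf_path):
    suspicious = []
    doc = fitz.open(pdf_path)
    for page_number, page in enumerate(doc, start=1):
        for block in page.get_text("dict")["blocks"]:
            for line in block.get("lines", []):  # image blocks have no lines
                for span in line["spans"]:
                    if span["size"] < TINY or span["color"] == WHITE:
                        suspicious.append((page_number, span["text"]))
    return suspicious

for page, text in find_hidden_text("submission.pdf"):
    print(f"page {page}: {text!r}")
```

White-on-white text is invisible to a human skimming the page but perfectly legible to a language model that ingests the file, which is why a check like this looks at font size and color rather than at what appears on screen.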
AI science slop has spread beyond the journals now, and is also overrunning other venues for disseminating research. In 1991, Paul Ginsparg, who was then a physicist at Los Alamos National Laboratory, set up a special server where his colleagues could upload their forthcoming papers right after they finished writing them. That way, they could get immediate feedback on these “preprints” while the notoriously slow peer-review process played out. The arXiv, as the server came to be called, grew quickly, and spawned sister sites in other disciplines. Together, they now form the fastest-moving firehose of new scientific knowledge that has ever existed. But in the months after ChatGPT was released, preprint servers experienced the same spike in submissions that journals did.
Ginsparg, who is now a professor of information science at Cornell, told me he had hoped that this would be a short-lived trend, but the rate of submissions continues to rise. Every arXiv preprint now gets at least a brief glance from a scientist before it’s posted, to confirm that it’s a plausible piece of science, but the models are getting better at clearing this hurdle. In 2025, Ginsparg collaborated with several colleagues on an analysis of submissions that had recently been posted to the arXiv. They found that scientists who appeared to be using LLMs were posting about 33 percent more papers than researchers who didn’t.
A similar influx of AI-assisted submissions has hit bioRxiv and medRxiv, the preprint servers for biology and medicine. Richard Sever, the chief science and strategy officer at the nonprofit organization that runs them, told me that in 2024 and 2025, he saw researchers who had never before submitted a single paper send in 50 within a year. Research communities have always had to sift out some junk on preprint servers, but this practice makes sense only when the signal-to-noise ratio is high. “That won’t be the case if 99 out of 100 papers are manufactured or fake,” Sever said. “It’s potentially an existential crisis.”
Given that it’s so easy to publish on preprint servers, they may be the places where AI slop has its most powerful diluting effect on scientific discourse. At scientific journals, especially the top ones, peer reviewers like Quintana will look at papers carefully. But this sort of work was already burdensome for scientists, even before they had to face the glut of chatbot-made submissions, and the AIs themselves are improving, too. Easy giveaways, such as the false citation that Quintana found, may disappear completely. Automated slop-detectors may also fail. If the generative tools become good enough, all of scientific publishing could be upended.
When I called A. J. Boston, a professor at Murray State University who has written about this issue, he asked me if I’d heard of the dead-internet conspiracy theory. Its adherents believe that on social media and in other online spaces, only a few real people create posts, comments, and images. The rest are generated and amplified by competing networks of bots. Boston said that in the worst-case scenario, the scientific literature might come to look something like that. AIs would write most papers, and review most of them, too. This empty back-and-forth would be used to train newer AI models. Fraudulent images and phantom citations would embed themselves deeper and deeper in our systems of knowledge. They’d become a permanent epistemological pollution that could never be filtered out.