Anthropic Is at War With Itself
These are not the words you want to hear when it comes to human extinction, but I was hearing them: “Things are moving uncomfortably fast.” I was sitting in a conference room with Sam Bowman, a safety researcher at Anthropic. Worth $183 billion at the latest estimate, the AI firm has every incentive to speed things up, ship more products, and develop more advanced chatbots to stay competitive with the likes of OpenAI, Google, and the industry’s other giants. But Anthropic is at odds with itself—thinking deeply, even anxiously, about seemingly every decision.
Anthropic has positioned itself as the AI industry’s superego: the firm that speaks with the most authority about the big questions surrounding the technology, while rival companies develop advertisements and affiliate shopping links (a difference that Anthropic’s CEO, Dario Amodei, was eager to call out during an interview in Davos last week). On Monday, Amodei published a lengthy essay, “The Adolescence of Technology,” about the “civilizational concerns” posed by what he calls “powerful AI”—the very technology his firm is developing. The essay has a particular focus on democracy, national security, and the economy. “Given the horror we’re seeing in Minnesota, its emphasis on the importance of preserving democratic values and rights at home is particularly relevant,” Amodei posted on X, making him one of the very few tech leaders to speak out publicly against the Trump administration’s recent actions.
This rhetoric, of course, serves as good branding—a way for Anthropic to stand out in a competitive industry. But having spent a long time following the company and, recently, speaking with many of its employees and executives, including Amodei, I can say that Anthropic is at least consistent. It messages about the ethical issues surrounding AI constantly, and it appears unusually focused on user safety. Bowman’s job, for example, is to vet Anthropic’s products before they’re released into the world, making sure that they will not spew, say, white-supremacist talking points; push users into delusional crises; or generate nonconsensual porn.
So far, the effort seems to be working: Unlike other popular chatbots, including OpenAI’s ChatGPT and Elon Musk’s Grok, Anthropic’s bot, Claude, has not had any major public blowups despite being as advanced as, and by some measures more advanced than, the rest of the field. (That may be in part because its chatbot does not generate images and has a smaller user base than some rival products.) But although Anthropic has so far dodged the various scandals that have plagued other large language models, the company has not inspired much faith that such problems will be avoided forever. When I met Bowman last summer, the company had recently divulged that, in experimental settings, versions of Claude had demonstrated the ability to blackmail users and assist them when asked about making bioweapons. But the company has pushed its models onward anyway, and now says that Claude writes a good chunk—and in some instances all—of its own code.
Anthropic publishes white papers about the terrifying things it has made Claude capable of (“How LLMs Could Be Insider Threats,” “From Shortcuts to Sabotage”), and raises these issues with politicians. OpenAI CEO Sam Altman and other AI executives also have long spoken in broad, aggrandizing terms about AI’s destructive potential, often to their own benefit. But those competitors have released junky TikTok clones and slop generators. Today, Anthropic’s only major consumer product other than its chatbot is Claude Code, a powerful tool that promises to automate all kinds of work, but is nonetheless targeted to a relatively small audience of developers and coders.
The company’s discretion has resulted in a corporate culture that doesn’t always make much sense. Anthropic comes across as more sincerely committed to safety than its competitors, but it is also moving full speed toward building tools that it acknowledges could be horrifically dangerous. The firm seems eager for a chance to stand out. But what does Anthropic really stand for?
Founded in 2021 by seven people who splintered off from OpenAI, Anthropic is full of staff and executives who come across as deeply, almost pathologically earnest. I sat in on a meeting of Anthropic’s Societal Impacts team, a small group dedicated to studying how AI affects work, education, and more. This was a brainstorming session: The team wanted to see if it could develop AI models that work better with people than alone, which, the group reasoned, could help prevent or slow job loss. A researcher spoke up. He pressed the team to consider that, in the very near future, AI models might just be better than humans at everything. “Basically, we’re cooked,” he said. In which case, this meeting was nothing more than a “lovely thought exercise.” The group agreed this was possible. Then it moved on.
The researcher referred to his brief, existential interruption as “classic Anthropic.” Hyperrational thought experiments, forceful debates on whether AI could be shaped for the better, an unshakable belief in technological progress—these are classic Anthropic qualities. They trickle down from the top. A few weeks after the Societal Impacts meeting, I wanted to see what Amodei himself thought about all of this. If Altman is the AI boom’s great salesman and Demis Hassabis, the CEO of Google DeepMind and a Nobel laureate, its scientist, then Amodei is the closest the industry has to a philosopher. He is also responsible for some of the technical research that made ChatGPT possible. “Whenever I say ‘AI,’ people think about the thing they’re using today,” Amodei told me, hands clasped and perched atop his head. “That’s almost never where my mind is. My mind is almost always at: We’re releasing a new version every three months. Where are we gonna be eight versions from now? In two years?”
When he was at OpenAI, Amodei wrote an internal document called “The Big Blob of Compute.” It laid out his belief that AI models improve as a function of the resources put into them. More power, more data, more chips, better AI. That belief now animates the entire industry. Such unwavering faith in AI progress is perhaps Anthropic’s defining feature. The company has hired a “model welfare” researcher to study whether Claude can experience suffering or is conscious. The Societal Impacts team has set up a miniature, AI-run vending machine in the firm’s cafeteria to study whether the technology could autonomously operate a small business selling snacks and trinkets. Claude selects inventory, sets prices, and requests refills, while humans just restock the shelves. Welcome to the singularity.
Amodei and the rest of the group founded Anthropic partly because of disagreements over how to prepare the world for AI. Amodei is especially worried about job displacement, telling me that AI could erase a large portion of white-collar jobs within five years; he dedicated an entire section of “The Adolescence of Technology” to the danger that the AI boom might concentrate tremendous wealth primarily in firms such as his own.
Despite this and other gloomy forecasts, Amodei has bristled at the notion that he and his firm are “doomers”—that their primary motivation is preventing AI from wiping out a large number of jobs or lives. “I tend to be fairly optimistic,” he told me. In addition to “The Adolescence of Technology,” Amodei has published a 14,000-word manifesto called “Machines of Loving Grace” that comprehensively details a utopian vision for his technology: eliminating almost all disease, lifting billions out of poverty, doubling the human life span. There is not a hint of irony; the essay envisions people being “literally moved to tears” by the majesty of AI’s accomplishments. Amodei’s employees cited it to me in conversation numerous times. Meanwhile, Altman trolls on X, and Musk seems to exist in a continuum of AI slop and conspiracy theories.
When Anthropic launched Claude, in 2023, the bot’s distinguishing feature was a “Constitution” that the model was trained on, a document detailing how it should behave; last week, Anthropic revamped the document into a 22,000-word treatise on how to make Claude a moral and sincere actor. Claude, the constitution’s authors write, has the ability to foster emotional dependence, design bioweapons, and manipulate its users, so it’s Anthropic’s responsibility to instill upright character in Claude to avoid these outcomes. “Once we decide to create Claude, even inaction is a kind of action,” they write. No other firm had, or has, any truly comparable document.
Amodei says he wants rival companies to act in ways he believes are more responsible. Several of Anthropic’s major AI-safety initiatives and research advances have indeed been adopted by top competitors, such as its approach to preventing the use of AI to build bioweapons. And OpenAI has shared a “Model Spec,” its far more streamlined and pragmatic answer to Anthropic’s constitution—which contains no talk of ChatGPT’s “character” or “preserving important societal structures.” (OpenAI has a corporate partnership with The Atlantic.)
All of this helps Anthropic’s bottom line, of course: The emphasis on responsibility is “very attractive to large enterprise businesses which are also quite safety-, brand-conscious,” Daniela Amodei, Anthropic’s president (and Dario’s sister), told me from a sweaty conference room in Anthropic’s old headquarters in 2024. Nearly two years later, Anthropic controls 40 percent of the enterprise-AI market. The Amodeis hope their commercial success will pressure competitors to more aggressively prioritize safety as well.
That said, it’s not always clear that these efforts to spark a “race to the top”—another phrase of Amodei’s that his employees invoke constantly—have been successful. Anthropic’s research established AI sycophancy as an issue well before “AI psychosis” emerged, yet the phenomenon nonetheless became something that many people apparently suffer from. Amodei recognizes that his own products aren’t perfect, either. “I absolutely do not want to warrant and guarantee that we will never have these problems,” he said. Several independent AI researchers, including some who have partnered with Anthropic to test Claude for various risks, told me that although Anthropic appears more committed to AI safety than its competitors, that’s a low bar.
Anthropic’s mode is generally to publish information about AI models and wait for the world to make the hard calls about how to control or regulate them. The main regulatory proposal of Jack Clark, a co-founder of Anthropic and its head of policy, is that governments establish “transparency” requirements, or some sort of mandated reporting about what internal tests reveal about AI products. But the company is particular about what it deems worth publishing. The firm does not, for instance, share much about its AI-training data or carbon footprint. When I asked Clark about how much information remains hidden—particularly in terms of how Anthropic’s AI tools are actually developed—he argued that transparency into how AI models are produced isn’t all that important. (Some of that information is also, presumably, proprietary.) Rather, Clark told me, the outcomes of the technology are what matter.
There is a “well-established norm that whatever goes on inside a factory is by and large left up to the innovator that’s built that factory, but you care a lot about what comes out of the factory,” he said, explaining why he believes that AI companies sharing information about how their products are made matters less than reporting what they can do. Typically the government “reaches inside” the factory, he said, only when something in the output—say, heavy metals—gives cause for concern. Never mind the long history of regulation dictating what goes on inside factories—emergency exits in clothing factories, cleanliness standards in meatpacking facilities, and so on. (Clark did note that laws sometimes need to change, and that they haven’t yet adapted to AI.)
He brought up Wall Street, of all examples, to make his point. Lawmakers “thought they had transparency into financial systems,” he said—that banks and hedge funds and so on were giving reliable reports on their dealings. “Then the financial crash happened,” regulators realized that transparency was inadequate and gameable, and Congress changed the law. (President Trump then changed much of it back.) In the long run, Clark seemed to feel, this was the system working as it should. But his comparison also raises the possibility that before anybody can figure out how to get the AI boom right, something must go horribly wrong.
In mid-September, Anthropic cybersecurity experts detected unusual activity among a group of Claude users. They came to suspect that it was a major, AI-enabled Chinese cyberespionage campaign—an attempt by foreign actors to use Claude to automate the theft of sensitive information. Anthropic promptly shut the operation down, published a report, and sent Logan Graham, who heads a team at the company that evaluates advanced uses of AI, to explain the situation to Congress.
In theory, this sequence represented Anthropic’s philosophy at work: Detect risks posed by AI and warn the public. But the incident also underscored how unpredictable, and uncontrollable, the environment really is. Months before the Chinese hack, Graham told me that he felt “pretty good” about the precautions the company had taken around cyberthreats.
Nobody can foresee all of the ways any AI product might be used, for good or ill, but that’s exactly why Anthropic’s sanctimony can seem silly. For all Amodei’s warnings about the possible harms of automation, Anthropic’s bots themselves are among the products that may take away jobs; many consider Claude the best AI at coding, for instance. After one of my visits to Anthropic’s offices, I went to an event for software engineers a few blocks away at which founders gave talks about products developed with Anthropic software. Someone demonstrated a tool that could automate outreach for job recruitment—leading one attendee to exclaim, with apparent glee, “This is going to destroy an entire industry!”
When I asked several Anthropic employees if they’d want to slow down the AI boom in an ideal world, none seemed to have ever seriously considered the question; it was too far-fetched a possibility, even for them. Joshua Batson, an interpretability researcher at Anthropic—he studies the labyrinthine inner workings of AI models—told me that it would be nice if the industry could go half as fast. Jared Kaplan, a co-founder of Anthropic and the firm’s chief science officer, told me he’d prefer it if AGI, or artificial general intelligence, arrived in 2032 rather than, say, 2028; Bowman, the safety researcher, said he thought slowing down for just a couple of months might be enough. Everyone seemed to believe, though, that AI-safety research itself could eventually be automated with Claude—and once that happens, they reasoned, their tests could keep up with the AI’s exponentially improving capabilities.
Like so many others in the industry, the employees I spoke with also contended that neither Anthropic nor any other AI company could actually slow development down. “The world gets to make this decision, not companies,” Clark told me, seated cross-legged on his chair, and “the system of capital markets says, Go faster.” So they are. Anthropic is reportedly fundraising at a $350 billion valuation, and its advertisements litter Instagram and big-city billboards. This month, the company launched Claude Cowork, a version of its Claude Code product geared toward non-software engineers. And in July, as first reported in Wired, Amodei wrote in an internal memo to employees that Anthropic would seek investments from the United Arab Emirates and Qatar, which, in his words, would likely enrich “dictators.” Warnings about the dangers of authoritarian AI have been central to Anthropic’s public messaging; “Machines of Loving Grace” includes dire descriptions of the threat of “authoritarian” AI.
When I brought this up to Amodei, he cut me off. “We never made a commitment not to seek funding from the Middle East,” he said. “One of the traps you can fall into when you’re doing a good job running a responsible company is every decision that you make” can be “interpreted as a moral commitment.” There was no “pressing need” to seek Middle Eastern funding before, and doing so entailed “complexities,” he said. I took his implication to be that the intensive capital demands of the AI race now made such investments a necessity. Still, such investors, Amodei said, wouldn’t have any control over his firm. A few days after we spoke, Anthropic announced the Qatar Investment Authority as a “significant” investor in a new fundraising round.
If you zoom out enough, and perhaps not even all that far, Anthropic stands for the same things that OpenAI, Google, Meta, and anyone else in the AI race do: building fantastically powerful chatbots and using them to transform the world and beat the competition. Across the company, the belief in AI’s potential is messianic. AI “presents one of the only technologies” that gets us out of the challenges ahead for humanity, Clark told me: climate change, aging populations, resource contention, authoritarianism, war. Without AI, he said, there will be more and more “Mad Max–like swaths of the world.”
Trenton Bricken, who works on AI safety at Anthropic, took this notion to an even greater extreme: He would ideally want the AI industry to slow down, but “every year that we stall, there are lots of people suffering who otherwise would not,” he told me, referring to the possibility that AI will eventually cure diseases and achieve everything else outlined in “Machines of Loving Grace.” His colleague Sholto Douglas claimed that such a delay “comes at the cost of millions of lives.”
Perhaps the greatest confusion at Anthropic is between theory and practice—the idea of safe AI versus the speed necessary to win the AI race. A corporate culture built around deep thought experiments and genuine disagreements about the future also has to sell AI. In the company’s view, these ends are complementary; better for it to responsibly usher in the AI future than Elon Musk or China. But that’s also a convenient way to justify an any-means-necessary approach to progress. I thought of that automated vending machine that the company had set up in its office. Claude ran the business into the ground in only a month through a string of very poor pricing and stocking decisions. But none of those really mattered: Anthropic had placed the machine next to all the free snacks in the office canteen.
When I asked Amodei recently about how he could justify the breakneck pace given the concerns he has over safety, he expressed total confidence in his staff—and also floated a new idea. Perhaps, he suggested, Claude will become so intelligent in the very near future that the bot will enable something radical: “Maybe at some point in 2027, what we want to do is just slow things down,” he said, and let the models fix themselves. “For just a few months.”