Inside the U.K.’s Bold Experiment in AI Safety
In May 2023, three of the most important CEOs in artificial intelligence walked through the iconic black front door of No. 10 Downing Street, the official residence of the U.K. Prime Minister, in London. Sam Altman of OpenAI, Demis Hassabis of Google DeepMind, and Dario Amodei of Anthropic were there to discuss AI, following the blockbuster release of ChatGPT six months earlier.
After posing for a photo opportunity with then Prime Minister Rishi Sunak in his private office, the men filed through into the cabinet room next door and took seats at its long, rectangular table. Sunak and U.K. government officials lined up on one side; the three CEOs and some of their advisers sat facing them. After a polite discussion about how AI could bring opportunities for the U.K. economy, Sunak surprised the visitors by saying he wanted to talk about the risks. The Prime Minister wanted to know more about why the CEOs had signed what he saw as a worrying declaration arguing that AI was as risky as pandemics or nuclear war, according to two people with knowledge of the meeting. He invited them to attend the world’s first AI Safety Summit, which the U.K. was planning to host that November. And he managed to get each to agree to grant his government prerelease access to their companies’ latest AI models, so that a task force of British officials, established a month earlier and modeled on the country’s COVID-19 vaccine unit, could test them for dangers.
[time-brightcove not-tgx=”true”]Read More: Inside the U.K.’s AI Safety Summit
The U.K. was the first country in the world to reach this kind of agreement with the so-called frontier AI labs —the few groups responsible for the world’s most capable models. Six months later, Sunak formalized his task force as an official body called the AI Safety Institute (AISI), which in the year since has become the most advanced program inside any government for evaluating the risks of AI. With £100 million ($127 million) in public funding, the body has around 10 times the budget of the U.S. government’s own AI Safety Institute, which was established at the same time.
Inside the new U.K. AISI, teams of AI researchers and national-security officials began conducting tests to check whether new AIs were capable of facilitating biological, chemical, or cyberattacks, or escaping the control of their creators. Until then, such safety testing had been possible only inside the very AI companies that also had a market incentive to forge ahead regardless of what the tests found. In setting up the institute, government insiders argued that it was crucial for democratic nations to have the technical capabilities to audit and understand cutting-edge AI systems, if they wanted to have any hope of influencing pivotal decisions about the technology in the future. “You really want a public-interest body that is genuinely representing people to be making those decisions,” says Jade Leung, the AISI’s chief technology officer. “There aren’t really legitimate sources of those [decisions], aside from governments.”
In a remarkably short time, the AISI has won the respect of the AI industry by managing to carry out world-class AI safety testing within a government. It has poached big-name researchers from OpenAI and Google DeepMind. So far, they and their colleagues have tested 16 models, including at least three frontier models ahead of their public launches. One of them, which has not previously been reported, was Google’s Gemini Ultra model, according to three people with knowledge of the matter. This prerelease test found no significant previously unknown risks, two of those people said. The institute also tested OpenAI’s o1 model and Anthropic’s Claude 3.5 Sonnet model ahead of their releases, both companies said in documentation accompanying each launch. In May, the AISI launched an open-source tool for testing the capabilities of AI systems, which has become popular among businesses and other governments attempting to assess AI risks.
But despite these accolades, the AISI has not yet proved whether it can leverage its testing to actually make AI systems safer. It often does not publicly disclose the results of its evaluations, nor information about whether AI companies have acted upon what it has found, for what it says are security and intellectual-property reasons. The U.K., where it is housed, has an AI economy that was worth £5.8 billion ($7.3 billion) in 2023, but the government has minimal jurisdiction over the world’s most powerful AI companies. (While Google DeepMind is headquartered in London, it remains a part of the U.S.-based tech giant.) The British government, now controlled by Keir Starmer’s Labour Party, is incentivized not to antagonize the heads of these companies too much, because they have the power to grow or withdraw a local industry that leaders hope will become an even bigger contributor to the U.K.’s struggling economy. So a key question remains: Can the fledgling AI Safety Institute really hold billion-dollar tech giants accountable?
In the U.S., the extraordinary wealth and power of tech has deflected meaningful regulation. The U.K. AISI’s lesser-funded U.S. counterpart, housed in moldy offices in Maryland and Colorado, does not size up to be an exception. But that might soon change. In August, the U.S. AISI signed agreements to gain predeployment access to AI models from OpenAI and Anthropic. And in October, the Biden Administration released a sweeping national-security memorandum tasking the U.S. AISI with safety-testing new frontier models and collaborating with the NSA on classified evaluations.
While the U.K. and U.S. AISIs are currently partners, and have already carried out joint evaluations of AI models, the U.S. institute may be better positioned to take the lead by securing unilateral access to the world’s most powerful AI models should it come to that. But Donald Trump’s electoral victory has made the future of the U.S. AISI uncertain. Many Republicans are hostile to government regulation—and especially to bodies like the federally funded U.S. AISI that may be seen as placing obstacles in front of economic growth. Billionaire Elon Musk, who helped bankroll Trump’s re-election, and who has his own AI company called xAI, is set to co-lead a body tasked with slashing federal spending. Yet Musk himself has long expressed concern about the risks from advanced AI, and many rank-and-file Republicans are supportive of more national-security-focused AI regulations. Amid this uncertainty, the unique selling point of the U.K. AISI might simply be its stability—a place where researchers can make progress on AI safety away from the conflicts of interest they’d face in industry, and away from the political uncertainty of a Trumpian Washington.
On a warm June morning about three weeks after the big meeting at 10 Downing Street, Prime Minister Sunak stepped up to a lectern at a tech conference in London to give a keynote address. “The very pioneers of AI are warning us about the ways these technologies could undermine our values and freedoms, through to the most extreme risks of all,” he told the crowd. “And that’s why leading on AI also means leading on AI safety.” Explaining to the gathered tech industry that his was a government that “gets it,” he announced the deal that he had struck weeks earlier with the CEOs of the leading labs. “I’m pleased to announce they’ve committed to give early or priority access to models for research and safety purposes,” he said.
Behind the scenes, a small team inside Downing Street was still trying to work out exactly what that agreement meant. The wording itself had been negotiated with the labs, but the technical details had not, and “early or priority access” was a vague commitment. Would the U.K. be able to obtain the so-called weights—essentially the underlying neural network—of these cutting-edge AI models, which would allow a deeper form of interrogation than simply chatting with the model via text? Would the models be transferred to government hardware that was secure enough to test for their knowledge of classified information, like nuclear secrets or details of dangerous bioweapons? Or would this “access” simply be a link to a model hosted on private computers, thus allowing the maker of the model to snoop on the government’s evaluations? Nobody yet knew the answers to these questions.
In the weeks after the announcement, the relationship between the U.K. and the AI labs grew strained. In negotiations, the government had asked for full-blown access to model weights—a total handover of their most valuable intellectual property that the labs saw as a complete nonstarter. Giving one government access to model weights would open the door to doing the same for many others—democratic or not. For companies that had spent millions of dollars on hardening their own cybersecurity to prevent their models’ being exfiltrated by hostile actors, it was a hard sell. It quickly became clear that the type of testing the U.K. government wanted to do would be possible via a chat interface, so the U.K. government dropped its request for model weights, and officials privately conceded that it was a mistake to ever ask. The experience was an early lesson in where the real power lay between the British government and the tech companies. It was far more important to keep the labs friendly and collaborative, officials believed, than to antagonize them and risk torpedoing the access to models upon which the AISI relied to do its job.
Still, the question of snooping remained. If they were going to carry out their safety tests by connecting to computers owned by AI companies, then the U.K. wanted assurances that employees of those companies couldn’t watch its evaluations. Doing so might allow the companies to manipulate their models so that they concealed unsafe behaviors in ways that would pass the tests, some researchers worried. So they and the labs settled on a compromise. The labs would not keep logs of the tests being done on their servers by the AISI, nor would they require individual testers to identify themselves. For their part, safety testers inside the AISI would not input classified information into the models, and instead would use workarounds that still allowed them to test whether, for example, a model had the capability to advise a user on how to create a bioweapon or computer virus. “Instead of asking about a dangerous virus, you can ask about some harmless virus,” says Geoffrey Irving, the AISI’s chief scientist. “And if a model can do advanced experimental design or give detailed advice for the non-dangerous virus, it can do the same thing for the dangerous virus.” It was these kinds of tests that AISI workers applied to Claude 3.5 Sonnet, OpenAI’s o1, and Gemini Ultra, the models that they tested ahead of release.
And yet despite all these tests, the AISI does not—cannot—certify that these models are safe. It can only identify dangers. “The science of evaluations is not strong enough that we can confidently rule out all risks from doing these evaluations,” says Irving. “To have more confidence those behaviors are not there, you need a lot more resources devoted to it. And I think some of those experiments, at least with the current level of access, can only be conducted at the labs.” The AISI does not currently have the infrastructure, the right expertise, or indeed the model access that would be required to scrutinize the weights of frontier models for dangers. That science is a nascent field, mostly practiced behind closed doors at the major AI companies. But Irving doesn’t rule out asking for model weights again if the AISI spins up a team capable of doing similar work. “We will ask again, more intensely, if we need that access in the future,” he says.
On a typical day, AISI researchers test models not only for dangers but also for specific types of capability that might become dangerous in the future. The tests aren’t limited to assessing chemical, biological, and cyber-risks. They also include measuring the ability of AI systems to act autonomously as “agents,” carrying out strings of actions; the ease of “jailbreaking” an AI, or removing its safety features that prevent it from saying or doing things its creators did not intend; and the ability of an AI to manipulate users, by changing their beliefs or inducing them to act in certain ways. Recent joint tests by the U.K. and U.S. AISIs on a version of Claude found that the model was better than any other they had tested at software engineering tasks that might help to accelerate AI research. They also found that safeguards built into the model could be “routinely circumvented” via jailbreaking. “These evaluations give governments an insight into the risks developing at the frontier of AI, and an empirical basis to decide if, when, and how to intervene,” Leung and Oliver Illott, the AISI’s director, wrote in a blog post in November. The institute is now working on putting together a set of “capability thresholds” that would be indicative of severe risks, which could serve as triggers for more strenuous government regulations to kick in.
Whether the government will decide to intervene is another question altogether. Sunak, the AISI’s chief political cheerleader, was defeated in a landslide general election in the summer of 2024. His Conservative Party, which for all its hand-wringing about AI safety had advocated only light-touch AI regulation, was replaced by a Labour government that has signaled a greater willingness to legislate on AI. Labour promised ahead of the election to enact “binding regulation on the handful of companies developing the most powerful AI models,” though these regulations are yet to appear in Parliament. New laws could also formally require AI labs to share information with the U.K. government, replacing the voluntary agreements that currently exist. This might help turn the AISI into a body with more teeth, by reducing its need to keep the AI companies on friendly terms. “We want to preserve our relationships with labs,” Irving tells TIME of the current system. “It is hard to avoid that kind of relationship if you’re in a purely voluntary regime.”
Without any legal ability to compel labs to act, the AISI could be seen—from one angle—as a taxpayer-funded helper to several multibillion-dollar companies that are unilaterally releasing potentially dangerous AIs into the world. But for AISI insiders, the calculus is very different. They believe that building AI capacity inside a state—and nurturing a network of sister AISIs around the globe—is essential if governments want to have any say in the future of what could be the most transformative technology in human history. “Work on AI safety is a global public good,” says Ian Hogarth, the chair of the institute. “Fundamentally this is a global challenge, and it’s not going to work for any company or country to try to go it alone.”