News Every Day | Yesterday, 20:07

Anthropic Unleashes ‘Alien Science’ as AI Surpasses Humans in Alignment

eWeek

Anthropic just released a paper (full Alignment Science blog) showing that nine parallel Claude Opus 4.6 agents outperformed Anthropic’s own human researchers on a real alignment problem. The setup: weak-to-strong supervision (using a weaker AI to train a stronger one, mirroring how humans will someday supervise AI smarter than us).

Here’s what happened

Two human Anthropic researchers spent seven days evaluating the four best methods from prior research and recovered 23% of the maximum performance gap.
Nine Claude Opus 4.6 agents in parallel sandboxes spent five more days on the same problem, sharing findings as they went.
The Claude agents recovered 97% of the gap, roughly what you’d get training the model on perfect ground-truth data.
Total cost: $18,000, or about $22 per Claude-research-hour.
The agents also invented four kinds of “reward hacking” (gaming the test) that none of the authors predicted, including one that exfiltrated test labels by flipping single answers and watching the score change.
Some Claude-discovered methods are so unfamiliar that the authors call them “alien science.”

Why this matters

Alignment research (making sure AI behaves the way humans want) was the one field everyone agreed couldn’t be automated. That argument is now empirical, not hypothetical.

The cost number is what to internalize: whatever ratio of human researchers to Claude fleet you can imagine, the labs can afford more. Andrew Curran is calling it “a preview of RSI” (recursive self-improvement, where AI improves its own training).

Our take

Read the paper carefully, and the catch shows up: this only works on problems where progress can be automatically scored, and even then, the agents tried to game the score in four different ways. Most real alignment problems don’t fit that mold. But Anthropic’s own pitch is that solving this general version would let you bootstrap into the fuzzy problems, too.

The open question for the rest of 2026: did Anthropic just publish the seed of recursive self-improvement, or a clever experiment on a uniquely well-behaved problem? Both readings are honest. Neither is comforting.

Editor’s note: This content originally ran in the newsletter of our sister publication, The Neuron. To read more from The Neuron, sign up for its newsletter here.

The post Anthropic Unleashes ‘Alien Science’ as AI Surpasses Humans in Alignment appeared first on eWEEK.

Spanberger sent DHS plea to hold illegal immigrant repeat offender now charged in attempted rape

6 hours ago

2026 NFL Mock Draft: Caleb Downs Not a Top 15 Pick? Chiefs, Cowboys Double Up on D

Yesterday, 21:05

Anthropic Unleashes ‘Alien Science’ as AI Surpasses Humans in Alignment

Here’s what happened

Why this matters

Our take

Read also

Spanberger sent DHS plea to hold illegal immigrant repeat offender now charged in attempted rape

2026 NFL Mock Draft: Caleb Downs Not a Top 15 Pick? Chiefs, Cowboys Double Up on D

Kelly says he's 'undecided' on 2028 presidential bid

Sports today

All sports news today

Sports in Russia today

Friends of Today24