How This Tool Could Decode AI’s Inner Mysteries

The scientists didn’t have high expectations when they asked their AI model to complete the poem. “He saw a carrot and had to grab it,” they prompted the model. “His hunger was like a starving rabbit,” it replied. 

The rhyming couplet wasn’t going to win any poetry awards. But when the scientists at AI company Anthropic inspected the records of the model’s neural network, they were surprised by what they found. They had expected to see the model, called Claude, picking its words one by one, and seeking a rhyming word (“rabbit”) only when it reached the end of the line.

Instead, by using a new technique that allowed them to peer into the inner workings of a language model, they observed Claude planning ahead. As early as the break between the two lines, it had begun “thinking” about words that would rhyme with “grab it,” and planned its next sentence with the word “rabbit” in mind.

The discovery ran contrary to the conventional wisdom—in at least some quarters—that AI models are merely sophisticated autocomplete machines that only predict the next word in a sequence. It raised the questions: How much further might these models be capable of planning ahead? And what else might be going on inside these mysterious synthetic brains, which we lack the tools to see?

The finding was one of several announced on Thursday in two new papers by Anthropic, which reveal in more depth than ever before how large language models (LLMs) “think.”

Today’s AI tools are categorically different from other computer programs for one big reason: they are “grown,” rather than coded by hand. Peer inside the neural networks that power them, and all you will see is a bunch of very complicated numbers being multiplied together, again and again. This internal complexity means that even the machine learning engineers who “grow” these AIs don’t really know how they spin poems, write recipes, or tell you where to take your next holiday. They just do.

But recently, scientists at Anthropic and other groups have been making progress in a new field called “mechanistic interpretability”: building tools to read those numbers and turn them into explanations for how AI works on the inside. “What are the mechanisms that these models use to provide answers?” says Chris Olah, an Anthropic cofounder, of the questions driving his research. “What are the algorithms that are embedded in these models?” Answer those questions, Olah says, and AI companies might be able to finally solve the thorny problem of ensuring AI systems always follow human rules.

The results announced on Thursday by Olah’s team are some of the clearest findings yet in this new field of scientific inquiry, which might best be described as a kind of “neuroscience” for AI.

A new ‘microscope’ for looking inside LLMs

In earlier research published last year, Anthropic researchers identified clusters of artificial neurons within neural networks. They called them “features,” and found that they corresponded to different concepts. To illustrate this finding, Anthropic artificially boosted a feature inside Claude corresponding to the Golden Gate Bridge, which led the model to insert mention of the bridge, no matter how irrelevant, into its answers until the boost was reversed.

In the new research published Thursday, the researchers go a step further, tracing how groups of multiple features are connected together inside a neural network to form what they call “circuits”—essentially algorithms for carrying out different tasks.

To do this, they developed a tool for looking inside the neural network, almost like the way scientists can image the brain of a person to see which parts light up when thinking about different things. The new tool allowed the researchers to essentially roll back the tape and see, in perfect HD, which neurons, features, and circuits were active inside Claude’s neural network at any given step. (Unlike a biological brain scan, which only gives the fuzziest picture of what individual neurons are doing, digital neural networks provide researchers with an unprecedented level of transparency; every computational step is laid bare, waiting to be dissected.)

When the Anthropic researchers zoomed back to the beginning of the sentence, “His hunger was like a starving rabbit,” they saw the model immediately activate a feature for identifying words that rhyme with “it.” They identified the feature’s purpose by artificially suppressing it; when they did this and re-ran the prompt, the model instead ended the sentence with the word “jaguar.” When they kept the rhyming feature but suppressed the word “rabbit” instead, the model ended the sentence with the feature’s next top choice: “habit.”

Anthropic compares this tool to a “microscope” for AI. But Olah, who led the research, hopes that one day he can widen the aperture of its lens to encompass not just tiny circuits within an AI model, but the entire scope of its computation. His ultimate goal is to develop a tool that can provide a “holistic account” of the algorithms embedded within these models. “I think there’s a variety of questions that will increasingly be of societal importance, that this could speak to, if we could succeed,” he says. For example: Are these models safe? Can we trust them in certain high-stakes situations? And when are they lying?

Universal language

The Anthropic research also found evidence to support the theory that language models “think” in a non-linguistic statistical space that is shared between languages.

Anthropic scientists tested this by asking Claude for the “opposite of small” in several different languages. Using their new tool, they analyzed the features that activated inside Claude when it answered each of those prompts in English, French, and Chinese. They found features corresponding to the concepts of smallness, largeness, and oppositeness, which activated no matter what language the question was posed in. Additional features would also activate corresponding to the language of the question, telling the model what language to answer in. 

This isn’t an entirely new finding—AI researchers have conjectured for years that language models “think” in a statistical space outside of language, and earlier interpretability work has borne this out with evidence. But Anthropic’s paper is the most detailed account yet of exactly how this phenomenon happens inside a model, Olah says. 

The finding came with a tantalizing prospect for safety research. As models get larger, the team found, they tend to become more capable of abstracting ideas beyond language and into this non-linguistic space. This finding could be useful in a safety context, because a model that is able to form an abstract concept of, say, “harmful requests” is more likely to be able to refuse them in all contexts, compared to a model that only recognizes specific examples of harmful requests in a single language.

This could be good news for speakers of so-called “low-resource languages” that are not widely represented in the internet data that is used to train AI models. Today’s large language models often perform more poorly in those languages than in, say, English. But Anthropic’s finding raises the prospect that LLMs may one day not need unattainably vast quantities of linguistic data to perform capably and safely in these languages, so long as there is a critical mass big enough to map onto a model’s internal non-linguistic concepts.

However, speakers of those languages will still have to contend with how those very concepts have been shaped by the dominance of languages like English, and the cultures that speak them.

Toward a more interpretable future

Despite these advances in AI interpretability, the field is still in its infancy, and significant challenges remain. Anthropic acknowledges that “even on short, simple prompts, our method only captures a fraction of the total computation” expended by Claude—that is, there is much going on inside its neural network into which they still have zero visibility. “It currently takes a few hours of human effort to understand the circuits we see, even on prompts with only tens of words,” the company adds. Much more work will be needed to overcome those limitations.

But if researchers can achieve that, the rewards might be vast. The discourse around AI today is very polarized, Olah says. At one extreme, there are people who believe AI models “understand” just like people do. At the other, there are people who see them as just fancy autocomplete tools. “I think part of what’s going on here is, people don’t really have productive language for talking about these problems,” Olah says. “Fundamentally what they want to ask, I think, is questions of mechanism. How do these models accomplish these behaviors? They don’t really have a way to talk about that. But ideally they would be talking about mechanism, and I think that interpretability is giving us the ability to make much more nuanced, specific claims about what exactly is going on inside these models. I hope that that can reduce the polarization on these questions.”
