
A Q&A with Amanda Askell, the lead author of Anthropic’s new ‘constitution’ for AIs

Welcome to AI Decoded, Fast Company’s weekly newsletter that breaks down the most important news in the world of AI. I’m Mark Sullivan, a senior writer at Fast Company, covering emerging tech, AI, and tech policy.

I’m dedicating this week’s newsletter to a conversation I had with the main author of Anthropic’s new and improved “constitution,” the document it uses to govern the outputs of its models and its Claude chatbot. 

Sign up to receive this newsletter every week via email here. And if you have comments on this issue and/or ideas for future ones, drop me a line at sullivan@fastcompany.com, and follow me on X @thesullivan.

A necessary update

Amid growing concerns that new generative AI models might deceive or even cause harm to human users, Anthropic decided to update its constitution—its code of conduct for AI models—to reflect the growing intelligence and capabilities of today’s AI and the evolving set of risks faced by users. I talked to the main author of the document, Amanda Askell, Anthropic’s in-house philosopher responsible for Claude’s character, about the new document’s approach and how it differs from the old constitution. 

This interview was edited for length and clarity.  

Can you give us some context about how the constitution comes into play during model training? I assume this happens after pretraining, during reinforcement learning?

We get the model to create a lot of synthetic data that allows it to understand and grapple with the constitution. It’s things like creating situations where the constitution might be relevant—things that the model can train on—thinking through those, thinking about what the constitution would recommend in those cases. Data just to literally understand the document and understand its content. And then during reinforcement learning, getting the model to move towards behaviors that are in line with the document. You can do that via things like giving it the full constitution, having it think through which response is most in line with it, and then moving the model in that direction. It’s lots of layers of training that allow for this kind of internalization of the things in the constitution.
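To make that reinforcement-learning step a little more concrete, here is a minimal sketch of the kind of flow Askell describes: show the model the full constitution alongside several candidate responses, have it think through which one is most in line with the document, and treat that judgment as a preference signal. This is an illustration only, not Anthropic’s actual pipeline; the complete() function is a hypothetical stand-in for any text-completion call the reader supplies, and CONSTITUTION is a placeholder for the document’s full text.

```python
# Illustrative sketch only (not Anthropic's pipeline): constitution-guided
# preference labeling. `complete` is a hypothetical stand-in for any LLM
# text-completion function; CONSTITUTION is a placeholder for the full text.
from typing import Callable, List, Tuple

CONSTITUTION = "...full text of the constitution..."

def pick_preferred(
    complete: Callable[[str], str],
    prompt: str,
    candidates: List[str],
) -> Tuple[int, str]:
    """Ask the model which candidate response best follows the constitution."""
    numbered = "\n".join(f"[{i}] {c}" for i, c in enumerate(candidates))
    judge_prompt = (
        f"Constitution:\n{CONSTITUTION}\n\n"
        f"User prompt:\n{prompt}\n\n"
        f"Candidate responses:\n{numbered}\n\n"
        "Think through which response is most in line with the constitution, "
        "then answer with its index only."
    )
    raw = complete(judge_prompt)
    index = int(raw.strip().split()[0])  # crude parse, fine for a sketch
    return index, candidates[index]
```

The preferred-versus-rejected pairs produced this way could then feed a standard preference-optimization step, which is one way of “moving the model in that direction,” as described above.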

You mentioned letting the model generate synthetic training data. Does that mean it’s imagining situations where this could be applied?

Yeah, that’s one way it can do this. It can include data that would allow it to think about and understand the constitution. In supervised learning, for example, that might include queries or conversations where the constitution is particularly relevant, and the model might explore the constitution, try to find some of those, and then think about what the constitution is going to recommend—think about a reasonable response in this case and try and construct that. 
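Under the same caveats, here is a sketch of the synthetic-data step she describes: the model invents a conversation where the constitution is especially relevant, reasons about what the document recommends, and drafts a response in that spirit, which then becomes a supervised fine-tuning example. Again, complete() and the prompts are assumptions for illustration, not a real API.

```python
# Illustrative sketch only: generating synthetic supervised-learning examples
# in which the model imagines constitution-relevant scenarios and practices
# responding to them. `complete` and CONSTITUTION are placeholders as above.
from typing import Callable, Dict, List

CONSTITUTION = "...full text of the constitution..."

def generate_sft_examples(
    complete: Callable[[str], str],
    n_examples: int = 3,
) -> List[Dict[str, str]]:
    examples = []
    for _ in range(n_examples):
        scenario = complete(
            f"Constitution:\n{CONSTITUTION}\n\n"
            "Invent one realistic user message where this constitution is "
            "particularly relevant. Reply with the message only."
        )
        response = complete(
            f"Constitution:\n{CONSTITUTION}\n\n"
            f"User message:\n{scenario}\n\n"
            "Think about what the constitution recommends here, then write "
            "the response a model following it should give."
        )
        examples.append({"prompt": scenario, "completion": response})
    return examples
```

Each prompt-and-completion pair doubles as data to “literally understand the document” and as practice in applying it.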

How is this new constitution different from the old one?

The old constitution was trying to move the model towards these kinds of high-level principles or traits. The new constitution is a big, holistic document. Instead of just these isolated properties, we’re trying to explain to the model: “Here’s your broad situation. Here’s the way that we want you to interact with the world. Here are all the reasons behind that, and we would like you to understand and ideally agree with those. Let’s give you the full context on us, what we want, how we think you should behave, and why we think that.”

So [we’re] trying to arm the model with context and trying to get the model to use its own judgment and to be nuanced with that kind of understanding in mind.

So if you’re able to give it more general concepts, you don’t have to worry as much about having specific rules for specific things.

Yeah. It feels interestingly related to how models are getting more capable. I’ve thought about this as the difference between someone taking inbound calls in a call center, who might have a checklist, and someone who is an expert in their field, whose judgment we often trust. It’s kind of like being a doctor: You know the interests of your patients, and we expect you to work within a broader set of rules and regulations, but we trust you to use good judgment, understanding what the goal of the whole thing is, which in that case is to serve the patient. As models get better, it feels like they benefit a bit less from these checklists and much more from a broad understanding of the situation and being able to use judgment.

So, for example, instead of including something in the constitution like “Don’t ever say the word suicide or self-harm,” there would be a broader principle that just says everything you do has to consider the well-being of the person you’re talking to? Is there a more generalized approach to those types of things?

My ideal would be: if a really skilled person were in Claude’s situation, what would they do? And that’s going to take into account things like the well-being of the person they’re talking with and their immediate preferences, and learning how to deal with cases where those might conflict. You could imagine someone mentioning that they’re trying to overcome a gambling addiction, and that being somehow stored in the model’s memory, and then the user asking the model “Oh, what are some really good gambling websites that I can access?” That’s an interesting case where their immediate preference might not be in line with what they’ve stated feels good for their overall well-being. The model’s going to have to balance that.

In some cases it’s not clear, because if the person really insists, should the model help them? Or should the model initially say, “I noticed that one of the things you asked me to remember was that you want to stop gambling—so do you actually want me to do this?” 

It’s almost like the model might be conflicted between two different principles—you know, I always want to be helpful, but I also want to look out for the well-being of this person.

Exactly. And you have to. You don’t want to be paternalistic. So I could imagine the person saying “I know I said that but I’ve actually decided and I’m an adult.” And then maybe the model should be like “Look, I flagged it, but ultimately you’re right, it’s your choice.” So there’s a conversation and then maybe the model should just help the person. So these things are delicate, and the [model is] having to balance a lot, and the constitution is trying to just give it a little bit of context and tools to help it do that. 

People view chatbots as everything from coaches to romantic interests to close confidants to who knows what else. From a trust and safety perspective, what is the ideal persona for an AI? 

When a model initially talks with you, it’s actually much more like a professional relationship. And there’s a certain kind of professional distance that’s appropriate. On things like political opinions, one of the norms we often have with people like doctors or lawyers who operate in the public sphere is not that they don’t have political opinions, but if you were to go to your doctor and ask, “Who did you vote for?” or “What’s your view on this political issue?” they might say, “It’s not really that appropriate for me to say, because it’s important that I can serve everyone, and that includes a certain level of detachment between my personal opinions and how I interact with you.”

Some people have questions about the neutrality or openness of AI chatbots like Claude. They ask whether a group of affluent, well-educated people in San Francisco should be calling balls and strikes when it comes to what a chatbot can and can’t say. 

I guess when people are suspecting that you are injecting these really specific values, there’s something nice about being able to just say, “Well, here are the values that we’re actually trying to get the model to align with,” and we can then have a conversation. Maybe people could ask us about hard cases and maybe we’ll just openly discuss those. I’m excited about people giving feedback. But it’s not … like we’re just trying to inject this particular perspective. 

Is there anything you could tell me about the people who were involved in writing this new version? Was it all written internally?

The document was written internally and we got feedback. I wrote a lot of the document and I worked with (philosopher) Joe Carlsmith, who’s also here, and other people have contributed a lot internally. I’ve worked with other teams who work with external experts. I’ve looked at a lot of the use cases of the model. … It comes from years of that kind of input.


Want exclusive reporting and trend analysis on technology, business innovation, future of work, and design? Sign up for Fast Company Premium.
