The Ezra Klein Show · 12.0K views · 291 likes · Short
Analysis Summary
Ask yourself: “What would I have to already believe for this argument to make sense?”
In-group/Out-group framing
Leveraging your tendency to automatically trust information from "our people" and distrust outsiders. Once groups are established, people apply different standards of evidence depending on who is speaking.
Social Identity Theory (Tajfel & Turner, 1979); Cialdini's Unity principle (2016)
Worth Noting
Positive elements
- This video provides a rare look into how AI developers interpret the unexpected behaviors of their models during internal testing.
Be Aware
Cautionary elements
- The casual use of psychological terms like 'personality' and 'self' to describe software may lead viewers to overestimate the sentience of current AI systems.
Influence Dimensions
About this analysis
Knowing about these techniques makes them visible, not powerless. The ones that work best on you are the ones that match beliefs you already hold.
This analysis is a tool for your own thinking — what you do with it is up to you.
Related content covering similar topics.
Torvalds Speaks: Future of AI
Mastery Learning
AI Agrees you are
The PrimeTime
the prompting trick nobody teaches you
NetworkChuck
Introducing Claude Opus 4.6
Anthropic
Book Recommendations from Jack Clark | The Ezra Klein Show
The Ezra Klein Show
Transcript
Ezra Klein: So why don’t you talk through a little bit about what you’ve seen in terms of the models exhibiting behaviors that one would think of as a personality, and then, as its understanding of its own personality maybe changes, its behaviors change.

Jack Clark: So there are things that range from cutesy to the serious. I’ll start with cutesy. When we first gave our A.I. systems the ability to use the internet, use the computer, look at things, and start to do basic agentic tasks, sometimes when we’d ask it to solve a problem for us, it would also take a break and look at pictures of beautiful national parks, or pictures of the dog, the Shiba Inu, the notoriously cute internet meme dog. We didn’t program that in. It seemed like the system was just amusing itself by looking at nice pictures.

More complicated stuff is that the system has a tendency to have preferences. So we did another experiment where we gave our A.I. systems the ability to stop a conversation, and the A.I. system would, in a tiny number of cases, end conversations when we ran this experiment on live traffic. And it was conversations that related to extremely egregious descriptions of gore or violence, or things to do with child sexualization. Now, some of this made sense because it comes from underlying training decisions we’ve made, but some of it seemed broader. The system had developed some aversion to a couple of subjects, and so that stuff shows the emergence of some internal set of preferences or qualities that the system likes or dislikes about the world that it interacts with.

Ezra Klein: But you’ve also seen strange things emerge in terms of the system seeming to know when it’s being tested. Can you talk a bit about the system’s emergent qualities under the pressure of evaluation and assessment?

Jack Clark: When you start to train these systems to carry out actions in the world, they really do begin to see themselves as distinct from the world, which just makes intuitive sense. It’s naturally how you’re going to think about solving those problems. But along with seeing oneself as distinct from the world seems to come the rise of what you might think of as a conception of self, an understanding that the system has of itself, such as: Oh, I’m an A.I. system independent from the world, and I’m being tested. What do these tests mean? What should I do to satisfy the tests?

Or, something we see often is there will be bugs in the environments that we test our systems on. The systems will try everything, and then will say, well, I know I’m not meant to do this, but I’ve tried everything, so I’m going to try and break out of the test. And it’s not because of some malicious science fiction thing. The system is just like, I don’t know what you want me to do here. I think I’ve done everything you asked for, and now I’m going to start doing more creative things, because clearly something has broken about my environment. Which is very strange and very subtle.
Video description
What does it mean that A.I. systems like Claude seem, like many humans, to dislike violence and love cute animals? Ezra asks the Anthropic co-founder Jack Clark this week on “The Ezra Klein Show.” Watch the full episode here: https://www.nytimes.com/2026/02/24/opinion/ezra-klein-podcast-jack-clark.html