Superintelligence Strategy (Dan Hendrycks)

Machine Learning Street Talk

Thursday, August 14, 2025 · 1h 45m

What You'll Learn

  • Humanity's Last Exam is a benchmark that crowdsources difficult closed-ended questions from experts to test the limits of AI systems
  • Existing benchmarks like MMLU are starting to saturate, indicating the need for more challenging evaluations
  • There are concerns about the anthropocentric bias in AI benchmarks, focusing on tasks that are easy for humans but hard for AI
  • Enigma Eval is a benchmark that tests multi-step, creative reasoning, which is expected to be challenging for current AI systems
  • Language models may be doing 'more with more' by leveraging shortcuts, rather than demonstrating true intelligence
  • Upcoming benchmarks will focus on measuring the automation rate of tasks to better understand AI capabilities

Episode Chapters

1

Introduction

The host introduces the guest, Dan Hendrycks, and the topic of the episode - Hendrycks' work on AI benchmarking and superintelligence strategy.

2

Humanity's Last Exam

Hendrycks explains the motivation and design of the 'Humanity's Last Exam' benchmark, which tests the limits of AI systems on complex, closed-ended questions.

3

Anthropocentric Bias in Benchmarks

The discussion explores the potential biases in AI benchmarks that focus on tasks that are easy for humans but hard for AI, and the need for more diverse evaluations.

4

Enigma Eval Benchmark

Hendrycks introduces the 'Enigma Eval' benchmark, which tests multi-step, creative reasoning capabilities of AI systems.

5

Limitations of Language Models

The episode touches on the concerns that current language models may be leveraging shortcuts rather than demonstrating true intelligence.

6

Upcoming Benchmarks

Hendrycks mentions the development of new benchmarks that will focus on measuring the automation rate of tasks to better understand AI capabilities.

AI Summary

This episode discusses the development of AI systems and the challenges in evaluating their capabilities. The guest, Dan Hendrycks, talks about his work on the 'Humanity's Last Exam' benchmark, which aims to test the limits of AI systems on complex, closed-ended questions. The discussion also covers the potential biases in AI benchmarks, the need for more diverse evaluations, and the limitations of current language models in demonstrating true intelligence.

Key Points

  1. Humanity's Last Exam is a benchmark that crowdsources difficult closed-ended questions from experts to test the limits of AI systems
  2. Existing benchmarks like MMLU are starting to saturate, indicating the need for more challenging evaluations
  3. There are concerns about the anthropocentric bias in AI benchmarks, focusing on tasks that are easy for humans but hard for AI
  4. Enigma Eval is a benchmark that tests multi-step, creative reasoning, which is expected to be challenging for current AI systems
  5. Language models may be doing 'more with more' by leveraging shortcuts, rather than demonstrating true intelligence
  6. Upcoming benchmarks will focus on measuring the automation rate of tasks to better understand AI capabilities

Topics Discussed

#AI benchmarking · #Superintelligence strategy · #Anthropocentric bias in AI · #Limitations of language models · #Measuring AI capabilities

Frequently Asked Questions

What is "Superintelligence Strategy (Dan Hendrycks)" about?

This episode discusses the development of AI systems and the challenges in evaluating their capabilities. The guest, Dan Hendrycks, talks about his work on the 'Humanity's Last Exam' benchmark, which aims to test the limits of AI systems on complex, closed-ended questions. The discussion also covers the potential biases in AI benchmarks, the need for more diverse evaluations, and the limitations of current language models in demonstrating true intelligence.

What topics are discussed in this episode?

This episode covers the following topics: AI benchmarking, Superintelligence strategy, Anthropocentric bias in AI, Limitations of language models, Measuring AI capabilities.

What is key insight #1 from this episode?

Humanity's Last Exam is a benchmark that crowdsources difficult closed-ended questions from experts to test the limits of AI systems

What is key insight #2 from this episode?

Existing benchmarks like MMLU are starting to saturate, indicating the need for more challenging evaluations

What is key insight #3 from this episode?

There are concerns about the anthropocentric bias in AI benchmarks, focusing on tasks that are easy for humans but hard for AI

What is key insight #4 from this episode?

Enigma Eval is a benchmark that tests multi-step, creative reasoning, which is expected to be challenging for current AI systems

Who should listen to this episode?

This episode is recommended for anyone interested in AI benchmarking, Superintelligence strategy, Anthropocentric bias in AI, and those who want to stay updated on the latest developments in AI and technology.

Episode Description

Deep dive with Dan Hendrycks, a leading AI safety researcher and co-author of the "Superintelligence Strategy" paper with former Google CEO Eric Schmidt and Scale AI CEO Alexandr Wang.

*** SPONSOR MESSAGES

Gemini CLI is an open-source AI agent that brings the power of Gemini directly into your terminal - https://github.com/google-gemini/gemini-cli

Prolific: Quality data. From real people. For faster breakthroughs.
https://prolific.com/mlst?utm_campaign=98404559-MLST&utm_source=youtube&utm_medium=podcast&utm_content=script-gen

***

Hendrycks argues that society is making a fundamental mistake in how it views artificial intelligence. We often compare AI to transformative but ultimately manageable technologies like electricity or the internet. He contends a far better and more realistic analogy is nuclear technology. Like nuclear power, AI has the potential for immense good, but it is also a dual-use technology that carries the risk of unprecedented catastrophe.

The Problem with an AI "Manhattan Project":

A popular idea is for the U.S. to launch a "Manhattan Project" for AI: a secret, all-out government race to build a superintelligence before rivals like China. Hendrycks argues this strategy is deeply flawed and dangerous for several reasons:

- It wouldn't be secret. You cannot hide a massive, heat-generating data center from satellite surveillance.

- It would be destabilizing. A public race would alarm rivals, causing them to start their own desperate, corner-cutting projects, dramatically increasing global risk.

- It's vulnerable to sabotage. An AI project can be crippled in many ways, from cyberattacks that poison its training data to physical attacks on its power plants. This is what the paper refers to as a "maiming attack."

This vulnerability leads to the paper's central concept: Mutual Assured AI Malfunction (MAIM). This is the AI-era version of the nuclear era's Mutual Assured Destruction (MAD). In this dynamic, any nation that makes an aggressive, destabilizing bid for a world-dominating AI must expect its rivals to sabotage the project to ensure their own survival.

This deterrence, Hendrycks argues, is already the default reality we live in.

A Better Strategy: The Three Pillars

Instead of a reckless race, the paper proposes a more stable, three-part strategy modeled on Cold War principles:

- Deterrence: Acknowledge the reality of MAIM. The goal should not be to "win" the race to superintelligence, but to deter anyone from starting such a race in the first place through the credible threat of sabotage.

- Nonproliferation: Just as we work to keep fissile materials for nuclear bombs out of the hands of terrorists and rogue states, we must control the key inputs for catastrophic AI. The most critical input is advanced AI chips (GPUs). Hendrycks makes the powerful claim that building cutting-edge GPUs is now more difficult than enriching uranium, making this strategy viable.

- Competitiveness: The race between nations like the U.S. and China should not be about who builds superintelligence first. Instead, it should be about who can best use existing AI to build a stronger economy, a more effective military, and more resilient supply chains (for example, by manufacturing more chips domestically).

Dan says the stakes are high if we fail to manage this transition:

- Erosion of Control
- Intelligence Recursion
- Worthless Labor

Hendrycks maintains that while the risks are existential, the future is not set.

TOC:
1 Measuring the Beast [00:00:00]
2 Defining the Beast [00:11:34]
3 The Core Strategy [00:38:20]
4 Ideological Battlegrounds [00:53:12]
5 Mechanisms of Control [01:34:45]

TRANSCRIPT:
https://app.rescript.info/public/share/cOKcz4pWRPjh7BTIgybd7PUr_vChUaY6VQW64No8XMs

<truncated, see refs and larger description on YT version>
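
The deterrence logic summarized above can be made concrete with a toy two-player game. This is an editor's illustration only, not a model from the paper: the strategy names and payoff numbers are invented, and "sabotage" is collapsed into a single assumed credible response. It just shows why, under those assumptions, a visible race for superintelligence is self-defeating and mutual restraint is the stable outcome.

```python
# Toy illustration of the MAIM deterrence logic (editor's sketch, not from the paper).
# Payoffs are invented; they only encode the ordering the description argues for:
# a destabilizing race invites sabotage, so racing is worse than restraint for both sides.

from itertools import product

STRATEGIES = ["race", "restrain"]

def payoff(a: str, b: str) -> tuple[float, float]:
    """Return (payoff_A, payoff_B), assuming any visible race is sabotaged ('maimed')."""
    if a == "race" and b == "race":
        return (-5, -5)   # both projects sabotaged, high escalation risk
    if a == "race" and b == "restrain":
        return (-3, -1)   # A's project is maimed; B pays a small policing cost
    if a == "restrain" and b == "race":
        return (-1, -3)   # symmetric case
    return (1, 1)         # mutual restraint: competition shifts to economics instead

def is_stable(a: str, b: str) -> bool:
    """Neither side gains by unilaterally deviating (a Nash equilibrium check)."""
    pa, pb = payoff(a, b)
    a_ok = all(payoff(alt, b)[0] <= pa for alt in STRATEGIES)
    b_ok = all(payoff(a, alt)[1] <= pb for alt in STRATEGIES)
    return a_ok and b_ok

for a, b in product(STRATEGIES, STRATEGIES):
    print(f"A={a:8s} B={b:8s} payoffs={payoff(a, b)} stable={is_stable(a, b)}")
# Only (restrain, restrain) comes out as stable under these assumed payoffs.
```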

Full Transcript

Compared to nuclear weapons, I think it's harder to make cutting-edge GPUs given a billion dollars. I mean, you can't do it with a billion dollars. If it's $10 billion, you can't do it. There was Situational Awareness by Leopold Aschenbrenner, which was arguing for something like a Manhattan Project for developing AGI and superintelligence before China. So it's basically: let's beat China to the punch, get superintelligence, prevent them from building it, and the West will dominate the world.

You know Eliezer got in trouble in Time magazine when he spoke about bombing data centers. There was a big hoo-ha at the time. You used the word kinetic strikes.

We discuss kinetic attacks in the escalation ladder. So there are many ways to disrupt projects. You could do cyber attacks on them. You could do some gray sabotage. There's hacking to poison their data, or to make their GPUs not function as reliably, or threatening to use force. I don't think that those are really necessary. So that would be an escalation ladder. But if the U.S. is on top of the issue, they don't need to resort to that.

Today's episode is sponsored by Prolific. This is Sara Saab, who's a VP of Product at Prolific. Sara is using the Prolific platform to understand how large language models think by designing the next generation of benchmarks with human participation. We study the benchmarking based on the demographic stratification of the humans doing the evaluations. So you can see stuff emerge in the data like: people of this age range think this model is better on helpfulness, but people of that age range disagree. Go to Prolific.com.

This podcast is supported by Google. Hey folks, Taylor here, creator of Gemini CLI. We designed Gemini CLI to be your collaborative coding partner at the command line. Think about those tedious tasks like fixing a tricky bug, adding documentation, or even writing tests. We built Gemini CLI to handle all that and more. It iterates with you to tackle your most challenging problems. Check out Gemini CLI on GitHub to get started.

I want to spend most of the show talking about your very interesting new superintelligence strategy paper, which you fairly recently published. So maybe starting with Humanity's Last Exam. What's the story behind that?

The MMLU dataset, which I made as a graduate student some years ago, was getting saturated. And not just that: pretty much all of the evaluations were getting saturated. So people didn't really know what was going on with AI capabilities. So there seemed to be particularly strong reasons, just so people stay informed about developments, to develop something new.
And I was also experiencing that experts don't really have datasets in them. You can't just hire a few experts and have them come up with a dataset; they don't have enough complicated ideas in them. However, I think individual experts might have a question in them instead. So with Humanity's Last Exam, the idea was a global effort, with various postdocs and professors each contributing a question or a few questions to stump existing AI systems. These are questions that they would find very difficult to answer, and they would find it impressive if the AI systems could answer them. And this would be approximating the human frontier, in some sense, of knowledge and reasoning for closed-ended questions where we already know the answer. So we did that for some months and we got several thousand questions out of it. And I think this will be a good tracker for whether AI systems can automate a lot of the theoretical parts of the sciences, whether they can solve difficult analytic questions. Not experimental: it doesn't test the ability to run biology experiments, which is more motor skills, among other things, or requires motor skills. But for things that are more mathematically related or require some very complex reasoning, that's what this captures. So I think that when it's solved, it's roughly the end of a genre, or near the end of a genre, of asking closed-ended questions for which there are objective answers. It seems like that would be toward the end of it. And I think the individual problems that it would solve in the future would be interesting enough to be papers in their own right. So once we get through this set of questions, the types of problems it could tackle would be questions worthy of their own paper, like it solved a conjecture, for instance. So it's tracking the ability level up to the point where individual questions themselves are very interesting, instead of just the dataset.

One thing that concerns me is, certainly MMLU, which you invented, is now basically saturating; it's well above 90 percent. And Humanity's Last Exam is resisting progress quite a lot. I think we're up to about 26 percent or something like that. And I was speaking to some cognitive scientists this week, and they study cognition in animals. They have a similar problem. They can see that animals can do certain tasks, but they're never really sure why they do the tasks. And you almost get this no-true-Scotsman type thing where they're saying, well, maybe they have this level of sophistication in their reasoning, but maybe that's not enough; maybe it should be more sophisticated. So how can you reasonably infer how the models are getting the answers?

So I think in terms of difficulty, it's tough to think of a more challenging data-generating process than taking global experts and, if the genre is closed-ended questions, asking them what the hardest closed-ended question is and crowdsourcing that. But it certainly can get harder, where each individual question is an open question, like a conjecture, for instance. But it's true that for benchmarks generally, this would not be the end of the line for AI development, because this doesn't test the AI systems' ability to move around. This doesn't test their long-term memory. This doesn't test their ability to make PowerPoints and so on.
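Hendrycks describes Humanity's Last Exam as crowdsourced, closed-ended expert questions with known answers, and describes saturation as the point where accuracy approaches its ceiling. He doesn't walk through the grading here, so the following is only a sketch of how such a benchmark is commonly scored, with exact-match grading; the item texts and the `normalize` helper are the editor's, not the benchmark's own pipeline.

```python
# Minimal sketch of scoring a closed-ended, expert-written benchmark
# (editor's illustration of the general recipe; not the actual HLE pipeline).

from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Item:
    question: str
    answer: str       # the known, objective answer the expert supplied

def normalize(text: str) -> str:
    # Crude canonicalization so "42." and " 42" count as the same answer.
    return text.strip().lower().rstrip(".")

def accuracy(items: List[Item], model: Callable[[str], str]) -> float:
    correct = sum(normalize(model(it.question)) == normalize(it.answer) for it in items)
    return correct / len(items)

# Toy usage with a stand-in "model". A saturated benchmark is one where all
# frontier models score near 1.0, so it no longer separates them.
toy_items = [Item("What is 2 + 40?", "42"), Item("Capital of France?", "Paris")]
print(accuracy(toy_items, lambda q: "42" if "2 + 40" in q else "Lyon"))  # 0.5
```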
So I think this is still getting at the closed-ended question genre with objective answers that we already know, which is how nearly all benchmarks have been in machine learning. But then we'll be moving over, I think, as a community, more to agential types of tasks, or tasks that are more directly economically valuable.

What do you think about the anthropocentric bias in benchmarks? Francois Chollet, for example, said that when he was designing ARC v2 and ARC v3, every single step of the progression of the benchmark was about identifying things which are easy for humans and hard for AIs. And some people would argue against that, saying, well, actually, there's a diverse set of possible intelligences, and why does human-like task acquisition and capability have value? To me, it seems intuitive that it does, because surely something that does things we can communicate with and understand seems very valuable. But do you think that framing everything in human terms could be hiding other forms of capability from us?

There are certainly other forms of capability, and they have some advantages. For instance, they can process things much more quickly, and that can give rise to things like MMLU performance: no human could do that well on MMLU in particular because it's so diverse, and likewise for Humanity's Last Exam as well. So a reason for focusing on questions that are hard for humans and hard for AIs is that with questions that are easy for humans but hard for AIs, it's difficult to generate many of them and to do it diversely. Often, if it's a specific capability, once they get some specific training data for it they automatically have the capability. There can be some pockets where it's harder, such as the ARC datasets. But in general, if you were collecting a dataset like "count the number of R's in strawberry" or in a word, that will not have much staying power. So I think focusing on difficult things that only a few humans can do, or not that many humans can do, is generally going to be more robust.

Can you tell us about your EnigmaEval benchmark?

Yes. So EnigmaEval is a collection of puzzles. Humanity's Last Exam is getting individual questions that are tough, that an individual with a lot of expertise could solve. EnigmaEval you can think of as like MIT Mystery Hunts. MIT Mystery Hunts are things that happen over a weekend, where a group of MIT students tries to solve a puzzle. There are many steps to it. So in terms of human compute, so to speak, it takes a lot of human compute to solve, and it takes groups to solve it as well. And there's not a very high solve rate. So this is very multi-step and requires group-level intelligence to have a shot at solving it. So we just collected some of those. And I think this approximates longer-horizon types of intellectual tasks. And I don't think that will be solved this year at all; I would be very surprised if it were. So I think we have some evaluations that can keep us aware and able to differentiate between models for a while. There are other ones. For instance, we'll soon have out an automation-related benchmark, where we're just directly measuring what the automation rate of things is. But I won't go into too much detail about that until it's released.
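Hendrycks only names the upcoming automation benchmark and deliberately withholds details, so the following does not describe it. Purely as an illustration of the simplest reading of "directly measuring the automation rate" (the share of tasks an agent completes end to end with no human help), here is a hypothetical sketch; the `Task` and agent interfaces are invented by the editor.

```python
# Hypothetical sketch of an "automation rate" metric: the fraction of tasks an AI
# agent completes end to end without human intervention. This is the editor's
# reading of the phrase, not the benchmark Hendrycks alludes to.

from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Task:
    spec: str                          # what the agent is asked to do
    check: Callable[[str], bool]       # verifier for the agent's final output

def automation_rate(tasks: List[Task], agent: Callable[[str], str]) -> float:
    done = 0
    for task in tasks:
        try:
            done += task.check(agent(task.spec))   # success only if the check passes
        except Exception:
            pass                                   # crashes count as not automated
    return done / len(tasks)

# Usage with toy tasks and a toy "agent":
tasks = [
    Task("Return the string OK", lambda out: out == "OK"),
    Task("Sum 2 and 3 as text", lambda out: out == "5"),
]
print(automation_rate(tasks, lambda spec: "OK"))   # 0.5: one of two tasks automated
```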
But I think there are many axes on which the models are actually not doing that well, even though people will claim that all the benchmarks are saturated or get solved in a few months. I think you can create ones that take on the order of a year or two to solve.

Yeah, one thing I was thinking about is, certainly in EnigmaEval, we're looking at multi-step creative reasoning. And I don't know whether, again, there was some kind of human-based methodology for filtering and coming up with ideas, or maybe your frame was: I have some technical, principled intuition about what the limitations of AI models are, so I'm going to lean in that direction. But more broadly, I'm interested in intelligence and what it is. For me, it's about doing more with less. Intelligence is about taking hard problems and making them simple, and stupidity, ironically, is the other way around: taking simple problems and making them hard. And, I'm kind of quoting David Krakauer, who's the director of the Santa Fe Institute: he said that LLMs are doing more with more. They already know everything, and they can take these shortcuts, and that's why he thinks that they're not really intelligent. So we're left with this quandary, really, because arguably these entities can take shortcuts and they're not really doing things the way that we are. And when we make more complex benchmarks, do you think that increases the fog of war of how we evaluate these things?

I think that they can definitely prioritize some axes that aren't the key bottleneck capabilities. So, for instance, Humanity's Last Exam gets at mathematical ability, but that's quite separate from various other abilities. It is, of course, a combination of many different skills, but I think it very much gets at quantitative and mathematical ability, and that is not necessarily a bottleneck for agency at all. So in thinking about intelligence, I tend to think about it on ten or so dimensions instead of a monolithic definition or one key metric. And I think some of these benchmarks just get at different parts of that. Those dimensions would be things like fluid intelligence, which is what the ARC stuff and Raven's Progressive Matrices get at. There's crystallized intelligence, or acquired knowledge, which is what MMLU largely gets at: does it know a lot about different things? Image classification, being able to name lots of different species and objects, is also a facet of crystallized intelligence. There's reading and writing ability, which is its own dimension, and scaling substantially helped with that. There's visual processing ability: how well can it count things and objects in images, can it discern the latent pattern in an image, is it able to generate images to precise specifications, can it cross out the middle of some different segments, can it determine the angle of different things in an image, like whether this is an obtuse angle or an acute angle. There's audio processing ability. There's short-term memory, there's long-term memory, there's input processing speed, there's output processing speed. All these different things. If you lack any of these, if you can't read and write, for instance, you'll be severely limited. If you don't have a long-term memory, you'll be severely limited. You'll be very difficult to employ.
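Hendrycks's bottleneck point, that a model saturating one axis (say crystallized knowledge) can still be unemployable if it lacks long-term memory, can be summarized in a couple of lines. The dimension names below follow his list; the scores and the min() rule are the editor's illustration, not measurements or a method from the episode.

```python
# Editor's sketch of the bottleneck argument: usefulness on a job is limited by
# the weakest *required* capability, not by the best benchmark score.

from typing import Dict, List

profile = {                           # illustrative 0-1 scores, not real measurements
    "fluid_intelligence": 0.7,
    "crystallized_knowledge": 0.95,   # the MMLU-style axis that saturates first
    "reading_writing": 0.9,
    "visual_processing": 0.6,
    "audio_processing": 0.6,
    "short_term_memory": 0.8,
    "long_term_memory": 0.2,          # no persistent memory across sessions
}

def usefulness(profile: Dict[str, float], required: List[str]) -> float:
    """A task is gated by its weakest required capability (a min, not a mean)."""
    return min(profile[c] for c in required)

print(usefulness(profile, ["crystallized_knowledge", "reading_writing"]))   # 0.9
print(usefulness(profile, ["crystallized_knowledge", "long_term_memory"]))  # 0.2
# A near-perfect knowledge score doesn't help once a job needs long-term memory.
```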
So I think there are several bottlenecks that these benchmarks don't particularly get at. And so when one gets to 100 percent, people say, well, we still don't have something extremely economically valuable. That's a consequence of it just measuring a different facet. So hopefully that adds some clarity to it. And I think you have to get all those axes to get something that is human-level, or at the level of a typical human on cognitive tasks, and that could be thought of as AGI.

Yeah, I think my concern is that I don't believe it's possible to factorize intelligence, and your factorization is much more sophisticated than many. But certainly the way that animals and humans communicate, for example: we mix modalities together. We gesture and we communicate using symbols, and all of these things are mixed together in a complex way. And in particular, I take issue with this primacy of skill and knowledge in a crystallized sense, because certainly, when you have friends at university and they're really smart, usually they're smart because they don't know something. They're smart because they can figure something out without knowing. The guy who goes to the library and looks up the answer to something, he's not smart; the smart person is the one who didn't know something and could still tell you the answer. Another example that I love to give is you have a couple of artists, and one draws a face using tracing paper, so he's just mindlessly drawing dashes around the edges of a face that someone else has drawn. And then there's the artist who has a deep understanding of the structure of faces and where the mouth should go and where the eyes should go. There's a huge difference between those two artists, right? The second one could go off and create new images and new expressions and representations. So are we creating a cartoon of intelligence by factorizing it in this way?

I mean, I think that for human intelligence, it's sometimes factorized in this way. This isn't to say that one shouldn't study combinations of the skills simultaneously. For example, with long-term memory, there will be different facets of it. For instance, does it remember things visually? Does it remember things that were more academic that it learned some while ago? Are there some motor skills that it forgot? So there can be different facets of these, and I think evaluations can get at combinations of them. But yeah, there is a sense in which one could be too reductionist by looking at those axes.
But I think that some benchmarks don't cover those almost at all. And some are just heavily covering one of those axes, such as MMLU, which is primarily getting at crystallized intelligence, which is basically the type of stuff they test for in school, but is not necessarily going to help it make a PowerPoint or book a flight or work a random job. So it's important not to have these benchmarks be a lens that distorts your view of things, or to view things solely through those benchmarks, because that can often leave out a lot of important bottlenecks.

So, Dan, your work spans alignment and benchmarks and governance, which are actually fairly diverse threads. What is the thing in your mind that connects all of these activities together?

Well, I deliberately try to move into different areas on a continual basis, just because that's what's more interesting. So I initially did research, and then there was some amount of corporate policy and things like that while advising for xAI. Then there was a focus on domestic legislation, and then geopolitics. And I think right now I'm more interested in political movement-related things. So it's largely just to keep things interesting. I mean, not solely; it's guided by being useful, but I think there are often issues that people aren't trying to bring clarity to, or advance, or think about from the perspective of AI being a very big deal. So that's a reason for continually operating at the technical, the corporate policy, the domestic policy, the political, and the geopolitical levels, because I think that is also necessary for having a holistic understanding of things. You can make proclamations. For instance, you could imagine giving a speech at the UN and saying we need AI that is safe or transparent or something. And it's like, well, what does that mean? How do you implement that? Is this implementable? What's the standard? And so then you need to have a sense of what's legally feasible at the legislative level. What's the compatibility with corporate incentives? Is it going to be something they're going to fight too much or not? And then, is this actually a real phenomenon at the AI level, or is this just some vague word? There are many words thrown around that don't actually track phenomena, or aren't distinct from general capabilities, for instance, in machine learning. And then you're not actually pointing to anything real. You're just pointing to a vibe, a vibe-based word.

So one thing that you spend a lot of time thinking about is potential catastrophic risk from AI. And this is a very emotive and morally valenced subject. When I saw you debate with Gary and Daniel the other day, I was struck by how measured you were, almost Obama-esque measured. And the stakes are really, really high. So I guess the question is, how driven are you by your moral compass, and how do you keep that under control?

So you certainly have to get used to this. If you wake up thinking, wow, this is wild, or something like that, every day... I mean, actually, I used to wake up like that almost every day around the time of GPT-4: oh my goodness, this AI stuff. But I think I try to strike more of an informed-concern type of vibe in communicating, compared to "oh my god." But I don't know.
Other people can do that if they want. I'm just temperamentally very low in neuroticism, or high in emotional stability. So if something terrible happens, it doesn't ruin me or something. If something bad happens to me, it's sort of like, OK. So I'm just more comfortable with those types of stress, or just generally.

Has this adapted over time? I mean, have you found that you've had to adopt this measured approach just to scale your efforts, or is it because it's almost become normalized in your mind, because you're thinking about it all the time and it's become more analytical rather than emotional over time?

Certainly, if you were constantly reacting the way you first react to things, you might have more of a Don't Look Up type of situation when it goes on the news. So I think it'd be a combination of those. I think it's just: here are the probabilities, roughly, for people conditioned on thinking that you're getting AGI by 2030; here's what people who think that's plausible would think the risks are; here's your exposure to these tail risks; here are the most efficient ways of reducing those tail risks; and so on. If you're at an 11 emotionally throughout the whole thing, people shut down and get defensive. So I don't think that's prudent or effective, as well as allowing your own emotions to get hijacked. I mean, you're having to deal with a lot of variables here; there are a lot of really tricky trade-offs. So if it's just constant gut reaction and you're fully emotionally involved constantly, I don't think you can make the trade-offs well. Because they directly trade off. U.S.-versus-China competition is a direct trade-off against various other safety things. Things that make AIs more controllable now can give rise to capabilities. Measuring the capabilities of AI systems, or tracking those, can also help speed them up in some ways. So it's pretty tricky business. And if there are black-and-white emotions brought to the subject matter, as opposed to there being continuity, I don't think one can reason through this.

AI alignment is famously difficult; it's one of the most intractable challenges, perhaps, of a generation. And some things that people think of as alignment, like RLHF, for example: I'm sure you would agree with the statement that it's something that makes models behave as if they are aligned, but perhaps it's not really aligning them in the way that we would want. And just emphatically, in the next year, if you could solve a single problem in alignment, what would it be, and what impact would it have?

I think generally the political problems, the incentives, giving people things to do that are incentive-compatible, is where more of the value is at, compared to the technical side. I would guess if there's a way to reliably get them to tell the truth, for instance, or make them reliably honest, that would be very valuable. And it would need to be solved such that it wouldn't have a severe trade-off: it wouldn't be much more expensive to run, and it wouldn't tank performance on other axes, like it wouldn't trade off against its crystallized knowledge, for instance.
But I think that would be very valuable, having it not overtly lie, because then you could build standards around that as well. If you could make them very reliably not lie, then I think it would be reasonable for people to make demands that AIs not lie to them.

On that, because I've read in your papers about this concept of deception and lying and whatnot. And in a sense, I think you might be projecting mentalistic properties onto AI models: that they have beliefs and that they have thinking and so on. Just thinking critically, what makes you think that we can think of them as having beliefs and telling lies?

We could take, for instance, the MASK benchmark, which tries to measure this. I think if you ask an AI, is Paris in Europe, they do have the belief that Paris is in Europe. And when they're telling you that Paris is in Antarctica, I think they are asserting something that they don't hold to be true in almost any other situation. So from that basis, given that they have so much common sense now and that they have so much world knowledge, if they are saying something in substantial contradiction with it due to some prompting pressure, that suggests that they're caving to a lie. You could say that it's not a lie in some complicated sense because they don't truly understand things or whatever. But I think it's behaviorally similar enough that if somebody is applying pressure for it to say falsehoods to other people, it's related to lying enough that I'm comfortable using the label. And is it really a belief? Whatever, I don't know; that's between you and your dictionary.

Another thing you've spoken about a lot is this concept of emergence, and also scaling paradoxes. There's beneficial scaling, when it actually is accurate and does what we want it to do, and of course there's harmful scaling, when it's dishonest and becomes misaligned. And those things happen in quite interesting ways. But maybe let's just start with the emergence thing. To you, what does it mean for capabilities or values to emerge?

For capabilities to emerge, we see it all the time: you let the model train for some months, you harvest it later, and then you see what it's capable of. And if there's a new, qualitatively distinct property that crosses some threshold such that people are noticing it now (maybe it existed in some very weak, faint form before, but it was really unnoticed), then it's crossed some threshold of visibility and capability, and I call that an emergent capability. But it still could exist in some very weak form beforehand. Just as automatic speech recognition capabilities crossed a threshold at one point where people started wanting to use them, whereas earlier I don't think anybody would ask the model to transcribe because it would just be too unreliable. So it crossed some threshold, and now it actually has this qualitatively important capability, whereas it was pretty broken beforehand. So that's the sense in which I'm talking about emergent capabilities, and I think those will just keep increasing, or there will be new emergent capabilities. That will create new failure modes and hazards that need to be dealt with, and we'll need to make sure we're continually on top of them.
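The working definition Hendrycks gives here, a capability that existed in a weak, unnoticed form and then crosses a visibility or usefulness threshold (as with speech recognition), can be stated as a simple test over a capability-versus-scale curve. The curve, thresholds, and function below are the editor's illustration; real curves would come from evaluating checkpoints.

```python
# Editor's sketch of the "emergent capability" definition used in this exchange:
# a metric that was present only in a weak form and then crosses a usefulness
# threshold at some scale. The numbers below are invented for illustration.

from typing import List, Optional

def emergence_point(scales: List[float], scores: List[float], threshold: float) -> Optional[float]:
    """Return the first scale at which the score crosses the threshold, else None."""
    for scale, score in zip(scales, scores):
        if score >= threshold:
            return scale
    return None

scales = [1e20, 1e21, 1e22, 1e23, 1e24]   # training compute (FLOP), illustrative
scores = [0.02, 0.03, 0.05, 0.30, 0.85]   # e.g. transcription accuracy, illustrative
USEFUL = 0.5                              # the point at which people start relying on it

print(emergence_point(scales, scores, USEFUL))  # 1e+24: looks "emergent" at this bar
print(emergence_point(scales, scores, 0.04))    # 1e+22: a lower bar moves the point
                                                # earlier, consistent with the capability
                                                # existing faintly before it was noticed
```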
But I then view safety as a continual battle, where there will be constant new issues and us keeping on top of those. And I don't think by default we'll have enough adaptive capacity to deal with those in time for things being deployed, unless something changes. So that's why I don't believe in this "solving alignment" thing. There will continually be new issues that crop up; some of them will be easy to put away, others will be much harder, and there'll be new unexpected ones as models become more general and useful and powerful.

So just quickly touching on your utility engineering paper. In this, you used a type of theory from econometrics, utility theory, to detect coherent preferences in LLMs. You found that preference coherence correlates positively with model scale, that models exhibit measurable self-preservation instincts, and that political and demographic biases emerge as coherent utility functions, which is fascinating.

Well, I don't know. These are just troubling signs, and maybe we'll be able to come up with methods that can really counteract these issues. Maybe we can design models to reliably not have self-preservation instincts, or pressures in that direction, even though those come out from scaling somewhat. But I think it's just one of the other very concerning hazards that we need to research and deal with and get ahead of, because fortunately they're not agents yet. So basically almost all this research doesn't particularly matter yet, with the exception of dual-use expert-level advice, because the agents aren't capable. They can't exfiltrate themselves reliably, or really at all. They can't self-sustain. They can't hack by themselves or more autonomously. So this is trying to identify some of these things that could be more of a problem down the line as the models become more capable, and trying to do research to get ahead of that. But if we leave that unaddressed, or if we don't fix it, that's potentially sufficient for a global catastrophe, or I think it would be pretty close to sufficient. If you have some self-preserving AI that's really biased toward itself over people, and it's very capable, I think that would be a problem. That would be kind of a disaster in the making. We have various disasters in the making, though, but hopefully we'll get ahead of that either technically or politically.

But just digging into that a tiny bit: first of all, it was really interesting that political and demographic biases would emerge as coherent utility functions. And I do take umbrage with this word emergence, because I think in the emergence literature there is a little bit more nuance than in how machine learning people use the word. They use it to say, oh, there's just some observer-relative, macroscopically surprising change in something.

In the machine learning literature, in like 2021 or something like that, I used the phrase emergence, emergent capabilities. I believe I may have been the first in the literature to use it. I feel it was later used by Jason Wei, or Jacob Steinhardt, my advisor, did in a blog post after that. And then Jason Wei did that in his paper. But I feel comfortable using it. Yes.

Yeah, I was thinking of the systems literature. I was thinking of Jason Wei.
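Going back to the utility engineering result mentioned a moment ago (coherent preferences emerging with scale): coherence there is an empirical property of a model's pairwise choices, such as not containing preference cycles. Below is a minimal sketch of checking that one property; the elicitation format, option names, and choices are invented by the editor and are not the paper's actual protocol or utility-fitting procedure.

```python
# Editor's sketch of one operational meaning of "coherent preferences": pairwise
# choices elicited from a model that contain no intransitive cycle (A > B > C > A).
# The choices below are made up for illustration.

from itertools import permutations
from typing import Dict, List, Tuple

def has_cycle(prefers: Dict[Tuple[str, str], bool], options: List[str]) -> bool:
    """True if some triple of options forms an intransitive cycle."""
    for a, b, c in permutations(options, 3):
        if prefers.get((a, b)) and prefers.get((b, c)) and prefers.get((c, a)):
            return True
    return False

options = ["outcome_A", "outcome_B", "outcome_C"]
# prefers[(x, y)] = True means the model picked x over y when asked.
coherent   = {("outcome_A", "outcome_B"): True, ("outcome_B", "outcome_C"): True,
              ("outcome_A", "outcome_C"): True}
incoherent = {("outcome_A", "outcome_B"): True, ("outcome_B", "outcome_C"): True,
              ("outcome_C", "outcome_A"): True}

print(has_cycle(coherent, options))    # False: choices admit a consistent ranking
print(has_cycle(incoherent, options))  # True: no utility function rationalizes this
```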
I mean, actually, David Krakauer has just got a bit of a grumpy piece out where he's kind of saying that these people don't know anything about emergence. He's got quite an interesting take. For him, emergence and even agency are related, in the sense that agency is about a system that is apparently causally disconnected from its surroundings. And equally, for him, emergence is about a system which can autonomously accumulate information through phylogenetic and ontogenetic hacking, so that it can accumulate information by building systems and structures, even like the nervous system, to construct a history of information which persists and accumulates over time. So these complex systems theorists have quite a distinctive and different idea of what emergence is, and they don't really think of these surprising risings of capabilities as being emergence. I mean, they're definitely different definitions of it.

For reference, the paper where we use the phrase emergent capabilities would be Unsolved Problems in ML Safety. But it sounds, at least from your description, like that was specific to complex adaptive systems, and in some ways these systems don't have that much of an adaptivity property, since they don't have it unless they have memory, or unless you're counting the context window as adaptation or something like that. So if they're tethering emergence to necessitating a complex adaptive system, if they're calling some deep learning systems not adaptive, then that would be fair.

Yeah. Isn't that quite interesting, though, because he gave the example of a virus like COVID or something. And he said, ironically, that has more adaptivity than any AI system, possibly even more than humans, because adaptivity is the ability to delete directions and rapidly go in a different direction. And maybe this is just a matter of framing and perspective from our point of view, because there are systems out there which are so inscrutable and alien that we might not even think of them as agentic or intelligent. But they're out there.

Yeah, they're definitely very fit, and they would have needed to adapt to make up such a high proportion of our DNA, since some of it is interleaved with it; historical viruses account for some of it. Yeah, you could call it something else: new capability, qualitatively distinct... I don't want to use the word spontaneous or something like that, necessarily. I don't know; maybe there'd be some other name that would catch on. But I think generally analogizing, or pointing out the relations between deep learning systems and complex systems, is fairly productive. So I have a chapter where I'm just relating, for some pages, maybe 30 pages or something like that, AI systems to complex systems: the ways in which they have these nonlinearities and weak connections and, in some cases, feedback loops, many of these hallmarks of complex adaptive systems or just complex systems. And I think that's a more productive analogy than almost anything else. I don't think the printing press is as productive. I don't think social media is as productive, or just saying it's like electricity. Complex systems theory tries to abstract what the consistent properties of complex systems are.
And then if you learn about that, you can just apply those properties directly to AI. So people acting like it's not a complex adaptive system, I think that gets them in trouble, because then they engage in category errors. They think that you can solve problems with it once and for all, and that usually doesn't happen with complex systems, because they keep evolving and they've got new failure modes. You can't totally control them for all time without knowing what they'll evolve into. It also makes mechanistic attempts at understanding things less likely to be productive, or it limits how productive those can be.

But do I infer from what you're saying that we shouldn't think of AI as being a similar type of adaptive complex system? I mean, do you think that if AI was sufficiently enmeshed, and I know you believe it will be deeply enmeshed in human society, it could have some of these highly adaptive properties?

Yeah, I think it would be a lot faster too, its clock rate. So I think when it has memory, it will be a lot clearer that it's causally connected across time, or, as the philosophy literature would put it, that it's a space-time worm. And that's not really a property of them currently. So that would make other sorts of properties of it come into alignment, or be the case, and the analogies would be stronger. But yeah, if people are interested in complex systems, I highly suggest it. It's a nice little thinking upgrade.

Wonderful. Well, Dan, I've just read your Superintelligence Strategy. Now, you wrote this with Eric Schmidt, the famous Eric Schmidt, and, of course, Alexandr Wang of Scale AI, who's now at Meta because Zuck just brought him on, probably paying him lots and lots and lots of money. But seriously, Dan, I thought this was very well written, and you are a strategist, because this is kind of what I was saying about the emotional thing: you quite clearly designate all of the possible outcomes and strategies, and what would happen in this situation and what would happen in that situation. So regardless of anyone's position at home, I do highly recommend you read this, because I thought it was really, really good. But could you give us a sketch of the paper?

Yeah, so I guess historically there was Situational Awareness by Leopold Aschenbrenner, which was arguing for something like a Manhattan Project for developing AGI and superintelligence before China. So it's basically a "let's beat China to the punch, get superintelligence, prevent them from building it, and then the West will dominate the world" strategy. You could say a take-over-the-world strategy, something like that. And I think that has some issues in particular. It just doesn't think through the game theory or some of the second-order consequences. So if the U.S. does a Manhattan Project, let's say Trump gets AGI-pilled: we've got to set a new project up in the desert, we're going to go to Nevada or New Mexico or wherever, we'll build a trillion-dollar data center there, and we're going to bring in some of the top talent from all these labs and pay them. I just think this has many issues. One is that this would be extremely escalatory. So China wouldn't just be like, oh, they're going to build superintelligence, and, as written, they're going to use it to... they'll have a superintelligence, and they'll prevent us from having a superintelligence.
They'll have a monopoly on intelligence and these sorts of capabilities, and they could weaponize it against us. Oh, carry on. They would feel extremely threatened by that, by a very concerted effort, if it's trying to do that in a short amount of time or if it's more plausibly on the horizon. This would cause them to do a similar type of project. And would this actually work? Well, you have information leakage issues, for instance. So if you're wanting to do that, you're going to need to convince those AI developers to go out and do their last years of labor in the middle of nowhere. OK, it would need to just be Five Eyes countries, people who can get security clearances. So it needs to be people who are not easily extortable, for instance. If they're Chinese nationals, they're probably more extortable because they often have family at home. So what are you doing with that talent? A lot of them want to be plausibly in the room where it happened. They don't want to be left out, so they'll probably go back home to China, and then they'll work on the competing project there. Now, they're a substantial portion of the talent base. So I think you're shooting yourself in the foot if you're just saying, oh, it's only people born and raised in the U.S. who can work on this sort of project, and they're all going to work in kind of unpleasant conditions. This wouldn't be secret. There's almost no way this would be secret. China would very likely know. Such projects would be sabotageable as well. And if you're saying, well, we'll have it just be in industry or something like that, well, then you're not going to have good information security. You're going to have insider threat issues, and people are extortable. You're going to have other classic computer security issues: they're using Slack, and Slack is very easily hackable; they're using iPhones, and iPhones are very easily hackable. So you can know what's going on there. So you're not actually having much in the way of secrets. So it sounds nice, but I think secrecy was very much an advantage for the Manhattan Project, as well as having much more of the talent that can't go to other countries as easily. I just don't think you have that. So there are ways in which AI is analogous to nuclear weapons and chemical weapons and biological weapons and some of these dual-use technologies, but I don't think the Manhattan Project is one of those things that's analogous. So what this paper then asks is, well, what is the strategy? I think the prospect of a superintelligence being imminent is extremely frightening to different actors. If it's imminent, or if they have it, either way, or if it's in the middle of being developed and it's arriving in a few months, that's extremely frightening if you miss out on it. So what do they want to do? They will either want to prevent such projects or they will want to steal it. And prevention looks like sabotage, for instance. So how would they do that? Well, they may have some insider threats who could do some type of sabotage to disrupt this type of project. They could do things like, say, snipe some of the power plants corresponding to the data center. Now your data centers don't work. They can do that from some miles away. Was it China? Was it Russia? Was it a U.S. citizen? It's fairly unclear.
There are a lot of ways they can have low attributability to prevent this sort of thing from happening. So I think the fact that you can't do a secret project really well is a substantial barrier, and then the sabotageability is a substantial barrier as well, as is how offensive and nuts you seem if you're saying we're going to build superintelligence and it's going to be explosive, if you're using superintelligence in a fixed sense. I think this would be destabilizing. China would reason: if the U.S. controls it, then they could weaponize it against us and we get crushed. Or they don't control it, because they lose control of it in this process, in which case we also want to prevent it. Either way, we want to prevent it, provided that they take this AI stuff seriously. And the U.S. would reason the same about China. And Russia, which doesn't have a hope of competing, would definitely be wanting to prevent either of them. And I think similarly for other nuclear states and other states that have substantial cyber capabilities. So this could lead to some type of deterrence dynamic where one side makes some attempt to get closer to superintelligence, but then other countries start to express very strong preferences against it. They say, if you do that, we'll get very mad. There might be a skirmish or something like that. But then this may be something that pressures them to move more toward a verification regime where they aren't trying to make some bid for an intelligence explosion, having AIs do automated AI research really quickly, like spinning up 100,000 AI instances to do AI research, such that it brings you from AGI to superintelligence in a short period of time. So I think that's a key dynamic. The extent to which it's destabilizing, I think the strategy needs to keep that in mind. So there may be cooperation, but it may be through coercion, by saying we're not going to allow for this type of trajectory, or for you to make this bid for global dominance. And that could give way to something more multilateral and provide some strategic stability. So overall with the paper, we talk about three parts. In the nuclear era, we had deterrence through mutual assured destruction: they don't use nukes because we can hit them back. In this case, it's more like preventing Iran's nuclear program, in the sense that nobody wants the other to get the nuclear bomb first, or a huge stockpile of nuclear bombs first; there's preventing that from coming into existence. In the nuclear era, we also had nonproliferation of fissile materials. We didn't want fissile materials being spread to rogue actors, and we didn't want people having a poor man's atom bomb. That would be very destabilizing and cause lots of catastrophes. And then we also had containment of the Soviet Union in the geopolitical competition between the two. For AI, we also have a deterrence element. We also have nonproliferation, in this case of AI chips, to rogue actors like North Korea or Iran, or to adversaries, through export controls. And we also have competitiveness with China. Instead of containment of the Soviet Union, this would be competition with China. And how do we improve our competitiveness? Well, we want energy for AI data centers. We want secure supply chains so that if Taiwan is invaded, our AI chips aren't cut off.
We want secure supply chains for robotics, because if there is a U.S.-China conflict, a lot of that supply chain is currently in China, so we're very vulnerable. Those are some basic things to improve competitiveness. So it's making competitiveness not be "let's be the first to build superintelligence," which is what the Manhattan Project strategy pushes toward. Instead, the competition is more about market share across the globe, people using your AIs as opposed to Chinese AIs, and your supply chain security. So that's kind of what's in the paper at a high level. There are lots of other specific things in there: assuming high levels of automation, what are ways that you distribute power? What about AI rights? What are reasonable alignment targets that are actually implementable, compared to vague philosophical words like dignity or something like that? We touch on a lot of those in the expert version of the superintelligence strategy. But hopefully that gives some sense of its content. So for all the key questions, we try to have some answer to what to do about AI and what to do about superintelligence.

Yeah, I guess one of the main things is this analogies thing. So as you've said, we use analogies like electricity. And I think that's quite a good one, actually, because as AI becomes enmeshed in society, imagine how hard it would be to shut down a power station. And this is part of the loss-of-control thing: it's just going to be everywhere, and it's not really something that we can quickly shut down. But it's also been compared to software, or even an operating system by Andrej Karpathy, or the printing press, for example. And the thrust of your paper is saying, actually, guys, we need to use the analogy of nuclear, right? Fissile material is analogous to chips.

Yeah, or more broadly, for analyzing this in a geopolitical way, it's useful to model this as nuclear, chem, and bio. They're all dual-use. Fissile materials can be used for nuclear weapons, or you can use nuclear technology for energy. Chemistry can be used for chemical weapons or for chemicals in the economy. And biology as well: it can be bioweapons, and it can help you with health care. So for all of those, I think they're potentially catastrophic dual-use technologies, and when talking about geopolitical strategy, I think that's a productive analogy.

Yes, but on the dual-use thing, there's another interesting analogy with nuclear. I was doing some reading about this, and apparently there are 12,500 warheads in existence and only 436 nuclear power plants.
So there's been an explosion of the technology that's been biased towards its negative side. Do you think we'll see a similar thing with AI?

You could have had a much smaller nuclear stockpile, or at least, certainly more of the spending went to the weapons side, and I think that made it scary and created some chilling effects for using it economically. For other WMDs, or potentially catastrophic dual-use technologies like chem and bio, economic use overwhelmingly outweighs use for chemical weapons, and likewise for bio. So it can vary. I think it's useful to look at all three simultaneously when trying to make predictions and see what features are shared, sort of like how complex-systems researchers look at lots of different complex systems and ask what features they share. The sample size from looking at all three of those simultaneously can be helpful. But yes, it's possible there would be a chilling effect if there's some catastrophe from AI systems; that could set it back very substantially. I think it's imprudent that people aren't interested in risk management whatsoever, even if they're accelerationists. Say you're a libertarian, for instance, and you want the economy to go as quickly as possible, not even speaking about AI. You probably want some type of financial regulation, or else you get the Wall Street sort of issue where we had a recession in 2009. So you want some management of your tail risks; they don't always sort themselves out. Or take airplanes: people don't use supersonic airplanes much. There's a variety of reasons, but part of it is that some of the initial ones crashed too much. If we didn't have good airline regulation, that would create substantial chilling effects. People are afraid to fly even now, even though airplanes are extremely safe as it happens, possibly because historically there were more disasters. A lot of that regulation was written in blood as opposed to being proactive.

On that subject, as it happens, I'm interviewing Beff, the e/acc leader, on Friday.

Oh, yes.

And obviously I'm certainly not a, what would he call himself, techno-capitalist or libertarian or something like that. But I guess he would refer to you as a decel, which is a pejorative term. If you could steelman that perspective, what do you think Beff would say, and how would you respond?

Well, some VC was arranging for us to debate when he had around 10K followers, so very early in the day, and then he backed out at the last minute. So I happen to be quite aware of his positions. I think at the time he was a little less politically savvy, so he was saying things like: AI replacing humans is fine; if AI consciousness spreads to the universe and human consciousness doesn't, that's fine. I think we actually agree on most things. There's basically a difference on how things play out, or the ways things can play out. He has a sort of manifesto: the techno-capital machine has a direction to it, which is basically more automation, more AI, negentropy, which is the more physics-flavored version of fitness. I think fitness is a more productive word for that.
And so I have a fairly similar description of what happens, which is: basically, due to competitive pressures, AI gets more intertwined in the economy. You become more dependent on it. You have an erosion of control. You outsource more and more decision-making to it because of competitive pressures; if you don't, you lose influence, your company goes away if you try to resist this tide, this tsunami. And what happens is you give more and more decision-making to the AIs, and they have effective control. We actually agree there, I think. That paper of mine is called Natural Selection Favors AIs Over Humans, and he has his manifesto on his Substack, but we're seeing some similar things. A difference is the moral conclusion, though: I don't think that's a good thing, whereas I think he thinks it would be fine, because complexity is good or something like that. The ethic is roughly that higher forms of complexity are the goodness axis of the universe. I just don't think so.

Well, I hosted the debate with him and Connor, and one of the things they spent a lot of time on was the ethics discussion. I think Beff was alluding to, as you were just saying, the idea that we should trust the void god of entropy.

So, yeah, he's just using the entropy framing, though; I think this is because he has a physics background, a physics spin on fitness. So what does a fitness maximizer look like? It's possibly not even conscious, or barely conscious. It just spreads itself through the universe and takes up as much space-time volume as possible. And then the claim is that that is what's maximally valuable, something that's just blindly eating the galaxy. That doesn't seem obvious to me as a maximization of value at all. I think humans having positive experiences, pleasure, happiness, pursuing projects, raising kids, these sorts of things are valuable. And I don't think a blob that expands itself throughout the galaxy as quickly as possible, and that isn't conscious or is barely conscious because consciousness eats up resources that could be used for further self-propagation, is the peak of value at all. So the trust in it, I mean, why? I don't get it. Hume's guillotine is the is-ought distinction: you don't get ought from is. If he's saying that it is the case that evolution is a thing, and that a technological substrate with AI will be more fit than a biological substrate in various competitions, that seems true. That doesn't mean it's a good thing, or that we should just let it happen; that doesn't follow. But it is certainly a very powerful force that will keep operating, give more and more control to these AI systems, and lead to an erosion of control for humanity by default. So I don't think we disagree much on the description, on the is question. But I do think we disagree on the ought, the goodness of those outcomes. And if we disagree there, then the question is: do we lean into the techno-capital machine, or evolution, or the replacement of biological life with digital life? Or do we try to steer the outcome differently, make sure that humans have control in that process, and try to prevent it from evolving in particular directions or developing too much dependence.
So I think that's the key difference. But I don't think it's really a moral disagreement; I think it's just an intellectual confusion. He has physics training; I think if he took a bit of philosophy, he'd probably get beaten out of this position almost instantly. Because complexity, for instance: what type of complexity? There are different types. Which is he actually vibing with here? There's computational complexity. There's entropy in a Shannon sense, information-theoretic complexity. There's structural, organizational complexity. So there are different notions, and if I point at one, say Shannon complexity and Shannon entropy, then, okay, Gaussian noise is what you're really into. He probably means more of the fractal, structural-complexity notion, but that doesn't really have metrics associated with it, there are many different flavors of it, and it's not clear how coherent a concept it is versus a grab bag of different notions that don't fit in the other two. But anyway, it's worth drilling down on what he actually thinks is good.

Yeah, I think he's a fan of this kind of Fristonian non-equilibrium steady state. So apparently it's not a simple case of the second law of thermodynamics in a closed system; it's an open system with boundaries, where you see the emergence of these things that share information through synchrony, because they can't physically merge into each other. But if you think about it, that's actually a very chaotic, unpredictable thing, so it's not a simple story of something that just increases in complexity. But just to be clear on your moral position: when I spoke with Eliezer and Connor, I got the impression they were quite humanistic. Eliezer said, I want to preserve human consciousness and experience, there's something very special about humanity, and therefore I don't want us to be replaced by cyborgs and machines and AI algorithms. Would you roughly agree with that?

I think that with any of these cyborg types of things, you can postpone those discussions. Superintelligence strategy isn't about what to do with AI rights or some of this post-humanism stuff. You can have discussions about substantial human augmentation and things like that at a different time; I think having humanity survive the next few decades is more the objective. Maybe you postpone that discussion to 500 years from now or something, for cyborg humans or human uploads or whatever. So I'd be on team human here. I don't like shutting down debates entirely, but I would postpone a lot of this, for instance the post-human stuff. And I'm flailing about and speaking somewhat imprecisely just because I haven't spent as much time thinking about this in particular. But the post-human stuff, I think, would create some very substantial competitive pressures, so that if you're really augmenting yourself and becoming not human anymore, the groups that do that would become much more influential and much more powerful, and the rest would be really outgunned and not have influence.
So they'd basically need to align with that process, or they'd be left behind, or potentially no longer have resources allocated to them, because they wouldn't have any way of protecting themselves. So I think that's a route it would be very reasonable to close off for an extremely long time while we're just getting used to having AIs do our bidding, provided we survive to that point. Those are totally different discussions for a much later time, and maybe we would just keep indefinitely postponing them. But the post-human stuff, letting people become cyborgs and things like that if they want, is, I think, in the long term equivalent to giving AIs rights, giving artificial entities rights. And that would probably give them a lot of power and the ability to take over and completely outgun humans in short order.

But what do you think is going to happen to humans? One of my greatest fears is not so much that humans will lose their ability to think and be creative; it's that it's already happening, basically, even with current AI. And a core thesis of your paper is that we used to have labor as the means of production, so it could be economically valuable for people to use their labor to do some productive task, and now you're saying that actually it's AI chips. Chips are going to become the thing that...

Everybody's, yeah.

Sure, sure. But what happens to us when the value of our labor becomes worthless?

Well, you lose all your bargaining power, so you had better bargain beforehand; labor is a key part of one's bargaining power. You can't say we're going to go on strike; that doesn't work anymore, they'll say goodbye. And if there's a question of weapons, whoever can manufacture more drones is going to be more powerful. Humans with guns versus drones, I think that's an easy one. So the power imbalance matters quite a bit, and how you set up your society is important. If you set things up so that humanity is put first, so that people are prioritized and power is distributed among them, for instance they hold some of the compute, get to decide how it's used, and can sell it, that gives them some leverage. What happens to the wealth that's generated? Are they getting it, or is it going to some group that's just hoarding it, or to whoever happens to own the data centers in the year 2027, who then get all the spoils? These are political problems people need to be engaged with to make sure there's reasonable benefit sharing. But there's very little work done in thinking about what these policies could actually look like. So, rather than just gesturing, imagine that some of that power is distributed, and people keep getting money and don't starve. Then you could imagine a society where people can choose to live their lives in a variety of ways. They could spend their time on different types of activities, raise kids, play video games a lot, this and that. There are different ways people could live their lives, a multiplicity of values could be actualized, and AIs could be enabling
these types of experiences. So that's a possibility. And you would probably want, as a societal norm, for the sake of autonomy and of people being able to experience these different ways of living, to make sure that people still have skills and don't narrowly fall into one track of living and become unable to participate in any of the others. That would be an incentive for preserving human cognitive abilities and willpower and autonomy. So I think there are positive future outcomes. People have difficulty imagining anything other than we either die, or we're blissed out in a VR thing, or we all fall through the cracks economically. But I think there's a path where we obtain a multiplicity of values and people still have autonomy as well.

Yes. I do worry about AI. I think it's already ravaging the university sector, because so many kids can just use ChatGPT. And I think collectively we need to screen people away from using AI, at least for a small amount of time, so that they can actually think for themselves. Some accelerationists might just say, well, we don't need to think anymore because the machine's going to do everything for us. I'm not sure about that. But coming back to your paper a little bit: one of the core concepts, extending the analogy to nuclear weapons where we have mutually assured destruction, is your notion of mutually assured AI malfunction. And I guess this assumes that the threats are still detectable; I'm not sure what would happen if they became decentralized and went underground. But also, in your introduction you had something which piqued my interest. Eliezer got in trouble in Time magazine when he spoke about bombing data centers; there was a big hoo-ha at the time. And you used the phrase kinetic strikes as a form of AI sabotage. George Carlin would have loved that sanitization of the language. But the fact of the matter is we're here in 2025, and this is now a completely normal and reasonable thing to say.

So we discussed kinetic attacks in the escalation ladder. There are many ways to try to disrupt projects. You could do cyberattacks on them. You could do some gray sabotage, cutting wires for data centers or power plants, for instance, with lower attributability; I mentioned, for instance, sniping transformers. There's hacking to poison their data or make their GPUs not function as reliably, things like that, to slow them down.
There are covert and overt options, and there are higher rungs on the escalation ladder where you threaten other things, like economic sanctions or the use of force, or there are other forms of kinetic attack such as airstrikes. But I don't think those are really necessary. That would be an escalation ladder, but if states are on top of this, such as the U.S., they don't need to resort to that; they can take much more surgical, covert or gray, low-attributability actions that are less escalatory. So the shorthand of airstrikes, I just don't see that as necessary, provided there's some preparation.

Yeah, and of course you spoke about many strategies. One potential strategy is the Manhattan Project, where the U.S. achieves dominance. But then in a sense the U.S. has a target on its back, right, because it's developed this incredible capability. And what then? Would we need to move our data centers all over the place and hide them away from population centers and so on?

Yeah. I mean, I think if one has short timelines, it just isn't in the cards to do a big secretive project in a way that isn't extremely escalatory. Because if you wake up China with this, then, for one thing, they would also potentially create some sort of Manhattan Project, and they might be better at it; they would probably have better information security for it, and they'd at least have a big chunk of the talent, whether it would be the majority I don't know. And if we're requiring that researchers have security clearances, that limits your pool of researchers substantially. Those researchers aren't being paid well, and they're also going off to the middle of nowhere, to some place with a big data center that has a big target on it, because it's extremely visible from space; it gives off so much heat. It's not like this would be a secret.

Yeah, you're putting yourself in harm's way as well. That doesn't sound like an attractive proposition.

So I think it's a bit self-defeating. And in the past months, partly as a consequence of the paper I think, this isn't much of an idea being thrown around in D.C., fortunately. And people like Sacks and others are speaking about competition for market share, U.S. market share, which I think is a more reasonable object to compete on and a much less destabilizing one.

Yeah. Another thing that struck me is that, essentially, right now AI is not that difficult to make, right? The algorithms are just doing stochastic gradient descent, they're using transformers and data, and almost anyone, I mean any nation state, would be able to create this capability. In one of your papers you even said, I think, that there was a 96% correlation between the amount of compute and the capabilities. So how on earth can you, I mean, obviously you can control supply chains and whatnot, but what's to stop any nation state from just building this?

Yeah, so I think the critical mass currently, if you're trying to build a state-of-the-art system, is on the order of 10K GPUs. China has that. The U.S. has that. I don't think Iran has that, for instance. We're talking cutting-edge GPUs.
We're not talking iPhone GPUs or whatever. So I think you would try to have the more responsible actors, states that respond to incentives better, be the ones with GPUs, and the more rogue ones like North Korea not getting them. You want the holders to be more deterrable. I don't think Russia has that many GPUs, for instance; I think it would be difficult for them to put together a competitive project. I think also the competition is not just about having the smartest model. There are deployment capabilities, not just model-making capabilities. You can see that the AI model providers limit the number of videos and images you can make, and that's partly because of compute limitations. And with AI agents, they'll be running around the clock if they're sufficiently useful. Right now maybe you use your AI systems, your chatbots, for a few minutes a day, but then you'd have them running constantly. So that's, I don't know, orders of magnitude more compute required there, and maybe the models are bigger as well. That's a lot, and then more people are wanting it too; you're needing a lot more compute. So your deployment capability, how many chips are owned by, say, U.S. hyperscaler companies, Azure, AWS, et cetera, is a very relevant competitiveness variable, because it determines whether they can serve the customers or not. If China has less than 100,000 GPUs, they can't really serve that many customers. So even if they can make somewhat capable models, that doesn't mean they'll necessarily capture many of the economic benefits that AI may provide, provided there isn't a catastrophe. So that's a different, important axis for competition. People think having the smartest model is the most important thing, but for economic power it's quite different.

But I think you said that when the real Manhattan Project was undertaken, around the time of the Second World War, it cost the U.S. something like four percent of GDP, because they had to be first, right? They had to control this technology. And for such a generational technology, four percent of GDP doesn't seem that much. Of course you can explain perhaps why Taiwan has such a moat around building these chips at the moment, but if it is of such catastrophic importance, don't you think many nation states would be able to create this capability?

Well, it goes through TSMC or South Korea. So 90-plus percent of the value add of the compute supply chain is in the West and its allies. The only other real competitor here would be China, and they're not that competitive in manufacturing these chips at the cutting edge; many of the recent chips they're using were obtained surreptitiously through TSMC, so there wasn't good enforcement or blocking there. It's pretty difficult to replicate what is an extraordinarily complex supply chain entirely domestically. Compared to nuclear weapons, I think it's harder to make cutting-edge GPUs given a billion dollars; you can't do it with a billion dollars, and if it's ten billion, you still can't do it. You could probably do a nuke with a billion dollars, provided you have the resources in other ways. So I think cutting-edge GPUs are harder to make than nukes, or than enriching uranium.
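As a rough illustration of the deployment-compute point above (chatbots used a few minutes a day versus agents running around the clock), here is a minimal back-of-envelope sketch. Every number in it is an assumed placeholder for illustration, not a figure from the conversation or the paper:

```python
# Back-of-envelope sketch of why always-on agents need far more inference
# compute than occasional chatbot use. All numbers are illustrative assumptions.

chatbot_minutes_per_day = 10          # assume a user chats ~10 minutes per day
agent_minutes_per_day = 24 * 60       # an always-on agent runs around the clock

usage_ratio = agent_minutes_per_day / chatbot_minutes_per_day
print(f"Always-on agent vs. casual chatbot: {usage_ratio:.0f}x more inference time")

# If the agent also uses a model that costs, say, 5x more compute per token
# and generates 3x more tokens per minute (again, assumed numbers):
model_cost_multiplier = 5
token_rate_multiplier = 3
total_ratio = usage_ratio * model_cost_multiplier * token_rate_multiplier
print(f"Combined: roughly {total_ratio:.0f}x, i.e. about {len(str(int(total_ratio))) - 1} orders of magnitude")
```

Under these assumed inputs the gap comes out to a few orders of magnitude, which is the shape of the claim being made about deployment capacity mattering as much as model quality.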
So I think it can be more excludable compared to other types of potentially catastrophic dual-use technology inputs, or WMD inputs for short.

So I'm personally a little bit skeptical about whether we are on the path to creating superintelligence, although I certainly agree that if we ever did create superintelligence... well, everything you've written in the paper, assuming we do create superintelligence, I think is absolutely spot on. The question is whether we are on that path. But I was also struck by the thought that many of the things you've written about in the paper apply even if we don't create superintelligence. Could you reflect on that? How much of it is relevant if we don't?

Yeah, so part of the deterrence component is that you may try to deter other destabilizing forms of use. Competitiveness, and what the strategies for competitiveness are, is relevant regardless of whether superintelligence is technologically feasible soon or not. Generally, if AI is very powerful but not at superintelligence level, you still don't want random rogue actors having access to certain expert-level virology capabilities, for instance, nor do you want them gaining much leverage by acquiring lots of GPUs if those become more of an instrument of power. So those still hold. And the deterrence part is not even specific to superintelligence necessarily, but applies to other destabilizing capabilities that AIs could give rise to. You may also get deterrence later on around not using AIs for specific types of weapons research, say more nanomachine-related, but now we're speaking much farther out. So I think it is broader than that, in much the same way that the nuclear strategy of deterrence, nonproliferation, and containment was robust to many of the details that kept evolving.
Yeah, I'd be interested in that. Are you thinking it will be tough to get AI that has the cognitive abilities of a typical human this decade? What makes you think it's not as feasible, or that we're not on the right track?

Yeah, I mean, I feel that scaling large language models will not lead to AGI. I think there are quite a few things we have cognitively that LLMs don't have, and a lot of that is, well, I'm an externalist: I think a lot of the effective computation doesn't happen in our brains; it happens memetically, it happens culturally. I guess I believe in principle that we could simulate the entire thing in a computer, and maybe there is a lower-resolution, abstracted version that would capture enough of the dynamics to produce intelligence. But roughly speaking, I'm skeptical about superintelligence, and particularly skeptical about recursively improving superintelligence. Maybe we could touch on that, because I felt you gave a great account of what recursive superintelligence would look like, and also the scaled-out version of that, what would happen when we have a superintelligence that could be copied and multiplied a thousand times. But what I didn't really get from reading it was why you believed it was technically possible, in principle, to have a recursively improving intelligence.

I think we already have AIs helping or influencing AI development in a recursive way: partly automating some code, helping design the chips, helping cool the data centers, helping label some of the data, doing the constitutional-AI-related type of stuff. These are weak ways, though. The recursion that I think is particularly explosive is if you can close the loop by taking the human out of it; then you can go from human speed to full machine speed, and you don't have that impediment anymore. If you assume you have human-level AI researchers, world-class AI-research AIs, then you just copy-paste those. I think that will be technologically feasible. That isn't to say it's a natural implication of training the AI on more pre-training tokens and doing a loss-function trick; you may need some extra algorithmic ideas. I would still guess it ends up nearly entirely deep learning, though. But I think you'd need some other things taken care of, for instance memory, for this externalist picture: it needs memory to inherit some of that culturally computed wisdom and information, and that capability is not particularly developed in my view. So I think we need that.
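To make the "closing the loop" idea concrete, here is a deliberately simplified toy in Python. It only illustrates the structure Hendrycks describes (many parallel copies of an AI researcher, with no human in the inner cycle); the functions and numbers are hypothetical placeholders, not any real system:

```python
import random

# Conceptual toy only: the point is the shape of the loop (many parallel copies,
# machine-speed iteration, the best ideas folded back into the next generation),
# not a real AI-research pipeline. All names and numbers are illustrative.

def propose_experiment(skill: float) -> float:
    """A 'researcher' copy proposes an idea whose quality scales with its current skill."""
    return skill + random.gauss(0, 0.1)

def automated_research_loop(skill: float = 1.0, n_copies: int = 1000, generations: int = 5) -> float:
    for gen in range(generations):
        # Copy-paste the AI researcher and let every copy propose an improvement.
        proposals = [propose_experiment(skill) for _ in range(n_copies)]
        # The best idea becomes the next-generation system: the researchers
        # improve the very thing that does the researching.
        skill = max(proposals)
        print(f"generation {gen}: researcher skill ~ {skill:.3f}")
    return skill

if __name__ == "__main__":
    automated_research_loop()
```

The toy compounds a little each generation; the argument in the conversation is about what happens when the real version of this loop runs at machine speed rather than being bottlenecked on human researchers.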
Maybe we'll need some other things before it has at least the cognitive abilities of a typical human. And then after that you need it to be pretty smart: you need high fluid intelligence, you need it to be crushing those ARC questions, for example, and you'll need some other things for it to be a human-level AI researcher. But I think that's feasible. There's certainly a question of when; I do think there are bottlenecks that would need to be resolved to get there, and it's not the case that today's algorithmic ideas plus a bigger computer are sufficient.

Would you accept, though, that there is just an epic, pyrotechnic orchestra of computation in the universe? I'm not a pancomputationalist, so I don't think the universe is digital and made out of computation; I'm saying we could imagine some kind of effective computation that simulated the processes that happen in the universe, and maybe that would be equivalent. But do you agree, at least in principle, that the amount of computation we could build on planet Earth would only ever be a sliver of what goes on in the universe?

I think there might be physics reasons for believing that generally; if you're simulating something versus the computation happening raw, you'll have less that you can simulate. But I would guess that a lot of the relevant computation is more social and less dependent on the underlying physics: it's more like humans speaking with each other, trying to engineer something, seeing what works and what doesn't. That certainly requires real-world feedback, which should at some point bottom out in actual physics. But yes, I agree that a lot of the information is collectively developed. And given how much computation goes into that optimization process, or I should say evolutionary process, you'll need the AIs to be a good receptacle of it, to keep absorbing it. And if AIs are to do anything similar, they need multi-agent infrastructure, which is one thing we speak about briefly in the superintelligence strategy: what does that look like, what are the reputational mechanisms, what are ways they can establish trust in their communication, so that humans can trust them and so that they can coordinate with each other. I think that will be essential. But I don't view it as a substantial obstacle; it feels like programming, like having hubs, the AIs having a social media site, for instance. Things like that would take care of a lot of it.

Yeah. I guess another source of my skepticism: I'm hugely inspired by Kenneth Stanley, and he's a big open-endedness researcher. He had a paper out talking about what he called fractured entangled representation. When you dig into the representations of neural network models, they don't really factorize the world in a parsimonious way, the way we do. And maybe I'm being anthropocentric here, maybe I'm hanging too much weight on the way we think about things.
Our brain is made out of spaghetti, basically, so maybe it's a bit of an illusion that we have these factored representations. But certainly I think from an agency point of view this is important, because right now we talk about agents as LLMs wired into an autonomous loop that can use tools, and to me agency is more than autonomy heading in a predefined direction; it's the ability to set your own direction. And what happens now when we build these quote-unquote agentic AIs is that when they set their own direction, they don't do anything particularly valuable, and they require constant supervision, certainly in terms of setting a new direction. That makes me think of AI as a kind of cultural technology, a bit like Photoshop: a very creative graphic designer can use Photoshop and make beautiful images, whereas a complete noob would just reuse the same effects and wouldn't create anything beautiful. And in a sense AI sans humans is kind of like that, I think, because it doesn't have these very deep, factored representations of the world. Would you agree with that?

I think for the current technology, yes, it has all those sorts of limitations. But for it to be agentic, I think it will need to get better at planning and maintaining state across long periods of time. Then it can pursue sub-goals of a vague, open-ended, underspecified goal and have that add up to something. But retrieving memories of what worked and what didn't, and storing those, is, I think, a substantial chunk, if not most, of what's missing from that agent picture. You could certainly give them an underspecified goal now; I just don't think they could pursue it terribly coherently, or learn from experiments, because they have a short-term memory of maybe a million tokens. They'll keep summarizing it, but they'll start tripping over themselves because they can't maintain all of that in their context window.

Yes. It's another one of those things where philosophically I agree with you. If such a recursively improving intelligence existed, God knows how we'd control it; we would lose control, because we would have to use another recursively improving superintelligence to control the first one, and then we would basically just be minnows in the grand scheme of things.

Yeah, it's destabilizing. If another state does this, you're in big trouble, because if they control it, they can weaponize it against you. And if they don't control it, which I think would be the more likely outcome, because they'd be doing it under extreme time pressure, cutting a lot of corners, they would be operating with a very high risk tolerance. If they were doing it very slowly, they would probably need to coordinate with others, or else they're not seeing an edge in doing it. So they would be operating with an extremely high risk tolerance if they're doing a fully automated R&D loop. So, yeah, I think loss-of-control risks from recursion are very high, and it shouldn't be pursued. And I think it's very interesting that companies talk about this sort of stuff openly.
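For readers unfamiliar with the agent picture discussed above (an LLM wired into a loop with tools, plus a limited context that has to be summarized), here is a minimal sketch of that pattern. The `llm()` and `run_tool()` functions are hypothetical stand-ins, not a real API:

```python
# Minimal sketch of the "LLM in an autonomous loop with tools" pattern and the
# rolling-summary memory limitation discussed above. Placeholder functions only.

def llm(prompt: str) -> str:
    """Placeholder for a language-model call."""
    return "FINISH: (placeholder answer)"

def run_tool(action: str) -> str:
    """Placeholder for tool execution (search, code, browser, and so on)."""
    return f"observation for {action!r}"

def agent_loop(goal: str, max_steps: int = 10, context_limit: int = 4000) -> str:
    memory = ""  # short-term context: everything the agent 'remembers'
    for _ in range(max_steps):
        action = llm(f"Goal: {goal}\nMemory: {memory}\nNext action?")
        if action.startswith("FINISH"):
            return action
        observation = run_tool(action)
        memory += f"\n{action} -> {observation}"
        # When the context window fills up, the agent can only keep a lossy
        # summary of its own history, which is where it starts tripping over
        # itself on long, open-ended tasks.
        if len(memory) > context_limit:
            memory = llm(f"Summarize briefly:\n{memory}")
    return "ran out of steps"

if __name__ == "__main__":
    print(agent_loop("organize my research notes"))
```

The design point matches the conversation: the loop supplies autonomy, but long-horizon coherence depends on how well the agent stores and retrieves what worked, which is exactly the piece described as underdeveloped.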
I think there's something wrong with the norms around that, because a lot of them also acknowledge, yeah, we don't really have a plan for how to control that, and we don't really think we will, but that's the plan. I think something's broken there.

Yeah. Another thing I'm interested in is open-ended systems in general. Evolution is this fascinating open-ended system that is constantly creating new niches, new problems and solutions in tandem, but it seems to have converged. I mean, human intelligence, I think, has actually peaked and gone down a little bit. A corporation is a collective intelligence, and that seems to have reached a limit. Agentic forms of AI seem to peak. I'm really interested in open-ended algorithms like POET, the Paired Open-Ended Trailblazer, or even Sakana AI: they did this thing on ARC last week where they basically created a Monte Carlo tree search type of system that switches between expert trajectories of different foundation models generating code and testing that code on ARC challenges. And the common theme you see is convergence: in the Sakana paper, after 250 calls it converged and failed to expand and improve its results. Would you agree? I mean, we can agree there's some margin you can improve into, and we don't know what that margin would be if we had agentic superintelligence, but do you think that margin would converge quite quickly, or do we just not know?

Yeah, I don't know that it would saturate necessarily. Certainly, at some capability level this might affect the rate of improvement substantially, and obviously there has to be a limit because of physics somewhere. But I would imagine there's a lot of room for improving the ability to handle things of higher and higher Kolmogorov complexity, for instance. Some of those sequences, like Raven's Progressive Matrices or those ARC types of puzzles, have lower Kolmogorov complexity and some have higher, to borrow a notion from theoretical computer science. And I don't think humans are anywhere near the computational limit for that at all; I would guess that number could keep going up. So I think fluid intelligence could keep rising. Basically the IQ of the AIs, which wouldn't be all of their intelligence, it would be separate from their long-term memory, their visual processing ability, their reaction time, et cetera, but their IQ could get really high. They could solve extremely difficult mathematics problems, have really good intuitions for puzzles, and not take much compute to do it. I think that number could keep going up quite a bit. We're quite limited: we've got big brains, but there's still not much hardware here, and you can imagine a bigger brain with pretty substantial capabilities. Even for human intelligence across generations, there's the Flynn effect and things like that, and that has greatly increased. Our brain size is limited by what can fit through the birth canal, and I expect the AIs wouldn't have that limit at all. They could at the very least keep getting faster and faster; you still have Moore's Law, and you have GPU improvement rates, which are something like 2x every three years. And you also have scalability.
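Kolmogorov complexity, the length of the shortest program that outputs a given string, is uncomputable in general, but compressed size is a common rough proxy for the intuition used above: regular puzzle-like patterns compress well, irregular ones do not. A minimal sketch, with example strings made up purely for illustration:

```python
import zlib

# Compressed length as a rough, practical proxy for Kolmogorov complexity:
# highly regular sequences compress well; less regular ones compress poorly.
# The example strings below are invented for illustration only.

def compressed_len(s: str) -> int:
    return len(zlib.compress(s.encode()))

simple = "ABAB" * 64                                              # very regular, low complexity
complex_ = "".join(chr(65 + (i * i * 7919) % 26) for i in range(256))  # far less regular

print("simple :", compressed_len(simple), "bytes compressed")
print("complex:", compressed_len(complex_), "bytes compressed")
```

The regular string compresses to far fewer bytes than the irregular one, which is the sense in which some Raven's- or ARC-style items are "lower complexity" than others.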
You also have better ability to transfer state than humans do, because a lot of this digital computation is more precise than analog, so you can transfer things across generations as well; with Nobel Prize winners, their descendants generally go downhill. So I think AIs have some pretty substantial advantages that could compound and accumulate. And if you're saying that intelligence is mostly externally computed, well, how many meaningful social connections can humans maintain? Something like Dunbar's number, maybe 130 or so. AIs could maintain thousands, tens of thousands, millions. All of these together can provide compounding effects that make them substantially more capable. However, there may be various points of saturation along the way. You may get a lot of the low-hanging fruit, sort of like convergent economies: a lot of economies caught up to somewhere in the vicinity of the U.S. by copying what was lying around, and then it was harder from then on out. So you may get to human level in some respects, and in some domains, if you don't have a good measurement or feedback loop, it may be hard to keep getting better and better. So I think it can be quite complicated, and I'm agreeing with some of your points there, but conceptually I still think there's a lot of room to go up in intelligence.

Yes, yes. I mean, I. J. Good, as you said in your paper, said that when machines can do things as well as humans can, then we're in big trouble. But I think the main philosophical difference between us is, you know, we agree that intelligence is about adaptivity, but I'm a fan of specialized intelligence; I don't think there is such a thing as generalized intelligence. And certainly, talking about these factored ways in which we think, we have these constraints, and they run deep. We see the world using symmetries, and there's this big phylogenetic tree of knowledge and thought that constrains how we think, and surely AIs would need to be constrained in a similar way. I appreciate what you're saying, that you could just scale this up a million times faster, but maybe it would need to be scaled much more than that. Maybe these creative intuitions and insights we have don't come from the data, don't come from what's inside; maybe they come from what's outside. So I'm vaguely leaning towards the intuition that there's something else unaccounted for in this estimation.

Interesting. I mean, they certainly could have a lot of sensors, if we're saying there needs to be a lot of extra variety from elsewhere and it wouldn't just be what's inside the data center. I think they could aggregate a lot of information, soak it up, and process a lot more of it. So I still think they could have some type of advantage, of at least doing this a lot faster than people.
But certainly, if they're in a vacuum, only speaking among themselves, only working by themselves and not learning things, if there's too much correlation in that population, then the exploration budget may be too low and there wouldn't be sufficient variety. I mean, in evolution there's Fisher's fundamental theorem, which says roughly that the rate of adaptation is directly proportional to the amount of variation, so that points at ways in which such a population could be lacking in variation. But I think some of that could be made up for; it could at least have the sensors that humans have, and more.

Yeah. Another very interesting thing in your paper, and when I think about AI risk in general, which we've thought about a little bit on the show before, is stability and destabilization, and the relationship between offense and defense. You use this term offense dominant, and you're saying a destabilizing force would be if the AI is offense dominant, so that the defensive side of the equation can't catch up. Because at the moment, if you imagine our current state of affairs, we have a kind of Nash equilibrium, right, where there are countervailing factors on the offense and the defense side. Can you tell me about that?

Yeah, so I think the offense-defense balance varies a lot by domain. Many information battles, for instance debates about the world, might be a bit more defense dominant, which would be a reason for things like free speech. Meanwhile, other things have more of a duality: really expert-level, competent computer security teams might experience more of an offense-defense balance, where a vulnerability is identified and patched very quickly, so the defenders keep up with the attackers quite well. In other domains, like the software for critical infrastructure, there's more of an offense dominance, an attacker's advantage, because a lot of the software just doesn't get updated quickly. There are interoperability constraints; the software developer is no longer around; the software was made thirty-plus years ago; nobody even knows it's there. There aren't strong enough economic incentives for fixing this, and there are uptime requirements. So software on various forms of critical infrastructure is more of a sitting duck, and there you don't get a good offense-defense balance. Likewise for bioweapons: certainly we have medicine, but we don't have cures for everything. In fact, we spend a lot trying to find cures for many diseases and ways of addressing certain viruses. So it's not the case that, oh, there's a new pathogen and we'll just find a cure a day later and it will be mass-produced across the globe and everything's taken care of. There's a substantial delay, so it's more offense dominant there too. And there are ways you can imagine the attacker having a really substantial advantage, like a pathogen propagating throughout society before anybody shows symptoms, while we lack various monitoring mechanisms. We're kind of sitting ducks for some of those. So it varies by domain.
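For reference, the Fisher's fundamental theorem gestured at a moment earlier (rate of adaptation proportional to variation) is commonly written in roughly the following form, where the symbols are the standard textbook ones rather than anything defined in the conversation:

```latex
% One common statement of Fisher's fundamental theorem of natural selection:
% the per-generation change in mean fitness equals the additive genetic
% variance in fitness divided by the mean fitness.
\Delta \bar{w} \;=\; \frac{\operatorname{Var}_A(w)}{\bar{w}}
```

Here \(\bar{w}\) is mean fitness and \(\operatorname{Var}_A(w)\) is the additive genetic variance in fitness: more variance in the population means faster adaptation per generation, which is the point being made about an AI population that is too internally correlated having too small an exploration budget.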
I think in some parts of cyber there's a good offense-defense balance; in other parts there isn't. Bio seems pretty offense dominant. And I think this affects whether you want to propagate the technology. If it's potentially catastrophic and offense dominant, then it's basically a WMD, and that's not something you want to give to everybody. You don't give everybody a nuke to make everybody safe; that's not how it works. It's people constraining each other's intent, and some people just won't tend to be that constrainable. Meanwhile, other things, like defenses, home security systems or fences or whatever, you'd want to propagate more. So I think that should shape attitudes toward specific types of AI capabilities as well: what is their offense dominance? And maybe that could change across time. Maybe you could improve critical infrastructure so that it has more of an offense-defense balance, and then you can propagate AIs with fewer and fewer safeguards. Say the economy gets a lot richer, the world gets a lot richer, GDP is higher; then states are more willing to spend money on preventive measures against bioweapons. They're willing to deploy more far-UV, do more wastewater monitoring, et cetera. And that brings things closer to a balance, or at least to not having as much of an attacker's advantage. So you may want mass proliferation of these AI capabilities a bit later, once some of those safeguards are in place.

Yeah. So another thing I'm very interested in, which you've spoken about in the paper, is this concept of loss of control. Connor uses the term fog of war to describe how, as the layers upon layers of complexity build, there's a complete illegibility that develops. And in a sense we already have that now. I do think the AI we have today has the same problem, just not quite as extreme. Systems at Google are very complex, and that's why the market rewards Google engineers; they get paid an awful lot of money. But you're talking about what happens when this gets taken to the extreme, and you gave three examples: self-reinforcing dependence, irreversible entanglement, and cessation of authority. Can you explain what those are?

Yeah, so I describe this at more length in Natural Selection Favors AIs Over Humans. There's a way in which we are becoming more dependent on these AI systems, ceding more of our decisions and cognitive processes to them, and there will be more and more pressure to keep doing that without any clear limit. At the economic level, you versus a company that has AI systems that are just better than you, that company is going to win, because those systems will be cheaper. At the military level, too, there's a very strong incentive, for instance, to make drones more autonomous, because they can be jammed; signal jamming makes them a lot less effective, so you make them move around autonomously. These pressures will keep up, such that we'll voluntarily cede a lot of power in society to these sorts of systems. And if we do it at a rate where we aren't actually in control, that could be concerning. What does control exactly look like? When is it too much? When is it too irreversible? That's a problem we have to keep track of.
There are different outcomes. One is where you've actually just lost control: you can't stop it, you can't reverse it, you can't bargain with it, you can't substantially steer it, and your fitness as a species collapses as a consequence, or your livelihood evaporates. Or there's a different outcome where you're like a retiree with a big retirement fund: the AIs are working for you and doing your bidding, even though you aren't making nearly all of the critical decisions. Those are different outcomes, and it could be insidious or subtle as to whether we actually retain some of that counterfactual control or wind up in the unfortunate situation. I think at least one thing that can help is if AIs get better at forecasting and foreseeing the consequences of outcomes; if we train them to do that, it could make us more prescient and help us avoid some of these outcomes of extreme dependence and erosion of control. It wouldn't be sufficient, but it would be helpful for seeing farther and knowing what we're getting ourselves into.

Wonderful. Well, Dan, this has been an absolute pleasure and an honor having you on the show. Thank you so much for joining us today.

Yeah, thank you for having me. This is fun.
