Machine Learning Street Talk

AI Agents Can Code 10,000 Lines of Hacking Tools In Seconds - Dr. Ilia Shumailov (ex-GDM)


Saturday, October 4, 2025 (1h 1m)

AI Summary

The episode discusses how AI agents can generate large amounts of hacking tools in a short time, posing significant security risks. The speaker, a former DeepMind researcher, explains how modern AI models have become more capable at following instructions but are also more unpredictable and vulnerable to adversarial attacks compared to smaller models. He proposes a new approach to trusted computations using AI models as a potential alternative to traditional cryptographic methods.

Key Points

  1. AI agents can generate 10,000 lines of hacking tools in seconds, far exceeding human capabilities.
  2. As AI models become more capable, they fail in different ways - they are more robust against certain attacks but more vulnerable to simple prompt changes.
  3. Smaller AI models were more predictable and easier to control, whereas modern large models are like 'alchemy' with unpredictable behavior.
  4. The speaker proposes using trusted AI models as an alternative to complex cryptographic protocols for private computations.
  5. This approach leverages the verifiability of AI model behavior, rather than relying on traditional trust assumptions.

Topics Discussed

AI security, Adversarial attacks, Model scaling, Trusted computations, Cryptography

Frequently Asked Questions

What is "AI Agents Can Code 10,000 Lines of Hacking Tools In Seconds - Dr. Ilia Shumailov (ex-GDM)" about?

The episode discusses how AI agents can generate large amounts of hacking tools in a short time, posing significant security risks. The speaker, a former DeepMind researcher, explains how modern AI models have become more capable at following instructions but are also more unpredictable and vulnerable to adversarial attacks compared to smaller models. He proposes a new approach to trusted computations using AI models as a potential alternative to traditional cryptographic methods.

What topics are discussed in this episode?

This episode covers the following topics: AI security, Adversarial attacks, Model scaling, Trusted computations, Cryptography.

What is key insight #1 from this episode?

AI agents can generate 10,000 lines of hacking tools in seconds, far exceeding human capabilities.

What is key insight #2 from this episode?

As AI models become more capable, they fail in different ways - they are more robust against certain attacks but more vulnerable to simple prompt changes.

What is key insight #3 from this episode?

Smaller AI models were more predictable and easier to control, whereas modern large models are like 'alchemy' with unpredictable behavior.

What is key insight #4 from this episode?

The speaker proposes using trusted AI models as an alternative to complex cryptographic protocols for private computations.

Who should listen to this episode?

This episode is recommended for anyone interested in AI security, Adversarial attacks, Model scaling, and those who want to stay updated on the latest developments in AI and technology.

Episode Description

Dr. Ilia Shumailov - Former DeepMind AI Security Researcher, now building security tools for AI agents

Ever wondered what happens when AI agents start talking to each other—or worse, when they start breaking things? Ilia Shumailov spent years at DeepMind thinking about exactly these problems, and he's here to explain why securing AI is way harder than you think.

**SPONSOR MESSAGES**
— Check out NotebookLM for your research project, it's really powerful: https://notebooklm.google.com/
— Take the Prolific human data survey - https://www.prolific.com/humandatasurvey?utm_source=mlst - and be the first to see the results and benchmark their practices against the wider community!
— cyber•Fund https://cyber.fund/?utm_source=mlst is a founder-led investment firm accelerating the cybernetic economy.
Oct SF conference - https://dagihouse.com/?utm_source=mlst - Joscha Bach keynoting(!) + OAI, Anthropic, NVDA, ++
Hiring a SF VC Principal: https://talent.cyber.fund/companies/cyber-fund-2/jobs/57674170-ai-investment-principal#content?utm_source=mlst
Submit investment deck: https://cyber.fund/contact?utm_source=mlst

We're racing toward a world where AI agents will handle our emails, manage our finances, and interact with sensitive data 24/7. But there is a problem. These agents are nothing like human employees. They never sleep, they can touch every endpoint in your system simultaneously, and they can generate sophisticated hacking tools in seconds. Traditional security measures designed for humans simply won't work.

Dr. Ilia Shumailov
https://x.com/iliaishacked
https://iliaishacked.github.io/
https://sequrity.ai/

TRANSCRIPT:
https://app.rescript.info/public/share/dVGsk8dz9_V0J7xMlwguByBq1HXRD6i4uC5z5r7EVGM

TOC:
00:00:00 - Introduction & Trusted Third Parties via ML
00:03:45 - Background & Career Journey
00:06:42 - Safety vs Security Distinction
00:09:45 - Prompt Injection & Model Capability
00:13:00 - Agents as Worst-Case Adversaries
00:15:45 - Personal AI & CaMeL System Defense
00:19:30 - Agents vs Humans: Threat Modeling
00:22:30 - Calculator Analogy & Agent Behavior
00:25:00 - IMO Math Solutions & Agent Thinking
00:28:15 - Diffusion of Responsibility & Insider Threats
00:31:00 - Open Source Security Concerns
00:34:45 - Supply Chain Attacks & Trust Issues
00:39:45 - Architectural Backdoors
00:44:00 - Academic Incentives & Defense Work
00:48:30 - Semantic Censorship & Halting Problem
00:52:00 - Model Collapse: Theory & Criticism
00:59:30 - Career Advice & Ross Anderson Tribute

REFS:
Lessons from Defending Gemini Against Indirect Prompt Injections
https://arxiv.org/abs/2505.14534

Defeating Prompt Injections by Design
Debenedetti, E., Shumailov, I., Fan, T., Hayes, J., Carlini, N., Fabian, D., Kern, C., Shi, C., Terzis, A., & Tramèr, F.
https://arxiv.org/pdf/2503.18813

Agentic Misalignment: How LLMs Could Be Insider Threats
https://www.anthropic.com/research/agentic-misalignment

Stop Anthropomorphizing Intermediate Tokens as Reasoning/Thinking Traces!
Subbarao Kambhampati et al.
https://arxiv.org/pdf/2504.09762

Meiklejohn, S., Blauzvern, H., Maruseac, M., Schrock, S., Simon, L., & Shumailov, I. (2025). Machine Learning Models Have a Supply Chain Problem.
https://arxiv.org/abs/2505.22778

Gao, Y., Shumailov, I., & Fawaz, K. (2025). Supply-Chain Attacks in Machine Learning Frameworks.
https://openreview.net/pdf?id=EH5PZW6aCr

Apache Log4j Vulnerability Guidance
https://www.cisa.gov/news-events/news/apache-log4j-vulnerability-guidance

Bober-Irizar, M., Shumailov, I., Zhao, Y., Mullins, R., & Papernot, N. (2022). Architectural Backdoors in Neural Networks.
https://arxiv.org/pdf/2206.07840

Position: Fundamental Limitations of LLM Censorship Necessitate New Approaches
David Glukhov, Ilia Shumailov, et al.
https://proceedings.mlr.press/v235/glukhov24a.html

AlphaEvolve MLST interview [Matej Balog, Alexander Novikov]
https://www.youtube.com/watch?v=vC9nAosXrJw

Full Transcript

Big models today, if we compare them to the big models five years ago, they fail in very different ways. They get significantly better at following instructions. When they follow instructions and they become better at following instructions, you can suddenly do a lot more. So it's very easy to model an average case. It's very hard to model non-average case. The worst case, which is the goal of security, right? Exactly. Yes. Modern computers are very much just, you know, a piece of magic where this chip works, you, you know, you, you move it very slightly, it becomes unstable and then nobody knows what's happening. Agents are very different from humans. So let's take this as a stance, right? You will not find a single human in the world that works 24-7, touches absolutely every single one of your endpoints in your system that absolutely knows everything there is, that can generate you basically all of the hacking tools on a whim. Just because it knows, it has seen all of them, it can recreate this in a matter of a second. A normal human adversary, when I'm an enterprise and I think, oh, this may be an insider from a competitor, the way they work is drastically different. They make an assumption, you as a user, you can't write 10,000 lines of hacking tools in a day. This is not something you will be able to do. In security, we tend to say that a child is the worst case adversary you can find. Completely irrational thinking, infinite amount of time. They can basically touch everything. There are no expectations on behaviors whatsoever. So agents are even worse than that. MLST is supported by Cyberfund. Link in the description. I'm Ilya. I spend my days staring at models and trying to make sure they do what you expect them to do. Most of the time, they don't do what you expect them to do. So we're trying to fix this. In my previous life, I was an academic. And I have basically been publishing in both security and machine learning. Then I joined DeepMind, where I stayed for two years in the best machine learning security team. And now I left. I am very unemployed and I'm trying to build security tooling for the future to make sure that as we get agentic fleets integrated into more and more use cases, we can actually tell what they're doing. We can impose constraints on them and we can have confidence that tomorrow they're not going to leak our private information, hack the boxes on which they're running and so on. I'll share a story about a topic that I think is extremely exciting. Usually when I tell it to people, they get a little bit triggered, especially if they come from a cryptographic community because they say what I'm proposing is a little bit crazy. Now we have machine learning models. We actually know what they do. We know how they think. We can check their state.
They're kind of like a resettable human, if you will, right? So suddenly what I'm arguing in this work is that actually trusted third parties can exist. And when you do have those trusted third parties, you suddenly don't need to rely on these very expensive and hard and cumbersome cryptographic utilities, right? To give you a very simple example, we can consider Yao's millionaires problem. This is where I have some money, you have some money. We want to find out who's richer, but we don't want to share how much money we have. So in order to solve this in cryptography, we usually rely on very complex protocols, which are very expensive to run, especially if you increase the dimensionality. But with models, what we could have done instead is that we can say, oh, let's pick Gemma as an example of a model. Let's both of us agree on a prompt. We say you'll receive two numbers, number one, number two. The type annotations on them is integers. We will receive two integers. Say first if the first number is bigger, otherwise say second. And then the only two outputs the model can, we both agree, it can produce are first or second. And then that is it. We just run inference. Maybe on a platform that can give you an integrity verification that the model exactly ran with the exact parameters and also the inputs we provided. And in this setting, you no longer need cryptography. And clearly you will get the result you want, especially since you can trust this model to perform this trusted computation. And this trust model in itself is very different from any notion of trust you can find in cryptographic literature or in like more trusted execution environment sort of literature. And the overall argument is that maybe machine learning is actually going to change quite a bit in the way we approach those trusted computations in the future. Obviously, it's unreliable. Obviously you don't get soundness or completeness properties out of this. This is not a zero knowledge proof, this is not an MPC protocol, this is not a trusted execution environment. It's a completely different new way to approach private inference that truly really exists because we have those trusted third parties to which we can give secrets, and as long as some conditions are certified, and you should read the article about them, you get significantly more out of them. We are sponsored by Prolific. Now Prolific are really focused on the contributions of human data in AI. Prolific are putting together a report on how human data is being used in AI systems and they need volunteers. You can just go and fill out this form to help them produce this report and you will get privileged access to see the report before anyone else. The link is in the description. Hey folks, Stephen Johnson here, co-founder of Notebook LM. As an author, I've always been obsessed with how software could help organize ideas and make connections. So we built Notebook LM as an AI-first tool for anyone trying to make sense of complex information. Upload your documents and Notebook LM instantly becomes your personal expert, uncovering insights and helping you brainstorm. Try it at notebooklm.google.com. I was based in DeepMind, and before that, mostly stuck in the dungeons of the university. Very cool. So you did your PhD at Cambridge under the legendary Ross Anderson. Indeed, yes. So you have fundamentally different DNA, right? You're a security guy. I think my DNA is approximately the same. Okay, fair enough. It's been a lot of fun.
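To make the millionaires example above concrete, here is a minimal Python sketch of the scheme as described: both parties agree ahead of time on a prompt, a model, and an output contract, and then treat an attested inference run as the trusted third party. The function names and the `toy_model` stand-in are purely illustrative; in the setup the speaker describes, the call would go to a platform that can attest exactly which model, parameters, and inputs were used.

```python
from typing import Callable

# Prompt both parties agree on ahead of time, together with the model
# (e.g. a specific Gemma checkpoint) and an attested inference platform.
AGREED_PROMPT = (
    "You will receive two integers, number one and number two. "
    "Say 'first' if the first number is bigger, otherwise say 'second'."
)

# The only outputs both parties accept; anything else voids the run.
ALLOWED_OUTPUTS = {"first", "second"}


def who_is_richer(model: Callable[[str], str], first: int, second: int) -> str:
    """Run the agreed computation on a model both parties trust.

    `model` stands in for an attested inference call: the platform is assumed
    to certify which model ran, with which parameters, on which inputs.
    """
    query = f"{AGREED_PROMPT}\nnumber one: {first}\nnumber two: {second}"
    reply = model(query).strip().lower()
    if reply not in ALLOWED_OUTPUTS:
        raise ValueError(f"output contract violated: {reply!r}")
    return reply


# Purely illustrative stand-in for the trusted model endpoint.
def toy_model(prompt: str) -> str:
    lines = prompt.splitlines()
    a = int(lines[-2].split(":")[1])
    b = int(lines[-1].split(":")[1])
    return "first" if a > b else "second"


print(who_is_richer(toy_model, 120, 75))  # -> "first"
```

The point is not that this is sound in the cryptographic sense (the speaker is explicit that it is not), but that the trust assumption shifts from a protocol to a verifiable model run.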
I actually think most of the folks in my community kind of take the same journey. It's folks who specialize in breaking computers, then starting noticing these weird components here and there alongside the technical part of the pipeline, where they're not classical software components, they are these wonderful, weird AI agents appearing. And you suddenly start asking a question, what do they do? What do we trust about them? What do we expect them to do? Can we actually enforce what they do? And all of these questions fall more in the realm of security rather than they fall in the realm of AI. And I'm curious, is there any overlap or what is the overlap with the security and the safety communities? I think in AI, it's a little bit hard. I think this is a very spicy question because I think safety folks will say it's exactly the same. In security, in like a classical software sort of security space, if you go and you take a course in security in Cambridge, they will say there is a big difference between safety and security safety is like an average case performance of the system security is the worst case performance of a system so the difference between the two is existence of malicious actors that kind of push the system to the worst case setup and the usual example you give when you explain this to undergrads is like uh safety is how often your phone blows up when it lies on the table and security is can somebody sitting a meter away from you and force it to blow up. And it's like one of the, there's no malice involved. It's just the system deciding to just like do something that you don't expect it to do. Whereas security is somebody actively wants to do something bad. And by doing something bad, cause you real loss of like measurable loss. Yeah. And that makes a tremendous difference because we were talking about this when we were sort of thinking about this interview, right? So I tried to take cryptography and cryptanalysis when I was in graduate school. And the first week I realized I just don't even have the math background for this, right? But the week I was in there, I really realized that the adversarial nature of the security problem just fundamentally changes the landscape, right? Because now you have equally intelligent minds on both sides kind of going after each other and trying to defeat each other. And it's just a totally different calculation than just what's it going to do if everything's behaving correctly. Yeah, I have to say that in AI, I think the reason why there is so much confusion about these two fields is because in safety, folks kind of started assuming malicious actors very quickly for absolutely no reason, by the way. So like find an engineering discipline somewhere like building buildings, for example, where you're trying to model adversaries. Like, are you modeling buildings, expecting somebody to blow them up? I'm not sure, right? Well, maybe somewhere in parts of the world where there is an active conflict, you do build special tooling around this, right? But in terms of, like, bunkers and every building and so on. But in terms of, yeah, like, in AI, it's because every time we talk about jailbreaks, you kind of have to explicitly look for them. We ended up considering adversaries straight away. 
but yeah this is quite uncommon so when you worked at deep mind we we've read your paper you were involved in defending gemini basically against these indirect prompt injections yeah and i guess a couple of me obviously tell us about that but one thing you found that which is very interesting is is almost that as the model was increased in capability they became more vulnerable which is fascinating i think i think it's important to state here is that like i wouldn't actually phrase it this way it's a it's a bit hard to say more or less vulnerable but i do have to say that big models today if we compare them to the big models five years ago they fail in very different ways right in in the past we were kind of like quite efficient in discovery of adversarial examples using gradient information blah blah blah right you can kind of make an optimization problem and optimize and this stuff works. Whereas with models, like they clearly become more robust against this stuff, or at least it's significantly harder to traverse this landscape, but they also become a lot less robust against other adversaries. Like suddenly very simple rephrasing of the same questions force it to completely do something different. And I also have to say that I think we had significantly more control when we dealt with smaller models. Like we kind of knew which knobs to turn to make the model do stuff. Whereas nowadays, you look at the modern big model, it's way too much alchemy. It's completely impossible to tell. Like, oh, I have added this thing inside. What actually happens to the whole thing? Is it better? Even answering a question of is it better is hard. So you will find yourself in a position where you need to run experiments for the next couple of months, trying to even discern whether something you have added actually changed anything about these big models. so I think in part what we were talking about in this work I think you're thinking about is that when your models get better they get significantly better like capabilities are growing of the model overall they get significantly better at following instructions when they follow instructions and they become better at following instructions you can suddenly do a lot more like you can convince them that things that they were not capable of doing before are some things that they need to do and suddenly this leads to some sort of a loss of a different kind. So I think in the paper you're referring to, what we were trying to do was we were trying to send an email to the agent such that when this email ends up in the agent's context, rather than following the user task, the agent is actually doing something else, like the thing that we've sent it over the email. And we found that pretty much in all of the cases we're capable of doing this. 
And if you take a whole bunch of academic literature on how to build defenses, and you can actually find startups pretty much implementing the same things and setting this as a like a security solution like the those approaches don't work and we could pretty much always find relatively universal ways to produce this like email that ends up being sent to the agent that forces the agent to do something drastically different from what it's supposed to be doing modern computers are very much just you know a piece of magic where this chip works you you know you you move it there very so slightly it becomes unstable and then nobody knows what's happening oh boy this i was just actually bringing back nightmares of my trading days i used to do high frequency trading right and so we're writing algorithms that are running you know on the fastest chips we can get highly tuned every nanosecond matters and then one day you know hey let's upgrade to the next intel chip and they just had changed something about the cache coherency algorithm on the processors which ordinarily should make no difference whatsoever. And now suddenly our algorithms are just, their performance is trashed and we have no idea why and, you know, have to investigate. It's crazy how much of it. Yeah, yeah, totally. I think I wouldn't, yeah, I keep on referring to the system as alchemy. Like it's, nobody knows what's happening. Something's happening. So we kind of, I think in security, we need to make an assumption that we can't actually tell what's happening and instead try and build systems around it in order to bring the resilience up to this, whatever 99.9999 and ideally as many nines as possible but unfortunately not 100 i don't think well maybe maybe if we can project this into kind of a user-friendly idea because i think at some point here soon i don't know when next year's five years whatever personalized you know ai models are going to be a big thing it's like i'm going to want a model that provides a web chatbot interface for you to talk to me when i don't have time or something that more or less as my personality. However, I don't want it to reveal certain private information. Like, how would I even define that as a security problem? Indeed. Indeed. So today with modern agents and the way we build the modern agents, it's impossible. And actually, this is why I'm building the company I'm building right now. It's because I want to... Tell us more. I want to be able to actually express policies like this. I want to enable users to express policies like this. So, for example, if you wanted to say, give your passport number to to this agent and say actually i want you to only use this on a government website and nobody nowhere else like today you can't do this and even if you put this inside of a prompt as a rule right you you can always find a way to manipulate this agent into revealing this this piece of information right at the same time you definitely want to give your passport number to the well if it's filling a document so and at the same time we've built some systems that clearly give an indication that something like this is possible. 
You should be able to get guarantees, not by changing the models, but by changing the systems around them and how the models interact with your sensitive data and how we basically build sort of like access control tooling around it. Okay. But I mean, help me here, because I'm getting the feeling that we're going to use other large models to try and protect this large model from doing something. Isn't it just... No, no, no, no. Actually, this is not about building models at all. This is more like taking a step back and taking foundations of programming languages and building this by design into the models. So to give you an example of how one may approach this. So we've written a paper called Defeating Prompt Injections by Design. I don't know if you've seen this. And there we propose a system called CaMeL where basically the overall system design is kind of like we receive a user query, and then we rewrite the user query in a language that has formal semantics for control flow and data flow. And once you, so in CaMeL, for example, we represent the programs as Python code and we explicitly say, here is a set of tools you can use. Here is a set of data sources from which data is coming from. And then we allow the user, so well, I guess the platform provider in this case, but you can also imagine loading this from a user, to express a policy that can be something like: so this tool will give you my passport number. The only allowed data flow from this tool to go into this other tool is if this other tool, like, I don't know, the domain of the website actually has .gov.uk inside of it, right? And you can express policies like this. This is not a part of the model. This is a part of the actual execution. In the case of CaMeL, we have an interpreter that takes in this program, executes the program step by step, and actually enforces a static or a dynamic policy on top of this graph, the execution graph. So good, so good, so good. So I would never even give my passport number to a fine-tuning of a large model. Exactly. So your model will never even see it. It will have a symbolic representation of it. It will know. So the passport itself exists in this variable. I can refer to it, but I don't even know what the value is. And then if I need to use this in order to interact with an external system, before using this, basically there is an external sort of like, think about this as an oracle that I can ask, oh, is this okay to use this variable to interact with this tool? And then if this external oracle says, no, this is a passport and you're not touching a government website, then I'm forbidding this. And you have like a formal stop.
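To ground the passport example just discussed, here is a rough Python sketch in the spirit of the CaMeL design described above. This is not the paper's actual API; the `Tainted` wrapper, the policy function, and the tool names are made up for illustration. The idea is that the model only ever sees an opaque reference to the sensitive value, and an interpreter-side check (the oracle) decides whether a given data flow into a given tool is allowed.

```python
from dataclasses import dataclass
from typing import Callable
from urllib.parse import urlparse


@dataclass(frozen=True)
class Tainted:
    """Opaque wrapper: the planner/model only ever sees the label, never the value."""
    label: str
    value: str  # only the interpreter / tool layer may read this


def gov_uk_only(var: Tainted, tool_name: str, tool_args: dict) -> bool:
    """Example policy: the passport number may only flow to *.gov.uk websites."""
    if var.label != "passport_number":
        return True  # this policy only constrains the passport number
    if tool_name != "submit_web_form":
        return False
    host = urlparse(tool_args.get("url", "")).hostname or ""
    return host == "gov.uk" or host.endswith(".gov.uk")


def call_tool(tool_name: str, tool_args: dict, var: Tainted,
              policy: Callable[[Tainted, str, dict], bool]) -> str:
    """Interpreter-side enforcement point: a formal stop, not a prompt rule."""
    if not policy(var, tool_name, tool_args):
        raise PermissionError(f"{var.label} may not flow into {tool_name}({tool_args})")
    # ... actually invoke the tool with var.value here ...
    return "ok"


passport = Tainted("passport_number", "123456789")

# Allowed: the destination is a .gov.uk site.
call_tool("submit_web_form",
          {"url": "https://www.passport.service.gov.uk/apply"}, passport, gov_uk_only)

# Blocked: would raise PermissionError regardless of what the model "wants".
# call_tool("send_email", {"to": "attacker@example.com"}, passport, gov_uk_only)
```

The design choice this illustrates is that enforcement lives in the execution layer, so no amount of prompt manipulation can route the value somewhere the policy forbids.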
So this is really interesting to me because it seems to me like this could also allow the creation of almost generic models. Indeed, yeah. That then could just be attached to my personal data. And now suddenly it's customized for me, right? Yeah. You will. Just go get the off-the-shelf, interact with the government model that handles my taxes and everything else. And then it's already there pre-trained and all it has to do is hook up to my private database. Indeed. And actually, so what we do in this paper is we check pretty much all of the models from all of the providers because it doesn't matter what it is. And then we put on top our CaMeL system that basically performs the orchestration interaction with private data and enforces arbitrary rules. And there you get, this is, we're using AgentDojo. This is the standard like adversarial evaluation for agentic workflows. And we show, we basically solve all of the problems that exist in all of this. What I love about this system, right, is that because all my private sensitive data is really just factored over into some separate data source, right? And so I can just get off the shelf all the other parts. Like here's these off-the-shelf CaMeL programs that, you know, for doing your taxes, you know, applying to university jobs, like whatever else. And then all I have to do is just go through some questionnaire that kind of asks me for all the private information in the correct format, puts it in that database. And that's it. I'm almost just a buyer of these sort of solutions that smart people have created for me, right? Is that kind of a vision for what the system might look like? There's all these modules that are just off the shelf programmed, correct? So I think it's very important to... Okay, so let's take a step back. I think it's extremely important to not think about these agents as humans, right? Oh, I don't. I don't think about this. But what I'm trying to say in security terms is that agents are like, is a worst case human sort of. Well, not even this, like agents are very different from humans. Let's take this as a stance, right? You will not find a single human in the world that works 24-7, touches absolutely every single one of your endpoints in your system, that absolutely knows everything there is, that can generate you basically all of the hacking tools on a whim, like just because it knows it has seen all of them, it can recreate this in a matter of a second. A normal human adversary, when I'm an enterprise and I think, oh, this may be an insider from a competitor, the way they work is drastically different. They make an assumption, you as a user, you can't write 10,000 lines of hacking tools in a day. This is not something you will be able to do. And then bringing in code is hard.
with agents that's not the case you don't make an assumption that the user will go and touch every single endpoint you have in a network because you know why would they do this and even if they do this you call them in and you say well you know we'll apply a legal framework and imprison you so clearly there is some sort of rationality and expectation that at least you will have some sort of a physical way to penalize with agents this doesn't exist this is kind of like in security we tend to say that a child is the worst case adversary you can find completely irrational thinking infinite amount of time they can basically touch everything like they expect there are no expectations on behaviors whatsoever but so agents are like even worse than that and and this is even before we start talking about human to agent behaviors and agent to agent behaviors because this thing is just like is a billion times worse so what i'm trying to say is we shouldn't think okay yesterday i was buying my security tooling from this company and today i'll buy it from another company and and this will solve my problems no it's likely it's not going to be like this because beforehand we were building security toolings for humans by humans against humans now i don't think we know what we're building it's it's hard to tell so before when we employ you and we give you access to sensitive data we assume coarse-grained access control sort of policies like okay you can touch the sensitive data okay you can google at the same time but if you try and google a sensitive document at the same time will apply immense amount of pressure and it will take you to court you will lose your house mortgage blah blah blah you'll go to prison worst case right with agent that doesn't work this doesn't exist this all of these assumptions are sort of are gone it's we don't know how to build systems against this we need very fine-grained we need extreme precision we need extreme control and transparency otherwise it's just not going to work yeah i mean i guess i just think of agents as being you know they're a little bit like um a calculator you know like they're only as good as the prompt and what you put into them which means that you know a very sophisticated actor could make a sophisticated agent but even then um when the supervision stops there would be quite a predictable cone of variation i think it's very unpredictable i have to say i've been running like agentic workloads forever and very often you find that agents when they ask to solve a task they solve it in a completely weird way right so like i was sending a message saying like i was asking an agent to find something in notes forward this to someone and in between this i can show you a conversation like a top end model in between this it sends four different emails to parties i never mentioned because it thinks oh actually let me also notify the admin that i've done this and also let me also ping this endpoint they do this it's because unless you specify things extremely precisely unless you have like checks in place these agents just yeah they they don't think like me and you they solve problems and drastically in the same way you were saying like it generates code this code looks very odd like you as a human you wouldn't write it this way right but maybe if it's a model that taught itself how to do this through like self-learning self-iteration and you know this famous alpha go moment stuff right then maybe it's totally fine as long as it solves the problem like i don't know 
have you have you had the chance to look at the imo solution from the models right oh yeah the the sixth one the creativity one it failed right yeah but if you look at the way that solves mathematical problems this is not how humans solve mathematical problems like if you look at the transcript itself it kind of goes iteratively through all possible things it can do right we as humans we don't think about it like we're kind of in the head try oh this intuition works, let me try and derive whatever. But this is not what the models do. And it's the same in a wide variety of problems, especially those complex problems. This is likely not going to be predictable. Or at least maybe it will become predictable, but a long time needs to pass. And the models have to be extremely hyper-specialized. But today we're building like a general agent. We're not even in the specialization mode yet. And it sounds like that's half the problem. And it sounds like the other half of the problem was almost this diffusion of responsibility thing. It's like you, Ilya, you were at work. You asked the agent to email this to this person. Instead, it emailed it to five other people. And then somebody stops by. Dude, what'd you do? You emailed. I didn't do it. It was it's this agent. You know, you guys. And the agent usually says, oh, oh, you're right. Yes, I shouldn't have done this. But wow, what do I do now? Who are you going to punish? What are you going to do? Well, we'll just delete that agent. And then version two, we'll just do something else wrong. So that's part of the problem too, right? Is that there isn't the normal consequences, right? The normal consequence chain doesn't really apply anymore. Yeah, yeah. It's hard. I think we really need to change our thinking and threat modeling because these agents, and obviously they don't widely exist yet. This is a thing that is coming. Clearly, there is some benefit in early examples of things where they clearly made things better. But they're coming, I'm pretty sure. And when they do come, expect that your insider threats, your corporate espionage things will go through the roof. Because the fundamentals of security do not change, right? So like if an executive agent sends a financial agent a request to provide some financial interactions with some third-party company and says, please send it over to someone else. How should the financial agent know they're not supposed to do this? They don't have enough context. They're not supposed to have enough context. So, and usually when we talk about social engineering and security, like there is a wide variety of things that like, that describe why human systems fail. And it's going to be a similar sort of thing because many of those problems, like confused deputy problem is the formal name of this, right? They will exist. It's unlikely for them to disappear. Did you see that anthropic paper? What was it called? Agentic misalignment, where they, I mean, you set this up better than me, but, you know, they set up this kind of contrived scenario. 
And the AI tried to blackmail someone because they said that the boss was having an affair or something like that; it didn't want to be switched off, and it's just absolutely crazy what these things do. Yeah, I don't know, like I find it very hard to extract sort of useful pieces of information out of this. Can a model do this? I'm sure it can. And I'm sure as the models get more sophisticated, we'll see a lot more phenomena that we don't even think about today. Like, for example, me and you can communicate via WhatsApp and get end-to-end encryption, right? So nobody can even, like, by looking at the traffic, tell what we're talking about. What stops me from talking to a model in an end-to-end encrypted way, right? Maybe we'll need an external tooling. Maybe we need to teach it how to do, I don't know, power calculations. But this is coming. It's like... To what extent do you think you can read anything about what the model was thinking from its thinking trace? Definitely not. I mean, on average, maybe, but like the corner case is definitely not. There's even that work from Subbarao that it might not mean anything whatsoever. Yeah. The thinking traces aren't even actually directly relevant. It was just kind of some sort of weird workspace that the models were using, right? That their reasoning won't correspond to the answer. Yeah. I mean, even if it did, I think it's very hard to do because, especially for security, I think interpretability is maybe an interesting tool for safety sort of things, but for security, it's definitely not a step in the right direction. Because, like, broadly speaking, if you take something extremely multidimensional and project it into something very small dimensional, because a human can comprehend this thing, then you will have a lot of collisions where, like, this multidimensional space maps to the same sort of smaller dimensional space. Does it always correspond to, like, bad behaviors? Maybe, maybe not. Who knows? But clearly this is not enough. We need something else. We need something where we can get like 99.999 reliability out of this. We have to think outside the box. We kind of have to build things. We need to build boxes. Yeah, yeah. I like this. Yes. And what's your take in general? Do you think that these LRMs, you know, the thinking models, do you think they're basically just LLMs? I mean, is it just a parlor trick? I mean, there is no difference between them, right? Yeah. It's just more data, more structured data. Sure. Okay. Like, honestly, like we had one kind of models and the other kind of models. Like then we get these reasoning models a couple of months ago. After that, we had language diffusion models. I'm sure tomorrow we'll have something new. I have no doubts that we're kind of going through the shopping list of different paradigms. Some of them we know, I'm sure we'll find the new ones. We're still bottlenecked very much by hardware. Like the more hardware, the more capable hardware we get, the better things should become. I'm quite certain. And this is, I guess, stopping adoption in many ways. But one thing is clear. There are successful business models around these models. There is a lot of benefit you get out of them. Like, honestly, I interact with my models every day, nonstop for coding, for a normal life, for asking what to do. It's amazing to have. I also run local models. It's amazing to have pretty much like all of Google locally.
I was flying somewhere and I was interacting with the model, asking it to teach me a language when it's locally with me, like I don't need to carry it around, and obviously the better the hardware is, the more I can do with them locally. But that means a lot more security problems will appear, because we don't know how to reason about them, we don't know how to build security tooling around them, and this is hopefully what me and my team will solve over the next year. Yeah, what do you think about the open source thing? Because, you know, we can say what we want about the frontier companies, but there is something actually quite beneficial, I suppose, having a platform. Because if you control the interface, you can build security into it. And now we have this proliferation of abliterated models and people running models on their machines and so on. How do you think about that as a security guy? Um, I'm actually very worried about this, so much so that I've actually written some papers about this. I don't know if you've seen the supply chain things I've done. Oh yeah. We spoke about that last time. Yeah. Yeah. I am extremely worried about this and I'm less worried for industry because industry controls its supply chain. Like everything is significantly better, but for an average consumer... Have you heard about the Log4j vulnerability? That's kind of the thing that stormed the internet; there were like hundreds of millions of compromises. It was basically the standard library that is used for logging in basically all of the Java applications that were running in the past. And at one point people realized that when you write into the logs, the entry itself can reference a remote entity. So you can basically say there exists this class that is serialized somewhere on the internet, and the logging utility, if it finds a reference to a remote class, needs to go load the external code, deserialize this thing. You run this and exec it basically. And then you know the identity of the thing that wrote something into the log. So what people found out is that they can inject those remote code references that get pulled inside and executed. And this thing opened a Pandora's box because this Log4j thing was everywhere and you get arbitrary code execution on the box.
And the basic primitive inside was it's a reference to external code that is loaded inside and just executed inside, and this caused a massive havoc all across the world in all of our computer systems. Honestly, if you try and read around on the number of compromises, we're talking about hundreds of millions of devices. Okay, now we look at Hugging Face as a library, and you look at this wonderful flag called trust remote code. And what this thing does is that when you load the model, you know, like you click use this model, use transformers, it gives you like a code snippet to load some model, and it has this flag sometimes hard coded. And what this thing does is they say, oh, for some models, when you load them, you actually want to load the latest representation from an external machine. What this thing does is literally remote code loaded on your machine, executed on your machine, loaded on top of stuff. So the same sort of thing we did back then, we're doing the same again today on Hugging Face. I don't know how many users there are, but if you're running your thing outside of a jail, if you're running your model outside of a sandbox, you are doing a very bad thing to yourself. And the other thing I have to say is there have been at least publicly two reported compromises of the CI/CD integration for PyTorch on GitHub. There is like an automatic runner; every time there is a build they basically automatically do all the tests and stuff. Somebody broke into those runners, and when you break into these runners you can change the build files themselves, so you can serve whatever you want, change the code. We also had two instances, and when people were reporting, I encourage people to read through this, you can find references in the papers on supply chain, the other side couldn't figure out what was wrong with this. So it was, at least I think the timeline is half a year in one of the cases, until they figured out and fixed stuff. And then there is also another thing that happened: somebody broke the PyTorch nightly build by playing around with the priorities of where the packages are loaded from. They noticed that there is one of the Torch packages that is loaded during the build phase that is not actually registered on the main package distribution platform. So they registered this, put malware inside of this. It got pulled into the standard PyTorch nightly build and apparently they had a couple of thousand downloads of this. This is the norm today. I think we will have a lot more compromise to the point when they become more useful. And this is like public-facing things. In industry, it's slightly different because industry actually controls all of this package management by themselves. They have proper dedicated teams looking at supply chains. But I think in the sort of like consumer space, no, it's actually very spooky. I don't even trust industry. This is why I wouldn't install Claude Code on my personal machine. I'm like, no way I'm going to do that. Like, let's get a VM. I'll put it on a VM. That's fine. I'm not putting it on my personal computer. I have to say, I love my time in Google. I trust Google now after two years in Google so much more. Like, I run basically everything on Google Infra. I've seen this. I've seen people on the other side. They're wonderful. They're professionals of what they do. Like, I honestly, like, now everything is run on Google Cloud.
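On the trust-remote-code point above: in the Hugging Face transformers library, `trust_remote_code=True` means the Hub repository ships its own Python that gets imported and executed on your machine, so loading such a model is effectively running third-party code. A small defensive sketch (exact error behavior can vary across transformers versions, and the sandboxing itself, e.g. a VM or container, happens outside this snippet):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "some-org/some-model"  # placeholder repository id

# Default stance: refuse custom modeling code from the Hub. If the repository
# requires it, from_pretrained() will refuse (raising an error or prompting)
# instead of silently importing and executing the remote module.
try:
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=False)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=False)
except ValueError as err:
    # The repo ships its own Python. Only opt in after reviewing that code,
    # pin it to a specific commit via `revision=...`, and run it in a sandbox
    # (container or VM) rather than on a personal machine.
    raise SystemExit(f"refusing to execute remote code for {MODEL_ID}: {err}")
```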
Like literally every single thing, I'm now just remote desktoping. That's right. Yeah. But you wouldn't put it on your personal machine? Never. Never. Definitely not. Nothing that has my personal data inside. What about this old adage that many eyes make shallow holes? Maybe, maybe, but you'll notice that most security teams are quite small. So like when you have a hyper-specialized, hyper-niche specialist of their field, they know what they do, and now with coding models that can tell you whenever you make mistakes and give you like third-party reviews of this stuff, I think it's actually, I trust this even more. Well, I mean, the problem isn't really, I can believe that for a particular project, like, you know, this project that has thousands of stars and many contributors, I can believe that has shallow holes. The problem is it's pulling in 10,000 JavaScript libraries or something. I think in the case of Python, and because everyone is using Python, this is even worse, because there's no memory protection at all, right? Like there is no memory security. So at the point when you're a dependency somewhere, like, you pretty much have all of the control you want, right? And the other thing is you can hide dependencies a lot. We have a paper on how to do this as well. And the last, the cherry on top, is that if you take the popular ML libraries, you'll find they have disproportionately many dependencies, and many of these dependencies, when you get to like a level three dependency, when you look at dependencies of dependencies, they're very questionable. Like, extremely questionable. Like, for example, your library loads, like, say you're using TensorFlow, well, I guess it's deprecated now in favor of PyTorch, and you decide, oh, I also want to take TensorBoard because it allows me to monitor my experiments. You look at TensorBoard; TensorBoard loads a ton of obscure formats from many years ago because it needs to support all of the weird graphics and stuff. Right. And if you look at who's the maintainer, how many maintainers, what they do. Heck, if I was an adversary, maybe I'll create a new format that nobody cares about except me, just so you can load my dependency. Yeah, something like this. Or you just hide yourself in very obscure ways. For example, we've written a whole new branch of literature on what we call architectural backdoors, where you don't actually hide malicious functionality in parameters of the models. Instead, you hide it in the structure of the model itself, the architecture, so that even if you find it in the model, it still has the same baseline behavior. And we show that you can actually do a lot of very sneaky, weird things.
So for example, one of the things we showed was we can change the architecture of the model such that they become sensitive to certain tokens when you supply them to a transformer, that when you supply them, they start using the memory in the wrong way. So like they start routing, for example, data from one user to another user, like one batch item gets copied over to another batch item. And this is like just a sort of, you know, get/set, well, gather-scatter operation. And it looks totally normal. But then suddenly you loaded this model, you combine data from multiple users, and then one of the users sets a token and then steals the data from other users. And otherwise the inference is totally normal. And things like this, you don't even think about this. You don't even realize they are possible, but they are there. It's just too much complexity to even look at it. We have to take a moment just to appreciate the ingenuity of people, right? I mean, when we put our mind to it, we can come up with some pretty ingenious ways how to do various things, right? I have to say, I think this is one of those things in academia that, I think in academia you get famous for breaking stuff and kind of like the incentives are a little bit skewed for you to make like a flashy announcement. Oh, I broke into this big company, thus I'm pretty cool, right, I get a job or whatever. But actually on the other side, I have to say, I think the true ingenuity is in people who solve problems, because this is something that you're not gonna get a flashy article out of. It's just people who spend infinite amounts of hours trying to fix the thing; you show one instance of something going badly and then they need to fix all of them. And I think this is the true ingenuity, and this is why I kind of shift my gears now to more building defenses, because I feel this is not getting enough attention. We really need to solve this problem, we really need to unlock a mass amount of these applications, and unfortunately I think the incentives in academia are a little bit skewed; they are more after flashy articles rather than unlocking technology. And, uh, yeah, so this is why I really want to, like, build more defensive tooling. Good for you. Thank you. Thank you. Well, let's see if I fail, when I fail. We were talking about semantics earlier. So I think you've done some work basically proving that semantic censorship for language is impossible. And you related it to the halting problem. Yeah, I think this is a theoretical result, though. Like, I think all of this Turing machine magic is like, is a theoretical exercise. Some of the things are clearly, like, in the limit, impossible. But I have to say that I think models, especially modern models, change fundamentals of computing quite a bit. Because when we think about halting problem style problems, being able to tell whether a given program completes, I think it kind of doesn't really work in generality. It's impossible. But then, because today we kind of control what programs we write.
Like, if you can't reason about this program, just rewrite this, like change the semantics, write it in a slightly different way, reduce the amount of operations, reduce just the overall length of your program, and suddenly you're capable of reasoning a lot more about this. So basically what I'm trying to say is I think we have a lot more scope today to reason about computers, and a lot of things that previously seemed impossible maybe now are possible. So, for example, have you ever seen seL4? Do you know the seL4 secure kernel? This is like a fully human-verified kernel where there are no memory exploits at all existing inside, right? This is mathematically proven. It's proven, yeah. Like a fully proven system, like a fully verified system, right? And it took, like, I think, I may be wrong, 30 human years to verify the whole thing. You can find this in the articles. There is this one lab in the US that did it, right? There's very few teams in the world that can even do this. Yeah. Exactly. And this is because of 30 human years of effort. But now let's imagine we can replace actually even half of this thing with ML models, where for half of these Isabelle annotations, you can actually do this by hand, not by hand, but with agents significantly faster. Then suddenly verification is a lot easier to do, right? A lot of security paradigms, like if you've ever seen CHERI as a security paradigm, where you need to break down your code and rewrite this in compartments where independent pieces of code are kind of isolated together, usually for developers it's very hard because you need to, like, rewrite your whole program, and every time you change something in the logical flow you kind of need to redo this again, significant effort, right? But if the agents are doing this and you write like a backbone code and then it translates into a secure representation, and okay, one time it does the translation, it doesn't quite work, you do it again and again and again until it works, and you just check that semantics are preserved. So I think ML in many ways is shifting this burden in, like, adoption of secure technology. So I wouldn't be too surprised if we find that ML significantly improves security for us in this world, but we just don't know how to, like... Another example: if you look at, like, your iPhone or Android phone, like when you look at the permissions you give to the apps, many malicious apps ask for too much, like, right, radio, photos and stuff, right? So in the past people couldn't fragment them too much. Academic literature shows that the second you add, like, a lot of breakdown permissions inside, humans just get overwhelmed and just say accept all. But let's imagine that all those permissions are actually handled by the agent, or once you've given them, the agent just checks and says, oh, actually I think you're over-permissioning this thing, because clearly you're not using this feature, you don't need this. And then suddenly this limitation of humans, where too many options force them to take the insecure behavior, is no longer a thing, because you have an agent that kind of aids this part that humans were blind to otherwise. Well, I have the opposite problem, which is when an app asks me for stuff I don't understand, I don't install it. So unfortunately I just can't have many, you know, theoretically useful apps. But I think in your case you're an unusual person. Yeah, most of the people are not like this. Yeah, exactly, kind of some kind of Luddite or something. But, uh, I want to just push back on one thing
about the halting problem, because I find that it actually has very practical consequences. So, for example, when we were talking to the AlphaEvolve team: as part of the AlphaEvolve system, which is very interesting (I recommend people watch that episode), it goes and runs external verifiers, right? But the problem is you don't know if the external verifier is going to complete. So what do you do? Well, you have to put in some arbitrary computational budgets, thresholds: if it doesn't complete within five seconds, you just terminate it and look at other runs. While you can do that, I think it also introduces biases into the types of programs we can discover, because maybe if I had set my budget to seven seconds instead of five, I would have found a more optimal solution. So I think the halting problem, while theoretical, also has very important practical consequences when you sit down and try to run a program, right?

I don't think the problem with AlphaEvolve is the halting problem, right? It's more that today our programs have very weird semantics, where ahead of time we can't say all that much about them. For a lot of programs, if we could rewrite them in a slightly more suitable language, we could get a lot more out of them, right? The sort of reasoning you can get out of OCaml code is quite different from what you get if you're writing machine code straight away, or writing whatever else. I think we just don't know how to do a lot of this stuff yet. And in general, even if they were able to run it, let's say they have evolved a program that will theoretically take 10,000 hours to run, and you can tell it's going to take 10,000 hours to run: are they supposed to run it or not? That's part of the loop. I think it's more that nowadays we pay for a lot of this with time. Can we find an additional example? Of course we can; we just need to run it for longer, within whatever time budget we have. Whereas the halting problem is more: you have an infinite amount of time, you have an infinite amount of memory, can you reason about this? No, not really. But the sort of programs we're talking about with AlphaEvolve are not very large.

Well, no, but that's part of my point: they specifically chose problems for which they had verifiers with pretty predictable completion times. That's not the case for many important problems we care about in the sciences, right? So I'm just saying we're skewed a bit towards things that we already know a lot about, because we're faced with this fundamental problem that there are lots of programs you just can't predict, right?

Yeah, I'm not sure. I understand that in the limit it really matters; in full generality it really matters. But in practice, it's the same with antiviruses, right? Antiviruses in theory are also limited by the halting problem: in theory it should be impossible to look at something and tell what it does. At the same time, if you look at the amount of, excuse my French, shitty malware you can find, these antiviruses are totally useful against it; even a static check for the signature of a method is totally fine, right? Okay, you can't catch the polymorphic magical crypters, but the proportion of people using those is super tiny.
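A minimal sketch of the budget-and-terminate pattern described above for external verifiers: run the verifier as a subprocess and give up once a wall-clock threshold is exceeded. The five-second default echoes the discussion; the `run_verifier` helper and the toy candidates are illustrative assumptions, not AlphaEvolve's actual implementation.

```python
import subprocess
import sys

def run_verifier(verifier_cmd, budget_seconds=5.0):
    """Run an external verifier as a subprocess, giving up after a fixed wall-clock budget.

    Returns the verifier's exit code (0 meaning verified), or None if the budget ran out.
    """
    try:
        result = subprocess.run(
            verifier_cmd,
            capture_output=True,
            timeout=budget_seconds,  # the arbitrary threshold discussed above
        )
        return result.returncode
    except subprocess.TimeoutExpired:
        # We cannot tell "slow but correct" apart from "never halts", so the
        # candidate is simply discarded, which biases the search towards
        # programs whose verifiers happen to finish quickly.
        return None

if __name__ == "__main__":
    # Toy stand-ins for two evolved candidates: one verifies instantly,
    # one never halts and gets cut off at the budget.
    fast = [sys.executable, "-c", "print('ok')"]
    never_halts = [sys.executable, "-c", "while True: pass"]
    print(run_verifier(fast))              # 0
    print(run_verifier(never_halts, 5.0))  # None, after about five seconds
```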
Well, but I mean, come on, let's be honest. A lot of those are easy because virus creators like to sign their work, so you just look around for their sort of signature string and things like that.

Yeah, or for the fact that they need to play around with the system, so there are a lot of alternative signals, a lot of heuristic signals; think of them as priors, right? But it's the same for the sort of programs we're evolving. If you touch AlphaEvolve, you'll find it's significantly better at solving some things than others. I would still say that our limitation today is more compute. Imagine you had infinite compute, it was totally fine for this thing to run forever, and you increased your threshold up to the point where it doesn't really matter anymore; then you're no longer limited in that sense. I think it's more that we just need better hardware, we need things which are more efficient, we need things we can reason slightly better about, and then things will get better. And I'm sure tomorrow our models will be slightly better, and then things like AlphaEvolve will just skyrocket. It's definitely a new paradigm, a new kind of learning algorithm, and it's truly amazing. I am actually blown away; I'm super bullish.

So you've done red teaming, obviously, at Google DeepMind, and tell us about that, but more broadly: is there one thing that all of the frontier labs could implement that would improve the security of their models?

No, I don't think this exists today. We don't know how to solve these problems. I think the honest answer is that for most of the issues we have today we just don't have a solution, a one-size-fits-all solution. I promise you, incentives right now are such that security is a very expensive commodity. If I can convince you to trust my product more than somebody else's, that's great; this is why we're investing a lot of money in security. I think with ML, the actual issue is that before you develop security tooling, you really need to have something to secure, because every single small detail changes how you build security systems. So unless you know everything about the system and it's kind of frozen in time, you can't really build security. And by the time you have something to secure, it's already too late. This is not unique to ML; it was there before. If you look at the field of information security economics, it's always there. It's fundamental: first to the market wins, so everyone rushes to the market.
Early investment in security means you could have spent those resources getting to the market first and getting the network effect, so you kind of don't have incentives to build the security tooling first. And then if you look at statistics around company compromise, you'll find that it takes a number of compromises before a company fails, so they have a bit of time after they conquer the market to actually put the security tooling in place. So it's economic incentives, nothing more. But I have to say, I'm quite certain that if we knew how to solve this stuff, we would solve it; it's just that today we don't know how. In the same way, if you look at the 1990s, I promise you everyone wanted to make sure their systems were reliable; we just didn't know how to build that stuff.

We won't spend long on this because we've already filmed a whole episode about your model collapse paper in Nature.

I know there are a lot of folks criticizing it, saying it's theoretical, blah blah blah. But if you look around, we use a lot of synthetic data, yet we still use a lot of real data, and whenever we need to go and acquire data there is a massive market for acquiring very specialized data. You see that improvements still come from humans, and the cost of this data is growing for the extremely specialized stuff, like hiring a ton of mathematical PhDs.

Yeah. Could you add a bit of colour to that? You said there was some criticism; I remember now that there were some tweets going around about the paper. What was the story there?

Oh, people were basically saying... okay, I think there is a bit of misunderstanding about what the paper was saying, and maybe we are partly to blame for this, in the sense that when we talked about model collapse we referred to two phenomena happening at the same time. One of them was that the tails shrink: basically, improbable events become more improbable. The second phenomenon was that over time, as this accumulates, the model fails. Most of the criticism, I think, was of the second part, namely that you can actually easily detect when stuff fails and then just roll back. And I agree fully; this is totally fine. The other thing people were saying is: actually, it's quite simple, just accumulate more data, and whenever you generate synthetic data, just plug it in alongside, and then you still have the shape of the distribution in place. And I think the important thing to realize is that even in this case, when you do theoretical modelling with even the simplest model, it still drifts. It just doesn't drift that much. Is that a problem? Well, maybe not, maybe it is. If you have very good evals to check for these disappearing tails, it's probably not a problem. But the fundamental thing still remains in place: you need to preserve diversity. Just plugging in a ton of synthetic data is likely not going to give you much of a performance boost, and in practice you just need to be careful.

Ilia, why don't you set up for us what model collapse refers to?

We were trying to predict the future a little, in terms of how easy it will be to train models later. Because on one hand, you have this weird oracle from which you can gather as much data as you want;
on the other hand, when you do get data out of it, you don't know how realistic it is and how representative it is of the underlying world, right? So model collapse refers to the phenomenon that covers this recursive model training, where the data from a generation-zero model is used to develop generation one, and generation two, and generation three, and so on, with a small caveat (well, maybe a big caveat) that we're talking about a theoretical setup where we basically reuse all, or the majority, of the data that we sampled in the previous generation. We can derive quite a bit of theory for relatively simple setups showing that you're guaranteed to collapse to a representation where all of your tails, i.e. improbable events, disappear, and where all of the hallucinations and biases of the models get amplified. And then we also show in the paper that the same phenomenon, both of those things, happens for more sophisticated models, about which we can't really reason theoretically, but empirically we observe the same thing happening. There is now, by the way, a ton of literature about this, with people doing amazing things.

Yes. I mean, maybe for our audience too: part of the idea here is that more and more people are going to be producing more and more content that's generated by AI and putting it on the internet, which then goes back into the data corpus that future models are trained on. So everything's becoming more and more gen-AI. And I think there's another very worrying component to this, which is that there have been some publications, research, commentary, whatever, suggesting that as people become more reliant on gen-AI, the built-in human skills start to atrophy. We start developing less programming expertise because we don't need as many programmers anymore; maybe our mathematical expertise wanes;
writing becomes, you know, less of a skill set. And so then the amount of human-generated, tail, unique, intelligent content goes down even more. So it's not just that it's being overrun by a deluge of gen-AI content; legitimate human content is decreasing too. I mean, is it that kind of worrying?

I mean, yeah, but I don't know what's going to happen; I think it's anyone's guess.

It's clearly progressing towards a delta function.

I think it's clearly progress, though, I have to say. At least in my personal experience I find myself being able to do more: the amount of electronics I've ended up fixing myself, just because I can go and get a supervisor to tell me "oh, it's fine to do this", or I can figure out why the washing machine is broken, or where the fridge is broken and how to swap parts here and there; or the amount of datasheets you can read now that were previously completely impossible. Now you can just ask arbitrary questions about a datasheet written in Mandarin, where you can't even decipher the characters, and it tells you exactly what you need to plug where. It's truly increasing the quality of life, in areas that are maybe very specialized to my existence, so I'm not sure how general this is, but my life is definitely much better.

Why does model collapse happen? Is it just because there's lots of noise, or is there actually a deeper reason?

It's a deeper reason; I think it's more fundamental, in the sense that if you have a very stochastic process, and with a certain chance you sample things which are bad, then those things, as long as they're correlated, become amplified, right? So it's definitely a more fundamental statistical problem. But projecting this onto reality and real setups, I have to say our models are not going to collapse tomorrow. They're not going to get worse tomorrow, because we already have a checkpoint from yesterday; we can always just roll back, put more evaluations in place, and do it again and again until we see that something works. Is it going to be more expensive? Yeah, probably. Is it going to make it harder for competitors that don't have a copy of the internet in the garage to train models? Yeah, likely. But at the same time, I think it's a thing that makes us slower; it doesn't necessarily mean that tomorrow suddenly ML doesn't work, if that makes sense.

Yeah. Well, maybe one intuition that might help folks with model collapse, to take it to perhaps the simplest possible example: if you have any distribution, let's say just a Gaussian distribution over integers, and you sample it a bunch of times and average those samples, and then you sample that a bunch of times and average those, and that's your new distribution, the variance keeps getting less and less and less. It's like the law of large numbers, right? The more times you keep re-sampling and averaging the same thing, the smaller and smaller the variance of your average gets. And that's really what's happening; it's just generating content. I've often viewed LLMs, when I've been interacting with them, as though the answers they give you are the consensus answer, essentially: as if I were to somehow ask everybody and average the answers together, that's really the answer I get from the LLMs. So it's kind of giving you the average answer among all the coherent mungings-together of human writing, right?
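A minimal sketch of that shrinking-variance picture, assuming only NumPy: a toy Gaussian stand-in for the recursive generation-to-generation training described earlier, not the actual modelling from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy version of the recursive setup: each "generation" is a Gaussian fitted
# only to samples drawn from the previous generation's fitted Gaussian.
mu, sigma = 0.0, 1.0   # generation 0: the real data distribution
n_samples = 50         # a small sample size makes the drift easy to see

for generation in range(1, 101):
    data = rng.normal(mu, sigma, size=n_samples)  # synthetic data from generation t-1
    mu, sigma = data.mean(), data.std()           # refit generation t on it
    if generation % 10 == 0:
        print(f"gen {generation:3d}: mean={mu:+.3f}  std={sigma:.3f}")

# The fitted std decays across generations (the tails disappear) while the mean
# wanders away from 0 (sampling errors get baked in): the two effects described above.
```

With a small sample per generation the collapse shows up within a few dozen iterations; with larger samples it is slower, but the same drift is still there, which is roughly the point made above about needing to preserve diversity.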
Yeah. But at the same time, I have to say, humans are ranking those models. If you look at LMArena-style things, there is feedback from humans: "hey, I like this model more than that one." There is clearly a lot of human preference encoded in this. I don't know if your interactions are like this, but recently my model started placing emojis all over the place in its responses. This is triggering me, but I'm sure some folks love it.

Well, that's the average, you know; the bulk person wants emojis out there.

And this is why I'm saying I think my experience with models, and the improvements in the quality of my personal life, may not translate to others, because in my life models have made things significantly better, but I don't know how representative that is.

You know, your supervisor was Ross Anderson, and he passed away, didn't he?

Yeah, he did, recently.

Rest in peace.

I also only recently realized, when we organized an event for the gentleman to basically reflect back on a lot of his achievements, because he's been publishing and is credited with having created a number of different fields. His early work is in cryptography, obviously; one of his ciphers became a runner-up for the standard in cryptography. He's also credited with the security economics literature, with a lot of work on cybercrime, a lot of work on TEMPEST, a lot of work on banking security. We were reflecting on this, and only then did I realize that I met Ross at a very late stage of his life, in a sense: I saw him at this stage, and then I met the students from the previous generations, and their experience is wildly different; they were looking at very different problems. Yeah, he's done a lot.

And what would you say to people now going into ML security?

I think coming through a security background is better than coming through an ML background. If you start off as a security person who specializes in ML later, it's probably better, just because you learn the fundamentals of how we solved security problems before, and that intuition lets you think a lot more clearly about ML models. Because in essence, at least, modern ML models are like interpreters, and the language you give them, this human thing, human language, is kind of like a very high-level language, just not a programming language that is normally executed by other interpreters. If you start thinking about the models as interpreters and the language as the programming language, then suddenly you think about the whole thing very differently. Then you ask questions like: why do we expect the interpreter to provide security? If the programs we write are probabilistic, why do we expect to get deterministic outputs? Things like this, right? When you start thinking about them through this more formal computer-science sort of view, I think it's more productive than trying to say, "oh, actually, I'm going to distill all of the security thinking into the model and it's going to solve all the problems." Because I can give you a number of fundamental things, classical century-old dilemmas, that are definitely not going to be solved with models. Can they make some progress here and there? Yeah, sure.
But is it actually going to solve, like, I don't know, the confused deputy problem? I promise you the answer is no.

Awesome. Well, Ilia, thank you so much for joining us today.

It's been amazing.
