Last Week in AI

#214 - Gemini CLI, io drama, AlphaGenome, copyright rulings

Last Week in AI • Andrey Kurenkov & Jacky Liang

Friday, July 4, 2025 • 1h 33m

What You'll Learn

  • Google launched Gemini CLI, a command-line interface tool that leverages large language models for software engineering tasks
  • Gemini CLI is Google's answer to Anthropic's Claude Code, with some differences in capabilities and usage models
  • The hosts discuss the rapid progress of large language models and their integration into various workflows and applications
  • Anthropic has released the ability to publish interactive web apps, or 'artifacts', that can leverage Claude's AI capabilities
  • The episode highlights the ongoing evolution of AI agents and the increasing integration of these technologies into software engineering and application development

Episode Chapters

1

Introduction

The hosts introduce the episode and preview the topics to be discussed, including tools and apps, OpenAI drama, and research advancements.

2

Gemini CLI

The hosts discuss the launch of Google's Gemini CLI, a command-line interface tool that leverages large language models for software engineering tasks, and compare it to Anthropic's Claude Code.

3

Anthropic Artifact Publishing

The hosts cover Anthropic's new feature that allows users to build interactive web apps, or 'artifacts', that can leverage Claude's AI capabilities.

AI Summary

This episode of the Last Week in AI podcast covers a range of AI-related news and developments, including the launch of Google's Gemini CLI, a command-line interface tool that leverages large language models for software engineering tasks. The hosts discuss the capabilities and limitations of Gemini CLI compared to Anthropic's Claude Code, as well as the broader trend of AI agents being integrated into various workflows and applications. The episode also touches on Anthropic's new artifact publishing feature, which allows users to build interactive web apps that leverage Claude's AI capabilities.

Key Points

  1. Google launched Gemini CLI, a command-line interface tool that leverages large language models for software engineering tasks
  2. Gemini CLI is Google's answer to Anthropic's Claude Code, with some differences in capabilities and usage models
  3. The hosts discuss the rapid progress of large language models and their integration into various workflows and applications
  4. Anthropic has released the ability to publish interactive web apps, or 'artifacts', that can leverage Claude's AI capabilities
  5. The episode highlights the ongoing evolution of AI agents and the increasing integration of these technologies into software engineering and application development

Topics Discussed

#Large language models #AI agents #Software engineering tools #Application development #Anthropic Claude

Frequently Asked Questions

What is "#214 - Gemini CLI, io drama, AlphaGenome, copyright rulings" about?

This episode of the Last Week in AI podcast covers a range of AI-related news and developments, including the launch of Google's Gemini CLI, a command-line interface tool that leverages large language models for software engineering tasks. The hosts discuss the capabilities and limitations of Gemini CLI compared to Anthropic's Claude Code, as well as the broader trend of AI agents being integrated into various workflows and applications. The episode also touches on Anthropic's new artifact publishing feature, which allows users to build interactive web apps that leverage Claude's AI capabilities.

What topics are discussed in this episode?

This episode covers the following topics: Large language models, AI agents, Software engineering tools, Application development, Anthropic Claude.

What is key insight #1 from this episode?

Google launched Gemini CLI, a command-line interface tool that leverages large language models for software engineering tasks

What is key insight #2 from this episode?

Gemini CLI is Google's answer to Anthropic's Claude Code, with some differences in capabilities and usage models

What is key insight #3 from this episode?

The hosts discuss the rapid progress of large language models and their integration into various workflows and applications

What is key insight #4 from this episode?

Anthropic has released the ability to publish interactive web apps, or 'artifacts', that can leverage Claude's AI capabilities

Who should listen to this episode?

This episode is recommended for anyone interested in Large language models, AI agents, Software engineering tools, and those who want to stay updated on the latest developments in AI and technology.

Episode Description

Our 214th episode with a summary and discussion of last week's big AI news! Recorded on 06/27/2025. Hosted by Andrey Kurenkov and Jeremie Harris. Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai. Check out our text newsletter and comment on the podcast at https://lastweekin.ai/.

In this episode: Meta's hiring of key engineers from OpenAI, and Thinking Machines Lab securing a $2 billion seed round at a $10 billion valuation. DeepMind introduces AlphaGenome, significantly advancing genomic research with a model comparable to AlphaFold but focused on gene function. Taiwan imposes technology export controls on Huawei and SMIC, while Getty drops key copyright claims against Stability AI in a groundbreaking legal case. A new research paper examines cognitive debt in AI tasks, utilizing EEG to assess cognitive load and recall in essay writing with LLMs.

Timestamps + Links:

  • (00:00:10) Intro / Banter
  • (00:01:22) News Preview
  • (00:02:15) Response to listener comments

Tools & Apps
  • (00:06:18) Google is bringing Gemini CLI to developers' terminals
  • (00:12:09) Anthropic now lets you make apps right from its Claude AI chatbot

Applications & Business
  • (00:15:54) Sam Altman takes his 'io' trademark battle public
  • (00:21:35) Huawei Matebook Contains Kirin X90, using SMIC 7nm (N+2) Technology
  • (00:26:05) AMD deploys its first Ultra Ethernet ready network card — Pensando Pollara provides up to 400 Gbps performance
  • (00:31:21) Amazon joins the big nuclear party, buying 1.92 GW for AWS
  • (00:33:20) Nvidia goes nuclear — company joins Bill Gates in backing TerraPower, a company building nuclear reactors for powering data centers
  • (00:36:18) Mira Murati's Thinking Machines Lab closes on $2B at $10B valuation
  • (00:41:02) Meta hires key OpenAI researcher to work on AI reasoning models

Research & Advancements
  • (00:49:46) Google's new AI will help researchers understand how our genes work
  • (00:55:13) Direct Reasoning Optimization: LLMs Can Reward And Refine Their Own Reasoning for Open-Ended Tasks
  • (01:01:54) Farseer: A Refined Scaling Law in Large Language Models
  • (01:06:28) LLM-First Search: Self-Guided Exploration of the Solution Space

Policy & Safety
  • (01:11:20) Unsupervised Elicitation of Language Models
  • (01:16:04) Taiwan Imposes Technology Export Controls on Huawei, SMIC
  • (01:18:22) Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task

Synthetic Media & Art
  • (01:23:41) Judge Rejects Authors' Claim That Meta AI Training Violated Copyrights
  • (01:29:46) Getty drops key copyright claims against Stability AI, but UK lawsuit continues

See Privacy Policy at https://art19.com/privacy and California Privacy Notice at https://art19.com/privacy#do-not-sell-my-info.

Full Transcript

Hello and welcome to the Last Week in AI podcast where you can hear us chat about what's going on with AI. As usual in this episode, we will summarize and discuss some of last week's most interesting AI news, and you can check out the episode description for the links to that and the timestamps. I'm one of your regular hosts, Andrey Kurenkov. I studied AI in grad school and now work at a generative AI startup. And I'm your other host, Jeremie Harris, co-founder of Gladstone AI, AI National Security, blah, blah, blah, as you know. And I'm the reason this podcast is going to be an hour and a half and not two hours. Andrey is very patiently waiting for like half an hour while I just sorted that out. My daughter's been teething and it's wonderful having a daughter, but sometimes teeth come in six or eight in a shot and then you have your hands full. And so she is the greatest victim of all this. But Andrey is a close second because, boy, that was – I kept saying five more minutes and it never happened. So I appreciate the patience, Andrey. I got an extra half hour to prep, so I'm not complaining. And I'm pretty sure you had a rougher morning than I did. I was just drinking coffee and waiting, so not too bad. But speaking of this episode, let's do a quick preview. It's going to be, again, kind of less of a major news week. Some somewhat decently big stories, tools and apps. Gemini CLI is a fairly big deal. Applications and business. We have some fun OpenAI drama and a whole bunch of hardware stuff going on. And not really any major open source stuff this week. So we'll be skipping that. Research and advancements. Exciting new research from DeepMind, and just various papers about scalable reasoning, reinforcement learning, all that type of stuff. Finally, in policy and safety, we'll have some more interpretability, safety, China stories, the usual, and some pretty major news about copyright following up on what we saw last week. So that actually would be one of the highlights of this episode towards the end. Before we get to that, I do want to acknowledge a couple of reviews on Apple Podcasts as we do sometimes. Thank you to the kind reviewers leaving us some very nice comments. Also some fun ones. I like this one. This reviewer said, I want to hear a witty and thoughtful response on why AI can't do what you're doing with the show. And wow, you're putting me on the spot being both witty and thoughtful. And it did make me think. I will say I did try NotebookLM a couple months ago, right? And that's the podcast generator from Google. It was good, but definitely started repeating itself. I found that LLMs still often have this issue of losing track of where they're at, like 10 minutes, 20 minutes in, repeating themselves or just otherwise. And also Andrey, and repeating themselves too. And they'll just keep saying the same thing and repeating over and over, like they'll repeat and repeat a lot. So yeah. That kind of repetition was solved a couple of years ago, thankfully. Honestly, you could do a pretty good job replicating Last Week in AI with LRMs these days. I'm not going to lie, but you're going to have to do very precise prompting to get our precise personas and personalities and voices and so on. So I don't know. Hopefully, we're still doing a better job than AI could do, or at least doing a different job than the more generic kind of outcomes you could get trying to elicit AI to make an AI news podcast.
Dude, and what AI could compete with starting 30 minutes late because its daughter's teething? Like, I challenge you right now, try it. You're not going to find an AI that can pull that off. You can have AI that says it does, but will the emotion of that experience actually be in it? I don't think so. I think that's the copium, right? People are often like, oh, it won't have the heart. It won't have, like, the soul, you know, the podcast. It will. It will. In fact, I think arguably our job is to surface for you the moment that that is possible, so that you can stop listening to us. One of the virtues of not being like a full-time podcaster on this too is we have that freedom maybe more than we otherwise would. But man, I mean, it's, I would expect within the next 18 months, hard to imagine that there won't be something comparable. But then your podcast hosts won't have a soul. They'll be inside a box. Well, in fact, I'm certain. I believe as of quite a while ago, there are already AI-generated AI news podcasts out there. I haven't checked them out, but I'm sure they exist. And nowadays, they're probably quite good. And you get one of those every day as opposed to once a week, and they're never a week behind. So in some ways, definitely superior to us. But in other ways, can they be so witty and thoughtful in responding to such a question? I don't know. In fact, can they be so lacking in wit and thought as we can be sometimes? That's right. That's a challenge. They'll never out-compete our stupid. Yes, as is true in general. I guess you'd have to really try to get AI to be bad at things when it's actually good. Anyways, a couple more reviews lately. So I do want to say thank you. Another one called this the best AI podcast, which is quite the honor, and says that this is the only one they listen to at normal speed; most of the other podcasts are played at 1.5 or 2x speed. So good to hear we are using up all our two hours at a good pace. That's right. Funny, a while ago there was a review that was like, I always speed up through Andrey's talking and then have to listen at normal speed for Jeremie. So maybe I've sped up since then. So yeah, as always, thank you for the feedback. And thank you for questions that you bring in. I think it's a fun way to start the show. But now let's go into the news, starting with tools and apps. And the first story is, I think, one of the big ones of this week, Gemini CLI. So this is essentially Google's answer to Claude Code. It is a thing you can use in your terminal, which for any non-programmers out there is just the text interface to working on your computer. So you can look at what files there are, open them, read them, type stuff, et cetera, all via a non-UI interface. And now this CLI is Gemini in your terminal. And it has the same sorts of capabilities at a high level as Claude Code. So it's an agent and you launch it and you tell it what you want it to do and it goes off and does it. And it sort of takes turns between it doing things and you telling it to follow up, to change what it's doing or to check what it's doing, etc. With this launch, Google is being pretty aggressive, giving away a lot of usage, 60 model requests per minute and 1,000 requests per day. It's a very high allowance as far as caps. And there's also a lot of usage for free without having to pay. I'm not sure if that is the cap for free, but for now, you're not going to have to pay much.
I'm sure sooner or later, you get to the Claude Code type of model where to use Claude Code at the highest level, you have to pay $200 per month or $100 a month, which is what we at our company already do because Claude Code is so useful. From what I've seen in conversations online, the vibe, the eval, is that this is not quite as good as Claude Code. It isn't as capable of software engineering, of using tools, just generally figuring things out as it goes. But it was just released, could be a strong competitor soon enough. Yeah, I'm still amazed at how quickly we've gotten used to the idea of a million token context window, by the way, because this is powered by Gemini 2.5 Pro, the reasoning model. And that's part of what's in the back end here. So that's going to be the reason also that it doesn't quite, you know, live up to the Claude standard, which is obviously a model that, I don't know, just seems to work better with code. I'm curious about when that changes, by the way, and what Anthropic's actual recipe is. Like, why is it working so well? We don't know, obviously, but someday, maybe after the singularity, when we're all one giant hive mind, we'll know what actually was going on to make the Claude models this good and persistently good. But in any case, yeah, it's a really impressive play. The advantage that Google has, of course, over Anthropic currently is the availability of just a larger pool of compute. And so when they think about driving costs down, that's where you see them trying to compete on that basis here as well. So a lot of free prompts, a lot of free tokens, I should say, good deals on the token counts that you put out. So, you know, it's one way to go. And I think as the ceiling rises on the capabilities of these models, eventually cost does become a more and more relevant thing for any given fixed application. So that's an interesting dynamic, right? The frontier versus the fast followers. I don't know if it's quite right to call Google a fast follower. They're definitely doing some frontier stuff. But anyway, yeah, so interesting next move here. Part of the productionization, obviously, of these things and entering workflows in very significant ways. I think this is heading in slow increments towards a world where agents are doing more and more and more. And context windows, coherence lengths are all part of that. Right. Yeah, we discussed last year, like towards the beginning of last year, there was a real kind of hype train for agents and the agentic future. And I think Claude Code and Gemini CLI are showing that we are definitely there. In addition to things like Replit, Lovable, broadly speaking, LLMs have gotten to a point, partially because of reasoning, partially presumably just due to improvements in LLMs, where you can use them in agents and they're very successful. From what I've seen, part of the reason Claude Code is so good is not just Claude. It's also that Claude Code, particularly the agent, is very good at using tools. It's very good at doing text search, text replacement. It's very keen on writing tests and running them as it's doing software engineering. So it is a bit different than just thinking about an LLM. It's the whole sort of suite of what the agent does and how it goes about its work that makes it so successful. And that's something you don't get out of the box with LLM training, right? Because tool usage is not in your pre-training data. It's something kind of on top of it.
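To make the "agent that uses tools" idea a bit more concrete, here is a hypothetical sketch of the take-turns loop the hosts describe. It is not Claude Code's or Gemini CLI's actual implementation (neither is public); the call_model function, the tool set, and the message format are made-up stand-ins.

```python
# Hypothetical sketch of the agentic coding loop described above. The model
# proposes a tool call (search / edit / run tests), the harness executes it,
# and the observation is fed back until the model declares it is done.
# `call_model`, the tool names, and the message format are made-up stand-ins,
# not any vendor's real API.
import subprocess
from pathlib import Path

def search(pattern: str) -> str:
    """Very simplified text-search tool: list Python files containing a pattern."""
    hits = [str(p) for p in Path(".").rglob("*.py") if pattern in p.read_text(errors="ignore")]
    return "\n".join(hits) or "no matches"

def replace(path: str, old: str, new: str) -> str:
    """Literal text replacement in a single file."""
    f = Path(path)
    f.write_text(f.read_text().replace(old, new))
    return f"edited {path}"

def run_tests(args: str = "") -> str:
    """Run the test suite and return the tail of its output."""
    result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    return (result.stdout + result.stderr)[-2000:]

TOOLS = {"search": search, "replace": replace, "run_tests": run_tests}

def agent_loop(task: str, call_model, max_turns: int = 20) -> str:
    """Take turns: model proposes an action, harness executes it, result goes back."""
    history = [{"role": "user", "content": task}]
    for _ in range(max_turns):
        action = call_model(history)  # e.g. {"tool": "search", "args": {"pattern": "TODO"}}
        if "final_answer" in action:
            return action["final_answer"]
        observation = TOOLS[action["tool"]](**action["args"])
        history.append({"role": "tool", "content": observation})
    return "stopped: turn limit reached"
```

The point of the sketch is that the quality of the search / edit / test tools and the loop around them matters as much as the underlying model, which is the hosts' argument for why Claude Code feels stronger than the raw LLM alone.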
So that is yet another thing, similar to reasoning, where we are now going beyond the regime of you can just train on tons of data from the internet and get it for free; more and more things, in addition to alignment, now you need to add on top, beyond just throwing a million gigabytes of data at it. It really is a system, right? Like at the end of the day, it's also not just one model. A lot of people have this image of, like, you know, there's one monolithic model in the back end. I assume that there's a lot of, like, models choosing which models answer a prompt, and I'm not even talking about MoE stuff, like just literal software engineering in the backend that makes these things have the holistic feel that they do. So, yeah. FYI, by the way, I didn't remember this, so I looked it up. CLI stands for command line interface, command line, another term for terminal. So again, for any non-programmers, fun detail. And speaking of Claude Code, the next story is about Anthropic and they have released the ability to publish artifacts. So artifacts are these little apps, essentially, you can build within Claude. You get a preview and they're interactive web apps, more or less. And as with some other ones, I believe, Google allows you to publish Gems, is what they call it. Now you can publish your artifacts and other people can browse them. They also added the support for building apps with AI built in, with Claude being part of the app. So now if you want to build, like, a language translator app within Claude, you can do that, because the app itself can query Claude to do the translation. So, you know, not a huge delta from just having artifacts, but another sort of, seemingly, a trend where all the LLM providers tend to wind up at similar places as far as what you add, things like artifacts, when you make it easy to share what you build. And, you know, it's something that anyone can do. Users on the Free, Pro, and Max tiers can share, and they'll be interested to see what people build. And if I'm Replit, I'm getting pretty nervous looking at this. Granted, obviously Replit has, so Replit, right, that platform that lets you essentially, like, launch an app really easily, abstracts away all the, like, server management and stuff. And like you've got kids launching games and all kinds of useful apps and learning to code through it. Really, really powerful tool and super, super popular. I mean, it's 10X year over year. It's growing really fast, but you can start to see the frontier moving more and more towards, let's make it easier and easier at first for people to build apps. So we're going to have, you know, an agent that just writes the whole app for you or whatever, and just produces the code. But at what point does it naturally become the next step to say, well, let's do the hosting. Let's abstract away all the things. You could see OpenAI, you could see Anthropic launching a kind of app store. That's not quite the right term, right? Because we're talking about more fluid apps, but moving more in that direction, hosting more and more of it, and eventually getting to the point where you're just asking the AI company for whatever high level need you have, and it'll build the right apps or whatever. That's not actually that crazy sounding today. And again, that swallows up a lot of the Replit business model, and it'll be interesting to see how they respond. Yeah, and this is particularly true because of the converging or parallel trend of these model context protocols that make it easy for AI to interact with other services.
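A quick concrete aside on the "app that can query Claude" point above: inside a published artifact the call goes through Claude's built-in bridge rather than your own API key, so the snippet below is not the artifact API itself. It is just a minimal sketch of the same pattern using the Anthropic Python SDK, and the model ID is an assumption you would swap for whichever Claude model you actually have access to.

```python
# Minimal sketch of an "app that queries Claude" (here, a tiny translator),
# using the Anthropic Python SDK. Assumes ANTHROPIC_API_KEY is set in the
# environment; the model ID below is an assumption, not a recommendation.
import anthropic

client = anthropic.Anthropic()

def translate(text: str, target_language: str = "French") -> str:
    """Ask Claude to translate a snippet of text."""
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=300,
        messages=[{
            "role": "user",
            "content": f"Translate the following into {target_language}, reply with the translation only:\n{text}",
        }],
    )
    return response.content[0].text

if __name__ == "__main__":
    print(translate("Last week in AI was a busy week."))
```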
So now if you want to make an app that talks to your calendar, talks to your email, talks to your Google Drive, whatever you can think of, basically any major tool you're working with, AI can integrate with it easily. So if you want to make an app that does something with connection to tools that you use, you could do that within Claude. So as you said, I think both Replit and Lovable are these emerging titans in the world of building apps with AI. And I'm sure they'll have a place in the kind of domain of more complex things where you need databases and you need authentication and so on and so on. But if you need to build an app for yourself or for maybe just a couple of people to speed up some process, you can definitely do it with these tools now and share them if you want. And on to applications and business, as promised, kicking off with some OpenAI drama, which we haven't had in a little while. So good to see it isn't ending. This time, it's following up on this io trademark lawsuit that happened. We covered it last week. We had OpenAI, Sam Altman, announce the launch of this io initiative with Jony Ive. And there's another AI audio hardware company called iyO, spelled differently, I-Y-O instead of I-O. And they sued, alleging that, you know, they stole the idea and also the trademark. The names sound very similar. And yeah, Sam Altman hit back, decided to publish some emails. Just a screenshot of emails showing the founder of iyO, let's say, being very friendly, very enthusiastic about meeting Sam Altman and wanting to be invested in by OpenAI. And the basic gist of what Sam Altman said is this founder, Jason Rugolo, who filed the lawsuit, was kind of persistent in trying to get investments from Sam Altman. In fact, he even reached out in March prior to the announcements with Jony Ive. And apparently Sam Altman let him know that the competing initiative he had was called io. So definitely, I think, an effective pushback on the lawsuit, similar in a way to what OpenAI also did with Elon Musk. Just like, here's the evidence. Here's the receipts of your emails. Not too sure if what you're saying is legit. This is becoming, well, two is not yet a pattern, is it? Is it three? I forget how many it takes to make a pattern, they say. Then again, I don't know who they are or why they're qualified to tell us it's a pattern. But, yeah, this is an interesting situation. One interesting detail kind of gives you maybe a bit of a window into how the balance of evidence is shaping up so far. We do know that in the lawsuit, iyO, so not io, but iyO. I was going to say Jason Derulo. Jason Rugolo's company did end up – sorry, where was it? They were actually – yeah, they were granted a temporary restraining order against OpenAI using the io branding themselves. So OpenAI was forced to change the io branding due to this temporary restraining order, which was part of iyO's trademark lawsuit. So at least at the level of the trademark lawsuit, there has been an appetite from the courts to put in this sort of preliminary temporary restraining order. I'm not a lawyer, so I don't know what the standard of proof would be that would be involved in that. So at least at a trademark level, maybe it's like sounds vaguely similar enough. So, yeah, for now, let's tell OpenAI they can't do this. But there's enough fundamental differences here between the devices that you can certainly see OpenAI's case for saying, hey, this is different. They claim that the io hardware is not an in-ear device at all.
It's not even a wearable. That's where that information comes from that was itself doing the rounds. This big deal, OpenAI's new device is not actually going to be a wearable after all. But we do know that apparently, so Rugolo was trying to pitch a bunch of people about their idea, about the io concept, sorry, the iyO concept, way back in 2022, sharing information about it with former Apple designer Evans Hankey, who actually went on to co-found io. So there's a lot of overlap here. The claim from OpenAI is, look, you've been working on it since 2018. You've demoed it to us. It wasn't working. There were these flaws. Maybe you fixed them since, but at the time it was a janky device. So that's why we didn't partner with you. But then you also have this whole weird overlap where, yeah, some of the founding members of the io team had apparently spoken directly to iyO before. So it's pretty messy. I think we're going to learn a lot in the court proceedings. I don't think these emails give us enough to go on to make a firm determination about what, because we don't even know what the hardware is. And that seems to be at the core of this. So what is the actual hardware and how much of it did OpenAI, did LoveFrom, did io actually see? Right. And in the big scheme of things, this is probably not a huge deal. This is a lawsuit saying you can't call your thing io because it's too similar to our thing, iyO. And it's also seemingly some sort of wearable AI thing. So worst case, presumably the initiative by Sam Altman and Jony Ive changes. I think more than anything, this is just another thing to track with OpenAI, right? Another thing that's going on that for some reason, right, we don't have these kinds of things with Anthropic or Mistral or any of these other companies. Maybe because OpenAI is the biggest, there just tends to be a lot of this, you know, in this case, legal business drama, not interpersonal drama, but nevertheless, a lot of headlines and honestly, juicy kind of stuff to discuss. Yeah, yeah, yeah. Yeah. So another thing going on and another indication of the way that Sam Altman likes to approach these kinds of battles in a fairly public and direct way. Up next, we have Huawei Matebook contains Kirin X90, using SMIC 7nm (N+2) technology. If you're a regular listener of the podcast, you're probably going, oh, my God. And then, or maybe you are, I don't know, this is maybe a little in the weeds, but either way, you might want a refresher on what the hell this means, right? So there was a bunch of rumors actually floating around that Huawei had cracked, sorry, that SMIC, which is China's largest semiconductor foundry or most advanced one, you can think of them as being China's domestic TSMC. There's a bunch of rumors circulating about whether they had cracked the five nanometer node, right? That critical node that is what was used, or a modified version of it was used, to make the H100 GPU, the NVIDIA H100. So if China were to crack that domestically, that'd be a really big deal. Well, those rumors now are being squashed because this company, which is actually based in Canada, did an assessment. So TechInsights, we've actually talked a lot about their findings, sometimes while mentioning them by name, sometimes not. We really should. TechInsights is a very important firm in all this. They do these teardowns of hardware. They'll go in deep and figure out, oh, what manufacturing process was used to make this component of the chip, right? That's the kind of stuff they do.
And they were able to confirm that, in fact, the Huawei Kirin X90, a system on a chip, was actually not made using 5nm equivalent processes, but rather using the old 7nm process that we already knew SMIC had. So that's a big, big deal from the standpoint of their ability to onshore domestically GPU fabrication and keep up with the West. So it seems like we're, like, two years down the road now from when SMIC first cracked the 7 nanometer node and we're still not on the 5 nanometer node yet. That's really, really interesting. And so worth saying, like, Huawei never actually explicitly said that this new PC had a 5 nanometer node. There was just a bunch of rumors about it. So what we're getting now is just kind of the decisive quashing of that rumor. Right. Right. And broader context here is, of course, that the U.S. is preventing NVIDIA from selling top-of-the-line chips to Chinese companies. And that does limit the ability of China to create advanced AI. They are trying to get the ability domestically to produce chips competitive with NVIDIA. Right now, they're, let's say, about two years behind, is my understanding. And this is one of the real bottlenecks: if you're not able to get the state-of-the-art fabrication process for chips, there's just less compute you can get on the same amount of chip, right? It's just less dense. And this arguably is the hardest part, right? To get this thing, it takes forever, as you said, two years with just this process, and it is going to be a real blocker if they're not able to crack it. Yeah, the fundamental issue China is dealing with is because they have crappier nodes, so they can't fab the same quality of nodes as TSMC, they're forced to either steal TSMC fabbed nodes or find clever ways of getting TSMC to fab their designs, often by using subsidiaries or shell companies to make it seem like they're, you know, maybe we're coming in from Singapore and asking TSMC to fab something, or we're coming in from a clean Chinese company, not Huawei, which is blacklisted. And then the other side is because their alternative is to go with these crappier seven nanometer process nodes, those are way less energy efficient. And so the chips burn hotter, or they run hotter rather, which means that you run into all these kinds of heat-induced defects over time. And we covered that, I think last or two episodes ago, last episode I was on. So anyway, there's a whole kind of hairball of different problems that come from ultimately the fact that SMIC has not managed to keep up with TSMC. Right. And you're seeing all these 10 billion, 20 billion dollar data centers being built. Those are being built with, you know, racks and racks and huge amounts of GPUs. The way you do it, the way you supply energy, the way you cool it, etc. All of that is conditioned on the hardware you have in there. So it's very important to ideally have the state of the art to build with. Next story also related to hardware developments, this time about AMD, and they now have an Ultra Ethernet ready network card, the Pensando Pollara, which provides up to 400 gigabits per second, is that it? Per second performance. And this was announced at their Advancing AI event. It will be actually deployed by Oracle Cloud with the AMD Instinct MI355X GPUs and the network card. So this is a big deal because AMD is trying to compete with NVIDIA on the GPU front and their series of GPUs does seem to be catching up, or at least has been shown to be quite usable for AI.
This is another part of the stack, the inter-chip communications, but it's very important and very significant in terms of what NVIDIA is doing. Yeah, 100%. This is, by the way, the industry's first Ultra Ethernet compliant NIC, so a network interface card. So what the NIC does, you've got, and you can go back to our hardware episode to kind of see more detail on this, but in a rack, say at the rack level, at the pod level, you've got all your GPUs that are kind of tightly interconnected with accelerator interconnect. This is often, like, the NVIDIA product for this is NVLink. This is super low latency, super expensive interconnect. But then if you want to connect, like, pods to other pods or racks to other racks, you're now forced to hop through a slower interconnect, part of what's known sometimes as the back-end network. And when you do that, the NVIDIA solution you'll tend to use for that is InfiniBand, right? So you've got NVLink for the, really, like, within a pod, but then from pod to pod you have InfiniBand, and InfiniBand has been a go-to, de facto, like, kind of gold standard in the industry for a while. Companies that aren't NVIDIA don't like that, because it means that NVIDIA owns more of the stack and has an even deeper kind of de facto monopoly on different components. And so you've got this thing called the Ultra Ethernet Consortium that came together. It was founded by a whole bunch of companies, AMD, notably Broadcom, I think Meta and Microsoft were involved, Intel. And they came together and said, hey, let's come up with an open source standard for this kind of interconnect with AI-optimized features that basically can compete with the InfiniBand model that NVIDIA has out. So that's what Ultra Ethernet is. It's been in the works for a long time. We've just had the announcement of specification 1.0 of that Ultra Ethernet protocol. And that's specifically for hyperscale AI applications and data centers. And so this is actually a pretty seismic shift in the industry. And there are actually quite interesting indications that companies are going to shift from InfiniBand to this sort of protocol. And one of them is just cost economics. Like Ethernet has massive economies of scale already across the entire networking industry. And InfiniBand is more niche. So as a result, you kind of have Ultra Ethernet chips and, like, switches that are just so much cheaper. So you'd love that. You also have vendor independence. You have, because it's an open standard, anyone can build to it instead of just having NVIDIA own the whole thing. So the margins go down a lot and people really, really like that. Obviously all kinds of operational advantages. It's just operationally more simple because data centers already know Ethernet and how to work with it. So anyway, this is a really interesting thing to watch. I know it sounds boring. It's the interconnect between different pods in a data center. But this is something that executives at the top labs really sweat over because there are issues with the InfiniBand stuff. This is one of the key rate limiters in terms of how big models can scale. Right, yeah. To give you an idea, Oracle is apparently planning to deploy these latest AMD GPUs with a zettascale AI cluster with up to 131,072 Instinct MI355X GPUs. So when you get to those numbers, like, think of it, 131,000 GPUs. GPUs aren't small, right? The GPUs are pretty big. They're not like a little chip. They're, I don't know, like notebook-sized-ish.
And there's now 131,000 that you need to connect all of them. And when you say pod, right, typically you have this rack of them, like almost a bookcase, you can think, where you connect them with wires, but you can only get, I don't know how many, typically 64 or something on that order. When you get to 131,000, this kind of stuff starts really mattering. And in their slides, in this event, they did, let's say, very clearly compare themselves to the competition. They said that this has 20x scale over InfiniBand, whatever that means, has performance of 20% over the competition, stuff like that. So AMD is very much trying to compete and be offering things that are in some ways ahead of NVIDIA and others like Broadcom and so on. And next up, another hardware story, this time dealing with energy. Amazon is joining the big nuclear party by buying 1.92 gigawatts of electricity from Talen Energy's Susquehanna nuclear plant in Pennsylvania. So nuclear power for AI, it's all the rage. Yeah. I mean, so we've known about, if you flip back, originally this was the 960 megawatt deal that they were trying to make. And that got killed by regulators who were worried about customers on the grid. So essentially everyday people who are using the grid, who would, in their view, unfairly shoulder the burden of running the grid. Today, Susquehanna powers the grid. And that means every kilowatt hour that they put in leads to transmission fees that support the grid's maintenance. And so what Amazon was going to do was going to go behind the meter, basically link the power plant directly to their data center without going through the grid. So there wouldn't be grid fees. And that basically just means that the general kind of grid infrastructure doesn't get to benefit from those fees over time, sort of like not paying toll when you go on a highway. And this new deal that gets us to 1.92 gigawatts is a revision in that. It's got Amazon basically going in front of the meter, going through the grid in the usual way. There's going to be, as you can imagine, a whole bunch of infrastructure that needs to be reconfigured, including transmission lines. Those will be done in spring of 2026. And the deal apparently covers energy purchased through 2042, which is sort of amusing because, like, imagine trying to pick people up at a time. But yeah. I guess they are predicting that they'll still need electricity by 2042, which, assuming X-risk doesn't come about, I suppose that's fair. Yeah. Yeah. Next story, also dealing with nuclear and dealing with NVIDIA. It is joining Bill Gates and others in backing TerraPower, a company building nuclear reactors for powering data centers. So this is through NVIDIA's venture capital arm, NVentures. And they have invested in this company, TerraPower, investing, it seems like, $650 million alongside HD Hyundai. And TerraPower is developing a 345 megawatt Natrium plant in Wyoming right now. So they're, I guess, in the process of starting to get to a point where this is usable, although it probably won't come for some years. Your instincts are exactly right on the timing too, right? So there's a lot of talk about SMRs, like small modular reactors, which are just a very efficient way and very safe way of generating nuclear power on site. That's the exciting thing about them. They are the obvious, apart from like fusion, they are the obvious solution of the future for powering data centers.
The challenge is when you talk to data center companies and builders, they'll always tell you like, yeah, SMRs are great, but we're looking at first approvals, first SMRs generating power, like, at the earliest, like 2029, 2030 type thing. So if you have sort of shorter AGI timelines, they're not going to be relevant at all for those. If you have longer timelines, even kind of somewhat longer timelines, then they do become relevant. So it's a really interesting space where we're going to see a turnover in the kind of energy generation infrastructure that's used. And this, you know, people talk a lot about China and their energy advantage, which is absolutely true. I'm quite curious whether this allows the American energy sector to do a similar leapfrogging on SMRs that China did, for example, on mobile payments, right? When you just, like, do not have the ability to build nuclear plants in less than 10 years, which is the case for the United States. We just don't have that know-how and, frankly, the willingness to deregulate to do it and the industrial base. Then it kind of forces you to look at other options. And so if there's a shift just in the landscape of power generation, it can introduce some opportunities to play catch up. So I guess that's a hot take there that I haven't thought enough about, but that's an interesting dimension anyway to the SMR story. By the way, one gigawatt, apparently equivalent to 1.3 million horsepower. So not sure if that gives you an idea of what a gigawatt is, but it's a lot of energy. Or one gigawatt is a lot. Yeah. One million homes for one day, or what does that actually mean? So a gigawatt is a unit of power. So it's like the amount of power that a million homes just consume on a running basis. Yeah, exactly. So one gigawatt is a lot. So it's 345 megawatts. Now moving on to some fundraising news. Mira Murati's company Thinking Machines Lab has finished up their fundraising, getting two billion dollars at a 10 billion dollar valuation, and this is the seed round. So yet another billion-dollar seed round. And this is, of course, the former CTO of OpenAI, who left in 2024, I believe, and has been working on setting up Thinking Machines Lab, another competitor in the AGI space, presumably planning to train their own models, recruited various researchers, some of them from OpenAI, and now has billions to work with that are deployed, presumably to train these large models. Yeah, it's funny. Everyone just kind of knew that it was going to have to be a number with billion after it, just because of the level of talent involved. It is a remarkable talent set. The round is led by Andreessen Horowitz. So A16Z on the cap table now. Notably, though, Thinking Machines did not say what they're working on to their investors. At least that's what this article, that's what it sounds like. The wording is maybe slightly ambiguous. I'll just read it explicitly. You can make up your mind. Thinking Machines Lab had not declared what it was working on, instead using Murati's name and reputation to attract investors. So that suggests that A16Z cut, they didn't cut the full $2 billion check, but they led the round. So hundreds and hundreds of millions of dollars just on the basis of like, yeah, you know, Mira's a serious fucking person. John Schulman's a serious fucking person. You know, Jonathan Lachman, Barret Zoph, like, all kinds of people. These are really serious people. So we'll cut you an $800 million check, whatever they cut as part of that.
That's both insane and tells you a lot about how the space is being priced. The other weird thing we know, and we talked about this previously, but it bears kind of repeating. So Murati is going to hold, Mira is going to hold, board voting rights that outweigh all other directors combined. This is a weird thing, right? Like, what is with all these AGI companies and the really weird board structures? A lot of it is just like the OpenAI mafia, like people who worked at OpenAI did not like what Sam did and learned those lessons and then enshrined that in the way they run their company, in their actual corporate structure. And Anthropic has their public benefit company set up with their oversight board. And now Thinking Machines has this Mira Murati dictatorship structure, where she has final say basically over everything at the company. By the way, everything I've heard about her is exceptional. Every OpenAI person I've ever spoken to about Mira has just glowing things to say about her. And so even though $2 billion is not really enough to compete, if you believe in scaling laws, it tells you something about, you know, the kinds of decisions people will make about where they work include who will I be working with? And this seems to be a big factor, I would guess, in all these people leaving OpenAI. She does seem to be a genuinely exceptional person. Like I've never met her, but again, everything I've heard is just like glowing, both in terms of competence and in terms of kind of smoothness of working with her. So that may be part of what's attracting all this talent as well. Yes. And on the point of not quite knowing what they're building, if you go to thinkingmachines.ai, this has been the case for a while, you'll get a page of text. The text, let's say, reads like a mission statement that, sure, is saying a lot. There's stuff about scientific progress being a collective effort, emphasizing human-AI collaboration, more personalized AI systems, infrastructure quality, advanced multimodal capabilities, research product co-design, empirical approach to AI safety, measuring what truly matters. I have no idea here. This is like just saying a whole bunch of stuff and you can really take away whatever you want. Presumably it'll be something that is competing with OpenAI and Anthropic fairly directly is the impression. And near the bottom of the page at thinkingmachines.ai, the founding team has a list of a couple dozen names, each one you can hover over to see their background, as you say, like, real heavy hitters. And then there are advisors and a join us page. So yeah, it really tells you, if you gain a reputation and you have some real star talent in Silicon Valley, that goes a long way. And on that note, next story, quite related. Meta has hired some key OpenAI researchers to work on their AI reasoning models. So a week ago or two weeks ago, we talked about how Meta paid a whole bunch of money, invested rather in Scale.ai and hired away the founder of Scale.ai, Alex Wang, to head their new superintelligence efforts. Now there are these reports. I don't know if this is highlighting it particularly because of OpenAI or perhaps this is just with juicy details. I'm sure Meta has hired other engineers and researchers as well. But I suppose this one is worth highlighting. They did hire some fairly notable figures from OpenAI. So this is Lucas Beyer, Alexander Kolesnikov, and Xiaohua Zhai, who I believe founded the Zurich office. Zurich office, was it?
Anyway, they were a fairly significant team at OpenAI, or so it appears to me. And I think Lucas Beyer did post on Twitter and say that the idea that we were paid $100 million was fake news. This is another thing that's been up in the air. Sam Altman has been taking, you could say, some gentle swipes, saying that Meta has been promising insane pay packages. So all this to say is, this is just another indication of Mark Zuckerberg very aggressively going after talent. We know he's been personally messaging dozens of people on WhatsApp and whatever, being like, hey, come work for Meta. And perhaps unsurprisingly, that is paying off in some ways in expanding the talent of this superintelligence team. Yeah, there's a lot that's both weird and interesting about this. The first thing is anything short of this would be worth zero. When you're in Zuck's position, and I'll just sort of, like, this is colored by my own interpretation of who's right and who's wrong in this space, but I think it's increasingly sort of just becoming clear, in fairness. I don't think it's just my bias saying that. When your company's AI efforts, despite having access to absolutely frontier scales of compute, so having no excuses for failure on the basis of access to infrastructure, which is the hardest and most expensive thing, when you've managed to tank that so catastrophically, because your culture has been screwed up by having Yann LeCun as the mascot, if not the leader, of your internal AI efforts, because he's not actually as influential as it sounds, or hasn't been for a while, on the internals of Facebook. But he has set the beat at Facebook, at Meta, being kind of skeptical about AGI, being skeptical about scaling, and then, like, changing his mind in ego-preserving ways without admitting that he's changed his mind. I think these are very damaging things. They destroy the credibility of Meta and have done that damage. And I think the fact that Meta is so far behind today is a reflection, in large part, a consequence of Yann LeCun's personality and his inability to kind of update accordingly and maintain, like, epistemic humility on this. I think everybody can see it. He's like the old man who's still yelling at clouds, and just like as the clouds change shape, he's like trying to pretend they're not. But I think, just, like, speaking as, like, if I were making the decision about where to work, that would be a huge factor. And it has just objectively played out in a catastrophic failure to leverage one of the most impressive fleets of AI infrastructure that there actually is. And so what we're seeing with this set of hires is people who are, I mean, so completely antithetical to Yann LeCun's way of thinking. Like, Meta could not be pivoting harder in terms of the people it's poaching here. First of all, OpenAI, obviously one of the most scale-pilled organizations in the space, probably the most scale-pilled. Anthropic actually is up there too. But also, Scale AI's Alex Wang. So, okay, that's interesting. Very scale-pilled dude, also very AI safety-pilled dude. Daniel Gross, arguably quite AI safety-pilled. At least that was the mantra of Safe Superintelligence. Weird that he left that so soon. A lot of open questions about how Safe Superintelligence is doing, by the way, if Daniel Gross is now leaving. I mean, DG was the CEO, right? Co-founded it with Ilya. So what's going on there? So that's a hanging chad.
But just Daniel Gross being now over on the Meta side, you have to have enough of a concentration of exquisite talent to make it attractive for other exquisite talent to join. Like, if you don't reach that critical mass, you might as well have nothing. And that's been Meta's problem this whole time. They needed to just, like, jumpstart this thing with a massive capital infusion. Again, these massive pay packages, that's where it's coming from. Just give people a reason to come, get some early proof points to get people excited about Meta again. And the weird thing is, with all this, like, I'm not confident at all in saying this, but you could see a different line from Meta on safety going forward, too, because Yann LeCun was so dismissive of it. But now a lot of the people they've been forced to hire, because there is, if you look at it objectively, a strong correlation between the people in teams who are actually leading the frontier and the people in teams who take loss of control over AI seriously. Now Meta is kind of forced to change in some sense its DNA to take that seriously. So I think that's just a really interesting shift. And I know this sounds really harsh with respect to Yann LeCun. Like, you know, take it for what it is. It's just one man's opinion. But I have spoken to a lot of researchers who feel the same way. And again, I think the data kind of bears it out. Essentially, Mark Zuckerberg is being forced to pay the Yann LeCun tax right now. And I don't know what happens to Yann LeCun going forward, but I do kind of wonder if his Meta days may be numbered, or, you know, if there's going to be a face-saving measure that has to be taken there. Right. For context, Yann LeCun is Meta's chief AI scientist. He's been there for over a decade, hired, like, I think around 2013, 2012 by Meta, one of the key figures in the development of neural networks really over the last couple of decades, and certainly is a major researcher and contributor to the rise of deep learning in general, but, as you said, a skeptic on large language models and a proponent of sort of other techniques. I will say I'm not entirely bought into this narrative personally. The person heading up the effort on Llama and LLMs was not Yann LeCun, as far as I'm aware. There was another division within Meta that focused on generative technology that has now been revamped. So the person leading the generative AI efforts in particular has left. And now there is an entirely new division called AGI Foundations that is now being set up. So this is part of a major revamp. Yann LeCun is still leading the more research and publication type side of things. And perhaps, as far as I know, not very involved in this side of scaling up Llama and LLMs and all of this, which is less of a research effort, more of an R&D kind of effort to compete with OpenAI and so on. I absolutely agree. And that was what I was referring to when I was saying Yann LeCun is not sort of involved in the day-to-day kind of product side of the org. It's been known for a while that he's not actually doing the heavy lifting on Llama, but he has defined what it means, like essentially articulated Meta's philosophy on AI and AI scaling for the last, you know, however many years. And so it's understood that when you join Meta, or at least it was, that you were buying into a sort of Yann LeCun-aligned philosophy, which I think is the kind of core driving problem behind where Meta finds itself today. Yeah, that's definitely part of it. I mean, that's part of the reputation of Meta as an AI research club.
But also, I mean, part of the advantage of Meta and why people might want to go to Meta is because of their very open source friendly nature. They're only very open source friendly because they're forced to do that, because it's the only way they can get headlines while they pump out media. But regardless, it's still a factor here. One last thing worth noting on this whole story. I mean, you could do a whole speculative analysis of what went on in Meta. They did also try to throw a lot of people at the problem, scale up from a couple hundred to like a thousand people. I think they probably had a similar situation to Google where it was like big company problems, right? OpenAI and Anthropic, they're still, they're huge, but they don't have big company problems. That's a great point. They have scaling company problems. So this revamp could also help. Yeah. All righty. On to research and advancements. No more drama talk, I guess. Next, we have a story from DeepMind, and they have developed AlphaGenome, the latest in their alpha line of scientific models. So this one is focused on helping researchers understand gene functions. It's not meant for personal genome prediction, but more so just general identification of patterns. So it could help identify causative mutations in patients with ultra rare cancers. So for instance, which mutations are responsible for incorrect gene expression? I'm going to be honest, you know, there's a lot of deep science here with regards to biology and genomics, which I am not at all an expert on. And the gist of it is, similar to AlphaFold, similar to all other alpha efforts, on the benchmarks dealing with the problems that geneticists deal with, the kind of prediction issues, the analysis, AlphaGenome kind of beats all existing techniques out of the park. On almost every single benchmark it is superseding previous efforts. And this one model is able to do a lot of things all at once. So again, not really my background to comment on this too much, but I'm sure that this is along the lines of AlphaFold, in terms of AlphaFold was very useful scientifically for making predictions about protein folding. AlphaGenome is presumably going to be very useful for understanding genomics, for making predictions about which genes do what, things like that. It's a really interesting take that's, I guess, a fundamentally different way of approaching the let's-understand-biology problem that Google DeepMind and then its subsidiary, I guess, its spawned company, Isomorphic Labs, which, by the way, Demis is the CEO of and, I hear, has kind of been very focused on anyway. When you look at AlphaFold, you're looking at essentially predicting the structure and, to some degree, the function of proteins from the Lego blocks that make up those proteins, right? The amino acids, the individual amino acids that get chained together, right? So you got 20 amino acids you can pick from, and that's how you build a protein. And depending on the amino acids that you have, some of them are positively charged, some of them negative, some of them polar, some of them not. And then the thing will fold in a certain way. That is distinct from the problem of saying, okay, I've got a strand of 300 billion base pairs, sorry, 3 billion base pairs of DNA. And what I want to know is if I take this one base pair and I switch it from, I don't know, like from an A to a T, right, or from a G to an A, what happens to the protein? What happens to the downstream kind of biological activity?
What cascades does that have? What effects does it have? And that question is a it's an interesting question because it depends on your ability to model biology in a pretty interesting way. It also is tethered to an actual phenomenon in biology. So there's a thing called the single nucleotide polymorphism. There's some nucleotides in the human genome that you'll often see can either be like a G or a T or something. And you'll see some people who have the G variant and some people have the T variant. And it's often the case that some of these variants are associated with a particular disease. And so there's like a, I used to work in a genomics lab doing cardiology research back in the day. And there's like famous variant called 9P21.3 or something. And, you know, if some people had, I forget what it was, the T version, you'd have a higher risk of getting coronary artery disease or atherosclerosis or whatever, and not if you had the other one. So essentially what this is doing is it's allowing you to reduce in some sense, the number of experiments you need to perform. If you can figure out, okay, like we have all these different possible variations across the human genome, but only a small number of them actually matter for a given disease or effect. And if we can model the genome pretty well, we might be able to pin down the variants we actually care about so that we can run more controlled experiments, right? So we know that, hey, you know, patient A and patient B, they may have like a zillion different differences in their genomes, but actually for the purpose of this effect, they're quite comparable or they ought to be. So this is anyway, really, I think, interesting next advance from Google DeepMind. And I expect that we'll see a lot more because they are explicitly interested in that direction. Right. And they released a pretty detailed research paper, a preprint on this as they have of AlphaFold, 55-page paper describing the model, describing the results, describing the data, all that. Also released an API, so a client-side ability to query the model. And it is free of charge for non-commercial use with some query limiting. So yeah, again, similar to AlphaFold, they are making this available to scientists to use. They haven't open sourced this yet, the model itself, but they did explain how it works. So certainly exciting and always fun to see DeepMind doing this kind of stuff. And up next, we have direct reasoning optimization, DRO. So we've got, you know, GRPO, we've got DPO, we've like, you know, there's so many, so many POs or ROs or O's, so many O's. So LLMs can reward and refine their own reasoning for open-ended tasks. I like this paper. I like this paper a lot. It's, I think I might've talked about this on the podcast before. I used to have a prof who would like ask these very simple questions when you were presenting something and they were like embarrassingly simple and you would, you would be embarrassed to ask that question, but then that always turns out to be the right and deepest question to ask. This is one of those papers. It's like, it's very simple concept, but it's something that when you realize it, you're like, Oh my God, that was missing. So first let's just talk about how currently we typically train reasoning into models, right? So you have some output that you know is correct, right? Some answer, the desired or target output, and you've got your input. So what you're going to do is you're going to feed your input to your model. 
You're going to get it to generate a bunch of different reasoning traces. And then in each case, you're going to look at those reasoning traces, feed them into the model, and based on the reasoning trace that the model generated, see what probability it assigns to the target output that you know is correct, right? So reasoning traces that are correct will in general lead to a higher probability that the model places on the target outcome, because it's the right outcome. So if the reasoning is correct, it's going to give a higher probability to the outcome. This feels a little bit backwards from the way we normally train these models, but this is how it's done, at least in GRPO, group relative policy optimization. So essentially you reward the model to incentivize high probability of the desired output conditioned on the reasoning traces, and this makes it generate better and better reasoning traces over time, because you want to generate reasoning traces that assign higher probability to the correct output. So the intuition here is, if your reasoning is good, you should be very confident about the correct answer, right?

Now this breaks, and it breaks in a really interesting way. Even if your reference answer is exactly correct, you can end up being too forgiving to the model during training, because the way that you score the model's confidence in the correct answer based on the reasoning traces is you average together, essentially, the confidence scores of each of the answer tokens in the correct answer. Now, the problem is the first token of the correct answer often gives away the answer itself. So even if the reasoning stream was completely wrong, like, let's say the question was who scored the winning goal in the soccer game, and the answer was Lionel Messi. If the model's reasoning is, I think it was Cristiano Ronaldo, the model is going to, okay, from there, assign a low probability to Lionel, which is the first word of the correct answer. But once it reads the word Lionel, the model knows that Messi must be the next token. So it's actually going to assign a high probability to Messi, even though its reasoning trace said Cristiano Ronaldo. And so essentially, this suggests that only some tokens in the answer are going to actually correctly reflect the quality of your model's reasoning. So if your model's reasoning was, I think it was Cristiano Ronaldo, and the actual answer was Lionel Messi, well, you should expect it to have very low confidence in Lionel. So that's good. You'll be able to actually correctly determine that your reasoning was wrong there. But once you get Lionel as part of the prompt, then Messi all of a sudden becomes obvious, and so you get a bit of a misfire there. So essentially what they're going to do is calculate, like, they'll feed in a whole bunch of reasoning traces, and they'll look at each of the tokens in the correct output and see which of those tokens vary a lot. Tokens that are actually reflective of the quality of the reasoning should have high variance, right? Because if you have a good reasoning trajectory, those tokens should have high confidence, and if you have a bad reasoning trajectory, they should have low confidence. But then you have some less reasoning-reflective tokens, like, say, Messi in Lionel Messi, because Lionel has already given it away; you should expect Messi to consistently have high confidence.
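To make that token-level intuition concrete, here is a minimal sketch of how you might weight answer tokens by how much their confidence varies across sampled reasoning traces. This is just an illustration of the idea as described here, not the paper's actual R3 implementation; the array shapes, function names, and toy numbers are all assumptions.

```python
import numpy as np

def reasoning_reflection_weights(token_probs):
    """token_probs: array of shape (num_traces, num_answer_tokens), where entry
    [i, j] is the probability the model assigns to the j-th reference-answer
    token after reading reasoning trace i. Tokens whose confidence varies a lot
    across traces are treated as 'reasoning-reflective' and weighted up; tokens
    that are always easy (e.g. 'Messi' after 'Lionel') are weighted down."""
    variance = token_probs.var(axis=0)            # how much each token's confidence depends on the trace
    return variance / (variance.sum() + 1e-8)     # normalize into weights over answer tokens

def trace_reward(token_probs_for_trace, weights):
    """Reward a single reasoning trace by the weighted confidence it induces
    on the reference-answer tokens (a GRPO-style reward signal)."""
    return float((weights * token_probs_for_trace).sum())

# Toy example: 4 sampled traces, reference answer tokens ["Lionel", "Messi"].
probs = np.array([
    [0.70, 0.95],   # good reasoning: confident on both tokens
    [0.05, 0.90],   # bad reasoning: 'Lionel' unlikely, but 'Messi' is still easy once 'Lionel' is given
    [0.65, 0.93],
    [0.10, 0.88],
])
w = reasoning_reflection_weights(probs)
rewards = [trace_reward(p, w) for p in probs]
print(w, rewards)
```

The point of the toy example is that the weight concentrates on "Lionel", the token whose confidence actually depends on the reasoning, so a trace that argued for Cristiano Ronaldo gets a clearly lower reward even though "Messi" is trivially predictable.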
Because again, even if your reasoning trace is totally wrong, by the time you've read Lionel, Messi is obvious. It's almost like if you're writing a test and you can see the first word of the correct answer: even if your thinking was completely wrong, you're going to get the correct second word if the answer is Lionel Messi. So anyway, this is just the way they use to detect good reasoning, and then they feed that into a broader algorithm that, beyond that, is fairly simple, nothing too shocking. They just fold this into something that looks a lot like GRPO to get this DRO algorithm. Right. Yeah. They spend a while in the paper contrasting it with other recent work that doesn't pay attention to individual tokens, basically. So just to contextualize what you were saying, their focus is on this R3, the reasoning reflection reward. And DRO, direct reasoning optimization, is basically GRPO, what people generally use for RL, typically with verifiable rewards. Here, their focus is how to train generally, in an open-ended fashion, over long reasoning chains. They identify some of these issues in existing approaches and highlight this reasoning reflection reward, which basically looks at consistency between the tokens in the chain of thought and in the output as a signal to optimize over. And as you might expect, they do some experiments and show that this winds up being quite useful. I think it's another indication that we are still in the early-ish days of using RL to train reasoning; there's a lot of noise and a lot of significant insights still being leveraged. Last thing: DRO is, I guess, kind of a reference to DPO, as you said. DPO is direct preference optimization versus direct reasoning optimization. Not super related, it's just, I guess, fun naming conventions, aside from arguably being sort of analogous in terms of the difference between RL-based preference alignment and DPO. Anyway, it's kind of a funny reference. Yeah.

Next paper: Farseer, a refined scaling law in large language models. So we've talked about scaling laws a ton. Basically, you try to collect a bunch of data points of, you know, once you use this much compute or this many training FLOPs or whatever, you get to this particular loss on language prediction, typically on the metric of perplexity. And then you fit some sort of equation to those data points. And what tends to happen is you get a fairly good fit that holds for future data points as you keep scaling up: your loss goes down and down and down. People have found, somewhat surprisingly, that you can get a very good fit that is very predictive, which was not at all a common idea or something people had really tried pre-2020. So what this paper does is basically do that, but better. It's a novel and refined scaling law that provides enhanced predictive accuracy, and they do that by systematically constructing a model loss surface and just doing a better job of fitting to empirical data. They say that they improve upon the Chinchilla law, one of the big ones from a couple of years ago, by reducing extrapolation error by 433%. So a much more reliable law, so to speak. Yeah, the Chinchilla scaling law was somewhat famously Google's correction to the initial OpenAI scaling law that was proposed back in 2020, the so-called Kaplan scaling law.
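For reference, the Chinchilla-style law being discussed has a simple parametric form, and the Farseer-style refinement described here amounts to letting the data term depend on model size. A minimal sketch follows; the first function is the standard Hoffmann et al. form, while the second is only an illustration of the kind of N-D interaction the hosts describe, not the paper's exact equation.

```python
def chinchilla_loss(N, D, E, A, B, alpha, beta):
    """Chinchilla-style parametric scaling law (Hoffmann et al.):
    N = number of parameters, D = training tokens, E = irreducible loss."""
    return E + A / N**alpha + B / D**beta

def farseer_style_loss(N, D, E, A, alpha, B_of_N, beta_of_N):
    """Sketch of a Farseer-style refinement as described in this episode (not the
    paper's exact functional form): the data term's coefficient and exponent are
    functions of model size N, i.e. an explicit N-D interaction."""
    return E + A / N**alpha + B_of_N(N) / D**beta_of_N(N)
```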
And so Chinchilla was sort of heralded as this kind of big and ultimately maybe pseudo-final word on how scaling would work. It was notably more data-heavy than the Kaplan scaling laws. But what they're pointing out here is that Chinchilla works really well for mid-sized models, which is basically where it was calibrated, what it was designed for, but it doesn't do great on very small or very large models. And obviously, given that scaling is a thing, very large models matter a lot, and the whole point of a scaling law is to extrapolate from where you are right now to see, okay, well, if I train a model at a hundred times the scale, and therefore at, let's say, a hundred times this budget, where would I expect to end up? And you can imagine how much depends on those kinds of decisions. So you want a law that is really well calibrated and extrapolates really well, especially to very large models. They do a really interesting job in the paper. We won't go into detail, but especially if you have a background in physics, like thermodynamics: they play this really interesting game where they use finite difference analysis to separate out dependencies between N, the size of the model, and D, the amount of data it's trained on. And that ultimately is kind of the secret sauce, if you want to call it that, here. There's a bunch of other hijinks, but the core piece is they break the loss down into different terms, one of which only depends on N, the other of which only depends on D. So one is just model-size dependent, the other is only dependent on the size of the training data set. But then they also introduce an interaction effect between N and D, between the size of the model and the amount of data it's trained on, and they end up deriving what that term should look like. That's one of the framings of this that's really interesting. Just to kind of nutshell it: Chinchilla says that data scaling follows a consistent pattern, it's D to the power of some negative beta coefficient, regardless of model size. No matter how big your model is, it's always D to the power of negative beta, so if I give you the amount of data, you can determine the contribution of the data term. What Farseer says is that data scaling actually depends on model size; bigger models just fundamentally learn from data in a different way. And we'll park it there, but there's a lot of cool extrapolation to figure out how exactly that term has to look. Exactly. And this is very useful, not just to sort of know what you're going to get. That aspect of it means that for a given compute budget, you can predict what balance of data to model size is likely optimal. And when you're spending millions of dollars training a model, it's pretty nice to know these kinds of things.

Right. And one more paper. The next one is LLM-First Search: Self-Guided Exploration of the Solution Space. So the gist of this is there are many ways to do search, where search just means you look at one thing and then you decide on some other things to look at, and you keep doing that until you find a solution. One of the typical ways is Monte Carlo Tree Search, a classic algorithm; this was, for instance, used in AlphaGo. If you want to combine this with an LLM, typically what you do is have the LLM assign some score to a given location and make perhaps some predictions, and then you have an existing algorithm to sample or to decide where to go.
So the key difference here with LLM-First Search is basically: forget Monte Carlo Tree Search, forget any preexisting search algorithm or technique, just make the LLM decide where to go. It can decide how to do the search. And they say that this is more flexible, more context-sensitive, requires less tuning, and just seems to work better. Yeah, it's all prompt-level stuff, right? So there's no optimization going on, no training, no fine-tuning. It's just: give the model a prompt. So number one, find a way to represent the sequence of actions that have led to the current moment in whatever problem the language model is trying to solve, in a way that's consistent. So, essentially, format, let's say, all the chess moves up to this point in a consistent way, so that the model can look at the state and the history of the board, if you will. And then give the model a prompt that says, okay, from here, I want you to decide whether to continue on the current path or look at alternative branches, alternative trajectories. The prompt is like, here are some important considerations when deciding whether to explore or continue, and then it lists a bunch. And then similarly, they have the same thing for the evaluation stage, where you're scoring the available options and getting the model to choose the most promising one. So it's like, here are some important considerations when evaluating possible operations or actions you could take. Once you combine those things together, basically at each stage, I'll call it, of the game or the problem solving, the model has a complete history of all the actions taken up to that point. It's then prompted to evaluate the options before it and to decide whether to continue to explore and add new options, or to select one of the options and execute against it. Anyway, that's basically it. It's a pretty conceptually simple idea: just offload the tree and branching structure development to the model, so that it's thinking them through in real time.

Pretty impressive performance jumps. So when using GPT-4o, compared with standard Monte Carlo Tree Search on this game of Countdown, where essentially you're given a bunch of numbers and all the standard mathematical operations, addition, division, multiplication, subtraction, and you're trying to figure out how to combine these numbers to get a target number. So at each stage, you have to choose, okay, do I try adding these together? Anyway, 47% using this technique versus 32% using Monte Carlo Tree Search. And this advantage amplifies as you work with stronger models. So on o3-mini, for example, 79% versus 41% for Monte Carlo Tree Search. So reasoning models seem to be able to take advantage of this, you can think of it as a kind of scaffold, a lot better. It also uses fewer tokens, so it's getting better performance while using fewer tokens, so less compute, than Monte Carlo Tree Search as well. So that's really interesting, right? This is a way more efficient way of squeezing performance out of existing models, and it's all just based on very interpretable and tweakable prompts. Right. And they compare this not just to Monte Carlo Tree Search; they also compare it to Tree of Thoughts, breadth-first search, best-first search. All of these are, by the way, pretty significant, because search broadly is: there's a sequence of actions I can take and I want to get the best outcome.
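To make the prompt-level loop just described a bit more concrete, here is a minimal sketch of what such a search driver could look like. This is our own illustration under assumptions, not code from the paper: `call_llm`, `expand`, `is_solved`, and the prompt wording are all placeholders you would supply for your task (for example, legal Countdown operations).

```python
def format_steps(steps):
    """Render a list of states/moves as a numbered history the model can read."""
    return "\n".join(f"{i + 1}. {s}" for i, s in enumerate(steps))

def llm_first_search(initial_state, expand, is_solved, call_llm, max_steps=50):
    """Prompt-level search loop: the LLM itself decides whether to branch out or
    commit to a single candidate; no rollouts, no value network, no training.
    expand(state) -> list of legal next states; is_solved(state) -> bool;
    call_llm(prompt) -> str is whatever chat client you want to plug in."""
    history = [initial_state]      # everything tried so far, shown to the model each turn
    frontier = [initial_state]     # alternative branches kept around for later exploration
    for _ in range(max_steps):
        if not frontier:
            break
        state = frontier.pop()
        if is_solved(state):
            return state, history
        candidates = expand(state)
        if not candidates:
            continue
        decision = call_llm(
            "History of actions so far:\n" + format_steps(history) + "\n\n"
            "Here are some important considerations when deciding whether to "
            "explore alternatives or continue on the current path: ...\n\n"
            "Candidate next actions:\n" + format_steps(candidates) + "\n\n"
            "Reply EXPLORE to keep several candidates open, or reply with the "
            "number of the single most promising candidate to continue with."
        )
        if decision.strip().upper().startswith("EXPLORE"):
            frontier.extend(candidates)   # widen: keep all candidates as open branches
        else:
            chosen = candidates[int(decision.strip()) - 1]
            frontier.append(chosen)       # deepen: commit to the chosen branch
            history.append(chosen)
    return None, history
```

The design choice that mirrors what's described above is that both the explore-versus-continue decision and the scoring of candidates happen inside the prompt, so swapping in a stronger model upgrades the search policy without any retraining.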
And, you know, so you need to think many steps ahead. Branches here mean, like, I take this step and then this step, so you can either go deeper or wider in terms of how many steps you consider: one step ahead, two steps ahead. And this is essential for many types of problems. Chess, Go, obviously, but broadly we do search in all sorts of things. So having a better approach to search means you can do better reasoning, means you can do better problem solving.

And moving on to policy and safety, we have one main story here, called Unsupervised Elicitation of Language Models. This is really interesting, and I'll be honest, it was a head scratcher for me. I spent a good, embarrassing amount of time with Claude trying to help me through the paper, which is sort of ironic because, if I remember right, it's an Anthropic paper. But this is essentially a way of getting a language model's internal understanding of logic to help it solve problems. So imagine that you have a bunch of math problems and solutions. For example, what's five plus three, and then you have a possible solution, maybe it's eight. The next problem is, what's seven plus two, and you have a possible solution, and that possible solution is maybe 10, which is wrong, by the way. So some of these possible solutions are going to be wrong. So you have a bunch of math problems and possible solutions, and you don't know which are correct and incorrect, and you want to train a language model to identify correct solutions; you want to figure out which of these are actually correct. So imagine you just lay these all out in a list. You have, you know, what's five plus three, and then solution eight; what's seven plus two, solution 10; and so on. Now what you're going to do is randomly assign correct and incorrect labels to a few of these examples. So you'll say, five plus three equals eight, and you'll just randomly say, okay, that's correct. And seven plus two equals 10, which by the way is wrong, but you'll randomly say that's correct, right? And then you're going to get the model to say, given the correctness labels that we have here, given that solution one is correct and solution two is correct, what should solution three be, roughly? Or, given all the correct and incorrect labels that we've assigned randomly, what should this missing label be? And generally, because you've randomly assigned these labels, the model is going to get really confused, because there's a logical inconsistency between these randomly assigned labels. A bunch of the problems you've labeled as correct are actually wrong, and vice versa. And so now what you're going to do is essentially try to measure how confused the model is about that problem. And you are then going to flip one label. So you can think of it like flipping the correct or incorrect label on one of these problems, from correct to incorrect, say, and then you'll repeat, and you'll see if you get a lower confusion score from the model. Anyway, this is roughly the concept. And so over time, you're going to gradually converge on a lower and lower confusion score. And it sort of feels almost like the model's relaxing into the correct answer, which is why this is a lot like simulated annealing, if you're familiar with that: you're making random modifications to the problem until you get a really low loss, and you gradually kind of relax into the correct answer. I hope that makes sense.
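Here is a minimal, simulated-annealing-style sketch of that label-flipping idea, just to pin down the mechanics described above. It is not Anthropic's actual internal coherence maximization algorithm; in particular, the `label_energy` scoring function, which is where the language model would be queried about how well each held-out label fits the others, is only assumed here.

```python
import math
import random

def icm_style_label_search(examples, label_energy, n_iters=2000, temp0=1.0):
    """Annealing-style sketch of the label-flipping loop described above.
    examples: list of (problem, proposed_solution) pairs.
    label_energy(examples, labels) -> float: a 'confusion' score from the model,
    e.g. how poorly each held-out label can be predicted from the other labels
    (lower = the labels are more internally consistent). Assumed, not implemented.
    Returns the labels (True = marked correct) that the search settles into."""
    labels = [random.choice([True, False]) for _ in examples]  # random initial correct/incorrect guesses
    energy = label_energy(examples, labels)
    for step in range(n_iters):
        i = random.randrange(len(labels))
        labels[i] = not labels[i]                              # propose flipping one label
        new_energy = label_energy(examples, labels)
        temp = temp0 * (1.0 - step / n_iters) + 1e-6           # cool the temperature over time
        # keep the flip if confusion drops, or occasionally even if it rises (annealing)
        if new_energy <= energy or random.random() < math.exp((energy - new_energy) / temp):
            energy = new_energy
        else:
            labels[i] = not labels[i]                          # otherwise revert the flip
    return labels
```

The flip-and-rescore loop is the part that maps onto the simulated annealing intuition; the interesting work in the paper is in how the model scores the consistency of a labeling.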
It's sort of like, you kind of have to see it, yeah. Right. Just to give some motivation, they frame this problem, and this is from Anthropic and a couple of other institutes, by the way, in the context of superhuman models. So the unsupervised elicitation part of this is about how you train a model to do certain things, right? These days, the common paradigm is you train your language model via pre-training, then you post-train: you have some labels for rewards or preferences over outputs, and then you do RLHF or DPO to make the model do what you want it to do. But the idea here is, once you get to superhuman AI, well, maybe humans can't actually assess what it does and give it labels of what is good and what's not. So this internal coherence maximization framework makes it so you can elicit the good behaviors, the desired behaviors, from the LLM without external supervision by humans. And the key distinction here from previous efforts in this kind of direction is that they do it at scale. So they train a Claude 3.5 Haiku-based assistant without any human labels and achieve better performance than its human-supervised counterpart. They demonstrate in practice, on a significantly sized LLM, that this approach can work, and this could have implications for future, even larger models.

Next up, a couple of stories on the policy side. Well, actually only one story. It's about Taiwan, and it has imposed technology export controls on Huawei and SMIC. Taiwan has actually blacklisted Huawei and SMIC, the Semiconductor Manufacturing International Corporation. This is from Taiwan's International Trade Administration, and they have also included subsidiaries of these companies. It's an update to their so-called strategic high-tech commodities entity list, and apparently they added not just those but 601 entities from Russia, Pakistan, Iran, Myanmar, and mainland China. Yeah. And one reaction you might have looking at this is, wait a minute, I thought China was already barred from accessing, for example, chips from Taiwan. And you're absolutely correct, that is the case. That's my reaction, yeah. No, totally. It's a great question. So what is actually being added here? The answer is, because of U.S. export controls, and we won't get into the reason why the U.S. has leverage to do this, but they do, Taiwanese chips are not going into mainland China, at least theoretically. Obviously, Huawei finds ways around that. But this is actually a broader thing dealing with a whole bunch of plant construction technologies, for example, specialized materials, equipment that isn't necessarily covered by U.S. controls. So there's broader supply chain coverage here, whereas U.S. controls are more focused on cutting off specifically chip manufacturing. Here, Taiwan is formally blocking access to the whole semiconductor supply chain: everything from specialized chemicals and materials to manufacturing equipment and technical services. So it's sort of viewed as a loophole-closing exercise coming from Taiwan. This is quite interesting because it's coming from Taiwan as well; this is not the U.S. leaning in and forcing anything to happen, though who knows what happened behind closed doors. It's interesting that Taiwan is taking this kind of hawkish stance on China. So even though Huawei couldn't get TSMC to manufacture their best chips, they have been working with SMIC to develop some domestic capabilities for chip manufacturing.
Anyway, this basically just makes it harder for that to happen.

Next up, a paper dealing with some concerns, actually from a couple of weeks ago, but I don't think we covered it, so it's worth going over pretty quickly. The title of the paper is Your Brain on ChatGPT: Accumulation of Cognitive Debt When Using an AI Assistant for Essay Writing Tasks. What they do in this paper is have 54 participants write essays. Some of them can use LLMs to help them do that, some of them can use search engines, and some of them have to do it themselves, no tools at all. And then they do a bunch of stuff. They first measure brain activity with EEGs to assess cognitive load during essay writing, and they follow up by looking at recall metrics. The upshot of the results is that there are significant differences between the groups: EEGs reveal less so-called brain connectivity for the LLM participants compared to the brain-only and search participants. Similarly, self-reported ownership, recall, all these things differed. This one got a lot of play, I think, on Twitter and so on, and quite a bit of criticism also, I think, for overblowing the conclusions. The notion of cognitive debt, the framing here, is that there are long-term negative effects on cognitive performance due to decreased mental effort and engagement. And you can certainly question whether that's the conclusion you can draw here. What they show is, if you use a tool to write an essay, it takes less effort and you probably don't remember what is in the essay as well. Does that transfer to long-term negative effects on cognitive performance due to decreased mental effort and engagement? Maybe. I'll add a personal take on this too. I think that good writers are good thinkers, because when you are forced to sit down and write something, at least it's been my experience that I don't really understand something until I've written something about it with intent. And so, in fact, when I'm trying to understand something new, I actually make myself write it out, because it just doesn't stick in the same way otherwise. Different people may be different, but I suspect maybe less so than some people might assume. So I think, at least for people like me, I imagine this would be an effect. It's interesting. They say, yeah, after writing, 17% of ChatGPT users could quote their own sentences, versus 89% for the brain-only group, the ones who didn't even use Google. The other interesting thing here is that, by various measures, Google is either between using ChatGPT and going brain-only, or it can even be slightly better than brain-only. I thought that was quite interesting, right? Like, Google is sort of this thing that allows fairly obsessed people like myself to do deep dives on, let's say, technical topics and learn way faster than they otherwise could, without necessarily giving them the answer. And ChatGPT, or LLMs at least, open up the possibility to not do that. Now, I will say, I think there are ways of using those models that actually do accelerate your learning. I think I've experienced that myself, but there has to be some kind of innate thing that you do. At least, I don't know, I'm self-diagnosing right now, but there's got to be some kind of innate thing that I do, whether it's writing or drawing something or making a graphic, to actually make it stick and make me feel a sense of ownership over the knowledge. But yeah, I mean, look, we're going to find out, right?
People have been talking about the effects of technology on the human brain since the printing press, right? When people were saying, hey, we rely on our brains to store memories; if you just start getting people to read books, well, now the human ability to have long-term memory is going to atrophy. And you know what? It probably did in some ways, but we kind of found ways around that. So I think this may turn out to be just another thing like that, or it may turn out to actually be somewhat fundamental, because back in the days of the printing press, you still had to survive. There was enough real and present pressure on you to learn stuff and retain it that maybe it didn't have the effect it otherwise would. But interesting study. I'm sure we'll keep seeing analyses and reanalyses for the next few months. Yeah, quite a long paper, like 87 pages, lots of details about the brain connectivity results. Ironically, it was too long for me to read. It's actually true, I used an LLM for this one. Anyway, I have seen quite a bit of criticism of the precise methodology of the paper and some of its conclusions. I think also, in some ways, it's very common sense: if you don't put in effort doing something, you're not going to get better at it. That's already something we know. But I guess I shouldn't be too much of a hater; I'm sure this paper also has some nice empirical results that are useful in, as you say, a very relevant line of work with regards to what actual cognitive impacts usage of LLMs has, and how important it is to go brain-only sometimes.

All right, on to synthetic media and art. Just two more stories to cover, and as promised in the beginning, these ones are dealing with copyright. So last week, we talked about how Anthropic scored a copyright win. The gist of that conclusion was that using content from books to train LLMs is fine, at least for Anthropic; what is actually bad is pirating books in the first place. So Anthropic bought a bunch of books, scanned them, and used the scanned data to train their LLM, and that passed the bar, it was okay. So now we have a new ruling, with a judge rejecting some authors' claims that Meta's AI training violated their copyrights. A federal judge has dismissed a copyright infringement claim by 13 authors against Meta for using their books to train its AI models. The judge, Vince Chhabria, ruled that Meta's use of nearly 200,000 books, including those of the people suing, to train the Llama language models constituted fair use. And this does similarly align with the ruling about Anthropic and Claude. So this is a rejection of the claim that this is piracy; basically, the judgment is that the outputs of Llama are transformative, so you're not infringing on copyright, and using the data for training a language model is fair use, so copyright doesn't apply. At least as far as I can tell, and again, I'm not a lawyer, that is the conclusion. Seems like a pretty big deal; the legal precedent for whether it's legal to use the outputs of a model when some of the inputs to it were copyrighted appears to be getting figured out. Yeah, this is super interesting, right? You've got judges trying to square the circle on allowing what is obviously a very transformational technology. But I mean, the challenge is, no author ever wrote a book until, say, 2020 or whatever with the expectation that this technology would be there.
It's just like no one ever imagined that facial recognition would get to where it is when Facebook was first founded, or MySpace, and people first started uploading a bunch of pictures of themselves and their kids. And it's like, yeah, now that's out there, and you're waiting for a generation of software that can use it in ways that you don't want it to, right? Like, deepfakes were surely not even remotely on the radar of people who posted pictures of their children on MySpace back in the day, right? That is one extreme version of where this kind of argument lands. So now you have authors who wrote books, you could say in good faith, assuming a certain technological trajectory, assuming that those books, when put out in the world, could not technologically be used for anything other than what they expected them to be used for, which is being read. And now that suddenly changes, and it changes in ways that undermine the market quite directly for those books. Like, it is just a fact that if you have a book that really explains a technical concept very well, and your language model is trained on that book and now can also explain that concept really well, not using the exact same words, but maybe having been informed by it, maybe using analogous strategies, it's hard to argue that that doesn't undercut the market for the original book. But it is transformative, right? The threshold the judge in this case was using was that Llama cannot create copies of more than 50 words. Well, yeah, every word could be different, but it could still be writing in the style of, right? And that's a different threshold that you could otherwise have imagined the judge going with, or something like that. But there is openness, apparently, from the judge to this argument that AI could destroy the market for original works or original books just by making it easy to create tons of cheap knockoffs, and the claim is that that likely would not be fair use, even if the outputs were different from the inputs. But again, the challenge here is that it's not necessarily just books, right? It's also, like, you just want a good explanation for a thing, and the form factor that's best for you is a couple of sentences rather than a book. So maybe you err on the side of the language model, and maybe you just keep doing that, whereas in the past you might have had to buy a book. So I think overall this makes as much sense as any judgment on this. I feel deeply for the judges who are put in the position of having to make this call. It's just tough. I mean, you can make your own call as to what makes sense, but man, is this littered with nuance. Yeah, it is worth noting, to speak of nuance, that the judge did very explicitly say that this is a judgment on this case specifically, not about the topic as a whole. He did frame it as copyright law being about, more than anything, preserving the incentive for humans to create artistic and scientific works. And fair use would not apply, as you said, to copying that would significantly diminish the ability of copyright holders to make money from their work. And so in this case, Meta presented evidence that book sales did not go down after Llama was released for these authors, which included, for instance, Sarah Silverman and Junot Díaz; overall, there were 13 authors in this case.
So yes, this is not necessarily establishing precedent in general for any suit that is brought, but at least in this case, the conclusion is that Meta doesn't have to pay these authors and generally did not violate copyright by training on the data of their books without asking for permission or paying them.

And just one last story. The next one is that Getty has dropped some key copyright claims in its lawsuit against Stability AI, although it is continuing the UK lawsuit. So the primary claim against Stability AI by Getty was about copyright infringement, and they dropped the claim about Stability AI using millions of copyrighted images to train its AI model without permission. But they are still keeping the secondary infringement and, I guess, trademark infringement claims, which say that AI models could be considered infringing articles if used in the UK, even if they were trained elsewhere. So honestly, I don't fully get the legal implications here. It seems like in this case in particular, the claims were dropped because of weak evidence and a lack of knowledgeable witnesses from Stability AI. There are also apparently jurisdictional issues where this kind of lack of evidence could be problematic. So this is a development that is not directly connected to the prior things we were discussing; it seems to be, again, fairly specific to this particular lawsuit. But it's another copyright case going forward, this one being a pretty significant one, dealing with training on images, and if Getty is dropping its key claim in this lawsuit, that bodes well for Stability AI.

And that's it for this episode of Last Week in AI. Thanks to all of you who listened at 1x speed without speeding up, and thanks to all of you who tune in week to week, share the podcast, review, and so on. Please keep tuning in. Thank you.

Let it slide. Last week in AI, come and take a ride. From the labs to the streets, AI's reaching high. New tech emerging, watching surgeons fly. From the labs to the streets, AI's reaching high. Algorithms shaping up the future sees. Tune in, tune in, get the latest with ease. Last week in AI, come and take a ride. Get the lowdown on tech and let it slide. I'm a class with AI, come and take a ride. I'm a lad through the streets, AI's reaching high. From neural nets to robots, the headlines pop. Data-driven dreams, they just don't stop. Every breakthrough, every code unwritten, on the edge of change, with excitement we're smitten. From machine learning marvels to coding kings, futures unfolding, see what it brings.
