
#221 - OpenAI Codex, Gemini in Chrome, K2-Think, SB 53
Last Week in AI • Andrey Kurenkov & Jacky Liang

What You'll Learn
- OpenAI has released a new version of Codex powered by GPT-5, designed to be better for coding tasks
- Google has integrated the Gemini AI assistant into Chrome, allowing users to interact with it directly from the browser
- Anthropic has added new file creation capabilities to its Claude platform, including the ability to generate spreadsheets and PDFs
- OpenAI has secured a memorandum of understanding with Microsoft, indicating a shift in their relationship as OpenAI transitions to a for-profit model
- Microsoft is reducing its reliance on OpenAI by integrating AI from Anthropic into its Office 365 applications
- Humanoid robotics startups like Figure AI and Unitree are attracting significant investment and making progress in developing advanced hardware
Episode Chapters
Introduction
The hosts discuss the podcast's recent inconsistency in output and introduce the topics for the episode.
Tools and Apps
The episode covers updates to OpenAI's Codex, Google's integration of Gemini into Chrome, and Anthropic's new file creation capabilities in Claude.
Applications and Business
The discussion focuses on the business developments surrounding OpenAI, including its transition to a for-profit model and its relationship with Microsoft, as well as Microsoft's integration of Anthropic's AI.
Robotics Advancements
The episode highlights the significant funding and progress being made by humanoid robotics startups like Figure AI and Unitree.
AI Summary
This episode of the Last Week in AI podcast covers a range of AI-related news, including OpenAI's release of a new version of Codex powered by GPT-5, Google's integration of the Gemini AI assistant into Chrome, Anthropic's new file creation features in Claude, and updates on the business side of OpenAI and its relationship with Microsoft. The episode also discusses the growing investment and progress in the humanoid robotics space, with startups like Figure AI and Unitree making significant advancements.
Key Points
1. OpenAI has released a new version of Codex powered by GPT-5, designed to be better for coding tasks
2. Google has integrated the Gemini AI assistant into Chrome, allowing users to interact with it directly from the browser
3. Anthropic has added new file creation capabilities to its Claude platform, including the ability to generate spreadsheets and PDFs
4. OpenAI has secured a memorandum of understanding with Microsoft, indicating a shift in their relationship as OpenAI transitions to a for-profit model
5. Microsoft is reducing its reliance on OpenAI by integrating AI from Anthropic into its Office 365 applications
6. Humanoid robotics startups like Figure AI and Unitree are attracting significant investment and making progress in developing advanced hardware
Topics Discussed
LLMs, AI assistants, Robotics, AI business and partnerships
Frequently Asked Questions
What is "#221 - OpenAI Codex, Gemini in Chrome, K2-Think, SB 53" about?
This episode of the Last Week in AI podcast covers a range of AI-related news, including OpenAI's release of a new version of Codex powered by GPT-5, Google's integration of the Gemini AI assistant into Chrome, Anthropic's new file creation features in Claude, and updates on the business side of OpenAI and its relationship with Microsoft. The episode also discusses the growing investment and progress in the humanoid robotics space, with startups like Figure AI and Unitree making significant advancements.
What topics are discussed in this episode?
This episode covers the following topics: LLMs, AI assistants, Robotics, AI business and partnerships.
What is key insight #1 from this episode?
OpenAI has released a new version of Codex powered by GPT-5, designed to be better for coding tasks
What is key insight #2 from this episode?
Google has integrated the Gemini AI assistant into Chrome, allowing users to interact with it directly from the browser
What is key insight #3 from this episode?
Anthropic has added new file creation capabilities to its Claude platform, including the ability to generate spreadsheets and PDFs
What is key insight #4 from this episode?
OpenAI has secured a memorandum of understanding with Microsoft, indicating a shift in their relationship as OpenAI transitions to a for-profit model
Who should listen to this episode?
This episode is recommended for anyone interested in LLMs, AI assistants, Robotics, and those who want to stay updated on the latest developments in AI and technology.
Episode Description
Our 221st episode with a summary and discussion of last week's big AI news! Recorded on 09/19/2025.

Note: we transitioned to a new RSS feed and it seems this episode did not make it there, so this may be posted about 2 weeks past the release date.

Hosted by Andrey Kurenkov and co-hosted by Michelle Lee.

Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai. Read our text newsletter and comment on the podcast at https://lastweekin.ai/

In this episode:
- OpenAI releases a new version of Codex integrated with GPT-5, enhancing coding capabilities and aiming to compete with other AI coding tools like Claude Code.
- Significant updates in the robotics sector include new ventures in humanoid robots from companies like Figure AI and China's Unitree, as well as expansions in robotaxi services from Tesla and Amazon's Zoox.
- New open-source models and research advancements were discussed, including Google DeepMind's self-improving foundation model for robotics and a physics foundation model aimed at generalizing across various physical systems.
- Legal battles continue to surface in the AI landscape, with Warner Bros. suing Midjourney for copyright violations and Rolling Stone's publisher suing Google over AI-generated content summaries, highlighting challenges in AI governance and ethics.

Timestamps:
(00:00:10) Intro / Banter
Tools & Apps
(00:02:33) OpenAI upgrades Codex with a new version of GPT-5
(00:04:02) Google Injects Gemini Into Chrome as AI Browsers Go Mainstream | WIRED
(00:06:14) Anthropic's Claude can now make you a spreadsheet or slide deck | The Verge
(00:07:12) Luma AI's New Ray3 Video Generator Can 'Think' Before Creating - CNET
Applications & Business
(00:08:32) OpenAI secures Microsoft's blessing to transition its for-profit arm | TechCrunch
(00:10:31) Microsoft to lessen reliance on OpenAI by buying AI from rival Anthropic | TechCrunch
(00:12:00) Figure AI passes $1B with Series C funding toward humanoid robot development - The Robot Report
(00:13:52) China's Unitree plans $7 billion IPO valuation as humanoid robot race heats up
(00:15:45) Tesla's robotaxi plans for Nevada move forward with testing permit | TechCrunch
(00:17:48) Amazon's Zoox jumps into U.S. robotaxi race with Las Vegas launch
(00:19:27) Replit hits $3B valuation on $150M annualized revenue | TechCrunch
(00:21:14) Perplexity reportedly raised $200M at $20B valuation | TechCrunch
Projects & Open Source
(00:22:08) [2509.07604] K2-Think: A Parameter-Efficient Reasoning System
(00:24:31) [2509.09614] LoCoBench: A Benchmark for Long-Context Large Language Models in Complex Software Engineering
Research & Advancements
(00:28:17) [2509.15155] Self-Improving Embodied Foundation Models
(00:31:47) [2509.13805] Towards a Physics Foundation Model
(00:34:26) [2509.12129] Embodied Navigation Foundation Model
Policy & Safety
(00:37:49) Anthropic endorses California's AI safety bill, SB 53 | TechCrunch
(00:40:12) Warner Bros. Sues Midjourney, Joins Studios' AI Copyright Battle
(00:42:02) Rolling Stone Publisher Sues Google Over AI Overview Summaries

See Privacy Policy at https://art19.com/privacy and California Privacy Notice at https://art19.com/privacy#do-not-sell-my-info.
Full Transcript
Hello and welcome to the Last Week in AI podcast where you can hear a chat about what's going on with AI. As usual in this episode we will summarize and discuss some of last week's most interesting AI news. You can head on over to lastweekin.ai for the links to all the stories. You can also go to the episode description for the timestamps and so on. I am one of your regular hosts, Andrey Kurenkov. I studied AI in grad school and now work on it in a startup. And once again, this week, Jeremy is busy. Unfortunately, Jeremy has been very busy lately, so he's not been around. But I once again have a great co-host with me, Michelle Lee. Hey, everyone. I am Michelle Lee, your guest host for the week. I went to grad school with Andrey, also studying AI. And now I am the founder and CEO of Medra, which is a physical AI startup based in San Francisco. Right. And we can kind of do a quick bit of news on that. You just announced your big launch milestone for the company. So go ahead and feel free to let people know more about Medra for a bit. Medra AI, we're a physical AI company for life sciences. We're building the physical AI infrastructure that powers the scientific frontier. Our physical AI platforms can do lab work inside life science companies and generate a lot of experimental data, which then in turn can help train frontier and foundation models in sciences and also help our partners be able to find cures to disease faster. Yeah, so working with a bunch of robot arms, I just saw them recently doing a lot of pipetting and whatnot. I don't really know the details. Fittingly, we'll have in this episode quite a few stories about robotics, actually. There's a lot going on with humanoids, a lot going on with self-driving cars, even with foundation models. So you can look forward to some discussion on that front. Before we get into it, real quick, I do want to acknowledge for regular listeners, the output has been inconsistent lately.
As I've said, Jeremy has been busy with work and whatever else he's up to. So as always, I promise to try and make it more consistent, but just bear with us, please. Let's go ahead and start with the news. In tools and apps, the first story is OpenAI upgrades Codex with a new version of GPT-5. So they now have GPT-5 Codex, which is just GPT-5, but better for coding is what it sounds like. It is now available if you're using the Codex CLI or the Codex IDE tool. You can switch from using regular GPT-5 to GPT-5 Codex. And also if you're using the web agent, it's now powered by GPT-5 Codex. So pretty significant news given that it looks like OpenAI is trying to catch up to Anthropic and be a competitor to Claude Code, where they're a little bit behind is my impression. Yeah, I think right now, definitely talking to software developers, the general consensus is that Claude Code is still the best tool out there. So it's very interesting to see OpenAI release new tools and better tools to make it more powerful for coding tasks. Yeah, and people have been a little angry at Anthropic lately due to infra issues. So from like a business strategy, I think OpenAI has a real opportunity to get some converts. And if you go look on Reddit or Twitter, there's a bit of sentiment of like, oh, I've switched to Codex now. It's great. I'm trying it out. I'm not a convert yet, but I might become one. Next up, we have Google injects Gemini into Chrome as AI browsers go mainstream. So pretty much what it sounds like, they now have a version of Chrome where on the top right, there's a little Gemini button. You click on it and you can ask questions about the tab, talk to Gemini, potentially later going to more agentic tasks. Very much in line with what we've been seeing from Perplexity, from the Browser Company, like integrating chatbots into the browser. Also, Anthropic recently had their Chrome extension. So it seems like just a matter of time till this happened.
Actually, it took Google a bit long to do this, if anything. But definitely taking us towards the future where you just have a chatbot literally in every single piece of software you ever use. Yeah, I wonder if it took them a while because of also all the competitiveness issues that Google is facing with Chrome, because this definitely gives them a huge competitive advantage that they own both the browser with Chrome, one of the most popular browsers, and also now can integrate that directly with their AI. Yeah, right. Perplexity did try to buy Chrome or make a bid for it, so I guess that tracks. Have you tried using any of these browser-based AI? Actually, I did use ChatGPT Agent a little bit. So ChatGPT Agent isn't like a browser plugin or anything, but it does browse the web for you and do stuff. And I found it to be pretty powerful, like things that you could not do otherwise. It can go and open your Google Doc and click on links and go and do it for like half an hour, which is pretty impressive. So I could see these being like an even easier way to automate stuff you do via prompt instead of anything else. Yeah, that's interesting. I tried Dia for a little bit and just didn't find it smooth enough really, or didn't find that it brought enough value, but I'd be interested in checking out Gemini directly in Chrome. And next we go to Anthropic. They have a new feature in Claude. It can now make you spreadsheets or PDFs, which I think is actually pretty different. Like I don't know that ChatGPT or others can make these, seemingly. It can do PowerPoints. I don't know about spreadsheets, but because OpenAI has such a strong collaboration with Microsoft, I believe they were able to roll out a lot of features with Microsoft 365 pretty early. Oh, nice. Well, in Claude now, there's an experimental feature called upgraded file creation and analysis. It sounds like they might be running like a little Claude Code agent within it, so that if you upload a file, it can do agentic stuff to it.
So yeah, if you are working with spreadsheets or PowerPoints or PDFs, this should really make Claude more powerful for that. And just one last story in the section. We've got a new video model. This one is from Luma, and it is their Ray3 model. What they are saying is an AI reasoning video model, which is kind of interesting. They say it's using reasoning power to create AI video clips with more complex action sequences. I don't know if that means that it interleaves video creation with reasoning, or if it's a pretty steady kind of progression in video creation over the last year or two. Now you're able to get clips in 20 seconds here and upscale them if you want. And as with any of these video models, you really have to go and look at the previews to see the improved clarity, prompt adherence, all those kinds of things. Yeah, interesting that it calls itself the first reasoning video model, because I highly doubt that all the other video models don't use reasoning at all. Yeah, it's hard to know to what extent this is kind of marketing speak and to what extent this is architectural or other things like that. This is coming, I guess, with Google having released Veo 3, I don't know how long ago, but not too long ago, and being very impressive and very powerful. So it's getting increasingly competitive. Definitely. On to applications and business. And as usual, or as is often the case, we begin with OpenAI having some very businessy kinds of updates. So for the past year or something like that, they've been trying to go for-profit, as we've covered over many months, and have had many legal struggles. And now there's a bit of an update on it. Apparently, OpenAI secured Microsoft's blessing for the transition to the for-profit. So they have now this memorandum of understanding.
So kind of an unofficial agreement, so to speak, where they have terms that they are agreeing upon, where they will retain some sort of relationship, but it'll be not quite as exclusive as OpenAI and Microsoft have had, I guess, prior to 2025. We've seen them become a little more antagonistic over time as they've tried to transition to for-profit. Let's see. Is there anything else to say here? I don't know. It's like a memorandum of understanding. I feel like... There's not many details here. Yeah, it just sounds like some interesting updates, maybe to help with fundraising, maybe to just produce some more news. Yeah, there's not really many details here. All we know is apparently this is ending months of negotiation. And this was stated in a joint statement. So presumably behind the scenes, this involved a lot of back and forth and is kind of a significant update for OpenAI, because they are under the gun to do this whole transition. They announced wanting to do it early this year. They still haven't done it. You know, they're in a tough spot. And if they don't complete this transition, they're in real trouble. And related to that, we have the next story: Microsoft is going to apparently lessen its reliance on OpenAI by buying AI from Anthropic. So they are going to integrate Anthropic into their Office 365 applications. Presumably it's going to be kind of a way to pick your models as you use AI. And at least according to this article, and presumably like reasonable speculation, this is related to whatever tensions are currently existing between Microsoft and OpenAI. Well, maybe that makes sense why the new Claude models now can work with Office 365 applications. And it looks like OpenAI is also working to reduce their dependency on Microsoft by also working with AI chips, other cloud providers. So it sounds like both parties are trying to lessen their reliance and lessen their partnership. Yeah, that's right.
Yeah, OpenAI did just sign a massive contract with Oracle. Oracle stock jumped quite a bit. So OpenAI, as we've covered a lot in this podcast, if you're into business drama, OpenAI is a never-ending fountain of business drama and sort of interesting developments. And this is just the latest of that. Moving on to slightly less, let's say, boring or businessy news. We've got some stories on robotics. First up, we have Figure AI. They are passing $1 billion in committed capital with their Series C funding round, which would make their post-money valuation $39 billion. Figure is one of the several humanoid robotics startups that are fairly new. I forget how old Figure is, but they must be from 2023-ish. Obviously pre-revenue, they're still more or less an R&D lab at this point. So pretty cool to see the venture funds still being committed to funding these very ambitious humanoid robotics bets that are seemingly making a lot of progress from what I can see. Maybe, Michelle, you have a take on this. Yeah, I mean, we've been seeing really exciting new models come out of Physical Intelligence. Dyna Robotics just launched with their new fundraise. So very exciting to see more funding going into robotics, and also very exciting to see especially more and more efforts into hardware, which has been very much a bottleneck right now in robotics. How do we actually have better hands, better humanoid robots? Six years ago, if you wanted a humanoid robot to do research, you would have to be in a select few universities around the world that actually have access to humanoid robots. And now we have several humanoid companies all trying to build better hardware. So it's very exciting, which I guess will lead to the next news. Exactly. The next news is about China's Unitree, which is already planning an IPO, apparently. So they are saying that the company might be valued at up to $7 billion. Unitree, if you haven't seen lately, has been big in humanoids.
They've unveiled this kind of mini humanoid that is quite affordable and quite capable. And I believe they also were pretty active in the dog quadruped space, where China has been killing it for quite a while now. They've even been profitable since 2020, actually, with revenues now exceeding like $140 million. So China is, especially on the robotics front, quite competitive with the frontier of AI. But there is a question, I think, here of the software AI side, where it's still very tough. Yeah. I mean, it's very cool to see China focusing more and more on humanoids. And in general, in robotics, I heard that there are just dozens of humanoid companies, not just Unitree, that have been founded in China. And this is great for the robotics industry: as more companies are building hardware, the cost of this general-purpose hardware keeps going down. And with Unitree, with their new humanoid robot, which is very affordable, it costs around the same price as a robotic arm. Most labs now in the US and universities can very easily afford their own humanoid, which again was just not true several years ago. Right. Apparently it's what, $16,000 for this Unitree G1 robot, which is actually on the lower end of robotic arms from what I've heard. So very cool. On to another type of robotics, robotaxis, also a very hot area this year. First up, we've got a couple of stories about Tesla. First of all, Tesla's robotaxi is planning to test in Nevada. They now have a testing permit from Nevada's Department of Motor Vehicles. On the slightly less positive side, there was reporting from Electrek about there having already been three robotaxi accidents, with at least one injury reported as well. And this is from the robotaxi fleet in Austin, which is estimated to have about 12 vehicles, still at a very kind of small scale with safety drivers. So if that is true, apparently the NHTSA is investigating Tesla for potentially misreporting the crash data.
That would not be good news, a good sign for it in trying to compete with Waymo, which has a stellar record from what I know at least. Yeah, it's very tricky for these companies when they try to avoid or try to hide these accidents, because that was really what got Cruise in trouble in San Francisco: after the accident they tried to hide information, and that's one of the many reasons why Cruise was no longer able to operate in San Francisco. So I hope Tesla is able to be honest and report the accidents correctly so that they can continue building that trust with the government officials. Yeah, for sure. And Robotaxi does seem pretty capable. I'm a major Waymo user. I don't know how often you use it, Michelle, but I'll be looking forward to trying Robotaxi whenever it comes here. I mean, I love Waymos and it really truly feels like magic. And so very excited to see more and more self-driving cars in the streets. And on that note, one more story about robotaxis. Next up, we have Amazon's Zoox jumps into U.S. robotaxi race with Las Vegas launch. So they have this offering now of a public robotaxi service on the Las Vegas Strip. Apparently, they're offering free rides from select locations with plans to expand citywide. So a pretty small test, as you might expect, I suppose, with just the initial set of testing from Zoox. They are using their very futuristic model of car where you don't have a steering wheel. And it's like a tiny kind of bus-looking thing where you have all the seats facing inward. It looks great. I would love to try it. I've been seeing people, probably employees and testers, in San Francisco riding them. It looks so cool because you're facing each other. So you can actually have meetings while you're in the cars, which is very cool. Yeah, and Zoox, by the way, for those who don't know, was acquired by Amazon back in 2020. They've been working on this since 2014.
So Zoox, even though they haven't deployed to the extent that Tesla or Waymo have, or haven't demonstrated as much, given their backing and given that they've been in this for a long time, I think they still have a chance to really kind of grow rapidly if this turns out to go well. Yeah, and how fun to have it start in Las Vegas. I know. Yeah, I should go try it out. And just two more stories in the section with more funding news. We've got Replit hitting a $3 billion valuation with $150 million annualized revenue. That's after they raised $250 million in a new funding round. So Replit, one of the key winners of the vibe coding era, I suppose, that started this year, seemed to be growing very rapidly in terms of their revenue. And unsurprisingly, I suppose, also getting some impressive fundraising as a result. Replit definitely makes it really easy for people to get started on coding, building their own projects. And they have definitely done a great job at being able to leverage all the new AI coding tools and integrating them with their platform. Right. And as a result, this kind of jumped out at me: apparently, their revenue went from $2.8 million annualized revenue to $150 million in less than a year. And this company has been around since 2016. So Replit has been active for a long time as a sort of dev tool for coders, but now, having kind of made it usable for non-professionals, they're rocketing upward. Well, honestly, I am surprised they've been able to raise money with only a $2.8 million ARR previously for them to grow this big. But yeah, very exciting. We're definitely seeing a lot of AI tools now being able to go to 100, 150, 200 mil revenue in a very short amount of time. So very exciting. And the last story, also on fundraising: Perplexity, the, I guess, primarily search tool. Now they're trying to expand into agents and browsers and so on. They have reportedly raised $200 million at a $20 billion valuation.
And this is just two months after they raised $100 million at an $18 billion valuation. So one of the very fun things with this podcast is in AI, people just fundraise like constantly. Every few months, these companies are getting billions of dollars if they can. And that is certainly true in this case. Yeah. So also their ARR just hit 200 mil, up from 150 million reported last month. So they're also growing quite a lot in revenue as well. And on to the projects and open source section. Just a couple of things here. First one is K2-Think, a parameter-efficient reasoning model. So this is a research paper plus an open source model coming from the Institute of Foundation Models at the Mohamed bin Zayed University of Artificial Intelligence in the UAE, which I don't think we've covered before, which is interesting. They took an existing model, Qwen2.5-32B, as their base model, and then they put it through all the typical reasoning training. So they had some fine-tuning, some reinforcement learning, all the tricks. They also have best-of-N sampling, some of the stuff on the inference side packaged in here. And as a result, they get a 32 billion parameter model, which is seemingly performing very impressively. According to at least their math results, they are performing better than DeepSeek R1, DeepSeek V3.1, GPT-OSS at a relatively small number of total parameters. Now this is a little bit unfair, because they're not comparing total active parameters, they're comparing total parameters. But nonetheless, I think it's very cool to see even better open source models now on the reasoning side, and pretty impressive to see a university publishing this kind of stuff. Yeah. And it's also interesting that their way of getting to better results is not just more parameters. It's actually thinking about scaling test-time compute, using plan-before-you-think, prompt restructuring.
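The best-of-N sampling mentioned here is simple to sketch: draw several candidate answers and keep the one a verifier scores highest. This is a toy illustration only, not K2-Think's actual pipeline; `sample_fn` and `score_fn` are hypothetical stand-ins for a real model call and a real verifier or reward model.

```python
import random

def best_of_n(prompt, sample_fn, score_fn, n=8):
    """Draw n candidate completions for the prompt and keep the
    highest-scoring one according to the verifier score_fn."""
    candidates = [sample_fn(prompt) for _ in range(n)]
    return max(candidates, key=score_fn)

# Toy demo: the "model" guesses random numbers, and the "verifier"
# prefers guesses closest to 42.
random.seed(0)
sample = lambda _prompt: random.randint(0, 100)
score = lambda guess: -abs(guess - 42)

best = best_of_n("What is 6 * 7?", sample, score, n=16)
```

The point of the technique is that inference-time compute (more samples) substitutes for parameters, which is how a 32B model can punch above its weight on verifiable tasks like math.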
We're just seeing this test-time compute, rethinking prompts, really, really trying to think of it almost as having different agents be able to think through different prompts and surfacing the best ideas, come up as now one of the best ways to improve reasoning. And I think even in large foundation models, the test-time compute and improving the prompts is so key to getting better model performance. And just one more open source story here. The next one is a benchmark, not a model. The paper that came out of it is called LoCoBench, a benchmark for long-context large language models in complex software engineering. So I assume that's a long-context software engineering benchmark. The basic point it makes is the existing software engineering benchmarks we have, like SWE-bench and so on, typically deal with GitHub issues and therefore are pretty localized. So you might be working in a code base, but the total amount of work, the total amount of files you need to look at, the total amount of code you need to look at is relatively minor. And as a result, the benchmarks don't necessarily correlate too deeply to the performance you get when you actually try to use them via Claude Code or via Codex or via any of these tools. So this paper introduces a whole bunch of tasks. So they have eight categories of long-context tasks. They have architectural understanding, cross-file refactoring, feature implementation, bug investigation, et cetera, et cetera. They have like a thousand of each of these eight things. Different difficulty levels in terms of the length of, I think, tokens here. So on the low side, you have what you typically see in the existing benchmarks of 10K to 100K tokens. But then you scale up to 10X, 50X, 100X, those kinds of context lengths for their kind of hardest level. And as you might expect, compared to the easier or, let's say, the shorter existing coding benchmarks, existing systems aren't able to solve these things.
As for SWE-bench, I think now we're at like 90%, we're like saturating it. These tasks, the existing models are now unable to fully resolve them. And there's quite a hierarchy in terms of capabilities as well. Yeah, I think this is great that benchmarks are becoming more and more realistic. That's always so important, because when the benchmarks aren't realistic, we end up building what we can measure, and 10K tokens is not at all realistic to the type of coding tasks that people do every day. Even for simple things, 10K is not enough if you're trying to work with multiple files and refactor. And with the context window, a lot of people are now just doing a lot of engineering tricks to be able to remember what's happening so we don't have to use up all the context window. But it's great if we can start measuring how these models can work with longer and longer contexts. They also introduce some kind of interesting metrics. So they have a total of eight software engineering excellence metrics: architectural coherence score, dependency traversal accuracy, cross-file reasoning depth, system thinking score, robustness score, comprehensiveness score, innovation score, and solution elegance score, all based on, I guess, previous research that suggested variations of these, or at least the last few that are more dealing with code quality. So overall, it seems like a very thoughtful effort to make a very useful benchmark that tracks actual software engineering quality. Yeah, hopefully this just means these models can keep improving on more realistic tasks. And speaking of continuing to improve, on to the next section, research and advancements. The first story is self-improving embodied foundation models. And this is coming from Google DeepMind in collaboration with Generalist, which I don't know that I'm aware of. Oh, yeah. Generalist is a robotics company that came out of DeepMind. Oh, well, there you go. That makes a lot of sense.
So in this collaboration, they introduced a self-improving embodied foundation model. What that means is they begin with something like the RT-2 model that came out of DeepMind, where they take a whole bunch of video, a whole bunch of rollouts of robotics, and train a robotics foundation model in the sense that you're able to get a robot arm, in this case, to technically do anything. So give it some text, and it'll try to execute a policy to do whatever you want. The self-improving part here is, after you do the pre-training, in stage two you can do online self-improvement with on-policy rollouts of a robot. So you have ideally one person or maybe two people supervising actual robots, and in these little cages they have something that is able to evaluate success criteria on whatever tasks they're working on. And as a result, you're basically able to generate a continual stream of success and failure rollouts. And at least in the ideal case, creating a larger data set to then train on. And yeah, they implement this with real hardware and show that you're able to get quite a significant improvement on some of these Language Table, ALOHA single insertion, real-to-sim, all these different evaluations of robotic arm-based tasks. This is very smart, because in robotics, one of the biggest problems is just we don't have enough data. You want to do imitation learning and behavior cloning? Great. Now you have to collect lots of data, either with VR headsets or using ALOHA to teleoperate the robot. Having the self-improvement basically is almost like a simplified reinforcement learning, without needing to do reinforcement learning fully, where you only get supervision from the rewards itself. Now you can just predict the reward function and detect the success, and use that to supervise and be able to get more training data in order to scale up their models.
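The self-improvement loop described here boils down to: roll out the current policy, keep only the rollouts a success detector accepts, and grow the training set with them. This is a minimal sketch of that recipe, not DeepMind's actual pipeline; the `policy`, `run_episode`, and `detect_success` names are hypothetical stand-ins for a real robot policy, an episode runner, and a learned success classifier.

```python
import random

def self_improvement_round(policy, run_episode, detect_success, dataset, num_rollouts=32):
    """One round of rollout-and-filter self-improvement: collect
    on-policy episodes, keep the ones the success detector accepts,
    and add them to the training set. A finetuning step on the
    grown dataset would follow in a real pipeline."""
    rollouts = [run_episode(policy) for _ in range(num_rollouts)]
    successes = [r for r in rollouts if detect_success(r)]
    dataset.extend(successes)
    return dataset

# Toy stand-ins: a "policy" that sometimes succeeds, and a detector
# that reads the episode's recorded outcome.
random.seed(1)
policy = None
run_episode = lambda p: {"actions": [], "succeeded": random.random() < 0.3}
detect_success = lambda rollout: rollout["succeeded"]

data = self_improvement_round(policy, run_episode, detect_success, dataset=[])
```

The key design choice, as discussed above, is that supervision comes from the success detector rather than from dense rewards, so you get the data-flywheel benefit of RL without a full RL training loop.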
Yeah, in a way it's almost similar to what people are doing with reasoning models now: you pre-train your model, you then align it, and then you do a bunch of executions and actually do reinforcement learning on the language models with these verifiable rewards. This is kind of that in the robotics domain, which I suppose makes a lot of sense. Yeah, the only difference is that with reasoning models you can start out fully self-supervised. Here you have to start out with imitation learning, and then, with enough data, you can improve it with its own self-supervision. And the next research is also about a foundation model, also a physics-related foundation model, I guess, although in this case it's not robotics. It's a physics foundation model; the paper is Towards a Physics Foundation Model. I'm going to be honest, it's mostly going to go over my head, so I'm not going to be able to go deep into it, but it looks pretty impressive. They frame this as: there are existing physics models, like physics-informed neural networks, that can do various things, like estimating thermal flows, solving shear flow, these kinds of things. And they try to create a foundation model in the sense that it's one model to do a whole bunch of stuff, right? The way they do that is they have this GPhyT model that is given a set of states, and the states are these spatiotemporal patches containing, basically, state, right? So they have forces, fields, et cetera. And you basically just give it a prompt, which is a sequence of states, and just from this sequence of states the model is able to do these various kinds of physics-related operations, like thermal flows and so on. And they train it on a diverse 1.8-terabyte corpus of simulation data covering a wide range of physical systems, without explicit physics-describing features. So it seems pretty impressive.
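To make the "prompt is a sequence of states" idea concrete, here's a minimal sketch: a short history of 2-D field snapshots goes in, a prediction of the next snapshot comes out. The stand-in "model" is just linear extrapolation from the last two states — GPhyT is a large transformer, so this illustrates only the interface, not the method:

```python
import numpy as np

np.random.seed(0)

def predict_next_state(history):
    """history: array of shape (T, H, W) of field snapshots.
    Stand-in model: naive linear extrapolation from the last two states."""
    return 2 * history[-1] - history[-2]

# Toy "thermal" field decaying toward zero at a constant rate per step.
T, H, W = 4, 8, 8
field0 = np.random.rand(H, W)
history = np.stack([field0 * (0.9 ** t) for t in range(T)])

pred = predict_next_state(history)
true_next = field0 * (0.9 ** T)
print(pred.shape, float(np.max(np.abs(pred - true_next))))
```

On this artificial near-linear decay, even the naive extrapolator lands close to the true next state; the whole pitch of a physics foundation model is doing this from raw state sequences across many very different systems, with no per-system features.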
Again, I'm not too caught up on the physics simulation side of research, but pretty cool. It's pretty cool, but it seems like they train mostly on simulation data, so I am curious whether it can generalize to real data. Yeah, I guess that would be a key question. But they do compare to these specialized models, and apparently it outperforms specialized architectures on known tasks and also generalizes to out-of-distribution problems. So I guess the hope is that if you train on enough data, and enough varied data, it's going to do quite well. I'm sure you're right that it needs to go beyond simulation to really be super reliable. Next, we have yet another foundation model. I just decided to make that kind of a theme. Also in robotics, but this time, instead of arms, it's about legs, or wheels, I suppose you could say. The paper is Embodied Navigation Foundation Model. Navigation is one of these pretty basic tasks that's been looked at in research over the past decade. It's kind of what it sounds like: the robot is given a goal place to go to, and it needs to make it there, usually by relying on vision. So you can give this to quadrupeds, you can give this to humanoids, robots on wheels, and it typically needs to navigate an apartment or some other space to get there. And there's been quite a bit of research for about a decade on doing reinforcement learning, deep learning, all sorts of things like that. So here, the researchers have developed NavFoM, a cross-task and cross-embodiment navigation foundation model. They have eight million navigation samples from these different tasks and embodiments, where embodiments, again, can be quadrupeds, humanoids, robots on wheels. And for all of these, if you just give it an egocentric video and language instructions, the model will predict the trajectory the agent should take to get you wherever you want to go.
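As an interface, a model like this is simple to describe: egocentric frames plus a language instruction in, trajectory waypoints out, conditioned on which embodiment is asking. A sketch of that interface — the names, shapes, and placeholder planner are all illustrative assumptions, not the paper's API:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class NavQuery:
    frames: List[bytes]   # egocentric video frames (encoded images)
    instruction: str      # e.g. "go to the kitchen table"
    embodiment: str       # "quadruped" | "humanoid" | "wheeled"

def plan_trajectory(query: NavQuery) -> List[Tuple[float, float]]:
    """Placeholder planner: emit a straight-line sequence of (x, y) waypoints.
    A real cross-embodiment model would condition on the frames, the
    instruction, and the embodiment (camera height, viewpoint, dynamics)."""
    steps = 5
    return [(0.5 * (i + 1), 0.0) for i in range(steps)]

q = NavQuery(frames=[b""], instruction="go to the door", embodiment="wheeled")
waypoints = plan_trajectory(q)
print(waypoints)
```

The embodiment field is where the cross-platform claim lives: the same frames and instruction should yield different trajectories for a quadruped versus a wheeled base.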
So a very useful type of model if you want, I guess, general-purpose robotics, for instance. I have to be honest, I feel like this is just publishing for the sake of publishing a foundation model, right? Like, we have pretty good models to do self-driving; that's why, earlier in the episode, we talked about several self-driving car company news items. And with these kinds of diversity-based foundation models, it's like, hey, we can do it on a humanoid; hey, we can do it on a car; hey, we can do it on robot wheels. Oftentimes it's really about diversity, because if you look at the benchmark, the performance is at 64.4%, which still feels quite low to actually be utilized in a real-world setting. So I wonder if, you know, for navigation it's still more important to build the models, probably big foundation models, necessary for navigation, but rather than trying to go across different types of platforms, to focus on one specific type of platform. Yeah, they do try to incorporate autonomous driving and UAV data here, which, to your point, probably isn't necessary. I think navigation benchmarks are typically more indoors-oriented. I guess the key benefit of trying to do this cross-embodiment stuff is having something that generalizes, right? So they do say that they are taking in different camera view information and have different temporal contexts. Maybe if they focused a little, to not deal with cars and UAVs but just with different types of embodied agents of different heights and different perspectives, that could probably be quite useful. Alrighty, well, that's it for research. Lots of foundation models. Next up, we go to policy and safety. First, we have something in our home state of California: Anthropic has endorsed California's AI safety bill, SB 53. This is, I believe, the follow-up version of regulation that was being discussed earlier, which was passed but vetoed by the governor of California.
This is a tweaked version that took out some of the, let's say, more onerous requirements. And Anthropic explicitly endorsing it is a pretty significant sign that they think this is a good way to regulate for AI safety. SB 53 is an AI safety bill that is meant to regulate, basically, Anthropic, to regulate companies working on advanced AI models that might contribute to risks such as biological weapons or cyberattacks. As with the previous version of this bill, its passing or not passing would be a pretty big deal. I'm sure OpenAI would not be very happy if it passes, but it probably has a better chance than its predecessor. Anthropic seems to always be at the forefront of arguing for more safety, but I am surprised that they are going after regulatory efforts too, to improve safety, as it does mean there will be more requirements, legal requirements, for people innovating on models. Yeah, according to this article, some policy experts are saying this is a more restrained approach compared to the previous AI safety bill. So this could be good; at least according to them, it seems to be the right way to do it. They have this quote in their blog post: "The question isn't whether we need AI governance, it's whether we develop it thoughtfully today or reactively tomorrow. SB 53 offers a solid path toward the former." So the basic point is, according to Anthropic, this is a good way to do this kind of regulation. Next up, moving away from AI safety to copyright stuff, another popular topic for legal battles. This time we have Warner Bros. suing MidJourney. So Warner Bros. is filing a lawsuit against MidJourney, accusing them of copyright violations related to characters like Superman, Batman, and Bugs Bunny. The complaint alleges that MidJourney has removed safeguards that previously prevented users from creating infringing videos, which has resulted in the unauthorized creation of Batman and so on.
The team in charge here has also filed lawsuits against MidJourney on behalf of Disney and Universal, so it sounds like it's more of what MidJourney is already facing. Yeah, well, it does seem like MidJourney, compared to a lot of other image generation platforms, doesn't really have as many safeguards against intellectual property violations. But it's also interesting that all these companies are now kind of jumping in and dogpiling on MidJourney. Yeah, I think it's because it's a pretty straightforward thing to do. And as with the previous lawsuits here, if you go and read the PDF, the actual complaint, it's kind of a fun one to read, just because there are image attachments. So they have examples of Batman and Superman and Wonder Woman and Scooby-Doo and all these characters as images generated by MidJourney, right there in the lawsuit, which is certainly fun to see. And let's just do one last story; this is a bit of a shorter episode. The last one also deals with lawsuits and copyright, but this is now in the text domain. The company doing the suing is Rolling Stone, and they are suing Google over AI Overview summaries. The lawsuit claims that Google's AI Overview panel displays summaries that discourage users from clicking through to the full articles, which would impact the publishers' ad and subscription revenue. Similar to what Perplexity has been dealing with, I suppose; now Google is kind of doing the same thing as Perplexity and giving you this AI summary of a bunch of sources. And I guess it was just a matter of time until Google had to address this. There are details here that apparently publishers like DMG Media and others have reported significant declines in click-through rates since the introduction of AI Overviews. Pew Research found that users are less likely to click through to articles when AI summaries are present in search results. So not a trivial matter.
I mean, this is kind of live or die for these kinds of publishers, right? And I love how Google denies these claims, but if you actually ask Gemini whether AI Overviews result in less traffic, it contradicts Google's public stance and says, yes, it does actually reduce traffic. Right. And publishers are in a tough spot here, because they need Google, right? They need to be indexed by Google; they need the traffic generated by Google. But on the other hand, now Google is cannibalizing that business, the clicks. So it's a tricky balance to strike. And another interesting question on the legal dynamics, the financial dynamics, of an LLM-driven world. As with image generation, now with search, text, publishing, all of this is somehow still not resolved. I mean, look, this is a very disruptive technology, so a lot of old business models are just going to be disrupted. And publishing has been very much hurt by the internet as well. So this is another wave of potentially less revenue, fewer clicks for these publishers, and I can see why they are trying to figure out a way to salvage the situation. Well, we'll finish with that slightly sad detail, although the robotics stuff hopefully made up for it. Thanks once again, Michelle, for guest hosting. Yeah, it was fun. It was fun to talk about the latest AI news with you, Andrey. Thank you so much for inviting me. Yeah, maybe we'll do it again. We'll see. And thank you also to the listeners, as usual, for tuning in. Apologies once again for not being very consistent. Last Week in AI is supposed to be every week, but sometimes it's not. Please do keep tuning in. Thank you. From the labs to the streets, AI's reaching high. New tech emerging, watching surgeons fly. From the labs to the streets, AI's reaching high. Algorithms shaping, but the future sees. Tune in, tune in, get the latest with ease. Last week in AI, come and take a ride. Get the lowdown on tech, and let it slide. Last week in AI, come and take a ride.
From the labs to the streets, AI's reaching high. From neural nets to robots, the headlines pop. Data-driven dreams, they just don't stop. Every breakthrough, every code unwritten, on the edge of change. With excitement, we're smitten. From machine learning marvels to coding kings. Futures unfolding. See what it brings.