Last Week in AI

#220 - Gemini 2.5 Flash Image, Claude for Chrome, DeepConf

Last Week in AI • Andrey Kurenkov & Jacky Liang

Monday, September 1, 2025 • 52m

What You'll Learn

  • Gemini 2.5 Flash Image is a highly impressive image editing model from Google that can convincingly modify images while retaining the subject's features.
  • Anthropic has launched a Claude AI agent that integrates with the Chrome browser, allowing users to delegate web-based tasks to the AI.
  • Anthropic has added the ability to remember past conversations to its chatbot, similar to features available in ChatGPT.
  • The episode discusses the implications of these new tools, such as the potential impact on Photoshop, the evolution of AI-powered browser experiences, and the challenges around personalization and memory in chatbots.
  • The episode also touches on the widespread use of ChatGPT by students and the potential for AI-powered 'study mode' features to encourage deeper learning.

Episode Chapters

1

Introduction

The hosts introduce the episode and provide an overview of the topics to be discussed.

2

Gemini 2.5 Flash Image

The hosts discuss the release of Gemini 2.5 Flash Image, a powerful image editing model from Google.

3

Anthropic's Claude AI Agent for Chrome

The hosts discuss Anthropic's launch of a Claude AI agent that integrates with the Chrome browser.

4

Anthropic's Chatbot Memory Feature

The hosts discuss Anthropic's addition of conversation history to its chatbot, similar to features in ChatGPT.

5

Implications and Challenges

The hosts discuss the potential implications of these new tools and the challenges around personalization and memory in chatbots.

6

AI in Education

The hosts touch on the widespread use of ChatGPT by students and the potential for AI-powered 'study mode' features.

AI Summary

This episode of the Last Week in AI podcast covers several recent AI developments, including Google's release of Gemini 2.5 Flash Image, a powerful image editing model; the launch of Anthropic's Claude AI agent for Chrome; and Anthropic's addition of conversation memory to its chatbot. The episode also discusses the implications of these new tools, such as the potential impact on Photoshop, the evolution of AI-powered browser experiences, and the challenges around personalization and memory in chatbots.

Key Points

  1. Gemini 2.5 Flash Image is a highly impressive image editing model from Google that can convincingly modify images while retaining the subject's features.
  2. Anthropic has launched a Claude AI agent that integrates with the Chrome browser, allowing users to delegate web-based tasks to the AI.
  3. Anthropic has added the ability to remember past conversations to its chatbot, similar to features available in ChatGPT.
  4. The episode discusses the implications of these new tools, such as the potential impact on Photoshop, the evolution of AI-powered browser experiences, and the challenges around personalization and memory in chatbots.
  5. The episode also touches on the widespread use of ChatGPT by students and the potential for AI-powered 'study mode' features to encourage deeper learning.

Topics Discussed

#Image editing models · #AI-powered browser agents · #Chatbot personalization and memory · #AI in education

Frequently Asked Questions

What is "#220 - Gemini 2.5 Flash Image, Claude for Chrome, DeepConf" about?

This episode of the Last Week in AI podcast covers several recent AI developments, including Google's release of Gemini 2.5 Flash Image, a powerful image editing model; the launch of Anthropic's Claude AI agent for Chrome; and Anthropic's addition of conversation memory to its chatbot. The episode also discusses the implications of these new tools, such as the potential impact on Photoshop, the evolution of AI-powered browser experiences, and the challenges around personalization and memory in chatbots.

What topics are discussed in this episode?

This episode covers the following topics: image editing models, AI-powered browser agents, chatbot personalization and memory, and AI in education.

What is key insight #1 from this episode?

Gemini 2.5 Flash Image is a highly impressive image editing model from Google that can convincingly modify images while retaining the subject's features.

What is key insight #2 from this episode?

Anthropic has launched a Claude AI agent that integrates with the Chrome browser, allowing users to delegate web-based tasks to the AI.

What is key insight #3 from this episode?

Anthropic has added the ability to remember past conversations to its chatbot, similar to features available in ChatGPT.

What is key insight #4 from this episode?

The episode discusses the implications of these new tools, such as the potential impact on Photoshop, the evolution of AI-powered browser experiences, and the challenges around personalization and memory in chatbots.

Who should listen to this episode?

This episode is recommended for anyone interested in image editing models, AI-powered browser agents, or chatbot personalization and memory, and for anyone who wants to stay updated on the latest developments in AI and technology.

Episode Description

Our 220th episode with a summary and discussion of last week's big AI news! Recorded on 08/30/2025.

Check out Andrey's work over at Astrocade, and sign up to be an ambassador here. Hosted by Andrey Kurenkov and co-hosted by Daniel Bashir. Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai. Read our text newsletter and comment on the podcast at https://lastweekin.ai/

In this episode:

  • Google's newly released Gemini 2.5 image editing model showcases remarkable advancements, enabling highly accurate modifications of subjects while retaining their original features.
  • Anthropic expands Claude with an AI browser agent for Chrome and adds features to remember past conversations, enhancing the user experience and personalization.
  • NVIDIA and AMD are to share revenue from AI chip sales to China with the US government, marking a notable shift in export control policies and trade practices.
  • AI companion apps are experiencing substantial growth, with projected revenues expected to reach $120 million in 2025, raising questions about social implications and user engagement.

Timestamps + Links:

Tools & Apps
  • (00:02:12) Google Gemini's AI image model gets a 'bananas' upgrade | TechCrunch
  • (00:05:32) Anthropic launches a Claude AI agent that lives in Chrome | TechCrunch
  • (00:08:30) Anthropic's Claude chatbot can now remember your past conversations | The Verge
  • (00:11:46) Google Launches AI 'Guided Learning' Tool to Teach Users
  • (00:14:55) Apple Intelligence's ChatGPT integration will use GPT-5 starting with iOS 26 | The Verge
  • (00:15:39) OpenAI Adds New Features to Codex, Like IDE Extension and GitHub Code Reviews

Applications & Business
  • (00:16:49) Lovable projects $1B in ARR within next 12 months | TechCrunch
  • (00:18:56) Decart hits $3.1 billion valuation on $100 million raise to power real-time interacti | Ctech
  • (00:20:19) Cohere raises $500M to beat back generative AI rivals | TechCrunch
  • (00:21:25) Pony AI, Nearing Full-Year Robotaxi Goal, Eyes European Markets - Bloomberg
  • (00:22:41) Co-founder of Elon Musk's xAI departs the company | TechCrunch

Projects & Open Source
  • (00:24:39) Meta AI Just Released DINOv3: A State-of-the-Art Computer Vision Model Trained with Self-Supervised Learning, Generating High-Resolution Image Features - MarkTechPost
  • (00:27:02) GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
  • (00:29:49) China's DeepSeek Releases V3.1, Boosting AI Model's Capabilities - Bloomberg
  • (00:30:36) Open weight LLMs exhibit inconsistent performance across providers
  • (00:32:02) Microsoft Released VibeVoice-1.5B: An Open-Source Text-to-Speech Model that can Synthesize up to 90 Minutes of Speech with Four Distinct Speakers - MarkTechPost

Research & Advancements
  • (00:33:43) Deep Think with Confidence
  • (00:36:30) Generative AI reshapes U.S. job market, Stanford study shows

Policy & Safety
  • (00:41:42) Inside the US Government's Unpublished Report on AI Safety | WIRED
  • (00:44:10) U.S. Government to Take Cut of Nvidia and AMD A.I. Chip Sales to China - The New York Times
  • (00:45:13) Anthropic Settles High-Profile AI Copyright Lawsuit Brought by Book Authors
  • (00:46:56) AI companion apps on track to pull in $120M in 2025 | TechCrunch

See Privacy Policy at https://art19.com/privacy and California Privacy Notice at https://art19.com/privacy#do-not-sell-my-info.

Full Transcript

Hello and welcome to the Last Week in AI podcast, where you can hear us chat about what's going on with AI. In this case, it's a bit more like the last month in AI, unfortunately; we've had to skip a few weeks. Jeremy is busy as always with exciting reporting and his national security work, and I've been traveling. So, as I always say, sorry for the missed weeks. We'll try to be back on a regular schedule going forward. In this episode, we will summarize and discuss some of last week's most interesting AI news, and a bit of the week before as well. You can go to lastweekin.ai for our text newsletter, which goes out most weeks, for other stuff we are not covering in this episode. I'm one of your regular co-hosts, Andrey Kurenkov. Jeremy once again could not make it, so we have one of our regular guest co-hosts, Daniel Bashir.

Hey, yeah, I'm Daniel. You may have heard me on this podcast before. If you have explored the Last Week in AI Substack world, you might have also listened to The Gradient, which, if you haven't, has lots of interview episodes that I think are pretty cool. We'd love for you to check those out. Yeah, great to be here.

Yeah, and Daniel and I were just chatting; there's at least an idea being floated of reviving the Gradient podcast, which has been on ice for a little while now. So, Last Week in AI listeners, you might hear some news on that in a few weeks. We'll see. But in this episode, we'll be covering some primarily exciting news regarding new tools and apps, some releases from Google and Anthropic, not any major applications or business stories, some pretty cool open source stuff, and just a couple of notable policy stories. It's not been a super busy month so far, luckily, and we haven't missed too much.

So starting out in tools and apps, the first story has got to be the new image editing model from Google. They have released Gemini 2.5 Flash Image, which is by far the most impressive model for editing images that has been available so far. It was hyped up for a while; it was being used under the pseudonym Nano Banana, and after a little bit of that, it was revealed to be, in fact, Gemini. Sadly, this is an audio format, so I'll have to describe what you can do with it. The gist is that you can very accurately take a subject, like a person, and then change the clothing of this person, change their posture, change the setting, and it very convincingly retains the features of this person while very successfully following your instructions about what you want to do. You can combine different images as well. It's by far beyond anything else we've seen, to the point that some people are saying Photoshop is in trouble now. We've had very powerful models for image generation for a while; this one is still next level.

Yeah, and this is also coming just off the heels of Genie 3, which was released earlier this month, also by Google DeepMind, and which is really quite impressive as well. It's got this sort of ability for you to look into a world, and it's actually quite stable and maintains some of the physical properties. Like, if you look at something and make some adaptation to a part of the environment, like painting a wall, then turn away and turn back, it is still there. And this interacts with the notion of a world model that's pretty hotly debated in AI circles: whether models have them, what a world model is, things like this.
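As an aside on the image editing workflow described above: it maps onto Google's Gemini API fairly directly. Here is a minimal sketch, assuming the google-genai Python SDK and the preview model id used at launch (both assumptions on our part, not details from the episode):

    from io import BytesIO
    from PIL import Image
    from google import genai

    client = genai.Client()  # assumes GEMINI_API_KEY is set in the environment

    # Edit an existing photo with a natural-language instruction.
    source = Image.open("person.png")
    response = client.models.generate_content(
        model="gemini-2.5-flash-image-preview",  # assumed preview model id
        contents=[source, "Put the subject in a red raincoat; keep the face and pose unchanged"],
    )

    # The response interleaves text and image parts; save any returned image.
    for part in response.candidates[0].content.parts:
        if part.inline_data is not None:
            Image.open(BytesIO(part.inline_data.data)).save("edited.png")

Feeding the returned image back in with a new instruction gives the multi-turn editing flow the hosts discuss next.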
So I'm pretty excited to see more work like this that forces us to think about these notions.

Yeah, there have been many fun examples of things you can do with this. It's a multi-turn, conversational model, of course, so you can take an empty room and decorate it: paint the walls, add a couch, add a table, and the room will be successively populated without all the details changing, with just the very specific thing you wanted executed. And on the note of the world model, there was a fun example I saw online where someone gave the model an image of a road in Dallas or something and asked it to show the opposite view, what's behind the viewer. And the model apparently was able to show the view from the other side of the same area, which definitely speaks to this being world-model-like, able to understand physical properties, locations, things like that.

The next big piece of news on this front is that Anthropic has launched a Claude AI agent that lives in Chrome. And this is pretty much what you'd expect: the AI exists in a sidecar window, it maintains context of browser capabilities, and the agent can perform tasks on behalf of the user, which is pretty exciting. There are a lot of AI companies out there developing similar AI-powered browser solutions. I feel like this is an interesting direction, and maybe there's a question out there of what an AI-native browser would look like. How does the interaction design for that look different from how we use browsers today? Is it this agent in a sidecar, the way Anthropic is doing it right now? Or is there a world where browsers actually look pretty different in a more fundamental way? That feels pretty unclear, but I think we're in the beginning stages of something that could be really interesting.

Right, yeah. This is launched as Claude for Chrome, so it's an extension. It's coming pretty quickly after OpenAI launched their agent model, maybe a month or two ago, and that is very similar: you give it an instruction and it does web stuff for you. OpenAI's agent has its own dedicated environment; it creates its own browser and does its work in the ChatGPT interface. Here you have this plugin for Chrome, and you actually use it within the Chrome browser, which is a little different. And to your point, this is also coming pretty soon after Perplexity launched their browser, Comet, which is also pitching this sort of agentic browsing. So it's yet another competitive area. We've seen this with search, with deep research, with every single use case of AI: OpenAI and Anthropic and a few others going head to head.

And yeah, I think these are going to be pretty powerful. I have one fun example: I had a spreadsheet with some links, where you've got to open each link and check the website for some quality assurance. In the past, I would have to do this myself, click and go look, do this very manual labor. I was able to use ChatGPT's agent: tell it, go to this Google Doc, open it, click on these links, look at the site, check for these things. It took like half an hour, a long time to get through it, but it was able to do it. So, similar to the chatbots, these kinds of agentic web-browsing agents are going to be used in a million different ways to speed up all sorts of boring stuff.
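The link-checking chore Andrey describes is a nice example of the kind of task these agents absorb; the deterministic core of it is scriptable in a few lines. A minimal sketch (the URLs and the keyword check are made-up placeholders for whatever the QA pass actually verifies):

    import requests

    # Links pulled from the spreadsheet (placeholders for illustration).
    urls = ["https://example.com/page1", "https://example.com/page2"]

    for url in urls:
        try:
            resp = requests.get(url, timeout=10)
            # Example check: page loads and mentions the expected keyword.
            ok = resp.status_code == 200 and "expected keyword" in resp.text.lower()
            print(f"{url}: {'PASS' if ok else 'FAIL'} (status {resp.status_code})")
        except requests.RequestException as exc:
            print(f"{url}: ERROR ({exc})")

The appeal of the browser agent is that it handles the fuzzy parts a script like this can't: logging in, reading the page the way a person would, and judging whether it "looks right."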
And speaking of Anthropic, they have another slightly notable update: the Claude chatbot can now remember your past conversations. This is something that's been available in ChatGPT for a long time; you can activate it by going to settings and enabling "search and reference chats." It's interesting to see Anthropic adding this long after OpenAI did. As far as I remember, it must have been last year that OpenAI started remembering details from your conversations to personalize the experience for each user. And that speaks, I think, to OpenAI having had a more consumer-oriented focus, with Anthropic targeting coding, enterprise, and business users much more. But yeah, I'm sure this makes it a more compelling offering.

Yeah, there's a coupling with another story here: Google's Gemini will also be more personalized by remembering details automatically. As with other chatbot offerings, you have options like temporary chats, where you can have private conversations that won't be saved for personalization or AI training. My take on this, and you can see it if you look at the pieces we're referencing here, is that Anthropic has taken a slightly different pattern, where Claude will only use chat memories if you prompt it to. And I think we are in the pretty early innings of what memory and personalization are going to look like for these systems. There are a lot of contentious issues that come up here, and memory in its current form is not perfect in any of its implementations. I do think there's going to have to be a lot of consideration and hard work on the interaction pattern it affords right now, how that makes a difference to model behavior in ways that are relevant to the things users care about, and what sorts of principles we should have about how that evolves. Again, it feels like a very early discussion as these things are just beginning to be rolled out, but something to pay attention to.

Yeah, and this is interesting to me as a topic because I've started realizing, first of all, the magnitude of ChatGPT usage. We've covered how they have 700 million active users. And in my mind, I was sort of assuming, or not even imagining, how people are using it. I use it for work; I use it to help brainstorm and write some code and whatever. But many people use it in many different ways. Some people use it as a therapist or a life coach; some people use it just to talk to and think through problems. So for people who do talk to ChatGPT a lot, like actually talk to it, that kind of memory feature probably matters a lot more. And Gemini launching it matters in particular, since Gemini is becoming, I think, the main competitor to ChatGPT as far as chatbots people actively use. That could matter quite a bit.

And speaking of Gemini, there is another launch from Google. They launched Guided Learning, which is available within Gemini and is designed to teach rather than simply answer questions. So it's meant to have you learn things, have you build a deep understanding, help you work through problems step by step, all that sort of stuff.
Again, we keep saying this, but I find it interesting that this is happening very soon after ChatGPT launched Study Mode. We know that all these services are used heavily by students; I don't know what percent of high school students and college students aren't using ChatGPT at this point, but it must be in the low single digits. So it makes a lot of sense for this to launch, and hopefully these study-oriented features will make it so students actually try to learn, as opposed to just having the AI do the work for them.

Yeah, I hope so too. I think there's a really interesting set of questions here. Some of them are around how we ask people to still do the hard and effortful work that is learning and developing a deep understanding of things. Because to really cause the sorts of changes in your brain, and to spend the time you need to mull over something to really get it and have deep intuition and understanding, there just isn't a shortcut. The way our education systems work, there are different forms of legibility that indicate what it looks like for a student to have attained mastery or a deep understanding of something. And I don't think it's news to anybody that these forms of legibility are pretty imperfect and don't always indicate that; increasingly, they can be gamed. And when you're a student, you lose out on something: not just the generalization of understanding that might come to matter later on, but a sort of satisfaction you might get personally from deeply understanding something, in such a way that it might intellectually stimulate you and make you want to consider different paths later in your life. So that deep and effortful work looks and feels quite important, also just for the development of a person. This is getting too long, but it's a deep topic, I think.

And it's very interesting to consider: if you're on the younger side, starting to grow up, and you don't remember a time before AI, before chatbots, your experience will be very different from our experiences, where we had the internet at least, but we had no AI to learn with. It was very different.

Yeah. Moving on to a couple of stories about OpenAI; we've had Anthropic and Google as the main players of this section so far. Next we have the news that Apple Intelligence will be integrating GPT-5 starting with iOS 26. Siri already integrates with OpenAI's GPT-4o, as far as I know: you ask Siri a question and it decides whether to pass that topic forward to ChatGPT. It's perhaps not surprising that they're going to upgrade it to GPT-5 relatively soon, but it does speak to the continued partnership between Apple and OpenAI.

One more piece of news on OpenAI: they are adding new features to Codex, their coding assistant. They are introducing an IDE extension, an extension to the standard coding tool, which is also something that Claude Code has, and they are introducing GitHub code reviews. Generally, they're expanding the feature set of their Claude Code competitor. For non-programmers this might not be very exciting, but Claude Code has seemingly made a huge impact in the programming world, and these agentic coder tools are pretty rapidly being adopted and making a big shift.
So OpenAI managing to compete and get some user share with Codex, having, for a rare occasion, entered this space later than Anthropic, is pretty significant. Yeah, a lot of stuff going on in the coding world right now, as you've seen from the many startups involved in this.

Our applications and business story for today is also about a startup in this space, a company called Lovable, which TechCrunch refers to as a vibe coding startup. If you haven't seen Lovable before, it's basically used to create full-stack web applications and websites; that's the specific area they're in. And they are projecting some pretty big numbers. They're aiming to achieve $1 billion in annual recurring revenue within the next 12 months, which is quite soon, and they're currently growing that ARR by at least $8 million each month. They've already surpassed $100 million in ARR just eight months after reaching their first $1 million. Which again goes to show: obviously, many of these companies have lots and lots of spend, but the kind of user and revenue growth they can experience is on quite a different level from what we've seen before (see the back-of-the-envelope check after this segment).

Yeah, Lovable has been a clear winner so far in this entire space. And they did launch quite a while ago, but they pretty much took off this year as AI got good enough to be usable basically without knowing code, without reading code. Lovable is one of these very user-friendly tools; I don't know if they even let you see the code. There are some competitors like Replit that are more friendly to technical users and expose much more techie stuff. And it's a very busy space, as you said. Replit is one competitor; there's also Bolt, there's v0 from Vercel, Base44. There are at least 10 significant players at this point, I think. And it's probably going to be a major market, assuming the economics of it start working, because I think the speculation is that these companies are acquiring all this revenue by burning through cash and not even trying to be profitable at this point.

And speaking of big numbers, the next one is about a raise by Decart, the company we recently covered as having launched a real-time video-to-video filter model that is very powerful. You can give it a normal stream of regular video and it can turn it into GTA or, I don't know, The Simpsons, or any sort of art style, with real-time streaming. That would mean that if you're playing a game, it can completely change the game's art style, for instance; or you can even have a very low-res game and make the graphics whatever you want. So they have raised $100 million and have now hit a $3.1 billion valuation. And that's pretty significant: there's no large set of users for this yet, and this entire idea of streaming video-to-video, their model MirageLSD, is still sort of at the preview stage. So investors seem to be pretty optimistic about this having a lot of potential. Yeah, it's one of those where it feels quite early to say anything substantive.

We have another story here that's also about a pretty big raise, and from a company you've surely heard of before that is not too new. Cohere has raised $500 million from investors at a new valuation of $5.5 billion. There are lots of different players involved here. Cohere is hoping to use those funds for accelerated growth; they plan to expand their technical teams and develop enterprise AI solutions.
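Going back to Lovable's numbers for a second, here is a quick back-of-the-envelope check (our arithmetic, not a figure from the episode) of what reaching $1B within 12 months from roughly $100M ARR implies:

    # Growth rate g needed so that 100 * (1 + g) ** 12 = 1000 (all in $M ARR):
    required = (1000 / 100) ** (1 / 12) - 1
    print(f"required month-over-month growth: {required:.1%}")  # ~21.2%

    # For comparison, $8M/month of new ARR on a $100M base is ~8% per month,
    # so the stated goal implies substantial acceleration from the current pace.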
Again, unlike many AI startups, Cohere is less focused on consumer applications and much more on customizing AI models for enterprise clients like Oracle and Notion, hoping to develop a sort of cloud-agnostic AI platform. So this is, again, a pretty different approach that some of these labs are taking with their technical talent, where they look at different enterprises and businesses and think about how AI can be useful for a particular vertical. And you're seeing both general versions of that, like Cohere, but also ones that want to develop deep expertise in a very specific area.

Next up, we have a story about Pony AI, which is not active in the US but is aiming to roll out to the European market. This report from Bloomberg says that this is their aim so far. Apparently, they've already rolled out 200 Gen-7 robotaxi vehicles just over the past two months, and they are aiming to get to a total of 1,000 vehicles. This is notable because in the U.S. we've definitely seen a speed-up of competition and deployment of robotaxis this year in particular. Waymo is entering new markets; Tesla's robotaxi service just launched and is also at least aiming to expand rapidly. And it's very clearly going to be a huge deal. This problem is getting to the point where it's solved, where robotaxis are quite reliable; people seem to prefer them to Ubers in general, from what I've seen in discussions. So Pony AI, being another significant player coming from China, has the potential to really break into the European market. And if that's the case, that's going to be a big deal, right?

The last story on applications is about another big lab and a bit of a changing of the guard. Igor Babuschkin, who is a co-founder of Elon Musk's xAI, and who I recognize as having some kind of bird as his X profile photo (I can't remember if it has wings, but it's a memorable profile photo; anyway, beside the point), has announced his departure from xAI to start a venture capital firm, Babuschkin Ventures, which will focus on supporting AI safety research and backing startups that aim to advance humanity and explore the universe. This was inspired by a discussion with Max Tegmark about building AI systems safely for future generations. And this also follows several scandals at xAI involving their chatbot Grok, which included controversial responses and inappropriate content generation. Many of you, if you are extremely online or spend basically any time on X, probably remember the Grok 4 release and what happened around then.

Yeah, it's been a tumultuous few months for xAI, to be sure. A lot of impressive results with the Grok 4 launch, a very impressive LLM, and xAI in general, since the team came together toward the end of 2023, has caught up incredibly rapidly. It would be fun to speculate about whether this means xAI is not doing so well; typically, you don't see people departing from startups they've co-founded in less than two years. But here, obviously, it's hard to say if Babuschkin just wanted to go off and start this venture initiative or if it indicates anything about xAI internally. Still, it's significant to have a shakeup in leadership, and xAI is at an interesting time in its life.

So, moving on to projects and open source. First, we have an open source release from Meta AI. This was, I think, from a couple of weeks ago.
The release is DINOv3, a state-of-the-art vision model trained with self-supervised learning, which is able to generate high-resolution image features. Basically, it allows you to process any given image and output a representation of it that's useful for all sorts of stuff, which you can use for things like object detection, semantic segmentation, video tracking, et cetera, without any fine-tuning. And this is a pretty large model: it has 7 billion parameters, which is unusually large for pure image models, and it was trained on 1.7 billion images. This is very much taking the image processing model to the biggest scale it's been. We don't talk much about pure image models for things like semantic segmentation, object detection, and video tracking; these are semi-solved problems at this point. A decade ago, these were the significant tasks in computer vision. But it's pretty important to remember that, as far as using and applying AI goes, object detection, segmentation, and general video and image understanding tasks are pretty significant. So a really cutting-edge model that is free for academic use, has a commercial license as well, and comes with a lot of code could be very useful for certain people.

Yeah, these sorts of models clearly have pretty important impacts out there in the world. For this specific model, a few orgs like the World Resources Institute and NASA's Jet Propulsion Laboratory have been using it. It has improved the accuracy of some pretty specific tasks like forestry monitoring and supported vision for Mars exploration robots. And the fact that you can do this with minimal compute overhead, and you don't have to rely too much on web captions or curation, so you're able to apply this universal feature learning when you're bottlenecked by annotation, is a really good advancement, I think.

Next up, we have a specific set of foundation models: GLM-4.5. This is an LLM with 355 billion parameters designed to excel at agentic, reasoning, and coding tasks. It employs a mixture-of-experts architecture, which is pretty familiar to a lot of people who have spent time in ML research; basically, it lets the model select different subsets of its parameters for different tasks, which is quite good for efficiency and performance. What this also means is that when you hear the number of parameters in the model, that's not quite the same as the effective number of parameters, that is, the number of parameters actually being used when the model runs inference on something. And the training is multi-stage here: it pre-trains on a diverse dataset, followed by fine-tuning on specific tasks to improve capabilities. Nothing too crazy here; there's RL thrown into the training process, especially for decision-making and problem-solving sorts of tasks. Just a pretty interesting model.

Yeah, it's kind of interesting. We have a figure here, figure three: there's pre-training on a general corpus, then pre-training on a code and reasoning corpus. Then there's mid-training, which has three steps: repo-level code data, synthetic reasoning, and long-context and agentic data. And then there's RL and so on. So there's a lot going on, and this is very much following in the footsteps of R1. R1 sort of introduced, I think, this approach, at least in terms of published research, of having multiple stages for training agentic and reasoning models.
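Backing up to the mixture-of-experts point: to make the total-versus-effective parameter distinction concrete, here is a toy sketch of top-k expert routing in PyTorch (layer sizes, expert count, and routing details are invented for illustration and far simpler than GLM-4.5's actual architecture):

    import torch
    import torch.nn as nn

    class TinyMoELayer(nn.Module):
        """Illustrative top-k mixture-of-experts layer: only k experts run per token."""
        def __init__(self, d_model=64, n_experts=8, k=2):
            super().__init__()
            self.router = nn.Linear(d_model, n_experts)
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                              nn.Linear(4 * d_model, d_model))
                for _ in range(n_experts)
            )
            self.k = k

        def forward(self, x):                       # x: (tokens, d_model)
            weights = self.router(x).softmax(-1)    # routing probabilities per token
            topw, topi = weights.topk(self.k, -1)   # keep only the k best experts
            out = torch.zeros_like(x)
            for slot in range(self.k):
                for e, expert in enumerate(self.experts):
                    mask = topi[:, slot] == e       # tokens routed to expert e in this slot
                    if mask.any():
                        out[mask] += topw[mask, slot, None] * expert(x[mask])
            return out

    layer = TinyMoELayer()
    print(layer(torch.randn(10, 64)).shape)  # torch.Size([10, 64])

Only k of the n experts execute for each token, so compute per token scales with k rather than with the total parameter count, which is the distinction Andrey draws above.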
And the notable thing about this model, aside from being big, is that they are doing quite well. They're claiming on the benchmarks to be beating Opus 4, to be up there with o3 and almost with Grok 4, and to be quite performant at a smaller number of parameters. So, 355 billion parameters is a lot, but it's less than DeepSeek R1 and less than Kimi K2, and on coding tasks they are similar on the benchmark front. This is very much a continuation of a trend we've seen all throughout this year of open source models coming out of China, starting with R1 and proceeding ever since, that are getting better and better, getting really on par with the closed source offerings from Anthropic and OpenAI for many things. Which is new, right? Until this year, you could not get an open source LLM that was anywhere near competitive with Claude or ChatGPT. Now that's different.

And speaking of open source releases from China, the next story is about DeepSeek releasing its V3.1 model. So this is a bump in the version, as per the title: it has a longer context window, not any sort of substantial jump in any sense, but I think it's notable to see DeepSeek continuing to release and update its models incrementally and still be competitive. Although apparently DeepSeek fans are waiting for the release of R2, which would be the successor to R1, so this is kind of leading up to that.

And speaking of open-weight LLMs, we have kind of an interesting story about the overall market. Artificial Analysis did a benchmark evaluating the performance of GPT-OSS-120B, the recent open-weight release from OpenAI, across different providers on the cloud. You can run these open models through various companies: Cerebras, Fireworks, DeepInfra, Together.ai, Groq, Amazon, Azure, a bunch of them. And the funny thing they found is that on a particular benchmark, AIME, the outcomes differ quite a bit across providers. On some of them, Cerebras, Nebius, DeepInfra, you get a high score, 95%. Then you go to Groq, Amazon, Azure, and the score goes down by 10%, maybe even more. It's hard to say what these providers are doing: are they serving smaller versions, are they quantizing, are they using different hardware? But it's definitely a surprising result. You would think that if it's the same model and all these providers are serving it, letting you use it via their hardware, you would expect roughly the same performance. But apparently that's not the case.

Our last story on this front is an open source text-to-speech model from Microsoft called VibeVoice-1.5B. It is capable of generating up to 90 minutes of speech with four distinct speakers, supporting cross-lingual synthesis and singing. It's primarily trained on English and Chinese and is available under an MIT license. There's a decent amount of work going on right now in audio synthesis, and I think this is a pretty exciting advancement; 90 minutes of speech is quite a long time. I think there are still questions about the general coherence of the audio over that extended period, but it does seem as though, again, we're making pretty quick advancements.

Yeah, and this is one of those notable areas where audio in general has historically lagged behind on the open source front, in terms of datasets and in terms of models. It's just an area where you don't have as many options as with, for instance, image generation.
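Returning to the provider-inconsistency story for a moment: because most hosts expose an OpenAI-compatible API, the basic comparison is easy to reproduce yourself. A rough sketch (the endpoints, keys, model id, and toy question set are placeholders, and this is not Artificial Analysis's actual methodology):

    import requests

    # Hypothetical OpenAI-compatible endpoints serving the same open-weight model.
    providers = {
        "provider_a": ("https://api.provider-a.example/v1", "KEY_A"),
        "provider_b": ("https://api.provider-b.example/v1", "KEY_B"),
    }
    questions = [("If 3x + 5 = 20, what is x?", "5")]  # toy stand-in for AIME-style items

    for name, (base, key) in providers.items():
        correct = 0
        for prompt, answer in questions:
            r = requests.post(
                f"{base}/chat/completions",
                headers={"Authorization": f"Bearer {key}"},
                json={"model": "gpt-oss-120b", "temperature": 0,
                      "messages": [{"role": "user", "content": prompt}]},
                timeout=60,
            )
            text = r.json()["choices"][0]["message"]["content"]
            correct += answer in text
        print(name, f"{correct}/{len(questions)}")

Divergent scores from the same weights usually point at serving differences: quantization, context handling, sampling defaults, or hardware-specific kernels.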
So having powerful text-to-speech means that, on the one hand, as a company you can use it and fine-tune it for various applications. On the other hand, we know now that people use these kinds of things for scams and so on, and that just means you have to really be on the lookout whenever you hear someone in audio these days. It's at a point where you cannot tell the difference between AI generation and actual recorded audio.

And on to research and advancements, with just a couple of stories for this episode. The first one is Deep Think with Confidence, a new approach that basically makes test-time scaling more efficient and more effective. They are looking at the type of test-time scaling where you run several parallel reasoning paths: you have the model try to solve the problem multiple times and get to different results, and then you might take a majority output or a combined output of your various reasoning traces. And this paper introduces a fairly straightforward idea. As you are doing your rollouts of different reasoning paths toward an answer, you can evaluate, roughly speaking, the confidence of the model in its predictions, what they call token confidence, which looks at the probabilities of the tokens it's actually outputting; they also define an average trace confidence that they call self-certainty. Basically, they evaluate this as you roll out the model, and if you have low confidence, they kill the run; they stop it. So you end up being able to do many parallel runs, cut off the ones that seem unpromising, and then, once you reach high confidence, combine the results from multiple rollouts into a combined, confident output. And in benchmarks, they show that with this method they're able to improve performance pretty substantially: around a 10% improvement on some of these benchmarks like AIME, a couple percent boost for GPT-OSS, a 5% boost for TPC, basically making it so that for problems where you're not reliably getting a correct output, you now get to the right answer a more significant fraction of the time. And it speaks, I think, to where we are with reasoning and test-time scaling: there's probably a lot of low-hanging fruit in this whole area in terms of ways to do it more reliably and efficiently. This one is a fairly straightforward method that can be applied widely.

While we're thinking about test-time scaling and all these improvements, maybe a natural question is to ask what happens to jobs. And as it happens, a couple of days ago a Stanford study found that the adoption of generative AI is significantly affecting job prospects for young U.S. workers, particularly those aged 22 to 25. This came out quite recently and there's been a lot of commentary; I would actually recommend taking a look at Noah Smith's recent blog post on this specific paper, and also, of course, reading the paper itself, because it's worth trying to understand and contextualize the claims. But just to get up on a soapbox about this paper: despite the fact that the people who wrote it are pretty careful economists, very deserving of respect, it does feel like this finding is a bit of a specification search. As job markets rise and fall, there's always some group of people who are doing worse than the rest.
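Returning to DeepConf for a moment, the core mechanism is simple enough to sketch. Assuming a generation function that returns an answer along with per-token log-probabilities (a hypothetical stand-in for a real sampling API), a simplified version of the confidence-filtered voting looks roughly like this; note that the paper terminates weak traces mid-rollout rather than filtering afterward, and uses windowed rather than whole-trace confidence:

    import math
    from collections import Counter

    def trace_confidence(token_logprobs):
        """Average per-token log-probability over a trace (higher = more certain)."""
        return sum(token_logprobs) / len(token_logprobs)

    def deepconf_answer(generate, prompt, n_traces=16, min_conf=-1.5):
        """generate(prompt) -> (answer, token_logprobs); assumed for illustration."""
        votes = Counter()
        for _ in range(n_traces):
            answer, logprobs = generate(prompt)
            conf = trace_confidence(logprobs)
            if conf >= min_conf:                 # discard unconfident traces
                votes[answer] += math.exp(conf)  # confidence-weighted majority vote
        return votes.most_common(1)[0][0] if votes else None

The threshold, trace count, and weighting here are invented; the point is just the shape of the method: many parallel rollouts, pruning by the model's own token probabilities, then a weighted vote over survivors.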
And it's a little bit unclear whether it is always justified to tie this to a new technology on the block like AI. What's worth saying is that, sure, it's possible that AI is impacting job prospects to some extent, but it's a little hard to disentangle this entirely from other economic factors. One really great thing Noah Smith does in his post is look at the data on how AI exposure relates to job prospects for people at different ages. The study's finding is specifically about people who are 22 to 25, but workers in their 30s, 40s, and 50s who were judged to be most heavily exposed to AI have actually seen robust employment growth since late 2022. You can maybe square this with the story about AI destroying jobs, but again, it's kind of unclear: why would companies be rushing to hire 40-year-old workers in AI-exposed occupations? Again, just a lot of question marks here.

The paper lays out six facts about the recent employment effects of artificial intelligence; they're examining the effects of AI on the labor market, on employment, on people being able to get jobs. The first fact is that they uncover substantial declines in employment for early-career workers aged 22 to 25, as we said, in occupations most exposed to AI, such as software developers and customer service representatives. The second key fact is that overall employment continues to grow, but employment growth for young workers has been stagnant since late 2022. The third fact is that not all uses of AI are associated with declines in employment. Fourth, they find that the employment declines for these workers remained after conditioning on firm-time effects. So they do try to be careful; as you said, this is analysis of labor data. They're not doing experiments here; they're looking at various statistics and trying to infer what effect AI may have had, so they try to account for other factors that could explain the statistics. Fifth, they say that labor market adjustments are visible in employment more than in compensation. And sixth, the above facts are largely consistent across various alternative sample structures.

So, as you said, economics research is tricky; there's no careful experimentation going on here, and they are working with data that can have various interpretations. In the case of software development, for instance, which is one of the major areas where employment has been much harder for early-career professionals, there are obviously many factors at play. During COVID there was arguably over-employment; many of the big tech companies hired like crazy, and then there was a large amount of layoffs in software development over the last couple of years. There are economic conditions, all sorts of stuff. So this is a very early piece of research, and they do, to be fair, position it as such: they call it a canary in the coal mine, to indicate that this might be a sign of what's happening, but it's still early and hard to tell. But as far as actual research that is able to tell us anything about AI and employment, to my knowledge this would be the first major work. Obviously, I'm not an economist; maybe there's been some prior research on this. But this is coming from a Stanford group that is pretty oriented toward this topic; one of the lead authors is Erik Brynjolfsson, who has done previous research on AI and economics.
So, as you said, Daniel, if you find this interesting, it's probably worth following up and seeing some deeper analysis and possible interpretations of it. Yeah.

Our next story is in the policy and safety space, and this one's actually really interesting: it's about an unpublished report on AI safety from the U.S. government. Back in October, a red-teaming exercise was conducted at a computer security conference in Arlington, Virginia, where AI researchers stress-tested some advanced AI systems. They identified 139 novel ways these systems could misbehave, like generating misinformation or leaking personal data. The key upshot of the exercise was that it revealed significant shortcomings in a new U.S. government standard designed to help companies test AI systems. But the National Institute of Standards and Technology didn't publish a report on those findings. The reason for that, according to some sources, was that it, along with other AI documents from NIST, was withheld because of concerns about conflicting with the incoming administration's policies. Wired now has this unpublished report. And I guess one of the key takeaways here is that this is an area that feels like it should be nonpartisan and ideally not too influenced by politics, but it seems there were challenges in publishing AI research even under the Biden administration. So just an interesting story about the confluence of politics and AI safety.

Right. This is a report from NIST, the National Institute of Standards and Technology, which was tasked with this kind of thing, creating standards and technology for AI. They created the NIST AI 600-1 framework to assess AI tools; this is the Artificial Intelligence Risk Management Framework: Generative Artificial Intelligence Profile. So this red-teaming exercise was basically meant to evaluate the framework they published about a year ago, I think mid-2024. So, probably not too surprising: we know the Trump administration reversed Biden's actions on AI and recently published its own agenda on AI, and it's very likely that these kinds of AI security initiatives are going to see less interest and less promotion under the current administration.

And another story about the US government, and kind of a surprising one: the US government is going to take a cut of NVIDIA and AMD AI chip sales to China. We have talked quite a lot about export controls, about restrictions on NVIDIA being able to sell GPUs to China. It's been a very evolving area under the Trump administration; there was a time when the H20 chip, which for a long time was the one NVIDIA sold to China, was suddenly blocked from being sold. And so this is kind of reversing that: NVIDIA apparently is able to sell the H20 again, but will have to pay the US government. Jeremy, unfortunately, would be the guy to give the most insight on this development, but it seems a bit surprising as far as the approach to export restrictions goes.

Moving on to something unrelated to the government, and back to another topic we've talked about quite a lot: the ongoing lawsuits about copyright against the major LLM providers. Anthropic has settled a high-profile AI copyright lawsuit brought by book authors. This was initiated by authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson, who accused Anthropic of using their books without permission. There were some, let's say, conflicting developments here.
A California district judge ruled that Anthropic's training on the books could qualify as fair use, but found that the acquisition method, via shadow libraries, constituted piracy. And this is just one of multiple lawsuits, ongoing basically for years now, that could have major implications for how you can use and acquire data for training AI models. OpenAI, Anthropic, and others took the maximally permissive approach of using a bunch of data without asking any permission. As non-lawyers, it's hard for us to say how significant an effect this settlement will have on other ongoing legal developments, but it does mark one piece of progress in this long-running story; at least this lawsuit has reached an end.

Our last story is about AI companion apps, which are on track to pull in $120 million in 2025. In the first half of the year, these apps already generated $82 million, with downloads up 88% year over year, reaching 60 million. The top 10% of these apps account for 89% of the revenue, with 33 apps surpassing $1 million in lifetime consumer spending. The popular ones in this space include Replika, Character AI, PolyBuzz, and Chai, with a significant portion of users seeking AI girlfriends. You may have also seen commentary on Twitter about AI boyfriends being very popular. This is a really interesting, hairy space to me, because I think it portrays something pretty fundamental about the kind of companionship people seek and are willing to accept, and the different ways in which it can be met or not met. Personally, I find AI companions a bit troubling for numerous reasons, but I won't get up on the soapbox about it here.

Yeah, well, I did include it in the policy and safety section very much because it has pretty, let's say, concerning or significant implications for society and for people's psychology. We know that in the modern age there's been very much a degradation in the amount of socializing and the number of close connections people have; it's arguably one of the major health crises of the modern age, people's ability to have friends and close connections. And this market is growing significantly and getting a lot of revenue. According to this report, there have been 112 such apps published just in the first half of 2025, with the names of those apps featuring "girlfriend" in 56 of them, along with "fantasy," "boyfriend," "anime," "soul," "soulmate," "lover," and "waifu." A lot of clearly romance-oriented apps. And it's coming in this current paradigm with dating apps, where I think the general consensus is that it's a hard and unenjoyable process to try to find a human girlfriend or soulmate.
So yeah, I mean, it's a little concerning, I think it's fair to say. On the one hand, you can treat it as a video game, as a role-playing exercise, as a fun thing. By the way, Character AI, one of the players in this space for a while, which isn't focused on girlfriends but on general role play, still has millions of monthly active users. This is a very big space, so it's likely to keep growing; I mean, xAI recently launched Ani and their own Grok-based companions. I don't know, it's an interesting phenomenon for sure. And fun fact: the movie Her, which was all about this, directed by Spike Jonze, where the main character, played by Joaquin Phoenix, falls in love with an AI character, is set in 2025. So lots of people are saying that this movie was incredibly prescient, and I think it gets there. If you haven't seen Her, highly recommend it.

Well, that is it for this episode. As I've said, hopefully we are going to get back to a weekly schedule. Thank you, Daniel, for fulfilling the guest co-host duties; always fun to have you on here. Thanks for having me; I always really love doing this. And thank you to the listeners. As always, we appreciate you tuning in and bearing with us as we skip some weeks at an unpredictable rate. We always appreciate it if you leave reviews, if you share the podcast with your friends, and, more than anything, if you just keep tuning in.

[Outro song] AI news begins, it's time to break it down. Last week in AI, come and take a ride. Get the lowdown on tech and let it slide. Last week in AI, come and take a ride. From the labs to the streets, AI's reaching high. New tech emerging, watching surgeons fly. Algorithms shaping up the future we see. Tune in, tune in, get the latest with ease. From neural nets to robots, the headlines pop; data-driven dreams, they just don't stop. Every breakthrough, every code unwritten, on the edge of change, with excitement we're smitten. From machine learning marvels to coding kings, futures unfolding, see what it brings.
