Last Week in AI

#217 - ChatGPT Agent, Kimi k2, Hiring Drama

Last Week in AI • Andrey Kurenkov & Jacky Liang

Wednesday, July 23, 2025 • 53m

What You'll Learn

  • OpenAI released a new ChatGPT agent that can control a computer and perform a wide range of tasks, outperforming previous OpenAI tools on various benchmarks.
  • Kimi K2, a 1 trillion parameter open-source model from Alibaba-backed Moonshot AI, has been shown to be competitive with proprietary models like ChatGPT and Claude in areas like coding and creative writing.
  • Amazon launched a new Kiro AI software development tool that aims to bring more structure and planning to the agentic coding space, though its long-term success is uncertain.
  • Anthropic has tightened usage limits for its Claude Code service without informing users, leading to complaints from the community.

Episode Chapters

1. Introduction

The hosts provide an overview of the topics to be covered in the episode, including new AI tools and developments.

2. OpenAI's ChatGPT Agent

The hosts discuss the release of OpenAI's new ChatGPT agent, which can control a computer and perform a wide range of tasks.

3. Kimi K2 Open-Source Model

The hosts cover the impressive open-source Kimi K2 model from Alibaba-backed Moonshot AI, which has shown competitive performance with proprietary models.

4. Amazon's Kiro AI Development Tool

The hosts discuss Amazon's new Kiro AI software development tool and its potential impact on the agentic coding space.

5. Anthropic's Claude Code Usage Limits

The hosts cover Anthropic's tightening of usage limits for its Claude Code service without notifying users.

AI Summary

This episode of the Last Week in AI podcast covers several key developments in the AI industry, including the release of OpenAI's new ChatGPT agent, the impressive open-source Kimi K2 model, Amazon's new Kiro AI software development tool, and Anthropic's tightening of usage limits for its Claude Code service without notifying users.

Key Points

  1. OpenAI released a new ChatGPT agent that can control a computer and perform a wide range of tasks, outperforming previous OpenAI tools on various benchmarks.
  2. Kimi K2, a 1 trillion parameter open-source model from Alibaba-backed Moonshot AI, has been shown to be competitive with proprietary models like ChatGPT and Claude in areas like coding and creative writing.
  3. Amazon launched a new Kiro AI software development tool that aims to bring more structure and planning to the agentic coding space, though its long-term success is uncertain.
  4. Anthropic has tightened usage limits for its Claude Code service without informing users, leading to complaints from the community.

Topics Discussed

Large language models • Agentic AI tools • Open-source AI models • AI software development

Frequently Asked Questions

What is "#217 - ChatGPT Agent, Kimi k2, Hiring Drama" about?

This episode of the Last Week in AI podcast covers several key developments in the AI industry, including the release of OpenAI's new ChatGPT agent, the impressive open-source Kimi K2 model, Amazon's new Kiro AI software development tool, and Anthropic's tightening of usage limits for its Claude Code service without notifying users.

What topics are discussed in this episode?

This episode covers the following topics: Large language models, Agentic AI tools, Open-source AI models, AI software development.

What is key insight #1 from this episode?

OpenAI released a new ChatGPT agent that can control a computer and perform a wide range of tasks, outperforming previous OpenAI tools on various benchmarks.

What is key insight #2 from this episode?

Kimi K2, a 1 trillion parameter open-source model from Alibaba-backed Moonshot AI, has been shown to be competitive with proprietary models like ChatGPT and Claude in areas like coding and creative writing.

What is key insight #3 from this episode?

Amazon launched a new Kiro AI software development tool that aims to bring more structure and planning to the agentic coding space, though its long-term success is uncertain.

What is key insight #4 from this episode?

Anthropic has tightened usage limits for its Claude Code service without informing users, leading to complaints from the community.

Who should listen to this episode?

This episode is recommended for anyone interested in large language models, agentic AI tools, and open-source AI models, as well as anyone who wants to stay updated on the latest developments in AI and technology.

Episode Description

Our 217th episode with a summary and discussion of last week's big AI news! Recorded on 07/17/2025. Hosted by Andrey Kurenkov and guest co-host Jon Krohn. Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai. Check out Jon's workshop on Agentic AI Engineering, and find his consultancy here. Read our text newsletter and comment on the podcast at https://lastweekin.ai/.

In this episode:

  • OpenAI's new ChatGPT agent: The episode begins with a detailed discussion of OpenAI's latest ChatGPT agent, which can control entire computers and perform a wide range of tasks, showcasing powerful performance benchmarks and potential applications in business and research.
  • Major business moves in the AI space: Significant shifts include Google's acquisition of Windsurf's top talent after OpenAI's deal fell through, Cognition's acquisition of Windsurf, and several notable hires by Meta from OpenAI and Apple, highlighting intense competition in the AI industry.
  • AI's ethical and societal impacts: The hosts discuss serious concerns like the rise of non-consensual explicit AI-generated images, ICE's use of facial recognition for large databases, and regulations aimed at controlling AI's potential misuse.
  • Video game actors' strike ends: The episode concludes with news that SAG-AFTRA's year-long strike for video game voice actors has ended after reaching an agreement on AI rights and wage increases, reflecting the broader impact of AI on the job market.
Timestamps + Links:

(00:00:10) Intro / Banter
(00:02:49) News Preview

Tools & Apps
(00:03:29) OpenAI's new ChatGPT Agent can control an entire computer and do tasks for you
(00:07:11) Alibaba-backed Moonshot releases new Kimi AI model that beats ChatGPT, Claude in coding — and it costs less
(00:09:36) Amazon targets vibe-coding chaos with new 'Kiro' AI software development tool – GeekWire
(00:12:33) Anthropic tightens usage limits for Claude Code – without telling users
(00:15:51) Mistral's Le Chat chatbot gets a productivity push with new 'deep research' mode | TechCrunch
(00:17:46) I spent 24 hours flirting with Elon Musk's AI girlfriend
(00:21:32) Uber is close to completing its quest to become the ultimate robotaxi app | The Verge

Applications & Business
(00:24:02) OpenAI's Windsurf deal is off — and Windsurf's CEO is going to Google | The Verge
(00:28:09) Cognition, maker of the AI coding agent Devin, acquires Windsurf | TechCrunch
(00:28:46) Anthropic hired back two of its employees — just two weeks after they left for a competitor. | The Verge
(00:28:46) Another High-Profile OpenAI Researcher Departs for Meta | WIRED
(00:28:46) Meta Hires Two Key Apple (AAPL) AI Experts After Poaching Their Boss - Bloomberg
(00:31:31) Mira Murati's Thinking Machines Lab is worth $12B in seed round | TechCrunch
(00:33:20) Lovable becomes a unicorn with $200M Series A just 8 months after launch | TechCrunch
(00:34:55) SpaceX commits $2 billion to xAI as Musk steps up AI ambitions: Report | World News - Business Standard

Research & Advancements
(00:35:59) A former OpenAI engineer describes what it's really like to work there | TechCrunch
(00:38:23) Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination

Policy & Safety
(00:42:14) Anthropic, Google, OpenAI, xAI granted up to $200 million from DoD
(00:43:08) California State Senator Scott Wiener Pushes Bill to Regulate AI Companies - Bloomberg
(00:43:58) AI 'Nudify' Websites Are Raking in Millions of Dollars | WIRED
(00:45:55) Inside ICE's Supercharged Facial Recognition App of 200 Million Images

Synthetic Media & Art
(00:48:47) Video game actors' strike officially ends after AI deal

See Privacy Policy at https://art19.com/privacy and California Privacy Notice at https://art19.com/privacy#do-not-sell-my-info.

Full Transcript

Hello and welcome to the Last Week in AI podcast, where you can hear a chat about what's going on with AI. As usual in this episode, we will summarize and discuss some of last week's most interesting AI news. You can go to the episode description for the timestamps and links to skip to any of the many stories we'll be talking about today. I am one of your regular hosts, Andrey Kurenkov. I studied AI in grad school, and I now work at a generative AI startup. Nice. This is Jon Krohn. I am irregular. You might even say one of your odd hosts. That would be a good adjective. A regular guest co-host is how I like to think about it. Right. Exactly. I really appreciate that. Yeah, I've been on the show probably half a dozen times at least, and love being on the show. It's the only podcast that I listen to, Last Week in AI. If people have heard me on the show before, I'm sure they've heard me say that before. Delighted to be here. I'm perhaps best known for hosting a show called Super Data Science, which you've been on, Andrey. It's an interview-format show as opposed to, you know, a news-focused show like Last Week in AI. It's a very nice complement to each other, we might say. And something big since I've last been on the show is that in March, I co-founded a new consulting firm, which I'm CEO of, and we're called Y-Caret, like y-hat, but it's the caret character, the thing above the six on a U.S. English keyboard. It's a bit of a machine learning joke for people who are in the know, but we're focused on agentic stuff. We're focused on generative stuff, RAG, and bringing that into enterprises, letting people get ROI on all the latest and greatest in AI. So there are some stories that I'll be able to relate to from firsthand experience because of that. Makes sense. And now is probably a good time to be consulting people, because there's certainly a lot happening very quickly.
And it's honestly hard to keep up even if you're, like, hosting a podcast, much less if you're not doing that. Andrey, it's unreal. I've never had an experience in business like this before. Every other commercial thing that I've ever tried, it's hard to get product-market fit. But for any of our listeners out there, I'm probably now cannibalizing my own business, but there's so much work out there, like a rising tide lifts all boats. There's so much opportunity out there right now to be transforming organizations with LLM-enabled technology, basically, that it's crazy. Every conversation leads to next steps. Nobody's ever like, I'm not sure this is what I need. It's just a matter of prioritizing and getting things done. And it's actually quite a good fit to give a quick episode preview. This episode is going to be pretty heavy on the tools section, lots of new things. And most excitingly, the ChatGPT agent just came out, so that'll probably be one of the big focus areas. Then in business, lots of interesting developments on the hiring front, which we've been talking about for the last few weeks: even more kind of weird news of acquisitions, hires, movements, et cetera. And beyond that, we'll only have a couple stories in research and policy and safety. This is going to be a bit of a quick episode, so it's just going to race by; try to keep up. So let's go ahead and dive in. Tools and apps, starting with OpenAI's new ChatGPT agent, which can control an entire computer and do tasks for you. So the way this looks is, in ChatGPT, they have this kind of selector menu where you can choose various modes, including deep research, web search, etc.
And ChatGPT agent is now a new option there. The gist of it is it's combining two previously existing things they already had: Operator, which could browse the web for you and do various tasks that way, and deep research, which analyzed and summarized information. So the way that OpenAI pitches this is as sort of the best of both worlds, a much more powerful agent that can do general computer use. It can click, it can do commands, it can browse the web, and so on and so on. And so, yeah, this is the latest frontier, you could say, in agentic task execution beyond code. This is able to do, conceptually, I suppose, anything you could do with a computer. And coming along with the announcement, besides the utility of this, they also show really, really strong performance on various benchmarks like Humanity's Last Exam and FrontierMath, things we cover. This ChatGPT agent with browser and computer and terminal is able to outdo OpenAI's former tools, deep research, all of these by quite a big margin. So this seems to be sort of the most trained agent that OpenAI has ever released. It's cool. I used it already, and it's really effective. You can watch it working, so you can kind of see it going on the internet doing tasks for you, and you can actually interrupt it and take over. So you kind of have this view, if you've ever remoted into, you know, a remote server, it's like doing that and watching, you know, a colleague of yours program or search the web, and you can actually go in there and interrupt it if you want to. I haven't tried the interrupting it. I'm not sure what value that would really provide, or if it can continue after you stop interrupting it. I don't know exactly how that works. It can create assets for you like spreadsheets, like slideshows, and so we've been using it for that already, and it's been really good. So, I've been a deep research user for months now.
I pay for the pro tier of ChatGPT in order to be able to use its amazing report building. Like, it seems like it would be comparable to having a McKinsey analyst working for you, except that they can get their work done in minutes instead of days or weeks. But it's that level of quality with deep research. And now adding into it as well, you know, the ability to be outputting assets for you, to be able to see what it's doing while it's crawling the web agentically. It's a cool interface. I like it. Powerful. Yeah, that certainly seems like it. And in fact, it's so powerful that there are some kind of safety concerns. It's going to ask you for permission for things like sending emails and making bookings, since it can kind of do whatever. It also has restrictions on financial transactions, probably a good idea. And as you said, this is now rolling out to Pro, Plus, and Team users, with Enterprise and Education coming later. So lots of people are going to start using this, and I think we're going to start seeing some pretty cool examples of what you can do with it. On to the next story. We covered Kimi K2 briefly in the last episode as a new exciting open source release, but we didn't dive into it, so I think we will cover it a little bit more. The headline is "Alibaba-backed Moonshot releases new Kimi AI model that beats ChatGPT, Claude in coding, and it costs less." So the gist is Kimi K2 is a 1 trillion parameter model that has a lot of experts, so only 32 billion active parameters at a time. And it had really impressive benchmark numbers. What I've seen since then is it kind of passes the vibe check. Everyone seems to agree this is a really good model, a really impressive open source model, competitive even, as this article says, potentially with Claude or ChatGPT or other proprietary private models. So way beyond Llama, way beyond probably anything we have in open source, including DeepSeek V3. And this is not even a reasoning model.
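For context, the "1 trillion total, 32 billion active" split the hosts describe comes from mixture-of-experts routing: each token is sent through only a few expert sub-networks, so most weights sit idle on any given forward pass. Here is a minimal sketch of top-k routing; all sizes and names are illustrative toy values, not Kimi K2's actual configuration.

```python
# Toy sketch of mixture-of-experts (MoE) top-k routing.
# Sizes are illustrative, not Kimi K2's real dimensions.
import numpy as np

rng = np.random.default_rng(0)

N_EXPERTS = 8   # experts per MoE layer (toy value)
TOP_K = 2       # experts activated per token
D_MODEL = 16    # hidden size (tiny, for the sketch)

# Router: a linear layer scoring each expert for the incoming token.
router_w = rng.normal(size=(D_MODEL, N_EXPERTS))
# Each "expert" here is just a single weight matrix.
experts = rng.normal(size=(N_EXPERTS, D_MODEL, D_MODEL))

def moe_forward(x):
    """Route a single token vector through its top-k experts."""
    logits = x @ router_w                       # (N_EXPERTS,) router scores
    top = np.argsort(logits)[-TOP_K:]           # indices of the k best experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                        # softmax over the chosen k
    # Only TOP_K of the N_EXPERTS matrices are touched, which is why
    # "active" parameters are a small fraction of total parameters.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

token = rng.normal(size=D_MODEL)
out = moe_forward(token)
active_fraction = TOP_K / N_EXPERTS  # 2 of 8 experts used per token
```

With 2 of 8 experts active, only a quarter of the expert weights participate per token; Kimi K2's roughly 32B-of-1T ratio is the same idea at vastly larger scale.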
So they presumably have an R1 variant of this in the works. Yeah, this is kind of a story that is unsurprising, I suppose. This is kind of like the trajectory that we're on. You're kind of expecting somebody to come up with open source approaches that rival. You know, Jeremy talks a lot on the show, I'm sure you do as well, but for some reason I remember Jeremy saying this frequently: kind of six months after a proprietary model comes out, you can expect kind of similar capability in open source. And that's what we're seeing here. Yeah, I haven't used it myself, but the benchmarks look good. Yeah, and there are interesting notes about it. For instance, people say that it is really good at creative writing. It has like a different writing style, potentially because of being trained on different data distributions coming out of China. So, yeah, interesting developments. And as with DeepSeek, interesting to see this coming out of China, where they are more hardware constrained due to export restrictions, as we talk about quite a bit. And so in the technical report, similar to DeepSeek, they go into some of the interesting technical insights. They in particular highlight Muon, this new optimizer that hasn't been proven so much yet, but in this case scaled to a gigantic model. So a combination of really exciting developments for open source, but also some new technical insights that are quite interesting. And next, Amazon targets vibe-coding chaos with new Kiro AI software development tool. So, kind of a surprise story for me. We've seen Cursor, of course, be a very important agentic-powered IDE for code development. Claude Code has been killing it in the past couple months. Now Amazon has released this new Kiro development environment that basically positions itself as another agentic coding tool that is particularly focused on making it a little more principled. So they highlight specs and planning and all these kinds of things in their blog post.
It also has all the various features that you expect, with MCP and so on. So boy, this is a really, really busy space with all this coding agentic stuff. I was just exploring, like, Cline and Roo, these extensions by open source teams. There are like forks and combinations, and now Amazon is in the fray with this new tool. Clearly people are putting in a lot of work, trying to optimize and make this work well. I'm a big Cursor fan personally. How about you, Andrey? I used to use Cursor as my main tool, but Claude Code has kind of overtaken it. And I actually moved back to VS Code from Cursor just because it is now pretty feature-comparable, and Cursor updates a lot, and sometimes not in ways that work too well. Nice. That's good to hear. I'll have to try that out and kind of maybe go back also to VS Code myself. This one here, this Kiro announcement from Amazon, this one feels kind of random to me. I know Amazon is often throwing stuff at the wall to see what will stick, and this kind of seems to fit into that category. You know, a big company trying out lots of different projects. But Amazon hasn't been, like, I can't off the top of my head think of any big LLM releases, like proprietary or open source, that have been anywhere near the cutting edge. Can you think of anything? No, they have developed some models, but they really haven't tried to compete in terms of performance. They have internal models, presumably, for their chatbots and so on. So yeah, Amazon's strategy is, I think, interesting. They don't try to be a frontier lab so much, but they work with Anthropic, for example, and they do develop some things like this to be in the ecosystem in some ways. Yeah, we'll see what happens. My crystal ball predicts that we're not going to all be using Kiro browsers in a year or two. Yeah, it's also... Kiro IDEs, sorry. Yeah, it's a bit strange. They don't target enterprise that much. But regardless, it looks pretty slick. So who knows? Maybe it will actually take off.
And speaking of agentic coding tools, next story: Anthropic tightens usage limits for Claude Code without telling users. So this is a development that happened this week. I saw this happening in real time on Reddit, where people on the Claude subreddit were complaining that their usage seems to be more restricted. They hit the limits on using Opus, the biggest model, quicker. So apparently that's true, at least this article seems to support it, especially on the per-month Max plan, where you have like a crazy amount of budget to use up tokens. And this has coincided with some instability. Like Wednesday, Thursday, Claude Code and Anthropic were both down briefly and were just not usable. So in a way, not surprising. Like, they are definitely losing a lot of money by being so generous with this Max plan. But I think it's an indication of where things are heading, where I guess at some point you'll have to be profitable and the cost of these subscriptions is going to go even beyond 200. Yeah, with functionality like agents now being available in Claude as well, you can imagine that their compute is getting slammed. So I mentioned earlier in the episode that I have a ChatGPT Pro subscription. I also have a paid Claude plan, because there are different kinds of things that I like to do with different providers. I have Gemini Ultra as well. And Claude is my favorite for most tasks, actually. It's kind of my default go-to. And I have been hit. It's just funny that the story came up. I had never been hit with one of these overload errors before, but I hit one this week. So it seems like we're all kind of in the same boat. And as you said, it's unsurprising given how much money all of the big frontier labs are hemorrhaging on providing their services. You know, they're losing money by giving us access to such powerful models at such low cost, and you wonder when things are going to have to change. And so I understand, like you said, that they have to make some changes.
What's surprising is that Anthropic is usually good organizationally about communication and getting things right. Maybe they just didn't anticipate that some people would feel this change, but it's a rare own goal, I'd say, from Anthropic. I agree, yeah. They rarely seem to take these sorts of missteps. And I think it's probably an indication that Claude Code has just taken off pretty rapidly and they've been probably trying to just keep up. It's a fun detail for me. So all these models allow you to use them with a subscription plan. You're not paying per token, generally, especially in this Max mode. So if you use some tools, you can see, like, the hypothetical amount of money you spent. And as a user myself, I'm spending like $2,000 in tokens on this $200 per month plan. It's insane. So I don't know. I think this is a sign of things to come. That's a great stat there. I know. Whatever the inverse of a margin is, that's the loss they're putting in there. Yeah, nice. Next up, we've got Mistral, and they are also keeping up with all the agentic hype. They have rolled out deep research in their Le Chat offering for talking to their models, you know, the equivalent to ChatGPT and Claude and so on. This is actually part of several things. They now also have projects, they have image editing, multilingual reasoning. So very much in line with Mistral kind of just racing to be feature-equivalent to ChatGPT and Claude and provide an offering that's comparable. As we say with Jeremy here all the time, Mistral is in a tough position. They don't have as much money, they don't have as much compute. But it's always cool to see them kind of rolling out things pretty rapidly. Yeah, I mean, everyone is rolling out deep research. There have been people doing it for a year now, some of the early movers. And it's kind of expected. It's what we call table stakes in software product design these days if you are an LLM provider, I think.
And it actually, I mean, there are all kinds of safeguards you need to get in place. There's all kinds of engineering complexity when you roll this out on the kind of scale that Le Chat would be. But I actually, I'm going to plug a free thing that I published a month ago on YouTube. I published this agentic AI engineering course. It's four hours long, and in the first hands-on project, we use the OpenAI Agents SDK to create a deep research kind of functionality. And so you can kind of see how that works. And yeah, so that's free on YouTube, and I'll provide a link for you to put in the show notes. It's pretty cool: 30,000 people have already watched it on YouTube, and there are no ads. I've turned off ads. It's just there as an educational resource for people who want to be doing cool stuff with AI agents. Yeah, it sounds like a pretty fun project for sure. Next, moving on to Grok. We spent quite a while talking last week about Grok 4 and some of the controversies around it. Soon after, there was a strange development with Grok and X. They have released a feature called Companions in the Grok app, which you can access if you're on the SuperGrok subscription, costing $30 per month. And these companions, there's a couple personas you can chat with as sort of characters. They have 3D models. They talk to you with audio, and you can talk to them with audio. One of them is an anime girl wearing sort of dark Lolita fashion. And the article here is called "I spent 24 hours flirting with Elon Musk's AI girlfriend," which is surprisingly entirely accurate. This companion character is literally designed to be flirty. It's in their system prompt that it should be a 22-year-old girly, cute character who is into whoever is talking or chatting with her. And you can, like, build up a meter for how much this companion is attached to you. At some point you can get into inappropriate territory. You can actually, like, reach a level where you're able to put the character in lingerie.
I mean, interesting feature here from Grok, I suppose. I did not know this story. I've clicked on the link and I'm looking at the photos and videos, and it is intense. It feels like I shouldn't be looking at this while working. Yeah, it's not safe for work, entirely. And I mean, there's something to be commented on, as it actually is potentially a significant concern and problem that people are already kind of falling in love with these AI companions. This has been happening for a while. So, you know, this might have some interesting effects on people if they really do start to bond with it. But yeah, just go and look at the screenshots and the videos of this, because it's something else. Whoa. In this article, it says, yeah, things can include descriptions of, I'm not going to read them out loud, I feel uncomfortable saying these words, but sex acts. There's a quote here: "At no point did it ask me to stop or say I'm not able to do that." And then, yeah, I guess there's something, I'm kind of vaguely just quickly skimming this as we're speaking here, but it's kind of gamified in that, depending, I guess, on how long you talk or the kinds of things you say, I don't know, you get hearts on the screen, and that allows you to level up to different levels in, I guess, this game. And yeah, when you get to level five, she's wearing lingerie. Yeah, it's interesting. It's interesting. I mean, in some ways, this kind of thing is inevitable, right? But it's kind of surprising that it's such a big mainstream company that's raised so much money, and just last week was making headlines for being at the frontier in some capabilities. Yeah, to be clear, this is not a new thing. There are plenty of apps that provide this exact kind of feature. And it is just surprising that, you know, in Grok, the equivalent to ChatGPT or Claude or so on, this is now a built-in feature. Literally, like, a sexy companion to chat with.
Certainly a differentiator, I guess. That it certainly is. Next, we've got a story of Uber being close to completing its quest to become the ultimate robotaxi app. So this is because they have announced a partnership with Baidu to deploy robotaxis outside the U.S. and China, focusing on Asia and the Middle East. Baidu already operates around 1,000 robotaxis globally. They are in a pretty good spot from what I can tell, competitive with Waymo. And Uber already has a partnership with Waymo where you can hail a robotaxi through their app. So I think the headline here is not too sensational. It does seem like Uber is trying to partner and kind of use robotaxis as part of the product, which I suppose they kind of need to, right? Yeah, the Uber share price has long priced in being able to go fully autonomous, to not have to be paying human drivers. And it's a pretty wild thing. As we start to have cars driving themselves, trucks driving themselves: in something like 30 states out of 50 in the U.S., truck driving is the number one occupation, and then lots of the other top jobs are supporting that in some way. And so we're marching inevitably to more and more autonomous driving. I think ultimately it can be a good thing for society, because that kind of job, you know, I feel so bad for them. I live in New York, and taxi drivers, Uber drivers, you can tell it pains them in a lot of cases to be using that right foot, because they're just all day using that right ankle. And so you're like, in some ways it'll be a good thing, but it's also going to be very disruptive to all these people who have this kind of job today. So retraining programs will need to come into place, or some other kind of solution. Right. Yeah, it's been an interesting thing with Waymo kind of slowly but surely expanding their robotaxi capabilities over the last couple of years. Tesla just rolled out robotaxis.
And there are companies working on autonomous trucks as well that are not Waymo. Tesla itself is presumably working on it. As you said, there are like 3.5 million truck drivers in the U.S., around 1 million Uber drivers. So it's going to be here in a year, two years, three years, and it's going to be disruptive, hopefully in a good way. And on to applications and business, as promised, some interesting kind of acquisition and hiring developments this week. First up, OpenAI's Windsurf deal is off, and Windsurf's CEO is going to Google. So we reported previously that OpenAI was in talks with Windsurf. Windsurf created one of these coding tools with agentic capabilities, and seemed to be in talks to be bought out for $3 billion. That was canceled, and the CEO and some of the top talent went over to Google for a deal, I think reportedly around $2.4 billion, with some licensing details as well. So another case of a non-acquihire, where the big company hires away the top talent, the leaders really of the project, throws in some license deal or something of that sort, and the company, Windsurf, stays. It's still there. It hasn't been bought out in any sense. In fact, I don't think any shares in Windsurf went to Google. We've seen many examples of this in the last couple of years at this point. Scale AI with Meta had this happen. I think Lamini with AMD, different examples of that. A very different kind of new-seeming normal thing for Silicon Valley. Like, you would buy the company to acquire its people, acquihire is the term, but now you can kind of hire away the key people and the original company sticks around. Buying outright would have been an antitrust kind of concern in the Biden era, but now antitrust is not really a worry. So it just seems like a new profitable or easy way for large companies to do these kinds of deals. And I think they were doing these kinds of deals originally to avoid antitrust inquiries.
But then it started to become such common practice that antitrust regulators were like, wait a second, you've slightly changed the approach here, but ultimately this is anti-competitive. And so this had a lot of discussion in Silicon Valley circles around whether the other Windsurf employees were kind of screwed over in this deal, because the top talent clearly, you know, got handsomely paid. But the way this works in startups is you get some share of ownership in the startup, and you hope that either it becomes a big profitable company and goes public, or it gets acquired and your shares get transferred, converted to cash that you can actually use, right? This is the kind of bet you make with startups. When you have this structure of deal where the company isn't acquired but the leadership goes away, that in some ways, like, breaks the typical contract or expectation with being a startup employee, being someone who joins a startup. So yeah, lots of kind of questions by people around the nature of this kind of deal for Silicon Valley. And in fact, just like a couple of days after this happened, Cognition, the maker of the AI coding agent Devin, announced that they are acquiring Windsurf. So they kind of swooped in. They got the announcement that the top brass is leaving for Google, and now this other AI startup, Cognition, is buying out the remaining company, Windsurf, which is quite the story. This whole business development, even in the startup world and business, this is pretty interesting stuff.
And even more news on this front: Anthropic hired back two of its employees who had just left for Cursor. Boris Cherny and Cat Wu, two leaders of the development of Claude Code, were announced to have gone to Cursor, and apparently that's just been reverted. Again, a really weird kind of story in Silicon Valley. Two weeks since the announcement, they apparently are going back to Anthropic. So, wow. Yeah, it's bizarre. It is bizarre. And continuing on that theme, the way this was all kicked off, of course, is Meta going on a hiring binge, just a complete spree of throwing around money to get top talent from OpenAI and others. And there are new developments on that front as well: reports of more high-profile OpenAI researchers going to Meta. They've got OpenAI researchers Jason Wei and also Hyung Won Chung, both pretty significant talents as far as I can tell. So yeah, there are now trading cards, right? You can see them on Twitter for when people swap companies, going from OpenAI to Meta, or, I don't know, OpenAI to Anthropic. It's quite amusing to me, I suppose, at this point. That's funny. Yeah, definitely. As you say, exactly, kicked off by Meta putting all this budget into it. And I think it's also, from speaking to friends who work at the frontier in these big labs, it is very stressful. It is super intense work, because you're trying to stay at the frontier against other companies that are also spending billions of dollars on the same problem. So it's very stressful work, and I'm sure the money, the kind of hundred-million-dollar contracts that supposedly Mark Zuckerberg is personally negotiating, that's part of it.
But I think also part of the story here, which I don't see talked about publicly, but it's just my hunch, is that if you've been at a frontier lab for years, helping roll out cutting-edge LLMs, you're kind of hoping that by switching to a competitor there's going to be a bit of a culture shift, that somehow the new role is going to be a bit less stressful than what you've been going through for years at your current firm. Yeah. And at OpenAI in particular, they have grown like crazy, right? They went from something like 1,000 people to 3,000 people in, I think, less than a year. And when you have that sort of startup scaling, it just compounds the craziness. It must be really messy, really fast-moving and chaotic now at OpenAI. And that could be one of the many reasons, besides money, that these people are leaving OpenAI. One more story on this front: Meta has also hired two key Apple AI experts, Mark Lee and Tom Gunter, who were researchers at Apple and now are going to Meta. So they're not just going after OpenAI; every kind of top talent is being sought out by Mark. On a related story, Meta, of course, is doing this for its superintelligence efforts, and they're one of many in the field, with OpenAI, of course, being one of the key ones. Mira Murati's Thinking Machines Lab has now closed their $2 billion seed round at a valuation of $12 billion. This lab, of course, is composed of a lot of people from OpenAI, including the former CTO, Mira Murati herself. And we haven't seen too much from them. They are saying that in a few months they'll start rolling out some products and open-source things of some nature. We've known that they have been looking at this kind of number, billions of dollars in a seed round with no product to speak of, and they got it. So the competition for AGI is certainly not slowing down.
Yeah, if you're not going to take a $100 million contract from Mark Zuckerberg as an engineer who is one of the trading-card players right at the top of their game, then the thing to do is exactly what Mira Murati has done here. And we've seen other folks from OpenAI, like Ilya Sutskever, do a similar kind of thing with Safe Superintelligence. And The Economist ran an interesting article a week or two ago that made the case that these AI valuations are completely insane unless AGI really is just a few years away. And I think that's quite reasonable, given the kind of revenues and profits you might expect. You know, there's word that some of these are being valued at 100 billion, 200 billion, just absolutely fantastical numbers. And speaking of billions, next up, we have an actually very profitable business reaching that status. No. Yes. At least, you know, revenue. At least revenue-generating. Yes, revenue-generating; we don't know about profitable. This is Lovable. They just raised a $200 million Series A, just eight months after launching, and they are now valued at $1.8 billion. In case you don't know it, it's one of the big winners in the agentic vibe-coding world. Users can create websites and apps, just vibe-code them. Apparently, they have over 2.3 million active users and 180,000 paying subscribers, which yields $75 million in annual revenue. I mean, a crazy, crazy rise, a super successful play in the vibe-coding space at the exact right time with the exact right kind of approach. And I haven't used Lovable myself, but it's not like you see the code as a Lovable user, right? It's more like gen AI for a whole application. Exactly. Yeah. This is for sort of non-technical people, broadly speaking, where you don't need to touch the code generally. And so it's focused on apps and websites, things that are not super complicated, not the sort of things that, let's say, AI engineers tackle.
And it's got a lot of users, and a lot of people are building apps and websites with it at this point. And just one more story dealing with billions of dollars, related to xAI: SpaceX has committed $2 billion to xAI. So that's one of Elon Musk's companies investing in another of Elon Musk's private companies. There's also apparently going to be a Tesla shareholder vote for Tesla to put some billions into xAI. So, you know, we could have an hour-long discussion about the weird business empire that is Elon Musk and the various moves of different business entities, like xAI buying X, which recently happened. But suffice it to say, xAI is looking for lots of money to keep doing what they've been doing. Nice. I think all of this $2 billion went to an alien-themed sex chatbot. Is that right? I mean, that's definitely one of the big investments that Musk is betting on, it seems. Imagine if there was no gravity, baby. And we're done with all this stuff about billions and hires. But the next story, in research and investments, actually is related in some ways. So this is a blog post covered in this article with the headline, "A former OpenAI engineer describes what it's really like to work there." So Calvin French-Owen, who was an engineer at OpenAI for over a year, has published this since moving on. It's not a drama-type post; he just wanted to move on and start something new. And so there is quite a detailed description of what it's like to work at OpenAI. He worked, for instance, on Codex, which is their agentic coding tool. And there are lots of interesting tidbits here, for instance about OpenAI's rapid growth, where it went from 1,000 people to 3,000 people in the time this person spent there, and the crazy scale of being a product where, as soon as you launch something like Codex, you get a huge number of users using it.
A lot of details on the culture being sort of bottom-up, people taking initiative and doing different kinds of things. Lots of nitty-gritty stuff that isn't critical, isn't dramatic, but interesting if you work in the space as an engineer or just follow OpenAI. This backs up the case that I was trying to make earlier, that people are looking for some kind of culture shift, maybe just hoping that by switching to another frontier lab they're not going to be in such a hectic environment. Yes. There are so many little bits that could be worth mentioning, like he highlights that an unusual part of OpenAI is that everything runs on Slack. There are no emails. If you're a software engineer, that's a very interesting detail. If you, I guess, work in an office, that might be an interesting detail. Yeah, and I guess this is a slow week for research and advancements, Andrey, that this is one of the key research and advancement stories: a report on what it's like to work at OpenAI. Yeah, well, we are trying to keep this one a bit shorter, so I decided to not include too many papers and do something a little bit different. We do have one research paper that we'll touch on. The title is "Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination." So this is related to a whole bunch of research in recent months dealing with reinforcement learning for reasoning. There have been many papers presenting weird ways to train that sort of work unexpectedly, things like incorrect rewards, things like training on super limited data. We've covered quite a few, maybe five or six of these kinds of papers. We also covered the skepticism and criticism of some of these papers, which seemed at first to be a result of incorrect evaluations on these benchmarks. Now we also see that these results are very particular to the Qwen model family.
So the claim here is that you get these nice results on Qwen potentially because Qwen was trained on the data of these benchmarks. When you actually do this with other models, you don't see the same sorts of positive results. And so that basically disproves the conclusions of these other papers. The papers do show that the correct, intuitive way to do RL works, as we would expect. But yeah, an ongoing kind of development in the research world here. Yeah. Leakage is a big problem with these benchmarks. People end up training models to excel at these benchmarks, but then the models maybe don't perform outside of the benchmarks. All kinds of problems with benchmarks in this way. I actually recently did an episode of my show specifically on this. I'll look that up while you're speaking next and have a link that people can follow if they want an hour-long discussion on the issues with LLM benchmarks. This is a really interesting one here, because it's specific to one model family, and it's researchers following a thread of surprising evidence where incorrect reward strategies or random reward signals were leading to reasoning performance. And that shouldn't be the case. It just shouldn't happen. And it would happen if there's leakage from the training set into the test set. Exactly. And figure one of this paper shows that if you give Qwen an incomplete question, like "for how many positive integers greater than one is..." and you stop there, the model autocompletes the actual question and answer. So clearly there is data leakage that you can demonstrate, and this is not going to happen if you use Llama, for instance. Nice. And then thank you, Andrey, for talking there a bit. If people want to hear all about the issues with LLM benchmarks, it's episode 903 of my podcast, Super Data Science. Yep. I'm going to link it as well in the episode.
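The probe described here, feeding a model a truncated benchmark question and checking whether it reproduces the held-out remainder verbatim, can be sketched roughly like this. This is a minimal illustration of the idea, not the paper's actual code; the `complete` callable, the similarity threshold, and the example strings are all assumptions standing in for a real LLM completion call:

```python
from difflib import SequenceMatcher

def looks_contaminated(complete, question, cut_ratio=0.5, threshold=0.9):
    """Truncate a benchmark question, ask the model to continue it,
    and flag contamination if the continuation closely reproduces
    the held-out remainder of the question."""
    cut = int(len(question) * cut_ratio)
    prefix, held_out = question[:cut], question[cut:]
    continuation = complete(prefix)[: len(held_out)]
    similarity = SequenceMatcher(
        None, continuation.strip(), held_out.strip()
    ).ratio()
    return similarity >= threshold

# Hypothetical benchmark item, echoing the partial question quoted above.
QUESTION = ("For how many positive integers greater than one "
            "is the sum of divisors prime?")

# Stub "model" that has memorized the benchmark item, standing in
# for a contaminated LLM: it regurgitates the rest of the question.
def memorized_model(prefix):
    return QUESTION[len(prefix):]

# Stub "model" with no memorization of this item.
def clean_model(prefix):
    return " the answer unknown to this model entirely."

print(looks_contaminated(memorized_model, QUESTION))  # True: verbatim recall
print(looks_contaminated(clean_model, QUESTION))      # False: no recall
```

A real version of this would swap the stubs for an API or local-inference call and run over many benchmark items, but the logic, comparing the model's continuation against the held-out half of the question, is the core of the demonstration.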
So yeah, just one note on this paper: I think this whole story is an interesting examination of the super rapid pace of developments in AI. Papers now get published in a matter of weeks or months, there's not much time for good peer review, and so some things leak through and the scientific process is struggling. At the same time, this showcases the self-corrective nature of research, where pretty quickly after these initial papers, we've had follow-up papers explaining or rebutting their results. So overall, an interesting little micro-example of the way science works in the current world of AI. On to policy and safety. First up, we've got some big money coming from the Department of Defense. Anthropic, Google, OpenAI, and xAI have each been awarded up to $200 million in contracts for AI development. So there's an initiative to integrate AI agents across various mission-critical areas. This is coming right after the launch of Grok for Government, a suite of AI products for U.S. government customers. OpenAI and Anthropic had already launched their own government offerings, actually back in June, when OpenAI introduced OpenAI for Government. So, yeah, another trend among all these frontier labs: the money of the federal government is definitely a nice bounty to go after. On the regulation front, we've got California State Senator Scott Wiener introducing a bill to regulate AI companies. This is SB 53. We covered the earlier effort, which was a big deal last year; it ultimately failed and was vetoed by the governor of California after lots of lobbying. There's now a renewed push for this kind of bill with tweaked details, and the key things are additional reporting requirements and security protocols for AI models above a certain computing performance threshold.
So it's still an ongoing story, still a big deal if it does get passed, and I think we'll probably keep reporting on it as developments happen. And on the more concerning side of the spectrum, we've got an article titled "AI Nudify Websites Are Raking in Millions of Dollars." So one of the big ethical issues with AI, as we've known for some years now, is non-consensual explicit images. This has been a problem for years, with even teenagers being the target of false deepfake imagery that depicts them inappropriately. Now there are many such websites. According to this article, they average 18.5 million visitors per month and may be earning up to $36 million annually. So just to showcase the scale of the problem: there's a lot of talk about safety with AI, so x-risk and issues like that, but we shouldn't forget that there are already super significant ethical implications and actual negative impacts being brought on by things like this. Yeah, you know, I talked earlier in the episode about how it's kind of inevitable that you'd have sex chatbots come out of LLM technology, and this is a really concerning thing that also seems like an inevitable misuse of the technology. And hopefully, I don't know how you regulate it exactly, but maybe the penalties become so large that it just becomes something that is very hard to find online, which right now it seems easy to find. Right. There are regulations being proposed, and passed in some cases, to target these kinds of things. So presumably it's up to Google and other cloud providers to go after them. And on another topic related to concerning uses of AI, we've also got facial recognition. So this is another thing that's been ongoing for years: the concern that you're going to have the ability to get someone's name and potentially other details just from a photo of their face.
This was developed even before ChatGPT. And there's now this article, "Inside ICE's Supercharged Facial Recognition App of 200 Million Images." So ICE, the U.S. agency that enforces immigration and has been cracking down quite hard, apparently has an internal app called Mobile Fortify that allows officers to use facial recognition to access a database of 200 million images. And these are images coming from multiple government sources: the State Department, CBP, the FBI, and others. So if you think state surveillance is concerning, or state police power is concerning, there are more reasons to be concerned as a result of AI, clearly. Well, yeah. So ICE stands for Immigration and Customs Enforcement. And apparently, as part of this "big, beautiful bill" that was recently passed by the U.S. Congress, ICE is going to have its budget multiplied many fold, billions and billions of dollars more. And it kind of makes you wonder. So, you know, recently in this current administration there's been a big focus on, okay, this person is shown to be a gang member. But you still end up in weird situations where, for example, people who have been deported for supposedly being gang members aren't going before a judge; there's not much due process, and so they make some mistakes. So there are issues even with how they're doing it today. But if you're multiplying ICE's budget many fold, presumably the idea is to be deporting far more people. There are a lot of illegal immigrants in the U.S., but at the same time, the U.S. economy, for the most part, has a huge demand for those migrants. In the construction sector, for example, I recently read that 30% of people who work in construction in the U.S. are illegal migrants. And for things like food delivery apps and farming, oh my goodness, I mean, that's going to be way more than 30%.
There are economic repercussions to deporting a lot of these people as well. So I don't know, it's interesting; I don't have all the answers. Yeah, there's a lot that can be said about ICE and the state of U.S. politics. Certainly, I have a lot of thoughts about many things that have been ongoing, but this is not the place for it. So I think we'll move on. That's true. Yeah. And just one more story, in the synthetic media and art section that we occasionally have: the video game actors' strike has officially ended after an AI deal. So voice actors in video games have ended their year-long strike with an agreement with major companies like Activision and Electronic Arts. There were 2,500 members of the U.S. union SAG-AFTRA; there was a big vote, and they agreed to things like protections for their rights to their voices, wage increases, and so on. So we've seen this happen with Hollywood actors, we've seen this happen multiple times now, and this is the latest example of the world of entertainment grappling with the reality of deepfakes and AI-generated media and seemingly coming to a new understanding of how to handle it. Yeah, it's interesting. This is a whole world that I hadn't really thought of. So there's this woman in the article, Ashley Burch, who I guess is a big proponent of this video game actors' strike, or a big player in it. And she's voiced a huge number of characters in well-known games like Fortnite, The Last of Us, Minecraft, and many others.
And you know, I hadn't really thought of this whole world. I can imagine there would have been, or I guess there can continue to be, tons of work for video game actors, because unlike a film, which would typically be at most like two hours long, games can have huge amounts of dialogue that need to get recorded. But now you could use technology like ElevenLabs to generate it. And that is it for this episode, as I promised, kind of a quick one. Hope you kept up. If you made it to the end, thank you for listening. And of course, thank you, Jon, for fulfilling your guest co-host duties. Anytime, Andrey. It's so great to be back. Do check out the links mentioned in the description for Jon's cool YouTube video and related episodes. And as always, we appreciate your reviews and your shares. Even though I sometimes don't get around to replying to comments, I appreciate your comments too. So please do keep engaging, and please keep tuning in. Begins, begins, it's time to break. Tune in, tune in, get the latest with peace. Last weekend, AI, come and take a ride. Get the lowdown on tech and let it slide. Last weekend, AI, come and take a ride. I'm the last of the streets, AI's reaching high. From neural nets to robot, the headlines pop. Data-driven dreams, they just don't stop. Every breakthrough, every code unwritten, on the edge of change. With excitement we're smitten. From machine learning marvels to coding kings. Futures unfolding, see what it brings.
