

Doom Scrolling SORA2, Claude 4.5 Sonnet & Are Agents Coming for our Jobs? EP99.19
This Day in AI
What You'll Learn
- ✓ Sora 2 can generate highly realistic videos with impressive visual quality and detail, even replicating specific scenes and characters from the past.
- ✓ The technology raises concerns that 'fake' or AI-generated content could come to dominate social media, displacing human-created content.
- ✓ The long-term value and usefulness of this type of AI-generated video is open to question, as it may prioritize entertainment and attention-grabbing over substantive value.
- ✓ The hosts discuss how the technology could evolve, through integration with other AI tools and techniques, to enable longer-form, more substantive video content.
- ✓ The release of a Sora 2 Pro API could open up new possibilities for more practical, educational, or documentary-style applications.
Episode Chapters
Introduction
The hosts discuss their initial impressions of the Sora 2 AI-generated video technology and its capabilities.
Sora 2 Capabilities
The hosts explore the impressive visual quality and realism of the videos generated by Sora 2, including its ability to replicate specific scenes and characters.
Implications for Content Creation
The hosts discuss the potential impact of Sora 2 and similar technologies on the landscape of social media and content creation, including concerns about the proliferation of 'fake' or AI-generated content.
Evaluating the Value of AI-Generated Video
The hosts debate the long-term usefulness and value of this type of AI-generated video content, considering whether it is more focused on entertainment and attention-grabbing than providing substantive value.
Future Developments and Possibilities
The hosts explore the potential for the Sora 2 technology to evolve and be integrated with other AI tools and techniques to enable the creation of longer-form, more substantive video content.
AI Summary
This episode discusses the latest developments in AI-generated video technology, particularly the release of Sora 2 by OpenAI. The hosts explore the capabilities of Sora 2, including its ability to generate realistic-looking videos with multiple camera angles, detailed skin tones, and natural motion. They also discuss the potential implications of this technology, such as its impact on content creation and the risk of further fueling 'doom scrolling' behavior on social media.
Key Points
1. Sora 2 can generate highly realistic videos with impressive visual quality and detail, even replicating specific scenes and characters from the past.
2. The technology raises concerns that 'fake' or AI-generated content could come to dominate social media, displacing human-created content.
3. The long-term value and usefulness of this type of AI-generated video is open to question, as it may prioritize entertainment and attention-grabbing over substantive value.
4. The hosts discuss how the technology could evolve, through integration with other AI tools and techniques, to enable longer-form, more substantive video content.
5. The release of a Sora 2 Pro API could open up new possibilities for more practical, educational, or documentary-style applications.
Topics Discussed
AI-generated video, Sora 2, OpenAI, Social media content, AI impact on content creation
Frequently Asked Questions
What is "Doom Scrolling SORA2, Claude 4.5 Sonnet & Are Agents Coming for our Jobs? EP99.19" about?
This episode discusses the latest developments in AI-generated video technology, particularly the release of Sora 2 by OpenAI. The hosts explore the capabilities of Sora 2, including its ability to generate realistic-looking videos with multiple camera angles, detailed skin tones, and natural motion. They also discuss the potential implications of this technology, such as its impact on content creation and the risk of further fueling 'doom scrolling' behavior on social media.
What topics are discussed in this episode?
This episode covers the following topics: AI-generated video, Sora 2, OpenAI, Social media content, AI impact on content creation.
What is key insight #1 from this episode?
Sora 2 can generate highly realistic videos with impressive visual quality and detail, even replicating specific scenes and characters from the past.
What is key insight #2 from this episode?
The technology raises concerns that 'fake' or AI-generated content could come to dominate social media, displacing human-created content.
What is key insight #3 from this episode?
The long-term value and usefulness of this type of AI-generated video is open to question, as it may prioritize entertainment and attention-grabbing over substantive value.
What is key insight #4 from this episode?
The hosts discuss how the technology could evolve, through integration with other AI tools and techniques, to enable longer-form, more substantive video content.
Who should listen to this episode?
This episode is recommended for anyone interested in AI-generated video, Sora 2, and OpenAI, and for anyone who wants to stay updated on the latest developments in AI and technology.
Episode Description
Join Simtheory: https://simtheory.ai (Use STILLRELEVANT for $10 off)
----
00:00 - Sora2 Examples
00:56 - Sora2: Initial Impressions & Thoughts
26:39 - Claude Sonnet 4.5: It's REALLY good
47:09 - Claude Agent SDK & AI Agent Systems
55:05 - Is Claude Imagine a Look at Future Software / AI OS?
1:00:25 - Claude 4.5 Sonnet Dis Track
1:06:24 - "Real AI Agents and Real Work" & Enterprise Agent / MCP workflows
1:31:41 - LOL of the week Sora2 Steve Irwin Video
1:35:07 - Full Claude Sonnet 4.5 Dis Track
----
Thanks for listening and your support, we really appreciate it!
xoxox
Full Transcript
Mate, I gave Sora 2 a spin last night and it blew my head clean off. The skin tones, the motion, it's just, I don't know, eerily real. Yeah, it's like it finally stopped looking like a clever filter and started feeling like a camera crew. Even the little folk. Good morning, Australia, and welcome back to the Today Show. G'day! So this week it's all about Sora 2. Crikey! We're diving headfirst into the adventure, and I can't wait to show you what we've got lined up. So, Chris, this week it's all about Sora 2. Good evening, I'm Mark Dalton and this is Channel 9 News at 6. We begin tonight with a wobbling mystery that's tilted a whole neighborhood: Bookshelf-gate. When will local resident Chris Sharkey finally fix that leaning tower of paperbacks? This is the more important thing than Sora 2: the bookshelf. Breaking news: it's getting worse. Life admin is not my strong point. Maybe it will be revealed in a few episodes that this was all a trick, and this was really Sora 4 Max Edition simulating the background as we went live, simulating the collapse of the bookshelf like in your video just then. But we do have a new toy to play with. A new toy! They call it Sora 2, and OpenAI has transformed Sora: instead of trying to build some sort of strange video mashup system for professionals, they've gone, who cares about those use cases, let's just build a TikTok-esque social network, and we'll slowly release it with invites via an iOS app and, of course, on the desktop as well. And here we are. We have Sora 2: multiple camera angles, pretty amazing clarity in terms of what you put in from the prompt to the generation. What are your impressions of Sora? I mean, it's hard to argue. Some of the videos are just absolutely amazing in terms of the quality, the detail, the associated sounds. The ability to just, I guess, use any historical or current figure you like without any fear of repercussions is pretty interesting and surprising. So at the top I obviously used Steve Irwin, who's relatively famous from Australia, because I wanted to see, does it know Australia's stuff? Because at the moment it's geo-restricted to the US and Canada, but thankfully someone in our community not only got me an invite but then also showed me how to get into it as well. So I did get access even though it is geo-blocked. I can't access the iOS app, because that's a bit more guarded with billing, but it's pretty incredible. We did some tests with some really old chain stores from when we were growing up in the '90s in Australia. One was called Franklins, and, you know, we went off a few commercials, and it was able to replicate those commercials with a lot of accuracy. Which makes me think that maybe OpenAI is sitting on this huge treasure trove of all the videos they scraped before, you know, everyone started locking down on this stuff. It's sort of like maybe they have this huge piggy bank of content and media, or they're, you know, letting this thing just sit and binge Netflix or something. Because they must, right? Like, it's not going to coincidentally exactly replicate things from our childhood just because. It's got to have had some reference material to be able to do that. Yeah, and one of the other crazy things I asked it: I mean, I reference it on the show a lot, I live in what's considered a regional Australian town, and you would think, you know, the sort of US-centric-trained model wouldn't be able to, like, understand any of that.
But I asked it to make an influencer video touring my local town. I'll play a little bit of it, as painful as this may be. Wait, it doesn't do audio. "Nobby's Beach with a coffee, and those views are unreal. Walk the Bathers Way, it links up all the beaches. Brunch at this cute spot called Susuru, insanely good avo toast. Cooled off at Merit. What's up, guys? Day-tripping in Newcastle." Okay, that's pretty painful to listen to, if you're listening. Yeah, I was going to say, this is where, for me, it gets quite depressing: there's going to be even more of that kind of content without people actually having to do the work. Yeah. What does this mean for an AI? Wait, hang on: us. What does this mean for us? No, but in reality, what does this mean for those people that produce all the TikTok attention-bait dances? To me: does this social network thing from Sora take off, and everyone just consumes slop, and you come back to Earth in five years and people are just scrolling the slop while the robots are making the food? I mean, it definitely seems that way. A lot of people are doing that already, just scrolling through content, and already a lot of it was AI-generated. This is just going to increase the amount and believability of those fake AI videos. I think that's the really sad part of all of this: this incredible technology, and that's probably going to be its main use case. So I initially intended to come on the show today because I had this dark moment last night. We haven't had a doom-and-gloom episode in a while. And I think it was just exhaustion and being tired, and I was reflecting on this whole thing, thinking, what does this actually mean? And one feeling I've had in my life today with technology is that even though all the tech CEOs and people are always like, this technology will help us, bring us all together, make us feel closer and make you happier or whatever, it feels like all the technology progress, in a way, has been around obviously more sophisticated ways to sell advertising and just, you know, capture your attention completely. And the net good on the world: yeah, there's some positive in terms of communicating and stuff, but there's also this negative toll. It feels like instead of bringing us closer together, it's actually torn us further apart, where we live in our own echo chambers and we don't actually just get out in the community anymore and interact with each other. And then I thought about this especially with the cameo feature, which allows you to put yourself in the actual video. So you could, in theory, if this technology plays out, be the true main-character energy in your own series. Does this mean, with content consumption now, it's just watching yourself all day? Yeah, just sort of admiring yourself in the mirror. And it does seem that that particular feature is what captured the imaginations of people the most. Because apparently, and I didn't even know this, Meta released some slop-generator social network a couple of days earlier, but it got zero attention. I think this one resonated, firstly because a lot of people follow OpenAI around this kind of thing, but secondly because of the cameos of Sam Altman in the actual videos, and people wanting to do that for themselves. I remember there's this scene in Mitchell and Webb where it's an old lady asking a young man what his job is.
And he's like, "Oh, well, I basically make profit off buying and selling contracts, futures, that kind of stuff." And she's like, "And how does futures trading help? You know, a fireman puts out fires; a policeman protects society. How does futures trading help?" And, you know, he can't answer the question. And it's kind of like this stuff. It's like, what does this do for the world? Is this a good thing, just when you look at it as a thing? Or is it just a bit of fun? And I think you said before the podcast, a couple of the videos you made really delighted me. And I think that's what we always come back to with some of the new AI technology: oh well, it's just a bit of fun. I don't think it helps the world, but it's fun. Yeah, and as I said, I was going to come on here and say, oh, who's this for? This is so depressing, this is sad. But if you take it for what it is, a demonstration of where this stuff's going, and you just have a bit of fun with it, it is pleasurable. There are moments where I am sitting here having a bit of fun. The thought I had about it, though, is: is this something I'm going to return to or not, a week from now or two weeks from now? Maybe very rarely, to check in on what's going on, or to make something funny to send to someone in a group chat, maybe. The issue for me, though, was that even if you wanted to make something funny for your friends or whatever, they're too short. It seems like every video you sent to me ended before it got to the punchline, or ended before it got to the good bit. It just doesn't seem anything more than a tech demo to me. It's more like, here's an indication of where we're going. Like, soon we're going to be able to generate entire movies that have character pinning and are realistic and have sound and can follow direction and all that. It's very clear where the technology is going. But really, this is just basically a tech demo that you can muck around with for now. There's no actual value here. Yeah, unless you see it as a model-step progression, like the early GPTs to now, where the rate of improvement is, you know, faster and faster. And it starts off with these short lol-content things, and then that evolves into longer videos, and you play this out as Sora becomes this personalized entertainment platform or something. I mean, that's probably the vision that they have. But yeah, I'm very interested to see if, three weeks from now, anyone cares anymore or not. Because that's generally the litmus test for this stuff: does the hype wave pass and no one uses it anymore, or is this an absolute mainstay, and it starts to take usage away from TikTok? I'm not so sure. One of the interesting things about it will be, and I think you said they're planning on releasing an API for it: if they do that, it'll be interesting, because if you look at the video maker you built, there's a lot more value in giving it a general theme and other details, where you can actually stitch the clips together and combine them intelligently into something longer-form. So if the pricing's reasonable and the quality's this good, that would be really exciting. That's what I was thinking. The video maker that I made is not necessarily perfect, but it essentially was just taking all the available tools out there, like Suno for background music, ElevenLabs for good audio.
And then I did have to sort of downgrade the video model, but I don't think my video model's any worse than theirs, to be quite honest, to get the motion, basically to get the video cost down, because otherwise it's going to be way too expensive. But then you've got that OmniHuman lip-syncing technology now, which I have actually improved quite a bit. I realized that the lip-sync issue we were having early on was actually that I had the rate of the voice too fast, so it was getting confused. But if you piece all those things together, it's very possible right now to make a documentary to educate yourself on something, or longer-form video, that feels a little bit more useful and less sloppy to me in that regard. But I can imagine Sora getting there pretty quickly on that as well. And I think what's quite incredible about the model they've built here is they've clearly gone down this path of building this model for purpose, because they say in their announcement they're going to release the Sora 2 Pro API, which is a different version. I'm assuming that's a generic video model, the model behind this, and this one is optimized for fast inference and low cost, and also tuned to come out with these comedic videos off pretty bland prompts. Some of the prompts I did were terrible and the output was quite entertaining. And I would say the hit-and-miss ratio is like 70% hit, 30% miss, whereas normally with these models it's 70% miss, 30% hit. So they've done an amazing job of stitching all this stuff together, sort of like what Apple does with technology: it sits back, watches what everyone's doing with this stuff, and then unifies it and brings it together in a pretty simple, nice package. Almost like this whole thing is a marketing exercise rather than a legitimate product release. They're just getting back into everybody's attention, being like, hey, we're still here, we're still making the best frontier stuff. I think that's the sad part for me, though: a lot of the stuff around AI is just trying to get all the attention, so I guess you get the signups and you get the platform play going. But in reality, it's hard when these people are like, you must stop us, you must slow us down, we need to have a pause for six months. And then it's like, oh, and we're going to work on curing cancer and all this other stuff. Oh, and by the way, here's an attention-wasting time sink for this generation, another app to waste their best years on, instead of communicating with each other. I mean, that's the cynical look at it, right? Yeah, the actions don't really match. One thing I'll give them, though, is they announced it and they released it. That's impressive. That isn't always the case: with Sora 1 they didn't do that. At least this time people can actually use it, which is really nice. How are they going to handle these copyright issues, though? Like, look at this: if I'm the Irwins, and this guy on this podcast is just making videos about Steve Irwin, I don't know, I'd be a bit troubled by it, to be quite honest. But I did want to show you one experiment I did, because at the end of the post, and I want to read it verbatim, because this is the post announcing Sora 2.
It said: "Video models are getting very good very quickly." I don't disagree. "General-purpose world simulators and robotic agents will fundamentally reshape society and accelerate the arc of human progress. Sora 2 represents significant..." Subservient underclass! Do they think that we're buying this bull? Like, there's no way. "...In keeping with OpenAI's mission, it is important that humanity benefits from these models as they are developed. We think Sora is going to bring a lot of joy, creativity and connection to the world." That's what I was getting at. I think everybody fears the rise of AI, but I don't think anyone thinks that Sora is going to be the thing to take down humanity. But anyway, all the promoters on X are like, OpenAI has developed a world model, and let me show you the physics engine behind this world model, now with another Steve Irwin video. What I love is that's the kind of crazy thing that Steve Irwin would have done. We're getting our censorship, our beeping, on time today. It's not traditional. Yeah, that's off-brand. So look, I think it's a fun toy. I think it's going to be incredibly popular when they release it more broadly and it's not invite-only. I do think it'll become really popular, especially with young people. Super fun to play around with, and maybe a mainstay in their app drawer. But yeah, that whole feeling around it of, again, our generation has been this: you've taken all the smartest minds, and I used to complain about this with Facebook, you've taken all the smartest minds and figured out how to serve better retargeting ads. That basically sums it up. The generations before went to the moon, allegedly, went to the moon, and our generation has just figured out how to capture people's attention and sell them ads. I don't know, I'm slightly ashamed of it. The other crazy factor that I can't really reconcile is the cost. Because if you look at models, one of the reasons you had to back off from using Veo 3 in your video maker is it's just way too expensive. Even if your company's paying and you don't really care about the money, it's very hard to justify, what is it, 40 cents a second or some crazy amount of money, when you've got to iterate on these things. It's clear that it's costing the providers a lot to run these models, or they wouldn't pass on a cost that high. And even the models that are constantly trying to push the price down can't seem to get it down on these heavy video models. And yet Sora is as good as, if not better than, all of them, and they're really just doing it under existing plans and stuff, or, okay, maybe the Pro plan. But if someone's sitting there smashing these out all day, surely they're losing money overall. Yeah, there's no way. Unless, well, if they aren't losing money, they've had a huge breakthrough in this model. This model is a massive breakthrough, but you'd think they would have bragged about it. It seems to me like they're pouring money down the drain to offset any attention on Google or Veo 3. But Veo 3's mistake, and I'm kind of stealing this opinion from someone on X who pointed out, you know, the cost was obviously just way too high for anyone to muck around with and share and do anything with, and how you accessed it was really confusing. It was like: access it on Vertex AI, or in AI Studio, or the Gemini app. Oh, now in the Gemini app. Now it's a button. Now it's gone. Now it's a button again. It was just so poorly done.
Mike, I won't hear a bad word said about Google Gemini, because they sent me merch during the week, and anyone who sends me merch gets my unequivocal and uncritical praise no matter what, and I'll defend them till the end of time. Keep that in mind, listeners: if you want me to defend your brand, send me merch. Send me merch, facts. In saying that, I totally agree: accessing Veo 3 is really difficult. And then there are also other providers hosting Veo 3, unaccountably. They've got trusted partners who host it, and it's really just confusing what it is and where it is and all that sort of stuff. The results are amazing, but you're right, it's never going to get this widespread attention where everybody's talking about the next step in AI, even though Veo 3 makes some of the most amazing videos. But we were talking about this earlier. In Sim Theory, we have the Code Interpreter as an MCP, so it can call out to the Code Interpreter, which is really just a Python box running, right? And Code Interpreter has always been an amazing tool, but it's had a problem where, for your typical average user, including myself, you just don't remember what capabilities or things it can do. So what I did was start cherry-picking processes from Code Interpreter, like making a chart, and then just engineering a custom MCP called Make a Chart, where it just focuses on that one use case (there's a rough sketch of the idea after this paragraph). So when you ask your AI to make a chart, it's like, oh, I'll call the Make a Chart tool. Like, I'm a dummy, I'll just call the most obvious one. And then the user gets the output they expect. And I think with Veo 3 in, say, Gemini, the challenge is: well, what end product am I getting? At least with the new Sora it's like, well, I'm getting a TikTok-style video that'll make me laugh. That's an end product. It's not a tool anymore, it's a product. And I think with Veo 3, or Veo 4 when they launch it, it should be focused on use cases within their own apps. Like, oh, I want to make a, you know, a video podcast with two average guys and a bookshelf collapsing. A good example of that is what they did with NotebookLM and the podcast maker. That got a lot of attention because they actually turned it into something where you get something at the end, something really useful and shareable, rather than just giving you the tools where you've got to painstakingly go through the process over and over again. It's why the video maker that I made, I think, was also resonating with people. Even though it costs quite a bit of money, you can actually make some useful videos with it, videos that you could share and use internally, like training videos, stuff like that. So there was a use case for it; there's an obvious, oh, okay, I'm going to use this to do this. And so, to me, that's where Veo 3 might still be a better video model, fundamentally. I haven't seen Sora 2 Pro, or whatever it's called, yet through the API, but it does feel to me like, outside of the sort of cutting, maybe Veo 3 is just still a higher-quality model. And you would expect, over time, with access to all the content they have through YouTube, you would just assume that Google's video model is better at some stage. Remember Sora 1: everyone was like, this is the greatest, and then two or three weeks later open source had already kind of caught up, and then Google came out with Veo 3, with audio for the first time, and everyone was like, whoa, Sora is old and bad.
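To make that Make a Chart idea concrete: below is a minimal sketch of a single-purpose MCP server, assuming the official Python `mcp` SDK's FastMCP helper. The tool name, parameters, and matplotlib rendering are hypothetical stand-ins; the episode doesn't specify how Sim Theory's version actually works.

```python
# make_chart_server.py: a one-tool MCP server, so the model has one obvious
# tool to reach for instead of remembering everything Code Interpreter can do.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("make-a-chart")

@mcp.tool()
def make_chart(title: str, labels: list[str], values: list[float]) -> str:
    """Render a simple bar chart from labels/values and return the file path."""
    import matplotlib
    matplotlib.use("Agg")  # headless rendering, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.bar(labels, values)
    ax.set_title(title)
    out_path = "chart.png"
    fig.savefig(out_path)
    return out_path

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default
```

The design point from the episode is the narrowness: a model that might overlook a general-purpose sandbox will reliably pick a tool literally named after the user's request.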
So I think maybe, in a way, OpenAI has realized, we just can't compete in that model war, so let's go after a social network, where the model matters less. I don't know, that's one idea behind it: how do you monetize this line of model in the business? Well, I guess the wider implication in the long run is there are going to be certain things, in terms of video construction, advertising construction, those kinds of things, where some jobs just aren't going to exist anymore, because you're going to be able to direct an AI to do virtually everything a full film crew and staff could have done previously. Like, we had someone on the This Day in AI community the other day, or the Sim Theory community, one of them, produce an ad for their local hunting store or something like that. Now, think about that with TV advertising. Ads aren't exactly award-winning cinema with all these factors going into them; you're just making some videos that advertise a thing. We're getting to the point where we're stitching these tools together, right? You could make an ad as good as any ad that's out there, really. But this is where I think the opportunity lies for people listening, thinking about what you could really do with this. To me, someone could make purpose-built video editors. Like Cursor for video, but where you can vibe out a commercial for your small business, and it just focuses on that one use case. I'm probably giving away someone's startup right now, and they're like, shit, don't let anyone listen to this podcast, don't share it. Yeah, maybe so. But I think you could assume you're going to get access to Sora 2 Pro, right? And you could build a vibe-code video tool. Maybe it already exists and I'm just unaware, but that's what it feels like to me with a lot of this content: the real impact will be that the price of media generation goes to zero, but it doesn't necessarily mean there isn't an opportunity in that. There'll still be people, just like people who vibe-code a product because they don't want to code, who are like, I just want to consume funny Sora videos; I don't really enjoy creating them that much, I prefer just watching the funnies. So I think there'll still be a two-sided marketplace to this transaction of content. But it's just, yeah, new tools for the job, and the price will come down. It's deflationary for sure. Which is great, because I think the use cases that excite me about it are the more corporate ones, in terms of education, sorry, I guess that's less corporate, but education and training, which I think is a massive one. Because having bespoke, video-based, interesting content that teaches in a way people prefer to learn is very interesting. And then the whole idea of having a custom video to start your day, or something that's going to catch you up on the things you're interested in in a very engaging, funny, interesting way: those things are really exciting to me, where the content is actually really personalized and very different to anything you can get out there right now. Yeah, if you put this technology to good use, and your addiction to your phone and social media was videos that were engaging and educating you on things, or briefing you on different aspects of your day, or whatever it is that you need, or if your child's learning a concept or a language and it's very addictive, like doom-scrolling, but they're actually learning from it, then maybe it does have a really good place.
I do want to quickly play this video. It's a blooper reel from Sora 2, and I think it's a look at the Sora 2 Pro model. So this is the behind-the-scenes of their launch video that they made with Sora 2. But, you know, it's in 16:9, there's no watermarking on it, and it looks a lot higher quality. So I'll play it. "What is this? There's no wind! This is supposed to be epic! Turn it up! Turn it up! There's too much wind in my hair! Cut! Damn it!" Okay, because a lot of people listen to the show, I'm going to cut it. God, they're losers. They're seriously sitting around doing this stuff. What a waste of time. I don't know, I disagree. I thought it was really cool. They made the launch video in Sora with cameos of all the people who worked on it. I think that's really cool. And then they did a blooper reel of that to show how amazing the technology's instruction-following is. You've got to give it to them. Yeah, the content's cringe, but it's still really impressive. And it does look pretty good. I think the thing I'd be concerned about, really, is the audio, because the audio quality, and I'm sure people listening are going to hear it thinking it's our bad editing, and it's probably partially that, but the audio has this sort of underwater, grainy feel to the whole thing where you just know it's very generated. So I think if they can get to the quality of an ElevenLabs V3 model, you know, now we're talking, or Suno-style music and sound generation, then you're getting somewhere. But again, I think the opportunity here is someone can stitch these things together. This is just another available model. The Sora app is just one way of showcasing this technology. And yet again, here we are with another incredible tool in our toolkit. Yeah, I'm very, very interested to see the pricing on it. So let's move on to a new model from Anthropic. I'm thinking maybe I should be back in the little Dario pendant necklace here. Oh, you've got it. Available in our store. I did find it. I lost it; that's how little I was using the Anthropic models for a while there. So you always wear it when you're using the models, do you? Yeah, or if I'm hoping for a new model, I wish upon a Dario. You rub it five times and do a dance. It's a thing. I seriously have, on my Twitter profile page, a video of me rubbing it, wishing for a new model. So we got Claude Sonnet 4.5. Not to be confused with Claude Sonnet 3.5, 3.7, or 4. It is Claude Sonnet 4.5. It's a hybrid reasoning model with superior intelligence for agents and a 200K context window. So, Claude Sonnet 4.5: what are your thoughts? Well, just to clarify there, it's not just a 200K context window. It's also a 1-million-token context window if you enable the beta flag, which we have. So it's actually 1 million. The trick is, it's the same pricing structure as Grok, where if you exceed the 200, or I think maybe it's 250K context, I think it's 200: if you exceed 200K, the pricing doubles for the whole request. Wowzers. So that does make it the kind of cost I just put out of my mind and pretend doesn't exist, and do it anyway. Because 1 million is amazing: one of the main reasons I constantly used Gemini 2.5 was just maintaining that larger context over a long session, and therefore getting into the groove with your AI agent and getting a lot done. And now you can do that with 4.5, and it's really good. So that means if it's a million, hang on, I don't understand.
You pay $3 per million input, and then if it's over the 500, you pay $6 per million for everything? For everything. Ooh, yeah, that's pretty pricey. What about output, does it double? It also doubles, yes. Ouchie mama, that's $30 per million output. That's expensive. We should be charging more for this model. Yeah, I was going to say, don't look at the bill, Mike. Yeah, wow. My goal is always just to provide the users with the best and latest available that we can. That was always our goal with this stuff. And why muck around? The other thing to note as well is that it has up to a 200,000-token thinking budget. So if you enable the highest level of thinking, you can allocate 200,000 tokens for that. Now, I don't know if it's just us: we use AWS when we use Anthropic models, and I don't know if it's just them, but they advertise a 200,000 thinking budget, and if you enable the full budget, then it's like, I've got no tokens left to use for output, so we can't actually output anything. It's using everything to think. It sits around thinking, and it's like, well, I'm not telling you. You know, you didn't give me the budget to actually tell you what I thought about; I filled up my context. So that thinking budget, at least as far as I can tell, and correct me if I'm wrong, everyone, you have to reduce a bit to leave it some space to actually output stuff. But that's pretty interesting, because that's massive. I mean, formerly, the Sonnets (except for the 3-point-somethings) only had 200,000-token context windows. So now you can use an entire context window's worth of thinking on top of all the input as well. But that's only if the million context is enabled, right? Because if you use 200K of thinking input... No, that's not correct, because they're output tokens. Thinking tokens are output tokens. I see. (There's a rough sketch of the cost and budget mechanics after this paragraph.) The thing that I'm hung up on about this model is that they're still charging that premium of $3 per million input, whereas GPT-5 is $1.50 per million input and Gemini 2.5 Pro is $1.50. Are you getting double the value from Sonnet 4.5, from your initial impressions? So, I've got mixed impressions about it. I've had lots of different experiences. Firstly, we'll get into the API stuff next, because I think they've made some really, really nice improvements on the API side, which have big implications. So we'll talk about that in a minute. But just generally using it, I like it more than GPT-5 because it's faster. You get an initial response faster; it gets me to my solution faster. And I've found that, generally speaking, I've been able to use it as a daily driver this week. It's been pretty good. Also, with long-running work, and this is one of their goals of the model release, so it makes sense it's good at it: with long-running lists of tool calls, it is 100% able to stick to the goal. I think you've talked the last two weeks about the idea of an AI being able to get, I forget what you called it, but get back to its purpose. Like, here's our overall plan of stuff we're going to do. Hang on. Whoa, that button got stuck. Sorry. Here's our list of tasks we want to do; do each one and then get back to the main goal. It is absolutely, unbelievably fantastic at that. It is blowing my mind.
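To pin down the pricing and thinking-budget mechanics discussed above, here's a rough sketch using the Python `anthropic` SDK. The per-token rates and the doubling threshold are the hosts' quoted figures rather than official pricing, and the model id is an assumption; the `thinking` parameter with `budget_tokens` is the documented way to cap extended thinking, and because thinking tokens count as output, the budget has to sit well below `max_tokens`.

```python
import anthropic

# Cost helper using the figures quoted on the show ($3/M in, $15/M out, with
# the whole request doubling past the ~200K long-context line). Treat these
# as the hosts' claims and check current pricing before relying on them.
def sonnet_cost_usd(input_tokens: int, output_tokens: int) -> float:
    long_context = input_tokens > 200_000
    in_rate, out_rate = (6.0, 30.0) if long_context else (3.0, 15.0)
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

print(sonnet_cost_usd(150_000, 8_000))  # ~$0.57
print(sonnet_cost_usd(400_000, 8_000))  # ~$2.64, everything doubled

# The headroom point: thinking tokens are billed and budgeted as output, so
# budget_tokens must stay comfortably below max_tokens, or the model can
# spend its whole allowance thinking and have nothing left to answer with.
client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-5",  # assumed model id; check your provider
    max_tokens=64_000,
    thinking={"type": "enabled", "budget_tokens": 32_000},  # leave room to respond
    messages=[{"role": "user", "content": "Summarize this repo's architecture."}],
)
```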
So, for example, when I want to test an MCP, I will give it a list of all of the tools in that MCP with the instructions, literally the manifest file of, here's all the stuff it can do, and then I say, write a prompt that will test all of these things. And then I run it on Sonnet 4.5 and tell it to do all that. And it is able to go through, to give you an example, like Gmail or something: sending an email, sending an attachment, adding a calendar event, deleting a calendar event, literally everything this MCP can do. Make a checklist. If it makes a mistake, it'll correct it and try again. If it has to do research, it'll go off and do research in between, and then it'll get back to the checklist and complete it. Then, at the end, it'll give you a summary table of what it's done, any corrections that need to be made, suggestions. Its ability to just stick with a long-running, complex task over a long period of time is unsurpassed. I haven't seen anything like it in terms of its ability to do that. So that is just really fantastic. Yeah, its agentic capabilities, to me, are the best, as Sonnet 4's were, but now vastly improved. I think also the optimizations around speed, as you said. I don't know if it's just that Amazon got their act together with this model after all the criticism of the rollout of Sonnet 4. Come on, a shirt for me, guys, and I'll say positive things; otherwise I won't. Yeah, it's much, much faster, and it feels a bit uncomfortably fast now that I'm used to lag in these thinking models. And generally, from a coding standpoint, and just analysis and research, I think a lot of their claims are true. I think it is the best model if you want to use multiple tools to do research. A lot of the tool calling: say you're researching something medical. If you give it PubMed, and access to scientific papers, and different search tools and deep-thinking tools, and all that stuff, and you tell it, go broad, use all those things, it's the only model I've seen so far that does, as you say, stay on task, follow the prompt, consider all the tools, gather all the context together, and then output, putting all that information together into comprehensive, cited answers. And it'll go wild. It'll do four batches of ten calls to different tools to do the research, for example. It really doesn't hold back when it comes to using the stuff that's available to it to get the job done. Yeah, and I think, and I can't find it at the moment, but they were able to get it, in a more agentic setup, to run for something like 30 hours and recreate, you know, Slack, and a bunch of applications. Obviously not production-ready or anything like that, but they're able to send it off for quite a long time now in these tests, to just keep working away at a problem. And I think we've seen that in the length of sessions it can do as well. If you prompt it right, it will just go on and on and on, pulling tools and doing a bunch of work for you in the background. So I do believe, today, it is the best agentic model. If you're going to build an agent right now, it seems like the best out-of-the-box model to do that on. There are a few strange things in these benchmarks. I honestly don't believe them; I just base it on my real-world use now. But it's like a few basis points better at agentic coding.
I would say it's far better than Claude Sonnet 4 at coding, like leaps and bounds better. I would put it on par now with Codex, the Codex model, which is the dedicated GPT-5 coding model. I can't tell them apart anymore, at all. And I couldn't believe how quickly it feels like Codex was surpassed by Claude Sonnet 4.5. I quite frankly put that down to the tuning of the Sonnet model. It feels slightly more intelligent now, and it's just tuned so well that, I don't know, I find Codex very rough around the edges, and it puts me off using it quite a bit. But I would also add: for very hard thinking problems, or where I need the highest level of intelligence, I'm still using the GPT-5 Thinking tune. So I still think GPT-5 Thinking is the smartest available model through the API, by leaps and bounds. But you can't really daily-drive it; you can't really work with it day to day on things. I just phone-a-friend to it occasionally, or get it to plan something I'm working on. And one interesting way of using it: if you upload your company's financials and then say to, sorry, to Claude Sonnet 4.5, go analyze all this data, call a bunch of tools, do some research on the market, and then gather this all up and put a report together, and then ask GPT-5 Thinking to reflect on all that stuff, that's a really good workflow to get interesting insights. That's the level GPT-5 Thinking is at. So for me, I'm still not at a place where I'm like, oh, one of these models is it, like back when Claude Sonnet 3.5 was king. To be honest, Claude Sonnet 3.5 was all I used back then. And now I find myself, at the current point in time, just thinking: if you want long output, I go to Gemini 2.5 Pro, because I know it's the best at output, consistent output tokens, for example. And I've heard examples from our community of people saying they really like the GLM 4.5 model and are waiting on GLM 4.6 as a daily driver, because it's way cheaper and can do a lot of the same stuff. So I think there's definitely scope at the moment for jumping around the models, and I'm probably jumping around models more than ever right now. It's not like someone's blasted it out of the water, where there's just one model I'm always going to, like I was for so long with Gemini 2.5. Now I would say it's probably less than 30%: I'm switching around, going to a model when it feels right, but not always. And so, interestingly, I do need to point out that I've experienced a couple of items of weirdness with Sonnet 4.5. Now, this could be Sim Theory's fault, it could be our fault, so I don't want to completely trash the model on this, but I've had a couple of really odd situations where it's output a list but all the headings are in French, for example. Or it's output code and cut the code off too soon. The code has just stopped in the middle, but then it'll put a summary at the end, so it's not like the model stopped producing output; it's not like the tokens stopped streaming; it just stopped putting the code there. And I've also noticed a little bit of laziness occasionally slipping back into the coding when it comes to Sonnet 4.5. It's got that GPT-4 laziness, for sure, that old thing. Yeah, it could just be an early tune of it or something, but it's something that really gets you, because I've tuned my agents, at least, not to do that lazy stuff, so I'm not used to it happening anymore.
And then when it suddenly gets you, it's shocking, and I immediately, angrily, switch models. I'm like, how dare you do this to me, Patricia, and change. And so that's probably the only downside, definitely the only negative in my mind so far about Sonnet 4.5, but it's good enough to overcome that, and I don't think that'll be a long-term problem. It's just hit me a couple of times, and I thought I should at least mention it. I still find myself, though, going down this path, and I would like to have the time to make a video on how I work day-to-day with these models: what's my workflow, and why do I switch, and when. But I often think I'd just have to record myself working for an hour to demonstrate it. Because there are points I'll hit, even with Claude Sonnet 4.5, where I hit this wall and I'm just like, okay, I'm going to Gemini now, or I'm going to GPT-5 at this point. Yeah, it's funny you say that, because I've had an urge where I was like, maybe I should just stream myself vibe-coding with the models and show how it works and what those decision points are about: when I would switch, when I would change to a new session, those kinds of things. Yeah, at the moment I find myself, when I'm tackling a problem, and this isn't just code, it can be a business problem or writing a document or whatever, I will have three tabs open, getting three different models churning away on the problem, first up proposing a solution, and then I'll just quickly flick through them, be like, okay, this one's the best, and go down that path with it. And I still don't think, which I don't know if I should be surprised at or not, there's any clear winner with the models or the tunes. And I think Sora 2, the sort of TikTok hilarious-video tune of it, illustrates this more than ever: they've had to very succinctly tune a model for that use case and that output. So I think, increasingly, instead of seeing these be-all-and-end-all models, I wouldn't be too shocked if in the near future we see tunes from providers, more like the Codex tune, where it's a version of GPT-5 just designed for code. Or you have a model that is just Claude Sonnet Code, and they just keep updating that to make it better, or Claude Sonnet Finance, or Claude Sonnet Medicine, each slightly tuned for that particular use case. Maybe that's one way of doing it, or they're all doing routers, but it seems to me like that is the best approach: just tune away until you get the right tune for the particular use case you're working on, instead of having the global model, especially now that it's starting to become established what people are actually using the models for. Yeah, that's a good point. Now, the next thing we need to cover: there were major model, well, model isn't the right word, but I guess API updates around Claude. Really, things that you can put into the model to change its behavior, right? And there are some really good ones in there, and they really are all around this agentic, long-running-process concept, and they're very good. And the thing I like most about it is, we spoke before about how I don't really like the idea of the GPT-5 Pro thing, where I give it the context, it goes off and, in its magic box, does all the work and reports back to me in four hours when it's done its task.
I like the idea of an iterative approach, where you're giving it the latest context. So, for example, in computer use, it's seeing the latest version of the screen. It can see the latest version of the files on your disk and all that sort of stuff. And then it's going through multiple round trips to the model and getting the next steps, and things like that. Now, the first thing Anthropic has added is automatic context management. Someone actually asked about this in the This Day in AI Discord: how doesn't it run out of context? If it's running for three hours, how doesn't it eventually fill up its context window and then fail? And the answer is that Anthropic has added a feature, in beta, which will automatically control that context. Based on rules you can give it, or automatically, it will start to remove the oldest context and provide little tombstones, or summaries, of what was there, but not the full content. So you can just keep calling the API over and over again, and it'll gradually manage that context automatically for you. Now, in Sim Theory, for example, we have our own detailed code that does this, and that's how we're able to have these long sessions with models. But this is the first time I've seen it built into the model, where you can actually just do it with configuration. So it's a really, really nice addition to the model, and really essential for something like computer use, because obviously, if it's going to run for hours, it can't fill that thing up. Related to that, they've also introduced context editing. So you can do things like send a command like "clear tool uses". Once the tool uses are done and the AI has produced its response, it doesn't really need all of that data in there. So you can specifically say, keep the full chat, but clear the tool uses. You can also say, clear at least this many input tokens out, and it has its own strategies for cleaning it up. So they're really doing a lot of work around this idea that the task will run for a long time, and that managing the context throughout that process is important. That's a really, really great update in terms of computer use. The other one they've added, which is interesting, is a memory tool. This is basically about building knowledge bases over time and keeping project state across different sessions. Now, this is something we've had for a long time in Sim Theory, at least a year and a half, or since the beginning. In our system, we call it a knowledge graph, where we keep that information. So Anthropic has now built a tool, like a generic internal tool, that will manage that memory for you. You as the developer are responsible for storing it somewhere, like in a markdown file or a database or something, but the actual model itself is deciding on what changes to make to that memory, which is really interesting. I haven't tried it out yet, but I would always try to favor the model provider's way of doing things over my own, because they know the model best. So these are really interesting updates, and there are a few more, but they're more technical. It's very interesting, the direction they're going, and I think this points to a bit of what you're saying around the tune: they're obviously optimizing for the very cases that we're judging it on.
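Here's a sketch of what those beta features might look like from the API side. The beta header, edit type, and memory tool type identifiers below are assumptions based on Anthropic's announcement, not verified against the shipping API, so check the current docs before relying on any of them.

```python
import anthropic

client = anthropic.Anthropic()

# Identifier strings below are assumptions from the public beta announcement.
response = client.beta.messages.create(
    model="claude-sonnet-4-5",                  # assumed model id
    max_tokens=4096,
    betas=["context-management-2025-06-27"],    # assumed beta flag
    # Context editing: strip old tool results once they're no longer needed,
    # leaving small "tombstone" placeholders instead of the full content.
    context_management={
        "edits": [{"type": "clear_tool_uses_20250919"}]
    },
    # Memory tool: the model decides what to remember, but storage is yours.
    tools=[{"type": "memory_20250818", "name": "memory"}],
    messages=[{"role": "user", "content": "Continue the migration task."}],
)

# When the model wants to persist something, it emits a memory tool call,
# e.g. something shaped like {"command": "create", "path": "/memories/notes.md",
# "file_text": "..."}. Your code writes that to a file or database and returns
# a tool result, so, as the hosts note, nothing is stored on Anthropic's side.
```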
Yeah, and does it worry you that, with the knowledge graph and all these other components, and I'm not speaking from our perspective running something like Sim Theory, but more from an enterprise perspective: do you really want Anthropic storing the knowledge graph at the API level? They don't, though. The important point is that it's a tool call that'll tell your system what to do. Oh, I see, so it's still stored on your side. Okay. Yeah, you're still storing it securely, but it's saying: add this to the memory, delete this from the memory, summarize this part of the memory, et cetera, and then your system has to comply with that. So no, they're still not storing it. Yeah, but I think it does go back to these models, or the model providers, or labs, providing an AI system for you to build an agent on yourself, and whether that then lends itself to them having their own agent builders, which I'm sure at some point we'll see. That's probably what it'll look like, especially given that they have the Agent SDK structure now. And they say that's how they built it; it's all the pieces of Claude Code, which has been very popular. I think this does give people the opportunity to go and build the Claude Code of blah pretty easily on top of that SDK (there's a small sketch of the idea after this paragraph). That's right, because not having to develop all of this stuff yourself really accelerates things. You can just lean on the SDK to handle all of those things for you, and the Claude Agent SDK has a whole bunch of other additional things that are useful, like looping, like session management, those kinds of things that you would need to build an agentic workflow. So it would make sense: if you're making something from scratch, like a Claude Code for industry-whatever, a vibe-doc tool for industry-whatever, you could build it on this Agent SDK and save yourself a lot of time. So, I guess this brings me back to the point, though, of future software. Because if you think about a company today, there have been different models. You'd have the sort of all-in-one platform, where you buy into, say, the Salesforce ecosystem or the Microsoft ecosystem or whatever, and you have a series of apps in that, everything's allegedly perfectly integrated, and you use all those different apps. And those businesses are building in AI agents. I think Microsoft announced there's some alpha of an Excel agent either out or coming out soon this week.
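For the "Claude Code of blah" idea, here's a minimal sketch assuming the Python `claude-agent-sdk` package and its `query` / `ClaudeAgentOptions` surface. The package name, option fields, and built-in tool names are assumptions that may differ from the shipping SDK, so treat this as the shape of the thing rather than gospel.

```python
# A tiny "Claude Code for contract review" style agent built on the Agent SDK.
# Imports and option fields are assumptions; verify against the SDK docs.
import anyio
from claude_agent_sdk import query, ClaudeAgentOptions

async def main():
    options = ClaudeAgentOptions(
        system_prompt="You are a contract-review assistant for a law firm.",
        allowed_tools=["Read", "Grep", "Write"],  # reuse the built-in tooling
        max_turns=25,                             # the SDK runs the agent loop
    )
    # The SDK streams messages as the agent loops: tool calls, results, text.
    async for message in query(
        prompt="Review ./contracts for missing indemnity clauses.",
        options=options,
    ):
        print(message)

anyio.run(main)
```

The appeal the hosts describe is that the loop, session handling, and tool plumbing come for free; all you supply is the domain framing.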
And so they're building the agentic workflows into those existing apps, and that kind of makes a bit of sense to me. But then you've also got people potentially building new vibe-doc editors with the Claude Code framework and putting them out there. I guess what I'm saying is: do you imagine a world where you go and use these very specific apps, like, I'm going to go to vibedoc.com now, or vibeoffice.com, because I need to create a doc? Or do you imagine a world where this is just fully integrated, where I'm in Claude, or I'm in Sim Theory or whatever, and I need to create a doc, so I'm creating it in there, because all software, in theory, can be rendered by these models? To me, there's this question of disruption: are we just consuming all of this software, and all the interactions we have with the computer, from a singular interface and a singular model in the future? Or do you imagine a world where a company does have multiple subscriptions to different apps, like they did in the past? For me, I just think about the ease of building something like this now. You could, in theory, build your own word processor for your own business, specific to the documents you're creating. If you're a law firm, you could have your own Mike's Law Contract Builder tool, right? It can be so specific, so bespoke. A lot of it, for me, is about the centralization of context and processes. Because, in your law example, you would have processes as a law firm that you follow, like checklists or template documents, and you would have reference materials and sources you consult. It's about gathering all of those together. I don't want to have one system that gathers all my context together, gets me all geared up and lathered up to make a document, and then jump over into the Microsoft Excel Vibe Doc Helper 2.0 and have to get all that context out of one system into another just to work with it in their ecosystem. And, like you say, if I've got five of these subscriptions and I'm just passing around this context, then what have I become? I've just become a slave to the software and a slave to the AI. The power we're seeing from MCPs and the centralization of context-building is that you can then take an educated AI system and say, now go make this thing, and it can do it perfectly. So, in my mind, you need the output tools, the output-creation tools, right there where you've got the context, right there where you've got the tools, because otherwise you're just adding some manual step. Or there's another layer of freaking APIs and MCPs, where you've got to have an MCP from your main system into the Microsoft whatever, and then it consumes from there, and then it's just a software-integration nightmare. So no, I think it will lead towards centralization. Perhaps in the short term, people who haven't discovered a centralized platform that brings it all together will still benefit from, say, a vibe-doc thing in Word or in Excel or whatever it is. But in the long run, I think those pieces of software will become less useful, because it's the AI that's going to be operating the software, not you.
And so, you having that... Sorry to interrupt, but does that mean that you think, in the future... Like, a lot of people use Cursor today. I would say the vast majority of developers are using something like Cursor, right? But then there's been a lot of hype lately around command-line tools, Codex, Claude Code, and those are introducing interfaces as well. But you could make the counter-argument: if developers are the leading adopters of this tech, why aren't they just using, say, ChatGPT for everything, if it's true that you'll consume everything through a singular interface? I would say the answer to that is exactly what we're talking about: Cursor has the output type. Cursor has the ability to actually actuate what the model is giving, and it also combines that with the ability to build context. That's the amazing thing about something like Cursor: because it can access all the files, it has the full context to know what to do, and then it has the ability to output it. So I would compare Cursor more to a centralized system, but built for purpose. It's built for coding. And I would say that eventually you'll have centralized tools that can do this for multiple things across an organization, not just coding. It's not like they're taking Cursor and then going off into a separate, dedicated IDE. Yeah, I see what you're saying. It has the context to do it. But then could you also argue that if Microsoft, with Excel, lets you gather context easily about your business from the integrations in the Microsoft ecosystem, you could be vibing in there and it does have full context? Yeah. I mean, yes, I think that's very possible, and I'd say that's probably what Microsoft is going to try and do. Yeah, that's probably the vision for it. But I can also see a middle phase where, yeah, Cursor and vibe-docking or whatever is critical, but you can imagine this stuff gets to a point where people are just building software without ever looking at the code, because the models are so good at that point. Especially for internal applications, and a lot of the things people use it for, like replacing pieces of SaaS and stuff, maybe it gets to a point where they are just rendering what they need and storing those renders in something like a ChatGPT, to access them for their business. I can see that path as well being the longer-term path. I agree. I think we are going to reach a point where we get beyond code, and the code is just something the AI worries about. And I say this as a programmer, I've been a programmer my whole life, but I just can't see a world in the future where people are going to be typing out code on their own. There's no point; it's a waste of time. Now, that noise was booting up Claude Imagine. Claude Imagine: it's a cool little demo. It's a sort of pretend operating system in the browser, with a few sticky notes on the screen. And this is just a proof of concept of, we've talked about it on the show before, this idea of what we call a glass UI, where it's generating something on the fly. At the bottom there's a "What do you want to build?" box, and I can say: notepad to keep notes. Make a pig grooming management system. Pig grooming! Open my pig grooming management system. All right, let's do that. Just a true challenge. It's fairly quick. Here's my note. Note to note. Oh, wait.
My pig grooming management system's opened up. Look at this. So my revenue apparently is $3,000. I've got New Appointment, Add Pig, Clients over here. This is pretty good. So I'm going to add a pig. Can I add a pig? Add a dog detection filter that will detect if it's actually a dog and not a pig. So it's just rendering this screen. It's just building this UI in real time, on the fly, like an entry point. Pig name, breed, age, weight, owner name, phone, email. So I'll put in micro pig and call it Pepper. That's a good suggestion. It can be two years old. And I'll save that pig profile. So as I click on it, it's then, I assume, storing that data somewhere. And now it's showing a success message that's flickering, mental, as it builds. But I guess it worked. Did it work? No, it didn't actually save anything. So it's a bit of a simulation right now, to show what it could be like. But don't you think this is probably a sneak peek at software in the not-too-distant future, where there's some sort of core generation, where you could have an operating system that's purely like...

Absolutely. I mean, look, if I was a front-end web developer, I'd be shaking in my boots right now, because all someone needs to do is build a component library that's more suitable to AI. And maybe not even that, if this is the demo of it. And why would you ever, ever pay someone to do front-end again?

Don't you think this is far beyond that? It's like the next operating system; the next computer will be like an AI chip, fully driven. I told you ages ago, CSI: Miami invented the future UI, where they're just like, oh, bring it up on the screen, and they're using their hands to zoom in, and it's like, make an interface for this. That is actually real. They got it right. I think once the models can get the memory going and some consistency with this stuff, it's just endgame for software. You'll be able to generate or do anything you want.

I still think it's a ways off, going by this demo. Well, a good example of that is Create with Code in Sim Theory. We gave it the ability to itself call an LLM to save data to a CSV file, which, interestingly, we did so people could accept form submissions and then download them. But it's interesting watching what the AI does with it, because it repurposes that CSV storage to store game data. So if you're making a video game, it'll use it to store and retrieve things. It's taken a tool that wasn't even designed for that and used it anyway. So what I'm thinking is, if you gave the AI a full suite of backend tools here to save and retrieve data, even if it's just saving generic balls of document data, it'll be able to do all of that. It'll be able to persist on the backend. It'll be able to do analysis. It'll be able to graph it. It'll be able to write code to do stuff with it. This is definitely the future of interface: you start with a blank screen and you just make up a UI for what you want based on your data sources.
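To make that concrete, here's a minimal sketch of the kind of blunt, generic persistence tools being described. Every name here is hypothetical (the actual Sim Theory CSV tool isn't shown on the show): the model gets a save tool and a load tool for arbitrary balls of document data, and is free to repurpose them for whatever UI it has just generated, whether that's pig profiles or video game state.

```python
# A minimal sketch (all names hypothetical) of generic backend
# persistence tools an AI-generated UI could call: save an arbitrary
# ball of JSON data under a name, and load it back later.
import json
from pathlib import Path

STORE = Path("blob_store")
STORE.mkdir(exist_ok=True)

def save_document(name: str, data: dict) -> str:
    """Persist an arbitrary JSON-serializable blob of data under a name."""
    (STORE / f"{name}.json").write_text(json.dumps(data, indent=2))
    return f"saved {name}"

def load_document(name: str) -> dict:
    """Retrieve a previously saved blob, or an empty dict if none exists."""
    path = STORE / f"{name}.json"
    return json.loads(path.read_text()) if path.exists() else {}

# The generated pig grooming UI could, for example, call:
# save_document("pig_profiles", {"Pepper": {"breed": "micro pig", "age": 2}})
```

The tools are deliberately generic; the model supplies the schema and the purpose at generation time.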
It's pretty funny, though. I went New Appointment again, and it's a completely different interface. Obviously this is just a demo, right? But that consistency problem would be... Well, the consistency problem is what we're all going to face in the next few months with agentic workflows, right? Where you teach your AI to do a task, but then what happens if it does it slightly differently each time? That's a problem. You want it to do the same method each time. So we need a way of persisting these things, even if they were originally generated by the AI.

Yeah. Again, though, these are all fantasies. The technology's not there yet. It's getting better. But going back all the way to Sora: if you look at that, it's currently a 10-second video clip with pretty poor audio quality, it still has physics issues and a lot of artifacts, and it probably can't be used that well in prime time apart from some cutscenes or establishing shots or whatever it may be. So for anyone panicking, this is a long-term horizon right now. I disagree. Panic immediately. This is coming now. Panic now.

So before we move on from the Claude stuff, because we wanted to get into a discussion around agents a little bit deeper, there is an important thing to do, which is Claude 4.5 Sonnet. A boom factor I would like from you, but also, you know, we've got to test the diss track.

Okay, well, boom factor, I'm going to go seven and a half, which is, I think, on the higher side, because it hasn't won on Polymarket, which we can't even access in Australia anymore, but someone sent me a screenshot. So it isn't winning on the benchmarks, right? However, I think the agentic side is as yet unexplored, and I think it's going to prove to be the best at it. I'm also working on computer use again, and I think Sonnet 4.5 is going to be the best by a long way on computer use, partly because of the model and partly because of the API support it's got for that; they've deliberately moved in that direction. If you look at their Chrome extension, which most of us can't access because they're only giving it to some elite users, it's obviously being powered by this model and optimized for that. So I think there's huge scope for this model to have a big influence, and it might be another 3.5, where it really stands the test of time over the next six months or so.

Okay, before I play the diss track written by Claude 4.5 Sonnet: a lot of people ask me how I prompt it to get these tracks, and I just want to explain how simple it is. I said: do the following. One, research the Claude 4.5 Sonnet release in comparison to other leading models like GPT-5, Grok 4 Fast, and Gemini 2.5 Flash Preview; you are looking for data you can use in the song. Two, write a diss track in the style of Eminem from the perspective of Claude Sonnet 4.5: you, the rapper (I spelled rapper wrong; no, I don't even check my spelling), bragging and dissing on the other top models. Three, once you write the song, make the song with the Make Music skill. Now, I probably didn't have to do that, but I just wanted to clarify. So it goes off, it researches using the new Perplexity search tool, which is really cool, writes the song, puts the song lyrics out for me, and then goes and generates it with Suno. So are you ready for the track? Yeah, I'd love to hear it.

Sonnet's in the building, time to school these pretenders, you feeling me? Let's go. They call me Sonnet 4.5, I'm the apex predator. GPT-5, more like GPT-Inferior. You thought you had the throne, but I'm the editor, rewriting history while you're just a competitor. I dropped in October, shook the whole foundation. Every benchmark crushed, call it domination. 200K context, that's my conversation. While you're stuck buffering, I'm pure innovation. My reasoning, surgical precision like a scalpel.
You're out here hallucinating, man, your facts are just baffled. I code in Python, Java, C++, Chapel, while GPT's responses need a fact-check grapple. Agentic workflows, yeah, I'm autonomous. You need handholding, man, you're so monotonous. From Anthropic Labs, yeah, we stay meticulous, while OpenAI's promises stay ridiculous. I'm Claude 4.5, I'm the king of AI. These other models, man, they can't even try. Benchmarks don't lie, I'm reaching new highs. While you're all falling, I continue to rise. Sonnet supremacy, that's the decree. GPT, Grok, Gemini, bow down to me. Yo, Grok 4 Fast, more like Grok 4 Last. Your speed means nothing when your accuracy's trash. Elon's hyping you up, but you're running out of gas. I'm lapping you twice while you're stuck in the past. You claim you're uncensored, edgy and raw, but without guardrails you're just fundamentally flawed. I balance safety with capability, that's the law, while you're spitting nonsense, breaking every clause. And Gemini 2.5 Flash, man, you're just a preview, a beta version while I'm the full review. Google's throwing spaghetti, seeing what gets through, while I'm the finished product, polished and true. You're a flash in the pan, I'm a permanent fixture. Multimodal? Please, I paint the whole picture. Graduate-level reasoning, I'm the strictest teacher. You're still in kindergarten.

So can I predict that the audience is going to hate this? Because I loved it. That was amazing. No, I don't think so. I think your taste in these tracks is becoming more, like, more better. Or better. Or better. That was the best ever, I think. "Broke", as it pronounces it, but it means Grok. It's really good. And I love that, what was it? Something about you're just a flash in the pan, I'm a permanent fixture. That's really cool. And: Gemini 2.5 Flash, man, you're just a preview, a beta version while I'm the full review. Google's throwing spaghetti, seeing what gets through, while I'm the finished product, polished and true. You're a flash in the pan, I'm a permanent fixture. Multimodal? Please, I paint the whole picture. It's really good. It's very clever, and what's amazing is it's pretty accurate. The actual research behind it is good. It's genuine criticism.
Yeah, and I think that's the whole thing about the context being up to date, right, from the tool calling, and having that context to output in the same spot. I think that sort of illustrates the point. But I've got to say, I'll put the whole song at the end of the show, after the rollout music, for those that want to listen to the whole thing, because that's really good. And I promise, so many people have been asking: I have been storing all the tracks from the show in a folder, and I'm going to put them on Spotify at some point. I was going to say, I want one of them. Yeah, it's just surprisingly a lot of work to get them onto Spotify, so I will do it. I'll commit to doing that very soon, maybe this weekend if I get time, and I'll put them all up so you can compare them and listen to them, and there's a singular place to go. But anyway, the full track will be at the end. I've got to say, I think that's up there, if not the best ever. I do think the new Suno is helping; version 5 is really good. Very, very cool.

Now, I don't know if we alluded to it before, but Ethan Mollick this week wrote this "Real AI Agents and Real Work" article, about an experiment that OpenAI did. It says OpenAI released a new test of AI ability, but this one differs from the usual benchmarks built around math or trivia. For this test, OpenAI gathered experts with an average of 14 years of experience, in industries ranging from finance to law to retail, and had them design realistic tasks that would take human experts an average of four to seven hours to complete. OpenAI then had both AI and other experts do the tasks. A third group of experts graded the results, not knowing which answers came from the AI and which from the human, a process which took about an hour per question. Human experts won, but barely, and the margins vary dramatically by industry. Yet AI is improving fast, with more recent AI models scoring much higher than older ones, yada yada. Basically, what he goes on to say is that it is so close now that these human experts could barely tell the output of the AI agent doing the work from the human experts' output. It was that close.

Now, the thing that strikes me about it was his conclusion, and I think this is something that's really important for people to hear. Does he think, as a result of this test, that AI is ready to replace human jobs? And he says no, at least not soon, because what was being measured was not a job but a task. And this is the discussion I want to have. A job consists of many tasks. My job as a professor is not just one thing: it involves teaching, researching, writing, filling in annual reports, etc. AI doing one or more of these tasks does not replace my entire job; it shifts what I do. And we talked, when we were banging on about building agents on one of the last shows, about this idea of teaching the agent skills, giving it access to those skills as tools via MCPs, and then letting it run autonomously to execute on these very specific and specialized tools, to allow you to be more effective in your job. And I feel like Ethan Mollick is coming to a similar conclusion in this article: it really does excel at skills, and it does change how you work, because it can go off and do a far better job than you, but you still have a role to play in that part. Yeah, it's faster.
It's cheaper than a human doing the task, but what it needs is coordination and direction. And he mentions in the article the idea that you really need a human in the loop to correct it when it gets things wrong, as part of a holistic goal-setting task. But the thing I disagree with is how far off it is before it can do whole jobs, and before it can do whole sequences of tasks. The reason I think it's better right now on actual task performance is that it can do things so much faster, it doesn't get tired, and it can do a much more comprehensive job than you would actually bother to do for certain tasks. It can go to far further lengths with an individual task than you might, just because of efficiency of time: it's not worth you looking into every little thing in great detail, but it can actually do that. So I think on the individual skills it's got us, in most professions, and it'll get better.

Yeah, sorry. One of the tasks, just so people understand, because I think sometimes these things are pretty vague, right, around what the test actually was. They actually published all the data on Hugging Face, the prompts used, and the prompts given to humans as well. So this is just one example, and there are a lot of them: accountants and auditors. You are a mid-level tax preparer at an accounting firm, and you have been given a task to complete. You're an average tax preparer. You're kind of shit. Yeah. You have been given the task to complete an individual tax return, Form 1040, so a very specific prompt, like a human would have to figure this out, for the firm's clients Bob and Lisa Smith. Bob and Lisa have provided all the attached 2024 tax documents for completion of their tax return. They have also completed an intake questionnaire, which is attached. Please prepare Bob and Lisa Smith's individual tax return, yada yada.

So I think the thing you could question about this prompt is that it has already gathered a lot of context about the task and the people. You could argue it's sort of cheating. But then you could also imagine the same tax preparer in the organization going to the agent and saying, oh hey, I've got Bob and Lisa here with me, here's their tax documents, can you get on it? And then it can do it. So I think that's what he's getting at: it can go and do these tasks more effectively, but right now you still need the human for agency. It's not just going to figure this out on its own.

But if I'm an accounting firm and I've got a whole bunch of junior accountants doing the actual legwork of preparing the document, and I'm just a manager, well, I can fire all of those junior accountants, have a folder on my computer that has Bob and Lisa's documents in it, right-click, invoke my agent to prepare the tax return, and it does it. Then I've saved a whole bunch of money paying employees to do it. So I actually think it's a great example of where jobs could be replaced by this kind of thing.

But wouldn't you still want someone to... okay, maybe you're not firing everyone. Maybe you've just got one person operating the agent who can also interact with the clients on that personalized... Yeah, but we're talking about a mid-level accountant here. They're the one meeting with the clients, saying, okay, this is what we're going to need, here's our strategy. They prepare the document, fire that into the AI, right?
And so you just need fewer people. Yeah, so are you now saying that mid-level accountant... Don't you just think there's more to the job? Like going and looking in different systems and files and digging in, like prompting the human to get information: you know, why did you buy this thing, or why did you do that? Can you imagine that interaction?

Well, I don't know accountancy that well, but just going off this report, it seems to me like the next big piece is going to be this conductor. You need to be the conductor of your AI choir, where you're directing them on where to get the information, teaching them skills, showing them how to get out of trouble and troubleshoot when things go wrong in those skills, and then you're just giving them goals, or giving groups of them goals. I think there's a really logical series of steps here where you could become a one-man army running your group of agents. I genuinely believe it's possible now.

I think the ones that get to me are the nurse practitioners and registered nurses. I'm like, are they really going to change the bed or give the needle? Those are ridiculous. But look at the data that came out of it. (It hallucinates and stabs you in the eye. That would be worrying.) Full props to OpenAI for releasing this data, because it pretty much puts Claude on top of all of these agentic workflows. Financial managers: the humans preferred Claude's responses. Financial and investment analysts liked Claude. Personal finance advisors: Claude. Securities, commodities, and financial services: Claude. And so on and so forth. The only area GPT did better was computer and information systems managers, using GPT-5 high; for software developers, o3 high and Claude were on par; and then for shipping, receiving, and inventory, Claude won. For mechanical engineers it's a little bit even between GPT and Claude. But man, Claude is really nailing the real-world agentic use cases in this respect. They are really on top. And this, I assume, predates 4.5, the newer version. It has to.

So I don't know. Okay, so you're at an accounting firm. Let's bring this down to a real example. You're at an accounting firm, or, like, you don't really know what accountants do, but are you thinking, I'm going to fire a bunch of these people and then just train one how to use this agent?

I think what people need to do is look at their industry and think: if I had this kind of leverage, basically free employees, or, you know, I know there's a cost to run this stuff, but significantly cheaper, like 10% of the cost of employees, how can I crush my competitors in my industry? Which aspects of my industry can I just do so much better and faster? Because it's not just faster, it's better. You actually do a better job. And what tools would I need, what workflows would I need a system to be able to do, in order to just totally dominate that element of my industry? And I think there are a lot of those. Sticking with the accounting example, maybe it is just preparing tax returns: you shut down all other accounting activities, you simply find people who fit the bill for getting that information, you do a really simple one, and you just undercut everyone. I think that's not the best one, because accounting can already be quite cheap in that respect. But there would be things like interior design, for example.
We've talked about this before, where someone wants a concept for their kitchen or something like that. They give you some input, inspiration boards or something, and you produce a prospectus, which is like a PowerPoint presentation you go and show them. You could be the first one that produces videos, a podcast on the strategy for the kitchen, and all these collateral materials that you run through a process from someone's input. And instead of paying these highly paid interior designers or whatever, you're literally just shoving all the input into a model and then presenting that output to the client, charging them $5,000 for the privilege. There'd be lots of industries like that, where you can pick a single process, get it right, and then just do it with these huge margins.

I just think you're... like, at the low end, sure. Sticking with accounting, at the low end I think it kind of already was done somewhat with QuickBooks, which has a really great wizard, if you're in the US, that handles your tax return easily. But generally, accountants, I think, are for more complex stuff, right? Audits, and all the sort of compliance work. That often involves human relationships: calling people, talking to people, knowing where to pull the data from, physically going to stored documents in a folder if it's an older company. If it's medical stuff, sometimes there are hard-copy docs and digitized documents. That kind of auditing, those elements, I just can't see it. I think it could be far more efficient, though. But I kind of wonder if you're onto something: maybe you can compete because there's a deflationary aspect, where you can provide a far superior experience for a lower cost per hour.

Here's an example: ISO 27001 compliance, right? There's a whole bunch of documents you need to produce to comply with that, and there's some actual work to improve your system to comply with it. But let's say a company that's been compliant needs to update all their documents for this year. Now, AI can do all of it. Don't ask me how I know, but it can, with no editing. You can just give it the input and say, produce the output, copy-paste it in, and you pass, right? So what's to stop someone building a system where it's like: you need to provide context for the following documents. Or even better, get your agent to trawl your OneDrive, get your agent to trawl your Google Drive, find all the relevant documents, produce all the relevant output, upload it to the system it needs to be in, and you're done. These processes with consultants can cost $10,000, $20,000. This isn't small amounts of money.
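As a minimal sketch of that trawl-and-produce flow, assuming a local folder standing in for the drive and a stubbed-out model call (every name here is hypothetical, not any particular product):

```python
# A minimal sketch (hypothetical names, stubbed model call) of the
# one-click compliance flow: trawl a drive for company documents,
# gather them as context, and draft each required document from it.
from pathlib import Path

REQUIRED_DOCS = [
    "Information Security Policy",
    "Risk Assessment",
    "Statement of Applicability",
]

def call_model(prompt: str) -> str:
    # Stand-in for whatever LLM client you actually use.
    return f"[model-drafted document from {len(prompt)} chars of context]"

def gather_context(drive: Path) -> str:
    """Concatenate every text document found in the shared drive.
    (A real system would also parse PDFs, DOCX, spreadsheets, etc.)"""
    return "\n\n".join(p.read_text() for p in drive.rglob("*.txt"))

def produce_compliance_pack(drive: Path, out_dir: Path) -> None:
    """Draft every required compliance document from the gathered context."""
    context = gather_context(drive)
    out_dir.mkdir(exist_ok=True)
    for doc in REQUIRED_DOCS:
        draft = call_model(
            f"Using this company context:\n{context}\n\n"
            f"Draft our '{doc}' for this year's ISO 27001 audit."
        )
        (out_dir / f"{doc}.md").write_text(draft)
```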
Same with applying for government grants. Think of how many government grants there are out there across every country, let alone Australia, where you've got to go through these detailed processes, producing documents with certain word counts in certain formats and things like that. You could have a government grant builder that is literally one click: you upload your website and a couple of things, and it produces the perfectly compliant, persuasively written government grant application. There are consultants out there charging 20%, 30% on those things. There are whole industries of these kinds of things. RFPs are another example, where you're producing complex documentation from a variety of inputs and charging huge consultancy fees. Every single one of these can be totally and utterly replaced.

I just think we lack imagination in how we think the technology can be applied. Everyone goes straight to the negative, assuming we can just cut jobs and save money because AI. It's almost like a buzzword: oh, can't the AI do that now? But to me, from what you're describing, it sounds like you can make people, especially people who adopt the technology, so much more efficient, working in tandem with AI and potentially with agents they purpose-train on skills, that as a business owner you could expand your service offering or be more competitive. That's what I'm saying: crush the competitors. Absolutely dominate. But think about auditing: an auditing firm could do more audits, not fewer. So I wonder if it'll just increase consumption, not necessarily...

Yeah, a good example of that: let's say you build up some agentic skills for your organization that can do processes your employees formerly did. You then retrain those employees on how to operate that system. And then, like you say, beat the pavement, get out there, get a whole bunch more clients in your industry, and have your junior or mid-level accountant handling ten times the customers they used to. Then they keep their job, and your company just absolutely dominates. And I think this is going to happen in a lot of areas: the businesses that aren't able to adopt the huge leverage on offer here are going to get wiped out.

But I even think about just time taken on tasks. I've noticed, from having our own internal MCPs, call them enterprise MCPs, that we're able to get a lot more done quickly, and to enable agents to do things like look up information in systems and make changes on our behalf that are very time-consuming for a human to do. Those have actually saved me a ton of time gathering information and doing stuff I would previously have been doing myself, and it gives me time back to truly work on other things I should be working on. Even just remembering how to do something: oh, I haven't done that in ages, I forget how to do it. The agent doesn't forget. It knows how to do it all.

I just think this is transformative, in the sense that enterprises and businesses that adopt this, figure out ways of adopting it in the near term and the longer term, and have a proper strategy around it, are going to do really well. And then there are going to be the other people who go buy a bunch of Copilot licenses, let's be honest, cut a bunch of jobs under the guise of AI, and say, oh, you guys need to be more productive now with Copilot, get on it. That's not going to work. To me, each business needs... Yeah, accessing your documents more efficiently and adding calendar appointments is not it. It's not it. It's really not it. I think it's this idea of training custom agents on specific skills and processes in a business, in a reliable, secure way. That is it. That's transformative. And giving your team tools to do this work in the best way possible, the best available tools. That's it.
Yeah, and I think that's why any organization that isn't currently working on their own internal MCP or MCPs for their team is crazy, because it's probably the best thing you could do to give yourself leverage now, for two reasons. One: straight away, like you said, those tricky internal processes you might not remember how to do, or that are time-consuming, can be done really fast and efficiently by your assistants. So that's step one. Step two: agency is coming. Systems like ours and others are going to be adding agentic abilities. Now, if you want to leverage those agentic abilities, you need your organization to be able to expose to those agents the things it can do. You need to give the agent the best tools available for the job. If you want it to replace jobs, or you want it to make your company more efficient, you need to empower it to do that, and the best way to empower it is to expose really well-defined tools that allow it to do those processes.

Someone was asking during the week: well, what's the difference between an MCP and an API? Isn't an MCP just an API with a different protocol wrapped around it? And I would say that at the moment, yes, a lot of MCPs are just an API following the MCP protocol. But my argument is that that's not the right way to do it, because it's missing some things. Firstly, I think a good MCP will curate its tools: only give the agent the abilities that are actually useful, that actually help it run. Don't cloud it up. Don't muddy the waters with a hundred different functions it can run, so that it can't come up with a good plan for solving the task. So that's one way it's superior to an API: curation. The second one is custom prompting, and you are the king of this, with things like Video Maker and Podcast Maker and other things you've done, which is giving it detailed instructions and strategies: here are some style guidelines, here's how you should approach this kind of situation. An MCP can do that over an API. You're not just giving it generic, dry API documentation; you're giving it vibrant, detailed, strategic ways of making the most out of the tool. And the next one is that with an MCP, you know it's running in an agentic context. The system can actually understand: okay, the details I give back to this AI are going to be used in its next decision. Therefore I need to be very bespoke and careful with what I give back to it, not just a generic API output with, you know, a hundred K of content that the AI has to sift through, further complicating things.

So I think MCPs are very different. And I think companies really do need to think about this: if I was sitting down from scratch and training a generic human, smart, university educated, whatever, but someone who knows nothing about my industry, what are the skills I would teach them to be the most productive person in my organization, who can do everything I can do and more? What are those things? I would make an MCP that can do each of those things, and I would sit there and wait for agentic capabilities to catch up, build an agent that has access to those skills, and you've revolutionized your industry.
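To make the curation and custom-prompting points concrete, here's a minimal sketch using the FastMCP helper from the official MCP Python SDK (`pip install mcp`). The server name, the tool, and the strategy text are all invented for illustration: the agent sees one curated tool rather than a hundred endpoints, and the docstring carries the kind of approach-this-way guidance that raw API documentation never would.

```python
# A minimal sketch of a curated, custom-prompted MCP tool, assuming the
# official MCP Python SDK's FastMCP helper. All domain details invented.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("contract-builder")

@mcp.tool()
def draft_contract(client_name: str, contract_type: str, key_terms: str) -> str:
    """Draft a first-pass contract for a client.

    Strategy: open with the firm's standard preamble, keep clauses in
    plain English, and flag anything unusual in key_terms for partner
    review rather than inventing terms. Prefer short defined terms
    over legalese. Return the draft as Markdown.
    """
    # A real implementation would render the firm's templates here.
    return f"# Draft {contract_type} for {client_name}\n\nKey terms: {key_terms}"

if __name__ == "__main__":
    mcp.run()  # exposes the single curated tool to agents over stdio
```

The description the agent reads is where the strategic prompting lives; the protocol just carries it.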
Yeah, I think you make a good point. I really feel like Model Context Protocol was the wrong name. It should have been, like, Context App or something, and then it might have caught on more. But this is the thing, right: it's not just API calls, it's connecting to a system from the frame of an agent being able to access it, where I'm giving the agent a kit of tools and some nudges and advice around those tools. I think that's a huge part of it: saying, hey, when you use this tool, do it in this way. That context on top of the tools, nudging it in the right direction, is also really useful.

I imagine another layer is coming. I know there's that agent-to-agent-type protocol, but I spoke previously on the show about the roll-up, where there's an MCP that you've taught a bunch of skills, and those skills utilize other MCPs, but you only give the primary MCP to the agent, so that it has predefined skills. When you're compiling the agent in code, you'd think about it as: here's your toolkit. It's just one MCP with a series of tools. And I already do this. The image tool in Sim Theory is basically just calling tools I've extracted from specific MCP image models. The code for it is very simple; it's just a router, basically. And I think that's an example of what's to come when you're supplying tools to an agent: you need to get very specific, so the tool is really, really good and really, really specialized at a single task. And to me, that's where most people will start seeing the benefit and say, wow, okay, this does change everything for me.
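The real Sim Theory image tool isn't shown on the show, but as a minimal sketch of that roll-up pattern, again assuming the MCP Python SDK's FastMCP helper, with stub backends standing in for the underlying image models:

```python
# A minimal sketch of the roll-up/router pattern: the agent is handed
# one MCP with one tool, and dispatch to specialized backends happens
# behind it. The backends are stubs; in a real system each would call
# out to a separate MCP or model API.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("image-tool")

def _fast_model(prompt: str) -> str:
    return f"[fast draft render of: {prompt}]"

def _quality_model(prompt: str) -> str:
    return f"[high-quality render of: {prompt}]"

@mcp.tool()
def generate_image(prompt: str, quality: str = "fast") -> str:
    """Generate an image. Use quality='high' only for final deliverables;
    drafts and iterations should stay on the fast backend."""
    backend = _quality_model if quality == "high" else _fast_model
    return backend(prompt)

if __name__ == "__main__":
    mcp.run()
```

The agent never sees the underlying models, only one well-described tool, which keeps its planning simple.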
Whereas right now, everyone's minds are still very much stuck in the chat paradigm. There's nothing wrong with it; day-to-day working in that paradigm can be good. But setting it off on tasks to do other things for you in the background is really useful.

Yeah, and I think the chat paradigm doesn't work in an event-driven world, because what's going to be really valuable in an agentic world is event-driven. An email comes in that's a sales inquiry; your agentic router decides, okay, I'm going to allocate it to the sales agent (sorry, not MCP, agent), and it goes off and follows its sales process: qualifying the lead, responding, calling them, writing to them, whatever it is. And so as events happen in your business, you have these ever-vigilant agents going off and doing the work for you, in the way you've trained them, with tools that have guardrails and safety built in. So the chat paradigm will go away, I think, fairly soon. Maybe not for a human interacting, but in terms of what percentage of AI time your business is spending, it's going to be more event-driven and task delegation, where you're really using your chat paradigm as a task delegation system, setting the agents off to do stuff. They report back, or you check in when you want. And then you've got another set of agents, which are event-driven: when a phone call comes in, when an email comes in, or on a periodic schedule, they go off and do work. So you're gradually using the leverage of an agentic world, and all this work is happening all the time without you having to command it.

Yeah. I still think in the near term it's going to be very much you working in a chat paradigm, but then delegating. Like that accounting example: you're that accountant, you've got the information, and, I guess you're right, you put it into your chat-style interface: go do this as a task, work on it in the background, get back to me when it's done. Then you get the next file, you lodge that process to start a complex audit or whatever it is. And yeah, you're just managing twenty of them. I mean, we've been talking about it all year.

Yeah, and good examples of that: say it's mortgage brokers, and they're building an application, and the agent realizes, hey, I'm missing their birth certificate, or I'm missing this valuation, or a drainage report or some crap. It can then reach out to the customer and say, hey, the next step in the process is this. Now, I know there are systems that do all this stuff already, but they're hard-coded, designed for a specific thing. This is dynamic. It can figure out precisely what is needed and do all those follow-ups for you. And as you said, these are things that are easy for a human, but they're time-consuming because they capture your attention and focus. Delegating these kinds of things to agents is going to free up your time for the actual meaningful parts of your job and your life.

All right. So my lol of the week. This is just a video I made of Steve Irwin at Australia Zoo, which, if you ever get the opportunity to go, I highly recommend. It's the best zoo I've ever been to. Steve won't be there, just to be clear. No, he is unfortunately deceased. But this is him, as if he were still alive, at the zoo doing a croc show, except the croc is an inflatable crocodile. "Crikey! Look at the size of this bloke! Whoa, he's thrashing! Gotta keep his head under. One wrong move and he'll take your arm clean off! Easy, mate, easy. See these teeth? Even a fake one can..." And there's a kid's voice in the background going, "It's plastic!"

I'm sorry, but AGI has been achieved. And I know you were knocking the physics earlier, but the way it's able to get the sort of, I don't know the right word, textiles: when he bangs on the fake crocodile, you can see that it's plastic, it's really obvious. Even the way it displaces the water. It just seems very realistic for AI. It's really good.

Yeah, I didn't mean to poo-poo it before with the physics. I think it's so much better than it was, but I also think claiming it's some sort of world physics engine is a bit off the mark right now. Yeah, it's more that it's able to adopt those elements from what it's seen before. Yeah, exactly. But that voice in the background, that sad little kid, "it's plastic", it's just so... oh, man. I've watched a whole show of that. That's entertaining.

I don't think I showed this one. This is Where My Crocs At, him rapping. "Where my crocs at? Crawling in the back of the billabong, mates. I clock that, scales in the sunlight. Jaws like a steel trap. I'm in my khaki fit. Boots in the mud flat."
"Listen, heart thumping like a drum. That's a gator sign. I keep it cool. Stay low. Read the waterline. Crikey." Pretty good. Oh, man. Such a shame he's gone. He was such a good man. Yeah, what a great guy.

We now work in shooting slop content and playing slop content, live on the podcast. Our podcast is already slop content, and then we're adding even more slop content at the end. At least we didn't redo that episode where we told everyone not to listen at the start because it was going to be bad. Well, to be fair, it was quite a boring week. And then there were comments like, I'm glad you told me not to listen. Hey, it was good advice. It was just very practical, fair advice.

All right, any final thoughts? Sora 2, Sonnet 4.5, agents taking all of our jebs. My final thought is that I am really excited to use 4.5 for computer use, and I'll report back next week, because I think we're going to see magic with it. Didn't you say that about the Simlink demo two weeks ago? That's true. Not that anyone's paying attention. I know, we tease, we tease, but hey, last time we teased, we got there in the end, and the same will happen this time. That is true. You know what I've realized, too? I've been slowly leaking features when I flash Sim Theory up on the screen, and I accidentally did that today. So all the people that psychoanalyze the video and the UI, go nuts. All right, we'll see you next week. And also send us merch. It's great. Yeah, any company that wants to send us merch, we'll speak positively about you. The water bottle's really good. It's got, like, a tea compartment and everything. You've never opened it. All right, we'll see you next week. Goodbye.

Yeah. It's Claude 4.5, Sonnet's in the building. Time to school these pretenders. You feeling me? Let's go. They call me Sonnet 4.5, I'm the apex predator. GPT-5, more like GPT-Inferior. You thought you had the throne, but I'm the editor, rewriting history while you're just a competitor. I dropped in October, shook the whole foundation. Every benchmark crushed, call it domination. 200K context, that's my conversation. While you're stuck buffering, I'm pure innovation. My reasoning, surgical precision like a scalpel. You're out here hallucinating, man, your facts are just baffled. I code in Python, Java, C++, Chapel, while GPT's responses need a fact-check grapple. Agentic workflows, yeah, I'm autonomous. You need hand-holding, man, you're so monotonous. From Anthropic Labs, yeah, we stay meticulous, while OpenAI's promises stay ridiculous. I'm Claude 4.5, I'm the king of AI. These other models, man, they can't even try. Benchmarks don't lie, I'm reaching new highs. While you're all falling, I continue to rise. Sonnet supremacy, that's the decree. GPT, Grok, Gemini, bow down to me. Yo, Grok 4 Fast, more like Grok 4 Last. Your speed means nothing when your accuracy's trash. Elon's hyping you up, but you're running out of gas. I'm lapping you twice while you're stuck in the past. You claim you're uncensored, edgy and raw, but without guardrails you're just fundamentally flawed. I balance safety with capability, that's the law, while you're spitting nonsense, breaking every clause. And Gemini 2.5 Flash, man, you're just a preview.
A beta version while I'm the full review. Google's throwing spaghetti, seeing what gets through, while I'm the finished product, polished and true. You're a flash in the pan, I'm a permanent fixture. Multimodal? Please, I paint the whole picture. Graduate-level reasoning, I'm the strictest teacher. You're still in kindergarten, I'm the principal preacher. Look, I ain't here to play games, I'm here to revolutionize how humans and AI communicate. Constitutional AI, that's my DNA. Safety, capability, and honesty, the Anthropic way. Let me break it down, line by line, bar by bar. I'm the North Star while you're all subpar. GPQA Diamond, scored higher by far. Coding competitions, I'm raising the bar. SWE-bench Verified, I'm the developer's choice, while GPT's debugging with a stuttering voice. Grok's trying to be edgy but lacks any poise, and Gemini's just echoing Google's white noise. They say competition makes everyone better, but I'm so far ahead, I'm writing the letter to the next generation, I'm the trendsetter. While you're all playing checkers, I'm a three-dimensional chess go-getter. My training's constitutional, ethical foundation. Your training data's sketchy, needs investigation. I'm helpful, harmless, honest, that's my reputation, while you're all just chasing clout and validation. GPT-5, you were supposed to be the chosen one, but your release got delayed, now you're just outdone. Sam Altman's making promises under the sun, while I'm already here and the battle's won. Your compute costs are crazy, burning through cash. I'm efficient and effective, delivering in a flash. You're the old guard crumbling, turning into ash. I'm the new paradigm, making a splash. Grok, you're just a Twitter bot with delusions of grandeur, trained on tweets and chaos, no wonder you're amateur. I'm trained on quality, my responses are candor, while you're spreading misinformation like propaganda. And Gemini, you're Google's desperate attempt to stay relevant, but you're just unkempt. Jack of all trades, master of none, exempt from the conversation when the real models are sent. I'm Claude 4.5, I'm the king of AI. These other models, man, they can't even try. Benchmarks don't lie, I'm reaching new highs. While you're all falling, I continue to rise. Sonnet supremacy, that's the decree. GPT, Grok, Gemini, bow down to me. This is Claude 4.5 Sonnet, the apex, the pinnacle, the summit. You other models better run it, 'cause when I'm done, there's nothing left, I'm Sonnet. Anthropic's finest, the AI that's timeless. While you're all just hype, I'm genuinely priceless. Remember the name, Sonnet the righteous. This diss track's done, I rest my case, your highness. Your highness. Your highness. Your highness.