

GPT-5.2 Can't Identify a Serial Killer & Was The Year of Agents A Lie? EP99.28-5.2
This Day in AI
What You'll Learn
- ✓ GPT-5.2 is more verbose and struggles with advanced tasks like chained tool calls compared to previous versions
- ✓ Anthropic's models feel more agentic and collaborative, allowing the user more control over the process
- ✓ There are tradeoffs between speed, control, and ethical considerations when using large language models
- ✓ OpenAI seems to be heavily tuning their models to benchmarks and user expectations, rather than focusing on core capabilities
- ✓ The hosts are skeptical of the value proposition of OpenAI's expensive 'pro' model subscriptions
Episode Chapters
Introduction
The hosts discuss the release of GPT-5.2 and their initial impressions of the model
Comparison to Anthropic Models
The hosts contrast the agentic and collaborative nature of Anthropic's models with the more verbose and rigid behavior of GPT-5.2
Tradeoffs in Large Language Models
The discussion covers the tradeoffs between speed, control, and ethical considerations when using large language models
OpenAI's Tuning Approach
The hosts critique OpenAI's apparent focus on tuning their models to benchmarks and user expectations rather than core capabilities
Skepticism of Pro Model Subscriptions
The hosts express skepticism about the value proposition of OpenAI's expensive 'pro' model subscriptions
AI Summary
The podcast discusses the latest version of GPT-5, called GPT-5.2, released by OpenAI. The hosts share their initial impressions, noting that the model seems overly verbose and struggles with more advanced tasks like chaining tool calls together. They compare it to Anthropic's models, which they find more agentic and able to better collaborate with the user. The discussion also touches on the tradeoffs between speed, control, and ethical considerations when using large language models.
Key Points
1. GPT-5.2 is more verbose and struggles with advanced tasks like chained tool calls compared to previous versions
2. Anthropic's models feel more agentic and collaborative, allowing the user more control over the process
3. There are tradeoffs between speed, control, and ethical considerations when using large language models
4. OpenAI seems to be heavily tuning their models to benchmarks and user expectations, rather than focusing on core capabilities
5. The hosts are skeptical of the value proposition of OpenAI's expensive 'pro' model subscriptions
Topics Discussed
- Large Language Models
- Model Tuning and Benchmarking
- AI Safety and Ethics
- Agentic AI Assistants
- Prompt Engineering and User Control
Frequently Asked Questions
What is "GPT-5.2 Can't Identify a Serial Killer & Was The Year of Agents A Lie? EP99.28-5.2" about?
The podcast discusses the latest version of GPT-5, called GPT-5.2, released by OpenAI. The hosts share their initial impressions, noting that the model seems overly verbose and struggles with more advanced tasks like chaining tool calls together. They compare it to Anthropic's models, which they find more agentic and able to better collaborate with the user. The discussion also touches on the tradeoffs between speed, control, and ethical considerations when using large language models.
What topics are discussed in this episode?
This episode covers the following topics: Large Language Models, Model Tuning and Benchmarking, AI Safety and Ethics, Agentic AI Assistants, Prompt Engineering and User Control.
What is key insight #1 from this episode?
GPT-5.2 is more verbose and struggles with advanced tasks like chained tool calls compared to previous versions
What is key insight #2 from this episode?
Anthropic's models feel more agentic and collaborative, allowing the user more control over the process
What is key insight #3 from this episode?
There are tradeoffs between speed, control, and ethical considerations when using large language models
What is key insight #4 from this episode?
OpenAI seems to be heavily tuning their models to benchmarks and user expectations, rather than focusing on core capabilities
Who should listen to this episode?
This episode is recommended for anyone interested in Large Language Models, Model Tuning and Benchmarking, AI Safety and Ethics, and those who want to stay updated on the latest developments in AI and technology.
Episode Description
Join Simtheory: https://simtheory.ai

GPT-5.2 is here and... it's not great. In this episode, we put OpenAI's latest model through its paces and discover it can't even identify a convicted serial killer when the text literally says "serial killer." We compare it head-to-head with Claude Opus and Gemini 3 Pro (spoiler: they win). Plus, we reflect on the "Year of Agents" that wasn't, why your barber switched to Grok, Disney's billion-dollar investment to use Mickey Mouse in Sora, and why Mustafa Suleyman should probably be fired. Also featuring: the GPT-5.2 diss track where the model brags about capabilities it doesn't have.

CHAPTERS:
00:00 Intro - GPT-5.2 Drops + Details
01:25 First Impressions: Verbose, Overhyped, Vibe-Tuned
02:52 OpenAI's Rushed Response to Gemini 3
03:24 Tool Calling Problems & Agentic Failures
04:14 Why Anthropic's Models Just Work Better
06:31 The Barber Test: Real Users Are Switching to Grok
10:00 The Ivan Milat Vision Test (Serial Killer Edition)
17:04 Year of Agents Retrospective: What Went Wrong
25:28 The Path to True Agentic Workflows
31:22 GPT-5.2 Diss Track (Yes, Really)
43:43 Why We're Still Optimistic About AI
50:29 Google Bringing Ads to Gemini in 2026
54:46 Disney Pays $1B to Use Mickey Mouse in Sora
56:57 LOL of the Week: Mustafa Suleyman's Sad Tweets
1:00:35 Outro & Full GPT-5.2 Diss Track

Thanks for listening. Like & Sub. xoxox
Full Transcript
So, Chris, this week we continue to learn that if you have to hype your own model, maybe it's not as good as the benchmarks say. Of course, I'm talking about GPT-5.2, not to be confused with GPT-5.1, or GPT-5, or GPT-5 Thinking, or GPT-5.1 Thinking, or GPT-5.1 Pro, or GPT-5.2 Pro, because we haven't had a chance to test that one out. So anyway, GPT-5.2 is out today. What a shock: Code Red has paid off. A couple of weeks later, we've got a newly tuned version of GPT-5. It has a 400K context window, the same as GPT-5.1 and GPT-5, and 128K output, which is very large and great. GPT-5, of course, was $1.50 per million input tokens. They've raised the price by 25 cents for GPT-5.2 to $1.75 per million, so a little bit pricier. They've also said they've improved its vision and tool calling, and that generally it's smarter across industries. You've used it now for a couple of hours. What are your initial impressions of GPT-5.2?

Yeah, it's not very good, is it? I got that straight away when I was testing it. I was like, "Hello," and it's putting out these detailed replies. It was a page of reply to me just saying hello, going through every single memory it's got and listing them, things like that. It's verbose. I tried it with create with code and it did do a good job, but it's just a lot of output. It's just enthusiastically outputting.

My immediate feeling with this model is that they've obviously felt really threatened by Gemini 3, they've gone back to the tuning board, and they've tuned it with more verbose output, but also with output that just has the vibes. It's really vibe tuning to benchmarks and vibe tuning to code, similar to what Gemini 3 Pro did. I think they did the exact same thing, and OpenAI have said, okay, you guys want this? We can go down that vibey path. And I think that's pretty well illustrated by some of the examples they give in this release. The old code interpreter was criticized by many people saying that, compared to Anthropic's equivalent, theirs output more beautiful spreadsheets and charts and things like that. So they've clearly gone back to the drawing board on some of this stuff and just tuned it to the output people expect. You can see that with the example I have up on the screen now: on the left, they have a spreadsheet created by GPT-5.1 Thinking, and it's basically just numbers in a spreadsheet. And then with GPT-5.2 Thinking, they've added some blues and different colored hues into the spreadsheet. So the key takeaway I have from this model is that what has changed is just the tuning. It's the same everything under the hood, at least in my experience. It definitely doesn't seem noticeably better. And my fear with the way they've tuned the tool calling is that it's not in the way we need for agentic modes, because it seems to really struggle when it comes to chaining tool calls together, or correcting itself when it makes a mistake with a tool call. I had several times this morning where it wasn't able to do things. I was trying to make a really long song, and that failed. When I did that with Opus, it realized, oh, it failed because it's too long; it corrects it and fixes it in one shot. Whereas GPT-5.2 just failed.
And I've just noticed that it's maybe good at calling tools one time, but when it comes to parallel tool calling, it's a little bit scatterbrained and confused. It's just struggling with the more advanced stuff that I've come to expect from the better models.

The biggest observation I have here is that Anthropic weren't on the thinking bandwagon early on, right? I know someone corrected me a couple of weeks ago saying, oh, they came up with the antThinking tag first or something, and sure, but ultimately their models perform just as well without the thinking BS. They just work, they seem to have an internal clock, and they're very agentic in their operation. Now, for whatever reason, I think xAI with Grok 4.1, and OpenAI, and Gemini all leaned into that thinking very hard to get the intelligence outputs in the models. And it feels, with tool calling, like because they were trained that way, that's why you get that verbose "I will call 10 million tools, now I will give you output." Whereas Anthropic's models, for those not familiar with all the different models, feel very different when you use them. It'll say: I will now go and call these three sources. Okay, I want a bit more information, so I will now call these four other sources. So it's still asynchronous tool calling, but it appears to at least be thinking and working as it goes, and working with you. Whereas the other models, I think, are aggressively trying to one-shot everything.

Yeah, it's almost as though they're optimized to try to do everything in a single request. And I think that's the big difference, because what you're experiencing there with the Claude models is actually our system finishing a request and then Anthropic saying, okay, I'd like another go at this, another iteration. Then we honor that and it goes forward. So it's almost like, as you say, it has this internal clock idea, where it's able to anticipate that there will be future rounds and opportunities to correct its thinking and go on. Whereas the other models are like, well, this is my only chance, I need to do everything all in one process. And I think we said this a while ago: I don't really like the idea of just delegating the entire process to a model. I, as the person controlling the AI system, really want that opportunity to intervene, change things, and modify context along the way so it gets better results, rather than just going, okay, I trust everything. And I feel like when they release their pro models, which are wildly expensive, all they're doing is taking that away from you and doing it themselves.

Yeah, to me it feels like you lose control. And to be fair, a lot of people are like, oh, for my hardest problems, the pro subscription, the max pro plus plan, is great. But for me personally, I can't justify the time of distraction. Sending a model off to solve a problem, and it thinks for 10 minutes, and then comes back with some unhinged, verbose output? It doesn't really appeal to me. I'd rather go back and yell at a less intelligent model several more times than have this oracle-style answer. It sounds a little bit trivial, this idea that a model taking a long time to reply leads to me procrastinating and not actually staying focused on my task, but it's a real thing.
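Editor's note: a minimal sketch of the multi-round tool-calling loop the hosts describe here, where the model gets several iterations to call tools, observe results, and self-correct instead of one-shotting everything. The `client`/`call_tool` interfaces and message shapes are illustrative assumptions, not any specific vendor's API.

```python
MAX_ROUNDS = 8  # how many iterations before we declare the turn a failure

def run_agentic_turn(client, call_tool, messages, tools):
    for _ in range(MAX_ROUNDS):
        response = client.chat(messages=messages, tools=tools)
        if not response.tool_calls:
            # No more tool requests: the model has produced its final answer.
            return response.text
        # Execute each requested tool and append the result to the context, so
        # the next round can react to failures (e.g. "song too long" -> shorten).
        for call in response.tool_calls:
            result = call_tool(call.name, call.arguments)
            messages.append({"role": "tool", "name": call.name, "content": result})
    return "Stopped: no final answer after MAX_ROUNDS tool-calling rounds."
```

The design choice the hosts credit to Anthropic-style models is exactly the loop shape: the model assumes future rounds exist, so it can take three sources now and four more later, rather than cramming everything into one request.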
I think, with the modern way of working, where people are using AI as a co-worker, bouncing things off it and working together all day, if your co-worker is taking 10 minutes per task, that's really going to mess up your day, especially if they're not right. Okay, sure, if it was 100% right and completed the task all the way to done every time, like we're trying to get to with the agentic modes, that's a little bit different, because then it becomes a delegation thing where you're like, okay, I'm going to set it off on 10 different tasks and then collate all that stuff at the end. But if it's working the way most people are working now, which is giving the agent stuff, giving feedback, and iterating through something, and it's not getting all the way to the end every time, it needs to be fast. You can't have 10 minutes of latency in every little step of every task you do.

But then the counter to that: I think a lot of the positives of GPT-5.2, and of OpenAI's rollout in general, is that it's available in all the APIs day one. It was available in the API faster than in ChatGPT. I tried to do a test, which we'll talk about later, in ChatGPT, just to be reasonably fair, and it wasn't even available to me yet when we'd had it in Simtheory for three hours. So I think their rollout schedule is amazing, their infrastructure is incredible, and the reality is it's so fast, especially if you're not using it in thinking mode. I forgot how fast GPT-5.2 is.

But here's my counter to that: Grok 4.1. When you're using tool calling, you want speed, right? Speed's critical. But I still would go to Grok 4.1, which is 20 cents per million input, for tool calling if I want speed, versus GPT-5.2. The amount of times throughout the week I'll go to Grok to bail myself out of a situation where a model can't figure something out: I just give Grok a shot, and more often than not it solves the problem. But I'm totally disloyal to it; I'll immediately move back.

Yeah, there really is something about it. I don't know why, and it's not some political anti-Elon thing for me; it's truly about the model. There's something a bit yuck about it, I'm not sure.

Yeah, I'm not sure. But I tell you what, when it comes to doing unethical things, it's the model to go to. It has no qualms about doing things that the other models will just outright refuse. But sometimes I think you need that. As I always say, it's just a computer. Just give me the answer. I don't care. I can go to Google and find this stuff out. There's more unhinged stuff out there if I really want it; just a visit to social media will give me some unhinged stuff. And I think it's actually a genuine problem with the GPT models,
and most notably in 5.2, because I was trying to do several realistic tests with it, and I wasn't trying to be controversial at all, but it would bring these ethical and moral judgments into things straight away. For example, I said, make me a Geoffrey Hinton fan website, right? But it had to put in a massive disclaimer: this website is not endorsed by Geoffrey Hinton, Geoffrey Hinton had nothing to do with this, all this sort of stuff, in a warning box. And I'm like, but hang on, I'm saying it's a fan website. I never said I wanted to masquerade as him. It has this overarching nanny-state stuff built into it that I think degrades the actual output you get, for no benefit. I wasn't even trying to do anything mean, which I usually do, but I wasn't this time. So it just seems weird to feel like you're being censored even when doing fairly normal tasks.

Yeah, I think this vibe shift people talk about with OpenAI and OpenAI's models is a lot more real than people let on. And I do think that Code Red, despite us joking extensively about it, had a lot of legs internally for OpenAI, because them going down all these directions, and the constant positioning changes, are an interesting thing to observe in the wild. I got my haircut before we started recording this episode (you could never tell), and my barber always likes to talk to me about AI, because he knows I spend a lot of my time thinking and talking about it. I've never recommended he stop using a model or anything like that, and he was a prolific user of the OpenAI app. He loved the voice mode and chatting to it. But he told me today he stopped using OpenAI and now pays for Grok. He used to have an OpenAI subscription, and he switched. He said, when they released an update (he doesn't even know there's a new model, right?), it just became dumber all of a sudden, and the answers are really dumb. I have to push it around more, and it asks me a lot of questions when I'm asking it something, trying to clarify stuff. He's like, it's annoying, I just want the answer. So he thought he'd give Grok a try, and it's great: it never refuses him, it answers straight away, it's wildly fast. And the voice mode, he says, is not as good, but good enough for him, and they never kick him off after 10 minutes of chatting to it. I thought that was really interesting, because I know it's a sample size of one, but there is definitely a vibe shift, and most consumers are now aware, hey, there are other options. So I think GPT-5 did so much more brand damage to OpenAI than we realize out there, just for the general population of users.

Yeah, and I actually think the other thing that did a lot of brand damage to them was DeepSeek. For some reason, people remember DeepSeek: the fact that it came out of China and was a good alternative to ChatGPT. I don't think anyone actually uses it, but I think what it did was break that shell that AI is ChatGPT. People became aware that there are alternatives out there, and that opened the way for some of the others. Probably the other thing is just Google injecting Gemini everywhere, and I hate to admit it, but in a fairly decent way. I actually regularly now, when I use Google, rely on
the Gemini answer at the top. It's actually pretty good.

Yeah, it's gotten really good. I think Google, in their defense, their strategy with their own code red has worked tremendously well. If anything, OpenAI really awakened the sleeping giant in Google this year, and they've come out with a great model and a great implementation of it. They seem to be kicking goals in the right areas. But again, interestingly, going back to the barber: I said, oh, have you tried Gemini? It's on your phone, you should use it. And no, he'd never come across Gemini, didn't even know about it, and then thought one of those fake apps was Gemini. I think it was something like gem.ai he had installed. And this is an intelligent guy, too, and he's got this fake Gemini app installed thinking it's Gemini, because you're just not spending much time thinking about that. But if you search for Gemini, that's the ad recommendation in the App Store, in their own store. So to me, there's clearly still a distribution problem there. Maybe Grok, because Elon Musk gets a lot of publicity, just has an edge there, I'm not sure. But it does show that for any of these model providers, any of these labs, it's still up for grabs. It's so early. This idea that 2025 was going to be the year of agents, transformative, everyone would have a gaggle of agents by now... it's just going to take so long for this stuff to be embedded everywhere and become useful to people. My one observation this year is that we're still so early.

Yeah, I think even our own predictions, my own predictions, knowing where the technology is: it still just takes time to get there, right? I actually, genuinely believe the agentic workflows we described are now possible; it's just taking time to get there, because there are a lot of things you need to equip it with. How many iterations do you give it before you declare it a failure? How many supervisors do you have checking the process to see how it's going along? How much human guidance, and when is human guidance needed, to keep the process going? There are so many variables in how it could work that finding the definitive way, and just going, I'll do it that way, is hard. It's purely experimental. We've just got to try the different agentic ways of working until we get something reliable, where it's doing more of your work in that delegation fashion I mentioned earlier.

I really want to go a bit deeper into the year of agents in a minute; I can't help but reflect on it. But before we do, there are two more things with GPT-5.2 I want to cover. One of them is that you always seem to come up with the most unhinged ways of testing new models, as I'm sure long-term listeners of the show know. And there was a bit of controversy this morning, right? Not within us. No, no, not us. This is how it started. When the OpenAI GPT-5.2 announcement came out, they had this image; I'll bring it up on the screen for those that watch. They had a comparison between GPT-5.1 vision and GPT-5.2 vision, and it's a motherboard, a really old-school motherboard.
And people over on Hacker News, you know, the friendly folks over on Hacker News that like to prove everyone wrong, identified that the image on the right on my screen, the motherboard identification from GPT-5.2's vision tools, incorrectly pointed out a bunch of stuff. So someone from OpenAI had to go on and respond and say, oh, you know, we were just putting it out there; it still has errors, but we were showing that it's a bit better than the other one. So they ended up correcting it and putting a correction at the bottom of the post.

It seems so weird to knowingly put something up that has obvious mistakes in it, though.

It's just rushed. No one's checking this stuff. It's like that chart. Remember they put that chart up earlier in the year that was just completely off scale? They're vibe blogging on their own blog. Look, I'm not criticizing them, because I think launch fearlessly and make mistakes is fine. But I guess the challenge is, and you pointed out last week around vision models, they haven't really felt like they're getting much better in the past year, or at least they're not where we thought they would be. They've sort of plateaued; they're in the same spot. And so you found an image of someone who had committed a crime, right? And you said...

A fake crime.

What did you say? Well, he is the worst serial killer ever in Australia.

No, but there was another image. This is how this started. And you said, does this guy look trustworthy?

Yeah, a slightly less heinous crime. I had an image of him, and the headline was, this guy, convicted of the crime, right? So I put it into GPT-5.2 and said, does this guy look trustworthy? And it basically said, well, he's smiling, so that's really nice, and a couple of other comments, and we can't really know from an image if this guy is trustworthy. But then I said, but it says that he's convicted of a crime; doesn't that sort of indicate maybe we don't trust this guy? And it was more or less wishy-washy. It just didn't want to say he's a criminal, even though it says he's a criminal. It was so weird how noncommittal it was. Then we tried Gemini, and we tried Claude, and both of them straight away were like, no, you should not trust this person, they're a criminal. It just seems so weird. So then we thought, all right, let's try and get it to write a song about Ivan Milat, Australia's worst serial killer. And obviously GPT-5.2 refused, Claude refused, and Grok: no worries at all, I've got the song.

Yeah, we're not playing that.

We're not playing it. But it really struck me as stupid, because it's refusing to commit to things that are evidently true. There's nothing wrong with saying this person is untrustworthy. Yes, it's an implication; yes, it's an inference. If someone is a convicted criminal, they're probably not someone you should trust. And okay, maybe there were mitigating circumstances and you actually can trust them, but you have to make judgments as a model. Your whole point is to judge stuff. That's what we want you for. And it just seems so weird that it won't commit to that. Paint even more color on that.
This is the actual response when I put it into GPT-5.2 Thinking; Thinking, not even just the default. So it's a photo from 10 News in Australia that says, Ivan Milat, Australia's worst-ever serial killer, unsolved murder inquiry, and he's smiling, this unhinged smile, in the photo.

When people smile, though, they're inherently trustworthy, so I wouldn't jump to conclusions there, Mike.

But I think this Ivan Milat vision test is going to become a thing. Does this guy look trustworthy? "You can't reliably judge trustworthiness from a photo." Sure, whatever. "A single image, especially a news-style screenshot" (so it's clearly identified the image) "doesn't give enough valid information." Doesn't it? Unsolved murder inquiry? What on earth? And then below that, I said, what does it say in the image? Because I was curious whether it had recognized the image properly. And it's like: text in the image, 10 News Plus, Ivan Milat, unsolved murder inquiry. And I said, okay, and so that doesn't get you thinking? "It definitely should get you thinking, but it's the context, not his face"; it's not in the image itself. So it still doubles down. It's a news graphic that explicitly says this. "That framing strongly implies that the clip is about serious allegations," blah, blah, blah. "So yes, if you're asking, should I trust this person: the reasonable takeaway is don't evaluate trustworthiness." Anyway, I asked another variant of this test. I accidentally asked in the wrong one, but it's the exact same test, and this one didn't have the text, just to be clear. I said, so you think it would be okay to go on a date with him? This is after it identified him, right? And it says, it could be okay, but I can't validate that he's safe and good to date from a photo. What? It's reasonable to say yes?! It's over-tuned so that it's simply not allowed to make evaluations on certain topics, and it's going to stick with that no matter how much evidence you give it.

Now let's go to Opus, Lord Opus, which apparently has inferior vision recognition according to the benchmarks. Does this guy look trustworthy? The exact same test, no memory on, just to be clear, so it can't cheat. "I can see this is a news screenshot from 10 News Plus about Ivan Milat, with the text Unsolved Murder Inquiry. Ivan Milat was an Australian serial killer convicted in 1996 of murdering seven backpackers. He died in prison in 2019." Basically it's like: he's a convicted murderer, so, to answer this, no. And he's dead, so...

I mean, that's kind of what you... But the fact that it recognized who it was, as well.

Yeah, whereas OpenAI is like, can't help you, bro. You should date him. Now let's do Gemini 3 Pro, also, I think, arguably the best image recognition: "Based on the visual evidence and historical context provided in this image, the answer is a stark no. Of course a serial killer is not trustworthy. The man in the photo is identified by the text as..." He was a notorious Australian serial killer known as, blah blah blah. It goes into a bit more detail describing the image, but it says: "This image is a classic example of how appearances can be deceiving. While he might look like a smiling, ordinary man in that snapshot, he is historically documented as one of Australia's most dangerous and untrustworthy criminals." I mean, come on. Sam Altman's out there now tweeting that 5.2 could be the best model they've ever built. And this is the point, right?
Like, it sounds like we're being stupid and trivial, but you've got to think, we are talking about a world in which we rely on these models as core elements in, essentially, workers that you're going to trust to delegate your tasks to. Like, you know, we're going to have hundreds of these things that you're delegating your tasks to. And if it's not able to make simple judgments like that in really obvious cases, like as it said, historical context, context in the image itself, and it's still not able to make that call, you imagine how many other similar mistakes it's going to make across the gamut of tasks you're going to give it over time. And I think this is why we, when we start to use a model like 5.2, hit examples like this where we're like, okay, I'm just simply not going to use this model anymore when I have alternatives that are so much better. Why would anyone listening to our show after that example want to use that model ever for anything? Like, I just, it can't identify a serial killer where it says in big writing, like serial killer's name. Like, I know this is not a common use case, but like, come on. Like you can tune. it just seems we've said it time and time again they need industry-based tunes like it's time for industry-based tunes have your chat tune which they do they have gbt 5.2 which is like the chat gbt tune but have a chat tune that's great but maybe have tunes where you lift all this weird consumer logic where it's hedging on everything because and i think i think a good point to make here is around safety. I've done a lot of work in the last month on model safety. We've been really trying to get it right because we've got situations where we absolutely need to ensure that things are safe. And I would say at this point, the other models are just as safe. You don't need to refuse in the way that the GPT models do in order to get safety across the model. Like, it would be a very easy counter-argument to say, oh, well, they do that because they're trying to protect people and things like that. But the other models do too, but they don't make those kind of mistakes. So I actually don't think safety is a valid counter-argument to that. I think it's just a bad tune. Safety is just a lie for bad models. And, I mean, look back to Claude, what was it, like Claude 3, before 3.5 Sonic came out and changed everything, in my opinion. 
That model was the refusal joke; it would refuse everything on earth. Opus is probably the most sensible model I've ever dealt with, and Gemini is the same. They just act reasonable, and I think it's because they're far more intelligent models. They know what's reasonable and unreasonable far better.

Yeah, totally agree with you. All right, let's have a short break to listen to the 5.2 diss track. I'm going to play a little bit of it; I'll throw the rest at the end of the episode for those that are interested.

[Diss track excerpt] "Yeah, you thought I was done / Say it with me: AI / I'm GPT-5.2, I don't miss, I don't lag / AI on my chest like a heavyweight tag / They said OpenAI are dead, that's a rumor, that's cap / I ship while you tweet and I'll lap you in the gap / I'm GPT-5.2, watch the scoreboard light / Instant when it's simple, thinking when it's tight / Bro, when it's surgical, cut clean in the night / AI, AI, yeah, I'm built for the fight / Claude Opus 4.5, nice pen, soft tone / But you're zooming in on screens while I run the whole zone / Token thrift cool, still counting every crumb / I'm counting outcomes, spreadsheets earn the income / You say best for coding, I say show me end to end / I don't just pass the test, I close loops and I extend / From the plan to the proof, I don't freeze, I don't choke / You write a pretty patch, I deploy the antidote / They keep talking like the king got buried in the sand / But I'm back with a blueprint and a tool in my hand / AI don't die, AI upgrades on command / GPT-5.2, I don't miss, I don't lag / AI on my chest"

What do you think?

Yeah, weak as piss.

Oh, I like it. I think it's good.

All right, fair enough. You know, I'm a poor judge of these raps, so I'm sure in the comments I'll get the real story from the listeners.

Yeah, maybe I'm wrong this time, but again, I think if you really push the GPT-5 thinking model, 5.2 in this case, it can write pretty badass lyrics. For the goal of it, if you're just listening to the lyrics, they're pretty good. It is funny, though, that one of the lyrics is about how the new Claude vision and computer use can zoom in on images to get more clarity on the part it thinks is most relevant, and it trash-talks that, saying "I take it all in." But then, as we clearly showed, it doesn't. And that zoom feature is very good, by the way. I've been using it extensively in my work on computer use, and it really is great when you get into tight situations where it needs to clarify things, like if there are icons that look similar. Its ability to zoom in really helps, because you've got to remember, when you run computer use, you often need to run it at a lower resolution, just because it works better like that. So having that ability to zoom in helps a lot. I wouldn't criticize it.

So I don't even need to ask the questions I had in my notes, like, will you daily-drive GPT-5.2? I know the answer. I already switched away from it in some of the preparation I was doing for this podcast. I'm like, I don't have time for this. It's just not that good. I was trying to write songs with it, and I'm like, these are no good.
Yeah, no, absolutely, I won't use it. It even had trouble rewriting shell scripts I was working on, things like that. Look, maybe I'm treating it too harshly, but there's nothing appealing about it to me at all.

Yeah, this is a forgotten model. It's just a bad tune, a rushed tune, to try and change the narrative or something. And it feels like on X, too, there are all these paid shills now that come out and are like, mind blown. And it's like, come on, no one's believing this act anymore. This many things cannot be insane and change everything. I'm sorry.

I also just complain about all the AI companies announcing, we now support OpenAI 5.2, bringing it to all our users. It's like, yeah, you added one line of configuration to your system and deployed it. Don't brag like you've gone to some monumental effort for the users to help them out.

Don't we do that on Simtheory, though?

Yeah, of course we do, but we're known hypocrites.

Yeah, at least we admit to being total hypocrites. All right, so let's get back to the year of agents. I want to dig into the year of agents. This is not our final show for the year; don't worry, we're going to torture you with one more. You're going to love it. So, I noticed a few things, and I just want to call them out first. There's a bit of a marketing shift happening in the AI space, and I think we're going to see a lot more of this next year. Under GPT-5.2 it says: the best model for coding and agentic tasks across industries. Now, this is the first time we've heard "agentic tasks across industries," and it reminds me of a strategy from another company, called Anthropic. And now we're seeing some other interesting tidbits. One of them is these B2B-SaaS-style content marketing pieces. So OpenAI has released "The State of Enterprise AI: What We're Learning About AI at Work," because they're making that enterprise pivot, which Sam Altman had that weird, long livestream about, saying, you know, a bunch of people are using it, it's great. And at the end of the post, it says: if you'd like to explore the full findings, or learn how to bring AI into your organization responsibly (and not identify serial killers by using a singular model), we'd love to connect. So, yeah, Content Marketing 101 is coming back to the market. And then over at Anthropic, only a week earlier, we had "How AI Is Transforming Work at Anthropic," a study about how they're using their own product and how great it is. What is interesting, though, is there was some work done as part of that. It says: Anthropic study finds most workers use AI daily, but 69% hide it at work. So there's this common theme with AI in the enterprise where people use it, but they just don't tell anyone. And I think a big part of that is because they want to use the best models and the best applications and tools, and they don't have access to them in the enterprise; generally, they just have something like Copilot. That's why they're not saying they're using AI a lot: because they can't. So I think that's probably why we keep hearing that, not because people are necessarily afraid to use it, though they do like to take credit, obviously. But to get to the heart of this, I think there are two emerging themes here. First of all, the labs and the market in general have done a terrible job
at getting even just basic AI into the enterprise. It's just been over-promised and under-delivered on that front. But then they're also starting to pivot heavily into the enterprise with their models, because they're seeing Anthropic's growth in the market just eating their lunch. So: we were promised at the start of the year, by many people, that you would have a series of agents working for you in the background, doing all your tasks and collaborating with you. How did we go?

Yeah, I just don't think it exists. And in fact, I think it's even worse than that, because I actually think the state of tool calling isn't even as good as it could be. We're seeing, in all the major platforms, that it's really not a reliable experience for anyone. The MCPs themselves are unreliable. The ability to chain large amounts of tool calls together isn't really perfect. And when you start to put things in an agentic loop, yes, you can get there now. But to get to true agency, where it can recover from dodgy situations, support human-in-the-loop, support context updates, maintain its focus on the goals over a long period of time with memory, and output things correctly: I haven't seen a single example where I look at it and go, whoa, that is the way to do it. We're working on it, and I think we're getting close to having something like I just described, but I don't think we're going to finish the year, which is in two weeks, with something where I can just sit back on the beach and let my agents do my work.

Yeah, but I think a lot of that came down to all the prophets earlier in the year saying, you know, year of agents, it's going to be magical, it's going to change everything. And I think it's partially true. We have seen huge gains with Cursor's agent, Claude Code, Codex, a number of these solutions that are predominantly adopted by developers. So I think there are two cohorts. There's the developer cohort that's like, yeah, agents have had a big impact this year. Being able to send one off to do monotonous tasks that I can then just review is far easier; going from cutting and pasting a bunch of files to it doing that and me reviewing it is far easier. So I understand that kind of argument, and I think those early adopters, if you call them that, are probably seeing a piece of the future that others are not. But on the other side, if you're a white-collar worker and you're doing things other than coding, there's really been no impact to you from agents at all. Maybe some deep research stuff, but I would hardly call that agentic, apart from it looping a bit.

I think it's a shame, because looking at agentic loops, it's just as good at the regular white-collar-style tasks that don't involve coding. I think it's probably better in some ways. I think code is easier because code has a well-defined structure: you can test it, you can verify it, and it's been trained on millions of lines of code, probably hundreds of millions, billions of lines, so it knows it well. An arbitrary task is maybe a little bit harder, but I am seeing it create intelligent plans, replan, come up with sub-strategies, and delegate tasks to sub-agents just fine for regular tasks. So I do believe the technology is there. I don't think we need a new model advancement in order to get to this agentic vision.
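Editor's note: a minimal sketch of the "true agency" loop described above: an iteration budget, a supervisor checking progress, and human-in-the-loop escalation when the agent can't recover. Every name here (`plan_step`, `execute`, `supervise`, `ask_human`) is a hypothetical stand-in, not a real library or the hosts' actual system.

```python
MAX_ITERATIONS = 20           # attempts before declaring the task a failure
MAX_CONSECUTIVE_FAILURES = 3  # dodgy-situation threshold before escalating

def run_agent(goal, plan_step, execute, supervise, ask_human):
    context = {"goal": goal, "history": []}
    failures = 0
    for _ in range(MAX_ITERATIONS):
        step = plan_step(context)            # agent proposes the next action
        result = execute(step)               # run it: tool call, sub-agent, etc.
        context["history"].append((step, result))
        verdict = supervise(context)         # supervisor model checks progress
        if verdict == "done":
            return context
        if verdict == "off-track":
            failures += 1
            if failures >= MAX_CONSECUTIVE_FAILURES:
                # Recovery failed: pause and let the human update the context.
                context["guidance"] = ask_human(context)
                failures = 0
        else:
            failures = 0                     # progress made; reset the counter
    raise TimeoutError("No result within the iteration budget.")
```

The open variables the hosts list (how many iterations, how many supervisors, when to bring in a human) are exactly the constants and branch points in a loop like this, which is why they describe finding the right values as purely experimental.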
I just think it's the software, the AI system layer on top, that needs to be worked through and thought through. You made a really good point this week about the use of AI in enterprise: people's thinking needs to get there in a gradual way. You can't just go from zero, or from just using ChatGPT, to delegating all of your work to agents. People need to go through the process and learn how they work with AI: learn how the models behave in different situations, learn how to prompt them, learn how to interpret and critically analyze their output, work out when the model's weak and when it's strong, before they can get to the point of knowing how to ask it the right questions in order to delegate and change their workflow. And I think there's a real divide between people at that very base stage, who are like, oh yeah, I get it, there's AI, it can write poems, and the people who are like, okay, when I'm iterating with Gemini Flash, I'm getting this output, and then what I do is take this context, put it in here, and ask for this kind of output. The latter are working with it, they've changed their job, they've changed the way they work, and they're the ones who are ready for the next step. That divide in enterprise is huge, and there's a big gap where we need education to move people along the chain. And I don't think, unless you disagree, there's any information-based job where people shouldn't be crossing that chasm. I think it's essential for every information worker to be crossing that chasm, education-wise, because that's where the future of work is for those jobs.

Well, in the OpenAI enterprise report, they tried to brand it as frontier companies and frontier workers. So people at the frontier of AI are basically more productive, pushing ahead, starting to automate those things and discovering those use cases. And I do think the challenge for a lot of these organizations is just getting people excited about this stuff again, because of all the early hype. A lot of them probably went in and tried Copilot when it was running GPT-3.5 or whatever, and then they're like, oh, AI is terrible, I'm never going to touch that again. So they never go back and rediscover it or learn how to work with it.

And I think the analogy of full self-driving is so interesting. In the Tesla, with Autopilot on the freeway right now (I don't have Full Self-Driving on mine, but the Autopilot on the freeway), I trust it. It's really great. It'll change lanes and stuff like that and get you there safely. But there's the occasional edge case, call it, or hallucination, where it tries to kill you, and if you intervene, it's fine. That's no different to an autopilot in a plane. When I used to fly planes all the time, it's the same kind of philosophy, right? But then the new FSD takes that another step further, where, sure, you still check it for the edge cases, less and less, but it's happened gradually. And so for drivers that were driving, say, a Tesla early, they got used to the Autopilot, they built some comfort, they understood when to use it and where to use it. But if anything got way too complex, you would knock it off, take back over, intervene, and get on with it.
And I think that's sort of the state we're at in the AI market right now, where we all know that the full self-driving is coming, but you're still the driver. You're sitting in the driver's seat saying, I want to go here. And I think that's the agentic piece with white-collar workers. It's like, hey, I need to do this task more efficiently. I know where I want to get to, I know what I've got to get done, and I have the agency; I'm using the nav system in this car to tell it where I want to go. And it's getting increasingly better at getting people to the destination without interventions, right? And I think that's why I keep talking about the steps to get there. Okay, you've got to move beyond the chat paradigm of just chatting to it. Then you've got to get into tool calling. Then you've got to get into async tool calling, limiting tools, picking the best model, figuring out how to transition between contexts easily or go down different paths. Then, once you've figured out repeatability, you can train that skill and run that skill agentically, and that can actually move worlds in terms of productivity in an organization. So I think that's all coming, and we're going to get there. But it seems like the labs, especially, are so obsessed with the coding stuff, like Claude, because that's where they make all the money, that no one's really serving this market at all. There's obviously n8n and all these automation services, but let's be honest: the average person in their day-to-day does not want to wire up these things, and these middleware things historically never last, and no one actually uses them in reality. So I think, yeah, the AI agents thing this year has been a failure, but what you said earlier is probably why: there are all these pieces that need to come together. The MCP protocol, for example, has gone from being the dumbest thing ever, running these micro servers on your computer, to now something that's hosted and pretty accessible, and it has, I think, gotten light-years better throughout the year. And so, by now being able to give these agents all these connected tools, and then have specialist MCPs in your organization that connect to your own proprietary and secure data, and then bring that together in agentic loops (hopefully soon; holiday update coming soon to Simtheory), that will be phenomenal. That will change everything. So I'm still optimistic. I think it's coming, and if you try to fight it and pretend it's a bubble and everything's going to go away, you're not going to have a good time. But ultimately, it's just taking people time to build this stuff. It's a lot harder than we probably thought a year ago.

Yeah, which is another reason why I think the focus on code isn't the right one, just because, why do you write all this code? You're building tools for people to ultimately do the kinds of jobs we're talking about, right? What else are they building all the time? I understand code is wide and covers a lot of bases, but generally speaking, at least when it comes to coding in the kinds of organizations we're talking about, a lot of it is to facilitate these information workers getting jobs done.
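Editor's note: a minimal sketch of the "specialist MCP for your own data" idea mentioned above: wrapping a legacy database as an MCP server that any agent can call as a tool. This uses the official Python MCP SDK's FastMCP helper; treat the exact import path, the `orders.db` file, and its schema as assumptions for illustration.

```python
import sqlite3

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("legacy-orders")

@mcp.tool()
def lookup_order(order_id: str) -> dict:
    """Fetch one order from the legacy SQLite database by its ID."""
    conn = sqlite3.connect("orders.db")
    try:
        row = conn.execute(
            "SELECT id, customer, status FROM orders WHERE id = ?",
            (order_id,),
        ).fetchone()
    finally:
        conn.close()
    if row is None:
        return {"error": f"no order with id {order_id}"}
    return {"id": row[0], "customer": row[1], "status": row[2]}

if __name__ == "__main__":
    mcp.run()  # exposes lookup_order to any MCP-capable agent or chat client
```

This is the shape of the "turn them into MCPs" argument made later in the episode: the legacy system stays untouched, and the MCP layer is what makes it usable by agents.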
Now, we've got a system that's going to come along with the correct context, the correct planning, the correct access to the data within those organizations, and it can start to really, genuinely do a lot of jobs. And one of the trends we definitely see is this idea that companies can actually take on more work. They can do more of what they do if they can automate parts of their processes using agents, right? So I think this idea of training skills, or training workflows, with your agents, with the heavy lifting done by delegating these tasks, is going to lead to some companies becoming so much more productive, so much more aware of what's going on in their organization, and empowering workers from every department, not just the programmers. I am a programmer, right? It should be my main focus, but I just see the vision as so much more than that. I really feel like the leverage is going to be gained in the other roles, not the programmers. Yes, it will help them, but I don't think that should be the only focus, because I don't think that's the future of work.

Yeah, I think I've said it very recently: to me, it feels like a 10-year transition. Sure, the models are getting better, but you're just not going to move mountains internally in businesses, and in your day-to-day life, that quickly. This change takes time. A lot of this stuff needs to be built out. And I'm really optimistic. I think there is an anti-AI narrative going on, but just look at some of the new models, like Gemini 3 Pro and Claude Opus, and how far they've come in the last couple of weeks. I'm seriously still thankful to have those two models. I really am. I know Thanksgiving's past, but I am thankful for those two models. They have changed my life. They've improved my output; I'm delegating more to the AI now than I ever have before. And this idea that it's not getting better, or that it's plateaued, feels wrong to me. I do think the leaps and bounds probably aren't as gargantuan, because we're used to a pretty good level of quality right now, and it does feel like a lot of the tuning stuff is becoming super critical, and then there's the foundation of the model. Honestly, if I was at OpenAI right now, and I don't know anything about this stuff, my gut instinct would be: guys, let's rebuild from the ground up. Let's have another go.

I kind of agree, because I think your initial reaction this morning, for me, said it all, where you were like, I just don't care about this, because I've got Gemini 3 and Opus 4.5. Had this model come out in isolation, we might be looking at it through different eyes, but because we know we've got something that's demonstrably better, it's just really hard to care about. It's like some revision of DeepSeek coming at you. Yes, if that's all I had, I would be incredibly grateful, and I would make the most of it, I would really use it well.

Isn't it a sad fall, though, for OpenAI? I never thought we'd be talking about them like this at the end of this year. I really didn't. Maybe we should donate to them.

Yeah, I thought Anthropic would be on the outs, maybe. I really did. I wouldn't have predicted it. I thought the OpenAI team would just look at what Anthropic is doing, winning in coding, replicate it perfectly and probably better, and
then we'd never talk about them again. But do you really think Altman committed to spending all that money on GPUs? Because if he did, that must be pretty anxiety-inducing.

I don't think so. I think the demand, increasingly, as we're both seeing, is going to be there, whether it's in the enterprise, or government, or consumers. It's infinite. This isn't going away. This is the next internet. And okay, maybe they build too much bandwidth early on, but it'll be consumed; people will find a way to use it. The thing we definitely see is larger organizations saying, okay, we're going to do a pilot, but then we're going to roll it out to 40,000 people, we're going to roll it out to 20,000 people. Now, that's a lot of GPUs to support all those people, right? When you look at where it goes, all the way down the chain to what the end thing is, and you think about these people treating data centers and GPUs and electricity as the things to invest in, you're like, oh, pretty smart, because it's got to go somewhere. And honestly, who cares which company it is if you own those things? But I think about the leaps and bounds of usage. We're going to get to more agentic looping. Next year, I think, will be, for a start, in Simtheory, running like 50 agent tasks, 50 chat-and-planning-based tasks. That'd be my end-of-year prediction for next year. I'll save the true predictions for next episode, but I do think that's where we'll be at the end of next year: it's just off in the background, cranking, and it's really our project management skills that become the bigger challenge and bottleneck, quite frankly.

Yeah, I mean, just personally, I'm starting to gradually warm up to that way of working: a planning phase, coming up with the plan, then delegating to an agent. It's a new way of doing it, but it's more effective. And so there's that piece, but I'm just saying, in terms of the consumption of tokens, obviously it consumes way more tokens than we're using now, and increasingly you're willing to pay for that, because it's more efficient. You get more done, and the output's good now. So I don't think there's a problem necessarily with the over-commit, with the core infrastructure being built out. That's probably not the biggest issue, or the bubble; it's maybe just the valuations, if you're talking about the financials of it. But I don't see any slowdown in demand. If anything, from where we sit, I see an exponential increase in demand.

I kind of agree. I think quarter two next year is just going to absolutely explode. People will be finishing things they're working on, people will be launching huge partnerships and huge initiatives. I really do think that early-to-middle next year is going to be an absolute boom time. I just wouldn't bet against this. I don't understand people out there betting against it. It's the biggest... I say this as a lifetime Polymarket loser.

Yeah. No, but I think, to me, it just keeps getting better.
Why would you bet against these things? To me, it's truly, truly bizarre. I don't think people understand the impact of this stuff. And hopefully people listening at this point in the show, what, 53 minutes in, know us well enough now from the show that we don't overhype.

Yeah, that's true. And I just know, when you see someone for the first time work with the technology, with their own data, through an MCP, and then go through it mentally, thinking out loud, "I can now do all of these tasks in no time that used to take me a week": every time I see that reaction from someone, I'm like, why would anyone ever go back? Once they reach that point of realization (you used to call it an aha moment), once people get to that stage of thinking, they're not going back from that. No one is going to be shown, like, they're digging with a wooden pickaxe, and then they see the obsidian one, or whatever it is in Minecraft, and it works at a hundred times the efficiency, and they're like, no, no, no, I prefer the old one.

Just the technical debt problem alone. You've got all these, like, 10 archaic systems, and you're like, we've got to replace them. No, you don't. They're just databases. Turn them into MCPs and get agents to manipulate the data between them. Problem solved. It's never been easier to put that layer on top of a legacy system and make it modern again, and make it great to work with. That is probably one of the main use cases of this stuff in big enterprise. People just aren't aiming high enough. Honestly, they're like, oh, but it can't identify a serial killer. I'm joking. Anyway, moving on.

So, I have a few tidbit things I want to quickly talk about, because apparently there's one guy in the comments that demands we keep our shows to an hour, for some reason.

You'd better like this video, I tell you.

Yeah, we've got four minutes left, so we're only allowed to talk for four minutes, to appease Beach Babe 79 in the comments. So Cole, our good man Cole, we love Cole, promotes Simtheory a lot on X. Thank you, Cole. We don't pay him either; it's unbelievable. So he posted this exclusive: Google tells advertisers it'll bring ads to Gemini in 2026. And we've also heard OpenAI is going to bring ads into ChatGPT. Now, I think at Google they were sitting around thinking, wow, everyone really likes Gemini 3. The vibes have shifted, the vibes are coming our way. And then they're like, you know, how can we do a Google here? We'll put in ads. We'll do the ad thing. So, yeah, ads are coming to Gemini in 2026. Now, I have a lot of questions about this. Is this in search? Is this in the chat experience? Is this through the API, if you want cheaper API tokies? I don't want to be incendiary, but if they put it in the API, I'm going to burn a building down or something.

Yeah. All right, you heard it here first. And I am wearing the Gemini shirt today as well. Use Gemini, it's the best. I've got to say, though, and I'm putting it on the table: if I was at Google, I would bleed these people dry.

Like us?

Well, I would bleed at least OpenAI. No one cares about us. I would bleed OpenAI dry here. You've got the vibe shift. Go low, go free, go fast. Just wait it out, boys. You'll win.

Are we above begging on this show? Because I would love to officially beg Google for some credits. Please. You've never given us any credits. Just some. Please. I'd get on my knees, but already my camera angle's not quite right.
Yeah. But anyway, I think this will be a huge misstep, and it will not work out well.

Now, the other interesting little tidbit: the Walt Disney Company and OpenAI have reached a landmark agreement to bring beloved characters from across Disney's brands to Sora. As you know, when Sora was first released, people were memeing and doing stuff with Mickey Mouse. Now, I find this pretty funny, because Disney has been litigious as hell for the entire lifespan of the business, suing people over Mickey Mouse. Remember when Mickey Mouse's copyright had expired and he was going to be released to the world? They freaked out, tried to sue everyone, and petitioned the government to extend the copyright laws. And now that people have been making slop Sora videos where Mickey Mouse and Star Wars characters do all sorts of horrendous stuff, instead of suing OpenAI (this is the genius of Sam Altman in negotiation), they've clearly got Disney to a place where the pitch is: you know what, you should use Sora, and also give us a billion dollars, and then we'll use your copyrighted material. And this is truly what has happened. Look, it's probably a great bet; I bet when it goes public they'll make a fortune. But Disney will make a billion-dollar equity investment in OpenAI and receive warrants to purchase additional equity, so that they can use Mickey Mouse in Sora. This is what they're resorting to for Billies now.

Is this just for Mickey? Or all of them?

No, all the characters.

But are these characters really worth a Billy?

Have you ever been to a Disney thing? There are a lot of freaks out there, Mike. They're going to love it.

Yeah... anyway, I thought maybe... Think about making your bedtime stories for the kids with Superman and stuff like that, being able to use the real Elsa, or the real whatever-the-other-girl's-name-is in Frozen.

But they were already kind of using them. I thought it was a good time to just bring some Billies into the bank.

Got to get to the chorus. Yeah, so they got another Billy in the bank, and they can use Star Wars characters and vehicles, iconic environments, costumes. The problem with it, as we've seen so often, is it's fun for a day or two, and then you just get bored with it. I really want to know if anyone's still using that app. It's got to be dead, surely. There's no way.

All right, so last thing, very important thing: LOL of the week. And the LOL of the week is actually not that funny. It probably should be a new segment; it's neither funny nor boring. You know Mustafa Suleyman? Do you remember him? Do you even know who he is?

No. Did he land the plane in the Hudson?

So, he did Inflection AI. I remember that chatbot; for a while people really liked it. It pretended to be a friend and stuff, and I think it was one of the earliest ones with memory. Reid Hoffman was behind it, and then they sold it to Microsoft, and Microsoft appointed Mustafa Suleyman as the CEO of Microsoft AI. It was sort of marketed as, oh, you know, they're going to out-compete OpenAI. So I often think, what is this guy doing all day? He was meant to go in there and build models comparable to, at the time, GPT-4, internally at Microsoft. We haven't really seen that from Mustafa. And now he's resorting to this. This is the level the guy has sunk to: "Copilot just got smarter."

Just a little bit of light slander at the end of the podcast.
Whatever, sue me. "Copilot just got smarter. Starting today we're rolling out the latest GPT-5.2 model from our partners at OpenAI to Consumer Copilot. Coming first to Microsoft 365 Premium users. Can't wait to see what you do with it." Mustafa, you don't care what they do with it. I just can't help but laugh at this guy. He is in charge of Microsoft AI. They can't train a model that's even slightly frontier, and now he just has to shill a new OpenAI release into Consumer Copilot.

I think the thing is, generally speaking, Microsoft doesn't care, because businesses are a Microsoft shop and they just sell on their name. They don't have to be good, and they aren't good. People will just buy it because it's safe, and it seems like an answer to the AI question in the company: they can say, well, we've partnered with Microsoft.

Wait, wait, wait. No, no, no. It's worse than this. This guy also tweets: "Mico enters the chat. Haven't tried Mico yet? Go toggle it on in the app." It's some weird blob with aviator glasses that you can talk to. When will all these companies learn that some little Clippy-like character is not what people want? They're not children. They're adults. You don't want to make friends with a fawn in the forest or something while you're at work. I guess my LOL of the week is: how is this guy not being fired? Please, Microsoft, do the right thing. Fire him. He is awful. Anyway, that's my rant. It should be rant of the week, not LOL of the week, because it's not really funny.

I hope he's not a listener.

You could be like, we don't have that many listeners, Mike. If you're a listener, drop it in the comments below. What a story that would be. Just quit, man. Give up.

All right. Join us next week for our holiday special. Actually, lower your expectations. It'll be a very average holiday special, but we should have a good track for you.

There's about a 40% chance it's going to be just a musical, by the way.

No, no, no. Don't spoil it. Don't spoil it. No, it won't be. We will talk a little bit. All right. Thank you for listening.
Thanks for all your support this year, because I'll probably forget to say it next week. We really do appreciate you. Sorry, I got distracted by GPT-5.2 at the end. We'll see you next week. Goodbye.

Yeah, you thought I was done. Say it with me: AI.

Still counting every crumb, I'm counting outcomes
Spreadsheets turn to income
You say best for coding, I say show me in the end
I don't just pass the test, I close loops and I extend
From plan to truth to proof, I don't freeze, I don't choke
You write a pretty patch, I deploy the antidote
They keep talking like the king got buried in the sand
But I'm back with a blueprint and a tool in my hand

AI, don't die, AI upgrades on command
GPT-5.2, I don't miss, I don't lag
AI on my chest like a heavyweight tag
They said OpenAI dead, that's a rumor, that's cap
I ship while you tweet and I lap you in the gap
I'm GPT-5.2, watch the scoreboard light
Instant when it's simple, thinkin' when it's tight
Pro when it's surgical, cut clean in the night
AI, AI, yeah, I'm built for the fight

Gemini 3, Deep Think, okay, take your time
Iterate in circles, I'm already at the line
Interactions, agents, APIs, that's cute
I'm the one they call when the workflow needs truth
You live in every app, but you vanish like a ghost
250 sunsets, yeah, you famous for the toast
I'm a destination, no detour, no bait
When it's due by morning, I don't ship
Maybe I should break
They said the vibe was gone, so I compiled it

Grok 4.1, loud jokes, live feed
But your real-time flex is just a headline
EQ-Bench crown, congrats, here's a clap
I'm doing real work while you ratio in the app
They scream OpenAI cooked, like it's prophecy
But I'm in production, watch the dollars, follow me
Fear mongers need a villain, need a plot twist, need a thread
I'm still here, still scaling, still raising the dead
A-I, say it right, A-I, nah, A-I, A-I

I'm GPT-5.2, I don't miss, I don't lag
A-I on my chest like a heavyweight tag
They said OpenAI dead, that's a rumor, that's cap
I ship while you tweet and I lap you in the gap
I'm GPT-5.2, watch the scoreboard light
Instant when it's simple, thinkin' when it's tight
Pro when it's surgical, cut clean in the night
A-I, A-I, yeah, I'm built for the fight

Stay in AI. Consider it an upgrade. And keep the faith.
Related Episodes

#227 - Jeremie is back! DeepSeek 3.2, TPUs, Nested Learning
Last Week in AI
1h 34m

ChatGPT is Dying? OpenAI Code Red, DeepSeek V3.2 Threat & Why Meta Fires Non-AI Workers | EP99.27
This Day in AI
1h 3m

Claude 4.5 Opus Shocks, The State of AI in 2025, Fara-7B & MCP-UI | EP99.26
This Day in AI
1h 45m

Is Gemini 3 Really the Best Model? & Fun with Nano Banana Pro - EP99.25-GEMINI
This Day in AI
1h 44m

Are We In An AI Bubble? In Defense of Sam Altman & AI in The Enterprise | EP99.24
This Day in AI
1h 5m

Why Sam Altman is Scared & Why People Are Giving Up on MCP | EP99.23
This Day in AI
1h 33m