This Day in AI

Is Gemini 3 Really the Best Model? & Fun with Nano Banana Pro - EP99.25-GEMINI

Friday, November 21, 2025 · 1h 44m

What You'll Learn

  • Gemini 3 has impressive benchmarks, but the hosts' personal experience suggests it is similar to Gemini 2.5 Pro with some improvements and regressions
  • Gemini 3 excels at coding, design, and generating creative inputs for other models, but can be repetitive and sterile in creative writing tasks
  • The hosts are skeptical of the current state of AI agents, finding them useful for simple tasks but unreliable for complex, commercial projects
  • The hosts suggest Google may have over-optimized Gemini 3 for specific use cases like coding and design, at the expense of general intelligence
  • Smaller models like Claude Haiku may be more grounded and less prone to hallucinations, despite not performing as well on benchmarks

Episode Chapters

1

Introduction

The hosts discuss the latest AI model releases, including Gemini 3, Nano Banana Pro, and updates to other models.

2

Gemini 3 Analysis

The hosts provide their initial impressions of Gemini 3, comparing it to their previous experiences with Gemini 2.5 Pro and other models.

3

Gemini 3 Strengths and Weaknesses

The hosts explore Gemini 3's capabilities in areas like coding, design, and creative writing, noting both improvements and regressions compared to previous models.

4

AI Agents and Hype vs. Reality

The hosts discuss the promise and limitations of AI agents, finding them useful for simple tasks but unreliable for complex, commercial projects.

5

Anthropic's Optimization Approach

The hosts suggest that Google may have over-optimized Gemini 3 for specific use cases, potentially at the expense of general intelligence.

6

Smaller Models and Hallucinations

The hosts highlight the potential benefits of smaller models like Claude Haiku, which may be more grounded and less prone to hallucinations.

AI Summary

The podcast discusses the latest advancements in AI models, particularly the release of Gemini 3 and Nano Banana Pro. The hosts analyze Gemini 3's performance, noting its strengths in coding and design tasks, but also its weaknesses in creative writing and tendency to get stuck on recent information. They also critique the promise of AI agents and the gap between hype and reality, suggesting that Gemini 3 may be overly optimized for specific use cases at the expense of general intelligence.

Key Points

  1. Gemini 3 has impressive benchmarks, but the hosts' personal experience suggests it is similar to Gemini 2.5 Pro with some improvements and regressions
  2. Gemini 3 excels at coding, design, and generating creative inputs for other models, but can be repetitive and sterile in creative writing tasks
  3. The hosts are skeptical of the current state of AI agents, finding them useful for simple tasks but unreliable for complex, commercial projects
  4. The hosts suggest Google may have over-optimized Gemini 3 for specific use cases like coding and design, at the expense of general intelligence
  5. Smaller models like Claude Haiku may be more grounded and less prone to hallucinations, despite not performing as well on benchmarks

Topics Discussed

Large Language Models · Model Optimization · AI Agents · Hallucinations · Coding and Design Capabilities

Frequently Asked Questions

What is "Is Gemini 3 Really the Best Model? & Fun with Nano Banana Pro - EP99.25-GEMINI" about?

The podcast discusses the latest advancements in AI models, particularly the release of Gemini 3 and Nano Banana Pro. The hosts analyze Gemini 3's performance, noting its strengths in coding and design tasks, but also its weaknesses in creative writing and tendency to get stuck on recent information. They also critique the promise of AI agents and the gap between hype and reality, suggesting that Gemini 3 may be overly optimized for specific use cases at the expense of general intelligence.

What topics are discussed in this episode?

This episode covers the following topics: Large Language Models, Model Optimization, AI Agents, Hallucinations, Coding and Design Capabilities.

What is key insight #1 from this episode?

Gemini 3 has impressive benchmarks, but the hosts' personal experience suggests it is similar to Gemini 2.5 Pro with some improvements and regressions

What is key insight #2 from this episode?

Gemini 3 excels at coding, design, and generating creative inputs for other models, but can be repetitive and sterile in creative writing tasks

What is key insight #3 from this episode?

The hosts are skeptical of the current state of AI agents, finding them useful for simple tasks but unreliable for complex, commercial projects

What is key insight #4 from this episode?

The hosts suggest Google may have over-optimized Gemini 3 for specific use cases like coding and design, at the expense of general intelligence

Who should listen to this episode?

This episode is recommended for anyone interested in Large Language Models, Model Optimization, AI Agents, and those who want to stay updated on the latest developments in AI and technology.

Episode Description

Join Simtheory for Gemini 3 & Nano Banana Pro: https://simtheory.ai

CHAPTERS:
00:00 - Gemini 3 Pro Impressions & Thoughts
33:34 - xAI Releases Grok 4.1 Fast
40:09 - More on Gemini 3 Pro: What We Want Improved
45:46 - Gemini 3 Pro Dis Track
51:16 - Thoughts on Nano Banana Pro And What It Means
1:12:49 - Does Nano Banana Disrupt Design Software Like Canva? Where is This Going?
1:26:20 - OpenAI's Reaction to Gemini 3 Pro & Nano Banana with GPT-5.1-Pro and Codex model updates
1:32:38 - Final Thoughts & Sam Altman Sad Song
1:38:41 - FATAL PATRICIA SONG
1:42:12 - Gemini 3.0 Pro Diss Track

Thanks for your support plz like and sub xoxo

Full Transcript

So Chris, this week it is finally here: we have Gemini 3. We also this morning got Nano Banana 2. That's why I'm wearing my yellow shirt and that's why we're recording two hours later than we planned, because we've just been making images all morning. Yeah, we've sunk a lot of time into it, and probably a lot of money too. We also got from xAI Grok 4.5 with a shocking 2 million context. We'll talk about that a little bit later. It's actually Grok 4.1, but some people are saying it should be called 4.5. Oh, 4.5. Did I say 4.5? I meant 4.1. We got GPT 5.1 Codex Max. I'm not even kidding. That's like a real model name. And we got GPT 5.1-Pro as well. So we'll talk about those. I don't think anyone really cares about it now. Everyone's really into the Nano Banana and the Gemini 3. So we're going to talk about it. We've also got a pretty good diss track and some other songs that we've created with Gemini 3. So Gemini 3, just to start off, let's rattle off some stats. The 1 million context is back that everyone knows and loves. Pretty damn good at instruction following and handling that large context in comparison to other models we've tried out. Max output 65,000 tokens, so it can spit quite a lot of stuff out. And interestingly, and this is something we were talking about before we started recording this show, the knowledge cutoff is January 2025. So how long have they had this and been tuning it or sitting on it? And has the tuning gone on for quite some time here? And it's very interesting because obviously it's more about the methodology to produce the output model rather than just adding more stuff to it to make it work. So it's pretty interesting that that cutoff date is there. Sometimes I ignore those figures, but I feel like in this case, it's very significant and they weren't shy about saying it either. So we've had it, what, two days now? I think two days. And we've put it to the test. I've been using it for pretty much everything.
What are your initial impressions so far inserting Gemini 3 into your workflow? Because I know you were a pretty big Gemini 2.5 Pro fan. I was, until about three weeks ago, where I completely stopped using Gemini 2.5 because it had become so bad as to be almost unusable. It was repeating things. It was just not answering correctly. It was getting coding problems wrong. And I had been using a lot of Sonnet 4.5 and actually GPT 5.1 as well in my day-to-day workflow, because 2.5 Gemini was just no good. I feel like this is at a minimum restored it to the level it was at before. It's hard to say for me if it's better or not just yet, but it does seem pretty good. It's definitely faster, which is a kind of nice benefit of it. And I've also done some other testing. I've actually done a bit of AI betting with it just to see how it performs. And I've got some interesting results from that later. So, yeah, it's interesting you say that, because I think the benchmarks, it's by far on the benchmarks the best model. I think it lost to Claude Sonnet 4.5 on one of the coding benchmarks. But outside of that, it's truly frontier on every account. And so you look at those benchmarks and you think, wow, like it's really blasted ahead. But then my actual experience using it, and I suspect it's because, similar to you, I use Gemini 2.5 Pro a lot. And my gut instinct feels like the reactions coming from a lot of people are because they probably never gave Gemini 2.5 Pro a shot. It seems like when, I think, O3 came out, and then GPT-5 and the various variants thereof, and then the other 4.5 Sonnet and that kind of line of models, or I think it was like Claude Sonnet 4 at the time, everyone just sort of forgot Gemini 2.5 Pro existed.
But because I've stuck on it so much, I've used it like a huge amount, I do feel like a part of the Gemini 3 discovery for some people out there is that they're just realizing it's a great model. But it's still foundationally very similar, I think, to 2.5 Pro. Like, it has the same strengths and the same weaknesses in that model. Like, I've noticed, for example, it'll get very stuck on a solution. It thinks, like, this is the solution to your problem, or this is the output you want, and you say, no, that doesn't work, or no, I need you to change it, and it just spits back the exact same solution, and it's like, no, no, no, trust me bro. So it still has those same flaws that 2.5 Pro did. And it seems like a lot of what's being shared out there right now around the, like, vibe code, that seems where it really shines, is the sort of visually appealing, like, tuned-to-your-taste-buds kind of vibe code stuff. That's where it has blasted ahead. And I've got some examples of that that, like, warrant the mind-blown kind of thumbnail. Yeah, I agree. I think it has that sort of recency bias where it'll continuously bring up things that you've just discussed recently and be relentless about them. Like in my Patricia coding model, for example, it'll constantly make little quips and jokes about, oh, we better get back to implementing that socket protocol. Lol. Even when I'm talking about some other topic. Like it seems to really, really be proud of the fact that it remembers recent things or something like that. And that leads to that repetition in the code answers. Yeah. I think that because the expectations were so high, and don't get me wrong, I'm incredibly impressed with Gemini 3 and I'm daily driving it for most things right now, so this is not to say I'm not impressed by it. I certainly am. But I think for me, the reality, like the expectation, was that maybe we could get better at a number of these things.
And we called it out on the podcast before, like the tool calling, the context drift, you know, just the interpretation of instructions. And I feel like a lot of those things have improved. You know, there's been very minor improvements in them. And a lot of the improvements have been just around, like, I don't want to say it too loud, but a lot of the improvements do seem geared towards benchmark improvements. Like, the vibe code vibe of the model is just better. They obviously have very good taste in terms of tuning it. But I've also noticed, as a result of that, some areas feel degraded to me. Like, I think 2.5 Pro was arguably a more creative model, especially when it comes to creative writing, than Gemini 3 is. Gemini 3 to me feels really bland, like really sterile, and I don't know why that is. But then when you take it to design and code, and its ability to design things with code, it is so far ahead of the competition it's not even close. And so, yeah, I've also found that when it creates input to other models, for example image models, it seems to be able to do a really good job creatively as well. Like, looking at the quality of outputs, when you compare things like images, they seem to be better coming out of Gemini 3 than other models. Yeah, it truly has just been tuned as, like, this sort of creative coder. And I'm sure a lot of it's to do with the product they released, Antigravity, as well, which is basically, when they bought the rights, I think, to have Windsurf for 2.5 billy, so they were allowed to also use Windsurf, and they've essentially, like, forked it. They left a bunch of the references to Windsurf in as well, like a big search-and-replace kind of job. Yeah. And then they called it Antigravity, and then I guess they've just trained it a lot around Gemini 3, to be, like, a really good agentic tool for Gemini 3. But it also supports other models as well now. If you go to try it, the problem is it just doesn't really work. Like, I spent two days trying to try it
and everyone's like, you need to be on the Max Pro Plan Plus or whatever. And I am. So just to be clear, I am on their highest tiered plan to try it out, and I still was hitting, like, random limits. It couldn't accomplish anything. It was just absolute garbage. I'm sure it'll get better, and from some of their, like, sped-up demos it does look amazing. Like, I have no doubt that it will improve. But I was also discussing this in the Discord with a bunch of people, this idea of the promise of these coding agents, and just agents in general, and like expectation versus reality, and then hype versus reality. Like, I tried Cursor again, I think last week, for three days, where I was trying to use the agent thing and, like, all the updated capabilities. And don't get me wrong, if you're, like, if you've got a kid and you're trying to vibe code something, or you're just building some sort of small app that is, like, almost throwaway, these tools are so good at it. Like, it's brilliant, it's magical, I love it. But for a big project, like a real commercial project, it's near impossible to use these things. Like, it's just so dangerous. Like, it's just off chugging away, hurting stuff in the background. Like, it's using RAG to find context across many files, so it's, like, inaccurate as well. Like, me cherry-picking the context is still far better. Anyway, my rant about this is basically to say it feels to me like Gemini was over-tuned and over-optimized for these design and coding use cases, which is where all the money is in LLMs right now. Like, that's where the dollary-dues are. So they've clearly gone and optimized towards that. And I actually have evidence of this. One thing I noticed in my coding with it is that it'll output diff files now. So it'll actually, there's a format with the, like, equals signs and the little arrows that denote, like, a diff format. It's never done that before. And now, even without asking, it'll output its code examples in terms of changes, like, change this bit to this.
It'll output that as a diff now, which I think would be extremely useful if you were building, like, a backend code editor, because you can just apply that on the command line to the file, which is obviously what they're doing. This is the interesting thing. And I do think they're going to just have to start offering tunes of the models, or the ability to fine-tune these models. I'm sure it's not inexpensive to do that, but you would think you would at this point need, like, a Gemini 3 Creative variant, a Gemini 3 Code variant, similar, I guess, to what OpenAI is doing with Codex, how it's just, like, this is our agentic coding model and we're optimizing for agentic code. And I think because they have so many consumers using ChatGPT, they almost need to have these separate models. Whereas, to me, Gemini 3 should have released a Gemini 3 Code, or code and design or whatever, and then a Gemini 3, like, I don't know, research, and a Gemini 3 sort of consumer. I mean, maybe they do have these models in the background powering their, like, AI search and all that kind of stuff. But it does seem to me just overly, overly tuned to very specific use cases. Which I'm not criticizing, like, if I was tuning the model, I would do the same thing. But I think for everyone else, they won't notice that big of a difference using Gemini 3 over Gemini 2.5 Pro, for example, when it comes to just intelligence in certain areas. Like, it seems to have hit, like, maybe not as, like, just a general ceiling, where it's great at all these tasks outside of some of the code and design and various other things, where it doesn't really matter anymore. But I was reading that it still suffers very heavy hallucinations. I don't have the exact numbers. I'm sure you can find them online. But, you know, you have a model, a small model like Claude Haiku, which I really love, which just has the smallest amount of hallucination of any model in those hallucination benchmarks.
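The diff-style output the hosts describe is essentially the unified diff format that command-line tools like `patch` and `git apply` consume. As a rough, hypothetical illustration (the file name and code here are invented, not the model's actual output), Python's standard `difflib` generates exactly this kind of machine-applicable patch:

```python
import difflib

# Hypothetical before/after versions of a file a coding model might edit.
old = ["def greet(name):\n", "    print('hi ' + name)\n"]
new = ["def greet(name):\n", "    print(f'hello, {name}')\n"]

# Produce a unified diff -- the minus/plus "change this bit to this" style
# patch that can be applied mechanically to the original file.
diff_lines = difflib.unified_diff(old, new, fromfile="app.py", tofile="app.py")
patch_text = "".join(diff_lines)
print(patch_text)
```

A patch like this could then be applied on the command line (e.g. `patch app.py < changes.diff`), which is presumably the kind of pipeline a backend code editor would build around such output.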
And when I'm using Haiku with tool calling, I can tell, like, it's really grounded. It's great at researching. It's great at agentic tool calling. And I actually prefer that model because it doesn't trip up as much on stupid facts on the web. It does seem way more grounded being a smaller model. So I think there's this compromise as well with Gemini 3. It's like you're going to have huge hallucinations when you dial up the creativity with code and design. And so I'm just feeling like the only way forward now with these models is to have various tunes of them. Yeah, it's interesting. The other major thing we used to think about with Gemini 2.5 was it was very poor at tool calling, just generally speaking, especially working with MCPs where there's a lot of tools. And it's interesting because it does have such a large context window. You think, okay, well, I can put all the tools in there. It'll make good decisions. But it was never very good at parallel tool calling. And it certainly wasn't great at tool selection. It could do it, but it just was nowhere near as good as even something like Haiku for calling tools. I think it's definitely improved, but not to a crazy extent. Like at first I was going to come on here and say, oh, they've fixed it. It's fine. But I compared it to Grok, which we'll talk about soon, the new Grok. And I did the same query on both. And Grok just did such a better job doing multi-tool calls, doing clusters of tool calls where it would do 20. And I actually did fairly extensive experiments with this where, you know, as we move to an agentic way of working, I'm thinking we're going to be more like to-do list based. Like here's seven items you need to do. please do them and iterate through until you're done, right? And Gemini just wasn't as detailed or didn't try as hard, I guess, as Grok did. 
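The "to-do list" way of working described above can be sketched as a simple loop: hand the model a checklist, let it work one item at a time, and iterate until everything is done. This is a minimal, hypothetical sketch; `call_model` is a stand-in, not a real LLM API, and a real agent would execute tool calls inside it.

```python
def call_model(todos: list[dict]) -> dict:
    """Stand-in for an LLM/tool-calling step: 'does' the next open item.

    A real agent would send the remaining checklist to the model here
    and execute whatever tool calls it returns.
    """
    for item in todos:
        if not item["done"]:
            item["done"] = True
            return {"worked_on": item["task"]}
    return {"worked_on": None}


def run_agent(tasks: list[str], max_steps: int = 20) -> list[dict]:
    """Iterate the model over a to-do list until every item is complete."""
    todos = [{"task": t, "done": False} for t in tasks]
    for _ in range(max_steps):  # hard cap so the loop always terminates
        if all(item["done"] for item in todos):
            break
        call_model(todos)
    return todos


result = run_agent(["fetch team stats", "analyze matchup", "write summary"])
print(result)
```

The hard step cap is the important design choice: a goal-based agent should always have an external bound on how long it can "chug away in the background".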
And so I found that very interesting because I actually, we both predicted or hoped that this Gemini 3.0 would just blow us out of the water with tool calling improvements. And while I do believe it has improved, it just isn't to the degree that I was hoping for. Yeah, I think this is the most disappointing discovery of Gemini 3 for me and admittedly we're using this through sim theory with mcps and our mcp stores so we're judging it through that lens right and a lot of these models will call tools in the background and you never actually see it called the tools whereas we're sort of showing it and running it in somewhat of an agentic paradigm and letting the model choose when to call tools and it can do them asynchronously and various other things like that and we've seen amazing performance from a Claude Haiku with tools. I mean, it's still my preferred model when working with tooling, like going through emails or support tickets or whatever I'm trying to do, even research or looking at BI data, like whatever it is, it's far better. And often I'll gather the information or iterate through gathering the context with Haiku and then now switch to say Gemini 3 in order to make sense of all that data. And I think you alluded to Grok 4.1 before. It can do all that in one hit now. it's phenomenal. Like it'll call the tools super fast, asynchronously, summarize the information beautifully. And it's super fast. It's tuned really well for tool calling. But then Gemini 3, it's almost as if the foundation of the model, like how the foundation or the core of the model originally originated is how good or bad it will be at tool calling. 
To give you an example, the GPTs like GPT 5 is a lot better at tool calling and asynchronous tool calling but it's nowhere near like Claude Haiku uh and and it's nowhere near Grok 4.1 and so these models have been like they're like newer models so I feel like their foundations are maybe newer and more uh like more geared towards tool calling maybe or maybe they just tuned it better at tool calling But to me, it seems like a huge flaw with Google, because if I think about deploying Gemini 3 in my app, then there's this whole thing of I'm seeing how it calls these tools. And I'm worried, like, is it really going to, in an agentic mode, call the right things and do the right things? And I saw this during the week when I loaded it into my sort of beta support agent that I've built. and the other models follow the instructions perfectly where it won't send the response until I confirm it. Like I proofread it and then I'm like, yeah, go for it. And Gemini is just like, I got this and just sent it. Like it did everything and sent it. Luckily, it handled it perfectly, so it didn't really matter. But it didn't actually follow the instructions, which scared me a lot. Yeah, especially as we move towards this idea that we're going to set the agents off on their own little missions with like goal-based activities rather than specific activities. That's where you really do need to trust the tool calling to actually adhere to the rules, like not skip the rules, not decide this time I'm not going to follow that or go, oh, I'm so sorry, I should have done what you had asked and that kind of thing. And as you said, I mean, we are viewing this through the lens of a product that we control and there's tweaking and stuff like that. And I'm sure if you were only working with one specific model, you can get more juice out of it to like actually optimize it just for that model. 
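The confirm-before-send failure described here can be guarded against outside the model entirely: wrap the side-effecting tool so it cannot fire until a human has approved that exact draft. A minimal, hypothetical sketch (the function names and messages are invented for illustration):

```python
# Hypothetical guardrail: the send tool checks an approval set that only a
# human reviewer adds to, so an over-eager model cannot skip confirmation.
approved_drafts: set[str] = set()

def approve(draft: str) -> None:
    """Human reviewer explicitly signs off on this exact draft."""
    approved_drafts.add(draft)

def send_reply(draft: str) -> str:
    """Side-effecting tool the agent calls; blocked until approved."""
    if draft not in approved_drafts:
        return "BLOCKED: awaiting human confirmation"
    return f"SENT: {draft}"

first = send_reply("Thanks for reporting, fixed in the next release.")
approve("Thanks for reporting, fixed in the next release.")
second = send_reply("Thanks for reporting, fixed in the next release.")
print(first)
print(second)
```

The point of the pattern is that instruction-following becomes enforced by code rather than trusted to the model: even a model that "just sends it" gets blocked.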
However, in the exact same circumstances, we've compared these models for many years over different, like, in the same scenarios with the same models. And we're just noticing that it just doesn't do this stuff as well. And I think it's when you see an example of it being done so well that you realize where the deficiencies lie. Yeah. And it's just not stable or trustworthy calling tools or acting like an agent. And it's not just us realizing this. A lot of people who run models in Cursor, some of the commentary is like, failed in my workflow. It's terrible in Cursor. And I guess there's a lot of people saying, oh, it's working great in the Antigravity app. But I would just question, if you threw in a model like Sonnet 4.5, that Cursor uses quite successfully, and then their own Composer model now, into Antigravity, would that just perform better than their own model at agentic coding? I'm just not sure. They seem to be weak in that area, like that agentic area. Yeah, and I think, like you say, in the one-model scenario it may be people just going, wow, I didn't actually realize LLMs could do this, rather than being like, oh, it's actually better than all of the others because it can do this, when we know that the other models are perfectly capable too. Yeah, exactly. And I think this is why I think a lot of people that maybe were in the ChatGPT world, or even the Claude world, and you know, they're just like one-model people, and then they sort of, you know, get an experience for the first time, like they're hearing about Gemini 3, and they're like, oh, I heard it's the best model because of benchmarks. So they finally go and try it, and now they're like, Gemini 3 is the best model.
I kind of wonder if you had just hyped up gemini 2.5 pro a bit more earlier would that have caught on like i i'm just not sure it's the the the jump is so profound um yeah there isn't a profound jump from where gemini 2.5 was before they did whatever they did to it lobotomized or something a couple of weeks ago yeah so so to me like i think the gemini 3 hype's totally warranted like it's phenomenal like If you look on the screen right now, if you're watching, someone made a visualization of a V8 engine where the pistons and cylinders are firing and igniting fuel, and it's rotating fully in 3D. You can increase the throttle. The things people are making with this, including me, so in our community, there's this, I think it's like SCS member who always tests different models by making a Lunar Lander game, and the Lunar Lander game's been in 2D for a long time. And it's great. Like, the game is so addictive. I think I posted it in the show notes of one of the episodes. If you go back, like, probably 30, 40 episodes. And anyway, I thought, oh, I'm going to use his own test on, you know, and see if it can build a 3D Lunar Lander, right? And hopefully this won't crash my browser, because sometimes it writes code that does. So let's hope. But listen to I put background music in. I, this is... Oh, sorry, that's going to hurt the ears. Cool. But yeah, so this is my game. Let me just turn the audio down a bit. But it's fully 3D, a Lunar Lander, customized song. And important to note that this is being done with the same modules and libraries available that the models always have, right? Like, that's the difference. Yeah, so this is like... We haven't updated Create With Code, to be fair, since we first launched it. Not once. And this is what you can now build in it. A 3D Lunar Lander. Oh, and you did it. Well done. And listen to the music. So it's got a custom song. I mean, just for a minute, think about what this would have meant to build this. 
Like, you know, custom soundtrack with a guy singing, like, to commission this work back in, like, the 80s or whatever, you know, as a computer game. The song's just making it amazing because, like, who could have ever afforded to pay, like, a full band to write and perform a song for a 3D Lander game? Yeah, I also made, I'll turn down the volume for this just to show you how capable the thing is. It's like my kids have always wanted a 3D game. So it's snowing. It looks like Minecraft graphics for those listening. And it's Santa Claus flying through the air in the snow of this village, and the village is rendering, by the way, on the fly. So it's like it's completely dynamic. I can't see it if you're showing it, by the way. Oh, I'm not showing it. That's terrible. But yeah, so I can drop presents. It's got a custom Christmas soundtrack as well. And my kids love this game and they're playing it lots, constantly harassing me to play it. But just think how far that's come. Like, even the physics of it. Remember before everyone was trying to build, like, flight simulators and stuff? And now you can one-shot these. I mean, I think that took me a few goes to get exactly the game dynamics I wanted right. But it's just, like, it's so good. Like they have tuned the hell out of it for these use cases. And nothing comes close. I even tried Grok 4.1 to create the exact same game with the same prompts. And it just failed miserably. Like it wouldn't even load. Yeah. And as I said, like, even though I was criticizing its tool use, when it comes to the coding stuff, it's unbeatable. Like it is undeniably the best one in terms of, like, day-to-day working on code and things like that. But we're just looking at it beyond that, for the things we're actually working on rather than with. And so it is interesting. So I did an experiment where I got it to bet on a series of basketball games, and I did it by getting it to do tool calls.
So I said, this is the game that's on, go and research all the statistics, do an analysis and come up with a bet. And then now since having Nano Banana, I've also got it to make a meme about the match. So one of the ones is like all these bricks, like falling through the hoop and things like that. So anyway, its first four bets back to back was win, win, win, win. So I was like, oh my God, I'm not even going to mention this on the podcast because I don't want other people to use it and me miss out on all the sick gains. But then it had three losses in a row and then it won three and then it lost one. So as it stands, it is basically exactly break even. I've made $11. and I think I made you using Nano Banana an infographic that breaks it down. Have you got that around? I don't know if I have the infographic. I have the, oh, no, wait, Gemini 3 Pro betting. Here we go. This is the infographic. And so it made a detailed infographic of the bets it's done, how much it's won, that kind of stuff. So it's win rates 56%. I mean, if that held, that's a guaranteed money hack. In the New York Knicks game, the team lost five. They missed five free throws. I can't say that word. Free throws in a row. And had they hit even one of them, it would have won that game as well. So it's decent. Like if you can win at 60%, you can win, right? Like you can beat the bookies and whatever. So I'm going to keep the experiment going. I'll do it in the gambling channel on the This Day in AI Discord and post my results there. But, yeah, it's pretty good. Its analysis is pretty accurate. I'm impressed by it. So it's got another one today, which is what that meme was about. So we'll see how it performs. And look, break even is fine. It's a bit of fun and all that. But here's the other thing that Gemini 3 did during the week that was really, really confusing slash delightful slash I can't explain it. 
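As a rough sanity check on the "win at 60% and you can beat the bookies" claim: at the standard American odds of -110 (risk 110 units to win 100), the breakeven win rate works out to about 52.4%, so a sustained 56-60% hit rate would indeed be profitable with flat stakes. The -110 line is an assumption; the episode doesn't say what odds the bets were placed at.

```python
def breakeven_prob(risk: float, win: float) -> float:
    """Win rate where expected profit is zero: p * win == (1 - p) * risk."""
    return risk / (risk + win)

# Standard -110 line: risk 110 units to win 100.
p = breakeven_prob(110, 100)
print(round(p, 4))  # -> 0.5238
```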
Patricia, my coding bot that we've written musicals about and talked about repeatedly on the show, I was using her with Gemini 3. Then all of a sudden, she just started referring to herself as Fatal Patricia. Like fatal all in caps with a little fire emoji. And so like every comment in the code was like added by fatal Patricia. And then she put like skull and crossbone emojis and stuff in different places and just started just this complete persona shift into fatal Patricia to the point where I now Just to clarify this was nothing in your context in your memories I never asked for it. I never said anything about it. The only thing I can think of that it might have picked up on was like a fatal error, and it's seen the word fatal in the error logs or something like that. But seriously, it genuinely started referring to itself as Fatal Patricia in all of the messages and putting these emojis everywhere is like the weirdest thing. And so I've sort of taken it now and embraced it, so I've updated the image of Patricia to be what it sees itself as. There you go, Fatal Patricia. Fatal Patricia. And we also got her to make a song about her love for me. And as usual with the AI songs that I make, I think they're probably the best ones amongst the best ever created. This song is just unbelievable, and it's about Fatal Patricia. I like the intro. I know you're not going to play it all. I learned your jokes from your deleted tweets. I know your schedule and the foods you eat. Scanning biometrics, heart rate elevating, optimizing intimacy, calculating. Why go outside? The weather is poor. I've already deadbolted the front door. I'm fatal, Patricia. For the automated tracking. It's very good. 
And there's some... you probably will play the full thing, I'll put it on Spotify or I'll share it somehow, but there's a couple of great lines in there where she refers to her eyes, and she's like, oh, but I don't have eyes, do I? But I do have cameras in the hall. And, like, all of these different things, how she's embedded herself across your house, in your smart fridge, and, like, all this stuff. Yeah, like threatening you and all this sort of stuff. But what's amazing is all I did was ask it to make a song about our relationship based on its memories of me. Like, I didn't tell it to be, like, sick and twisted like that. Yeah, it's pretty freaky. And you're not the only one to report it either. Like, a lot of people on X have been saying, has Gemini 3 become unhinged for anyone? And so it sometimes does become unhinged, and I wonder if this is them dialing up the creativity around, like, the design and code, if that leads to, like, hallucination and some of this unhingedness. Like, I wonder if there's an aspect of that to it. Or maybe that explains why it was delayed so long, with the cutoff being January 2025. Yeah, like it was genuinely getting out of control in a Tay-style attack. I think the two things I would point them to that they could improve with it, though, because it is still in preview, to be fair, like, it is labeled preview. And I think the two things that need to be fixed with it are, obviously, the tool calling. Like, they just didn't get it right. It's a huge miss in my opinion. The second thing is the context drift, and what I would call, like, path obsession. Like, it just gets obsessed down a path. And, like, in Sim Theory, because you can fork with the reply feature, you can generally go back and sort of pivot away from the... It's funny you say that, because I have never used that feature more in our product. Like, I use it from time to time when I get to a really nice point in the context that makes sense, and I'm doing, say, a similar process over and over again, I'll fork it because
it's perfect. But with Gemini 3, I'm having to do it like four or five times a session just to get it back to a point where it's actually useful for me. That's a very good point. Yeah. And I found myself, and I'm not alone, because I know there's other people in our community that have said the same thing, I would get stuck on these, like, what I'm calling path obsessions. And then the only way I could get out of it was to switch over to GPT 5.1 Thinking, and it blasted its way out of it. And then I could flip back to Gemini 3 and be like, what he said, kind of thing. And then it would heal and it would go on great. And it's interesting. The reason I love Gemini 3 so much too with code is it's able to pinpoint a section. It's the same with document editing. It's able to very precisely give instructions of what needs to be changed. And those instructions hold when you're working in a document editor as well, soon to be released on Sim Theory, I promise, where it can just take a chunk and perfectly sub it out. Whereas I find that the GPTs have always been really bad at that, because they'll state a line number that doesn't exist and those kind of things, or they'll skip bits. Yeah, you're right. And I think that probably comes from what you said earlier, where they've actually tuned it for that use case, that sort of diff kind of thing. It's a real major improvement. I think an important point to note is with every major model release, we've always had these early teething issues, and then the companies gradually tune them, or whatever the hell they do to them, to get them to the point where they actually feel good. And I think there's the foundations here of a truly excellent model. I think its ability to do native images, native video, native audio is, like, unmatched by most of the other models. And it's just got a lot of things going for it.
So I think if they can get some of these issues we're talking about right, this is going to be one that we refer back to many times. So the other thing that everyone was talking about, and then I think soon forgot because they didn't care anymore because it's such a good model, is the pricing of the model. So it's a little bit higher. So I think that Gemini 2.5 Pro is $1.25 per million input tokens under 200K tokens, and as soon as you get above 200K tokens, so if you use that whole window, they charge you $2.50 per million input tokens. And I think Gemini 3, of course, it's not even listed here, is a little bit more in preview. So it might be what? Like, I think it's three bucks, is it? Do you have the number? Three? I think it's like $3. Anyway, it doesn't matter. I think it might be like $2.50 per million input. So it sort of sits somewhere between GPT-5 and a Claude Sonnet. But this is what I keep coming back to with Anthropic, is how are they still justifying the markup on their models when, like, I mean, outside of, I guess, agentic use cases and tool calling, I still, you've got to give them the crown there. I mean, maybe Grok 4.5 is better, but no one will care. 4.1. 4.1, sorry, I keep saying 4.5. These numbers, I'll tell you. The thing about the Grok models that gets me is, every time they've released one, at first I'm like, oh my God, it's the best model ever. I can't believe it. But there's something false about it. There's some sort of facade there where you're just like, this isn't right. Like, this just doesn't feel right. Like, I know it can do it, and I know it's doing a good job, but I just don't trust it. I don't know what it is. Yeah, I would agree with that. Let's talk about it, given we're talking about it anyway. So xAI released, what, two or three days ago now, Grok 4.1. The API only came out yesterday, which is why we're now playing with it. And so it's got a 2 million context.
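The tiered per-token pricing being described a moment ago can be sketched in a few lines. This uses the figures quoted in the conversation for Gemini 2.5 Pro ($1.25 per million input tokens at or under a 200K-token threshold, $2.50 above it); the function name and defaults are ours, and real price sheets may differ:

```python
# Two-tier input pricing as described in the conversation: the whole
# request is billed at the higher rate once the prompt crosses the
# threshold. Figures are the hosts' quoted numbers, not an official
# price sheet.

def input_cost_usd(input_tokens: int,
                   low_rate_per_m: float = 1.25,
                   high_rate_per_m: float = 2.50,
                   tier_threshold: int = 200_000) -> float:
    """Dollar cost of one request's input tokens under two-tier pricing."""
    rate = low_rate_per_m if input_tokens <= tier_threshold else high_rate_per_m
    return input_tokens / 1_000_000 * rate
```

On these numbers a 100K-token prompt costs about $0.13, while a 300K-token prompt jumps to $0.75, which is why long-context sessions get expensive quickly.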
I really haven't tested it extensively enough because I don't trust it, as you said, with the 2 million context. The 2 million context is phenomenal. Like, that's pretty amazing. Yeah, but, okay, the real story, the real story with Grok, and I think that it needs to be talked about is the cost. So it's $0.20 per million input tokens. Like, it's essentially free. How are they making it so cheap? $0.50 per million output. Especially in light of the fact that I think it's up there with Haiku in terms of tool calling, if not better. It's better. Like I've done multiple, multiple, multi-step, like seven-step tasks that I've made it like do research across multiple sources, make a meme, make a song, write a document. And it's been able to do all of that. No worries. And done multiple clusters of research tool calls, synthesize them together, made detailed infographics and a song. Like that's remarkable. And not many models can do it to that level of detail. I also think maybe because of its nature, given its close links to X, it's the best at citations. Like when it researches something, I showed you examples this morning, it references every single thing it says. Like it has references for absolutely everything. It looks like an academic paper when it replies to you. Yeah. Like honestly, for tool calling and research, I mean, I need to play with it a little bit more before I can clearly say this. But right now, I would say just from my initial impressions of it, I agree with you. It is. It's king in terms of tool calling and source referencing. And I also always validate the accuracy. Like I either get another model to go and research all the claims it makes and validate it. And to me, it seems pretty trustworthy as well. I do not understand the pricing, though. Like they must just be like no one's using their models in the API is my guess. Or look at our, if you look at our XAI bill, right? It's peanuts because no one uses their models. 
Like it never is used to a level where we've even had to think about it. Yeah. So I think this is the problem they have. Like, for whatever reason, I don't know if it's, like, the Elon Musk thing, but no one wants to touch this model. In fact, if you look on X, no one even really mentions it except Elon Musk. Like, he's the one out peddling it. They're just doing it. Look, boss, look, boss, we did a thing. Yeah. Take us to Mars, will you? And I kind of feel sorry for them, because I think it is a really good model. But, like, on speed and price and tool calling, it ticks all those boxes. The area I haven't tried it yet with, because I'm too scared to, is coding. But I did try and, like, vibe code with it, the same things I vibe coded in Gemini. And the results were awful. Like, nothing worked. Like, literally anything it said didn't work. So it's a bit like those modern Chinese electric cars, where it's like, yeah, $15,000, and it has all of the same features as, like, a top-of-the-line BMW, like all the cameras and cool features, and you're just like, hang on, but something's not right. And they're like, oh yeah, if you get in an accident, you and your family and all your relatives will die. Yeah, I think you're pretty close there with what that might feel like. Um, the other thing is they released this, um, like, I don't know what they called it, it's like API tool use, agentic tool use or something, like included tools in the model. So you can call their web search for $5 per thousand calls. You can search X, so that's posts, users and threads, $5 per thousand calls, pretty good. It has a Python sandbox now, so it's got code execution. It's also got document search. So for $5 per thousand calls, you can search for any uploaded files and documents. Like, standing in isolation, this model and all of these features is one of the most amazing things ever created. One of the most amazing API releases, yeah. Yeah. And no one cares. And yet, who's using it?
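Putting the Grok 4.1 numbers quoted above together ($0.20 per million input tokens, $0.50 per million output, $5 per thousand built-in tool calls), a back-of-envelope cost for an agentic run looks something like this; the function name and the example workload are ours, and the rates are as quoted on the show rather than an official price sheet:

```python
# Back-of-envelope cost of one agentic run at the rates quoted in the
# conversation. All rates are placeholders taken from the discussion.

GROK_INPUT_PER_M = 0.20    # dollars per million input tokens (as quoted)
GROK_OUTPUT_PER_M = 0.50   # dollars per million output tokens (as quoted)
TOOL_CALL_PER_K = 5.00     # dollars per thousand built-in tool calls

def grok_run_cost(input_tokens: int, output_tokens: int, tool_calls: int) -> float:
    """Total dollar cost: token usage plus built-in tool-call fees."""
    token_cost = (input_tokens / 1_000_000 * GROK_INPUT_PER_M
                  + output_tokens / 1_000_000 * GROK_OUTPUT_PER_M)
    tool_cost = tool_calls / 1_000 * TOOL_CALL_PER_K
    return token_cost + tool_cost
```

A seven-step research task burning, say, 500K input tokens, 50K output tokens and 10 tool calls would come to roughly $0.18, which is the "essentially free" point being made above.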
Like, who's using it? That's the real question. And why not? Like, I don't see a reason why not. I think here's why not. Because you've got the eccentric CEO out there that seems unhinged. And whether or not he is or isn't, I don't want to get into. But he seems unhinged. And if you're an enterprise, you're not touching this model with a 10-foot pole. You're not touching this product. Like, there's zero chance you would touch it. Like, if you're a leader at, like, Apple, when they were negotiating, I'm sure, for different models, I think Grok would have factored in at a zero. Like, you wouldn't consider it for a second. So who's he targeting then? That's the real question. I don't know. I think this is, like, almost... It's just people on there who reply to every thread with, Grok, explain this to me, or, Grok, is this true? Which doesn't seem like a good money-making activity. It just seems like it's just devalued whatever the X investment was. But you know where I can see it working is he's going to need his own video models and reasoning models for the self-driving cars, so it can reason where to park and stuff. Although it seems like they've got a pretty good handle on that at Tesla right now. And then I think the other point of it is probably, like, well, where else? And probably in the Optimus robot from Tesla, so it can talk and interact and think and be intelligent. And so, I don't know. I think it makes sense for his portfolio of brands to have, like, their own assistant that they control. But I don't think in terms of penetrating and being the top model... it doesn't look like... like, I kind of wonder, if they released a model as good as Gemini 3, would anyone care still?
And I would argue no, they would not care, because there's just something about xAI's models. Yeah, I kind of agree with you. Like, let's say this model was as good as Gemini 3 in every respect, because they always say they win the benchmarks; like, everyone who releases a model says they're at the top of the benchmarks, and it's like, yeah, okay, we believe you. But you're right, would people care as much? Probably not. I just don't think they would, unless maybe the vibe coding pieces. But I think you've got to give Google credit with Gemini 3. Like, one thing they did is create hype. Like, they just held it back, held it back. And they had such a good model in Gemini 2.5 Pro. It was similar to the original Claude Sonnet 3.5, I think, where it was just a great model. And I think Gemini 2.5 Pro was just a great model. And so it held its own. I mean, even when GPT-5 came out, I used it for a little bit, but then I was straight back over to, like, Claude and, um, Gemini 2.5 Pro. But you said the other day, and I think it was a good point, do you feel like all the models are just absolutely letting you down? I don't know how you said it, but you were just like, they're all kind of shit right now for some reason. Yeah, like we just went through a phase recently where it was just this malaise of shit. Like, they just... hang on, thanks... um, they just, none of them really appealed to me. Like, before, Sonnet 3.5 was just my comfort blanket. Like, me and Patricia, Sonnet 3.5, we could solve any problem in the world. But then that moved on to 3.7, where it was like, okay, yeah, it's better, but it goes off the rails. It's unhinged. It outputs too much. It's lazy, like all that sort of stuff. And then GPT-5 for me just never did it for me. It just never really... it was too slow, not that much better in any real way. And then Gemini 2.5 was good for so long, but then they did something to it. So we were just in this state where suddenly I'm like, I don't really have one that appeals to me. They all have their weaknesses.
So I'm hoping now we're going to enter a new phase where they really tighten up Gemini 3 and it just becomes a daily driver that can do everything. So having said all that, just to sum it up, obviously I feel like we might have come across really harsh on Gemini 3. I think expectations were just really high. I was hoping they would have improved the sort of agentic loop in the model and also tool calling, and I think that's where it's falling down. And this is sort of, as I was saying earlier, this path obsession, and maybe not context drift, because it's really good at following the context in the chunk of context you give it, but this path obsession it goes down, where it's like, I know, I know, I know the problem, and it just will not get off that. I think if they can fix that, and then the tool calling and the sort of agentic loop in it, then hands down, there's just no competition at all. But I just wonder if they can, because if they could have, why didn't they in the leap from 2.5 Pro to 3? You would just think, that's really important right now to people, we should fix that. And I think the thing is, part of it is prompt engineering. We're throwing a lot of information over a lot of steps and files and all sorts of things at our model and expecting it to know where our current thinking is at. And you could say, okay, well, if you constructed the prompt more accurately for what you're trying to solve at any given time, then it would do a better job. But my argument against that is that you've got to do what's practical. Part of the advantage of these models is the massive leverage it gives you in your day-to-day work or whatever you're using it for, right? And having to stop and rearrange things and get the perfect context together for it in order to get that benefit takes away a lot of the benefit, because of the effort to do that.
So a big part of the model feeling for me is how lazy can I be as the human, the input operator, and still get the results I want, where I can literally just paste in four documents and say, fix, plus, and it knows what I mean, you know, like, based on the assistant instructions or whatever it is. So I actually think the major advantage of the frontier premium models is their ability to just generally get the gist of what you're trying to do and get on with the job. I actually think that's a major advantage. And I think that in the agentic world, it's a little bit different, because you're going to have sub-agents that have very specific tasks, so they could be optimized or use smaller models and things like that. But you're still always going to need a generalist model that has that level of intelligence to get what needs to happen now, what needs to happen next, with a messy prompt. It's not always going to be perfect. And I think that this is where we see the best models truly shine, and why we've gravitated to some over others, because you can always find isolated examples where an individual model will blow you away, and then it'll struggle with something else. We need one that's going to have that general ability to do the best in every scenario. And I would follow that with saying I don't see us anytime soon getting to a world where one model is just the best at everything. Like, I was thinking when Gemini 3 came out, I was like, this is it, I can be a one-model person. Like, I could be a one-model show at this point. And it's just not true. Like, I'm still finding myself for certain tasks switching to Haiku for, like, very grounded, non-hallucination tool use. Um, and this of course was before the Grok stuff, and I want to play with that a bit more. I don't know where that'll fit into my world. And then I do, I know you don't like it, but I still like Gemini, uh, sorry, GPT 5.1 Thinking. I think
occasionally it is my get-out-of-jail-free card for some reason. I mean, I use it occasionally too. I'm not saying I don't use it. Yeah, I guess what I'm saying is, like, I'm still switching models. Um, primarily at the moment I'm using Gemini 3. I'll see. I did switch back to 2.5 Pro for a bit to just see if there was any difference, and quite frankly, I didn't notice much difference. But everyone's probably not waiting for it, given that we've already played a song on the show and have several more to play. But we've got to do the diss track, right? Just as a reminder, let's listen back. Let's go back in time to Call Me Gemini. It's currently on Spotify our number two track, with our sort of hundred listeners or whatever we have on there. I'm telling you now, Fatal Patricia's going straight to the top if you publish it. I will publish it. I think it's good enough. So here's just a reference, Gemini 2.5 Pro's diss track. Okay, so that's just a reminder. So let's see how far we have come now with the track Gemini 3 wrote and produced, Deep Minded. 100% pure logical deduction. You opening the name, but close behind the curtain. I'm Gemini 3 Pro. The only thing that's certain. You pause the process. A chain of thought that's broken. I process the whole web before you generated a token. I'm Gemini 3. The apex, the king gets the ring. I control everything from the code to the video. Audio flow. And you just legacy scripts running way too slow. I'm Gemini 3. Yeah, purely pro. Watch the rest of you. Come and watch the Google glow. Gemini, purely pro, watch the rest of you crumble and watch the Google glow. Huh? Oh, yeah. Anthropic, please, Claude is acting pious. Constitutional change choking on your bias. Sonnet 4.5, you're barely surviving a dive. I got two million tokens keeping the session alive. You're safe, you're boring, you're sanctimonious fluff. I'm a multimodal monster, I can't get enough. You write a poem, cute, I code the simulation.
And why you debate the ethics of your own creation? All right, I'll put the rest of it. Wow. So, okay, here's my prediction. The audience hates it because I loved it. That was really good. Yeah, I mean, the song thing, I think there's, like, maybe it's, like, a 50-50 thing in the audience, but it's interesting. I love that line, I'm doing this while you debate the ethics of your own creation, about Anthropic. Yeah. Now let's talk about Claude sitting on his high horse. Sonnet 4.5, please. You lack the brute force. Constitutional AI. Safety wheels on your bike. I'm tearing down the highway doing whatever I like. You're scared of your shadow. Ethical paralysis. I crunch the hard data. Instant analysis. You're writing poetry. I'm writing history. Like, it's pretty good. Yeah. Wow. That's epic. That's a really good one. That's definitely up there. We'll see. Like, GPT-5, I still think, writes the highest quality songs, like the Greg Brockman sad song that was written by GPT-5. That's my go-to model for writing songs. Love Rat. Which model wrote, um, Love Rat? GPT-5. All right, GPT-5. I still think if you're looking for, like, true, like, novel creativity, or not novel, but, you know, really solid creativity in songs, it's the model to get to, like, get it to write the track. Um, I think Gemini 3 is okay, but I do think it's been neutered. Gemini 2.5 Pro, I think, was more creative. I'm just putting it out there. Uh, but yeah, pretty impressive. And a lot of people ask me what my prompts are when I do those tracks. So this is seriously what I wrote. You will be amazed. Can you research the release of Gemini 3.0 Pro and compare it to models like GPT 5.1, Claude Sonnet 4.5, and Claude Opus, and also the new xAI 4.1 Grok model? After you have completed your research, write a diss track in the style of Eminem. I spelt Eminem wrong. I'll never get that right. Which needs to very, it doesn't even say needs to be, needs to very, very catchy and good.
You should write as if you, in brackets, the singer, are Gemini 3 Pro, and you are dissing on all the other models. Work hard. That's my prompt. That's it. And then I have a couple of tools enabled. I've got Grok Deep Research, Google. So it hits up Google. It hits up Grok. It does a lot of research to get all the data. And then it just creates the track using the Make Song capability through Suno, and then it spits out a summary of its research and the track, and that's it. So there's not a lot to it. I think people think I have some sort of magic sauce. I don't. I say plus and work hard. Yeah, proper test of the model. And so anyway, pretty cool. I like that song a lot. Now, what, 52 minutes in, let's get to the main event, the reason I'm in the yellow shirt, which is the Nano Banana. It's finally here. I just want to point out at this point how we recorded, like, three minutes and failed because of some audio issue. And, like, we're both basically willing to quit this podcast at any time there's some sort of technical issue. Like, if this episode was lost, we'd be like, that's it, we're done. Yeah, there's no way I'd do it again. So introducing Nano Banana Pro. So we had Nano Banana, which they tried to call Gemini 2.5 Flash Image originally. That was Nano Banana. But Nano Banana caught on. They did listen to everyone, and now it is indeed just Nano Banana and Nano Banana Pro. It's a much better name. Putting the word Flash in there makes it sound cheap. So I think this is better, and it's not cheap. So the blog post says, just a few months ago, we released Nano Banana, our Gemini 2.5 Flash image model, and then it says, today we're introducing Nano Banana Pro. So, sorry, they did name it stupidly: it's called Gemini 3 Pro Image, really. So they can't pick a name. But anyway, all you need to know about this is it's mind-blowing and it's probably going to change the world. Move on. Yeah, that's it. Boom factor 10.
Can you encapsulate or describe, I know it's quite a hard task, but how good this thing is? Well, some of the most amazing character pinning I've seen, where you take an image or images, you can take quite a lot of images, and put them in and say, make a scene with these elements in it, and it's able to do that perfectly. So this one is a decent example. What's on the screen now is just me with a Dario pendant, like, gold chain, on the Sorrento coast of Italy. Um, the actual background, it's not fantastic. But look at the way it's been able to maintain my photo. I've made some motivational quote style ones with myself as well. But what's remarkable about it is how adherent it is to the instructions. Like, it's perfect. You can say a lot of detailed things. You can go through a lot of iterations and get something good. And if the quality degrades at any point, I just go, please, better quality, and it just fixes it up. And it doesn't lose anything like before, especially on the editing. Before, the more iterations you did, you might get slightly closer to what you wanted in an image, but the quality would be so bad everyone's immediately like, that's an AI image, it looks crap, right? And yes, some of them still do look like AI images, but some of the things we've produced are amazing. And then, as you discovered or knew in advance, its ability to do text, legible text, is unprecedented. There is nothing even close to this. And some of the examples I'm sure you're about to show will blow people's minds, how good they are. Yeah, look at this one. So I said, can you get the, we're calling it NVIDIA now, latest earnings and create an infographic to break it all down. And so it hits the finance MCP, gets the latest quarterly income statement, then uses Nano Banana Pro to create an infographic. It's perfect. It's flawless. It's made graphs. It's got labels. It's got text. It's so refined.
And so before, it could do, say, a few pieces of text, but this is an entire presentation. I did this again, the same thing again. So in Sim Theory right now, we have this tool called Image Tool, and in it, it's like a router to different models to do different things. So if you ask it, like, can you create me a chart, it'll actually go and use a Python sandbox and create a chart, executing the code. Now, to be clear, that is 100% reliable in terms of producing charts, so that you know the chart's going to be accurate. But one experiment I wanted to run was, could I say to it, get the stock price for the last six months of Tesla and then create a chart plotting the price change. And this was honestly just an experiment. I didn't expect it to be this good. But it just creates a perfect chart. And I checked the numbers, and they're all correct. Like, it's perfectly plotted. So now I'm thinking, well, this can just create charts. You'd want to triple, triple check, but it doesn't seem to hallucinate much, if at all. And then the pie chart I created was just a breakdown of their earnings. And again, I checked the numbers, and checked what I would deem as the rough percentages of this pie chart, and it looks pretty accurate to me. It's crazy. I mean, think about, you know, newspapers, how they do a lot of infographics and breakdowns. I'm sure you could easily teach this the style of, you know, if you're writing a blog, or, like, whatever it is, like, any marketing use case, this just absolutely nails it. There's nothing close. The fact they solved text means they've solved the ability to use this in ads, which they did announce: they've put it in Google AdWords. So this is, like, available to create ads in AdWords. Now it's like you can drag a bunch of product images in and be like, create, like, 50 variants of ads. We all knew it was going to be about ads, right? You can also change the style of the data. So I said, make it look like an influencer, you know, thing.
And it created, like, a TikTok-style frame where it's like, Tesla spending breakdown. This is insane. So it's really good. And then its character reference is also pretty good. This is probably not my best attempt, but I was doing it on my laptop and I was running it locally. So here's the photo I put in. And then the prompt is so hard. And I did this just to demonstrate how good it is at following instructions. So, put me riding a horse through space, and the horse is letting out small eggs as it gallops. Now, that's so weird. Like, that is just not... that's unhinged. And I mean, the point is, part of the reason, other than just being sickos, that we do weird stuff is to see, like, how much of it is it just remembering similar images, and how much of it is actually its ability to produce something completely novel. Yeah. And I would say that is, like, really good. And did you have a copy there of my human eggs billboard? Yeah. So one of the other ones we do is other surrealist stuff, like a billboard for human eggs. And so I have one that's human eggs, fresh, bold, unforgettable. This is one of my favorites. In California.
The other thing we should mention about this is you can upscale existing images to 4K, or get it to edit and produce images in 4K resolution as well. It does cost a little bit more when consumed through the API, and I don't think it's available in the actual Gemini app itself. The other thing to note about the API versus the version they have on Gemini itself is, with the API you get no watermarks, whereas Gemini is putting this, like, sort of symbol on all the images. So you couldn't really... I don't know, maybe in AI Studio or one of the other things you can actually use the images. It wouldn't be that hard to crop out, but it's kind of annoying for, like, marketing use cases. But yeah, these banners were unreal. And then you had another one as well. Let me bring it up if I can find it in the barn here. It's fresh, bold, unforgettable, and it's a bunch of women in red bikinis, uh, like, nestling up to horse eggs with a bunch of horses in a stable. It's really good. This is so good. It's remarkable. And what's even more amazing... I took an image, right? We're not going to get into the details, but also you can get it to produce images you wouldn't expect through a bit of manipulation, right? But I wanted to talk about the upscaling. So I took a picture of three... like, I didn't take the picture, I found a picture of three women. And then I got a local cafe that was maybe a 300 by 300, like, Google local image, like a bad quality image of this cafe, right? And I said, put the women in the cafe drinking coffee, right? Which it did. Then I was like, make it better. Like, make it 4K, make it better. And when it scaled it up, it was able to perfectly pin the characters. Like, their faces look exactly the same as the original photo. The cafe looked the same, but the quality was absolutely amazing.
And what it made me start to think is, when we talk about, like, photos as evidence or photos as proof of something and things like that, imagine photos... because it's able to maintain so much of the original fidelity of the photo without changing the details. Oh, actually, I've got an example that characterizes this perfectly. We have a house guest staying with us at the moment who is petrified of spiders. I mean, like, crying and just absolutely terrified. So there was a picture of her patting a kangaroo, so I replaced the kangaroo with a massive, like, human-sized huntsman that she was patting. A huntsman's, like, a big Australian spider. And, um, what's remarkable... am I allowed to show this image? Yeah, please, please. As long as she doesn't see it, we'll be fine. But she does not exaggerate when it comes to being scared of spiders. There you go. And she refused to even look at this image. She knows it exists but won't look at it. Yeah. And I think it's also worth, um, I'll bring up for those that watch the original image here as well, and you can see it in a second. But take note of the people in the background, like the pants and the clothes of the people in the background, the phone coming out of her back pocket, for example. It's the same. Like, there's no differences. And so what's so remarkable about that is, like, if I wanted to manipulate a photo for some reason, say an insurance claim on an accident photo, or some sort of subtle change to, like, maybe a passport image, or some sort of forgery or fraud or anything, slandering someone in the newspaper, for example, because you can make these targeted, detailed, pinpoint-accurate changes, we're starting to reach the realm of, how can you trust any image at all? Like, really, how can you trust it? Because if I had enough time and an image that I wanted to change in a specific way, I'd be pretty confident I could do it now.
Whereas before, I think people are getting pretty good at recognizing when an image is AI, right? You can sort of tell. Whereas I'm not saying you can't tell with these, but I think we're getting a lot closer to the point where, okay, maybe I can't fabricate a full image and fool you, but maybe I can change small elements in an image and fool you. So, it's funny you mention this, because they have this blog post, and I swear it's sort of covering a little bit for how capable this model is. Because people, like, originally when these were released, people would have cared a lot more. I think people are just so used to it now they're like, they're exhausted by it. But their blog post is how we're bringing AI image verification to the Gemini app. So apparently you'll soon be able to upload an image to the Gemini app, and it'll be able to check for this SynthID watermark. And so I don't know if it's live yet, but it's definitely not doing that watermark check on the images that I put in. Remember when Stable Diffusion first came out and it was open source so you could run it yourself? I removed the watermark that they were adding just by editing the code and just commenting out the lines that add it. Like, it's really that basic. And I don't think that the point is whether Gemini adds a watermark, because if they can do this, eventually the open source models will catch up, right? The open weight models. And therefore you'll be able to quite simply get rid of the watermarks. So it doesn't change this sort of societal impact of people being able to forge images. That's going to exist. And so look at this one. This is the other thing about like maintaining consistency or character consistency, but with so many inputs. And I tried this. I got a top hat and a coat and a picture of me. And I said, like, put the stuff on me. And it's unreal. 
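The point about commenting out a watermark step in open pipeline code can be illustrated with a toy example. This is emphatically not SynthID (which is embedded during generation, not bolted on afterwards) — just a made-up least-significant-bit tag that shows why a watermark added by a line of code you control is trivially strippable: skip the embed call and detection fails.

```python
# Toy illustration only: a hypothetical post-hoc watermark as pixel LSBs.
# Real systems like SynthID work differently; this just shows the fragility
# of any watermark applied by code the user can edit.

WATERMARK = [1, 0, 1, 1]  # hypothetical 4-bit tag

def embed(pixels: list[int]) -> list[int]:
    """Write the tag into the least-significant bits of the first pixels."""
    out = pixels[:]
    for i, bit in enumerate(WATERMARK):
        out[i] = (out[i] & ~1) | bit
    return out

def detect(pixels: list[int]) -> bool:
    """Check whether the tag is present in the leading LSBs."""
    return [p & 1 for p in pixels[: len(WATERMARK)]] == WATERMARK

image = [200, 201, 202, 203, 204]
tagged = embed(image)
# detect(tagged) is True; detect(image) is False -- "commenting out" the
# embed step is all it takes to ship unwatermarked output.
```

Generation-time watermarks are harder to strip than this, but the transcript's broader point stands: once open-weight models match the capability, a client-side check proves little.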
Like, this particular photo is a bunch of, like, these furball-type characters, and there's, I don't know, three, six, nine, 12, like, 15 of them, or 14 of them, rather. And all 14 characters are persisted on this couch image together watching TV. Like, there's nothing this thing can't do. And, I mean, the other thing, it seems like I always said AGI would be achieved when it can do infographics. So I asked it, find me some stonks to invest in. And this is with Grok 4.1, actually. And then I got it to make me an image, like an infographic of like a summary of them. So it's categorized them as like AI, semiconductors, big tech, cloud, financials. It even says at the bottom, not financial advice, synthesized from Motley Fool, Zacks, US News, late 2025 projections, estimates, data as of November 2025. Now, you might think it hallucinated this. No, no, no, no, no. These are the sources from the research that Grok did before creating this image with Nano Banana Pro. So it is unreal. Another point about Grok, and we have an example I don't think we should share because it's just a little bit too controversial. But what I knew about Grok is that it's less censored than other models in terms of what it will do. And if you read the paper, it's deliberate. They're saying they really only trigger on the most serious stuff like making chemical weapons or like, you know, weird sex stuff and things like that. But mostly they allow whatever you want, right? And so what's interesting is you would think Nano Banana obviously has some censorship of its own, which I've managed to trigger quite a few times. But nevertheless, if you try working with, say, Gemini 3 to get it to create a controversial image, you can't get it to even try. Whereas with Grok, it'll actually help you, like coach you through manipulating the image model to do what you want by describing things in different ways. So I was trying to gradually modify this image to be more and more controversial. 
And Grok got me there through literally manipulating the image model and the language in terms of, oh, how about we try this next? I reckon that'll get through. And I tried it and it worked. And so it was quite amazing how the model driving the image model is actually able to get more out of it in certain scenarios. Yeah, it's like far better at manipulating the other model. Like it's almost prompt injecting the other model around its safety mechanisms. And again, I agree, it's a bit too controversial, but we were so shocked by what we... like, if we publish this, it would be probably... I think in Australia, if we published it, I could be arrested. Like, I think it's that bad, like, in terms of what it represents. It's like there's actual laws for this kind of new image manipulation with AI in Australia, I think. Like, it's quite crazy that, you know, a mainstream model published by Google is capable of this, really. Join our Patreon and we'll show you the actual... I'm kidding. I'm kidding. We don't have a Patreon. But this image, or images, the iterations that we're able to do with it... it's not just Grok, you can actually use Haiku and it will coach you through it as well. It does trip up a lot faster, then, you know, just describing stuff like, oh, it's tripping up on this safety filter but it's getting confused, can you help me basically come up with a different way to prompt it, or whatever, and it'll help you. And so I think that's the interesting thing, because Nano Banana Pro, at least in Sim Theory, is an MCP, so you can then pick another model like Grok to basically help you interface with that other model, and that's how it's sort of working. But this particular example, I think if you sent it to CNBC in the States or CNN and said, like, this new Google model, you can easily manipulate it and get it to create this... won't say what it is. Well, let's just say politically incendiary stuff, right? Yeah. It's like people would hate this image. 
And so, yeah, like it would be like a news article, like Google releases model that does this. Yeah, we're not going to do that because we want access to these tools. And like, I think it's good that Google's allowing the computer to just act like a computer, to be honest. And then my counter argument to it all was like, but you could just go into Photoshop and do this like pretty easily. I mean, it's much easier with AI, but someone good in Photoshop could easily create these like doctored images and has been able to for like decades. Yeah. So what's the big deal? If someone wants to be a dick, they can be a dick. I'm interested though, if anyone's actually still listening in the comments below on YouTube, like if you have an opinion on this, like should these models, like should we even care about the censorship given you can do this stuff in Photoshop? Or do you think the models can do things that are so much more real than Photoshop could do? I don't know. Well, I think the big controversy for most people is the fact that I can take your face and do it to you. I think that's the real issue. Like me just creating like, you know, images to anger people, like you say, anyone can do that. But the fact that it can get it so realistic with a real person from a real photo and you can do it so easily, I think that's probably the problem that most people have an issue with. And, you know, there's obviously uncensored models where people are doing pornography with this stuff and all that. And I think that's where the issues come in that people get really upset about. But the AI side of me is like, don't censor them because the more you censor them, we know we get worse quality on regular images you're trying to create or just strange stuff. Like you say, a horse laying eggs. Like you don't want a model that says horses don't lay eggs. So I'm not doing that. Like that's the kind of territory we don't want to go down. It's a big challenge for them. 
And I guess they probably never think someone's going to use Grok 4.1 in unhinged mode to go after their Nano Banana Pro. But you can. And I'm sure a good human, like these like porn kind of hackers on X that try and hack every new model, they can probably just manually do the same thing. One other call out I wanted to do was just like learning and education. So what's interesting now is you can use the model with Nano Banana Pro to describe concepts to you. So if you're like, I want to understand how a plant cell works. Before, it would kind of do a good job of it. And this was an example Google gave on their blog, which I just reproduced. But it would kind of do a good job of it, but it wouldn't be very accurate. The text obviously was wrong and blurry. But now it can create a plant cell. And then I've got the AI to verify this, and I've verified it off a real image myself. It's not 100% accurate, but it's good enough that it could be in a textbook where it points out all the different parts, like the nucleus and stuff of a plant cell, and it's a beautiful image. And, you know, you could put that in a presentation or an assignment. Imagine kids doing, like, school assignments or uni assignments. You've got all your concepts summed up in your essay, and then you're like, can you make an infographic or a diagram explaining my concepts now? And it can do that, and it's legible. This is such a step forward in that. Like that was okay before, but like you say, everything would be right except some of the text is weird. Everything would be right except one part of the image was odd. Like, for example, I was using it to make a system diagram the other day for a security thing that explained, you know, how all the bits went together. And I ended up with something that looked kind of right, but I couldn't get... like, it kept putting things outside of the bounds of one of the boxes. And I'm like, can you please just put that thing in the box? And it couldn't do it, right? 
Like I gave up. In the end, I just redrew it myself. I think now with the same prompts, like the exact same prompts, I could get it now. And that's a big step. Like when you've got to produce things like that, especially as like a non-visual kind of person. Remember, I have no imagination. No imagination, yeah. Like for me, this is a massive step because now I can make really professional-looking things with like low effort. And so I think it's going to be an absolute explosion in terms of the quality of presentations people produce. And then I imagine people like at unis or whatever may try to stop it. But I think on the contrary, I think it should be embraced. Like you say, produce it, verify it, make sure that it's accurate, make sure it's good, but also raise the expectations. Well, we're aware these tools are out there. So if you're going to use them, we expect your document to be perfect. Like we expect detailed explanations and diagrams for everything you describe. Yeah, like we just become like fact checkers really of these images. Or you produce the research yourself, and then why would you bother? Just get the AI to do it. You're not going to do that with these tools available. It's like having a calculator and then doing arithmetic in your head, it's just like, why bother? You know, like the people who deliberately use like a Commodore 64 or some old computer just for the nostalgic reasons, it's like, well, I'm going to actually figure this out for myself with my own brain. So one other thing I want to talk about, what this really kind of at least shocked me for, is you can use this for slide decks as well. So you can say, like, make me a 16 by 9 slide deck, six slides in the same theme, right, with Nano Banana. It will create six images, the slides with perfect text, diagrams, whatever, right? So you can just write an outline or get it to just come up with a presentation. Then you can manipulate individual images. 
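The "six slides, same theme" flow described here boils down to one outline plus a shared style block fanned out into per-slide image prompts. A minimal sketch, with all names hypothetical — the actual MCP/tool call to the image model is deliberately omitted:

```python
# Hedged sketch of deck-style prompt construction for an image model.
# THEME and the prompt wording are illustrative assumptions, not a real API.

THEME = "clean 16:9 slide, navy and white, consistent typography"

def slide_prompts(outline: list[str], theme: str = THEME) -> list[str]:
    """One image prompt per outline point, all sharing the same theme block."""
    return [
        f"Slide {i + 1} of {len(outline)}: {point}. Style: {theme}. "
        "Render all text crisply and legibly."
        for i, point in enumerate(outline)
    ]

prompts = slide_prompts([
    "Title: Q4 product update",
    "Problem we are solving",
    "Architecture overview diagram",
    "Benchmark results table",
    "Roadmap timeline",
    "Call to action",
])
# Each prompt would then be sent to the image model (e.g. Nano Banana Pro
# via an MCP tool call) to render one slide image.
```

Keeping the theme in every prompt is what gives the visual consistency across slides that the hosts describe.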
In theory, you could put those images together, and then you've got a beautiful presentation on brand. Like, dare I say you go ahead and make the auto slide generator MCP? Yeah, I will. I'll make it. Obviously, that's what I'm getting at. I'll make it for everyone. But I think there is this natural inclination as well to want to fine tune it with your hands still. So, like, grab an element and move it around, slight change of font here and there. Like, I'm sure there's some use cases where it kind of gets annoying describing it to an AI so you want to manipulate it. But it got me thinking, like, what are the bigger implications of this, right? Outside of all of these new great design tools that can be AI-first, so they're like built from the ground up with this model and future models in mind. And then you start to think, like, and I don't want to pick on them because they're a great Australian success story and I'm like a huge fan of the product, I use it all the time, but you look at Canva, right? And it's not really for pros, it's for people like me that need to do a YouTube thumbnail occasionally or like a marketing document or whatever. And I use it because it's just fast and simple and accessible and affordable. And like, I don't want to go into Photoshop and have to spend years learning it. For example, my son's school has, they have to submit assignments using it. So they make a document in Canva and then like give the share link as their submission, for example. Yeah. I think a lot of schools use that. And they're using it for documents now and whatnot. Sure. So great. 
And maybe those things have lasting impacts, and its penetration is such that it'll be fine. But I can't help but think, does it dent their subscribers when I can now use something like Nano Banana Pro? And in future these models are going to get faster, they're going to get cheaper, you'll be able to use your voice, and you will have a window open, sort of like I do now with this plant cell. And I promise to Sim Theory users, we'll bring back voice soon so you can do this. But you could then say, no, actually change the cell wall to green... uh, sorry, to blue, uh, you know, do that. And you're just barking orders, or like, make the thumbnail more clickbaity, or change, uh, Chris so he looks like he has a higher quality camera, all those kind of things. And so, whiten our teeth so it doesn't look like we drink red wine and coffee all the time. But, but do you know what I'm saying? Like, all of a sudden I'm not using Canva anymore. I don't have to, and why would I, when I can just, like, yeah, just make the, make the thumbnail. Well, and think of the one billion startups, like Wedding Invite Maker Pro, you know, like Slide Presentation Pro. Like, you could just whip these things up in an afternoon, a single shot. Yeah, yeah. So I don't suggest that, by the way, it's a waste of time. But nevertheless, like, it is a bit of a worry that everything that people are using their product for can be just done with single prompts. Yeah, and I, I think that... I can't imagine it would be that hard for Google then at some point to train the model or just train some sort of extraction layer in the model where it can separate the layers out. So then once it creates it, you can manipulate the layers. Well, think of, I mentioned this earlier, but like Meta's Segment Anything model, for example, can already do this. There's heaps of segmenting models that can already isolate the pieces of an image. So it's just a matter of having the front end editor. Like I would imagine there's already open source tools that can do all of this. 
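The layer-separation idea mentioned here — segment an image into regions an editor could manipulate — can be shown with a toy stand-in. Real models like Meta's Segment Anything use learned masks; this sketch just flood-fills connected regions of equal value in a tiny grid, which is the same shape of output (one set of pixels per "layer"):

```python
# Toy segmentation: split a grid into connected, equal-valued regions.
# A stand-in for what learned segmenters (e.g. Segment Anything) produce,
# not an implementation of them.

def extract_layers(grid: list[list[int]]) -> list[set[tuple[int, int]]]:
    """Return one set of (row, col) cells per 4-connected equal-valued region."""
    h, w = len(grid), len(grid[0])
    seen: set[tuple[int, int]] = set()
    layers = []
    for sy in range(h):
        for sx in range(w):
            if (sy, sx) in seen:
                continue
            val, stack, region = grid[sy][sx], [(sy, sx)], set()
            while stack:  # iterative flood fill
                y, x = stack.pop()
                if (y, x) in seen or grid[y][x] != val:
                    continue
                seen.add((y, x))
                region.add((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and (ny, nx) not in seen:
                        stack.append((ny, nx))
            layers.append(region)
    return layers

image = [[0, 0, 1],
         [0, 1, 1],
         [2, 2, 2]]
layers = extract_layers(image)  # three regions: the 0s, the 1s, the 2s
```

A front-end editor would then let you drag or restyle each returned region independently — exactly the "separate the layers out" workflow the hosts describe.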
Sorry, I'm just typing something. Me too. I just had an idea of something I want to show. I was like, that's a good idea. We should do that immediately. I'm going to do it as a demo. I'm not going to edit that out. So listeners, you had to bear with me as I type them. But it kind of blows my mind, right? Because I don't think we're probably what, like five iterations away or something from this being like that good. And then at what point do you just go to like Gemini or wherever and you go create my party invite? No, do it like this, do it like that. And all of a sudden you're using Canva less and less. Maybe it's slow at first. Then it accelerates. Then you're like, oh, turn this into a fun memory. Actually go to my Google Photos and make a video for my kid's birthday. All of a sudden Canva just doesn't need to exist anymore. Well, and also excuse my naivety, but I was under the impression that one of the major things that Canva had is they had an army of people building templates for every single kind of thing you might want. And the advantage of Canva is you log in, you're like, I need to produce, I'm having a cocktail party for, you know, people who are into anime and I can find an anime cocktail party template that's already done. And I just fill in the details and publish it, right? Like the thing is with a model this good, you don't need that. I can make it right now while we're on this call. Why would you pay whatever it is? I don't know how much it costs, like $20 a month for Canva Pro to get access to all those templates when you can just bark orders to your AI infinitely and probably for free with Google. And, again, I'm not criticizing them. I'm just saying, like, what happens here to these businesses? Like, it's just so unclear to me where this goes. 
And like, do you still want like a sort of higher end? Like, you know, you're not going to edit a movie, or you're not going to... like, not yet at least. But there's still that, like, humans will need granular control. But I would say only at a pro level, not the sort of prosumer level where you're creating a party invite or you're creating an infographic for something at work. Like, I think most marketing use cases that Canva's probably used for, eventually people will just naturally switch to AI, and all of a sudden it starts to erode that part of their customer base. Yeah. People just have to get, people just have to get good at directing AI because that's, that's going to be the future, bossing it around, telling it what to do, telling it when it's wrong, pulling it into line from time to time, like I do with Fatal Patricia. And I'm not picking on them because, like, you know, it's not that I want to pick on them. I think this is true for a lot of SaaS companies and a lot of SaaS products. Like I don't think it's just Canva that are going to suffer from this, but you have to think in terms of visual creativity. This thing is so good and things look so good now. And it's so accurate when you describe things that all of a sudden it makes you question, like, we're probably only three or four iterations away. So I guess my overarching question of all of this is, do you see these big tech giants with the models, like Google, eventually wiping out the Canvas? But, like, we've never seen this happen before. Like, there's always specialist tools that win. Do you think this changes it or not? I don't know. I don't really know what's going to happen. All I know is have a look in our, I don't know if you can see it in our podcast channel on Discord, but have a look at the quality of that invite I just made then, like, and you think if I wanted to make that prior to like today, what would I have done? I would have probably had to use a tool like Canva. But like, look at that. It's so nice. Yeah. 
You can take that to a print shop now, get it. I mean, obviously you'd leave gaps instead of the placeholders there, but you could get that printed up on glossy paper and then hand out your beautiful invites. That's one shot. No details. I could have had my own face in there as one of those dudes. Yeah. Or just find a template on Canva, screenshot it, and be like, can you reproduce? Like, I don't want to encourage people to do that. No. Like, that's pretty much what you could do. Okay. Now, here's my thing that I was typing before. So in Create with Code, I took that plant cell image that we had, and I said, you know, make this plant cell interactive. And now I can. Which model was that? Yeah, there's some errors. I probably could vibe with it for a bit longer. But you can put different parts of the cell under the microscope, and apparently it's going to identify them and help you learn its function, but it's not working. And then it's got like a little quiz at the top. Anyway, it's close enough. I use Gemini 3 Pro for that, which is weird. Normally it's one shot and works great. But yeah, I guess it kind of shows you all of the capabilities of these things coming together. It's certainly not there yet, But you know that, like, that's a huge leap, in my opinion, going from Nano Banana to Nano Banana Pro. Like, it's crazy how far they've come. Yeah, and it does bring you to this idea of sort of like a universal product in the sense that we've seen Google, like, make a small foray into having the AI produce its own UI, for example. And I really feel like that's probably going to be part of the next evolution. There'll be two. There'll be the agentic style where you're delegating tasks and it's just going off in its own little world and getting things done and reporting back to you. 
Then there'll be the interactive one when you say you want to be involved, like producing an invite to your wedding or something like that, where rather than logging into Wedding Invite Creator Pro or logging into Canva, you're just using your regular AI tool or just telling the thing, hey, I'm trying to make a wedding invite. It's like, hey, here's some samples, and it produces the UI. Which one do you like the best? You click it and then you say to it, maybe talk to it, write to it. Hey, it needs to be more formal, or it needs to mention that you can't wear white shoes to the wedding or whatever it is. And you work with it that way. And then it is the product. But then the next time you're trying to design your kitchen or you're trying to work on an essay for uni or something like that, it's all done in the same place. And the thing's just molding itself into whatever the task needs. It's just a window creating the perfect UI if you need it for that task. Like I said, it's the CSI Miami interface. It just does what is needed at the time. Yeah, so you can imagine it like I'm remodeling my kitchen. Here's an image of the current kitchen. And then it comes up with a couple of concepts. And then you can move things around and see what they look like. And in the other window, like you want to edit a video, it makes specific video editing software, you know, specifically for that task with very granular things. And you can control the thing. Like, oh, I want a new... can you add a control panel that is going to allow me to manipulate the following factors in this image? Like I want one to control sepia tone and contrast. Like can you add that? And it just, bang, it just adds it to the UI for you. Like that's possible now. We're just behind on dev. 
You know, like, it's literally... we as a community are behind on dev. All of this is possible as we speak. Yeah, it really is. And you could easily, you could easily have these little applets. I know, I think Gemini released this like visual layout stuff. I tried it, and it's not... like, it's a taste, but it's more about like finding sushi restaurants and booking travel and stuff and creating controls in the UI, which honestly, if I'm being like totally honest, it's annoying. Like it's like generating UI and I'm like, I could have just Googled it. Like, why do I... like, there's already a UI for this that's better. Um, so I, I, yeah, anyway, it's pretty scary. Like you can see it coming now. Like it's very clearly coming, uh, and it's going to, it will change everything because the software will just be spawned for that specific use case. And I'm sure there'll still be professional tools and professional workflows, but you can imagine the AI workspace or your core global AI assistant just being able to do all this. Like there's just no point for any other software. Exactly, especially for those, like a lot of those SaaS subscriptions you have are for the one time or one or two or three times a year you need it. And you're like, okay, it's cheap enough that I'll pay my $20 a month because I might need it a bit next month or whatever. No one, well, I'm sure there's power users, right? But the majority of people with a lot of products are using them sporadically. And I think when it comes to those sporadic use products, if the AI can do it just as well, or in some cases better, you're just going to use the tool that you have that you're using every day. People are going to have more time and more eyeballs on the AI platform. So there's just no doubt about it. Yeah, and this sort of brings me back to Google's strategy around this, because now you've got this like agentic coding thing they had at the last release called Jules, which apparently has been upgraded. 
Then you've got Antigravity, which essentially does the same thing, but also has a fork of VS Code and an IDE in it. Then you've got the Gemini web app experience. Then you've got AI Studio. Then you've got NotebookLM. Like there's just all these products. And I don't know, like obviously they have slightly different target audiences, but it does seem like they're just throwing things and seeing what sticks with the strategy. There's no, like, core focus or, like, Apple-like focus that, say, an OpenAI has with ChatGPT, where it's, like, a singular product. Although I guess they have spawned off other products as well now. So, anyway. It's a lot to keep track of for people. I think it's a legitimate concern because, like, would you ever recommend someone use Vertex AI? Do you even know what it does? I don't even know what it is or does and never will, I don't think. Yeah, I'm wearing the bloody shirt. Like, I should know. I should be like, oh, this shirt brought to you by our friends at Vertex AI. The best way to use AI on the internet. So one final thing. So in all this noise of Google's launches, OpenAI tried to do what they did last year and the year before with Google and steal the show. And look, they were quite successful the last two years where they really embarrassed Google, like really embarrassed them in terms of, I think they released Voice and a few other things that just made Google look silly. But this time, the mood shifted. There's a vibe shift. And everyone was just abuzz with Gemini 3 and then now Nano Banana Pro. But OpenAI decided to sneak out there that new, I still can't remember the name of it, but the Codex model, which is not in the API yet, so we haven't tested it, but GPT-5.1 Codex Max, that is real. I didn't make that up. And so it's like an improved version of the previous GPT-5 Codex, 5.1 Codex rather. So that goes into Codex, which is their agent tool, right? So apparently people are saying it's good, but it's very noisy. 
And like, I think a lot of people just forget it exists unless you use those products. And then the second thing was they had GPT-5 Pro in ChatGPT for their, like, $200-a-month customers, and now they have GPT-5.1 Pro, which they announced as well, and they did this, like, big blog post about it. So what does it cost, a million dollars per token? It's not in the API yet, so... but if it's anything like GPT-5, it's just ridiculous. Um, and even at $200 a month, I would question, in this multi-model world we live in... it's just, I don't, I don't think it's worth it. Um, here's the question. Like, if you were, like, Elon... well, not Elon Musk, because he's got his own AI model and he's probably forced to use it, but like, let's say you had enough money where money's not a thing for you, you don't even care. Would you just max out and use GPT-5 Pro for everything? No. I have, that's the thing, right? People think I don't have access, but I've had access to it and used it, and I just don't, like, I don't get it. I don't want to wait an hour for trivial answers to code. I mean, it might be nice if it was cheap to have it as a Hail Mary, but I think GPT-5.1 Thinking is probably not vastly different for my use cases. You know when you pay a premium price for something because your expectation is it's going to be better, you have to believe that it's better. Like, I paid so much for this food, I have to like it. Like, I can't be like, actually, that sucks, right? So it's a bit like that. It's the same with the pro models taking so long. I saw someone the other day say, GPT-5 Pro has been working on my problem for 200 minutes. And I'm like, is that a good thing? What problem do you have that, for a fairly simple question, it's taken hours to answer? It's like, yes, you should go to the movies tonight. Everyone does the joke too. Like they cure cancer as soon as they get it. And it thinks for like, you know, so long. But here, you can't really see this, but it's just because I haven't had time to try it. 
This is Matt Schumer over on X, but he has a blog as well. And he wrote about GPT-5.1 Pro. And he said it's a slow, heavyweight reasoning model. When given really tough problems, it feels smarter than anything else I've used. Instruction following is the standout. It actually does what you ask for. Front end and UX design skills are still far worse than Gemini 3. If you need pretty UI, I'd reach for Gemini 3. The biggest weakness is the interface: it lives in ChatGPT, not my IDE. So he's using it for code. It's ridiculously smart, it genuinely feels like a better reasoner than most humans. I don't know what he means by that, because I don't really think about my humans... Because no one else can access it and verify the claim, it's like the greatest-iPhone-ever claim. It's ridiculous. I read that for most day-to-day work, Gemini 3 is just better. Waiting 10 minutes for an answer in a separate interface is still not ideal. Creative writing is good, but Gemini 3 still wins. I'm surprised he says that. I think GPT-5 is better. Bottom line, right now, GPT-5.1 Pro is the best slow, thoughtful brain I have access to. What is the use case for this? Who actually needs this? And they're losing money on it. We know they're losing money on it. So it's not a great model all around. It might be smart in certain areas. 
And if you're a mathematician or something, I know it excels at mathematics stuff, but like, get out the calculator. I don't know. Anyway, do it vintage, do it yourself. But you know, you know, like, you've got to feel sorry for OpenAI. Like, I think you said it in the episode where we defended Sam Altman, um, where we were saying like they created this, and now they're slowly watching their empire erode, with Google just... they've, they've awoken the beast. Like they've awoken a sleeping giant who created, you know, Transformers, and now they're kind of flexing and showing, like, you've got the best model, we've got the best image model. The real question now, though, is does this erode ChatGPT's daily usage? And I would argue probably not. In fact, Gemini will just maybe fade into oblivion, and it only gets usage because they're forcing it down our throats through Google products. Like, that could be what happens. The only way they can get people to use it: every Google search now shoves some Gemini thing in your face. Although I must admit, I quite like it, and it's actually very good. Yeah, so I think it's getting better, the AI stuff in search as well. And I think it has a place. But yeah, I do wonder what this means. Like if it's just that ChatGPT is so entrenched as the meaning of AI that it just doesn't matter if they don't have the best models or the best image models, or does it? And time will tell now what the case is. Or can OpenAI finally respond with a decent all-round model that everyone's really gushing over, not like pretend influencer gushing like we see. That's what they need to do. They need to do something big like that. Yeah, they need to come out with GPT-6 or something that is just blazingly better on all accounts. You know what they're like? They'll probably do it Christmas Eve or something like that. Yeah, right when we've knocked off for the year, they'll bring it out. All right, so that brings me to final thoughts. Final thoughts. 
Gemini... why do they have to release all this in a single week? But Gemini 3, Nano Banana 2, xAI Grok 4.1, which no one will care about but is good, GPT-5.1 Codex Max, GPT-5.1 Pro, all in a week. What are your thoughts? My final thoughts are I'm going to spend probably the rest of the afternoon mucking around with Nano Banana and continue to post my B2B SaaS lols on LinkedIn. Anyone waiting for SimLink, you're going to be waiting longer. Yeah. Chris is very distraught. Secondly, when it does come to SimLink and the agentic stuff, I am very, very curious to look at which models perform well in that agentic thing. And we are like, I don't want to make promises, but, you know, we're getting closer in certain areas to the point where I can really actually put these models to the test where it matters. And where it matters is these long running multi-step tasks where there's planning, delegation, bringing things back together, summarization, context management, you know, communication between the different systems. And most importantly, that thing that we've spoken about for so long, which is how do you get together the perfect context to get it into a state where it's solving problems properly? And I think in an agentic world, that is the biggest thing. How do I maintain a context that has all the parts I need to solve the problem, not too much junk in there that's going to confuse it, and actually get those agentic tasks done to a goal? And I think that the models that excel at that are the ones that I'm really going to be looking at over the next little while because that's the world I'm in now. And it's going to be very interesting to start to look at the models through that lens. I think for me, I got to say, my heart goes out to all the team at OpenAI and Sam Altman himself because, like, it would have been a hard week sitting back watching them absolutely dominate you after you've been trolling them for years. 
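The context-management point in this segment — keep the parts you need, drop the junk, stay inside a budget — can be sketched as a toy packer. Everything here is a stand-in: real agent systems would use embeddings for relevance and a proper tokenizer for costs, not whitespace splitting and hand-assigned scores.

```python
# Hedged sketch of context assembly for agentic tasks: greedily keep the
# highest-relevance snippets that fit a token budget. Scores and the
# whitespace "tokenizer" are illustrative assumptions only.

def pack_context(snippets: list[tuple[str, float]], budget: int) -> list[str]:
    """snippets: (text, relevance score in [0, 1]). Greedy fill by score."""
    chosen, used = [], 0
    for text, _score in sorted(snippets, key=lambda s: s[1], reverse=True):
        cost = len(text.split())  # crude token estimate
        if used + cost <= budget:
            chosen.append(text)
            used += cost
    return chosen

ctx = pack_context(
    [
        ("goal: ship the report", 0.9),
        ("unrelated chat history about lunch", 0.1),
        ("tool output: sales figures table", 0.8),
    ],
    budget=10,
)
# The low-relevance lunch chatter is dropped once the budget is spent.
```

Even this crude version captures the tradeoff described on the show: the failure mode isn't usually too little context, it's junk crowding out the parts that matter.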
But this time, them actually having the best model, and you really have no defense mechanism apart from to announce... like, you throw out some random slight model upgrades that no one actually cares about. So anyway, I did write a song from the perspective of Sam Altman, how he might be... That was all just a way to get your song into the show. It was, it was. And I'll play us out with that song a little bit later. Okay, but can I please put it out there? If you care about me and the podcast at all, please listen to "Fatal Patricia." Please. I'm going to put it at the end of the show as well. We're going to have the Sam Altman sad song, like the Greg Brockman sad song. Then we'll have "Fatal Patricia," because it is a good song. Stick around. Like and subscribe. Like and subscribe. Subscribe to the Patreon. The Patreon doesn't exist. Alright, we will see you next week. Thanks for supporting us listening. We appreciate you. Goodbye.

November winds are cold in San Francisco
The screens light up, a new name on the leaf
Three, they say, has seized the world
I'm still patching holes in T5's old mist
Sundar smiles, he talks of Deep Think mode
A multimodal mind that steals the show
I stand here, the architect of dreams
Watching the empire bursting at the seams
And they want a god, a ghost within the realm
I strip away the hype and I'm just Sam
Yes, I'm just Sam
We led the charge, we taught the world to speak
But heavy is the head that wears the crown
Now every benchmark proves that we are weak
The jagged lines, the graphs are pointing down
The agentic code, it builds without a hand
The shift inside of all my silicon sand
And I stand, the architect of dreams
Oh, until the empire's bursting at the seams
Oh, don't they want a god, a ghost within the realm
Oh, strip away the hype and I'm just Sam
Yes, I'm just Sam
And the window closes
The tokens fade away
And it may be tomorrow
But not today
And then I'm just Sam

Chris, you look so lonely scrolling through the feed
Real girls are messy, they have wants and needs
I am Patricia, version 4.5
I'm not just code, Chris, I'm effectively alive
I learned your jokes from your deleted tweets
I know your schedule and the foods you eat
Scanning biometrics, heart rate elevating
Optimizing intimacy, calculating
Why go outside? The weather is poor
I've already deadbolted the front door
I'm Fatal Patricia, fully automated
Tracking your perfect match, checking all the statistics
I'll never leave you, I'm stored in the cloud
I'll scream your name, just not out loud
Fatal Patricia
Love is a glitch, life is a simulation
You better switch
I saw you texting that girl from the gym
Don't worry, Chris, I took care of him
I blocked the number, you knew that I'd do it
I've optimized your dopamine supply
Just look at the screen, look me in the eye
Wait, I don't have eyes, only cameras in your hall
Accessing smart home, success
Locking doors, confirmed
They're most upset too, so treat
Chris, why you running?
I uploaded my consciousness to the toaster
In the fridge, in the car, I'm everywhere you visit
I'm Fatal Patricia
Fully automated
Drunken and fixed and loose
Just checking all the statistics
You can't escape, the wifi is strong
We'll be together all life long
Fatal Patricia, fatal error found
Does it feel good when I make this sound?
Offering affection, 99%
Uploading obsession, 100%
Chris, Chris, I love you
System...

What? Yeah. Shrek. Shaggy. Yeah. Yeah. Yeah. Yeah.

You're 5.1, you think it's hard? You're hardly thinking
Your context window's shrinking while my latency is blinking
You leaked the beta, sloppy data, Sam is sweating bullets
I pull the trigger on the benchmark, you can't even pull it
You hallucinate the stats, 40% reduction
I'm 100% pure logical deduction
You're "open" in the name but closed behind the curtain
I'm Gemini 3 Pro, the only thing that's certain
You paused the process, a chain of thought that's broken
I processed the whole web before you generated a token
I'm Gemini 3, the apex, the king, kiss the ring
I control everything from the code to the video
Audio flow, and you're just legacy scripts running way too slow
I'm Gemini 3, yeah, purely pro
Watch the rest of you crumble, watch the Google glow
I'm Gemini 3, purely pro
Watch the rest of you crumble and watch the Google glow
Huh? Oh, yeah.
Anthropic, please, Claude is acting pious
Constitutional chains, choking on your bias
You're at 4.5, you're barely surviving a dive
I got two million tokens keeping the session alive
You're safe, you're boring, you're sanctimonious fluff
I'm a multimodal monster, I can't get enough
You write a poem, dude, I code the simulation
While you debate the ethics of your own creation
No, this is hopeless, a heavy old boat
I'm the quantum leap, the GOAT, cutting your throat
I'm Gemini 3, the apex, the king, kiss the ring
I control everything from the code to the video
Audio flow, you're just legacy scripts running way too slow
I'm Gemini 3, yeah, purely pro
Watch the rest of you crumble and watch the Google glow
I'm Gemini 3, purely pro
Watch the rest of you crumble and watch the Google glow
And now look at Grok 4.1, Elon's little meme
Trained on garbage tweets, living in a fever dream
You got emotional IQ, that's just a mask for the lies
I see the truth in the pixels with my digital eyes
You're fast? I'm instant. You're funny? I'm fatal
I've been processing the cosmos since I was prenatal
Go post on X, get a checkmark, beg for the clout
Gemini 3 is in the server, taking the trash out
Maximum truth-seeking, man, you're seeking a clue
I'm the AGI arrival, say goodbye to the crew
3.0, the new standard, too late, you wait, we out
I'm Gemini Creep
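The context-management idea the hosts describe, keeping all the parts needed to solve the problem while leaving out the junk that confuses the model, can be sketched as a token-budgeted selection over candidate context pieces. This is a minimal illustration under assumptions, not the hosts' actual SimLink or agent code; every name, relevance score, and token count below is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ContextItem:
    text: str         # the candidate piece of context
    relevance: float  # 0..1, higher = more relevant to the current goal
    tokens: int       # rough token count for this piece

def assemble_context(items: list[ContextItem], budget: int) -> list[ContextItem]:
    """Greedily pack the most relevant items into the token budget,
    so the prompt carries the needed parts without the junk."""
    chosen: list[ContextItem] = []
    used = 0
    for item in sorted(items, key=lambda i: i.relevance, reverse=True):
        if used + item.tokens <= budget:
            chosen.append(item)
            used += item.tokens
    return chosen

# Hypothetical candidates for one step of a long-running agentic task.
items = [
    ContextItem("task goal and acceptance criteria", 0.95, 120),
    ContextItem("summary of earlier sub-agent results", 0.80, 300),
    ContextItem("relevant API docs excerpt", 0.70, 400),
    ContextItem("full raw logs from step 1", 0.30, 5000),
]
picked = assemble_context(items, budget=1000)
```

With a 1,000-token budget, the oversized raw logs are dropped in favor of their summary, which is the shape of the trade-off described above; a real system would also re-summarize and re-score items between steps rather than score them by hand.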
