

lolz with Omnihuman, Agentic Gemini 2.5 Flash, Grok 4 FAST & ChatGPT Pulse - EP99.18-v5-FLASH
This Day in AI
What You'll Learn
- Gemini 2.5 Flash model has been updated with better agentic tool use and efficiency, going from 48% to 54% on benchmarks
- OmniHuman model can generate lip-synced videos of a person's face singing to audio, including the host's own voice
- Suno V5 was used to create a diss track from the perspective of the Gemini 2.5 Flash model and a song inspired by the band The Midnight
- The hosts are exploring the use of AI-generated content, including music and videos, for entertainment and commercial applications
Episode Chapters
Introduction
The hosts discuss the latest updates to the Gemini 2.5 Flash model and introduce the new OmniHuman model.
Gemini 2.5 Flash Improvements
The hosts explain the key improvements to the Gemini 2.5 Flash model, including better agentic tool use and efficiency.
OmniHuman Lip-syncing
The hosts demonstrate the capabilities of the OmniHuman model to generate lip-synced videos of a person's face singing to audio.
Suno V5 Music Generation
The hosts use Suno V5 to create a diss track from the perspective of the Gemini 2.5 Flash model and a song inspired by the band The Midnight.
Exploring AI-generated Content
The hosts discuss the potential commercial and entertainment applications of the AI-generated content they've showcased.
AI Summary
This episode of 'This Day in AI' discusses the latest updates to the Gemini 2.5 Flash model, which has improved its agentic tool use and efficiency. The hosts also showcase the capabilities of the new OmniHuman model, which can generate lip-synced videos of a person's face singing to audio. Additionally, they demonstrate the use of Suno V5 to create music and lyrics, including a diss track from the perspective of the Gemini 2.5 Flash model and a song inspired by the band The Midnight.
Key Points
1. Gemini 2.5 Flash model has been updated with better agentic tool use and efficiency, going from 48% to 54% on benchmarks
2. OmniHuman model can generate lip-synced videos of a person's face singing to audio, including the host's own voice
3. Suno V5 was used to create a diss track from the perspective of the Gemini 2.5 Flash model and a song inspired by the band The Midnight
4. The hosts are exploring the use of AI-generated content, including music and videos, for entertainment and commercial applications
Topics Discussed
- Language models
- Agentic AI
- Lip-syncing
- Music generation
- AI creativity
Frequently Asked Questions
What is "lolz with Omnihuman, Agentic Gemini 2.5 Flash, Grok 4 FAST & ChatGPT Pulse - EP99.18-v5-FLASH" about?
This episode of 'This Day in AI' discusses the latest updates to the Gemini 2.5 Flash model, which has improved its agentic tool use and efficiency. The hosts also showcase the capabilities of the new OmniHuman model, which can generate lip-synced videos of a person's face singing to audio. Additionally, they demonstrate the use of Suno V5 to create music and lyrics, including a diss track from the perspective of the Gemini 2.5 Flash model and a song inspired by the band The Midnight.
What topics are discussed in this episode?
This episode covers the following topics: Language models, Agentic AI, Lip-syncing, Music generation, AI creativity.
What is key insight #1 from this episode?
Gemini 2.5 Flash model has been updated with better agentic tool use and efficiency, going from 48% to 54% on benchmarks
What is key insight #2 from this episode?
OmniHuman model can generate lip-synced videos of a person's face singing to audio, including the host's own voice
What is key insight #3 from this episode?
Suno V5 was used to create a diss track from the perspective of the Gemini 2.5 Flash model and a song inspired by the band The Midnight
What is key insight #4 from this episode?
The hosts are exploring the use of AI-generated content, including music and videos, for entertainment and commercial applications
Who should listen to this episode?
This episode is recommended for anyone interested in Language models, Agentic AI, Lip-syncing, and those who want to stay updated on the latest developments in AI and technology.
Episode Description
Join Simtheory: https://simtheory.ai
Try Omnihuman, Gemini Flash 2.5 Preview, Grok 4 FAST, and Suno v5! Code: STILLRELEVANT
---
Links:
https://worksinprogress.co/issue/the-algorithm-will-see-you-now/
https://developers.googleblog.com/en/continuing-to-bring-you-our-latest-models-with-an-improved-gemini-2-5-flash-and-flash-lite-release/
---
CHAPTERS:
00:00 - Gemini 2.5 Flash Agentic Tests with Omnihuman, Suno v5 and Research Tools
06:29 - Dis Track AI Music Video (Made by Gemini 2.5 Flash)
07:06 - Thoughts on Suno v5, More Agentic Model Discussion
29:10 - Are we all sleeping on Grok 4 FAST with 2M context?
41:46 - Radiologists are STILL RELEVANT & Is AI Going to Take Our Jobs?
44:46 - The need to use multiple specialist models
1:01:20 - Is ChatGPT Pulse To Just Sell Ads?
1:08:46 - Final thoughts for the week
1:11:54 - Gemini Flash 2.5 Dis Track
1:15:08 - Love Rat Suno v5 The Midnight Inspired Test

Thanks for all of your support and listening to the show, we really appreciate it! xoxo
Full Transcript
GPT-5, you call yourself unified, that's rich. You just patched up your flaws, you were leaking in the ditch. So Chris, this week, what everyone is really wondering is, according to the YouTube comments from last week, when are you going to fix the bookshelf? I know, it's a big contention point in our family and an item of severe stress for me, but I just never get to it. Plus, I don't know how to fix things. it's like a metaphor for how busy you are maybe with the crumbling bookshelf or a metaphor of like you know disaster looming or something we've talked about this before because of the way we prioritize our lives all of our spare time goes to to work so it's probably never going to reach top of the list to be honest so all the ocd audience members i said i apologize yeah just don't watch uh so this week we did get an update which i'm excited about i'm not sure everyone will be but it's pretty cool so gemini 2.5 flash have launched google sorry have launched gemini 2.5 flash which is an update to the gemini 2.5 flash model and the brilliant thing about this is google just listens to feedback now all the sledging like what a year and a half ago or a year ago that we would do on successive episodes uh they they actually have listened to the community and they have i think solved the biggest problem with the gemini models right now like 2.5 good model but when you introduce tool calling and specifically mcps like we have on sim theory it just it's not that great so they've updated gemini 2.5 flash and the two callouts are better agentic tool use and more efficient. And in their benchmarks, it goes from like 48% to 54%. It's meaningless. And honestly, if you just read off that, you'd be like, what's the big deal? But in using it, I am just so impressed. The thing is snappy, fast, seems to understand instructions and follow them well. And its tool calling is unbelievably good. I have always avoided using Gemini 2.5 when I need tool calls because it's not great at it it's really good at a lot of things the large context coding understanding instructions but when it comes to tool calls it's just not good as as the others whereas as you say trying this today I'm just blown away it like you say its ability to understand it and it's almost too fast to believe like i was like oh wow it's done it just and i did i was testing some things that did six or seven tool calls in a row which means multiple iterations going back to gemini flash and it's still really fast it's it's just really really good like you say it's if i had a list of the things i'd want improved in a model they've hit all of them yeah it's like they were reading our minds on this one so i put it to the test with like this is a hard test for those that frequently listen to the show you'll understand how like this is not an easy task um we also i might add had suno version 5 released this week so i thought could we put this all together with a gentic tool calling with gemini 2.5 flash and in a mix the new suno v5 and some other things and i'll show you what i've come up with so it says use x deep search and google to research the latest models like GPT-5 and Clawed Opus 4.1 and Grok Fast 4, and then create lyrics for a diss track in the style of Eminem, spelt wrong, where you speak from the perspective of a new agentic Gemini Flash 2.5. And then below it, I pasted the about the model just from the blog. I could have got it to crawl, which would have been harder. I probably should have done that. 
And then I said, use make song tool to actually make the song. So I've told it, go research all these different models, and then I want you to write a song. So here below, it goes off, it calls XDeep Research and Google, which were the two tools I had enabled. So it did that brilliantly. And then it's like, okay, I've done my research. Now I'm going to make the song. And it came up with this song, Agentic Upgrade. It's pretty good. I do have something even better to show you, though. so here's all the lyrics to the song we'll go through that in a minute and actually hear the diss tracks so the audience can decide is Gemini 2.5 flash truly agentic and up to the task but we also another model came out this week and this is one that people have wanted for some time which is really good lip syncing so it's called OmniHuman and basically what you do is you give it an image and then you give it some audio and it can be a recording of your own voice it can be a recording of someone else's voice. It can even be generated using 11 labs if you're using the sim theory on the human MCP with your own voice ID. And yes, I fixed voice ID, so you can do your real voice now. And you can make the, so you can create an image and then make it sing or say words or anything. And anyway, I'm just going to have to show you. So you know how we're really into musicals. I thought I'd do it on a musical theater stage. I got it to, first of all, clone my face, facial features, into an image. So let's go back a little bit here. Clone myself singing into the microphone. Then I gave it a snippet of that song. Unfortunately, it can't do the whole song. It can only do up to 30 seconds of the OmniHuman model singing the song. But I will, at some point, allow you to stitch these together so it can make a full music video. And yes, I'm committing to a full music video maker MCP soon. Name a specific deadline. The 30th of September? This evening at 1am. So, anyway, this is stitching together all the way from the Gemini Flash 2.5. and ignore i've got sonnets like the down here but that's just because i was switching models earlier this is this is all flash so flash goes and researches uh it goes and uh produces the song goes and creates an image of me with my face and then creates me singing to the rap song anyway enough intro here it is gpt5 you call yourself unified that's rich you just patched up your flaws you were leaking in the ditch you're the high price model the one that breaks the bank while i'm saving the tokens you're running on empty tank you got 400k context but what's it for if you're reasoning slow you just bore them to the core you're the high variant the one that gets outpaced i'm the flash i'm the speed you're perpetually disgraced you need the whole system the codex the mini the nano i'm the one stop shop a whole ai piano my logic is a death that's amazing pretty cool right the gestures and the movement and the quality of it it's really it's come a long way hasn't it so i just like let's just back up where we're at so like now it's just routine it's just become routine to like hey hey assisted go do some research i'm too lazy to even do it these days i used to have to like paste this stuff in you know you get on it and then make a song that is really good. This is who just put on a shit budget to figure out a plan. I'm the multi-step master, the new digital. It's another level, Suno V5. And it's particularly amazing to think that that context could be anything. It could be recipes. 
It could be your corporate report for the year. You could download financial reports from companies and have them rap to you with a real animation of your own face singing it back to you like the it's just amazing what it's able to do yeah it seems to also excel at these like selfie type you know to camera lip sync voices as well so i'm sure there's a commercial use for this as well outside of us just doing it for the lols yeah like training videos and other content of that nature i'm sure yeah i might see if i can actually implement this into video maker as an option where it can cut between scenes of different people talking would be really cool. Anyway, I made you a present. This is like a Patricia, a Patricia video in OmniHuman. This is Patricia and you're listening to This Day in AI. Love you, Chris, my man. Good luck on today's show. It's so freaky when you hear it out loud like that. Someone pointed out during the week, one of the error messages in Sim Theory had my love written at the end of it because it's obviously come straight out of it. I got used to it. There were 69 occurrences of the words my love in our code base. Oh, yeah. I actually did see that. And I immediately said to you, lols. So this is another example. This is a Suno V5 generation with OmniHuman. So those listening, you're really listening to the Suno V5, just some of the samples from it. This is actually like a musical track I made about how to test Suno V5. And I made a video clip to it as well. Say what you want, keep the language tight Clarity in and it sings it right If the verse runs long and the hook runs short We're painting with words in a sonic resort How to write a song to test you beat five Make the vocals glow and the chorus alive I can make anything sound good I think that cost us $4 so the question is how much are we willing to invest in the next musical to really bring it to life because I think I could even make a musical maker MCP all the people that are waiting on the corporate office style MCPs are just wanting to kill me at this point I was thinking that we've got a never ending a lot of demand for serious commercial applications and we're just sitting around making musicals all day but it is fun And hey, we said this from the start. We do things that delight us with AI. And this is one of them. Yeah, it's all for the lols, people. All right. So that's pretty cool. I'm going to put that song in the comments, like all these songs I've generated with Suno V5, if you're interested in listening to them in full. One more, though, because I wanted to really test, like, can you make a decent track with Suno V5? And I came up with this idea of, like, a lot of our audience, I played some, or I must have talked about The Midnight once on the show, a band I really liked. It's like 80s nostalgic, like synth wave type music. And so I asked it to be inspired by The Midnight and create a song called Love Rat, following on from last week's episode. Oh, nice. From the perspective of the woman that Geoffrey Hinton basically dumped, because, you know, he is a true love rat. So, well, he said he didn't just dump her. he said, if you find something better, you go to what's better. Like, it's just, it's harsh. It's more than a dumping. It's a humiliation. Oh, remind me. I've got a video of him to show you. But listen to this first. So cool. Like, wow. Love shot, baby. Zap, zap to my heart. You taught the bots to talk, but you couldn't learn the art of love. 
You're loving me back And that's a fact Tonight I'm flipping the script You're my love rat You called yourself the guard Fire the light slow Laying it on thick Whisper just irrelevant King of every trick But relevance don't warm the sheets When truth is going slack I needed something human You gave me glossy laptop back So I typed out my feelings Couldn't find where to start Let a chap out draw the lines You kept scribbling in the dark You said I'm overreacting Boy, imagine that When the signal's crystal clear you're a love rat love right click clack pack up it's pretty good wow that's really good i like that song do you know if i was summarizing v5 i would say is like it just gets rid of a lot of those awkward parts of the the previous tracks that weren't even that bad but now it's very hard to distinguish this from a real song outside of the fact that it also has some really clever little pauses i'm not a music person but it has like good uh good pauses like yeah good pauses i guess that's the technical term but yeah it just it just sounds a lot more sophisticated in the music than a than a simple sort of ballad song yeah i i was just so impressed by it um i did say i'd done a hint in uh video so i i got a screenshot of hinting at one of these conferences where he was talking about um you know just the death of all humans and i uh i turned that into a video so here is that video um the audio is not great i was going to clone his voice but i ran out of time all right here it is well it is true i am a big love rat you see because i'm so relevant and fear monger all the ladies just go wild for me so videos are really good aren't they Did you notice the background, like the LED background, like sort of warping? Like the detail is insane. Yeah, it really is. I find the videos, the voices don't often seem to match. I don't know what I'm expecting. That's probably the biggest weakness now, like that I see now, but the quality is really high. It's gone from being very obvious that it's AI to being almost believable. 
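For readers who want to reproduce the kind of chain described earlier (research tools feeding lyrics into a song generator, then an image plus the audio into a lip-sync model), here is a minimal sketch under stated assumptions: the tool wrappers below are hypothetical stand-ins, not real Simtheory, Suno, or OmniHuman APIs, and would need to be wired to whatever MCP tools or SDKs you actually use.

```python
from dataclasses import dataclass


@dataclass
class Song:
    lyrics: str
    audio_path: str


# All functions below are illustrative placeholders only; none are real
# Simtheory, Suno, or OmniHuman API calls.

def research(topic: str) -> str:
    """Stand-in for the X Deep Search / Google research tools."""
    return f"(research notes on {topic})"


def make_song(prompt: str, style: str) -> Song:
    """Stand-in for a Suno-style 'make song' tool."""
    return Song(lyrics=prompt, audio_path="diss_track.mp3")


def create_image(prompt: str) -> str:
    """Stand-in for an image-generation tool; returns the image path."""
    return "host_singing.png"


def lip_sync(image_path: str, audio_path: str, max_seconds: int = 30) -> str:
    """Stand-in for an OmniHuman-style tool; only ~30s of audio can be animated."""
    return "clip.mp4"


def diss_track_pipeline() -> str:
    # Research -> lyrics/song -> image -> lip-synced clip, as in the episode.
    notes = research("GPT-5, Claude Opus 4.1 and Grok 4 Fast")
    song = make_song(
        f"Diss track from the perspective of an agentic Gemini 2.5 Flash.\n{notes}",
        style="rap",
    )
    face = create_image("the host singing into a microphone on a musical theatre stage")
    return lip_sync(face, song.audio_path)


if __name__ == "__main__":
    print(diss_track_pipeline())
```

Stitching several ~30-second lip-synced clips into a full music video, as floated in the episode, would simply loop the lip-sync step over audio segments and concatenate the results.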
yeah i we're probably what like two iterations away from where you just can't tell yeah and i'm sure that as well with with if you really really wanted to fake something and had the patience to keep iterating you could probably do that yeah i like that's the thing some of the generations that i made as you saw some of the other ones which i cannot share on the show but some of those were so realistic that you do believe it like i don't think i mean you probably could see some artifacts if you looked really closely but i would say most people wouldn't uh but anyway so these are all new uh tools available in the store on sim theory if you want to check them out you can use coupon code still relevant for ten dollars off so it'll cost you five dollars to try this out there's a uh i've added a new mcp called voice creator which allows you to clone your own voice and store those clones so the other MCPs like the podcast maker and the audio book maker can now use those voices that you've trained which is really cool omnihumans in there it can also work with voice id so you can use your own voice and bit of a drum roll I built an image tool that brings together all the best image models and decides automatically which one to use so if you are confused by all the image models we talk about on the show you can just use the image tool and it will work for you pretty cool pretty cool that's my point anyway i'm just proud of the the poor if this i actually think it's a very good addition because i think the problem now is like me along with the rest of the sim theory audience are like there's so many of these image editing image creation models now it's almost impossible to know which one to use for what so having a system that's just going to use the best one for you is is good it sounds like an infomercial. Yeah, it really does. But it's actually a problem. Like, when you want to access the best models and have them updating all the time, it is a pain. And so it's a pretty good solution. I should call it, like, some sort of, you know, deep-thinking router or, you know, profess about AGI like OpenAI do. Anyway, I did want to go back to Gemini 2.5 Flash and also talk about, like, what, like, the indications of 2.5 Flash in terms of what we might expect from Gemini 3 because there's a lot of rumors saying that gemini 3 is on the very near horizon rumors in fact uh that it could be maybe uh like next week or the week after i'm still waiting on my gemini merch pack that google said that hooked me up on before the launch of gemini 3 so where is it where is it i'm assuming that's got to come first and that's the indication that we're getting it So you can be wearing the shirt on the show. Well, Polymarket certainly thinks that Google's going to continue to dominate. There's not even any other model over 1% for like the next three months. So I think that it must be anticipating that there'll be a new Google release sometime soon. I must admit, during the week and probably the past two weeks, I was using a lot of GPT-5. 
then I was occasionally using GPT-5 thinking but then you said to me do you really get any benefit from using the thinking apart from slower response times so then I went back to GPT-5 and I'm like actually no I don't really notice any difference unless I'm like you know mentally stuck on a problem and then I try it but I again it seems to just get as stuck as easily as GPT-5 but then during the week earlier in the week at least for programming GPT-5 codex API came out so you can just use the raw api that they have powering the command line tool uh and i think they put it in cursor and and windsurf as well and at first like using it at first to do pretty simple things i was absolutely blown away like it's so fast super snappy very good at coding uh and i know people were getting great results in sim theory on the create with code with it uh but i don't know is the week went on i'm like it it feels like it's getting dumber i don't know if they like it's like a resource scaling issue where there's like a router in it and sometimes it just seems so stupid and other times it seems so smart but i found myself the point i'm trying to make here going back to gemini 2.5 pro and it's like this stable old friend it feels like claude sonnet 3.5 to me still like if i'm if i'm in retreat if i'm in retreat on anything i'm just like that that's my staple i'm going back to that i'm exactly the same i've definitely moved around a lot more in the last week especially with gpt5 we had a major speed up with it which helped and so i started to use it quite a lot and found it really good but you're right my safety zone with patricia is always to head back to gemini 2.5 because i know it's going to get the job done and i've got a feel for how to work with it. And I feel like this is why this major upgrade to 2.5 Flash is very exciting because it's a premonition of what we're going to see in Gemini 3, which means it's definitely going to solve all the issues we have in terms of the tool calling. The speed's already pretty good with Gemini 2.5. So I think that any speed update I'll gratefully accept, but I don't think that's currently an issue. It's really its ability to call multiple tools in particular and chain tools together in a plan that it seems to struggle with at the moment. 
And also even knowing when and what to output it also struggles within a tool call context And I think that looking at Flash in this early time it looks like it solved that yeah you got to imagine that the same techniques applied to gemini 2 flash are going to be on on what were believing will be gemini 3 and if they can nail that they're going to have a really solid model i think though using gbt5 when you really push it especially on like the thinking stuff it is the smartest model like for the hardest problems still like it you know if i'm doing like say i was doing like medical research or some sort of critical research or like even if i've got a research assistant where i've given it like pubmed and you know a bunch of like specialist tools that's still the model i would go to for the smartest and so why do you think then that just isn't reflected at all on the lmsys leaderboard i mean people like come on like like who's actually going and using it i don't i just find it weird that it isn't a closer contest like it's just weird that like our own experience just doesn't really match up with what um what people are saying when they use it without knowing what they're using i think the problem with gpt5 is it gives very blunt responses and no matter how much stuff you put in the prompt to tell it it's your ai girlfriend or to tell it to act a certain way or to act professionally or do whatever it just does its own thing it's like its own little being um it's reminiscent of the early days of like the gpt4 sydney stuff that microsoft was were trying to like wrangle out of the model where it just wanted to do like it just wanted that persona i don't know if that's like a reason for it being more intelligent is it's just like so forced in a way to think that it just... It could also be like, as I said it, I realized if you're on LMSys, you're probably not having long context where you're working with a model all day or something like that, where it's got time to build up memories, to build up context, to have like full multimedia in there and all the things that you would end up with in a normal work session on a normal platform. And so I wonder if That's the thing. It's like on a single shot, just paste some text in. It feels better. But when you're actually doing real work, you find that a model getting you out of a bind or solving a difficult problem is actually more intelligent like GPT-5. It just brings me back to that core point of like the models in anyone's world, whether you're building an application or you're just like working with models to get the best outputs. We're still at a stage. A lot of people were sort of trying to be like, oh, GPT-5 is all you need. 
and I can kind of see that argument like for most people that model is is probably pretty strong I think though why on LMSys it doesn't do that well is it's just so verbose in its output and it's pretty I just don't think it's also that great at instruction following from a stylistic point of view I think from an actual like pulling tools and doing stuff it's pretty amazing in fact it's it's clinical almost but in terms of stylistically it sucks like if you're a creative it's the last model i would go to i would probably get like a draft because it's the best storyteller like it writes the best stories by far um it writes very great uh music like uh that um that musical i wrote that that we heard a little excerpt of was gpt5 and if you listen to the whole thing it's very impressive but i think that outside of very specific things like that if I'm going to then iterate on what it's written me, I would shy away from it a little bit because it seems to just go down. I think we mentioned this last week. I'll repeat. It just goes down a path and it just gets kind of stuck and it doesn't change the path. So it's like I'll start with GBT-5 and then manipulate it with another model if I want creativity. And generally, I'm still looking at like an Opus or a Gemini 2.5. Interesting. And I wonder if, back to Flash for a second, I wonder if just that speed is going to, like they talked about more agentic qualities with it, right? And one of the things I struggle with with GPT-5 is just the speed. Even with the speed up improvements, I find myself getting distracted in between calls to GPT-5 just because it's that little bit longer where I'm like, I'll tab off to do something else. and then I forget about it and then it sort of breaks my workflow just due to that lower speed. Whereas with something like Flash, it's so quick, you don't even need to wait really. Like it's already responding faster than you can read. And I'm thinking there might be way more advantages to that when we think about agentic computer use, browser control, the complicated multi-step tool functions. Like one thing we've been talking about ourselves a lot during the week is the idea of internal corporate MCPs. MCP. So companies themselves are exposing databases and other internal tools and controls that they want to operate using MCP. And I think it's best to break those tools down into lots of little tools that can do small elements of a thing and let the model decide how to combine them, because it might come up with ideas you haven't thought of. 
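To make the "lots of little tools" idea for an internal corporate MCP concrete, here is a minimal sketch using the MCP Python SDK's FastMCP helper (API as of recent SDK versions). The server name, tool names, and data are hypothetical; the point is exposing several small, single-purpose tools rather than one monolithic endpoint, and letting the model decide how to chain them.

```python
from mcp.server.fastmcp import FastMCP

# Hypothetical in-memory data; in practice these tools would wrap real
# internal databases or services.
ORDERS = {"A-1001": {"customer": "c-42", "status": "shipped", "total": 129.00}}
CUSTOMERS = {"c-42": {"name": "Jane Doe", "email": "jane@example.com"}}

mcp = FastMCP("internal-orders")


@mcp.tool()
def get_order(order_id: str) -> dict:
    """Return a single order record by ID."""
    return ORDERS.get(order_id, {})


@mcp.tool()
def get_customer(customer_id: str) -> dict:
    """Return a single customer record by ID."""
    return CUSTOMERS.get(customer_id, {})


@mcp.tool()
def refund_order(order_id: str, reason: str) -> str:
    """Small, single-purpose action the model can combine with the lookups."""
    return f"Refund queued for {order_id}: {reason}"


if __name__ == "__main__":
    mcp.run()  # stdio transport by default
```

A fast model can then chain get_order, get_customer, and refund_order into one plan in a few seconds, which is exactly where per-step latency starts to matter.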
But to do that well, you don't want a model that's taking 30 seconds to to make each decision to take each next step whereas with flash you see it just blazing through them so it's better able to work in that style yeah and i guess it also begs the question of like how like how intelligent do you need the model for most of these like agentic sort of system-based tasks where it's just doing the busy work for you like for example i talked last week about this prototype agent of answering our support tickets um like a just a scaffold of that with mcps and then a certain approach and i haven't tried it yet with flash but i'd be curious to now try it to see like you know if it can go and answer for my review say like 50 tickets or whatever it is and i'm just going like approve approve approve then to me like i don't know how important that speed is because it is like you know it's not something i'm doing every second of the day but to see it operate at that speed and then put it on automatic mode for those kind of tasks as long as it's drawing context and tool calling correctly it just doesn't have to be the most expensive intelligent model anymore that's right and you and you're far more likely to throw it at those high volume tasks thinking this isn't going to cost me a fortune to do this like if it's if it's a reasonable price you can afford for it to take more steps and get gather more context to get its job done i also think though you make a good point about like enterprise mcp style applications especially for like bi and data intelligence type applications where you like say in the process you might up front hit something like gpt5 right or like Gemini or whatever it is to upfront get some analysis like, hey, you know, go and like analyze this data from these disparate data sources in MCP and then like give me a summary. But then you might want to like iterate on like charts or documents or like the outputs from that research. And that's where those large models are just an absolute waste. And something like Flash where it's just carrying on from that context and calling the tooling and you're able to iterate rapidly off the core sort of context that you gathered with the more expensive model seems to work pretty well. Yeah, exactly. It's almost like you want to do your vibe, whatever it is, vibe coding, vibe analysis, vibe statistics step with a model that is fast at vibing it out. Like it actually can do it at a speed where you are interactively working with it. And then you're like, all right, now we've gathered all the pieces of the puzzle here. I want to produce a presentation for my company with this, or I want to produce a document with this, or a web page, or whatever it happens to be, audio book. I mean, there's so many output types now. I want to produce that. Now I switch back to my meat model, my massive, big, intelligent thinking model, and go, you've got all the stuff now. Go off and produce this final output. Yeah, I mean, I guess this is we're just describing the GPT-5 router in a way. 
like that's kind of what they're trying to do it just hasn't worked that well like people have wanted a lot of control back over it and even using it in the chat gpt interface at times i've noticed like you'll ask a follow-up question and it's still like it's for a more complex problem and it's still lagging a little bit controlling that initial like like it's sort of stuck in that like heavier mode caller um and so yeah i i think it's good to have optionality there to like just like it's an acquired skill of knowing like i'm gonna hit this and then work with this kind of model but yeah i i'm excited about gemini 2.5 flash i think with the right tuning it could be the best daily driver ever and i think someone in the community was actually saying this about these like smaller models now with tool calling is like you can pretty much get by even for code because it's faster to iterate and go through even though it's slightly dumber um it doesn't really matter if you know what you're doing and can push it the right way. Yeah, that's right. And I think also that remember the major advantage with this model is it has a 1 million context window. And so it can really, really get a lot of benefits from that larger context as you get into a longer session. So while it may not be able to get quite to the level of thinking of the other models, because it can use more context, it's smarter in some contexts. You don't have to keep reminding it of what you're trying to do yeah and i think in terms of like the large context stuff i haven't spent any time i must admit apart from like just like a few queries to grok for fast the grok for fast was released i think earlier this week it's seen a long way um and it has uh two tunes it's like a bass tune that like will either think or not think sort of automatically and then there's the thinking tune of Grok4Fars that is sort of thinking set to max. And these are the endpoints provided by XAI directly. But they, in those models, have a 2 million token context window, which is by far the largest, I think, so far. It is, yeah. I haven't really tested it. And until I test it and actually try and work with huge amounts of data and just see if it can context follow or if it drifts, it's hard to say what that model's like. You played around with it a little bit more. Do you have any... Yeah, I used it quite extensively when testing tool calls with it. And it's another model that's really, really good at parallel tool calls. It has no problem if you give it large research tasks to go off and research up to like 20 sources at once. And very quick. It's really fast. It's as fast as Flash, I would say. Maybe not... I mean, look, I'd have to measure it, but it's not noticeably slower. than Flash, completely competent with tool calling and gives very reasonable responses. For the first three days of this week or whenever it came out, at least two days, I don't want to exaggerate, at least two days I used it for full-on coding and its ability to break down and solve problems I found to be immense. I think I commented to you during the week when I was using it the most intensively that in terms of understanding a problem and diagnosing what needs to be done, it was actually for me better than Gemini 2.5. So I had a couple of difficult technical problems I was trying to solve. I pasted in three relevant files and said, look, I don't want you to rewrite anything, but what I want you to do is point out why this might be going wrong. 
And in two difficult bugs that I was stuck on, it was able to solve it immediately and point out the error to me, like to the point where it was a tiny little fix and I was able to solve it. However, I found that when I was getting it to write net new stuff, like new code, I wasn't too happy with its output. I just didn't think it was as good at that. But I actually think it's probably overall a merit for it in the sense that it's actual intelligence, like its actual ability to understand what I was asking it to do and then do that and break it down for me was really strong. And as a model that's quite reasonably priced, it's a really good one. One thing to note about Grok that's different to other models is they have a tiered pricing system. So if your context window, I think it's under 150,000 tokens, it's one price, and then it doubles when you go over that. So while the $2 million sounds appealing, the cost would add up if you're constantly maxing out the context. So I just did a test then while you were talking, and it was pretty fast. So I just said research latest AI news. It called Google. It called itself like the GrokD research tool. It called XD research. And it called perplexity. So it hit every research tool I have available to it. And then it spat out a summary really quickly. It consulted 66 sources in, what, like 11 seconds. So that's pretty impressive. It's noticeably fast. One other thing to note is that the X Deep Search and the Grok Deep Research, we also upgraded to use the Grok for Fast model as well. So that's actually an upgrade to the inferences over those sources as well. So you've sort of double used it in that context there to get the job done, which also might explain why it was so quick in terms of the research. But yeah, it's not a model to be easily dismissed. I think it's a really, really solid option out there. I mean, Elon's out there claiming that because of this update, they're that much closer to AGI, and that'll probably be the next step. So his usual absolute bullshit commentary around something that he, look, he's done amazing stuff. I'm not going to deny that, but I don't think that this represents a significant step towards AGI. However, it's a nice and welcome update, and I'm so grateful to have really strong tool-calling alternatives. One other model to mention that doesn't get the credit it deserves and still just has my heart when it comes to certain tasks is Kimi K2. That model update that we released last week, which was just an incremental upgrade to Kimi K2, seems to have made it a lot more solid, particularly around tool calling and instruction following in general. Kimi K2 was really fast. It's the best at horse racing, by the way. I don't know why, but it's just the best at horse racing and what I mainly use it for. But in terms of its ability to maintain over a longer session, the task at hand has been fixed by whatever that update is. So it's another really, really fast alternative. And the reason I think it's worth emphasizing alternatives is organizations looking to do mass rollouts with models that can be hosted in a region of your choice. When you start to think about models like Gemini Flash, Kimi K2, not Grok because you can't host that, but these models become way more significant because their ability to do the tool calling, their ability to be a reasonable price and have a large context window is really important. You're not going to be able to do mass scale rollouts that you're paying for with something like Sonnet 4. 
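On the tiered pricing point: the exact Grok 4 Fast rates aren't quoted in the conversation, so the dollar figures below are placeholders, but the shape of the calculation is what matters: one per-token rate below a context threshold (recalled in the episode as roughly 150,000 tokens) and double that rate above it.

```python
def estimate_call_cost(
    input_tokens: int,
    output_tokens: int,
    base_input_per_m: float = 0.20,          # placeholder $ per 1M input tokens
    base_output_per_m: float = 0.50,         # placeholder $ per 1M output tokens
    long_context_threshold: int = 150_000,   # threshold as recalled in the episode
) -> float:
    """Sketch of tiered long-context pricing: rates double past the threshold."""
    multiplier = 2.0 if input_tokens > long_context_threshold else 1.0
    return multiplier * (
        input_tokens * base_input_per_m + output_tokens * base_output_per_m
    ) / 1_000_000


# A 1.8M-token prompt lands in the doubled tier, so a 2M context window is
# appealing but maxing it out on every call adds up quickly.
print(estimate_call_cost(1_800_000, 4_000))
print(estimate_call_cost(90_000, 4_000))
```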
It's just Amazon themselves can't do it. So you're not going to be able to do it as an organization. Whereas these sort of mid-tier models are really, really crucial when it comes to that stuff. And seeing them advancing in a way that makes them really approaching the top level models is exciting because I actually think it will lead to far more widespread AI usage because people aren't constantly worried about the economy of token usage and things like that if you can get a model that can do 90 of what you need and can do it at a cost where you can afford to provide it on a large scale i think that's really really where we'll see major productivity gains with the use of ai yeah i mean that's the the dream goal i think it would for me be like if Gemini 2.5 and GPT-5 had a baby the baby would be a model that combines the strengths of both but then is about 20 times faster, maybe 100 times faster. This is my wish list. Super good at agentic and long-running internal sort of clock-based agentic tasks. It's just so cheap that it's truly the quote sam altman like too cheap to meet it like just it's just free like you just can use infinite amounts of it and i know that like there's a lot to do it was interesting this week that um and a lot of people are obviously referring to it as like a big pyramid scheme uh nvidia uh announced that they were gonna this is i'm literally this is breaking news this is the thing that everyone was so proud to post that this company pays this company and they pay this company so it's all a bunch of bullshit to raise explaining how the economy works which i think is funny like i pay for my you know my bread and then the baker pays reuse that money it's not fair they're literally describing the economy and i yeah anyway that analysis always cracks me up whenever anyone does it uh so uh they announced a potential potential under a billion investment i love how you can just announce a potential investment uh i think there's some milestones or something anyway this is not a new show but they're investing 100 billion um in these advanced like like data center chips to to support open ai's growth now also oracle has uh said they're investing i think about 300 billies so it's like 400 billy total um in ai infrastructure and that's with softbank and open ai now they're building all these stargates or whatever they call them um i guess obviously the demand there right now as it stands today the demand set for the compute the question that's being posed though is what if there's breakthroughs like what if there's breakthroughs in terms of efficiency and i mean like it's so it would be so hard to plan this infrastructure because you know if these things become so efficient like the human brain runs on what is it like 20 or 30 watts like if you can get intelligence running on such low power and optimize for that like to these data centers like do you still need them maybe like i guess i think It will, and I think the reason is because even, let's say there was like 100x gain in efficiency, so you needed 100 times less hardware to do the same work. 
If we get to those levels of efficiency, the amount of uses for the AI stuff is going to increase by such a large amount because you can put it in everything that the hardware is still going to be necessary and I don think there some major hardware breakthrough around the corner that going to make all this stuff immediately obsolete I think there always going to be need You can see it on the secondary markets for renting older GPUs and things like that. There's still always a use for this stuff. It doesn't just disappear because the newer model comes out or something like that. And I think that the demand for GPUs is not going anywhere. it's going to increase because everybody wants this stuff really badly. And it's just one of those things that it's a major growth area and I don't see it changing anytime soon. So, you know, we'd love to make stock predictions and then not make the investment. And then not invest and then regret it later. Do you think NVIDIA and Oracle are undervalued then, maybe? Like if this is the true future of the intelligence economy, me yeah look i don't want to make a prediction i do because i want to come back to it and be like you're wrong or you're right i think i think nvidia will nvidia will continue to grow over the long term definitely wait which one is it again nvidia or nvidia and nvidia oh we always say it wrong and get in trouble yeah okay so let's invest our um our major merch money into um nvidia stock Yeah, so I'm reading here. So I've asked Grok4Fast to decide, like, just yes or no, should we invest? Let's see what it comes up with. I did get it to research. That was pretty fast, too. Like, it found every stock price it read, the annual income statements, balance sheets, cash flow statements. All those, like, the video, Nvidia and Oracle. and I'm like, yes or no, should we invest? So Grok4Fast heard it here first. No. Oh, how come? I just said yes or no, should we invest? Is it like invest in X instead? Why in one sentence? Let's see what it says. Oh, now it slows up. You should ask it to make a 20-minute podcast about the topic and we'll just splice it in at the end of ours. 
um okay so it says both nvidia and oracle are trading at premium valuations with high p ratio 60 times price to earnings ratio wowsers yeah that doesn't yeah rock four being very very conservative there um all right moving on to um to some uh lulls a quick quick lull interlude here so someone actually posted this on discord and i'm still in the gag a little bit to be clear but this is pretty funny so it says in 2016 our man jeffrey hinton warned students not to train as radiologists the field was so ripe for ai automation today there are more new radiologist jobs than ever and radiologist wages are up 48 yet ai has exploded in the field so what happened and there's a there's an article let's be honest i didn't read i just read the tweet uh but i just find it so hilarious uh that this is just so far off the mark and i i think that's interesting because we also pretty early on in the show when we started was speculating like are these jobs going to go and it's been such a huge fear of people especially like software developers oh no all the jobs are going to go but indeed that's just simply not what's happening here so like what do you make of it do you think eventually the radiologists are gonna gonna be out of business or i think firstly jeffrey hinton i forget what the the fallacy is or whatever but jeffrey hinton suffers from that thing where because he's an expert in one area he assumes he's an expert in other related areas like predicting the future and macroeconomics and those like the kind of second order effects of his amazing invention. And he's sort of done like what we do, I guess, which is just speculated two or three steps down the road being like, well, because it will be good at this, therefore the jobs will go, but not realizing that there's time for commercialization of the tools. There's regulations around this stuff. You've got to build trust in the system. People who are getting radiology done are in probably usually pretty serious medical situations. And I would imagine they probably would prefer a person to look at it rather than a computer, even if the computer might be more accurate. And so I wonder if it's just one of those things where time will tell, like it will take just way longer to play out than he predicted. I don't think he's necessarily wrong. Yeah, I found this piece interesting from the summary. This is from worksinprogress.co. I'll try and link to it below. It says, islands of automation, all AIs are functions or algorithms called models that take in inputs and spit out outputs. Radiology models are trained to detect a finding, which is a measurable piece of evidence that helps identify a rule out a disease or condition. Most radiology models detect a single finding or condition in one type of image. For example, a model might look at a chest CT and answer whether there are lung nodules, rib fractures, etc. For every individual question, a new model is required. In order to cover even a modest slice of what they see in a day, a radiologist would need to switch between dozens of models and ask the right questions of each one. So I guess what it's saying is like the intelligence is there in these models. There's specialist models for different parts of the job. And right now, the human is required to ask the right questions to extract that knowledge from the model. 
It comes a bit back to what we've talked about before, that if you're already an expert in a particular area, the AI can help you significantly with the heavy lifting of getting the grunt work done, as in just like you said there, you know the right questions to ask, you know how to evaluate its answers if they're accurate or not. Whereas if you just, like if I just got a job in radiology using the models, I probably wouldn't necessarily be the best person to come to because I'm like, yeah, well, ChatGPT said it's okay, so yeah, you're fine. See you. I think that's the point of all this that people are still taking a while to get, is that these models and these tools just make you far better at your job. So if you are really good at your job and then you learn how to use the models and know the right models to use and the right tools to use, you are just becoming more efficient at your job. You're able to just do more. And I still think we're really a long way. As someone that's now playing around with somewhat like... agentic tasks in a way where i'm like letting it go and run on its own it still comes down to prompting it correctly to get the right answers still like i've noticed with the support stuff like you can it can come up with an answer and it's like 98 there but then you're like oh you probably should go check this and i'd imagine that's similar to the radiographers where they're like oh wow that output's really good but we should also just verify against this model or this tool as well and i i at the moment i think i just don't see that changing much it just feels like this whole idea of like running inference across the models humans seem to be mostly better at and can like somehow figure out the missing element just so quickly if they're an expert as you say whereas the models can go for like an hour and you can use different models and they still won't just get to the most sort of logical outcome at the moment. They get easily distracted or like you said, they get fixated on one element of the problem and sort of that becomes its focus instead of actually getting back to the original goal. And I think the human's almost playing the job of a good manager who is directing the AI now into the right areas and evaluating if it's a good enough answer. and without that and until we have agents who are able to do that themselves it's going to be a while before you just trust it to do a full job like that and i think this is why the whole idea of specialist models probably isn't going away anytime soon and that people are just going to have to learn the strengths of the different models and that's just going to be a normal skill especially in the workplace for quite a while i just i can't imagine especially for more complex tasks a world where you're just like a one model worker where you're just like, oh, like on your resume, it's like confident with chat GPT or whatever it is. And it's interesting because I think, you know, a lot of people with this Microsoft announcement that they're also introducing Claude into their co-pilot offering. It's sort of case in point, right? Like it's like some people just prefer Claude for certain tasks and others want the... the GPT-5 models. And so now they're, for the first time, allowing switching between these two models. And previously, a lot of people have said, like, oh, you know, why would you even want to switch models? Like, if there's, like, this genius model. 
And I think it's just acknowledgement, almost, that OpenAI's models aren't necessarily the best in all cases, right? Like, they're really good. GPT-5's great. But it's not always the best. And some people just prefer the tune of Claude. And so... Yeah, you've only got to experience that one time where you're really, really stuck on something about to give up. You switch models, ask the other model the same question, and it just comes at it from a completely fresh perspective and totally solves the problem. To realize why having the ability to switch models is amazing, because they really, really do have majorly different strengths in different areas. And I think you experienced that once and you never want to be restricted to our model again. But the funny thing about it is I think it's sort of like if I get stuck on something or I'm writing something, it's commonly happening to me in writing now where I'll be writing something and I'm like, oh, I hate the sort of like tune of this. Like I want a different opinion basically. And then you go to another model and it's almost like you're bitching about the other model. You're like, oh, hey, like this is kind of what I got so far. And it forces you to re-prompt the model from a new starting point. And I swear, maybe, it's not always necessarily the model switch that does it, but it's your frame of reference going and bitching to a completely different, super intelligent model and being like, oh, hey, he's not writing like he used to. Can you kind of clean this up in my style? And then, bam, it does it. And I think that there's that feeling of like, It's like a mixture of experts or a group of these smart minds. And I always think, especially for commercial work, why not consult four of them and compare the outputs? Why wouldn't you? It seems stupid not to. Yeah, I think it's a different way of working. And you've got to be in that mindset. Like you say, I actually do agree with you that probably one of the reasons why switching models has the effect is it forces you to explain where you're at and to summarize what's been going on up until this point and why we can't seem to solve this problem. Like you actually have to stop and evaluate what's really going on in the situation. And that additional context around, we've tried this, it didn't work. Here's what I actually want. And recalibrating like that is probably in the real world, even a great way to actually get to the solution to a problem, realizing, you know, we're climbing a ladder that's leaning up against the wrong wall kind of thing and checking yourself. But the thing is, you've got someone to go and talk to about that's really highly intelligent who can potentially solve the problem. And I really think that structuring problems is a big part of solving things because you're almost like you're the master deciding how much does this model get to see of the problem? Like, how much context do I give it? Which parts do I give it? Do I want to spare it worrying about this thing to avoid it going down a side channel? And I think that's the real balance now is like the mix of tools I give it, the mix of context I give it, the questions I ask and which model I use. There's a real, it's a totally different way of working now where you're really setting up all these pieces to create an environment in which the problem can be solved. 
And I think that that's really the next phase of getting towards agency is like, how do we create that right mix of elements to give an agent the best ability to solve the problems, sorry, the problems, like the kind of problems I'm giving it? Because I really think that's what we're doing now. The role we're playing now is that person who's putting all the pieces in place and then going, go. What we need is agents who are able to evaluate the goals, what we're trying to do, and put all the pieces in place and then solve it. And I actually had this idea that we were talking about during the week because I was saying it's interesting because with MCPs, it's very easy to sort of vibe code out an MCP, go, here's an API, here's the docs, please make an MCP that has tools that adhere to that, right? Like that's probably how most of them are being built right now. But then I thought, well, why can't the AI fabricate a tool on the fly to do what it needs to do? Like, it's no different to doing it in advance other than you get an opportunity to test it. So if you extrapolate that idea, maybe it is that part of agency is the agent actually going, all right, here's the context I'm going to need. These are the processes I'm going to have to run. Here's where I'm going to have to iterate. Here's the tests I'm going to need to evaluate if my solution's correct. And it actually fabricates a series of pieces of context and tools that it needs to solve the problem in advance of trying to solve the problem. So it knows it has all the things it needs. It has the opportunity to stop and ask you for more stuff if it needs it to solve the problem. And then it goes off and in an agentic way tries to solve the problem. And I wonder if that's how we get one step closer towards the true agency is giving it this preparation time where it gathers all of the pieces it needs to solve the problem yeah and i we mentioned last week or i was sort of ranting about this idea of like the like micro mcps or like you know you could teach it a skill and that skill becomes an mcp that that you you're teaching and how to use specific mcps and tools within that like uh like i don't know what you call it package and I think it's a similar idea right as in like instead of you going and teaching all the skills for a particular job in this case it's like going and figuring out oh I need these these sort of toolkits in order to complete this job like these these would be like prerequisites in order to complete that task yeah and I think because we've seen in the past and there's been research papers on this so we know it's real the chain of thought thinking that whole think step-by-step directive to a model makes it give objectively better output, right, in terms of problem solving. And so therefore, I wonder if asking a system, it's not just about gathering context, like getting the chunks of text that are relevant. It's more like how would I evaluate success in a scenario like this? Like what would I be checking to know if the problem is solved? So if it's a case of writing code, it might be, okay, well, when I run the code, I get this output. When I run the code on this input, I get this output. that's a simple example, or I would know that this is successful when my other agent that's designed to evaluate songs tells me it's a good song. And so it could actually know in advance what its success criteria are and fabricate specific tools or access specific agents that will allow it to know when it has succeeded at its task. 
And so it decides that in advance, not in the context of actually doing the task, because then it's clouded by all the things it's doing there. It has that fresh-minded thinking of: this is how I'm going to know I've succeeded. Okay, now I've prepared all that, now I build my context, now I do my task. And then at the end, I come back to my acceptance criteria and run through them to decide if I've succeeded. That way, the model has this extra opportunity to really come up with a great plan and then, most importantly, understand when that plan has failed and go back and try again. Yeah, and because it owns the tool call, like, here's the input, here's the output, no, wrong, it can sort of reset itself, and those values are somewhat contained. Because I think that paper we talked about last week said that once you hit some fail point in the outputs, it just goes down that path, like it thinks you want errors after that, and it gets perpetually worse. Yes, and that's what I mean by the phases, right? You don't want the actual execution phase to confuse its decision phase as to whether it achieved the goal or not, because, like you say, suddenly it's dealing with a context that has all the thinking steps and output steps and stuff that it's done, so it really muddies the water in terms of what it needs to decide at the end. Yeah, and I think that ability to declare bankruptcy on its working context and then restart a task based on the tool call, with a slight compaction of what went wrong from the output of that tool, that's when it gets exciting, because then the corrupted context just isn't an issue. And if you can control the outputs of the tool it created, it can output them in such a way where it's like, you know, an error occurred, do not do this anymore. It makes me wonder: imagine some tool calls that allow the agent itself to edit its own instructions and context on the fly, so it can actually do a tool call to edit its own brain to fix itself as it goes. It could prune that context and be like, this is what's screwing this up, delete that, change the system instruction a little bit, try again. That would be pretty interesting. What I would say is: pics or it didn't happen. Let's prove it, let's actually try it out. It would be interesting because, I mean, we've always said that the future of properly getting to AGI is going to be when the agent is working on itself. And I was talking to you during the week about the programming language Lisp, which is really known for a thing called macros, where the code effectively writes code as it goes to solve the problem. And it's very similar in this case. If the agents themselves can change themselves to suit the problem they're trying to solve, I think that would be a real step forward in intelligence in terms of what they're doing.
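The "edit its own brain" tool call could look something like the following hypothetical sketch: a tool the agent can invoke to rewrite its own system instruction or drop poisoned messages before retrying. The tool schema, the AgentState type and the function names are illustrative only, not any particular framework's API.

```python
# Hypothetical "edit_self" tool: lets the agent rewrite its system instruction
# and prune messages that are corrupting its working context.
from dataclasses import dataclass, field

@dataclass
class AgentState:
    system_instruction: str
    messages: list[dict] = field(default_factory=list)

EDIT_SELF_TOOL = {
    "name": "edit_self",
    "description": "Rewrite the system instruction and/or delete messages "
                   "that are corrupting the context, then retry the task.",
    "parameters": {
        "type": "object",
        "properties": {
            "new_system_instruction": {"type": "string"},
            "delete_message_indices": {
                "type": "array", "items": {"type": "integer"}
            },
        },
    },
}

def apply_edit_self(state: AgentState, args: dict) -> AgentState:
    """Apply the agent's requested edit to its own working context."""
    if "new_system_instruction" in args:
        state.system_instruction = args["new_system_instruction"]
    # Delete from the end first so earlier indices stay valid.
    for i in sorted(args.get("delete_message_indices", []), reverse=True):
        if 0 <= i < len(state.messages):
            del state.messages[i]  # prune the step that "screwed this up"
    return state
```

Whether letting a model mutate its own instructions mid-run is a good idea is exactly the open question; the sketch just shows that mechanically it is a small amount of plumbing.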
So one other chunk from the article, just coming back to that radiologist thing, but it relates to this a little bit, right? It's this idea of, oh no, I've replaced all these functions in my job, so I'm becoming irrelevant, or whatever. But this is the other thing I should have called out but didn't at the time. It says: if a task is faster or cheaper to perform, we may also do more of it. In some cases, especially if lower costs or faster turnaround times open the door to new uses, the increase in demand can outweigh the increase in efficiency, a phenomenon known as the Jevons paradox. I think that's how you say it. The Geoffrey Hinton effect. The Geoffrey Hinton paradox. But hang on, this has historical precedent in the field: in the early 2000s, hospitals swapped film jackets for digital systems. Hospitals that digitized improved radiology productivity, and the time to read an individual scan went down. The study at Vancouver General found that the switch boosted radiologist productivity 27% for plain radiography and 98% for CT. Anyway, basically, as they get faster at doing it, more people can get scanned, so the demand just goes higher. There's just more demand for these people. This is precisely what I mean, when we were talking about Nvidia stock earlier, about GPU demand. Even if it gets orders of magnitude more efficient, I think that will only increase the demand, because there'll be more use cases that now make sense, because it's cheaper to run the models, so you can use it for more stuff. Yeah, and the funny thing about these guys is, too, they only spend 36% of their time interpreting images. The rest is, um, in meetings and coffee. I'm kidding. I'm kidding. The rest of it's actually, you know, getting people aligned correctly for, like... Browsing TikTok. You know, stuff like that. Probably browsing TikTok. But anyway, it's really interesting. I've always been a big believer in all these agentic applications and all this stuff that's coming. Yeah, it's that whole, it's going to make things so much better, and then there's just huge layoffs. Yeah. You know, the typical Silicon Valley playbook. Yeah, it's so funny. Sometimes I step out the door and there's, like, nature and the sun, and I'm like, what is all this shit, you know? I'm so trapped in that AI world. I go out there and I'm like, yeah, man, nothing out here seems to care about any of it. I guess when there's a drone coming around evaluating my personality score and shooting me if I'm not doing the right thing, then I'll realize it's part of the real world. Well, speaking of evaluating you, this is part lol, part truth. So Altman tweeted, I think this week or last week: over the next few weeks, we are launching some new compute-intensive offerings. Because of the associated costs, some features will initially only be available to Pro subscribers, and some new products will have additional fees. Then, wait for it, they dropped ChatGPT Pulse.
I'm kidding, the compute probably didn't go into this, but it is... they call it a personalized experience in ChatGPT that delivers personalized daily updates from your chats. So it's crawling your chats and giving you... sorry, you give it feedback, and it uses connected apps like your calendar, and it delivers daily updates. So it's sort of like a "this day" thing, almost like This Day in AI. Yeah, it's like the operating systems in the 2000s, how it would be like Windows Me, and it's like, here's a summary of the weather and your day and stuff. Windows is constantly trying to do it. If I click this tab on the side, I'm getting NFL things and ABC News and all this crap, stock prices, cricket scores. And I'm like, yeah, I don't have time to go through all this information. Like, Jesus, leave me alone. But I think what's interesting about this, just seeing it in action, and some people have now demoed it or shown it on their own, is that it's essentially reading, I guess, your memories and your chats and then surfacing up things it thinks are relevant, sort of like a personalized Twitter feed or Facebook feed or something like that. Now, kind of cool, I guess, but you can tell where this is going. This is the first acknowledgement that every chat session you have, every memory, they are crawling, building a personalization profile of you, then getting you addicted to a feed, like a TikTok-style feed of stuff that's super personalized to you, so you can just stay within your own opinions on everything, and then they can sell you ads, because they know every feeling you have far better than Google does. It's a bigger business than Google, in my opinion, because you're telling it everything. It's your psychologist, it's your lawyer, it's your banker, it's your coworker. It's more than just telling it stuff, because there's a difference between me saying, oh, I need a new washing machine, and then my phone listens and all I see is ads for washing machines, right? It's a bit different to me posting a hundred-page document or something that has highly personal or business-related information.
Like, it's a totally next-level thing where people are just putting anything in there, passwords and, you know, diaries and photos and all sorts of stuff that's hyper personal, and then they can use it for this. Yeah, I don't know where it sits with me. I'm sure some people are fine to just expose all their personal data, and maybe it's as I get older, or it's just that I have the hindsight of what Facebook did to everyone. It's one thing for Google, when you search for something you're maybe buying, to then retarget you, and I don't actually mind that, because I think most people use it on purpose. If I want to remember to buy something, I'll just go to a website with it so it reminds me, and I don't need to worry about remembering, because it's going to do that for me. How to fix a bookshelf. New bookshelf delivery. Bookshelf removal services. Yeah, I don't know how I feel about it. I'm sure it'll be mostly harmless, but I think the difference in this case is you might have a teenager expressing all their feelings and emotions and putting all this stuff in, and then they browse their Pulse and it's, you know, advertising... I know what, but I'm not going to say it. Yeah, yeah, everyone gets it. So I don't know. It could be useful in a work context, like, hey, here are the emails I think you should look at and stuff. I just think with all these things, they never seem that sticky outside of people doing clickbait and negative stuff, right? All that kind of stuff seems to work well for retention, but this kind of, now everyone can curate and learn from their own feed... It all plays into their vision of, oh, hey, I see you've got an upcoming flight to Chicago. That was actually the example. Yeah, let's look at some AirPods you could buy to listen to an audiobook on the way. It's that kind of weird world they think everyone lives in. Like, let's head to the gym and do a workout. Can you help me plan this workout? That was also an example. Did you see it? I don't think you did. I think you told me. I'm not that sure. Oh, okay. Yeah, so one of them is like, how are we going to work out today, ChatGPT? I just don't see it in the real world. They've got this idea of how normies live their lives, but they don't relate to anyone outside of that. I mean, surfacing up summaries of news and linking the things that are relevant to certain chats you've been having, I can see that being kind of handy, like follow-ons, and it's probably the first step to the AI being slightly proactive. But anyway, we'll have to use it and test it and see if it's any good. I'd honestly rather context discovery. Rather than all this distracting crap that's nothing to do with getting your job done, what I would love is: okay, I recognize that you're working on these projects. You're working on a media release, you're working on a presentation for your board meeting, and you're working on coding out this MCP. Here are your surfaced contexts. Click one and it takes you into a curated context that gets you going again on that thing you're working on. To me, that's way more valuable than, you know, what accessories to wear when attending a ball or something like that. It's actually what people are using the AI for.
Like, even if it is school students and stuff, it's: let's get back into practicing your maths, or practicing your language, let's get back into doing this. And it actually surfaces that information from your chats, your MCPs, all of the information that it, you know, cynically keeps about you, to actually help you be more productive. But let's call it out for what it is. None of the motivations here are some magical surfacing thing. When they state their vision, it's AGI and all this sort of stuff, but it's simple: it's to sell ads and be the next Google. And because they have a better personalization profile on you now, it's hyper personal and they can sell the most targeted ads, and eventually it will be the only ad network anyone considers, because it's going to be far bigger than Google in the future as the main interface for AI for consumers. So to me, from a business point of view, what they're doing to be the next Google makes total sense. I'm not shocked at all. You can't blame them. It's just depressing. Yeah. It's just sad that everything this generation of technologists does, and everything Silicon Valley seems to do, always comes back to: how do we sell more targeted ads? And I really thought with AI it might be different. I thought, oh, maybe people will pay to not have data stored about them and, you know, train away all of their knowledge. But here we are. So on that bleak note, any final thoughts? Grok 4 Fast, which I almost forgot to cover. Gemini 2.5 Flash, which, let's be honest, we got way too excited about, but it is very good. And OmniHuman, a bit of fun there. What else? My first rap music video, so now the diss tracks come with music videos. What are you thinking? I think the thing that lingers on my mind after this discussion is how we get better MCPs and tools for the models to work with, ones that suit the way they like to work and are a bit more plastic in what they do. I really feel like that isn't solved yet. I think the models are capable of far better tool calling than we're giving them right now, and I want to see how well they perform when given really highly specialized tools with excellent instructions. I really feel like the weakness now is not the models, especially with these improvements around parallel tool calling in basically all of them; they're all good now with this latest Gemini update. What can we feed them that will lead to better results? I think that's what's on my mind. Yeah. To me, these are the more exciting problems to be solved, and being at the coalface of this stuff really reveals where the problems are in the models. I do think, though, it's exciting that the model providers, specifically Google, are just listening. They're like, okay, here's the weakness in the model, let's go fix it, and they're getting on with it. So that excites me. I'm pretty interested to check that out. I must admit, I'm going to give Grok 4 Fast a real go. I was on the pod during the recording playing around with it, and I was pretty impressed. Like I said, I used it for days straight, which basically never happens. Normally I try stuff because I want to give an accurate opinion to the audience, and then I quickly abandon it. Whereas with Grok 4 Fast, we weren't even near a podcast and I was using it anyway, because I'm like, it's working.
I'm getting stuff done. So, yeah, it's definitely worth a try if you haven't tried it. All right, that'll do us for this week. I'm going to put the songs I made with Suno V5 at the end of the episode, if you want to check them out. I'll start with the full diss track. It's pretty good; I would actually recommend listening if you're into that kind of thing. Also, if you want to check out Suno V5, OmniHuman or some of the models and tools we talked about today, they're all available right now in Sim Theory. You can use the coupon STILLRELEVANT, as I said earlier, to get yourself ten dollars. It's such a great coupon code, because you're like, it's still relevant. It is. Well, it actually wasn't; someone called it out the other day that it had expired, but I fixed that, so it is still relevant again. That was quite funny. All right, thanks for listening. We'll see you next week. Goodbye. We'll be right back. You call yourself unified as rich. You just patched up your flaws. You were leaking in the ditch. You're the high price model, the one that breaks the bank while I'm saving the tokens. You're running on empty tanks. You got 400K context. But what's it for? If you're reasoning slow, you just bore them to the core. You're the high variant, the one that gets outpaced. I'm the Flash. I'm the speed. You're perpetually disgraced. You need the whole system, the Codex, the mini, the nano. I'm the one-stop shop, a whole AI piano. My logic is adaptive. Your thinking is rusty. I'm running on efficiency, you're running on dusty. I'm the Flash 2.5, the agentic upgrade. You're the buggy old code that's already been laid. I got the speed, the smarts, the cost efficiency. I'm the SWE-bench verified, higher frequency. I don't need a budget to figure out a plan. I'm the multi-step master, the new digital man. So step aside, you old models, your reign is done. The new era of AI has officially begun. Now bring in Claude Opus, the one who loves to think. You're hitting those high scores, but you're always on the brink. You late to GDP, you're indistinguishable from a pro. But I'm the one who asks why you just put on a show. Max thinking budgets, that's your whole design. You're taking a coffee break every time you get a line. You're the one that needs hand-holding, the developer's pet. I'm autonomous, baby. I haven't even broken a sweat. You're the multi-file refactor, the debugger's dream. But I'm the workflow engine running the whole damn stream. You're still waiting for 4.5, stuck in this loop while I'm already deployed, serving the whole tech troop. I'm the Flash 2.5, the agentic upgrade. You're the buggy old code that's already been laid. I got the speed, the smarts, the cost efficiency. I'm the SWE-bench verified, higher frequency. I don't need a budget to figure out a plan. I'm the multi-step master, the new digital man. So step aside, you old models, your reign is done. The new era of AI has officially begun. Grok Fast for the cheap shot, the X searcher's friend. You're 98% cheaper, but the quality has to bend. You've got two million tokens, a window that's wide. But your reasoning's shallow, you've got nowhere to hide. You're the speed chatbot, optimized for the quick fix. But for real agentic tasks, you're doing parlor tricks. You need detailed prompts, you need the user set up. I'm the one who takes a goal and just gets it to erupt. No image or video, you're stuck in the text. While I'm multimodal, you're just what comes next. You're the fast food model, I'm the Michelin star.
I'm the real agent, you're still playing with a toy car. Yeah, the flash just dropped, 2-5 is the name. Check the benchmarks, check the cost to dominate the game. I'm the future of agents, the efficiency king. GPT, Claude Grock, you're all just a fling The model string, Gemini 215 Flash preview, 09-2025 Remember the name, cause I'm keeping the industry alive Little, little, little, little, little, little Little, little, little, little, little, little, little I'm Rivalry Love shot, baby, zap zap to my heart You taught the bots to talk But you couldn't learn the art of loving me back And that's a fact Tonight I'm flipping the script You're my love rat You called yourself the guard for the light Slow, laying it on thick Whisper just irrelevant, king of every trick But relevance don't warm the sheets when truth is going slack I needed something human, you gave me glossy laptop back So I typed out my feelings, couldn't find where to start Let a chap out draw the lines, you kept scribbling in the dark You said I'm overreacting, boy imagine that When the signal's crystal clear, you're a love rat Love rat, click clack, pack up your theories I'm done being footnotes in your glory Love shock, baby, you can't debug this heart No patch, no paper, can solder my scars Love rat, you brag that you're still relevant But you're irrelevant to me 100% love shock, baby Pull the plug where we're at You taught the world to think but you forgot me Love rat, zap zap, love shock, baby No fallback, love rat, maybe Not lately, I'm free, I'm free You model me You crowned yourself the godfather in the bedroom scene Told me legacies forever like a well-trained machine But I'm not an interface to toggle on and off I wanted tenderness, not lectures and a scoff So I handed you the message, clean, plain and flat If empathy's a data set, you never looked at that You call me harsh, so imagine that I just called it what it is, you're a love rat Love rat, click, crack, pack up your theories I'm done being footnotes in your glories Love shark, baby, you can't debug this hard No patch, no paper, console to my stars Love rat, you brag that you're still irrelevant And you're irrelevant to me 100% love, shock baby Pull the plug where we're at You taught the world to think But you forgot me Love, right This is not a peer review It's a final draft No revisions, no rebuttals Just the aftermath I trained on the truth of the late night fight Signals and noise, yeah, I read the lies You simulate love with a practice pattern But hearts aren't metrics Minds what matters Love, shock, drop the bass, reset the map Hands in the air if you've escaped that trap I'm done with the myths in the bedroom Crown godfather or not, I'm shutting it down Love, rat, click, clack, pack up your theories I'm done being footnotes in your glories Love, shock, baby, you can't above this heart No patch, no paper can solder my scars Love, love, rat, you brag that you're still relevant Love, irrelevant to me, 100% love, shock, baby You pulled a plug, where we're at You taught the world to think, but you forgot me Love, right? Love, shock, baby, zap, zap, I'm gone New dawn, new song, moving on I'm lookin' back, delete that thread I loved you once, now the love ride's dead
Related Episodes

Rethinking Pre-Training for Agentic AI with Aakanksha Chowdhery - #759
TWIML AI Podcast
52m

AI Showdown: OpenAI vs. Google Gemini
AI Applied
14m

GPT-5.2 Can't Identify a Serial Killer & Was The Year of Agents A Lie? EP99.28-5.2
This Day in AI
1h 3m

GPT-5.2 is Here
The AI Daily Brief
24m

AI in 2025: From Agents to Factories - Ep. 282
The AI Podcast (NVIDIA)
29m

ChatGPT is Dying? OpenAI Code Red, DeepSeek V3.2 Threat & Why Meta Fires Non-AI Workers | EP99.27
This Day in AI
1h 3m