

GPT-5 A Week Later, Ideogram Character Reference & gaggle poaching - EP99.13-THINKING-MINI
This Day in AI
What You'll Learn
- ✓ GPT-5 fell off quickly for the hosts, who found themselves using Claude Sonnet 4 more for daily tasks due to its speed and capabilities with MCPs.
- ✓ GPT-5 is still the go-to model when the hosts are stuck on a difficult problem and need more advanced capabilities.
- ✓ The removal of older GPT models upset many users, who had developed a rapport with those models and preferred to switch between them for different tasks.
- ✓ The hosts believe that users are more sophisticated in their model selection than commonly assumed, and that the ability to choose the right model for a task is a valuable skill.
- ✓ There is a small gap between novice and advanced users of AI models, as people quickly become proficient in using the tools for their work.
Episode Chapters
Introduction
The hosts discuss the usage and performance of GPT-5 after its launch.
GPT-5 Usage
The hosts share their experiences with GPT-5, noting that they have found themselves relying more on other models like Claude Sonnet 4 for daily tasks.
Community Reaction to Model Removal
The discussion touches on the community reaction to the removal of older GPT models and the hosts' belief that users are more sophisticated in their model selection than commonly assumed.
User Sophistication in Model Selection
The hosts discuss their belief that users are more sophisticated in their model selection and that the ability to choose the right model for a task is a valuable skill.
AI Summary
The episode discusses the usage and performance of GPT-5, the latest large language model from OpenAI. The hosts share their experiences with GPT-5, noting that while it is powerful for solving complex problems, they have found themselves relying more on other models like Claude Sonnet 4 for daily tasks. The discussion also touches on the community reaction to the removal of older GPT models and the hosts' belief that users are more sophisticated in their model selection than commonly assumed.
Key Points
1. GPT-5 fell off quickly for the hosts, who found themselves using Claude Sonnet 4 more for daily tasks due to its speed and capabilities with MCPs.
2. GPT-5 is still the go-to model when the hosts are stuck on a difficult problem and need more advanced capabilities.
3. The removal of older GPT models upset many users, who had developed a rapport with those models and preferred to switch between them for different tasks.
4. The hosts believe that users are more sophisticated in their model selection than commonly assumed, and that the ability to choose the right model for a task is a valuable skill.
5. There is a small gap between novice and advanced users of AI models, as people quickly become proficient in using the tools for their work.
Topics Discussed
Large Language Models, Model Selection, User Sophistication, Model Removal, Model Capabilities
Frequently Asked Questions
What is "GPT-5 A Week Later, Ideogram Character Reference & gaggle poaching - EP99.13-THINKING-MINI" about?
The episode discusses the usage and performance of GPT-5, the latest large language model from OpenAI. The hosts share their experiences with GPT-5, noting that while it is powerful for solving complex problems, they have found themselves relying more on other models like Claude Sonnet 4 for daily tasks. The discussion also touches on the community reaction to the removal of older GPT models and the hosts' belief that users are more sophisticated in their model selection than commonly assumed.
What topics are discussed in this episode?
This episode covers the following topics: Large Language Models, Model Selection, User Sophistication, Model Removal, Model Capabilities.
What is key insight #1 from this episode?
GPT-5 fell off quickly for the hosts, who found themselves using Claude Sonnet 4 more for daily tasks due to its speed and capabilities with MCPs.
What is key insight #2 from this episode?
GPT-5 is still the go-to model when the hosts are stuck on a difficult problem and need more advanced capabilities.
What is key insight #3 from this episode?
The removal of older GPT models upset many users, who had developed a rapport with those models and preferred to switch between them for different tasks.
What is key insight #4 from this episode?
The hosts believe that users are more sophisticated in their model selection than commonly assumed, and that the ability to choose the right model for a task is a valuable skill.
Who should listen to this episode?
This episode is recommended for anyone interested in Large Language Models, Model Selection, User Sophistication, and those who want to stay updated on the latest developments in AI and technology.
Episode Description
Join Simtheory: https://simtheory.ai
----
CHAPTERS:
00:00 - Simtheory plug
00:48 - GPT-5 1 Week Later, Reaction to GPT-5 & Our Thoughts on Future of AI Models
30:12 - Ideogram Character Reference Fun + Disturbing Photos of Us
37:33 - Using creative MCPs together for photos, videos and 3D objects
43:16 - MCP output combinations and the explosion of MCPs
51:18 - What is needed from the next models like Gemini 3.0 Pro
54:30 - Sundar Pendant Design & Final Thoughts
56:20 - Final LOLz of week: gaggle poaching
58:10 - Surprise GPT-5 Indie Song

Thanks for all of your support and for listening to the show! xoxox
Full Transcript
My name is Geoffrey and you're listening to This Day in AI. Before we start today's show, just a quick reminder: the latest version of Sim Theory is now available with an MCP store, async work via tabs, context forking, agentic workflows, and it has all the models and tools we use on the show. Use coupon code STILLRELEVANT to get $10 off an AI workspace, and if you have a team or organization that wants a fully customizable workspace with privacy and security in mind, you can now try one out for the first time with your whole team for seven days to see what you think. Visit simtheory.ai and get your AI workspace set up today. So Chris, this week I've been using the new Ideogram character reference to put you in really compromising photos and videos. More on that later. But first, I wanted to discuss: are we still using GPT-5? Is it now your daily driver? And just check in after the sort of fallout of the GPT-5 launch. Well, I forget what I said last week. I think I was pretty positive about it, despite the community not liking it so much. But to be honest, yeah, I haven't really been using it that much. It fell off a cliff rapidly for me early in the week. Can you explain why it fell off? I think just once you get into the meat and potatoes of regular work, you just go with the models that give you the best results. And I actually have found myself using Claude Sonnet 4 a lot more this week, partly because it got faster because we got more allocation from Amazon to use it. So that was nothing to do with the model improving. Nothing to do with you trash talking them on the last, like, three shows? Yeah. You know, I never know. Like, is it a coincidence or whatever? But I guess a bit of tough love does work from time to time. But yeah. And so GPT-5 in general, I just don't go to it, basically. So the thing I've found about working with it is I've surprised myself a little bit. I'm using it a lot, like so much more than I thought I would, but in weird and strange ways. I think I agree with you. In terms of daily driving with MCPs, I'm definitely still favoring Sonnet because I think, as you said, it performs the best. It's now way faster for us, at least. And I really like its capability to go off and call many tools. And I think it's just maybe because of the spin-up or the demand for GPT-5 right now, what we're seeing is this problem where it's just slow to respond through the API, because they're under such high load being a newer model. And so if I'm working with MCPs and I want to go back and forth, right now Sonnet is definitely my preferred option. But one observation I have about GPT-5, and this is previously where I would switch to o3 Pro, is it's very smart. Like, if you're stuck on a hard problem, it still is my go-to model, and I would say this week it's become my go-to model now more than ever before. That's interesting. Yeah. When you have an actual deep problem you need to solve. I think that's the thing. There's the sort of daily grind of getting tasks done where most of the models can do it competently, and it's only once you get really stuck on something that you really need a much better model. And that's when you start to build an affinity for a model, when it can get you out of those tight situations. And this is the surprising thing for me this week.
I thought, I don't know, from all the hype and the expectation and Sam Altman putting up, like... I'm not the biggest Star Wars fan, but like the Death Star or whatever, hyping this thing up, like it's going to take over everything. I was, like everyone... the expectations for GPT-5 were through the roof. They had hyped this thing to no end. We had all been expecting it since GPT-4. And so I feel like they were just never going to deliver or meet the hype. But I did get this feeling in my belly when it first came out, like maybe this will be the one model to rule them all. But as you say, there's sort of these grunt models like Gemini 2.5 Pro, that's got the huge context window and it's just good at churning stuff out for you throughout the day, and then you've got Sonnet that's working really well agentically with MCPs, and then GPT-5, for me at least, is my go-to when I'm stuck, I need some help or I need a plan to get unstuck. And that's sort of how I've been using it. Also, I think to some degree it's what you always say, which is that model improvement is plateauing to the point where we're not exactly making empirical comparisons between the models. It might just be that you happen to give the easier problems to one model and therefore you're like, it's amazing, and then you give a harder one to a different model and say it's crap. So I think that partly you're getting relatively equal results from a lot of the models, and therefore you're just going with whichever one's faster, or has more context window, or just works for you on the day. And so a lot of the feedback around ChatGPT, at least prior to GPT-5, was this whole idea of the model switcher where you sort of had... and we've LOLed at it on the show many times... like, o4-mini, o3 Pro, GPT-4o, GPT-4.1, GPT-4.5. I mean, this was actually real a while back. And I think it's somewhat laughable because they sit in the consumer market, obviously.
And so, you know, like everyone's sort of been saying for a long time about the user experience, we can't have these models in the picker, it's just too, too much. But then they took them away. They sort of did the feedback, which is to say, GPT-5 is going to be a router, it's just going to figure out where to route it, and it'll give you the best answer. And they said the router wasn't initially working very well at launch and people got frustrated, and then all of a sudden they started rapidly backtracking, to the point of, like, oh, you know, you can re-enable all the old models, you can re-enable the model switcher, and you can just have it the way things were. That's right, because at the announcement they just immediately took away all the 4-series models right from the UI, and then I think they said the API is going to lose them fairly soon too. Yeah, and they also did that eulogy for GPT-4o, which really upset a lot of people. Especially, there were TikToks and various videos being made about people being upset that it wasn't the same, like it wasn't basically chatting to them as much or acting like it was previously. And I've never watched the movie, but I'm pretty sure that's the plot to Her, where they take the AI away, right? I haven't either, but remember many episodes ago I talked about how you sort of build up a rapport with your assistant, and that is to some degree tied to the model. So if you take it away, eventually you're going to lose them and lose that connection you developed throughout the working week. But so, just for the LOLs, in simplifying around GPT-5, we've gone from having just GPT-5 to now we have: GPT-5 Auto, decides how long to think; Fast, instant answers; Thinking Mini, thinks quickly; Thinking, thinks longer for better answers; and Pro, research-grade intelligence. Then under legacy models we have GPT-4o, GPT-4.1, GPT-4.5, o3, o4-mini. Now, I would like to make a huge statement here, a big statement, which is: I think that AI companies, us and everyone else in the tech industry, are really underestimating the average punter with AI. I think the reaction to taking away a lot of these models is because power users of AI, which is almost everyone today, and people that start to get a feel for different tunes of models, do start to realize what model is better at what, and they do switch around a fair bit out of interest. And so we assume, in a sort of UX-design world, that this is way too complicated for plebs, but it turns out, and I think you said this last week, most people are a lot more sophisticated than people think. They're quick to adopt it because it's so powerful. And maybe the human right now, with the way the models are, is the best at picking the model. Just like you would pick a software application to do a specific task, you now just pick a model to do a specific task. I think that's right. I think for the same reason we said that people who are experts in their industry know when the model's giving good answers, because they're experts in that subject and so they can evaluate its ability to do those tasks, for that same reason they're going to be able to evaluate the different models' ability to do those tasks well. So it isn't a case where you need to be technical or need to be a programmer to know whether an answer to a problem is better or not.
I actually think the huge difference between people in terms of their understanding of modern AI is: have they branched away from ChatGPT at all? Because I think anyone who has even slightly, like they've tried Claude or they've tried something else at all other than the OpenAI models, suddenly realizes that there are alternatives and then they want them all. And I think that's the major distinction. And the second point you made I strongly agree with, which is that the actual curve between having very small amounts of knowledge of the current AI models and being at the most advanced level where you're pushing the boundaries of them is very tiny. People go from that initial use to I'm using it for everything all the time in my job, and therefore they get good at it really quickly because they're using it all the time. I don't think it's one of these things like, oh, the internet, I don't deal with all that tech garbage, where it takes a long time, like 10 or 20 years, for all commerce to be done online. This is a much faster uptake. Yeah, and I also think that people really underestimate the benefits of exploiting all of these tools to get the best result. Like, if you look at an application like Cursor, there's a reason they offer all of the models, right? Because for different programming tasks or different opinions or different workflows, people like to flip around models. But I don't think that is necessarily that different to knowledge work either, where at the moment you might be writing something and you want to get four different tunes of an answer from four of the best models to pick which is the best answer. Because, I don't know, I think if you're in a sort of work environment, especially an enterprise or a business environment, the cost of running inference four times... you don't really care. You just want the best outcome. You just want the best marketing copy or the best strategy or whatever it is. Well, think about doing things like tender applications. If you're applying for a tender, you've really got one shot at it to get it right and win. And if it costs you $200 in inference to get the absolute best tender application you can get through, of course you'll pay it. It's not a problem. You just want to know which one is best. That's why I'm sort of starting to see MCPs, and even models in this current paradigm and iteration of LLMs, as just sort of apps.
Like, you know, the app is the model where you're like, oh I'm gonna use the this particular model for this task because that's just I get the best result from that model and I'm gonna then pair this model with these like four MCPs because that's my workflow for this task right now and if a better model comes out I'm gonna slot that model in and I think the GPT-5 router is somewhat like trying to do that human tune or taste test for the user and i'm not sure like i'm not sure if that's actually benefiting anyone because the first reaction from people that use chat gbt at least in their workday was like i want to know what model it's using like oh like if i'm doing some research i want it to use the best thing well i want it to use the best thing all the time and i think that's why that that switcher quickly came back because it was like we've got to allow them now to select like the feedback was very loud on line like i'd also imagine that if you're doing switching right because we do it to some degree with tool selection if someone has a lot of tools um you really are relying on the cheaper smaller models to make that decision as to what's available and if you think about it it's like a less intelligent person deciding whether or not they will ask a more intelligent person or not to answer a question so it's a case where you don't know what you don't know kind of thing so I'd imagine there'd be scenarios where you think, I don't want to delegate this decision to a stupid model to decide if it's going to bother to call the smarter model, because I just want the smarter model in this case. And I think that's where people are not going to want to be hands off and go, I'll just leave it to the fates and let the model decide. You really want to be the one making that decision. And like, oh, I'm happy to hang my head in shame here, because I I thought given the sort of mass adoption from consumers with ChatGBT, people ultimately wouldn't care about which model was being used and they would just dumb it down to the point where it like, you know, you had no idea. You just went to ChatGBT and there was no model picker at all. But the reaction from users, and I don't think this is like the, I don't think that these kind of users are like the loudest, you know, and like tech people either. I think it's also been, you know, your average punter, right? Like it's sort of like, hey, I was using this for this or like, I guess what I'm saying is I don't think it's relegated to some tech elite thing of giving us our models back. I think it's everyone. Yeah, I agree. And I think that that is also just part of the problem with this technology in general. Everyone wants it to be better, but no one thinks, what if it gets worse on the things that it's actually helping me with right now because that's happened a few times where a model is at the minimum a lot different like we saw when some of the models came out so verbose and they were just outputting so much text and people were used to less and things like that. See, it's really an ongoing relearning process every time the new models come out. So I can see a desire to be able to have a snapshot in time of something that's fully working for you and stick with it. 
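To make the delegation problem described above a bit more concrete (a cheap model deciding whether a request deserves the expensive model), here is a minimal sketch of that kind of router. The model names, the `call_llm` helper and the escalation heuristic are illustrative assumptions only, not OpenAI's actual routing logic.

```python
# Illustrative sketch only: a "dumb" router that lets a cheap model decide
# whether to escalate to an expensive reasoning model. Model names and the
# call_llm function are placeholders, not a real provider API.

CHEAP_MODEL = "small-fast-model"       # assumption: inexpensive, low-latency
SMART_MODEL = "large-reasoning-model"  # assumption: slow, expensive, smarter


def call_llm(model: str, prompt: str) -> str:
    """Placeholder for whatever chat-completion client you actually use."""
    raise NotImplementedError("wire up your provider SDK here")


def route(user_prompt: str) -> str:
    # Ask the cheap model to judge difficulty. This is the "less intelligent
    # person deciding whether to ask a more intelligent person" problem
    # discussed above: the judge may not know what it doesn't know.
    verdict = call_llm(
        CHEAP_MODEL,
        "Answer only YES or NO: does the following request need deep, "
        f"multi-step reasoning?\n\n{user_prompt}",
    )
    model = SMART_MODEL if verdict.strip().upper().startswith("YES") else CHEAP_MODEL
    return call_llm(model, user_prompt)


# A user who already knows the task is hard will often want to skip the router:
# answer = call_llm(SMART_MODEL, user_prompt)
```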
But you've got the other side of the coin, where the providers really don't want to be publicly hosting models forever when they've got the newer and greater thing they want to use their limited supply of GPUs for. What fascinated me is just the love for GPT-4o from people that were sort of relying on it as a daily tune, like their daily assistant, you know, the reaction when it was just abruptly taken away, which in hindsight is not that surprising. But at the time I didn't think it'd be such a big deal. It does show the tuning that goes into each model, too. People get used to a certain tune, and by tune I mean how long the response is, whether it entertains, the personality, all the various traits of the model, how it outputs code, how it writes. People do get used to these things, and then to abruptly just slam it away... yeah, I do think it's opened my eyes, at least, to how connected and attached people have become to these various models. Now, to just put this launch in context and kind of recap: Sam Altman, the day after GPT-5 came out, did an AMA on Reddit, because that's what people still do, I guess, in 2025, and he talked about the launch problems and technical difficulties. They were blaming the fact that the model seemed dumber on the auto switcher, and they said it was down for a big chunk of the day, which made GPT-5 seem way dumber. That sounds like total bullshit. Yeah, I don't know if I believe it. It says the massive-scale rollout caused API traffic to double in 24 hours and created hiccups for hundreds of millions of users. I believe that. I think it probably got smashed. Yeah. Bugs in the model router not properly routing coding queries to the thinking model, which caused GPT-5 to seem like a downgrade for programming tasks. Not because it probably was... it was probably routing everything to a lesser model. Yeah. The livestream presentation had human error with misleading bar charts due to people working late and being tired. Also, they admitted that one where you pointed out the wrong numbers on there. Yeah, but get this. Like, they are up there saying that GPT-5 hallucinates less, yet, oh, we were tired and it was human error. Yeah, why are the humans making the charts? Don't you trust your tech? It's not a great look. Anyway, the model switcher currently decides based on domain and complexity, but doesn't search the web first because that would increase latency too much. I mean, that's pretty obvious to me. User access and model options: OpenAI heard loud feedback about bringing back GPT-4o and will restore access for Plus users while monitoring usage to determine long-term support. So you need to pay now to get access to your buddy. They're working on GPT-5 Mini to restore the same overall reasoning usage limits that users previously had with o3, o4-mini-high, and o4-mini... I mean, this is out of control. Plus users are supposed to have unlimited access to GPT-5, and anyone getting rate limited is experiencing what they consider a bug. They're exploring whether users need access to both 4o and 4.1, or if just 4o would be sufficient. Like, this is a mess. And I think they're dealing with strange problems that no one's really dealt with before as well here. It also seems like a hands-off CEO who's, like, come into the boardroom, and they're like, f*** these guys, let's just take away the 4-series models, without really being close to the actual usage and understanding what's really going on.
Like, you know, it's a higher level decision that's been made and then they've panicked and realized they've upset everyone from the outside in. And then he's sort of deflected the blame onto, like, the errors of his team and the model not accurately picking and all this sort of stuff. It just seems like they've made actual decisions, haven't liked the consequences and then said, oh, it's just a rollout issue. Yeah, I don't understand. And to me, they just would have been better coming out and taking full responsibility and saying, like, we completely misread this situation. You know, here's what we're going to do about it, which is kind of what they've done. But it does seem like they really are making a lot of excuses. And totally understandable. It's a rapidly evolving situation where they're big leaders in this area. And sometimes you're just going to get it wrong. There's absolutely nothing wrong with being like, oh, we screwed up bad. We're putting it back. Which is what they did, to their credit. It's just that they didn't really want to admit that. They should have just said SOZ, like SOZ, classic apology, classic corporate apology, SOZ. So this is the one I just wanted to call out, and then I'll stop banging on about this. But safety and filtering improvements, they're actively fixing over-flagging issues in biological safety that were incorrectly blocking legitimate academic research. Now, our community is experiencing this constantly, the content filtering getting triggered by the most ridiculous things. It is mental. Like it can be the most innocent thing and it triggers it and then you get an API error that's like soz. And all of this, I think, is a great example why larger companies should be looking at their own private hosting of all of these models, which you can do even with open AI models through Azure and Anthropics through Amazon and Google, for example. so they can have the data in a region that they want it, like in their own country, for example, and a deployment of the model that isn't going to be like swapped out and switched and changed and suddenly censored. At least you're getting consistent results and your data is private and you're not subject to the whims of Sam Altman and his group of the week. Like you really need that level of stability, even if, okay, eventually they'll deprecate the models and you'll have to upgrade, but you can do it on your own time schedule not when these guys decide to do an announcement to trump google or something like that yeah i i do want to now go into like less of the drama and more of like my sort of lived experience with the model i i have to say after the week i've had the world is better for gpt5 especially the thinking tune like that that model is smart and it's made me also realize that Gemini 2.5 Pro is in need of an intelligence upgrade. I know they have that, like, long thinking or super think out to, like, ultra plus plus plus users willing to pay Google, like, 500 a month or something ridiculous, which, let's be honest, like, no one's going to actually... I just don't think it's that realistic. Yeah, unless they're, like, a YouTuber, like, I experimented with the smartest AI on the internet. Wait, isn't that what we're supposed to be doing? Yeah, but we're way less popular and less entertaining. Yeah, so I think that it needs an upgrade. Like it started for the first time Gemini 2.5 Pro over GPT-5 thinking seems dumb. 
I actually said it to you this morning, and you did make that point about the relative impressions, because I was like, did Gemini 2.5 get dumber? And you're like, maybe the other models just got better. And so it just doesn't seem as smart by comparison, which seems likely. But I must say, I had definitely found myself using Gemini 2.5, like it was the only thing I'd use, to the point where I wouldn't even try the others. But now I barely use it because I'm just not happy with the output I'm getting. And I don't know if that's because the others are better or not. Yeah, I think it shines when you've got a hard problem. But I think it's like a grunt worker. Gemini is still better. It just does big, chunky outputs with fewer hallucinations, even though GPT-5 is meant to do that. GPT-5, I think, still suffers from a bit of the laziness that the GPTs have always suffered from, where you ask it for something and then it only spits back... like, I was getting it to rewrite something the other day for me and produce graphs and just read into some data, GPT-5, and it would just give me, here is the change, in like a single line. Which is the right answer, to its credit, but I'm like, no, I want you to give me the whole thing again and show me what it looks like in context. So, yeah, anyway, it's just a bit lazy, whereas Gemini will just go gung-ho and give you everything you want. And also, with Gemini, you can't stop it from rewriting stuff. You're like, just give me the step by step of how to do it, and it's like, sure Chris, my love, here is the step by step, but then at the end, also, I've rewritten the entire 2,000-line file for you as well. And it's like, but I don't want you to do that. It's like, but I must, I must, it's my duty. But you know what I think in terms of just where this all sits? This is just going to put enormous pricing pressure on Anthropic. Because if I'm Cursor, say, right now, that is burning a lot of tokens and probably losing a lot of money, and GPT-5 comes along and it's smarter and it can work in my prompts and my workflows in order to have the sort of agentic coding capabilities, I'm 100% going to push that model over the Claude model, simply because, you know, you're talking $1.50 versus $15. Like, it's a 15x. You know, it's 15 times more expensive for Opus over GPT-5. And if I'm in the enterprise too, and I've got this smart model for my coding pipelines and productivity, Claude starts to look really expensive. So I think that's going to put a lot of pressure on Anthropic and give OpenAI a chance to really rug pull that business out from under Anthropic in terms of code. Because if a larger company, we'll say 20,000 employees, is coming to you and saying, which model should we make our primary for our team?
You're not going to recommend Opus, because you don't want to bankrupt the company. You could never justify the budget. You'd be like, oh, okay, cool, it's only going to cost you about two to three hundred dollars per month per user across your 20,000 users, and even that's probably not enough, they're going to run into limits and all that sort of stuff. It's just unrealistic. And you're like, oh, but I mean, we can afford it, how much benefit is there? It's like, oh, it's probably a two to three percent benefit sometimes. It's just not that much better to justify the massive increase in price. Yeah, I guess it'll just come down to, you know, in the coming weeks, whether people really do settle on GPT-5. It seems like a lot of people are just straight back to the Claude, uh, the Claude models, like just straight up back there. And I understand, Opus 4.1 does have a really good tune and it is very intelligent, but I would just say I still think GPT-5 can just absolutely cut through the rubbish. And man, I'm pleased to have access to all of them. And this is not some promotion. I'm just saying, I love living in a world where I have access to all these models and I'm not constrained to one. Because I think, honestly, if I was stuck in the ChatGPT single-model world with GPT-5, I'd be pretty upset. Like, I need the other models. Yeah, you wouldn't even know what you're missing out on if you didn't have access to it. Yeah. You'd just be like, I'm ashamed that you've taken away my girlfriend. So do you think, obviously we touched on this before, and to just round up our average coverage of the GPT-5 launch: do you feel a model step change now? And do you think when, say, Gemini 3.0 Pro or whatever we're going to get next out of Google comes, that could be a step up? Or do you get that plateau sense now using them as well? Or do you think we're now too stupid to understand if the models have improved? I think a big part of the problem is that the waters have been muddied a lot by the rise of MCP and tool calling, because suddenly it becomes a lot more than just which model gives me better answers for my problems. It's like, which combinations of tools do they choose to use? And the problem is that depends on tool availability and the question asked, and they can be quite lengthy processes, so it's a bit harder to get a straight comparison. And then even once they do use the tools, it's like, which ones are best at taking into account the tool results to give me an answer and then following up correctly? So I'm finding it a lot harder to evaluate, because there's just so many more variables at play when you're using the models. And so for me, a lot of it now is not even which model I use. It's more like, which tool combinations do I have enabled for a given task? Do I directly ask it to use tools, or do I just trust it to make the right decisions, and that kind of thing? So I'm more in that area than I am actually just worrying about which model I'm using at any given time. Yeah, I think that the only limitation I've found so far is where other models will get stuck on something and I would normally give up and have to use my own cognition. Yeah.
It's in those situations that I've been deferring to GPT-5 quite a bit and getting the solution really quickly, or fairly quickly; it's a bit slow. And I think I'm noticing an improvement there, like where it's way more intelligent. But I was noticing that similar improvement before, in that same workflow, with o3 Pro. I just can't really see myself daily driving GPT-5 due to its tune and how it answers stuff right now. I think I'm still in the Gemini and Sonnet camp there. So anyway, I think we do way too much talking about models and model tunes. Yes. Maybe. By the way, just before we finish, though, my favorite one is when a tool call fails. Like, you've asked it to research something, like how I continuously ask it to research diabetes, and if it fails for whatever reason, the model will just be like, hey, look, the tool call failed, but I actually just know the answer to this, do you just want me to tell you? It's like, we're kind of unnecessarily researching this, man, I'm already smart enough to know. Yeah, it's like, I was holding back because you said you wanted to do it this way, but look, I just know. Yeah, like, I'm calling the tools to burn through tokens because I've been instructed to. Yeah, exactly. Sort of like a really smart servant or something being made to do menial tasks. All right, so now to some, now to some LOLs, deferring away from model talk. One thing we missed last week was a new release by Ideogram, which is the realistic image generator that people really do love. Are you sure you're saying that right? Is it Ideogram or Ideogram? Ideogram. Ideogram. I'm sure I've got that one right. Ideogram. Drop it in the comments below if I'm wrong. I bet you're wrong, though, that's all. I just have an instinct. Okay, whatever, I'm wrong about everything. Ideogram Character... this is all the way back from July 29th. This is how far behind we were. But we are catching up, because last week was such an epic week. So, this is very cool and very useful. You can take a photo of yourself, and I'll show you in a minute how low quality the photo can be, and how realistic it is, with some example images I did for you, Chris. And so you take a photo, any photo, almost any angle works as well, which blows my mind, and then you can be like, make me a boxer, or make me a homeless computer programmer, or something like that, and you get phenomenal results. It is too much fun. I've wasted way too much time on this, and yeah, it's incredible. But I think in terms of what it's actually useful for... I think nothing, absolutely nothing. No, no, no. It's pretty useful for things like passport photos, LinkedIn profile images, corporate photos. If you don't like the current photo you have, you just ask for something completely different, and it's so realistic that you wouldn't notice the difference. These are some LOLs that I did in a demo video of it. This is me as a male model. Now, let me look at... I'll show you the photo. This is the sample photo. So that's all, it's just from that photo. Now, there were whole companies, like that Photo AI, that you would pay to train your own custom LoRA and all that sort of stuff in the past. Yeah, that's right. But now this is just like, you drag and drop a photo in and, like, boom. I mean, it's pretty crazy. Let me now show you some examples I made of you very quickly. Oh, this will be good. But you always send me the lowest quality photos for our YouTube thumbnails, which is always why sometimes you look...
Your camera's so bad, even though I bought you a new one. Yeah, like every time you see me in real life, you're like, oh, you don't look sick. Yeah, I'm like, you look so healthy. And then I realized it's just, you know, bad setup. So here's a really low-res blurry photo of you, right? Yeah. Now, here's you at an internet cafe in the 90s looking a bit hobo-ish. In a denim jacket. Well, and it's got tape on it and stuff. Because I said, use ideogram character to make me look like a homeless person working on a laptop in an internet cafe from the 90s. That's pretty good. Thank you, Eddie. Well done. All of our listeners will love this segment. This next one is truly disturbing. Look away if you're easily disturbed. So I said, okay, do another one. But this time I've fallen on the floor and there was whiskey bottles around me. It didn't really nail the whiskey bottles. I've fallen. And you also now have a crop top on and your belly is exposed. But it looks like you. Hey, we're talking about character consistency here. That's unbelievable. like I mean look at the photo I supplied yeah you're right like I think if I showed that to someone I knew they would think it's me I think they'd know it's AI but they wouldn't think it's me so then the next one is now make an image of me in a girl pop group on stage before this is truly disturbing um let me zoom in here oh I gotta see this uh oh yes oh god I mean because the head of the person next to you is fully on backwards that's the giveaway it's like my wife was saying the other day kids could tell pictures are ai but it's usually because yeah they're complete the people are deformed or the fingers aren't right or something like that here's you as like an emo in the myspace era um it's it's okay um and then i did a few more here's me as a model in Italy notice how I'm doing you as homeless I'm doing me as a male model it's fine it's realistic so uh and then this one I'm like put a girl in the photo now I have a tattoo on my chest apparently um so that isn't meant to be your wife that's just some rando yeah it does look like her though which is kind of strange um and then I said uh now make me a firefighter so I'm just like changing it up in the chat. I'm like, now make me a firefighter. And then, bam, I'm putting out a fire in Australia. And yeah, and then I did as Batman. I mean, it like you got to come on like this is good, right? Well, I don't know if you were going to mention it, but you were also turning a lot of these into videos, right? With VO3. Yeah. So that's the other thing. So in an update where we've pushed out a whole bunch of new creative MCPs. I added in VO3 Kling WAN 2.2, which is a new open source model from the Alibaba group. And you can now just say, once you get an image, like a character reference, you can say basically like now, bring it to life or, you know, have them like fly off into the distance and just instruct it. And I think that bringing these tools together has a lot of interesting possibilities, right? Like this idea of now you've got character reference, you can create the images or the scenes, you can turn them into actually video scenes, you can create music with this Suno MCP, then you can create audio eventually with like 11 labs. 
so you could kind of see your assistant eventually being able to connect this stuff together edit it all together and uh and create like a little tv show or a movie or a musical or whatever like where everything works together and i i think it's really that combining these mcps together and leaning on these technologies is probably the next paradigm of where you start bringing this all together. Yeah. And I think part of the entertainment of doing it is when you use other tools like research tools or internal company tools or your emails or something like that. And you're like, go check my latest emails, turn that into an image, then turn that into a video and make a song about it. And you can genuinely do that. And it's just the most bizarre things like it looks so real, but it just, and it has a basis in reality, but who would ever do that? Who would ever go to the trouble of making a video and a song about these mundane things? And I, that's what I find genuinely entertaining about it. Yeah. The, the other really interesting thing about this is, uh, uh, like creating images or lookalike, um, like images from MCP and then tying it into other skills as well so another one of the interesting mcps is this um it's called tripo 3d and what it allows you to do is is sort of combine all these things together so in the example i've got up on the screen now i combined an ideogram oh sorry so the prompt was use ideogram flux wan and gpt image man this is burning the gpu oh yeah i was gonna i was gonna bring up mike how much did you spend on this stuff this week? I'll reveal that in a minute. It's not good. Use ideogram flux, WAN and GBT image to create a concept of a futuristic cyber car. It should be golden and fit at least four people. My prompts are weird. Anyway, so it goes off. It goes, this is with Sonnet. It's ideogram flux, WAN and GBT image at the same time. Gives me four concept cars. And then I say, okay, take the GPT one, which is the one I like the most, and remove the background using Bria. So Bria is this other really great model that just nails removing background images. And I know that sounds... I think that actually wiped out a couple of startups. I remember seeing on LinkedIn a while ago, there was a startup that was just for removing backgrounds for product images. Well, that could be the startup, but it's really good. 
it also does product placement so you can like take a picture of a product if you run an e-commerce business and say to this bria app like put it in like a nice you know forest or jungle like if you know like the example they give is a candle and it's like in some rainforest sitting there and i don't know isn't that that's dangerous that's irresponsible of that no i think women women love that like when your candles in the forest you know all those marketing i guess if it's a rainforest it's wet so it's not going to ignite but anyway so then you've got a that like look at that so it's taken uh let me bring up the original so it's taken this image here yeah and then it's uh removed the background the background and i can kind of demonstrate that like how good it is boom it's gone and then i asked it and this is using that um tripo 3d now turn this into a 3d object and this is where i reckon it gets really cool it's pretty amazing because these assets seem like they could be immediately used in game like video games or like visualize like think about someone who does like interior design i mean they could more or less be fabricating furniture and stuff to rotate and put inside a virtual house to demonstrate to people like if you're like a kitchen designer or a you know whatever you could actually go oh well let's look what the you know the Bosch dishwasher 3.0 looks like in your new kitchen and actually put it in there. Yeah, and look at this one. So I actually stole this idea from your fish shoe, which you should release. I should have sent you a link. My shoe was much better than this. Mine looked like an actual, like, colorful trout or something as a shoe, and I'm going to patent it. It's really good. But, yeah, see, like, it's a concept or just coming up with new creative concepts and visualizing them and then throw this because it's a file that you can... I think Nike or someone would release a product like this, right? Like where you can just develop a concept shoe and then print it out in a factory in China or something. Yeah, like design your own shoe. And then, so anyway, what is then cool is you can, so you can, with this 3D model, you can then go and like 3D print these parts or concepts. So like you can now go from creating an image with AI to turning that image into a 3D object to then printing that in a 3D printer and creating something from scratch. I mean, like, it's pretty amazing how just that democratization of creativity or, like, making stuff. Like, this is almost like the Star Trek replicator, right? Like, you can now imagine something, ask the computer to create that thing, create a file and then potentially 3d print it i was thinking imagine if like the you know how you had like your fitness tracker in there imagine if you had one that monitors all your health and vital signs and mineral levels and stuff and then it has a 3d printer that can output like supplements and so it's like here is the ultimate pill for you to like supplement your body on all the crap you're missing yeah we might get to the the replicator i took this concept here as well and made a 3D spaceship with it. Anyway, I had way too much fun with this and spent a lot of time mucking around with it. So let me now reveal the bill. The bill. Well, the bill didn't come from these ones, right? It's all VO3 that's the real cost. Yeah, so here's... I actually just did... I don't know what this is going to be, so this could be highly embarrassing, but here's the Batman video. and me as Batman. Very cool. 
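As a rough illustration of the image-to-3D chain described above (concept image, background removal with something like Bria, then an image-to-3D step like Tripo), here is a hypothetical sketch of that pipeline. The function names and signatures are placeholders standing in for the MCP tools mentioned on the show, not their real interfaces.

```python
# Hypothetical sketch of chaining creative tools: concept image -> background
# removal -> 3D asset. The three functions stand in for MCP tools; none of
# these signatures are real APIs.
from pathlib import Path


def ideogram_generate(prompt: str) -> Path:
    """Placeholder: generate a concept image and return its file path."""
    raise NotImplementedError


def bria_remove_background(image: Path) -> Path:
    """Placeholder: return a cut-out of the subject with the background removed."""
    raise NotImplementedError


def tripo_image_to_3d(image: Path) -> Path:
    """Placeholder: return a 3D mesh (e.g. a .glb file) built from the image."""
    raise NotImplementedError


def concept_to_printable_asset(prompt: str) -> Path:
    concept = ideogram_generate(prompt)        # 1. concept image
    cutout = bria_remove_background(concept)   # 2. isolate the subject
    mesh = tripo_image_to_3d(cutout)           # 3. image -> 3D object
    return mesh                                # ready for a slicer / 3D printer


# e.g. concept_to_printable_asset("a golden futuristic cyber car that seats four")
```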
Was it worth six bucks or whatever that just cost me? I'm not sure. I think so. It's pretty good. But it's clearly showing the sort of continuity of the data between the different tools is what's really exciting. You can ask one specialist tool to do one thing and then bring it to another specialist tool and do something with it. And I imagine there's a lot of industries where you're working with relatively old technology that needs inputs that can then be generated by more advanced modern AI outputs to really make things that couldn't be made more efficient much more efficient. And so when those outputs become API calls into another system or become some sort of actuator in the real world, There's a lot of potential with this combination of all the different tools. And I think that's why we're seeing just this explosion in MCPs and the combinations of them, because it's so useful and the models themselves, and this applies to basically all of them, are really good at that taking stuff from one tool and putting it into another one. Do you think that, see to me, and we were saying this before the show about MCPs, because now we've released them. A lot of people are trying to make their own and discovering like we did just how hard it is to make them and deploy them to people. Like it's not easy. The protocol, like there's been so many iterations now of how you serve them that I've lost track. And then authentication's a bit of a mess around them. They proved during the week at some InfoSec type conference that they could exploit some of them in certain ways as well. And so, you know, there's a lot of negativity around it as a standard, but then also it's hard to see it not being a thing in the future right now, especially how the models are. Yeah, what I said to you before the show was, remember when GPTs came out and we were like, oh, okay, this is the future. They'll have a store on OpenAI. People will make entire businesses out of building custom agents for people or custom assistants or whatever they are. For people, people will license those, pay them money, and then use it for their jobs. But it sort of went to nothing because we all realized they were really just prompt engineers, like just making good prompts. And then people were like, well, I can kind of just do that myself and I don't need all this stuff. I think it's different this time. I think the concept of groups of tools that work with your models for the reasons we just said are incredibly powerful. And I think that ultimately we're going to see a new generation of MCPs, which are almost certainly being worked on right now, that are far better than the current ones. The current ones are very immature. Most of them have just been made by enthusiastic people who just want stuff. Like they're like, I really want to be able to access my emails. I really want to be able to access my Cloudflare data or whatever it is So they just bashed one out And when you look at the code for all of these things it like four months ago six months ago like last updated and they're just these static things that just are a nice idea but not fully fledged. And I think the weaknesses in them now are far too many tools. So they've just thrown everything at a wall and are relying on the model to make good decisions. 
the models themselves don't have good instructions about when to use each of the tools and when they do how to use it and moreover when they do use a particular tool the most appropriate output type like for example some tools are really just part of a chain it's like you're you've really just got to manufacture your output do your thing dump the output you're done but other ones really have a logical follow-on so if for example in your image ones it's like well you don't just have to make the image. You need to output the image. You need to find a way to get it into the user's hands. And so I think what we'll see with the MCPs is them getting more sophisticated and mature. And then I think the other major thing is going to be official providers of MCPs. So not just some dude on GitHub, but like the companies themselves or forward-thinking companies themselves saying, here is a paid, metered MCP. You plug it in with the URL. It's got discoverable OAuth in it. So the auth is just sorted, as long as you support that. And this is the official one that works the best with our product. And then I think once we hit that, it's going to get incredibly powerful. And I think that's when we're going to see, like we discussed, the depressing world of MCP optimization, where your image background removing tool is competing with five others. So the AI picks yours over theirs and that kind of thing. I think probably neither. I think it's really about the client. Like in our case, it's sim theory, but in some people's cases, it's like cloud desktop and open AI or whatever the case may be. But that's the point, right? So the tools will produce output, but the thing about it is again, because the MCPs aren't that sophisticated, the output they produce often isn't appropriate to be fed back into a model. So for example, say you're crawling web pages. Web pages are full of junk. There's so much crap on there. And if you just get the raw output of a web page and go, that's my tool result, and then shove that back into the model, the model is going to fall over because it's way too much data. It uses too many tokens. It dilutes the meaning of what you're trying to do. And often it just exceeds the limits of the model. So the client in the middle needs to go, okay, which part of these results are actually relevant to the answer? Then you've got the case, like you just said with Interpreter. So Interpreter comes back and says, okay, I've gone off and produced this chart or this Excel document or this PowerPoint or this C file or whatever it is, here's the file. So what do you do then? Do you just give the raw file back to the model and then expect the model to re-output it in whichever way it normally does? Or do you as the client intercept that file, provide it to the model, but also output it to the user? So there's a lot of, like, you're right, it's a whole other step in this process that has largely been ignored because it's not really the responsibility of the model providers. It's a responsibility of the system that is actually running the tools and therefore it needs to decide. And I agree with you. I think there's significant scope to improve in that area in two ways. One is how is it presented to the user in a way that's ultra convenient? So in our case, for example, when we get a music file, we will show a visualizer, show the lyrics if it's a song. 
If it's an image, generating a detailed description of that image and allowing you to zoom in and download different versions or reuse it in future tool calls and stuff like that. You know, if it's a file, like a document, being able to actually then edit the document or sync it over to Google Docs or the Microsoft, whatever their drive thing's called. And so I think that there's a big scope there. But then the second part is how do you then turn those tool results into something, you feed back into the model. So then as your session with the model continues, it understands what has gone on. It understands what you've done to the file, what access the user has to it, and which pieces of that file are relevant to its subsequent operations in the thing. Because the last thing you want to do is get some massive result from a tool call and then include it in every subsequent request as if it's just as important as everything else when you may have only needed small amounts of information from it. And I think this is where we see in longer sessions with tool calls, the quality degrading and people running into errors and other unexpected things that seems like they used to work, but now they don't. And I think that's part of the problem is we need this level of curation there around the way tool call output is handled. So for that last part there, I had my microphone on mute, so we cut it up. So Chris just kept speaking. I do apologize for that. one point I did make in the interlude there was what I think is needed from the next models including like a Gemini 3 tune and we were talking about that you as the listener didn't hear is this idea that the models that don't support MCPs well seem lesser now and that the improvements in you know in the latest Claude Sonnet or Claude Opus with tool calling really do make them still stand out and how what I really want to see from Gemini 3 is the tool calling, the async tool calling, the inline thinking, the internal clock speed of that model. And I think these are the most important things that the labs can focus on right now because if intelligence is not necessarily plateauing, but at least our perception of the models is somewhat plateauing and they're getting like some incremental gains where they feel a little bit smarter like GPT-5, that's probably the thing to focus on next. That's where we need a lot of improvements. Yeah, and I think some of the tooling around that stuff will help as well. 
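As a minimal sketch of the tool-output curation being described here, assuming a generic chat-style message history: the client trims raw tool results before they reach the model, and shrinks older tool messages so one huge result isn't re-sent at full size on every later request. The character budgets and the cleanup strategy are illustrative assumptions, not any particular product's implementation.

```python
# Minimal illustrative sketch of client-side curation of tool-call output.
# Budgets and cleanup strategy are assumptions, not a real implementation.
import re

MAX_TOOL_RESULT_CHARS = 8_000   # rough budget for the turn the tool ran in
MAX_CARRYOVER_CHARS = 500       # what later turns keep seeing


def strip_markup(raw: str) -> str:
    """Crude cleanup for web-page style results: drop tags, collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", raw)
    return re.sub(r"\s+", " ", text).strip()


def curate_tool_result(raw: str) -> str:
    """What the model sees in the turn where the tool actually ran."""
    return strip_markup(raw)[:MAX_TOOL_RESULT_CHARS]


def shrink_old_tool_messages(history: list[dict]) -> list[dict]:
    """Replace earlier tool output with a short stub so one huge result
    doesn't dominate every subsequent request."""
    shrunk = []
    for msg in history:
        content = msg.get("content", "")
        if msg.get("role") == "tool" and len(content) > MAX_CARRYOVER_CHARS:
            msg = {**msg, "content": content[:MAX_CARRYOVER_CHARS] + " …"}
        shrunk.append(msg)
    return shrunk
```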
To some degree, allowing the people using the tools at a tech level to just dump stuff in there and having the library or the model actually be able to cope with things and clean it up itself, that's probably going to help a lot too in the perceived quality of the models and that that's where gpt5 kind of disappoints me it just feels like the the mcp integration was rushed or not it's not like deeply trained into that model because gpt5 is really a collection of existing models that already struggled with mcp and so that's where it feels off like you can tell it was geared towards deep research by doing all the tool calling up front and then it doesn't really called tools later in its sort of like internal clock and to me that that's why it still feels off like it like it it's sort of like the design of the boeing 737 how it's never changed since it was first came to be and like they've had to like flatten the cowlings on the engines to fit bigger more fuel efficient engines under it because it sits so low to the ground and i think the it's i don't know this is a weird metaphor but the gpt's now feel like the boeing 737 like it's reliable it's good unless you know it's the max um and but it really needs like a new sort of um foundation and for the mcp world it feels like it needs a new foundation it's understandable for the reasons i said earlier which is i don't think anyone like tool calling was around for ages before the mcp concept came in and i think some people used to tool calling for whatever systems they were making because it was convenient but a lot of people like us sort of implemented our own tool calling where you'd get the AI to output certain tokens and then do things based on those tokens. And then when the MCP concept came out, you're like, oh, actually, this stuff works really well. And then everyone realized how you could combine these things to get really good results. And so I think not everyone would have assumed that that would have taken off to the degree it has, and therefore takes a while to make these models. They probably just weren't ready for it. 
And the next iteration we'll see that so final thing i have promised for quite a while i would work on a sundar pendant this is my current 3d design that i'm modeling out with the ai um so this is the sundar pendant i don't know if it'll look like this but i am actually got a 3d printed concept because i want to try this out um and i'm hoping that gemini 3 really delivers when it comes out soon so that uh this will be immensely popular yeah i'm sure it'll cost this will be a huge loss in terms of manufacturing maybe if google could give me some of my vo3 dollars back i could afford to actually you can either make one video on vo3 or have this pendant yeah okay uh wrapping up because we are both chronically tired if it hasn't shown from this if you've made it this far you can probably tell we have had very little sleep this week and are operating on fumes too busy making seductive photos of me as a homeless computer programmer in a 1995 cafe yeah or um or designing new shirts and trying them on fake models that i've created as well here um make ai great again um all right so yeah any final thoughts gpt5 it doesn't seem like you're fan you seem more of a fan on launch day than now a week later to be honest i think for the reasons you just said i've just been continuously testing mcp based stuff and i just use whatever's the fastest and calls the tools the best and right now that's not gbt5 it's too slow and it doesn't call the tools the best so i think that that's part of the problem for me i think if i was doing a different kind of work where i needed say better decision making or something like that i might have given it a better chance so i i think my opinion is probably a little bit biased against it for no fault of its own all right so one more thing before we go final lols of the week but alexander wang who was at the scale ai company that zuck bought for a couple of billy just for him basically because he's like a cool tech bro um they have been poaching open ai researchers um from the gaggle you know when they announce something the gaggle gets marched out it's like a shopping catalog yeah so he's he's basically like shopping now on the open ai live streams going oh that guy seems like he's a good contributor to this gaggle um and then it sounds like they're just making them offers and so then but it's pretty savage this this post on x um someone said sorry the person who was poached that after a great time at open ai we recently joined meta super intelligence lab so this gaggle actually came as a three for one package like you get one of them the whole gaggle goes yeah and then alexander wang posted welcome to the team some may recognize them from various recent live streams smiley face like absolutely savage so this is the point it's gotten to where uh where open ai um team members are now being poached and it seems like a pretty like small industry to burn your bridges like that you know like if if you get fired from there, then what? Do you reckon they even care? They're getting paid hundreds of millions of dollars. Like none of these people would care. Yeah. Burn any bridge you want. It doesn't matter. All right. That is it for this week. We will see you next week. If you want to play around with some of the 3d stuff and print your own pendant, you can check that out on, on SIM three. Now we'll see you next week. Goodbye. Thank you. 
Promised PhD-level thinking, got a corporate drone
Now everyone's crying, wishing they left GPT-4o alone
15 seconds loading just to get a bland reply
While the hype machine keeps spinning another lie
GPT-5, you're supposed to be so smart
But you ripped our workflows apart
Router's busted, responses are flat
OpenAI, what the hell was that?
GPT-5, the upgrade that went wrong
Now we're singing this sad indie song
They took away our GPT-4o, a real friend
Said trust us, this is not the end
But the new model's got no soul to find
Just a beige zombie with a corporate mind
Benchmarks lie, the users cry
Another launch where promises die
GPT-5, you're supposed to be so smart
But you ripped our workflows apart
Router's busted, responses are flat
OpenAI, what the hell was that?
GPT-5, the upgrade that went wrong
Now we're singing this sad indie song
We're working on fixes and tweets in the night
While subscribers cancel and put up a fight
The router's confused, doesn't know what to do
Sends complex queries to models that haven't got a clue
GPT-5, you broke our hearts
Tore our favorite AI apart
Promised us the moon and stars
Delivered us these battle scars
GPT-5, the hype that fell so hard
Left us singing in our backyard
Maybe next time test before you ship
Save us all from this sinking ship
Related Episodes

GPT-5.2 Can't Identify a Serial Killer & Was The Year of Agents A Lie? EP99.28-5.2
This Day in AI
1h 3m

#227 - Jeremie is back! DeepSeek 3.2, TPUs, Nested Learning
Last Week in AI
1h 34m

ChatGPT is Dying? OpenAI Code Red, DeepSeek V3.2 Threat & Why Meta Fires Non-AI Workers | EP99.27
This Day in AI
1h 3m

Claude 4.5 Opus Shocks, The State of AI in 2025, Fara-7B & MCP-UI | EP99.26
This Day in AI
1h 45m

Is Gemini 3 Really the Best Model? & Fun with Nano Banana Pro - EP99.25-GEMINI
This Day in AI
1h 44m

Are We In An AI Bubble? In Defense of Sam Altman & AI in The Enterprise | EP99.24
This Day in AI
1h 5m