

Is Haiku 4.5 really THIS good? OpenAI's Erotic Mode & Are MCP Apps the Right Approach? EP99.21
This Day in AI
What You'll Learn
- ✓ Claude Haiku 4.5 is a new, smaller, cheaper model from Anthropic that performs well on coding and agentic tasks, on par with GPT-5 on agentic coding and ahead of Gemini 2.5 for its size, price and speed
- ✓ Haiku 4.5 can effectively debug MCP applications and handle complex requests involving multiple tool calls, making it a valuable tool for productivity-focused use cases
- ✓ Haiku 4.5 can one-shot the creation of a Mac-style desktop environment, suggesting it may be a strong competitor to the highly anticipated Gemini 3 release
- ✓ The model's pricing is reasonable, at $1 per million input tokens and $5 per million output tokens, and it is available in Sim Theory without affecting token limits
- ✓ Google's Veo 3.1 adds first/last-frame and reference-to-video modes; in the hosts' comparisons, Veo 3 far outperformed Sora 2 despite the hype around the newer models
- ✓ The hosts emphasize the importance of speed and cost-effectiveness when working with AI models, particularly for MCP applications and other productivity-focused use cases
Episode Chapters
Introduction
The hosts discuss the latest developments in the AI landscape, including the anticipated release of Gemini 3 and the new Claude Haiku 4.5 model.
Claude Haiku 4.5 Capabilities
The hosts explore the impressive performance of Haiku 4.5, particularly in tasks like debugging MCP applications and one-shotting desktop environments.
Haiku 4.5 Pricing and Integration
The hosts discuss the pricing and token limits of Haiku 4.5, as well as its integration with Sim Theory.
Comparison to Other AI Models
The hosts compare Haiku 4.5 to Gemini 2.5 and Claude Sonnet 4.5, and also discuss their experiences with Veo 3.1 from Google.
Importance of Speed and Cost-Effectiveness
The hosts emphasize the significance of speed and cost-effectiveness when working with AI models, particularly in the context of MCP applications and other productivity-focused use cases.
AI Summary
This episode discusses the latest developments in AI, including the highly anticipated release of Gemini 3 and the new Claude Haiku 4.5 model from Anthropic. The hosts compare the capabilities of Haiku 4.5 to Gemini 2.5 and Claude Sonnet 4.5, highlighting its impressive performance in tasks like debugging MCP applications and one-shotting desktop environments. They also touch on the pricing and token limits of the new model, as well as its integration with Sim Theory. Additionally, the hosts share their experiences using Haiku 4.5 for various AI-related tasks and provide insights into the current state of the AI landscape.
Key Points
1. Claude Haiku 4.5 is a new, smaller, cheaper model from Anthropic that performs well on coding and agentic tasks, on par with GPT-5 on agentic coding and ahead of Gemini 2.5 for its size, price and speed
2. Haiku 4.5 can effectively debug MCP applications and handle complex requests involving multiple tool calls, making it a valuable tool for productivity-focused use cases
3. Haiku 4.5 can one-shot the creation of a Mac-style desktop environment, suggesting it may be a strong competitor to the highly anticipated Gemini 3 release
4. The model's pricing is reasonable, at $1 per million input tokens and $5 per million output tokens, and it is available in Sim Theory without affecting token limits
5. Google's Veo 3.1 adds first/last-frame and reference-to-video modes; in the hosts' comparisons, Veo 3 far outperformed Sora 2 despite the hype around the newer models
6. The hosts emphasize the importance of speed and cost-effectiveness when working with AI models, particularly for MCP applications and other productivity-focused use cases
Topics Discussed
Gemini 3, Claude Haiku 4.5, MCP applications, AI model performance, AI model pricing and token limits
Frequently Asked Questions
What is "Is Haiku 4.5 really THIS good? OpenAI's Erotic Mode & Are MCP Apps the Right Approach? EP99.21" about?
This episode discusses the latest developments in AI, including the highly anticipated release of Gemini 3 and the new Claude Haiku 4.5 model from Anthropic. The hosts compare the capabilities of Haiku 4.5 to Gemini 2.5 and Claude Sonnet 4.5, highlighting its impressive performance in tasks like debugging MCP applications and one-shotting desktop environments. They also touch on the pricing and token limits of the new model, as well as its integration with Sim Theory. Additionally, the hosts share their experiences using Haiku 4.5 for various AI-related tasks and provide insights into the current state of the AI landscape.
What topics are discussed in this episode?
This episode covers the following topics: Gemini 3, Claude Haiku 4.5, MCP applications, AI model performance, AI model pricing and token limits.
What is key insight #1 from this episode?
Claude Haiku 4.5 is a new, smaller and cheaper model from Anthropic that performs well on coding and agentic tasks, rivaling GPT-5 and Gemini 2.5 in speed and capability
What is key insight #2 from this episode?
Haiku 4.5 can effectively debug MCP applications and handle complex requests involving multiple tool calls, making it a valuable tool for productivity-focused use cases
What is key insight #3 from this episode?
Haiku 4.5 can one-shot the creation of a Mac-style desktop environment, suggesting it may be a strong competitor to the highly anticipated Gemini 3 release
What is key insight #4 from this episode?
The model's pricing is reasonable, with $1 per million input tokens and $5 per million output tokens, and it is available for use in Sim Theory without affecting token limits
Who should listen to this episode?
This episode is recommended for anyone interested in Gemini 3, Claude Haiku 4.5, MCP applications, and those who want to stay updated on the latest developments in AI and technology.
Episode Description
Join Simtheory: https://simtheory.ai
Use "SIMLINK" to get 30% off Pro & Max annual plans until Oct 31st 2025
----
CHAPTERS:
00:00 - Gemini 3.0 HYPE with "make an OS"
03:50 - Anthropic Releases Claude Haiku 4.5: Initial Thoughts
11:57 - Veo 3.1 and new modes (first frame/last frame & reference to image)
25:20 - OpenAI's Erotica Mode & age verification thoughts
34:25 - OpenAI Partners with Everyone & Memes
35:38 - Salesforce OpenAI Partnership & What Should SaaS do with MCP apps?
1:09:25 - Final thoughts, Polymarket
----
Thanks for your support and listening to the show xox
Full Transcript
So Chris, this week many people were suspecting that you would have fixed the bookshelf, but no, you've gone a level worse: you've got a beard, you look like absolute... And here we are. So huge news this week, absolutely huge news. There's a lot of speculation about a new model coming out, maybe next week. Merch packs have arrived. Gemini 3 could be launching next week, and the internet is going absolutely wild. People are using, I think it's AI Studio, where there's a sneaky A/B test going on, and we believe they're getting access to Gemini 3. And what that has led people to do, for some reason, this is the new benchmark: they're benchmarking, or one-shotting, the ability to recreate the Mac OS X desktop experience or the Windows desktop experience in a single shot, in one clean HTML file. And this model is really delivering. Check this out. For people that obviously can't see, because most people listen to the show, it looks fully like Mac OS X. I can open text files. I can open the web browser here, and I'm on the Wikipedia website. I can resize it and move it around. It's pretty phenomenal. And it even has the bouncy ball effect at the bottom of the screen.
There's a full-on terminal, so you can even do commands like ls to see the folders available. I don't know if I can go into them. Let's try removing your files... I can't do that, so there's a bit of a fail there. I can change the background color. But anyway, it's pretty good. Here's another one, and this one is just a comparison to Gemini 2.5 Pro, because people have been getting really overly excited about this, and I thought, just as a bit of a reality check, someone also put out another version. This version was done in Gemini 2.5 Pro, and I think it still looks pretty nice. It's still pretty amazing, right? It's still really good. You can browse around, and you can even open Python files in a code editor. I would sort of say, functionality-wise, this one's better, but it might be to do with the prompt. But anyway, I think if this is an indication of the level of polish and detail and output that this model can deliver, assuming it's releasing next week, it's going to be a really exciting week. And I thought, given the speculation around it, maybe we could just put out our final Gemini 3 wishlist quickly. Gemini 2.5 is already pretty good; it's kind of hard for me to think of what else you'd really want. I mean, I don't know, will they make it a 2 million context window, perhaps? Cheaper would be really nice. I think that's probably the biggest one for me: make it cheaper. I think the tool calling needs a lot of work, but I would say either we have improved it in Sim Theory or they have been sneakily releasing minor updates and not saying anything. The simultaneous tool calling and the agentic flow of the model could use a lot of work. Right now, I think it's truly a turn-by-turn, turn-based model, instead of feeling like a Claude Sonnet or the new Claude Haiku, which we'll get to in a minute, where it has that internal clock and it feels just agentic at its core.
Yeah, I agree. I think I tend to use Gemini more for coding and I use Sonnet more for agentic stuff, so maybe I'm just naturally going that way for that reason, and Gemini 3 might fix that. So anyway, we'll report back. Obviously I'm assuming it's next week; that's what everyone's saying. But let's move on now to a brand new model, hot off the press. It only came out, I think, 8 hours ago now: introducing Claude Haiku 4.5. This is the younger sibling, of course, to Claude Sonnet 4.5, which was released, I think, 2 or 3 weeks ago now. It's a smaller model, it's cheaper, and we'll get to pricing in a minute. In terms of where it sits on the software engineering benchmark they've got up here, it's somewhere between Sonnet 4 and Sonnet 4.5, so it seems to be weirdly performant. I mean, I never really trust benchmarks, but to say that on agentic coding it's on par with, say, a GPT-5, and ahead of Gemini 2.5, is kind of crazy for the size, price and speed. I used it this morning quite extensively to actually debug MCPs live. What I'll do is, when I'm adding a new MCP and there's an error, I'll ask the model, hey, what was the error? And then I'll say, here's the code, here's your code, can you just explain why this broke and fix it, please? And it was really competent at that, like really good. And then, because I was working with MCPs, I get these really long requests, like testing every single tool in every single combination, and I give it to the model to do. And Haiku was able to handle that perfectly. And it's really fast as well. So it was quite refreshing and good to use. Yeah, I mean, we haven't, as you say, had that long to play with it, so it's hard to know just yet. But if you compare it, I think the closest comparable for me, not necessarily on price, but just on where it sort of sits in the food chain, would be Gemini 2.5 Flash, especially that newer tune of it.
And I think that while it's a really good model, it seems to struggle sometimes with the feeling of intelligence. It seems really dumb sometimes, whereas, just in my limited tests with Claude Haiku 4.5, it never felt stupid to me. It never felt like a lesser model. In fact, it kind of makes me feel as though, with the performance and price gains of some of these models as they come down, things could be really good. You compare it to GPT-5, say, in terms of tool calling, and it's so much faster. If you're working with MCPs, why would you bother with GPT-5? I agree. I think when you work with MCPs extensively, the speed is a massive factor. Having to wait for every step to take so long, just to get to the point where it's actually doing the work, is not worth it. And when you're doing operational things, where context is everything and just calling the tools is what's important, the speed is what actually makes you more productive, over it being slightly more intelligent but taking ages. The whole idea here is productivity gains from using these things, so speed actually does matter. It is actually as crucial as the other stuff, not to mention price. If you're working with huge amounts of data, you don't want to be constantly stressing about how many tokens you're using. And with this model, you don't have to stress at all. So the pricing is not, I mean, not where I think it should be if they want to be really competitive, but it's still pretty good. It's a dollar per million input tokens and five dollars per million output tokens, so I think it does have that Anthropic premium, like the Sonnet and Opus models, but I still think it's pretty reasonable for how performant it is. And I think naturally you do want to compare it to a Gemini Flash, but these are sort of somewhat worlds apart in terms of cost now. So it's another really good option in the toolkit.
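As a back-of-the-envelope sketch of the pricing being discussed: the $1/$5 per-million rates come from the episode, while the token counts in the example below are made-up illustrative values, not anything the hosts measured.

```python
# Rough cost sketch for Claude Haiku 4.5 at the rates quoted in the episode:
# $1 per million input tokens, $5 per million output tokens.
# The workload numbers below are hypothetical.

HAIKU_INPUT_PER_M = 1.00   # USD per 1M input tokens
HAIKU_OUTPUT_PER_M = 5.00  # USD per 1M output tokens

def cost_usd(input_tokens: int, output_tokens: int,
             in_rate: float = HAIKU_INPUT_PER_M,
             out_rate: float = HAIKU_OUTPUT_PER_M) -> float:
    """Cost of one request at per-million-token rates."""
    return (input_tokens / 1_000_000) * in_rate \
         + (output_tokens / 1_000_000) * out_rate

# e.g. a long MCP-debugging session: 200k tokens in, 20k tokens out
print(f"${cost_usd(200_000, 20_000):.2f}")  # -> $0.30
```

At those rates, even a session that fills the model's entire 200K input context costs about 20 cents on the input side, which is the "don't stress about tokens" point the hosts are making.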
And I think, as you say, for MCPs and agentic use cases, I don't see why you wouldn't use this over, say, Claude Sonnet 4.5. The one downside I see is it doesn't have the beta 1 million context window that Sonnet has. It's only 200,000 context. It does have 64,000 output, which is very high, but only 200,000 input. So that is one limitation that Gemini Flash doesn't have, because it's a million, and Claude Sonnet doesn't have, because it's a million. Depending on what you're doing, that may be a blocker for you. But if you can deal with that, it's worth it. My feeling, though, is that it's just a more optimized version of Claude Sonnet 4.5. That's what it feels like. I cannot tell the difference so far, so that's a pretty good sign. And for people who do use Sim Theory, we have made this available as non-frontier, so you can essentially use it as much as you like and it won't affect your token limits at all. To have a model this good, essentially somewhat for free, is pretty mind-blowing. Now, I do want to talk also about its ability, because I think this is the new benchmark, right: its ability to one-shot a Mac OS-style operating system. So I put it to the test. I said, make a Mac-style operating system where I can open a notepad app and draw in a paint-style app. And I just wanted to get a comparison here: if this is Gemini 3, if it really is indeed that, what would Haiku, a pretty cheap new model, do? And look, it's not as stylish, but it's still pretty good. I've got it up on the screen here.
I can open my notepad, I can take a note here, I can move the windows around, I've got my paint application. I would actually say its paint application is better. So I don't know if this sort of puts water on the flames of the hype around Gemini 3, because you can see quite easily that Haiku did this, and it did it honestly. It's not the best benchmark, given that they can all just do it. Yep, no worries. Well, I think people like me like to look at the taste, you know, the tune, and you can really see it in that one. I mean, this is unbelievable, and the detail and the icons are very, very impressive. So we'll wait and see, but anyway, Haiku passed the new OS one-shot benchmark. The other thing I looked at was its ability to do the simultaneous tool calling, where it'll go off and call multiple tools at once. In this case, because I had my local setup, I didn't have that many MCPs installed, so Google was its only option, but it still called Google multiple times, looking at various angles around the AI news in general. It was very, very fast. It formats things really nicely as well. I even got it to create a document; it created the document and was able to put the sources in the format I wanted. So I'm really impressed. I'm going to just daily-drive it for as long as I can possibly stand, outside of probably harder problems. To give you an example of what I was doing: I was testing all the Microsoft MCPs, so SharePoint, Outlook, Calendar, those kinds of things, Planner and To Dos and stuff like that. And what I do is I upload a file to my OneDrive, and then I say, can you download that file and then email it to me with sort of a summary. And what it is, is my electricity bill. And then the AI sort of admonishes me for how much money I spend on electricity and gives me all these analogies and big red letters, like, you need to do something about this.
And Haiku did it just as well as the other models. It was great. And it did it all with simultaneous tool calls as well. Yeah. So that's the thing: when you're dealing with those kinds of important use cases, like getting critiqued on how much electricity you use, it's great. It works tremendously well. So I also have been playing around with Veo 3.1. Last week we got Sora and Sora 2 Pro, and those models, I feel like, and I'm sure maybe, you know, put your hate in the comments below if you disagree, but I think they sort of got revealed for what they were when they released the API without the watermarking, and without the tune of the hilarious TikTok-style video, which I'm not discounting at all. I actually think those videos were really good until the copyright restrictions hit. And so then I just started using it against Veo 3, which I tried to show on last week's show, but unfortunately OpenAI's new Apps SDK... Your computer melted down or blew up or something, right? Yeah, it fully crashed my computer, so I wasn't able to show you. But yeah, the comparisons were pretty meh. Veo 3 in general, even though it's a little bit pricier, but not much, just far outperformed it. So Google, of course, pushed out Veo 3.1. And you might think, oh, it's just a slight improvement. But no, there are some new features. The new features are that now you can have a start and end frame. You can give it where you want it to start and where you want it to end, and it'll fill in the bits. And so I was able to produce a video, which I showed you earlier, where I gave it two images: me with sunglasses on, sitting at this exact desk, and then without. And I asked it to create a video where I put the glasses on. So here's the video. This is Veo 3.1 running on Sim Theory with first and last frame. So you can see it does pretty well. This is, keep in mind, the lower-quality version of the model, because it's cheaper and I'm cheap.
But yeah, so it goes from the first frame to the last frame, and the last frame, obviously, was me putting on my sunglasses. So it's not perfect yet, but you can see how this could be used for online advertising, or just e-commerce websites, where it retains the product really consistently. I think the only weaknesses for me are things like your teeth look a bit weird, and some of the stuff around the talking. But you can tell that they'll solve those problems in no time. The ability to keep your character, the ability for it to know how to transition to the last frame, is incredible. It's a really major advancement. You've got so much more control over what happens in that clip. The other thing they added, which is really cool, is the ability to give it a series of images and join them together as part of a scene. So what I did was I uploaded my photo, just my face, and I used the image tool, first of all, with character reference, to create an image of me in an astronaut suit on Mars. You can see that on the screen now if you watch. And then I also asked it to create two more images, and you want to look closely at these: a cinematic image of the Mars landscape, and then an alien. And I thought, there's no way this is going to work. And then I said, okay, now use all three images as reference-to-video, so reference-to-video is the skill, I just wanted to be precise, in Veo 3.1, to make a video where I'm walking along a scene on Mars that looks like the Mars image, and then this alien appears. So it's a terrible prompt; I don't deserve the output I got. And here is what I got. Very cinematic. Here we go. How crazy is that? It's the actual landscape, the same alien, and you're facing the spacesuit. And I mean, it's close enough to me that you'd believe it's me, right? That is just so cool.
Once this gets, and obviously we talked about this, longer output and more control and better tooling around it, I would even just say better prompting, better tooling, someone could easily build, I reckon, a really fun video editor where you could make your own stories and stuff now. Yeah, I mean, it definitely seems like maybe a bit of a marketing problem for Google, in the respect that if this was OpenAI, they would have had a keynote speaker, they would have brought everyone out, they would have had hype everywhere on X and all that sort of stuff. Whereas with Veo 3.1, the only reason I even knew it came out is because you told me, because you're into that stuff, and you actually tried it. But you don't hear much about it. Google isn't hyping it up the way the others are. No, I don't think it got a lot of pickup, because there was a lot of initial excitement around Sora 2 and that Sora app, but that's, you know, fallen off a cliff. And I think, in a way, the sort of consumer of AI is a bit exhausted. There's this, like, oh wow, cool, but the wows are getting less and less, even though I think the models are improving. I think people are just getting so used to the technology now, and so used to what it can do, that nothing really shocks or surprises them anymore.
And therefore, you know, people either are pretty dismissive of it, or the reality, which I think is probably closer, is that a lot of these tools just aren't ready for prime time yet. They're simply not good enough for, I shouldn't say any commercial use, but the obvious commercial uses, so you just dismiss them and move on. But if you think about stitching this stuff together, like with ElevenLabs training your own voice, the pro voice-training version, being able to produce audio of different characters in a video, putting yourself or just specific characters in a video, and then compiling that, you could do it. You definitely could build this stuff. And I'm so tempted to upgrade Video Maker to be able to do AI video film clips and actual short films and stuff. But then the problem comes, and this is, again, another problem with these video models: they're just too expensive. This video cost me $4 US. Four bucks. Yeah. Am I really going to pay this to laugh with my friends? And I definitely think, when you think about the advertising industry, making short ads, making individual clips and stuff, the value probably is there for certain people, depending on what they're doing. But it's not there for what we want to do, which is just muck around with it and experiment with it, because you're going to need some sort of return in order to justify spending $4 for 10 seconds, or whatever it is. And that's even if you get it right the first time. This is the problem for developers as well, right? The only way you can figure out if there's utility in these applications, especially as an indie developer, is to play around with them.
Or even as an end user: if you've got a subscription to Sim Theory and you've got Veo and Sora and these audio tools, and you're thinking, maybe I could play around with these to get an idea of whether they would have practical utility for something in my business, you really would have to spend a couple of hundred bucks, like I do each time when I'm working on them. And I just think it makes them inaccessible. And to me, and maybe it's already in place, I think Google or OpenAI should have some sort of video developer allocation where they're giving away credits, taking a hit or running a loss leader on this stuff, so that devs can afford to play around with it. Because honestly, to build that video maker, I think I spent like 250 USD. Now, if I had to keep iterating on that day after day after day, that becomes very inaccessible very quickly. I wonder if they could just have a mode that has a heavy watermark across the top, so you can have it in development mode to work with it and get an idea of what you want to get to. Then, if you want to publish it, you've got to pay. I understand then they'd be taking the loss, but to get the adoption there, and get your model being the one that everyone wants to use, it might be an idea. The fact they charge so much says to me that they're probably already making a loss; I wouldn't be surprised if they lose money on each generation. And OpenAI definitely was with Sora, and still is. So how long can they keep just absolutely pissing away cash on this stuff?
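The iteration-cost problem described here can be sketched with quick arithmetic: the roughly $4-per-clip figure is from the episode, while the shot count and retries-per-shot below are assumptions for illustration.

```python
# Why experimenting with video models gets expensive fast.
# The ~$4-per-clip figure is from the episode; the per-project
# shot and retry counts below are hypothetical.

COST_PER_CLIP_USD = 4.0  # roughly one ~10s generation, per the hosts

def project_cost(shots: int, attempts_per_shot: int,
                 cost_per_clip: float = COST_PER_CLIP_USD) -> float:
    """Total generation spend for a short project, assuming each
    shot takes several attempts before one is usable."""
    return shots * attempts_per_shot * cost_per_clip

# A 10-shot short where each shot takes 5 tries:
print(project_cost(10, 5))  # -> 200.0
```

At those numbers a single small experiment lands in the same couple-hundred-dollar range the hosts describe spending, which is the argument for developer credits or a watermarked development mode.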
So, to the Haiku thing: I think the optimization, as you said, the speed and efficiency of these things, has just never been more important, because if they can get them down, then a lot of these use cases start to become a reality. Maybe it's like: we don't need better tools, we need you to work on optimization, please. Exactly. I think that's the thing. You were asking what we would want in a Gemini 3, and it's cheapness and speed, because if you've got those things, the models are already very, very capable. And I think some people are artificially constraining their use cases, or constraining their adoption of the models, because it's too expensive. So if you think about the mass rollout of quality agentic systems, the only way that's going to be possible is with affordable models, which means that when people are doing the larger-scale ads, they probably are using something more like a Haiku or a Flash, because you just can't use a frontier model for that stuff; the payback isn't there. So, therefore, I really feel like, in some ways, those models are better because of that trade-off. Yeah, and I increasingly think they're getting to a point, at least the optimized models, and I don't want to speak too soon about Haiku, because after we talked about it, I tried to use the new Gemini Flash tune, because it was so much better at tool calling. But I just got the sense after a while, this thing's just way too dumb and doesn't interpret my dumb prompting. If you put detailed prompts in, I'm sure it would be really performant, but I'm lazy and I do bad prompting, so I rely on the more intelligent models to basically fix it for me. And so after a while, I'm like, there's no way I can stick with it. So I'm really interested, with Haiku, whether I can stick with it. And that starts to change the equation a little bit.
But again, still, I mean, it's cheap, but when you can access GPT-5 at 50 cents more per million input, you really want it to be fast and pretty good at tool calling; otherwise, why would you switch? We added some other models during the week as well, most notably GLM 4.6. Everyone would be familiar with GLM 4.5, which was the previous one we had; now we have GLM 4.6. It was really popular in the open-source communities; a lot of people liked it. I've used it a little bit. It also seems a bit Haiku-esque, in the sense that it's cheap, it's really good at tool calling, and it just seems like an all-rounder of a model. I feel like if you're going to host your own model, if you're going to fine-tune a model, GLM 4.6 is a pretty good starting point. Do you know, someone said in the community this week, and I really agree with them, that while I think the models are improving, they're equally becoming so commoditized. But the tunes of them have such different strengths and weaknesses that if they were truly commoditized, you would just pick the cheapest, fastest model, right, and stick with it. But you are still getting performance gains in the areas they seem to focus on in that particular model release. So I think with the Sonnet and Haiku 4.5 series, it's really just about maximizing agentic tasks. And a lot of that work was around the Claude Code product, which I think they're making an absolute killing from; that's why they're optimizing the models in that direction. So I do think there is a lot of commoditization going on, but it's brilliant to be the consumer of the models in this case, because you have lots of options, lots of choice, and you can really lean into the models' different strengths and weaknesses once you're familiar with them. And it's not going to burn you on price, unless, of course, you want GPT-5 Pro.
Yeah, that one kills them all. But also, I think it means you can really build in anticipation that the frontier models will eventually become the cheaper ones, and you can just rely on the fact that you're going to get that next-gen model in there for a better price pretty soon. It doesn't have to be a permanent economic equation that makes you lose, because you know at some point it's going to work. Yeah, and they keep getting better and the price sort of keeps coming down. Not always, but... I mean, the only issue is when they're deprecating the old ones, so you have to switch to the new ones and therefore pay the higher price. Yeah. In an ideal world, you'd be able to run the very best model as fast as possible locally on your computer, so you're totally in control and it's super-fast inference. But I think that day is a ways off, and there's no money in that for anyone. Yeah, well, that's true, so there's no incentive to do it. Okay. So, big news from OpenAI, huge news. What's the thing that we say? Insane. Insane news. Sam Altman posted this. It would be funny, sorry, if a leader of a company actually was insane and they did stuff that wasn't in the company's or their best interest, and you're like, that guy's actually insane: he deleted all the files on their corporate server, and then he burnt down the building, and then he yelled at people on the streets. Yeah, I think then it would actually warrant "insane." But anyway, Sam Altman did something insane. "We made ChatGPT pretty restrictive to make sure we were being careful with mental health issues. We realized it's made it less useful, enjoyable to many users who had no mental health problems. But given the seriousness of the issue, we wanted to get this right. Now that we have been able to mitigate the serious mental health issues and have new tools, we are going to be able to safely relax the restrictions in most cases."
First of all, I don't know what he's talking about. "In a few weeks, we plan to put out a new version of ChatGPT that allows people to have a personality that behaves more like what people liked about 4o. We hope you will like it better. If you want your ChatGPT to respond in a very human-like way, or use a ton of emoji, or act like a friend, ChatGPT should do it. But only if you want it, not because we are usage-maxing. In December, as we roll out age-gating" — age-gating, oh, can't wait — "more fully, and as part of our 'treat adult users like adults' principle, we will allow even more, like erotica for verified adults." Now, this sent everyone into an absolute spin: OpenAI getting into the porn business. So, firstly, comparing anything to GPT-4o: that is by far the worst model they ever released. It was a piece of crap. So I don't know why people are out there pining to get it back. And I agree, the GPT models definitely don't adopt a personality to the level of some of the other models. But I would say, at this point, aren't we just kind of used to the restrictions? Yeah, it's not going to help you make a pipe bomb, and it's not going to help you plan to murder someone, or those kinds of things. It's going to refuse those. But all the models kind of do that, right? I think everyone's over that stuff. In the early days, I really wanted an uncensored model just to see what the thing would come up with, but I don't really get it now. Is there really that much demand from their customer base for porn? Surely they've got more noble goals than that. Well, there were some good memes around this. There was the meme with the two buttons and the sweaty astronaut: cure cancer or erotica, and the thing is trying to decide between the two. But I don't know.
Like, I don't want to criticize him too heavily here, because I think what he's trying to say, but maybe framed very poorly, is: we heard that you want more control over the models, and you feel like you have this personal relationship with them; we don't really want to have too much control over the personality, so we're going to give you tools to have control. I think, great. Why not? And I think 4o, from a consumer chatbot point of view, people did really like. We just look at it from the point of view of actually doing real work. So isn't the real thing here the age check? As in, i.e., we are going to hard-ID every single person on the platform, with, like, you know, passports and face scans and stuff like that. Yeah. So let's talk about that, because they introduced this into the sort of enterprise end, right? So for us to get access to the latest models, they hid this under the guise of: you know, we don't want the Chinese training on our models, because then they might beat the U.S. in AI, and, you know, we'll forever be enslaved or something. Some weird fantasy there. And so we had to do this. Like, we had to do the whole... I had to give a blood sample, pee in a cup, send some DNA, a lot of evidence that we were alive. And we didn't really love it, but we had to do it to get access to all the models and things. So I, at the time, felt like that was disguising the fact that, yeah, they just wanted to verify all their customers. Now, it feels to me here, with this "treat adult users like adults," I agree with that principle. And I think you should have control over the model to say, like, if I'm interested in this stuff and it's not going to harm anyone, and you're an adult, in your own time and your own privacy, and you want to go do that stuff, I don't care. That's fine by me, whatever. You're not hurting anyone. And I think that's good.
And that's what we were calling for early on. We were like, treat adults like adults. So I think they're finally somewhat agreeing with that. But the problem I have with it is this verification. And I understand why you need to do it, because kids: you don't want them accessing things, or you don't want the chatbot getting all horny with them or whatever. And so it's a tricky situation. So I get why they want to ID. But then you also look at the dark side of that, which is, like, what if they get hacked? What if the government gets access to it? And Jason Calacanis has tweeted, and I think it's pretty extreme, but I do think it's pretty sensible about where it could go: the blackmail is going to be fast and furious. Send me a Bitcoin or I'm going to release your late-night ChatGPT sessions. Assume someone at OpenAI is reading your prompts and sharing them with their friends. I don't think... look, I don't think they're doing this stuff, but it does make you wonder. Yeah, but I mean, the problem with that statement is it's like saying any system could be hacked. Like, our national airline, Qantas, I think got hacked again this week. They get hacked every other week now. So all my personal stuff's been just distributed over the dark web repeatedly by these big companies. So I think that, yeah, okay, that's like saying any company can be hacked and probably will be. But the thing that really strikes me about it, and why I just get the wrong vibe from it, is, and I know I don't understand the economics of it, but clearly OpenAI is positioning themselves as, like, the consumer AI, right? Like, they on one hand will give examples for businesses, but if you look at a lot of their presentations, like the UI stuff we want to talk about soon, it's all, like, planning a trip, you know, riding a car, doing a presentation for your mid-level white-collar job or whatever, and then watching porn at night, or, like, chatting to a porn bot.
It's like they clearly are going all in on that sort of consumer side. Because to me, if anything, businesses actually want control over the restrictions on the model. So for their staff, for example, they don't want to restrict what they can do work-wise, but they do want controls in there to protect the staff members themselves when they're working, and to be alerted when the system is being misused, and things like that. So if anything, I would have thought that maybe developing those controls more would actually be valuable, rather than going, oh, you know what, screw it, do whatever you want, guys, as long as we get your DNA. Well, I think, to be fair to them, what they're saying is they're going to allow you to choose as the consumer, and if they have that level of control, then you'd think they could deliver it at a business end. But I do find it kind of weird. Like, a couple of weeks ago it was all about him going on a podcast being like, we just don't want anyone to, you know, commit suicide or whatever, so we're, you know, helping guide the model in these areas to stop it, which I think is quite noble. But then the next week it's like, oh, for people without those issues, go wild. Also there's parental controls. And I just don't see why they have to do this. Like, why? Why is the real question. And there's got to be more to it. There's always an ulterior motive, and it always comes down to money and power with these people. And so I get the impression they're just throwing things, like throwing darts and seeing what sticks. Like, if we can get people more addicted to this thing. The question also then becomes, and I know this is probably already possible and already happening, but what happens when, in a crime or something, the government subpoenas OpenAI and says, I want all this person's chat logs? Like, that Calacanis line isn't that far-fetched when it comes to the courts' power to compel them to hand over people's chat sessions. Well, yeah.
I mean, you're going to get it and they hand over internet sessions. so I don't see how it's any different. But yeah, I'm interested in what listeners think. Like, if you want to leave a comment below, like, are you pro the erotica stuff? You don't really want to put your name to that. Don't say that. Just say, like, oh, you know, I don't want age restrictions or something. We can have a safe word. But anyway, the memes have been great. So regardless of all this and the controversy that I'm sure they didn't mean it to blow up this much, The memes have been excellent. Let me show you one of them. It is Sam Altman as porn star Bonnie Blue holding up a sign 1,000. She held up this sign, so I'll let you figure out what that's referring to. But it's Sam Altman's face on her in this particular photo. And it's referencing the Salesforce partnership that was announced with OpenAI this week. and the idea that OpenAI is really just the f*** partnering with literally everyone and anything. And it seems like this has become the strategy for SaaS companies on the public market when their stock price is flatlining or in short, slow decline. And there's been a lot of good commentary around this. So breaking OpenAI to partner with OpenAI to help fund OpenAI. OpenAI up 90%. There's another one, the, like, say the line bot meme. Say the line bot, we're partnering with OpenAI, and then everyone cheers. So, yeah, it's interesting. And then another one, in case of emergency, break the glass, and the glass is OpenAI partnership. So there's two ways to look at this, I think. 
There's the Salesforce example: quite literally just waving the white flag to these guys and saying, like, you know, take our user interactions into ChatGPT, that's fine; like, you know, have access to Slack, that's also fine. And I think there's that part of it. And then there's the other part, which is any company that doesn't really have an announcement in this sort of AI world, where the only growth in the U.S. economy is AI: if they want to grow their stock price now, they just announce any partnership, even if nothing ever happens around AI. What I would really want to know with Salesforce is all their Einstein AI stuff that they had whole events about and, like, tried to sell people on and stuff. Hey, what was it? Like, what was actually underneath it? What did it do? Was it just like a mechanical Turk with people just, like, rapidly typing in the background? But really, they're just good at selling stuff. They don't actually really have anything to sell. They just sell ideas and concepts and words, and then people pay millions of dollars for them. It's kind of crazy. But I'd be pretty mad if I had paid for that thing and then they're like, oh, you know what? We just use OpenAI. It's better. Yeah, and they had, like, didn't they have an Einstein model, too? Like, an actual... like, they were putting out a lot of open-source models, and clearly, they just didn't see... they just are never going to get there. So they did the weird thing, which is like: we're partnering with OpenAI, but we're also partnering with Anthropic for certain use cases. Oh, that's interesting. Yeah, anything sort of semi-serious. Anything that needs to work. Yeah. But anything that requires tool calling, we're partnering with Anthropic. But, yeah, so there's, like, integrations across platforms. And the main reason I wanted to bring this up is not to, like, read their press release, but just to show how quickly they conceded. And I'm like, is this a sign of things to come?
Like all these companies conceding the sort of user interaction with their tool. Because we predicted this. We're like, everyone's just going to want to operate through their platform of choice, whether it's, like, Claude or ChatGPT or a Sim Theory or whatever it is, like your own custom internal chat thing. That's the frame that you'll eventually operate all this software through. And I think the MCP-UI... while I'm not that certain that is it, I'm pretty sure that this is the way forward, especially being able to just get so much more done. And so they've said this Agentforce 360 apps, like, what is that name, is going to be available in this ChatGPT apps program. You'll have access to, like, CRM, Data Cloud, Tableau, all this kind of stuff. Now, interestingly, there's already MCPs for all of this stuff. So I'm assuming maybe they'll just, like, take them and rewrite them a little bit. They're not MCPs by them, though. It's just by, like, enthusiasts or whatever, right? Yeah, yeah, sure. But I don't think it's that new. And I tried them. They're all awful. That's the problem. Like, you almost need an officially supported one, because their software is so complicated to actually get it working. And my big fear is that these big partnerships are going to lead to this walled-garden effect, where it's like, yeah, we have this open protocol, MCP, but it's only when there's elite partnerships involved that people can actually use those MCPs. And I don't know that that's going to happen, but I worry that these top-level partnerships will lead to that kind of thing. Yeah. And it could be especially problematic for organizations where they do want to benefit from the MCP, but then they can't use it in their own system. Like, if they're building an agentic system, they might want to rely on the Salesforce MCP, but they can't. And I think this could be a problem.
And it could... look, Salesforce and all these guys will definitely hold all of your data that you pay to store in their systems hostage. There's no doubt in my mind; this is 100% going to be the strategy. Yeah. And it makes you wonder if a lot of companies are sitting back contemplating, like, do we have a fully open MCP, which would be the best? Or are we going to protect the data, like you say, protect the user's data from themselves, so we still own it and we still have control over it? Because I reckon they really are thinking that through right now. We said this last week or the week before. We know they are. We've talked to some of these people. They're all struggling. I just didn't want to name names. But I know that we know that there's some top-level companies contemplating exactly this thing. Yeah, and so it's like, do you go fully open? It's like, you know, in the early days everyone had to have an API, and then slowly there was pullback where people realized, oh, switching costs are so low because you can just suck out all the data. Yeah. I think this is the interesting path we face here in, like, a SaaS world. The first thing is like, okay, I'm in ChatGPT, I'm using Salesforce, I'm interacting with it, like we've said before, and then, like, okay, now they have a database, now they have better UI. Now why do I need Salesforce? That's definitely the Sherlocking that Apple does. Clearly, I mean, you just saw it make a Mac operating system. The thing could literally just infer how the app works on the back end as it goes. As it calls the tools, it could gradually just patch that back end, then build the back end as it goes. And then once it's got all the data, it's replicated your app. It would be very straightforward. And we've talked about this before, but copying SaaS apps, especially if you had access to the data like that, would be very easy now, like in a sort of automated, ongoing fashion.
But herein lies the problem. If you go and build your app for this new ChatGPT app store that I assume will eventually come, which, I don't know, is probably still a good strategy in the short term for distribution... but, well, actually, on that distribution front, are people really going to go into ChatGPT and go, oh, like, Salesforce is in here. Great. I'm going to pick that as my CRM? I'm not so sure for those established markets; maybe, like, newer markets potentially. But what I think is going to happen is they're going to look at, like, the top five or top ten used apps. Then they're going to go clone them all and just bake them into ChatGPT. I mean, it's just so obvious. Well, they've done it before, right? They took a lot of early AI startups' ideas and just took them, like, cloned them wholesale. Yeah. And look, if I was them, I'd do the same thing. I'd get my app store going and figure out, like, what makes the most money, what areas should we focus on. And then I would lean in heavily. And so now, like, you think, say you're, like, a productivity company or startup or CRM or whatever it is. You're thinking, well, do I lean into this or do I stay out? And if I stay out, my customer is not going to be happy, because they want access to their data and they want me to have an app. And maybe they'll churn as a result of me not having an app. Or do I lean into it really heavily? I think that's going to become very common: people asking their SaaS providers, do you have an MCP? No? I'm going to someone else who does, because I really need that in order to... that's how I work now. Well, I would. That's the thing. I look at the inverse of this problem, like, being on the consumer or, like, the business-user side, and I'm like, I would 100% leave the SaaS app to go to another one that did have a really good MCP. Like, look at Stripe. I think that's a pretty good example. They have nailed the MCP experience. Like, it can do everything.
The only thing it can't do is, like, hard delete stuff, but I don't know if I really even want it to be able to do that, so I'm fine with that. Like, I'm happy to have to log in and do all that stuff. But in terms of... Yeah, if you were using a different, like, Square or something like that now, you probably would consider switching for that, right? Like, it can really help a business. Well, I would say the companies like, you know, Chargebee and Recurly and like all these like layers on top of the payment layer that are somewhat, I guess, somewhat competitive with Stripe. Well, somewhat directly, but somewhat indirectly at times. I think they're the ones that are going to be crunched by this, because if you're on one of them, you're like, well, you know, Stripe has this great MCP. and I don't really need that layer anymore because AI can help me figure that stuff out. What value does the app add? Is the value add really just it's a crud app with a database and a few processes, right? Like if that's what your business does, I'd be seriously worried right now, like Salesforce, for example. Whereas you look at a company like Twilio, who provides phone infrastructure, SMS infrastructure, things like that. I reckon if anything, those companies will become more valuable because they will be the end, you know, access to the real world for MCPs and things like that. Like, I actually think it enhances their business quite a lot, whereas it has the power to completely destroy something like a Recurly or Salesforce because really all their whole value add is literally just a database with some code on top of it. I mean, to be fair, there's a ton of business logic built into these applications, and Salesforce in particular, it's really just a GUI for staff on top of an SQL database. I think it's far more at risk long, long term, not short term, because there's so many security protocols, the enterprise moves so slowly. 
But yeah, for new startups, why would you bother if you could have a database within your AI workspace? Well, when you see that line item on your bill, like, you know, 20 grand or 30 grand or whatever it is, you're just like, well, hang on, I can get most of this just working with an agent. Like, you'd start to wonder, do I actually want this long term? Well, I mean, I called this on the support platforms as well, like Help Scout and Zendesk. I mean, even though we use Help Scout, it's like, all we really need is an MCP that connects to a shared inbox and has a database where you allocate this email, like, this email ID from that inbox, or subject line or whatever, to a particular agent, which is linked to, say, in Sim Theory, the user ID. And, like, all of a sudden the app is pointless, because you could just be like, show me the new tickets. And maybe not even do that. Maybe you just have a home screen where it shows them, and then you're like, okay, now go and answer them all, and then you review them, and then there's somewhat of a process built around that. And the crucial factor here that we spoke about briefly last week is that MCP-UI protocol being added to MCP, the idea being that an MCP can specify how you display different UI elements for input and output in order to interact with it. And if you add those elements, then your idea there of having an automatically generated Help Scout dashboard is perfectly possible. You simply add that to its resources in the MCP, and then you've got that. And so a lot of these applications actually will be able to be replicated in whichever MCP client you're using, as long as they support the protocol. So it's not like it's even one that will necessarily win here, assuming they're actually open. This is why I'm bullish on these apps not being powerful for existing startups. I'm bullish on them being powerful for new startups.
And the reason being is, like, if you came at this and said, I'm going to build the universal MCP for support, just customer support, right? And you just have a database that can't be seen, really nice integrated UI elements, and good, like, all the user... like, everything baked in, but it's all connected to, like, the ChatGPT user, or the users of the MCP. All of a sudden you're like, well, I can just buy my software through this store, and maybe I pay a few extra bucks per user a month, but it's all integrated, it's all there. It's how the new sort of AI-first worker wants to work. Yeah, and I strongly agree with you there that there's a real need for, like, an MCP-first approach. As in, we didn't just, like, wrap an MCP around an existing API, because it doesn't necessarily work in that way. You see a lot of the MCPs struggle, where, like, Outlook, for example: if you want to email someone, it's like, oh, I'll just look up the contact. Okay, now I've found the contact. Now I'll draft the email. Now I'll send the email. So it's doing a lot of these sort of... well, they are necessary because it needs to do them, but, in my opinion, unnecessary steps. Whereas a built-for-purpose MCP could actually realize use cases and have the tools as use cases, rather than being, you know, just a disjointed set of tools that might be automated in another app consuming the API normally, if you know what I mean. So I think that this idea of, you know, dedicated paid MCPs, like you say, that are just experts in their area will really take off. And I think we deserve full credit when that happens, and some of the profits. But yeah, so the other thing, though, I would say is, you know, maybe that's the shorter term. But then the next step you can imagine is, like, a big enterprise is like, okay, cool. We can go and, like, pay for all these disparate apps that we have no control over.
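That Outlook contrast (look up the contact, draft, send, as three separate model round trips, versus one purpose-built "use-case" tool) can be sketched in a few lines. This is just an illustration of the design difference being described; every name and data structure here is made up, not any real product's API.

```python
# Hypothetical sketch: "API-wrapper" tools vs. an MCP-first "use-case" tool.
# All names and data are invented for illustration.

CONTACTS = {"mike": "mike@example.com"}
SENT = []  # stands in for the mail provider's outbox

# API-wrapper style: the model has to chain three separate tool calls,
# each one a round trip through the model.
def find_contact(name):
    return CONTACTS[name.lower()]

def draft_email(to, subject, body):
    return {"to": to, "subject": subject, "body": body}

def send_email(draft):
    SENT.append(draft)
    return "sent"

# MCP-first style: one tool modeling the whole use case, so a single
# call (and a single round trip) gets the job done.
def email_contact(name, subject, body):
    draft = draft_email(find_contact(name), subject, body)
    return send_email(draft)

result = email_contact("Mike", "Q4 numbers", "See attached.")
```

The point is not that the small helper tools disappear; it's that the tool surface exposed to the model is shaped around jobs, not around the underlying API's endpoints.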
When we know our own business better than anyone, we might have all of our data stored in a data warehouse where we, like, you know, we know the data structure. We're fully aware of it. So it's like, okay, well, why do I need any of this? I'll just build my own internal MCPs, hire, like, a couple of people internally to maintain them, and now I can replace, like, solution after solution. I can drive down my IT spend completely. And because it's powered by AI, it's like it doesn't really matter, as long as the core platform's got, you know, the security and the permissions and stuff baked in. Yeah. Why wouldn't you? The protocol does. That's the thing. Like, if you build an MCP for your company, you can already do it with the OAuth, right? So you've got that level of security there, which is strong. You can host it yourself over HTTPS. You can IP-restrict it to just the clients you want to allow. And you've got, you know, enterprise-level security straight away. Like, it really isn't that hard. Not to mention, as you say, think about how many organizations now would be syncing their internal databases and systems off to, like, a Snowflake or Amazon S3. And then they have something that reads that in and maps it into Tableau or into some other API and some other system. And they've got all this stuff just to get access to their things. And they can just cut all of that out of the picture, have an MCP for their company, and then just use it in their favorite (aka Sim Theory) MCP client. And I think that that is going to become really common. And the other thing is, I really feel like it couldn't be emphasized enough that if you do it in the right way, it's actually a lot more trustworthy than all of these other systems you have going on already, or at least the same.
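The access controls being described for a self-hosted internal MCP endpoint (a token check plus an IP restriction) boil down to something like the sketch below. It's a minimal illustration under stated assumptions: a real deployment would validate the bearer token against an OAuth provider, while here valid tokens are just a hard-coded set, and the allowlisted network is an invented office/VPN range.

```python
# Hypothetical sketch of the checks described above: bearer token + IP
# allowlist for a self-hosted MCP endpoint. Tokens and ranges are made up.
import ipaddress

VALID_TOKENS = {"secret-token-123"}                   # stand-in for real OAuth validation
ALLOWED_NETS = [ipaddress.ip_network("10.0.0.0/8")]   # e.g. office/VPN range

def authorize(request):
    """Allow only requests with a valid token from an allowlisted IP."""
    token = request.get("authorization", "").removeprefix("Bearer ").strip()
    if token not in VALID_TOKENS:
        return False
    client = ipaddress.ip_address(request["client_ip"])
    return any(client in net for net in ALLOWED_NETS)

ok = authorize({"authorization": "Bearer secret-token-123", "client_ip": "10.1.2.3"})
bad_ip = authorize({"authorization": "Bearer secret-token-123", "client_ip": "8.8.8.8"})
bad_token = authorize({"authorization": "Bearer nope", "client_ip": "10.1.2.3"})
```

Nothing here is exotic; that's the point being made in the conversation — the building blocks for enterprise-grade access control around an MCP already exist.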
Like, so I really feel like it is the future of interacting with big-company data. It also stops the problem where you've got all this disparate data in different systems and you're sort of thinking, oh, one day we'll clean that up. Like, you never will. Come on, like, never. So it's like, if you just build endpoints, and it can be one MCP. That's what we have, where it's just one MCP with the tool calls, and the tool calls just know where to go fetch it from, securely. And then you start to think about that new level of customization of software here, where it's customized to how your business works, like the actual process that you're running to handle customers, or, you know, whatever it may be. And then that starts to become agentic. It has direct access to systems. It can make changes on your behalf. And then the humans are just sort of approving. Like, I think this is so much bigger than people realize right now. This might be bigger than all of software as a service and all of the App Store and all of that stuff combined. I think it could be far in excess. And especially because the models can already do this. It's really just giving them access to what they need. And, like, to the point where not only can they do it: once you've got your base MCP in place, you can point it at your database or your schema or whatever system you're using and go, hey, you know, what useful tools could you add in here that would help me get these jobs done? And it's like, here they are, sir, and it puts out all the code. And you add that to your MCP. Now you've improved the amount of tools you've got. And I think the next step that we're both looking at is, okay, now we've got the tool set. We need to combine those into skills. And what we mean by skills is, like, procedures: a series of "when this happens, this is the procedure you follow." Then you start to build up a bank of those.
And then it's a matter of time before the agency gets good enough where you can start to have a role. And the role is performing these tasks when these things happen. So when this event happens, perform this procedure or these series of procedures. When this happens, you need to seek approval and then perform the procedure or whatever it is. And I think that this is going to be the future of how businesses work. And so this idea of agency is going to come in a sort of iterative fashion, like we're going to get there gradually. It's not just going to be one glorious day where it's AGI and it can do everything. It's going to be a gradual process where just like now, we're constantly typing everything into an AI terminal to make decisions and do things, it's going to be, holy, wow, I've got like 20 of these things running and it's basically running my whole business. Like it's going to be like that kind of thing. Yeah. This is where if you're spending a lot of time using it all the time or thinking about where it will go, but also using it and trying to get there, it's a lot more obvious than what companies are necessarily presenting as the vision. And I think this is why there's this disconnect between, you know, people in a business going, oh, it's not even that good. It's not that helpful versus what people like us might see. So let me illustrate that here. So as part of the Salesforce announcement, they have this like apps and agents section and they've got like Google agent space, Claude, Dropbox for some reason, like who knows why. Notion, of course, and Perplexity Tableau. What about Evernote? But I get it. it's sort of showing this app screen and in their world, they see ChatGPT and Florida's like an app. And then the example is, hey, ChatGPT, can you turn my Q4 deck from Google Drive into a post for leadership? Use bullet points to highlight what's important and cool. 
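The "skills" idea just described — a bank of procedures keyed by event, some of which need human sign-off before they run — can be sketched as a small registry. Everything here (event names, the approval flag, the example procedures) is hypothetical, just to make the event → procedure → approval shape concrete.

```python
# Hypothetical sketch of the "skills" concept: procedures registered
# against events, with an approval gate on the sensitive ones.
SKILLS = {}

def skill(event, requires_approval=False):
    """Decorator that registers a procedure for an event."""
    def register(fn):
        SKILLS[event] = {"run": fn, "requires_approval": requires_approval}
        return fn
    return register

@skill("new_ticket")
def triage_ticket(payload):
    return f"triaged ticket {payload['id']}"

@skill("refund_requested", requires_approval=True)
def issue_refund(payload):
    return f"refunded {payload['amount']}"

def handle(event, payload, approved=False):
    """Run the procedure for an event, pausing for approval if required."""
    s = SKILLS[event]
    if s["requires_approval"] and not approved:
        return "pending approval"
    return s["run"](payload)
```

So `handle("new_ticket", {"id": 7})` runs straight through, while a refund request sits at "pending approval" until a human passes `approved=True` — the iterative human-in-the-loop agency described above.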
But wouldn't it be better if you showed him, like, coordinating? Like, list all of the support tickets, or list all of our current sales opportunities and figure out how I can progress those deals, or, I don't know, more like real use cases. And that sort of takes me all the way back to what was presented at the dev day, which I wanted to touch on again. Like, we've got the Booking.com app, the Canva app, the Coursera app, and these are just so... yeah. Like, the future isn't apps. We already have apps. It's not helpful. It's like, use your apps 10% more efficiently. Like, browsing Booking.com with an AI, going, oh, wouldn't it be great to go to Chicago this weekend? You know, like, it's just not helpful. And this is what I don't get. Like, I get why they're doing it, because people probably build all sorts of cool things and back ends. And I know I'm slightly contradicting myself from earlier, when I said there's a big opportunity here to build, like, a database and build some of these startups from the ground up. Like, if you take Zendesk and say, okay, I'm just going to build an MCP that just lives in, say, ChatGPT, and that's all I'm going to do, I do fundamentally think that's a big opportunity, right? But then if you think about what these MCPs are capable of, which is agentic, like, real work: not necessarily, like, off in the background having full agency, but having the human in the loop, where it can say, okay, I'm going to call this tool, and I'm going to call this tool, and I'm going to do this, and I'm going to do that. Like, this whole Apps SDK just completely goes against it. It also silos the apps to single use. Like, you've got to click plus and pick which one you want to use. That is not a good workflow for productivity. It's not helpful. They're almost diminishing how good their models actually are by forcing you to choose.
There is not a single use case I can think of right now, including the support one I bang on about, email I bang on about, like, all of these different things I use MCPs for every single day right now, where I benefit from having to force-select a single MCP. Unless I'm, like, producing maybe a video or image, where I'm like, hey, I definitely want to use Veo 3.1. Well, remember, we actually originally had that skills concept in Sim Theory, because originally the models weren't great at tool calling. So if you gave them, say, Google Search, which was, you know... there weren't many around in those days, but you had Google Search, right? So no matter what you asked the bloody thing, it would use Google Search. It's like, hey, how are you today? I'll just search Google to see how I am today, you know, that kind of thing. So it was super annoying, and we're like, this is useless. What we'll do is we'll force people to select which tool they want to use. And that way it won't accidentally call things and waste their time and be slow and all that sort of stuff. But then they made the models better. And they made them a lot better at tool calling and knowing when to do it. And then you could make better tool descriptions, where the AI would follow rules about when it calls tools, or multiple tools, and things like that. That's a solved problem. And yet they've gone backwards in terms of the way they're getting people to interact with it. And that brings me back to the original point, why I feel like a lot of this is just sucking in the user interfaces and the use cases into ChatGPT so they can see which performs the best and then think about future applications of their own tech. Like, that's... it doesn't... yeah, maybe it's a step thing. Like, next year we get, like, multi-app calls. But then, again, the UI starts to become less important. And then the further confusion is, you know, Greg Brockman, right? I'm Greg Brockman. He got the sound effect.
The sound effect did play. You couldn't hear it. I've been wanting that for ages. It did play. So, Greg Brockman tweeted: ChatGPT apps are very powerful and can now include full-fledged applications. So, cool. Let's look at what he was referring to. Someone running Doom in that portal. And they're like, hey, ChatGPT, let's play Doom. And it loads Doom through some Next.js plugin. Wow, that's really useful. But I don't understand why he would call this out. Why? What am I missing? I think you're right. Like, you've said it on previous episodes: I think these guys don't actually use it day to day. I think they use it for tweets and stuff, because, yeah, that's cool and stuff. It's a nice novelty, but people were done playing with that, like, six months ago. Everyone's doing real work now. They're not mucking around being like, oh, cool, I can make a salt lamp website. Like, you do it all the time, you know? It's like, yeah, we know it can do that, but I don't need to do that. Like, I don't need to make Doom again. I've got real work to do. And so, yeah, I'm a bit confused by the fact that they're just so misguided with those use cases when we know it's capable of so much more. It's weird and doesn't make sense. But then here's another example. So this guy posted an app, I guess he's working on, and it's like... he calls it, like, a running generator, right? So he goes in and says... I think it's like, I want to go for a lunchtime run on Montgomery Street. Where should I go? So, okay, fair enough. Play the video. And it's like, it finds two routes. I think maybe on Strava. Yeah, on Strava. And then it says add to Strava. That's kind of cool. But, like, are you really going to, at lunch in your workplace, be like, I know, I'll go to ChatGPT, not Strava on my phone, which is way more accessible and already has this feature on the home screen? And this is what I was going to say.
The whole trade-off with AI, I think even now, even for me, even with tool calling done in the proper way, is between: do I just open Gmail, or do I ask my assistant to read my emails and do something with them? And most of the time, I would still probably go to Gmail right now, because it's an app built for that purpose. I could just go in there, I'm used to it, whatever. I'm not going to do interactive steps with Gmail in an AI console. It's not helpful. What it is helpful at is, say, gathering hundreds of pages of research, preparing an incredible sales email or an incredible report, then sending the email with the report attached. It's useful in a session. It's useful with mass context. It's useful in the context of having worked with maybe 20 other MCPs or something. It's not useful as a bespoke targeted thing where I'm just asking it to do stuff for me that I could do myself.

Canva is another great example. Canva is really easy to use. You can go into Canva and make a presentation easily, and if you want an AI to generate it, they've got that. I don't need to go into ChatGPT and ask it the same question when there's this other thing there. Now, both of us have the vision that people aren't going to go into these point-solution apps in the long run. But I don't think the alternative is going to be: go into a chat box, enable that app, and then use a UI that I could have had, better, in the other app. I just don't see it. It doesn't compute for me. Am I missing something? Am I just going to be way wrong here?

It has all the hallmarks to me of a company that just doesn't do web development. They just don't have experience in it, and it's like they're discovering everything for the first time. Wouldn't it be cool if we could do this? And they're like, okay, okay, go do that.
And then they announce it and launch it without ever actually thinking about how it's going to be used by people in a practical way.

Maybe. I just think it's small teams trying stuff, and occasionally they launch those things that they've tried, very reminiscent of early Google. And with Google, the one-trick pony was that they were really good at search. The one-trick pony of ChatGPT is that it's really fast and accessible and it's just there, you're like, oh, I'll just go to chat, plus the brand recognition. But maybe we'll be proven wrong and people will like these apps. My prediction is people will install a few, barely use them, and then the thing will die unless they change it.

As we both know, the big benefit with MCPs is gathering context and then taking action with all that context. And to do that context gathering, you need multiple tools, you need simultaneous calls for speed, and you need those agentic capabilities in the model. I'm seeing people building custom MCPs that do things like quoting for their industry, or making precise measurements based on specifications, things that can't easily be done in another system. But most importantly, the usefulness comes when it's in combination with other things, not on its own. Of course, you can just have that one tool. But it's the combination of the MCPs and the AI's ability to, A, know what to do without you having to figure it out, and B, take complex pieces of data and map them into the protocol or schema of the other app's function calls. If it can gather all the information, and most importantly take that information and feed it into the next step in the chain, that is where you get the value. That's what saves you time. That's what actually gives you power and leverage. It's not just being able to manually call an API through plain English. That's not the useful bit.
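The pattern being argued for here, simultaneous calls to gather context and then mapping the results into the schema of the next tool in the chain, can be sketched as follows. The tool functions are stand-ins with hard-coded data, not real MCP calls, and all names are hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in context-gathering tools; in practice these would be MCP tool
# calls out to a CRM, a ticketing system, a search server, etc.
def fetch_customer(customer_id: str) -> dict:
    return {"name": "Acme Corp", "contact": "jane@acme.example"}

def fetch_open_tickets(customer_id: str) -> list:
    return [{"id": 101, "summary": "Login failures since upgrade"}]


def gather_context(customer_id: str) -> dict:
    # Simultaneous calls for speed: both lookups run at once rather
    # than one tool call waiting on the other.
    with ThreadPoolExecutor() as pool:
        customer = pool.submit(fetch_customer, customer_id)
        tickets = pool.submit(fetch_open_tickets, customer_id)
        return {"customer": customer.result(), "tickets": tickets.result()}


def to_email_payload(context: dict) -> dict:
    """Map the gathered context into the schema the next tool expects
    (here, a hypothetical send_email tool). This feed-forward step is
    where the hosts argue the real leverage is."""
    lines = [f"- #{t['id']}: {t['summary']}" for t in context["tickets"]]
    return {
        "to": context["customer"]["contact"],
        "subject": f"Status update for {context['customer']['name']}",
        "body": "Open items:\n" + "\n".join(lines),
    }


payload = to_email_payload(gather_context("cust-42"))
```

The point of the sketch is the shape, not the specifics: fan out, collect, then reshape the combined result to fit the next tool's input schema, which is the step a single forced-selected app can never do.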
I mean, the only counterpoint I would make is that maybe the play here is, like we said earlier, getting the user to nudge it: my end goal here is a Canva presentation, so I select the Canva app, then ask it, hey, go off and research and do all this stuff to get to this endpoint, which is my presentation. But even then, I want control over which MCPs it can access at any given point, and I can only select one. So it's going to go off and consult whatever it wants in terms of research, and you have no fine-grained control over that.

I also think, and I don't know about you, but when I'm working at my most productive with AI, I get into a long session where we build up this massive shared context, and we've already produced artifacts and things that go along with it.

Lots of context foreplay. It goes with the theme of the episode.

Exactly. Foreplay before I get to the erotica at the end. And I'm like, all right, let's make a picture that will break these censorship filters right down. But it's kind of true, in the sense that I'm like, well, just one more thing. Hey, let's be bold here. What if we did this now? What if we actually added a new module to do this? Now that we've done this, I want to do it again. Just to give you a concrete example: I've made an MCP for, say, Microsoft To Do. I'm like, all right, let's take on Microsoft Planner now. You know the procedure. You know everything we've done. Here's the input information you need. Bang, make the whole thing. And because it's got all that context, it's now more efficient at that task. We'll get to training skills and stuff later, because you can do that process once, bottle it, and then reuse it multiple times. For now it's still possible, you've just got to get there manually. And I think that one-app-at-a-time paradigm of working just negates all of that benefit.
And it just seems crazy to me that the guys who started this whole thing don't get that. But they didn't. They didn't.

To be fair, Anthropic are the MCP people, and they handle it exactly how we do, because it's the right way. ChatGPT came in, they didn't really invent it, and then they slapped on this UI paradigm, which might be good. I want to try it. We're going to add it next week, so we'll see, and we'll be able to support it.

Yeah, I'm not opposed to it. There are definitely cases where, like we said, you've produced a draft email, or a draft receipt or something, and you want to see it in a nice visual way, or you need an approval step. Or, okay, you've decided to make a video, do you want to configure some of the parameters? Here's what I have available, you can pick. But it's got to be in context, it's got to be when it makes sense, not every time. You're not building a UI here; the UI components are there for when they're needed.

Yeah. Anyway, we'll see how it evolves. I think leadership is needed on this, to educate people about the benefits of MCPs, and my fear deep down is that people are going to go play around with these and be like, MCPs suck. And quite frankly, they're hard to get right. I feel like only now have we gotten to a point in Sim Theory where it's pretty stable, pretty fast, and we're starting to introduce productivity tools. It's taken us probably two months to get right. But I still think there's a long way to go in terms of optimization of the tools, as I said earlier. I think some of the tools are probably less efficient than they could be and need to be rethought in certain MCPs.
Yeah, but I guess my larger point is: are people just going to get turned off the whole thing? They go in and use the Coursera app and they're like, oh, I could just ask you these questions anyway, why do I need Coursera when I can have it in another tab? I just wonder if it'll push people away from MCPs instead of revealing their true capabilities, especially in the workplace. In the consumer world they're kind of boring to me, but the workplace is where I think you get the biggest bang for your buck.

Yeah, totally agree. I mean, I personally think there's going to be an explosion in it, but, you know.

No, I agree. I agree. And I'm willing to die on that hill. I think it's going to be a big deal.

All right, any final thoughts? Gemini 3 rumors, Claude Haiku 4.5, VO3.1?

Well, I'm banned from Polymarket because I'm Australian, and we're not allowed to gamble unless it's in a pub on a pokie. Other than that, we're not allowed to do it. So I don't know, but I assume Google's smashing it on Polymarket. You know what, I'm pretty sure, what's the market called again? The one with the poor grammar, "which AI model end of Oct" or something. Okay, let me see if I can still access it, because I'm pretty sure I can. It doesn't appear in Google Search, which is really interesting. Yeah, I can still access it on Starlink. So thank you, Elon, I haven't been blocked. So let's have a look at the current standings, shall we? "Which company has top AI model end of October (style control on)". What does that mean in brackets? I don't know. "Which company has best AI model end of October", I think that's the one, although the volume on this one's a million. Okay, so we've got Google in front. Yeah, I mean, come on. Gemini 3 is going to dominate, and I think they all know it, and we know it. It's going to dominate.
I'm assuming there'll be shortcomings, but I feel like even Gemini 2.5 Pro is still the best all-round model today, and if they can knock it out of the park with Gemini 3, then this is a sure bet, which is why you're not going to make much money.

No, I've given up on that. I'll stick to making MCPs, I think.

All right, before we go, we do have a cool coupon code for you. You can use coupon SIMLINK, S-I-M-L-I-N-K, in Sim Theory. If you're upgrading to or buying an annual plan, the Pro, Max, or family plans, you can use that coupon. It gives you 30% off the annual plans. We've never done this before. It's a good deal. And it will also give you early access to SimLink. So if you have a spare computer and you want to use it as a sort of MCP connection to go and do real-world tasks, authenticated, whether it's an old computer stored in your basement, a Raspberry Pi, or your existing computer, you'll be able to very soon. We're going to start slowly releasing that MCP and getting your feedback on it, picking the best models to run on it and things like that. One thing I'll say I love about it, or think will be very popular, is just when you have to fill in forms. It sounds really basic, but I swear it's good.

I've been delaying a security training I have to do. I'm not doing it until I get SimLink.

Yeah, so the idea being that you've got to fill in this application for your kid's soccer or something, which I've complained about on the show before. Now it can just take over another computer and go off and do that, and you can move on. It'll come back having attempted to complete it as best it can, and then you can pick it up either off that computer directly, or control that computer and get it done. So we're excited to bring you that. You can use coupon SIMLINK; the promotion ends October 31st.
So if you're listening to this a little bit later, I apologize, but it runs until October 31st. SIMLINK works if you're an existing user, and it'll also work if you're a new user, but it's only on the annual plans: Pro, Max, and Family Edition. All right, we will see you next week. Thanks again for listening. Bye.