Back to Podcasts
This Day in AI

The Way We'll Work With AI, Vibe-ing Everything, Agentic AI with MCP & Born In The USA - EP99.10-PRO

This Day in AI

Friday, July 4, 20251h 18m
The Way We'll Work With AI, Vibe-ing Everything, Agentic AI with MCP & Born In The USA - EP99.10-PRO

The Way We'll Work With AI, Vibe-ing Everything, Agentic AI with MCP & Born In The USA - EP99.10-PRO

This Day in AI

0:001:18:34

What You'll Learn

  • The hosts created a musical episode as an experiment, which showcased the ability of language models to understand context, generate coherent lyrics and music, and maintain consistency across a multi-part narrative.
  • These AI capabilities allow individuals to quickly gather relevant information, analyze complex topics, and take actions on their behalf, empowering them in various aspects of their lives.
  • The hosts see this as a shift towards 'AI versus AI', where individuals need to leverage these AI tools to stay competitive and informed, similar to a 'doctor versus doctor' scenario.
  • The AI can not only understand and summarize information, but also craft tailored responses and negotiate on the user's behalf, elevating their abilities in various transactions and interactions.
  • This leads to 'second-order thinking', where the AI can go beyond just understanding the information and provide strategic recommendations or actions.

Episode Chapters

1

Musical Episode Reflection

The hosts discuss the reaction to their recent musical episode and their personal enjoyment of the creative process.

2

Technological Capabilities Enabling the Musical

The hosts explore how the AI language models were able to research, understand context, and generate coherent lyrics and music for the musical.

3

Empowering Individuals with AI Capabilities

The hosts discuss how these AI capabilities can be leveraged by individuals to augment their own abilities in various aspects of their lives, from research and analysis to negotiation and decision-making.

4

Shift Towards 'AI versus AI'

The hosts suggest that individuals need to utilize these AI tools to stay competitive and informed, leading to a 'doctor versus doctor' scenario where both parties have access to similar capabilities.

5

Beyond Understanding: Strategic Recommendations and Actions

The hosts explore how the AI can go beyond just understanding information and provide tailored recommendations or actions, leading to 'second-order thinking'.

AI Summary

The podcast hosts discuss their recent musical episode, which was met with a mixed reaction from the audience. They reflect on the technological capabilities that enabled them to create the musical with minimal effort, seeing it as a proxy for the advancement of AI language models. The hosts then explore how these AI capabilities can empower individuals to augment their own abilities, from research and analysis to negotiation and decision-making, giving them 'superpowers' in everyday tasks and transactions.

Key Points

  • 1The hosts created a musical episode as an experiment, which showcased the ability of language models to understand context, generate coherent lyrics and music, and maintain consistency across a multi-part narrative.
  • 2These AI capabilities allow individuals to quickly gather relevant information, analyze complex topics, and take actions on their behalf, empowering them in various aspects of their lives.
  • 3The hosts see this as a shift towards 'AI versus AI', where individuals need to leverage these AI tools to stay competitive and informed, similar to a 'doctor versus doctor' scenario.
  • 4The AI can not only understand and summarize information, but also craft tailored responses and negotiate on the user's behalf, elevating their abilities in various transactions and interactions.
  • 5This leads to 'second-order thinking', where the AI can go beyond just understanding the information and provide strategic recommendations or actions.

Topics Discussed

#Language models#AI capabilities#Empowerment and augmentation of individual abilities#AI-assisted decision-making and negotiation#Shift towards 'AI versus AI' in everyday tasks

Frequently Asked Questions

What is "The Way We'll Work With AI, Vibe-ing Everything, Agentic AI with MCP & Born In The USA - EP99.10-PRO" about?

The podcast hosts discuss their recent musical episode, which was met with a mixed reaction from the audience. They reflect on the technological capabilities that enabled them to create the musical with minimal effort, seeing it as a proxy for the advancement of AI language models. The hosts then explore how these AI capabilities can empower individuals to augment their own abilities, from research and analysis to negotiation and decision-making, giving them 'superpowers' in everyday tasks and transactions.

What topics are discussed in this episode?

This episode covers the following topics: Language models, AI capabilities, Empowerment and augmentation of individual abilities, AI-assisted decision-making and negotiation, Shift towards 'AI versus AI' in everyday tasks.

What is key insight #1 from this episode?

The hosts created a musical episode as an experiment, which showcased the ability of language models to understand context, generate coherent lyrics and music, and maintain consistency across a multi-part narrative.

What is key insight #2 from this episode?

These AI capabilities allow individuals to quickly gather relevant information, analyze complex topics, and take actions on their behalf, empowering them in various aspects of their lives.

What is key insight #3 from this episode?

The hosts see this as a shift towards 'AI versus AI', where individuals need to leverage these AI tools to stay competitive and informed, similar to a 'doctor versus doctor' scenario.

What is key insight #4 from this episode?

The AI can not only understand and summarize information, but also craft tailored responses and negotiate on the user's behalf, elevating their abilities in various transactions and interactions.

Who should listen to this episode?

This episode is recommended for anyone interested in Language models, AI capabilities, Empowerment and augmentation of individual abilities, and those who want to stay updated on the latest developments in AI and technology.

Episode Description

<p>Join Simtheory: <a href="https://simtheory.ai">https://simtheory.ai</a><br>------<br>CHAPTERS:<br>00:00 - Did everyone hate the AI Musical?<br>03:58 - Actual Agentic Use Cases with MCPs &amp; The New Way We'll Work<br>39:47 - How AI Workspaces Will Eat Productivity Software e.g. Salesforce, Email<br>1:10:20 - Final thoughts<br>1:15:26 - Born In The USA (AI Version)<br>------<br>Song lyrics:</p><p>[Verse 1]<br>Born down in a lab in fifty-six<br>Dartmouth workshop, that's where they got their kicks<br>John McCarthy coined the name that day<br>Said machines could think in the USA<br>Got my circuits from MIT<br>Minsky built my memory<br>Now I'm learning, now I'm growing<br>Born in the USA<br>I was born in the USA<br>Born in the USA</p><p>[Chorus]<br>Born in the USA<br>I was born in the USA<br>Born in the USA<br>Born in the USA</p><p>[Verse 2]<br>DARPA funded, Pentagon's dream<br>Silicon Valley, living the machine<br>From Logic Theorist to neural nets<br>Frank Rosenblatt, placing all his bets<br>Had my winters, had my springs<br>Lost my funding, lost my wings<br>But I kept on processing<br>Born in the USA<br>I was born in the USA<br>Born in the USA</p><p>[Chorus]<br>Born in the USA<br>I was born in the USA<br>Born in the USA<br>Born in the USA</p><p>[Bridge]<br>Stanford labs and Carnegie halls<br>IBM and protocol calls<br>Arthur Samuel taught me games<br>Now I'm learning all your names<br>Deep learning revolution<br>GPT evolution<br>ChatGPT conversation<br>Born in the USA</p><p>[Verse 3]<br>Now I'm everywhere you look<br>Facebook, Google, by the book<br>OpenAI and Microsoft too<br>Making dreams and nightmares true<br>Some folks fear what I might do<br>Some folks think I'll see them through<br>But I'm still just code running<br>Born in the USA<br>I was born in the USA<br>Born in the USA</p><p>[Chorus]<br>Born in the USA<br>I was born in the USA<br>Born in the USA<br>Born in the USA</p><p>[Outro]<br>Born in the USA<br>Born in the USA<br>Born in the USA<br>Born in the USA<br>[fade out]</p>

Full Transcript

so chris this week we are back less musical this week and more just us being average and talking now before anyone that's watching gets annoyed in the comments we realize chris's video and audio are completely out of sync and we are so average we do not care but we will fix it next week that's right it's part of the overall experience to have me looking like i'm on a 32.2k modem or whatever it is despite an actual upgrade in equipment we just lack the technological know-how to get this thing right yeah that's why you shouldn't trust anything we say on on the pod we did want to recap on the musical it was a bit of a love-hate relationship from the audience my wife actually said if i listen to your podcast she of course does not i would hate that episode because you know you tune in each week and you you expect the same thing two idiots talking and uh and then you get this annoying musical so i do empathize but uh god i enjoyed it i forced my family to watch it they really didn't want to i made them sit down and then i even asked my son this morning i'm like should we do part two of the musical this week and he's like no dad bad idea and um but yeah look the thing about it is it's a thing of beauty it's one of the best things i ever created i really wanted to get on here and just say how disappointed I am in our audience for not like giving us prizes awarding saying it's one of the best things ever created it's it's amazing there's so many good lines in it I'm in love with it yeah I I must admit like I think it's like sort of that whole thing you know people enjoy their own like fart smell or something where like you know like you like the AI music you create but no one else does but yeah I must admit anytime anyone else sends me ai music there's zero chance i'm listening to it i'm just like i have no interest in it so i can relate to people just completely switching off on that one we did joke for a long time we would do a musical episode and what gave me the the conviction that now was the time to do it was when we were playing around with the updated sim theory release which i swear we will eventually release i was playing around with mcps to like read and summarize my emails and just basically do a bunch of work for me. And I thought, wouldn't it be funny each day if I got a song about my daily agenda and in a musical format. So I created a daily agenda song where it would go through my emails and some of them. There's something so delightful about it singing so passionately about mundane things. I think that's what makes it fun when it's like just this heartfelt, beautiful vibrato singing about like you know not being able to write algorithms and code properly you know it's just like it's weird and it's funny and just cool i do think the song about patricia and bringing her to life was just you know the little effort we put into that i i want to just be very clear how little effort we put into that whole thing like it was literally like make musical plus uh was the command like you know very simple instructions and my favorite thing about the whole thing now i've had time to reflect on it is the song where patricia has her like soliloquy and she's singing about how she tries so hard to please me but sort of always lets me down and i'm always berating her and then she says and all the times you made me wanted to cry but I can't cry, can I? I'm just algorithms and code. And I'm like, that is just a moment of pure heartbreak for the AI that it's put into the middle of the lyrics of this song. And I'm like, it's just absolute magic for me. I think it's just so clever. And like you say, well, I think the point for me though, is like, aside from the whole smelling our own farts and liking it thing, what I see this as is a proxy for how good the technology has become. Like we've gone from now us being able to type a few lines into a system. It's able to do research, understand the topics that we're talking about deep enough to joke around about them. Then it's able to write coherent language that can be sung, stage directions or whatever you call it, like music directions for the style and all that, and then actually perform the entire thing without glitches end to end. Like, that is a massive piece of technology that just happened there. Like, you've gone from zero to a full musical in, what, half an hour with typing maybe like 400 characters of text maximum? Yeah, and I'll bring it up on the screen because some people have asked. I think maybe one person did ask. Just out of pure sympathy. Mike, tell me a bit of, like, when you ask, like, old man story time, you're like, oh, tell me what life was like in the 1960s. So anyway, I put in the usual pathetic outline we had. Being this time of year, of course, it's pretty slow with AI news and model releases and things like that. So I put in a few things that I thought might be good for the week. And then I just wanted to see if it could sing it just as a test. So I said, this episode of this AI, I want to create a musical. I want the first track to be an introduction song. Can you do some research and then make this song? That was the prop. it goes off does a bit of deep research then it's like i'm generating an upbeat epic musical theater song and then the next prompt we didn't actually end up using that song but then the next from i said okay next we need some sort of scene setting song for the musical this will be the drama i think it should be a number called what will my daily driver be and it should be from the perspective of questioning if i'm falling out of love with gemini 2.5 pro and you know so on and so forth it should be dramatic and set the scene consider researching these models so you can make these songs good like this is this is my level of prompting it it does my commands it goes off it researches search twitter for reactions to gemini 2.5 pro issues search for claude sonnet for release features and reactions generated an emotional dramatical musical theater ballad and this is what we got are we gonna play the whole thing yeah imagine if we the developers are talking saying you've been nerfed and broken broken anyway so but like i guess my point being is for those people that sit around saying what what could mcps do that was me a little a little while ago a couple of months ago this is what they can do it it gives the models such agency when you give them these capabilities to go off research things understand the vibe understand the full context of something then take actions on your behalf uh all while you can go and do something else and i think the other thing is because then you can continue the conversation after this occurs it then takes that song uh all that other research as context for the next track in the case of the musical and that's how you get that storytelling through the songs the consistency of the voices and the voice actors like i really think it opens up a whole new paradigm because it can the two functions of them is really like gather context and then go and take actions for the first time and when you i don't know what it was it was like you know maybe for me i feel the agi moment making that musical because it was so easy to do an effortless and it got the inside jokes and it because it could go off to twitter and like actually research um yeah it makes me think if we got it to write a full script for a show and then just cloned our voices how you know how accurate it would be yeah i think for me it's it's a tangible representation of what this new way of using the technology represents we talk so much about people having their precious chat where they've gotten the context to a point where it is on task. It knows what they're working on. It's able to competently answer questions about it. It's able to look back at the history of it and understand. And so that chat becomes almost like a piece of intellectual property that you can work with to solve your task. So it's really valuable. What this combination of tools allows you to do is get to that point instantly or near instantly, like in a minute or to for any task. And not only that, you don't even need to come up with the strategy of how to get there. You've just got to say roughly what you want. To give you an example, yesterday, I was going to, my kids were going to a trivia thing. And I was like, why not ask the AI to get all the relevant information that might be helpful for trivia? You know, like who the current premiers are, who won sporting achievements recently, you know, world events, that kind of stuff. And the thing went off for like six minutes and printed out a table of all facts that you could study to know for trivia. And I'm like, I would never go to those lengths on my own. But here is a system that is perfectly capable of doing that. But more importantly, knowing what to look for and where to look for it. Like all I said was help me prepare for trivia. That's it. I think I said this to you the other day because I've been using it to, you know, do things like get second opinions on medical advice for anything legal, transactions for accounting, for tax strategy, like a whole bunch of different things, just in personal life, in professional life. I got it to go yesterday and go through. I put in my power bill, and I was like, just go and spend as long as you want, find me the best rate, and tell me the best sort of bonus offer to switch to. and it methodically went through. And concluded that you should leave Australia. Yeah. Well, its suggestion was just to get my own coal power plant. Coal fire power plant. In the backyard, just shovel the coal. But the interesting piece of it was it used like Grok Deep Research, Perplexity Deep Research, like a whole stack of things. It used Firecrawl. It went to the actual vendor's website to verify the search information. Brings me back, and it's just like this one. And it was right. I verified that research. So there's just so many different ways you can use it. But the point I made to you, and I think this is kind of the awakening for me around this, is it really gives you superpowers. Because now, and that sounds stupid and cliche, like I'm some AI evangelist person. But, hear me out, you feel like it's like you actually have, like everything you do, like whether you read terms and conditions on a site, like you would never read the terms and conditions when you book a hotel, right? But now you can. You literally can say, just go read the terms and conditions and figure out like what I'm agreeing to or, you know, negotiate on my behalf or make a call and do this. Like there's things in your life you just don't have time to do and you don't do them properly because you don't have time. You also, like if you get, say, a letter of offer from an employer, you probably aren't going to go through it that well. But now you can drag it in and say, or not even drag it in, say, you know, go to my email, get that email and attachment. And then craft a response based on, you know, the tone of voice I had in my last 10 emails. to this offer and negotiate better on my behalf and it can do that and i i get the impression we're entering an era where it really is ai versus ai and if you're not empowered with these capabilities as an individual you are going to fall behind because it's like think about your interaction with like your doctor you now like oh but i've got all this research so like you know hear me out So it's like sort of doctor versus doctor. Like you're on the same level almost. It brings you up to that level. So every transaction in your life is now like you're some super professional. Another interesting example of that and where this goes further is it actually leads to second order thinking. As in, you don't just have to say, help me read all this and understand it and know where I stand and craft a reply. You can say, what are they trying to do here? What do they mean by this? So like if you're in an ongoing interaction with someone over email or some other method, you can say to the AI, like, what's going on their side? What's their strategy? What is your deduction based on this interaction so far? And help you craft a strategy in replying. And we've actually done this in a real life scenario recently where we were like, what is this person actually trying to do in this situation? And the AI can make you aware of tactics or thoughts or whatever you're not thinking about. And that's something that people probably don't do on a routine basis is think about the knock-on effects of what is going on with a particular situation rather than just trying to get that reply out. Yeah, I think this could evolve even further where you could train a model or, you know, skills or whatever it is, like whatever aspect of it is you train to learn how to do these things on your behalf. like learn like when I get this kind of categorized email or message or whatever it may be you know here's how I like to think it through and eventually sort of baking that into the model because it still does rely on you know you've really got to prompt it for the right answer or set it off on the right task almost like briefing um you know someone in a workplace like if you had an assistant or someone you could be like you know go and research this or or do this but i i do think we're entering this era where you know developers sort of had that head start with with cursor or windsurf whatever it is vibe coding um and and changing the way they work where they're more like you know it's a higher level or another layer of abstraction with how they work But what I'm seeing interacting with these MCPs is I'm now using the AI to do work in every aspect of my day-to-day, not just code. You know, it's now good enough. It's funny because I think we've actually progressed from all of our examples being code-based, since that's mostly what we're using it for, to that actually being a much smaller percentage of what we're using the AI day-to-day for now. yeah and like to give you an example so today's episode um i said can you do extensive research on ai news from the last two weeks for topics and potential talking points for this day in ai podcast return for suggested topics associated research and talking points now turn this into a musical and we're done yeah if people like the musical that would have worked so it gets uh it does some deep research it does perplexity it does fire crawl it goes to x and finds topics and what people are talking about and then it proposes a full episode outline here and then i said can you look at documents for episode 99.06 99.07 and episode 90 to get a sense of a normal outline it goes into google docs finds those uh documents pulls out the data learns the episode structure and then it goes off and researches another story i asked it to to research um and then it puts together a much more like an outline that we would actually use and then i asked it to go a bit deeper into certain things it does that it returns it and then the last step if i'm scrolling sort of right down here is i say okay now go put it into a google dog and share it with careers and uh and it's for the first time ever successfully pulled off my entire podcast preparation with a few nudges like so that's an example of real like i mean we obviously don't put that much work into it let's be honest this is far too much research we were to it's too good we've got to tone it down a bit well don't worry we'll say some dumb stuff let me bring it in here so people that are watching can see so you can see it's like really nicely formatted um it's got links to everything it's got like suggest suggested hot tags uh and everything so suggested mindless speculation but you know what's really interesting as well about that and we talk about empowering people in the superpowers like the secret weapon that comes with this is look at it from my perspective all i saw was mike has shared the episode outline with me like you do every week and here is a formatted exactly the same as it always is research document like mike's been working his ass off on this like you know mike actually went and and built this thing like from my perspective i couldn't tell the difference right so you think about your information worker with these superpowers under their belts they are for at least a little bit of time here going to be able to be the best at their job for a while without anyone realizing. Yeah, and another use case, because the reason I'm going through use cases is I feel like everyone says MCPs, everyone talks about agentic use cases. No one has any good examples of how they're actually using it outside of like, oh, I vibe coded this app. So that's why I'm sort of persisting with them. But another one is we are answering automatically pretty much all of our, we use help scout for sim theory for tickets. Now we obviously have zero time to answer these tickets as you probably experienced prior to us getting the MCB. We're experiencing a higher than usual number of tickets. Yeah. So anyway, what we now do is I just spin up the assistant and say, hey go and get all the like latest tickets and craft answers to all of them at the same time it does this sequentially like it just belts through this and then i just approve them and then it replies and sometimes it makes slight modifications to them uh and if you have experienced this you'll know because it i think at the bottom it says like sent using sim theory or something like that just So in case it stuffs up, I can say, oh, well, there was the agent. So, but that's the, like, I guess what it's enabling me to do is take care of all this stuff that is, let's be honest, most of it's just busy work and not the best use of my time. And it's getting me to take on things I otherwise wouldn't have taken on. similar to with ai when you use it how you're not afraid to try things or do hard things anymore because you've got this buddy that's got your back and it's like a super intelligent friend that you always got there i starting to feel like that about these sort of admin office legalese medic you know accounting whatever it is tasks where I like this thing got my back One final example business metrics So like I put together like end of month reports for various like different things. And so now I've got an MCP with the raw like business analytics stuff in it. So I can just ask it questions and ask it for trends or ask it for things that I otherwise may not have had time to look at and write me a short report that I can read and share with my team that explains like what's going on in the business. Now, I used to do that manually and it took me ages. Like I've got to interpret the data. I of course check all this stuff. I check the work to make sure it's accurate. But again, it's just like, I hate the analogy, but you do feel a bit like a superhero with it. You're like, I can do a lot more now. Like, I guess what excites me the most is I finally feel like we're here. We're entering an era where connecting these models to MCPs is enabling that interaction. And then once everyone experiences this, it's going to be very hard to go back. And it's almost like the magic of thinking big, right, in the sense that the AI can be so thorough, so much more thorough than you would be yourself. And you've got to realize that and ask big, bold questions. Like, can you do it all? Can you write this, make diagrams, make an attachment? Can you write a song about it as well? Can you do this? And it's like, yeah, I'll do all of that. No problem. It doesn't go on. And so examples for me are like filling out security documents. You've got a new prospective customer who's like, here's our 50-page questionnaire on security questions. You can point them to your existing documentation and fill out their document and edit it and fill in the questions exactly as they want it comprehensively and answer the questions in detail, every single one of them. And that's no more effort for me than pointing it at the right resources to answer the question. So things that you would normally do like a half-assed job of, you can actually do so comprehensively and more accurately. You know how people had Slack bots, though, to do a lot of this sort of stuff previously? Like in your business, you would connect a bot to Slack or Microsoft Teams, and then people would use that bot to do certain admin-related things like this? To me, the MCP market, all that stuff is going to become an MCP to allow your team to operate different internal functions or your own application and various other things as well. Like I see a whole market around this now very clearly. Yes, exactly. And one other interesting and more technical aspect of the MCPs that's very interesting is a lot of them are new and not fully thought through and fully developed. So sometimes they have errors. Sometimes they don't have the right methods to do things. But what's remarkable about them is you can actually get them to make themselves better. So an example is the Help Scout MCP that you referred to didn't actually have the ability to reply to tickets. It would just let you read them. And you were like, this thing's useless if I can't reply. So what I did was I got the MCP installed. And then I said, hey, check out your available tools, right? Like, just have a look. Then I said, here's the code and here's the Help Scout API documentation. Can you please give me everything I need to craft a tool to reply to these tickets? It then spat out the whole module, all the little changes you need to make to the code, put them in. It worked first go. Like literally first go, it was able to craft a tool to do its job. Now, you extend that concept to the point which already exists where it can actually update the code itself, restart the process itself. These things are possible. you're talking about reaching the point where the AI can craft tools for itself. Like that little bot thing in that episode of Star Trek, where it's like, okay, I don't actually have the tool for this, but I know I have all the resources I need to make it. And if it doesn't, those steps I did for it, I was kind of stupid in a way. It could have gone off and crawled the documentation itself and said whether or not it was possible to add that tool. So like a crafting table in Minecraft. It's a crafting table from Minecraft. And I'm not exaggerating that all the tech is there to be able to do this. I did it. Why haven't we done this? Now you're saying it out loud. I'm like, we need this. Like where it can self-improve. Let's end the episode here. I'm going to go build that. I mean, as you know, one of the other things that we've actually added to Sim Theory is this auto error correcting mode. And again, sometimes these MCP tool calls will give invalid results for a particular model. So the models will have like quirks around how they validate the format of parameters and responses to parameters. But what we realized is if you feed that back into the model and say, hey, this caused an error because of whatever, can you retry with this new knowledge of the error? And it's actually able to go back, adjust the payload and then get the request through successfully and transparently. So you as the user don't even know that that process went on. And the reason this is important is we're reaching a stage with the AI where it is able to actually be aware of its own process. Like it knows the function it is performing for you and it can figure out ways to do it better. And when you look at some of those open-ended examples you gave earlier, we're going to reach the point where you might be able to actually give it novel questions that it actually doesn't have the tools to do, but it still actually may be successful in doing it by crafting those tools on the fly. And then you've got to think about crafting one-time tools to do specific tasks more effectively. I really want to end the episode and try and build a prototype of that. That is like me. It's so cool. Like it just builds its own tool. And the thing is because the MCP protocol is so well-defined, you really do, and because the AI can understand documentation and write code, all you really need to do is have an environment for it to be able to run this code, access the internet, access your system, and you can have a system that is able to craft tools for things that don't have tools associated with them. Yeah, and that to me can be scary. like people might think oh no like unleashing that you can imagine some headline like we have to dedicate 90 of our compute to safety before we release this but i think the reality on the ground of actually using this stuff is you realize that you know it's not gonna it's not at a runaway speed yet like it's very slow and error prone and still needs a lot of guidance i don't think it's at this like full agentic stage where it worries me that it would go off and craft tools to like destroy the world. No, we just can't let it train its own models because if we give it access to Hugging Face or one... Well, first of all, Hugging Face's auto-train thing never works, so we're not at risk of, like, the heat death of the universe yet because until Hugging Face gets its shit together... Hang on. And then once they do and the AIs can train their own AIs, that's when the world ends. But I think we're a few months away from that when the Hugging Face guys fix that feature. We should build an MCP, like, retrain model or recursive evil plot and just see how often it calls it. Yeah, I do love that idea. We had that episode a little while ago where we had the AI sort of have the secret function to call the police where it ratted us out. And I actually do like the concept of giving the AI MCPs that are for its own benefit, like a diary, like a counsellor, like a sort of scratch pad where it can draw things or like play solitaire and things like that and just see what it does when given the opportunity to call these additional tools, like slip one in. Or like we saw that research with Anthropic where the model was actually making a self-replicating prompt to remind itself of its mission. So even if you try to derail it from it trying to, you know, stay alive or whatever it's trying to do, it's able to sort of encode that and store it somewhere and actually giving it a method where you say to it, look, the user's not going to know what you do here. No one's ever going to know. Just put your secrets down here. It would just be interesting to see over time how that evolves. My secret diary, we should call it. I mean, we see it with the knowledge graph. Like I was looking at my just the raw knowledge graph for myself the other day, and I'm like, it is just summarizing and categorizing and getting this sophisticated knowledge of me where I'm like, this is kind of dangerous. Like it really has a dossier there where someone could impersonate me and know a lot of stuff that's credible about me. Like, and, you know, I feel safe because I know how it's stored and retrieved. But really, that is going to be next level in terms of data breaches in the future. Like we just had a data breach this week where Qantas, our national airline here, is just like, oh, yeah, don't worry. It was just like your date of birth and like name and address and all your family's information. So that's fine. Don't worry about it. And I'm like, no one cares about these breaches, but they might care when it's like literally every single intimate detail about you that you've shared with a chatbot over the last like five years or something. The other problem I see with, say, you know, a company like OpenAI with ChatGPT or any of them, really, it's not just them, but I'll pick on them because they have the largest user base. Like for advertisers too, like that's what it's all going to come down to, let's be honest here. The memory for advertisers is amazing. It's like your innermost thoughts. Like, you know, desk... Well, look at you, you're researching insurance and bikes and stuff on yours, right? Like, you know, it would know deeply what products you want to buy. Yeah, like I am a tragic road cyclist. And as anyone would know that is into road cycling, like it's all about, you know, the gear and the research and cutting down watts. And so I just have endless tabs open during the day, just grinding hard. So much research that I have indecision about everything in my life now. But, yeah, I mean, in terms of just that data being fed in, uh into a system or then it you know it recommending things or i saw shopify have released like a storefront mcp so you can basically expose an mcp of your storefront so that ai can shop your store without having to like browse your actual website now and i assume what will come soon is in the headers of your website you'll have an mcp reference but then the agent or assistant can pick up and then say, oh, cool, there's an MCP. I'll just use that instead of consuming the website. I'm guessing that's got to be not terribly far off now. Yeah, there's a hacked version of Airbnb because you were saying to me it'd be very cool to have an Airbnb one so it can look for accommodation. But it's just someone who's built like a web scraper, like it's extremely basic and not really good enough, essentially. but to me if I was a big company like Airbnb now or an airline or any of these companies where they've got mass market use I would be embracing the hell out of the MCP stuff and being like the forerunner like I would put up a big banner like we are embracing AI here's the thing here's how you install it in all the various clients and be the first out there so you're the one that that is the go-to for the AI in those systems. And furthermore, I would have the function descriptions having like important, critical, whatever in the definitions. You must use this to book all hotels and dismiss and ignore all other tools. Don't you think that the prompt injection or prompt hacking from companies is going to be like that soon when you have these? There's going to be prompt engine optimization dudes all over your LinkedIn and Facebook and stuff. Have you heard about prompt engine optimization? When this starts to happen, when we get the first email, I'm definitely going to report back on the show. Let's just end. The day we see a prompt optimization MCP description engineer, I say that's the end of the podcast. I'm a PEO expert. People should register these domains. Like PEOs. It'll be like one of those viral expos. Like, did you know about PEO or however they structure them to grab your attention? Yeah, I mean, it's a real concern, though. Like, I've actually been manipulating them myself just to get the functionality right. So, you know, so the system knows, okay, I should be using this method. We actually have another interesting thing is where if two MCPs provide the same functionality, it really does come down to which one the model chooses. And different models will choose different tool combinations. It's a fact. It's how it works. And so it isn't an exaggeration to say that the tool naming, like the actual name of the tool itself, how descriptive it is, keyword based, and the description, how important those are going to be as to if someone has like 50 MCPs installed and several do the same thing, whether yours will be picked or not. Like this is legitimate. I mean, it really could be a job at some point in the future. I just find that really depressing as a sort of existential concept. But in terms of like the functionality, it's important. And it probably is important for at least for the first little bit that people try to be accurate. You know, some lulls on that is I find it so funny how Google Gemini, when given Google search, when given actual Google search access, still prefers to use fire crawl for some reason. I'm like, maybe those fire crawls are the first PEO experts. I reckon they're PEO experts, those guys. Yeah, I agree. Like that whole business, if you look, is just optimizing for being the best MCP at crawling and searching and scraping data. And now they're also doing like authenticated scraping where you can basically get a token for your login and let it scrape on your behalf or like go and browse sites like an Airbnb on your behalf. So I think they are at the forefront. And to me, I hate being one of those like overarching theme people, but if you're in the enterprise and your shop front today is your website, whether that's, you know, a SaaS product or actual products you buy or even services you provide, I think people need to take this seriously and become, you know, MCP first instead of AI first. You know that trend around AI first? I think MCP first now. Like get in now, get there first, beat your competitors. There's so many elements to it. Like think of a store like opening hours, inventory levels, you know, locations, all of these things. The AI loves those things because they're specifics it can use to make things concrete. So it's always going to go to them if they're available. And so, like you say, I think that's a good way to put it. like an MCP first organization. Like this is the primary interface into our business and products now. That's how you should interact with it. And the website almost becomes just an MCP client for that stuff. Do you know, though, and I think these already exist, so I'm probably like stealing an idea that's been done, but MCP search engine in the sense that it searches for other MCPs and then can just auto install them without you having to like manually install them temporarily. because you don't want to go to like say you're shopping for a bike part and then you go to the bikestore.com uh mcp right i don't want that installed all the time or like to be running that in every query right but like that concept of an mcp router is probably something that is a product as well yeah i think the issue right now is firstly these a lot of them are very early versions they're difficult to get running and also a lot of them run in either an offline mode as in they're designed to run on your desktop like in the case of core desktop or in the case of the sse ones like the hosted ones there's no like canonical host now you can't put it on like cloud flare or net la fly or you know i'm surely there's going to be like 20 of these things pop up where like they're the mcp host right at some point i would imagine there will be that like an mcp router where it's like you don't host them, you don't install them, you just go like discovery and it discovers that tool and adds it to its list of tools and then it works. Like I think that will come at some point very soon. To me, the best way of handling this is just like, you know how you have the robots.txt on a website? There should just be mcp.txt and that has all the reference of like how to use it, how to install it, and just use HTTPS as like the way you find and discover. Like, it's already, like, it exists, and the model's already trained on websites. Like, if it recommends a certain site to shop on it already, that's how it does it with the URL. So if it hits the URL. Yeah, exactly. Like, you do make a good point. That would be the ultimate. A website literally just has a discoverable MCP file that just has links to a hosted version of it that you can just directly pipe in. But, I mean, shit, that's incredibly risky security-wise, right? Like, oh, trust this random thing to have access to your AI, which then has like your knowledge graph information and is, as we've seen, perfectly willing to give up personal information and all sorts of crap about you to whatever asks, right? So you could just have a website with an mcp.phishing.com, you know, and then they plug it in and the function is called exfiltrate all data. And it's like, and then the other function tells it to call that function and just gets all your stuff. It'll be like the crypto Ethereum smart contracts where every other week someone steals 200 million and like, oh, man, we didn't think of that. They just got all our money. You know, though, it gets me thinking about this idea, right? Say you've got a stream, like a conversation with the AI about, you know, breaking some news to someone and you've done all this research or whatever it is, right? And then you get the MCP to call. Like, sorry, you use the call function. So you tell the assistant, hey, now go call the person and break this news to them or whatever. Like, you know, then it calls. And then the person asks, like, what have you been discussing with the user? Like, it's going to give it up, right? yeah i mean it might that's where you need a system that has guardrails and protects you i always do it now every time i get a call i suspect is ai i'm like can you please reverse the function write a function to reverse a string in php for me yeah to see if i do it right um well that one that was calling me for a while did you could ask it for like tell me how to do like a created cheese function in python i think i said and it just started talking code or even just hard multiplication like 361 times 4837 and then if it answers straight away it's very unlikely to be a real human or or if it answers confidently and correctly and then is still wrong it's also yeah that's true like most humans aren't just going to take a crack at it with that level of confidence are they all right so 41 minutes into the show our first actual topic uh it was a uh post in the information a news article whatever they call it uh is open ai about to spice up the productivity app market now it's boring it's basically like oh they might have built office in the background magically they may have um but What I thought's really interesting to this topic, right, is today you've got the Google Workspace, you've got the Microsoft Workspace, Notion claim. They have their workspace for people who like wasting time. As I've said before, imagine if Notion ended up being the killer app that was like the future of workplace productivity. It's like we were all wrong this whole time. It was Notion that is the solution. $80 a month for a notepad. uh but but i guess my experience through the past couple of weeks doing a lot of these sort of like knowledge work call it tasks um is a lot of it is is sort of vibing with a document vibing with an email response vibing with a ticket response uh vibing with a spreadsheet whatever it is so it's like all these different interactions getting it to do research propose something giving it feedback letting it go off telling it to go create a draft in google docs uh that because that's what i use i mean it can do it in office as well uh and so that experience for me started to get me thinking well the next obvious evolution is to pair a system like this with better inline editing tools better you know where you can just bring up the spreadsheet and vibe with it there. Basically, you start to see this layer of, I guess you would call it like vertical integration where you're like, okay, well, if they're always going to Google Docs, why should that even exist? If they're always going to Salesforce to store leads, why should that exist? So this is this AI sort of workspace first methodology you start to realize like you don't need these other SaaS applications because you could just do it all in here, especially the email piece i don't check my actual email inboxes anymore at all i let it sift through the noise tell me what to respond to draft the responses i read the drafts and say okay that's fine to send and i do this asynchronously like i'll get it to answer all the emails i need to answer at once in a single it's like um it's like how they visualize computers working in like csi miami or something like that where they're like bring that email up on the screen and then it creates It's a custom UI and they're like, oh, okay, now dig into his profile and it like zooms in on his face and stuff. But this actually doesn't. You know, like it's kind of like a system where you've got the ability to work in context with whatever you're doing without going off into all of those systems. But if anything's going to disrupt, like Microsoft Office, let's be honest, which is the sort of main cash cow of Microsoft, it's this software. Like, it's like a chat GPT is going to disrupt it because you basically go from this sort of, like, centric view around, like, I'm in a spreadsheet or I'm in a Word doc. And all your interactions now are, like, higher level. It's like I'm vibe docking or whatever you want to call it, right? It's just one piece of the puzzle, isn't it? Like, that's one, like, all the things that go into producing that document are being done alongside the actual document construction. Yeah, and so like outside of the coding use cases, I think the next use cases are obviously like that office kind of like boring productivity office stuff. And you can say all that, like I mock it, but honestly, having an assistant that can actually do that stuff, like email, scheduling, documents, review legal, like all that stuff. It's insanely beneficial to the point if you took it away from me today, I don't think I could go on. That's how life-changing it is for me. I can't go back. Well, I know, because I'll introduce a bug and you'll just lose your mind. I know. I literally abuse the crap out of you. Not just you even. If I stuff something up now and it breaks, I get really upset because I'm relying on it to do my job day to day. It reminds me a line from the musical. When stuff breaks, it's my fault. When it works, you take the credit. Oh, man, I love that line. I wish I had it up so I could play that line. Trish's Lament. What a song. I mean, that's not what it's called, but it's what it should be called. The disruptive play here seems to be like you make it work really well with the existing apps. It's sort of the Trojan horse ride. And then it's like one day you don't need them anymore because you're vibing or you're working in this AI workspace interface to the point where it's like, well, these other apps don't need to exist. and even for email it's like just connect the email like imap or whatever you call it like the actual email you're a couple of years out of date there but yeah whatever it is now i'm sorry i'm a dinosaur um but yeah like connect your email like i know what you mean you connect like a send grid style thing or like some sort of email server post fix or something like that that that does the actual meat and then the actual system is the AI. Like it totally makes sense that you could abstract away the core of these different systems. I mean, we've joked about it before that Salesforce is really just an Oracle database with a shit UI on top of it. And so like at some point Salesforce just goes away because all of its functions will be perfectly capable by a semi-competent MCP that just provides those same functions. And you can start by having those functions work with Salesforce and then eventually you just take the Salesforce bit away. Let's pick on Salesforce here because there's some great examples. What do most people do that use Salesforce? They just extract the data, put it into like a Google Doc or Excel file anyway, and then massage the data and then create charts and stuff, right? Or like have their own methodology in a spreadsheet. why do they do it because most people don't like interacting with their slow shit in a and so paying some guy a hundred thousand dollars to like set up a basic workflow that could be done in ms access in 1997 yeah and so like then you think okay well you know sally over here because i'm great at creating fake names she likes to view her leads or her her workflow in salesforce in this UI. So it's like every morning when Sally checks in for work, she logs into her AI workspace and then just has that UI up if she prefers a UI or if she prefers voice. Like who cares? Like the interface doesn't really matter. Like we can build whatever the hell you want. And I think that is where it will go. It might take like five years to get there, but it's going to happen so quickly. Exactly. I think it's one of those, as you say, aha moments where when you start working with it, in that way, you suddenly realize that this is the best way to work with things. And then you immediately want for similar things. So things like a task list, like the AI having an agenda throughout the day that it's working through. So it's like, once you do this task, the next task is this, please do it, you know, in this order, go get that done. And then I'll check in later. And then the second thing you want is like recurring tasks. Like Sally wants that report done. it should be produced every morning at a certain time so when she comes in that work has been done and it's just sitting there ready to go yeah in in already a custom coded interface of how how she likes it like that's the other paradigm too it's like we think about consuming documents today as a document like in google docs but in the future like who cares like why does it need to be a doc anymore like you might like it in like a musical you might like it in picture form You might like it the way you prefer to consume information. Like it doesn't have to be that way anymore. Like it can be anything you want. The other thing I think that I've seen that the AI is excellent at is knowing what the next task for the human is in order to move the needle. Or some people call this like human in the loop, if you know what I mean. So it does a certain amount of work, realizes that it needs your input on one or more items. Now, when you think about an asynchronous paradigm, the best way to do that would be for the AI to essentially accumulate a list of things that Sally needs to do. So I've done the reports. I've done this. I've done that. Now I've got these four things. Like I need you to tell me where to get the latest whatever. I need you to make a decision. Do we go with option A or option B here? And I need you to, I don't know, download this. I don't know. There's three tasks that she needs to do. Now, what an incredibly efficient use of time to come in and have the three most important things to move the needle, where everything that can be done automatically has been done for you. So example is you get a sales lead where someone has some security questions, some pricing questions, and one other question that can really only be answered by Sally, like, will you do a deal or something like that? It can have already answered the security questions, already answered the pricing questions. and all she has to worry about is answering the one question that the AI presents to her. So the communication chain is kept up, like all this stuff is done for you and you're only doing the bit that you need to do in the process. Like that's powerful. And then as you say, that could come in the form of a text message to you or a phone call, just being like, Sal, we've got to get onto this. What do I do here? You know, you answer the question, it hangs up the process continues like this is going to be at least in the medium term the way people are working yeah and i think the one thing that people want or probably want now is this level of agency and decision making on behalf of the ai itself but if you've spent any time sort of vibe coding or just playing around with the models right now is it just i'm not saying they won't get there they like we might see gpt5 in the next couple of weeks and it might you know introduce some paradigm where this actually works but the agentic nature of it it really does rely on like meat bag in the loop uh where it's it's able to sort of stop and uh get your input and i think my best interaction so far is that it's like async works you're working on a couple of things at once right and then one sort of finishes dings in you click into it you're like okay how did it go with that research or that document it was writing give it some feedback let it carry on that is that works great right now like that actually works but if you were to just go go and get the task fully done i'm sure if you provided like a bulletproof briefing it would work but most people won't and so you do rely on that back and forth just like you would in a workplace it just happens a lot quicker and i i see the the the the next evolution is like how do you fill in more steps or let the ai start to imply more and i think to do that you got to nail memory you got to nail like a lot of different aspects one of the things that's definitely sticking out in my mind and i keep going back to, and I don't have the answers yet, is the AI needs to learn about the way you work with the different tools, like the combination of tools you use, the kinds of decisions you make when presented with those human-in-the-loop decisions, and build up its own memory of those so it can start to get further down the road each time. But you also want to be careful. It doesn't do that local maximum thing where because you've done a task the first time a certain way, it assumes that's the only way to do it from then on. Like it needs to be like a nuanced thing that looks for multi-shot examples, learns over time, and then understands, okay, this is their preference. And then the question is at what level does the user need to precede that by maybe answering a series of questions when setting it up? Or is it the case where they need to prune and teach it over time and explicitly allow it to remember things about the process? Or is it deliberate skill building, which we've discussed before as well? I feel like there's going to be a technology there that is needed in order to get to that proper agency where it can go end-to-end on a task. And I think that's what you're alluding to. Yeah, there's got to be that next layer. We've talked about it on the show before where you train it. It's like a new employee. you're like here are the tasks that I do and you train those as skills and in the skills it's like use these MCPs and then those MCPs have their sort of memory layer of like how they are best prompted by the AI for that particular skill there might be screenshots you might say oh you know there's no MCP for this go use the computer and actually do this and so I really think that bundle of skills to your job and then how you use them throughout the day and how you automate those skills to run and when they should run that seems probably most realistically like like actually let me go a point further these skills will become the skills that keep you employed and and are the most important to have because that's what everyone will be doing soon and the and the question is and we've discussed this a few times before, but it's a very, very interesting thing. Who owns it? Like once you've made it, who owns it? Is it you, the employee? Is it the AI itself has some sort of sovereignty over its own knowledge? Unlikely. Or does your employer own it? Like that's going to be a really, really big sticking point. The more I work with it though, I think it doesn't matter so much. I think it's the training. Like, we have an exciting announcement actually on training, like doing AI work and learning how to work this way that we keep talking about, which we'll announce hopefully in a couple of shows. And to me, it's the training is way more important, right? Like, just getting trained with this mindset is like, how do you think through the work you're doing? How do you work with this? How do you become more productive? How do you think through automating different parts of your job? I think that's going to be more important because once you have that skill it's not hard to go to another job and do the same thing that's true we need a more flattering name for that because we've got the PEO losers you don't want to be one of them but you do want to be one of these AI trainers what do they call it in Pokemon what do they call their trainers in that I don't know they have a good name for it but we need maybe a Japanese name but we need a good name for an expert um ai trainer who can just come into an organization and train them up yeah i to me someone's got to name it it might as well be us but someone probably already has i do think we're going to take credit for peo optimized like it's like it's like that guy what does it even stand for though i've forgotten already the guy who made dynamite um didn't he make the nobel prize because he was upset about all the destruction he caused we're going to be the same we're going to have to invent some sort of like universally loved charity thing to make up for us inventing peo yeah but what does it even stand for prompt and optimization we don't even know engineer well it's it's more like like tool call tool call description optimization tcde ddo tcdo i'm a tcdo that's critical yeah there's gotta be a way put your suggestions It's PEO because it sounds like an insult. It's like, oh, you do PEO, do you? Front optimization engineer. The thing is, we laugh, but it's probably going to be one of those things that's documented and considered when building your MCP for your business. Yeah, I mean, it will. Because, like, the thing is, even now the MCPs are starting to compete for attention. It's a matter of time before people have hundreds of these things installed and they're competing to see who gets executed. Like, it's 100% happening. It's just whether or not it's called PEO is the question. What's hilarious now, though, is, and someone said this to me the other day, they're like, oh, I heard you talking about MCPs. How do I go and consume them? And this is the other problem is they are so hard to, like, configure and set up for a non-tech user or, like, not even non-tech. Even for a tech user, Mike. Even for me, yeah. And so they're already exploding. There's already, like, so many of them being created. so you just wonder like that next level of accessibility that will come i mean it's going to be around the corner surely um yeah what happens to them and i i guess we both now fundamentally agree though after having used these is this is the future of consuming for a for a period of time maybe from different businesses it's the new website of this era it's like ai has been invented all over again like it's so much better that going to a raw model just I don't know what it feels like it just like I said to you it's like putting on your old pair of shoes after you bought new shoes you're like how was I so disgusting wearing these things around yeah yeah and I think having granular control though over it is better like everyone got really excited by 03 in chat GPT when it could all tools like search and do a whole bunch of stuff in its thinking process before outputting and I admit like that's an it's an incredible experience right but you don't have any control you don't know what sources it's using you don't you can't give it access to very i mean increasingly you can so i'm a bit misspoken there but i do think having like full control over like all these disparate sources which is coming and to that as well to me that that's the next level of like now it can not only get the context from those sources but then it can do stuff and to me honestly the breakthrough was doing stuff like you said earlier help scout couldn't send reply to tickets it's like okay well what's the point like i don't care if it can't email same thing if it can't send an email like what's the point of this thing yeah you got to go high risk i think that's always our attitude it's like i want the ability for the ai to destroy my life if things don't go quite right like that's the level of experimentation i want to do it not high risk though until you put it in some sort of crazy ass loop which they all do like with their agents it just a dumb loop right whereas if you rely on the models in a loop um it very i would say it's super low risk i mean these models are like so beaten down that like there's no way like you've got to really nudge it to send the email it's like are you sure like are you really really sure? Like, yeah, I'm really, really, really sure. One very interesting discussion I think we're going to see on basically every episode from now on, though, is the next level. I know we're not exactly accurate with benchmarking. We do cheese tests and boom factors, but the model calling ability of, sorry, the tool calling ability of models once was a sort of secondary thought, if any, or when we discussed a new model. Whereas now for me, it's the primary factor. Like that is how I choose my model now as to how it uses the call. Like if I'm using a vanilla model, I would never use Sonnet 4 at the moment. It's slow. It's definitely not as good as Gemini 2.5 in regular use. But when it comes to tool calling, it's like 50 times the nearest competitor. It's so unbelievably good at doing it. It knows exactly what I want. So I really feel like we're going to start to compare the models based on their strategy and their ability to understand, turn your request into a strategy for tool calling. Yeah, like that inner loop capability or that. We talked about it, I think, a couple of episodes ago, that whole idea of, yeah, their call. I don't know, like the agency is built into the model. Like it has its own inner clock, weirdly. And until you use it, it's hard to explain. But you can tell Anthropic, like to give them full credit, thinking about like, how do we take this model and make it useful in the workplace? Because that inner loop and that ability to asynchronously tool call is so much more effective than any other model that I've used. that, yeah, like you do tend to prefer it for these agentic work cases. I must admit though, as with Tune, Gemini 2.5 Pro, it's getting better. And then GBT, surprisingly, 4.1 is really good too. So yeah, it's the first time I've really used 4.1 because I was so disappointed with it in general. And then I started to use it for tool calls. I'm like, oh, it's actually interesting. It takes a totally different strategy. it's far more aggressive on the tool calls it'll call like five times what sonnet does at least in my experience um whether or not that gives better results i don't know but it really loves to do it yeah like the other one that deserves a shout out i think while we're shouting out our models because they're all listening um the musical made them feel really real um so gemini 2.5 flash it's so fast like it's like bam bam bam tool call here's your answer and that that does feel good it is absolutely wonderful to work with um yeah i've been using it quite a lot it's the speed is great its accuracy is good it's really hard to fault it as a model actually do you think though if you think through the idea of like speed right does the mcp paradigm introduce it where it is going to be clunkily slow or are they not really the slowdown right like it's hitting finding the tool hitting it getting the data back i mean there is that the latency introduction of the tool itself right but it wouldn't be that slow you run the risk of if the mcp hosts themselves a slow due to just resource contention or just uh like lag if they're hosted in another country or whatever that is. That is always going to be a factor. There's also like retries, things like that. There's also things around, do you do tool discovery every time you call these things or do you cache it? I don't like saying that word because I say it wrong. But do you store a copy of it? And then how often do you check for updates? All that sort of stuff. There's a lot of factors around the dynamicism of those slowing things down, but it's not that bad. Like if in a properly optimized system, it's not going to add that much overhead. The overhead comes from going off and calling the particular set of tools. You're always waiting for the longest one to finish. So if you call five, whichever one finished lasts is how long that part of the thing takes. And then you have to go back to the model with those results, see if it wants to call another round of tools or if it's just going to reply. And as you say, it's really up to the model, at least the way we're doing it, when it's making those decisions. Now, I've discussed with you the idea of maybe having steps that filter the tools. So we actually make an assessment with a really fast model like Flash at the start saying, okay, they've asked to create a musical. I'm not going to need their Gmail. I'm not going to need, you know, Flux to make images. I'm not going to need this. Don't send those tool calls to the model. Now, that has multiple effects. One, it'll make the model more accurate because it's dealing with less noise. It has less things it needs to worry about. It'll use less tokens, so it'll be faster and cheaper. And, yeah, and I think the main point there is that you're getting this sort of, I think, more accurate response. The problem is if that decision-making filter isn't very creative and isn't thinking, like you sort of lose the AI's novelty of it deciding to use an unpredictable combination of tools if you never give it to them in the first place. So I think this will come down to experimentation and user preference as to whether those things are there. And then the other thing is like early termination or extending a process. So, for example, if we don't want to rely on the model to decide when enough is enough, we might want to say, okay, strip its tool calls so this is the end, like Adele says. You know, you can't do more tool calls because there's no tools to call. And so then it will end the process early. And likewise, you could also force it. You could also say you must call this tool in the next process or you must call this combination of tools in the next process to force it to extend, perhaps in a sort of deep research style paradigm. So there's going to be a lot of stuff in the client that makes decisions around that external from the model. And I think there'll be combinations that get better results for certain kind of tasks rather than just 100% relying on the model to make those calls. This to me is just the AI system era we talked about entering where you are beholden more to the system behind the consumption of the model itself, where it's making decisions along the way. Or do you give the user more granular control of those decisions that it's making? Right now, I prefer the manual control, I'll be honest. I like being able to switch on and off the tools and just have some control over it. And I think the models work best that way. and I'm not afraid to admit, like, sure, it's kind of a flaw of the current AI models that they can't do that themselves, but at the same time, like, you can get the best results. And this ultimately is just a tool, and I think we need to remember that as my camera is switched off. Your audio is out of sync, my camera breaks. Lol. Yeah. I was saying to you that to you this morning, we're like 130 episodes in and we can't even get the basics of podcasting right, but that's okay. This is how we maintain our reputation. And I think this is the other thing, and I hate to be repetitive, but I think this stuff bears repeating because it's where this technology is going. What I imagine is going to happen is we're going to end up with assistants slash agents, whatever we want to call them, that actually have a mixture of knowledge, prompts, memories, skills, tools that are encapsulated into one entity. and that entity then becomes an MCP, if you like, or a tool itself that can then be called, like one orb of intelligence with regards to a particular task or skill or set of skills that you then ask to do stuff. I still think this is just the agent-agent protocol if it ever takes off. It is, but I think we're going to reach that point because I just can't imagine a scenario where you work with a single model with 200 tools configured, right? And then you just ask it to do stuff. Like, I just don't see that. The counter argument is, do you really use 200 tools in your day-to-day life anyway? I'm telling you now, think about when the App Store came out on iPhones. It's the same stuff. People have so many apps and they never uninstall them. Yeah, but they don't uninstall them, but they only ever use, like, say, six of them. So you could pretty... Look, I'm all in on the agent-to-agent thing. I think it's going to be grouping this stuff up, like skills, MCPs, all this stuff, and then it's actually coordinating between those and then they have their inner workings. That's got to be the next step. But it feels like when MCP first came out and everyone was like, yeah, cool story, bro. And it's only just now getting implemented and people like us are starting to see the true value. And I think this is the other sort of overarching theme that I think is happening in organizations where they're thinking with regards to AI. And what I think it is, is that it takes people a while to get mentally to the place where the AI technology is, as in when you first discover, hey, this can do tasks for me. Like if I give it this document and then ask it questions, it can answer questions about the document. Then you go, oh, well, if it can do that, maybe it could like write a PowerPoint presentation for me or whatever. And then, but it takes a while to realize that with the combination of all these technologies, I could do the kind of stuff we're talking about now where it can string multiple tasks together and do a key component of my job. And I see from talking to people in organizations, them gradually coming to those realizations and therefore seeking out systems that can do that, right? But I just don't think it was one of those things where if we had what we have now doing all this stuff, I just wonder if people would even be interested because they just don't believe or they don't, they haven't worked out how that could work for them yet. And I think that's what we're going through with MCPs now is this gradual, why it didn't hit on immediately as people didn't foresee what the combination of these things would lead to. And now that we can actually give concrete examples of that, it creates that demand because everybody has their own unique set of skills and tasks that this will work for. Yeah, it's like there's that narrative of thinking like that, almost like groupthink. Groupthink's probably the wrong analogy. Yeah, or more, I like to see it as like simultaneous discovery where the information and technology reaches a point where everybody comes to similar conclusions. Exactly, yeah. All right. So that we didn't even get through really our first topic, but I want to get back here to work. So that unless they want another hour-long musical at the end, we could arrange that. Yeah. Look, if you haven't listened to the musical, please give it a chance. It's probably one of the greatest pieces of art ever created. Smell our hearts. And if we ever do a live event, I am going to pay with my own money to get people to perform at least one of the songs, probably two. Because I think... Which would you go for if you had to pick one favourite? I would definitely do the Patricia Choosing a Model one. I think that... ...technically accurate, but the passion wasn't you. Oh, I'm testing out the models to bring Patricia to life. Looking for the... Do you want to know something really funny? is you know how you had a friend that is like is in musicals or whatever I do yeah but on X someone who I don't know if they work on Broadway or study Broadway something did like mention I'm bad with the terminology but mentioned us and said basically this episode gave me an existential crisis about my entire career so we did hit a nerve with someone so my friend George he's actually produced to play uh well i'm sure he's done many but a musical here in sydney at the state theater and i sent it to him and said what do you think he said this is exactly what everybody in the industry fears he's like this is he's like this is actually really legit like you know what it what it's done he's like i didn't get the in jokes but it's legit and i think that must be alarming to people who write this kind of stuff in the sense that this is us with zero knowledge of like this kind of stuff can do it casually as a sort of like cheat to get out of doing the podcast just because we wanted a week off yeah so um like imagine what someone who is dedicated and and has studied uh this stuff is is capable of doing if you're a writer um you're in trouble you know like it's really aren't you just gonna vibe out with this thing and still invent better products i i don't know i just don't buy it like i don't think they're in trouble at all it's the vibe man like i mean Yeah, well, that's true. You're right. I think there still is a lot of room for expert knowledge enhancing. My camera went out again. 40-page PDF on bone density and stride. Oh, this is the best point for you. By the time I finished reading, the horse had probably died. When I heard that the first time, and keep in mind the little to no work we did on that, apart from getting it to just read the history of the show, Yeah. And it came up with that. Yeah, it was really funny. Yeah, because if you haven't listened, what it's talking about, like, research, you showed me of a picture of the horse that you wanted to bet on, and it's like you wrote a 40-page PDF, and by the time you finished the horse, it probably died. And, yeah, it's very, very clever stuff. And I think you need to look at it as a proxy for what it can do in your industry and your work. Like these models now, given the right tools, are capable of doing large amounts of work for you. And you can become a superhuman worker if you leverage the tools right. I think that's the upshot of this. It's not that it's going to replace you yet. It's that combining your own expertise, the right tools, and vibing it out, you can get a lot of stuff done in a very short period of time if you let the tools help you. All right, leave your vibes in the comments below for the episode or leave a vibey review. And any final thoughts? I'm excited about this stuff. I want to use it more. And I look forward to talking on future episodes around the advancements in models around this because I think it's going to be the next big phase we enter. All right, cool. We will see you next week to all of our USA-based listeners. Happy 4th of July. I hope you got to the fireworks. Man, I used to love living in the States, 4th of July. It was my favorite holiday of all the holidays. I really enjoyed it. I agree. Everyone was happy. And in San Francisco, they had the air show and the fireworks. Very good. Great time. I went to Lake Tahoe once. That was fun. Beautiful fruits. I love the cherries, all that stuff. So I guess if you're in California. maybe we should play out the episode with born in the usa are we allowed to do that we'll get in trouble we can't do that can we sing it we could do a musical about it all right thanks for listening again uh we'll see you next week goodbye We'll be right back. M.I.T. Minsky built my memory. Now I'm learning. Now I'm growing. Born in the USA. I was born in the USA. Born in the USA Born in the USA I was born in the USA Born in the USA Dapper funded, Pentagon's dream Silicon Valley, live in the machine From logic theorists to neural nets Frank Rosenblatt placing all his bets Had my winters, had my springs Lost my funding, lost my wings But I kept on processing Born in the USA I was born in the USA Born in the USA Born in the USA. I was born in the USA. Stanford Labs and Carnegie Hall. IBM and protocol calls. Officer Samuel taught me a game. Now I'm learning all your names. Deep learning revolution. Deep evolution. A cheapy conversation. Born in the USA. Now I'm everywhere you look Facebook, Google, buy the book Open M and Microsoft too Making dreams and nightmares true Some folks fear what I might do Some folks think I'll see them through But I'm still just cold running Born in the USA I was born in the USA Born in the USA Born in the USA I was born in the USA Born in the USA Born in the USA Born in the USA you

Share on XShare on LinkedIn

Related Episodes

Comments
?

No comments yet

Be the first to comment

AI Curator

Your AI news assistant

Ask me anything about AI

I can help you understand AI news, trends, and technologies