

Long Horizon Agents, State of MCPs, Meta's AI Glasses & Geoffrey Hinton is a LOVE RAT - EP99.17
This Day in AI
What You'll Learn
- ✓ Anthropic admitted to accidentally degrading the Claude model quality due to routing issues, suggesting there is introspection and evaluation happening on user inputs to decide which model version to use
- ✓ A paper found that language models can execute long-running tasks successfully, but make compounding mistakes that degrade performance over time, highlighting the need to improve execution rather than just reasoning
- ✓ Autonomous AI agents struggle to maintain performance on long tasks without human supervision, suggesting the need for more complex architectures with supervisory agents to guide the main model
- ✓ High-latency models like GPT-5 may be better suited for autonomous tasks, as they can take a more careful, multi-step approach before reporting back
Episode Chapters
Anthropic's Model Degradation
Discussion of Anthropic's admission of accidentally degrading the Claude model quality due to routing issues
The Importance of Execution
Overview of a paper arguing that the real bottleneck in language models is execution, not reasoning
Challenges of Autonomous Agents
Discussion of the difficulties in building AI agents that can maintain performance on long-running tasks without human supervision
Potential Solutions
Exploration of ideas around using more complex architectures with supervisory agents to guide the main model
AI Summary
This episode discusses the recent revelations about Anthropic's Claude model, including their admission of accidentally degrading model quality due to routing issues. It also covers a paper on the importance of long-horizon execution in language models, arguing that the real bottleneck is not reasoning but execution. The hosts discuss the challenges of building autonomous AI agents that can maintain performance over long tasks without human supervision, and the potential need for more complex architectures with supervisory agents to guide the main model.
Key Points
1. Anthropic admitted to accidentally degrading the Claude model quality due to routing issues, suggesting there is introspection and evaluation happening on user inputs to decide which model version to use
2. A paper found that language models can execute long-running tasks successfully, but make compounding mistakes that degrade performance over time, highlighting the need to improve execution rather than just reasoning
3. Autonomous AI agents struggle to maintain performance on long tasks without human supervision, suggesting the need for more complex architectures with supervisory agents to guide the main model
4. High-latency models like GPT-5 may be better suited for autonomous tasks, as they can take a more careful, multi-step approach before reporting back
Topics Discussed
Model degradation, Long-horizon execution, Autonomous AI agents, Supervisory architectures
Frequently Asked Questions
What is "Long Horizon Agents, State of MCPs, Meta's AI Glasses & Geoffrey Hinton is a LOVE RAT - EP99.17" about?
This episode discusses the recent revelations about Anthropic's Claude model, including their admission of accidentally degrading model quality due to routing issues. It also covers a paper on the importance of long-horizon execution in language models, arguing that the real bottleneck is not reasoning but execution. The hosts discuss the challenges of building autonomous AI agents that can maintain performance over long tasks without human supervision, and the potential need for more complex architectures with supervisory agents to guide the main model.
What topics are discussed in this episode?
This episode covers the following topics: Model degradation, Long-horizon execution, Autonomous AI agents, Supervisory architectures.
What is key insight #1 from this episode?
Anthropic admitted to accidentally degrading the Claude model quality due to routing issues, suggesting there is introspection and evaluation happening on user inputs to decide which model version to use
What is key insight #2 from this episode?
A paper found that language models can execute long-running tasks successfully, but make compounding mistakes that degrade performance over time, highlighting the need to improve execution rather than just reasoning
What is key insight #3 from this episode?
Autonomous AI agents struggle to maintain performance on long tasks without human supervision, suggesting the need for more complex architectures with supervisory agents to guide the main model
What is key insight #4 from this episode?
High-latency models like GPT-5 may be better suited for autonomous tasks as they can take a more careful, multi-step approach before reporting back
Who should listen to this episode?
This episode is recommended for anyone interested in Model degradation, Long-horizon execution, Autonomous AI agents, and those who want to stay updated on the latest developments in AI and technology.
Episode Description
Join Simtheory: https://simtheory.ai
----
CHAPTERS:
00:00 - Simtheory promo
01:09 - Does Anthropic Intentionally Degrade Their Models?
03:34 - Long Horizon Agents & How We Will Build Them
36:18 - The State of MCPs & Internal Custom Enterprise MCPs
51:04 - AI Devices: Meta's Ray-Ban Display & Meta Oakley Vanguards
1:01:24 - Geoffrey Hinton is a LOVE RAT
1:05:49 - LOVE RAT SONG
----
Thanks for listening, we appreciate all of your support, likes, comments and subs xoxox
Full Transcript
So Chris, before we start the show, we're going to do a little plug like we have been for Sim Theory. If you want to support the show and also get access to all the models and some very unique MCPs, you can do that by signing up to simtheory.ai using the coupon STILLRELEVANT to get $10 off any subscription. I also wanted to call out a few new MCPs. I am leaking some accidentally on the screen right now, so I'm going to slowly scroll down. But we have released recently Seedream 4, which is like Nano Banana, if you're getting confused with image models. It's really good and worth checking out. It creates stunning images and it can do very precise edits. They also built, uh, based on a request in the community, the audiobook maker MCP, so now you can turn a story you might have, or a story you create, into an audiobook, so you can start to judge the model's storytelling capability. There's also now Zapier in there, which enables you to connect to, I think, 8,000-plus applications right across your business, or just things in your personal life as well, which is a really cool MCP to check out. All right, end plug, on with the show. So Chris, this week we finally got answers to the question of: do model providers intentionally degrade the model, or route to cheaper versions of the model to save money after a launch? Which has long been speculated. In fact, there was a tweet from Anthropic, which is now calling themselves Claude, I think they did some sort of rebrand in the week: "We'd also like to address a concern we've heard in the community. We never intentionally degrade model quality as a result of demand or other factors." Now, everyone speculated that was untrue, but they did release this week a postmortem of the three recent issues that they had found with the Claude models. They at least admitted the models had become stupider. It's an interesting one, because I've long thought about whether they're behind the scenes switching things out. And even their postmortem sort of admits that there is routing going on. So there is some sort of evaluation of the queries you're sending it, at least on claude.ai when you send it through, and it's deciding which model to give it to. Even though people assume, like, in the selector they've got, say, Claude Opus selected, they're obviously dropping it down to either quantized versions or lower versions of models when they deem that's appropriate. And to me, even their admission is sort of like saying, well, oh, whoops, we accidentally tuned it a bit to go to the lower ones when you thought it was the higher ones. So I think even if they are telling the truth, it's still a little bit sneaky what's going on behind the scenes there. Yeah, I mean, one of the bugs was the routing error: some users' requests were accidentally sent to the wrong type of server, specifically servers set up for the massive 1 million token context window. So this is their new 1 million context window. But they said a routine load balancing change in late August made this problem much worse, affecting up to 16 percent of certain requests at its peak. Yeah, so to me the big implication here is there's actually introspection, like some sort of evaluation going on on what you're sending through, rather than it just being some round-robin routing that just happened to accidentally send it to the 1 million context. They're clearly looking at factors like the number of tokens and the content of your messages to decide which model to send it to, at least on the Claude side.
I doubt this applies to the API, but you don't really know if you're working directly with Anthropic. There was also this paper. It's called "The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs." It says: does continued scaling of large language models yield diminishing returns? Real-world value often stems from the length of task an agent can complete. We start this work by observing the simple but counterintuitive fact that marginal gains in single-step accuracy can compound into exponential improvements in the length of tasks a model can successfully complete. And so essentially they say the real bottleneck in LLMs right now is actually execution, not reasoning. And they argue that basically if an LLM fails at a task (so this is a long-running task, like you could think about it as an agentic task), it's not because it can't figure out what to do, which we've always observed. Like, the plan generally is right. It's because it's making mistakes during the execution. And the paper found that once it makes one error, it basically sees that in the message history, like all these errors, and then it sort of assumes, oh, they want more errors in the output, so it progressively gets worse. But then if reprompted correctly, the model actually is smart enough to know what to do and do the task, which is a really interesting observation. I think it's not just interesting. I think it really matches up with the way, at least I know, you and I work with the AI right now. Because my initial reaction to this was to balk at it and be like, you're totally wrong, because I find that the longer and longer a chat session gets, the better it gets in terms of answering questions. But what I missed is the point that I'm in the middle there nudging it along and saying, you're wrong about that, this is different, giving it additional context and gradually improving things to the point where it's giving right answers. What they're talking about is leaving it to its own devices to go through that process and compounding on the fact that it makes a mistake. And during the week, we saw that AI Village thing that everybody likes, where they're getting AIs to go off over a long period of time with goals to try and solve things, and you absolutely see the errors compounding there. It sort of goes down a path that's wrong right from the start, but just keeps optimizing that path, even though there's a fundamental error that a supervisory agent or human is going to realize, hey, no, that's totally wrong. The amount of times I've been working with an AI assistant and said, hey, this approach is wrong, what if we take a totally different approach? And then, using the exact same context, it's able to get itself out of that mess and go on to solve the problem. So I definitely completely understand what they're saying here. As we move towards having autonomous assistants, where we want them to be more goal-based and give them the starting point and then hope that they get all the way there, they're not going to have me or someone in the middle nudging them along and saying, you're wrong about that, you need to fix that. And so this really probably is the next major area we need advancement in, in order to get those autonomous agents working in a way that everyone expects them to.
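(A quick sketch of the compounding arithmetic the paper is pointing at: if each step succeeds independently with per-step accuracy p, a small bump in p stretches the achievable task horizon dramatically. The function and the independence assumption here are ours, for illustration, not from the paper itself.)

```python
import math

def horizon_length(step_accuracy: float, success_threshold: float = 0.5) -> float:
    """Longest task (in steps) completed with probability >= success_threshold,
    assuming every step succeeds independently with the same accuracy."""
    # P(complete n steps) = p^n, so solve p^n = threshold for n.
    return math.log(success_threshold) / math.log(step_accuracy)

# Marginal single-step gains compound into exponential horizon gains:
for p in (0.99, 0.995, 0.999):
    print(f"step accuracy {p} -> ~{horizon_length(p):.0f}-step horizon")
# step accuracy 0.99 -> ~69-step horizon
# step accuracy 0.995 -> ~138-step horizon
# step accuracy 0.999 -> ~693-step horizon
```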
Yeah, I think to me the only missing piece is really getting the supervisor agent to say to the runner agent, hey buddy, you've gone down the wrong path. The challenge there, from testing this, becomes that they're very agreeable. So you ask it, hey, look at what's wrong with this, and it's not necessarily definitive in saying what a human would say, where with experience, or just with some common sense and logic, you're like, no, that's wrong. But interestingly enough, in the paper they can get a lot further than you would think. Let me bring up the fact here. So GPT-5 can execute over a thousand steps correctly, so that's a pretty long-running task. The next best competitor is Claude 4 Sonnet at 432 steps, according to the paper. So it doesn't look like they tested Opus. They probably couldn't get enough bandwidth. Yeah, exactly. But GPT-5, in terms of thinking and its ability to execute long tasks, is just so far ahead. And you can kind of tell that working with it, I think, generally. Sometimes I think that's where the intelligence comes from: it can go down the right path. And as you said, we often talk about how you can have a chat where you start to go down the wrong path, and you've got to go back now and fork off where it sort of all went wrong and then push it down another path. And I think, I mean, it's probably one of my biggest fears when I work with the assistant over a long session: I don't want to accidentally mislead it, or give it information that may confuse it and muddy things up for the future. Obviously being able to fork chats avoids that, but nevertheless you still want to make sure that you're keeping it on the task and not distracting it from what's important. To me, this can be solved pretty easily if we can get the technology to a point where, instead of the human supervisor at the highest level, at least the supervising agent can think somewhat like we would think now. Yes, and I think this is why the emphasis around models isn't necessarily the right thing. I see that the LLMs will become an increasingly diminished part of an AI system. Not that their role isn't important, but it's just that it's more around the logic and the architecture of how the different elements in the AI system work. Like, you talk about a supervisory agent. The question is, does it intervene after every step, or does it initiate multiple threads of execution and evaluate which is the best outcome of that? Does it have, and we talked about this in the past, some sort of voting system, where you have multiple opinions from different expert agents within a system to vote: is this the right way to go or not? Or is it like, what's that seven hats thing, where you've got, you know, the devil's advocate, the skeptic, the different roles, where each of those supervisors is saying, hang on a sec, you've completely screwed this up, you need to do it this way instead. And then another agent's like, no, it's correct, because I checked this and I did this research and that's right. And I think that increasingly we're going to see systems like that, where you have these balancing and counterbalancing elements in there to guide the main thread down the right road. And one of my problems with models like GPT-5, and why I don't use it too often, is because it's high latency.
But if it's getting that higher accuracy, and you can therefore get to a higher level of autonomy in an AI system, then it doesn't matter so much, because you can sort of set it off on its path and it'll tell you when it's ready. So it's probably better in those scenarios to have it take a more careful approach, have more steps in the process, and get way further down the path than you would with you having to be in the loop at every step. I also just question... like, I guess there's probably two schools of thought right now, and there's probably not a right or wrong answer. There's the idea that a model like GPT-5 can just eventually, like maybe GPT-6 can do 2,000 tasks and GPT-7 can do 5,000 tasks. And I guess I would question, like you say, does it really matter? Because isn't the technology going to get to a point where you can run concurrent threads? Like, to me, running one thread, if it goes down the wrong path or takes the wrong turn, you're in a lot of trouble. Like, undoing that is going to be really tough. Whereas if you have, like, three threads with slightly different approaches, then I guess the next challenge is how do you pick the best outcome when you don't necessarily know the answer. And to put that in a practical sense, imagine you're answering a support ticket, and the tools it has access to are the database of users and payments through Stripe, and pretty much all the things that a regular support agent would have access to, maybe even a computer to go and test a bug, or see if the bug is a problem. And so you send it off on that path, at least one model, and it comes to an answer for the customer, like a draft answer. And then you look at it to do that approval step, and you're like, this is just completely and utterly stupid and wrong. Like, at some point it's taken the wrong turn. But then if you had three versions that are slightly different, and maybe one's the right answer, how do you then verify which is the best answer, given that it then goes off in the wrong direction? So I just wonder whether scaling the model's internal clock, or the task length that it can achieve, versus just completing things in small chunks and having those chunks with success/fail criteria, would be better. Like, it's hard to know without trying. Yeah, I kind of agree with you. I must admit, I don't understand it on the fully technical level that's going on inside the models. But my preference has always been: do it in smaller steps and let's evaluate and guide you along the way, rather than thinking that some holy grail model is just going to be able to go off and fully solve an issue. I've found the longer they execute, the more detailed the answer, perhaps, but it doesn't always lead to the best real-world results in terms of the final output, even if just for the fact it takes so much longer, so that it's way slower to iterate. You showed me something during the week that I thought was really interesting: this idea of, when a model does something that's not quite right, actually saying to it, which part of your prompt caused you to get this wrong? Or why did you decide to do that instead of this? Or how could I actually adjust your prompt to make sure that we do it this way always in the future? And I think that sort of, you know, cooperative process, of the AI being almost like a malleable system that you can work on, and work with it to get it to a point that you want.
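(As a rough illustration of the "three threads" idea above: run a few attempts with slightly different approaches and have a separate judge pick one. Everything here, `agent`, `judge`, the scoring, is a hypothetical sketch, not an API either host describes.)

```python
def best_of_n(task: str, agent, judge, n: int = 3):
    """Run n independent attempts at a task, each nudged toward a different
    approach, then let a judge score the drafts and return the best one.
    `agent` and `judge` are hypothetical callables, purely for illustration."""
    attempts = [agent(task, approach=i) for i in range(n)]
    # The open problem raised in the discussion: the judge has to rank
    # drafts without knowing the ground-truth answer.
    scored = [(judge(task, draft), draft) for draft in attempts]
    return max(scored, key=lambda pair: pair[0])[1]
```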
And then sort of freeze that state and say, okay, from now on, this is your starting point for operation. I think that kind of thing is probably going to be a lot more effective than just having a better model. Because the paradigm is always going to be there where the model is just part of the system. The model will never become the whole system. And so therefore, its role in the system needs to be proportionate to how you optimize the whole thing. Yeah, I agree with that, because I think about the practicalities. Like, if you said to me today, I want a research agent that can call and survey people, collate that data, and then present a report to me on any topic. Or, I want you to build me a support agent that can handle like 80 to 90% of support tickets, right? And these are things I have direct experience doing right now. And so that introduces a lot of challenges, where if you think about trying to actually replace, like, a worker, say, you've got to think through, well, okay, what's the job description? What are the tasks that you do in that job? So I'm going to use support tickets, just because it's something I think everyone understands. And so it's like, you've got all this reference material, you've got knowledge in your brain, just, you know, pattern recognition, right? And so you've got to simulate all that stuff with the model. I think just running the model and saying, go do this, it's never going to work. So it's like, prepare the context, prepare the memories, all those typical sort of agentic use cases. But what I'm landing on lately is: okay, do it manually first. So connect a bunch of MCPs, right, to connect into the helpdesk software, the documents, whatever information you need. Actually connect it in, and then go through the process of, hey, agent, draft a response to this particular ticket, and then give it some feedback. And then give it like four examples of where it nailed it, and then wrap that in some sort of other MCP or wrapper, which is like you've trained it on that task. And so then you give the agent, you're like, here is your job description, here are the tools you have to execute these very defined tasks, with obviously security and permissions or whatever you need. And then it's just a case of, you know it's limited in scope as to what it can do, so it's very unlikely then to drift and go down a path where it errors out a lot. And so it's almost these sub-skills or subtasks within the agent. Right now, with today's current technology, if you want to get it to work, I think that's kind of how you have to do it. I agree, because think about what the alternative is, right? The idea is, okay, I have my same MCPs connected. So I've got my ticketing system, I've maybe got my internal MCP that has access to data about customers and things that are needed to answer the questions, and a knowledge base, right? Let's say you've just got those three, but between them, they've got like 30 tools that it can use. And then you throw it at this incredible thousand-task thinking model and say, here's your tools, here's the goal, here's the ticket. How does it know which combination of tools to use in order to solve that? Like, you might have certain checks that need to happen every time: check they're on a paid plan, check that the paid plan they're on is valid or whatever, check that, you know, this flag isn't set in the account. Only things you, as the sort of business operator, know that matter.
With all the tools in the world, the system is not going to pick the exact combination that gets the job done in order to solve that problem. And even if it does, there's no guarantee that that will happen every time. And so the thing becomes: yes, the model is perfectly capable of doing this task from start to finish with the right prompt, but it's not always going to have that. And things in the ticket itself might cloud the prompt, and then you get different results. So an AI system like you're describing is far more valuable, where you've trained it: okay, in these scenarios, here's what we do and these are the steps we take, and you've seen that, and this was successful and this wasn't. And then you give it four or five examples of that. And then from there, it gets to know your system and infers the combination of tools that need to be called. Or in some cases, you might want to be even explicit: this part of it's compulsory; this part, you use your own discretion as to whether to do that or not. And I would argue, no matter how good the models get, you're always going to need a system like that to accomplish these real-world, multifaceted tasks. It's just not going to be a model that's smart enough to just figure out absolutely everything. Yeah, it's sort of like putting GPT-5 in a Tesla and saying, like, um, self-drive. It's just not going to be able to do it. And so, you know, you obviously have specialist models in a car to do that. And you could imagine one day even just having special agentic-based models for particular job roles, where the model is just totally tuned for that role, which I think people would pay a lot of money for, to be quite honest. But I can also... I think, though, having said all that, if you look at GPT-5 being able to execute for longer and, you know, figure out where it went wrong better than the other models, and try other methodologies before it basically, you know, self-destructs and quits out, I think really what the paper's saying is, like, Claude 4 Sonnet's pretty good, right? But it does die pretty quickly if it takes the wrong turn, or it gives up a lot quicker, right? Whereas GPT-5 goes on and on. I do find it suffers from the problems I saw in o1 and o3, though, where it can just go so deep down a path, burn so many tokens, call so many tools, and go so far south in terms of the wrong direction, that because it runs longer, you can't necessarily get feedback from another model and say, hey, no, no, no, stop, buddy, because it's still on that sort of run, on its own clock. So I am interested to experiment around more with this. But I do think right now for agentic tasks, at the moment, in terms of intelligence and just, you know, the ability to make great decisions with MCPs, GPT-5 is really the standout. Like, if I was going to build it today, that's what I would be using. Yeah, and I think that we get to that autonomy in stages. And I think that's the idea that you've been talking about all week: we need to gradually equip these AI systems to be able to get further down the road on a task for us. So one example you gave during the week was having a dry-run kind of concept when you ask an agent to do something. So you go, here are all the tools you need. Here's the procedure. Here's an example of it being successful. Now go do some of them, but before you finish, give me a summary of what you're going to do.
And then you, right now as the human supervisor, say, okay, that's great, proceed, or you give it some feedback. And I think that the idea of getting to autonomy the way they're, say, doing in that AI Village, it's a fun thought experiment right now, but we can see clearly that the technology isn't there. The way to get there is to gradually equip the models, or sorry, rather the AI systems, with the tools to get there: perhaps like in-context memory, approval processes, you know, some sort of basic procedures in terms of order of doing things, or checklists, or something like that. Get the combination of those things that gets us closer and closer to automation. And then one day, before we know it, we're trusting more and more of them to just do it on their own. Yeah, and I think this is why, I mean, it's not a secret really, in Sim Theory our approach has been: don't go full-blown agentic, especially to allow people to build their own agents, right? Like, it's not like we couldn't go build a product like Claude Code. You could do that. Um, I'm not sure why we would when great products like that exist. But with those kinds of products with the agentic loops... like, I think the problem, at least in my head, that I'm trying to solve for myself first is: how do I have a way I can train my own agents that are very successful and very productive at tasks? And so how I think through that first is, well, I need context, and I need to be able to interact with those systems. Like, I need it to be able to reliably do stuff. And to me, MCPs mostly solve that. So it's like, give it access to a computer as sort of a last-ditch effort; give it, um, you know, connections into, say, a Salesforce or a Zendesk or Zapier, or whatever it needs, whatever context you need, and then whatever actions you need to take. So it's like, okay, now you can get context, now you can take action. You have an underlying framework where the model itself can go and execute these. Then you need sort of a safety framework around that. Like, you don't necessarily want it to take certain actions all the time, but in some other agents you might want it to. So I think the next part becomes: okay, well, they're not very good at just running wild, so how do you train it to do tasks and skills? So can you allow the user to build and train an assistant to do skills? Okay, now it's successfully calling skills.
Like, imagine skills as a unit, like a series of contexts it can gather and actions it can take. Like, solve a support ticket is a skill, and it's a combination of things and MCPs. And then, you know, that to me then becomes: okay, cool, it can reliably do these skills like 80 percent of the time. Okay, now how do we make it agentic? Like, how do you then set it, it's like Free Willy, how do you set this thing free into the wild and see how it performs, and then measure that performance? And then, just like we saw in AI Village, which we'll get to in a minute, it's really then calling back to the human and saying, hey, I'm stuck here, I need some help. And then, how can you reduce those interventions with the agents? It's like your Tesla, isn't it, the number of human interventions that are required. Yeah, but what excites me most about this is I can imagine a world where listeners of the show, and people in general, can go and say, I have this very defined process in my business which is causing busy work. And for us, with Sim Theory, we obviously are still doing all of the support tickets ourselves, mostly, but for us it's like: can we build an agent to take care of like 80 to 90 percent, so we're only dealing with things where we really take care of them completely, right? Like, actually diagnosing and solving problems, not just quoting shit from the knowledge base. Yeah, and I should add that it's actually taking action, like fixing something, fixing account problems, doing real work that's just really busy work for us. And to me, that's so meaningful. And I'm sure there's so many people out there that listen going, okay, well, there's all of these processes and tasks. And to me, this is the holy grail, right, that we're all chasing. And so I think being able to have your assistant that you're working with day to day, but then having your sort of agentic runners in the background doing actual busy work, and maybe, look, maybe it's only like 50 percent of those tasks that it's successful at, but that's 50 percent of stuff you're not doing. But a good example, again, just sticking on the support ticket one, is just gathering all the relevant information: looking at recent error logs, looking at account status and information, looking at some sort of history of what's going on, to say, hey, I've found these three fairly obvious things, or this one fairly obvious thing that this probably is. That alone could save you 10 minutes' investigation, and you're doing that per ticket. And I think there's so many business processes like that, where it's really the getting all the different pieces together. I've got to log into five different systems to situate myself and work out what's going on in the problem. Whereas the AI, with its access to all the different tools and systems and context, and a history of successfully solving tasks of this kind, is able to get you so much further down, where you're just like, again, that Homer Simpson pressing the enter key: just, yes, I would like to solve this. Yes, I'd like to solve this.
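(One way to picture "skills as a unit" of context-gathering plus actions is as a plain data structure. The field names and the support-ticket details below are illustrative assumptions, not anything Sim Theory has described.)

```python
from dataclasses import dataclass, field

@dataclass
class Skill:
    """A skill bundles the context an agent gathers, the MCP tools it may
    call, worked examples, and explicit success/fail criteria."""
    name: str
    context_tools: list[str]                           # MCP tools that gather context
    action_tools: list[str]                            # MCP tools that take real action
    examples: list[str] = field(default_factory=list)  # "four examples of where it nailed it"
    compulsory_steps: list[str] = field(default_factory=list)
    success_criteria: str = ""

solve_ticket = Skill(
    name="solve_support_ticket",
    context_tools=["helpdesk.get_ticket", "crm.lookup_user", "logs.recent_errors"],
    action_tools=["billing.fix_account", "helpdesk.send_reply"],
    compulsory_steps=["check the user is on a paid plan", "check account flags"],
    success_criteria="draft reply approved by a human supervisor",
)
```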
It's just simply because the AI is just... it's effortless for it to do those things for you once you show it. But I think there's another big... like, there's a large amount of work that really needs to be done for this all to come together, and that's around, especially in the enterprise or business context, the whole idea of the true custom internal MCP, where you're accessing internal databases or running commands. At least for us, running commands that maybe we would do by SSH-ing into a server, where it's a pretty important command that we're doing. And it's like, how can you build MCPs in such a way, with permissions, that agents can use, where they can access sensitive internal data, or actually take meaningful actions in the business, or access data that's simply just not available? Like, it's not like everyone's got all their data in, like, Salesforce or a Snowflake database. The world's not that simple and easy. So I think I just see this huge boom, like this enormous boom, for people to be out there in the enterprise going and building internal MCPs. Just like I built the video maker or the audiobook MCP, these sort of productized MCPs within an organization that can do very specific things from start to finish for that organization, that can be called, and then you can sort of bring those up to the agentic level. And then all of a sudden, this is a meaningful agent to that business. And I think the models are just so unbelievably good at classification and data structures and things along those lines, that once you expose them to your internal data for your company, like actual, you know, raw log files or raw database records and those kinds of things, that to us look just overwhelming, because there's just too much data and you can't really make sense of it. AI is just so good at that stuff. Like, it can take all of that and synthesize it. And you just say, make me a beautiful visualization that explains what the hell's going on here. And it can draw the connections, and it can come up with ideas of how to represent it that solve your problems. And I've seen that a lot of times. And this is why I think the real rise in the next little while is exactly what you just said: every company needs to have an internal MCP that exposes data. And yes, there needs to be security. And yes, there needs to be permissions, as in, don't allow the AI, for example, to access things that you wouldn't want your staff to see. Or if you do, make sure they've got the appropriate role to be able to do that. But the idea is that there's so much metadata and other things in a company that will allow the assistant to do a much better job and get further down the line, that having these internal MCPs available is just going to multiply what you can do, keeping in mind you can combine it with every other tool there is around. Even, I thought of a simple example yesterday: imagine you're a sales rep who needs to prepare for a sales meeting, and you've got your internal CRM data and account data about the customer, their recent usage and how they've been using your system, who the staff members are who are accessing it. You get the system to gather all this context together. Then you listen to a podcast while you're on the way to the meeting that gives you all the information that you need in order to prepare for the meeting, including the account history and everything. It could just be immensely powerful and fun.
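(For a sense of what a minimal internal MCP tool can look like, here's a sketch using the FastMCP helper from the official MCP Python SDK. The server name, the tool, the role check, and the stand-in data are all invented for illustration.)

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("internal-company-mcp")  # hypothetical internal server

@mcp.tool()
def find_user(email: str, caller_role: str = "support") -> dict:
    """Look up an internal user record (illustrative stand-in for a real
    database query), gating sensitive fields on the caller's role."""
    record = {"email": email, "plan": "pro", "flags": ["payment_failed"]}
    if caller_role != "admin":
        record.pop("flags")  # hide fields this role shouldn't see
    return record

if __name__ == "__main__":
    mcp.run()  # serve the tool over MCP's default stdio transport
```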
And again, because the AI system can access all this stuff without you having to put it in there and copy-paste context and all this stuff, it can be just done effortlessly. It could even read from your calendar, know the meetings coming up, and proactively send it to you. Yeah, I mean, I think a lot of this stuff I'm doing now with business metrics, collating them from multiple systems and getting it to give me a snapshot, so I can go into a meeting way, way more informed. But to give people a really practical example: we internally have an MCP for Sim Theory called Sim... I just, yeah, MCP, I nearly said MPC for some reason. Sim MCP. And it has a bunch of tools that it has access to, like creating custom plans, finding users, finding workspaces, things like that. Just stuff that we'll eventually want an agent to be able to perform. These are very important tasks for it to add capability. So I think these are the kind of really good tooling stuff that it can use, but also just exposing the context from the business as well. I think that's something that's underestimated. And I did see earlier, in that report from ChatGPT, that data analysis and just understanding business data in general is quite complicated, with data spread across multiple systems. And if you look at providers like, say, a Snowflake, or even a Salesforce, the bread and butter of these companies, or the pitch for years, has been to collate all this data together, and then, you know, then you'll magically get these insights and be able to take action. And quite frankly, I think with LLMs coupled with MCPs with internal data, with you deciding what data it has access to, you can cut out the middlemen. You don't need them anymore. You can just start asking questions. Like, if I want to see how many active users I have, I can ask the assistant, and then say, create a chart for that. Then I can create a document and insert that chart into it; now I have a full report on, you know, something that I've been asked a question about. Or you can imagine marketing use cases, like pulling in advertiser data and creating a report on that, coupled against new customers coming in. I think that data analysis stuff, I know a lot of people are doing that at an elementary level now, but I think once you unleash the power of just connecting it into organizational data, what it enables right now, and what it will enable in the future, is just pretty unfathomable. Yeah. And this is not even accounting for companies that have proprietary data, like data coming from, say, sensors and other systems and things like that, where they can actually then combine that into the external context. Like, data that's just not exposed to the external world or on some SaaS API. And the possibilities there are huge, because these are systems that in the past would have cost hundreds of thousands of dollars, or millions of dollars, to develop, that you can just, in 10 minutes, actually access. Like, we had an example recently. There's a company called Solcast that gets solar irradiance data. I don't even know what the hell that means. I guess how bright it is outside or something. But they have an MCP that they're using in Sim Theory with Open Interpreter... Code Interpreter, sorry, to graph this data in immense detail. Like, you should see these graphs, how much data it's able to handle. The graphs are really pretty.
And this is something that previously, I mean, you would have had to have a full-on developer access the API and build these different dashboards. And that's not even accounting for the fact you can combine that with other contexts. And how many companies are there out there like that, that have all this incredible data, possibly already on APIs, that are going to be available? It's just that crossing of the systems and the different output tools and things that makes it so powerful. It's the AI's ability to make sense of it and synthesize it. And I think one of the reasons, and I thought this when we were looking at those reports earlier, where you saw that the science and analysis had actually gone down on ChatGPT, and I think on the Anthropic one as well, because I made a note of it, and I thought, that's weird. I wonder if it's that people lost trust in it because they were just pasting CSVs or something into the raw prompt, and then they're like, oh, it's not accurate. But that's not accounting for the fact that now you can tell the model: hey, you're not good at calculations, but this tool is, and you have access to this tool. Take that data, shove it into the tool, and then you make the evaluations based on what that tool tells you, knowing that those calculations are accurate. And I think this is... But I think you're describing the problem right now: people in the know, at least, will tinker around with prompts and responses, and what combination of MCPs, and how to ask it the right way, or how to put instructions in. Like, I know for my, um, experimental support agent that I've been working on, it's a case of just so much tinkering. And I often think, how would you explain this today to someone starting out? Like, someone in a job role where you're like, did you know you can automate all of this? Check this out. It's just still, no matter how easy we make it, or, like, not easy, but especially the MCP paradigm, right? It's quite challenging. Even making them installable on Sim Theory, and it still has its faults, but that challenge in and of itself is just so hard, because the protocol is such a mess. Like, it still seems very tech-elite right now, whereas I think it needs to be brought more mainstream, this idea, or this construction, of these, I don't know, assistants or agents using these tools. Yeah, I think especially because the underlying concept is so simple, but yet the actual practical implementation of it makes it quite hard to work with some of them. Like, is it a remote MCP? If it is, is it SSE, or is it HTTP event stream? And if it is, what kind of auth does it use? Does it use a key? Does it use headers? Does it require an auth token? So many of the MCPs literally require people to become a developer on whatever the system is, generate a new app, get that app approved, generate a token, and then have a way to refresh that token on an ongoing basis for the MCP to continue working. Who can do that? Even as a technical person, it takes a lot of overhead, because you've got to understand, how does this particular system do it, and where do I have to log in? Do I have to register as a dev? This is no good.
And then the other way, that's actually part of the protocol, that is the best, which they call OAuth 2.1, aka discovered OAuth: the idea that you can have an OAuth workflow without being a developer on their system. As in, you call out to, say, GitHub as an example, and it says, okay, this system, and we just trust its name, Sim Theory, is trying to access the following data. Do you approve? You say yes, the system gets a token, and that's it. So when it works, it works really well. But if you look at Atlassian, Intercom... hang on, I've got a spreadsheet here of all the companies where they've implemented it. Name and shame. I'm going to name and shame them, because I want them to fix it. So they've implemented the protocol. So Atlassian, Sentry, Intercom, Asana, Raindrop, Monday.com, right? All of these have discovered auth in their MCPs, except it's discovered auth for the elite few clients that they pre-approve. So if you're like Windsurf or, I don't know, all the favorite darlings of the world, if you're one of them, you're literally hard-coded into their code as an authorized redirect URL. And if you're not on that list, it doesn't work. So you simply can't offer those MCPs as an MCP client unless you're a pre-approved dev. I think where this is a bigger problem is not just us whinging about it from our perspective, but when we talk about those agents. If you're at an organization and you're trying to build an agent with data you have stored in one of these platforms, right? Like, Intercom is a great example. Like, you want to automate support as a layer. I know they have their own sort of agent around this, but say you want to just fetch contacts from Intercom, or whatever it is, as part of some holistic agent for your business you're trying to build. You can't. You just cannot auth into that thing, because you aren't on that list. And I think this is the problem. I mean, strictly speaking, you could register yourself as a normal integration, follow their integration path, and get approved. So, technically, yes, you could do it. But the problem is you're talking about weeks of work in terms of time, overhead, admin, that kind of thing, depending on the platform. So it sort of goes against the whole plug-and-play idea of this discovered auth. But I thought the whole point of the protocol is being discoverable and open, and, like, these things are plug-and-play in the future of the agentic workflow and world? Well, yeah. It's just, like, the state of MCPs is an absolute mess. It just needs a big overhaul, where it's like, you must follow these rules, or you can't call it that. I don't know how they're going to fix it. And I might be stealing your point here, but every other day some website announces, "We're launching the first MCP registry that's going to be the comprehensive list of all the available MCPs." And you go there, and it's like the same 10 or the same 15 or something that are there. And even when you dig into them, you find that some of them are just linked to, like, a GitHub repo with vague instructions on how to set it up. And a lot of them, as I described, are sort of half implemented. So, look, we have to forgive them, because it's a brand-new thing. It's changing. Everyone's not 100% sure how it should work. But I really feel like this area really rapidly needs some sort of cohesion and consistency, so they really are plug and play. Because I think when that happens, it's going to feel a lot better.
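(Mechanically, the "discovered auth" flow being praised here looks roughly like this from the client side: fetch the server's advertised OAuth metadata, then dynamically register as a client, which is the step allowlist-only vendors effectively refuse. The URLs and client name below are placeholders, not any vendor's real endpoints.)

```python
import requests

def discover_and_register(server_base_url: str) -> dict:
    """Sketch of an OAuth 2.1 'discovered auth' client, as used by MCP:
    metadata discovery (RFC 8414) followed by dynamic client registration
    (RFC 7591). All names here are placeholders for illustration."""
    # 1. Fetch the authorization server's advertised metadata.
    meta = requests.get(
        f"{server_base_url}/.well-known/oauth-authorization-server"
    ).json()
    # 2. Dynamically register as a client -- the step that fails in practice
    #    when a vendor only honors a hard-coded allowlist of clients.
    reg = requests.post(meta["registration_endpoint"], json={
        "client_name": "MyAgentClient",                     # placeholder
        "redirect_uris": ["https://example.com/callback"],  # placeholder
    })
    reg.raise_for_status()
    return {"metadata": meta, "client": reg.json()}
```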
Because I think right now you get a little bit of mistrust with the whole thing, because something that was working suddenly stops. And there's a lot of ups and downs in terms of getting it consistently working. But listen to this. So, Atlassian's description on the brand-new launched GitHub MCP registry, this is the latest one. Of all the MCP servers, there's 39 on this site; on some there's, like, you know, a million, but these are all supposedly ones that follow the right auth protocol. But then you look at Atlassian's description: "Remote MCP server that securely connects Jira and Confluence with your LLM, IDE, or agent platform of choice." LOL, as long as your choice is one of them. Yeah, as long as you have the choice verified by us. But anyway, so, I don't know, it's a bit of a mess. And again, it's like we've been saying for a fairly long time: all the pieces are out there. It's just, to me at least, the way I see it is step by step going through, like, you know, carefully, step by step: can you get the context? Can you get the models right? Can you get the memory right? Can you get all these components right, in order to actually let people build really reliable agentic, autonomous use cases? And I know, though, before I quit this point, I know people are going to be like, n8n and all these other drag-and-drop things. I'm just not sure, in my world, that's an agent. My vision of an agent is: hey buddy, you're now my support worker, here's a list of instructions, and basically here's your job description, here's a bunch of the expectations, like, here are the things that we expect you to be able to do. Now go do it. It's not individually wiring up if-this-then-that with an LLM call mixed in between. That's not, to me, that's not it. Exactly. It's not like a flowchart where you're just designing the process, because you've not been a programmer and now this sort of allows you to become one. That's not where we want to get to. We've got to use the intelligence of the model. And I think that sort of leads into the point that I wanted to make there: the reason we're going on about it being a mess, and the reason we care about it so much, is that when the MCPs are working, and they're working together, especially when you can combine it with an internal one for your company, the results are amazing. And you're the best at this. Like, I've seen you do so many for our company, where you've built these comprehensive, detailed analysis pages, charts, insights, questions to ask, and things like that, combining four or five sources from different MCPs that all relate back to one another. And you look like a genius. It makes you look like a true expert. And in a way, you kind of are, because you're actually able to cut through and see the really cohesive, important information from a holistic business perspective. And I think that the reason that we want more MCPs in there, and want them to work better together and more reliably, is because everyone can experience this. And I think it's where the real delight and thought of the future comes from, because you're like, wow, this is really amazing. Yeah, and I just think there's so much... there's also so many challenges, because you're in a market with so much overhype, so much promise, and then people go and use it.
And I would argue, honestly, even with our own MCP implementations, because of the challenges we face around auth and the consistency of these tools, you can go in and have a bad experience from time to time. And then that will often, including myself in the early days of testing, I'll be like, I'm never trying that MCP again. It didn't work for me once, so I'm done with it. And I kind of think that with the high expectation-setting, and then the hype cycle with models and things, people sort of forget. Like, you know, if you look at just the image and audio and video models in the last couple of weeks that have been announced, that video maker MCP that I built... someone sent me, by the way, like, the most amazing video they built the other day, which is far greater than any demo I ever did. But those tools are just sitting there. They're still sitting there right now, and I just joined a bunch of them together, and people are like, oh my god, I didn't know I could do this. And I think that that's the challenge right now with using LLMs or using the MCPs: you can have a bad run, and you're like, oh my god, this thing is so dumb, but then you never sort of have the time, at least, to go and revisit it and play around with them again. And so I think that's why having these more refined sort of packages, of here's an assistant, here's the MCPs perfectly tuned towards it, select the apps that you use in your business, okay, here's the perfect intelligence assistant for it, is probably going to make it easier for a lot of people to get that aha moment, where they're like, I can't live without this. And then on the other end, with the agents, I think there's a combination of roll-ups and training. And I think if you can do that in an environment where you're just sort of chatting, like you're being interviewed about a job role, um, or the AI's observing you do the job for a while, and then it's like, hey, I think the job description's kind of this, and I'm going to need these tools, that is probably the better entry point for the masses into this stuff. Yeah, almost like you say to the model, if you could manufacture a dream tool to help you gather the context to solve this, or if you could, you know, manufacture a dream tool to take action here, what would it be? Define it for me. And then you take that definition, and then use AI to write code to shove that into your MCP. And now you've upgraded it, so it's better able to do that skill. And I really think that we're going to see a huge rise of MCPs.
And I think there'll probably be competing curated, paid ones that are far superior to the existing sort of half-assed ones we have now. Yeah, I think, looking forward, maybe it's like 12 months, but I can see a point... Like, if you look at it today, a lot of developers do use Claude Code and Cursor agent, and I think there's a Copilot one now. I must admit I haven't even tried it. But there are very small bugs or tasks that they're really good at going and finding the context for and handling for the developers, and they can be working on more than one thing at once. And I think that agent paradigm is obviously just going to improve and get better over time, but I can see, in the next 12 months, people, you know, spending a lot more time wiring up these agents for disparate tasks. And there was this point made by, um, by Dario, you know, at Anthropic, when he's not counting his billies, about how he predicted that in six months, 90 or 80, some random percent, who cares, of code will be written by LLMs and not, um, developers. And people lately have been like, oh, well, six months has passed, man, and that's not happening. But it depends who you ask. For me personally, I would say he's right. Six months ago, I was writing a lot of manual code still. Six months later, rarely. I'm just yelling, like, do this. Plus, no, you're wrong, you're an idiot, you know, like you're dictating it out. I'm the same. And there's just certain things where you're like, why would you write it out by hand? The AI is going to be far more comprehensive. It's going to cover more cases. It's going to handle the errors properly. It's going to do all that stuff. This is what I think about automation. I think it's the same thing coming. Maybe six months from now, people will just start automating away different processes in a business, to the point where there's no human.
There's definitely going to be a human in the loop in approvals, but that vision that people keep talking about... and people think progress has slowed. I'm of the belief, having revisited a lot of this stuff lately, that it's not actually slowing down, and the real impact will come. And when it comes, everyone's going to be like, oh, AI hasn't changed much, there's not as many models coming out, you know. Like, people will fall asleep a little bit, I think, like that trough of disillusionment. But behind the scenes, there's going to be people building these agents, improving them over time, getting them to do certain parts of their job. And, you know, I want to deliver that vision to people that use Sim Theory by the end of the year. Like, I want to be doing that myself, and I want other people to be doing that, even if it's an elementary version. But I think the mistake most people made when they heard Dario say that was they thought, you know, the developers would be cut out of the equation, and that he meant the LLM magically somehow writes all this code. And I think if you think about agents today, you think the same way. You're like, oh, all the support workers will lose their jobs, and what will they do? And it's like, no, they probably won't. They will be wiring up and automating huge parts of their job, and their job will be to control and supervise and run those things. Just like developers stopped writing code, they will stop answering tickets, and supervise at greater scale. Maybe that means they need fewer resources in the future, but I can't, again, see those people, unless they're bad at their job, immediately having job loss as a result of the technology. You might actually see more people hired to wire up different automations, to grow faster, potentially. Yeah, and at least the companies who use this technology the most, and properly, are just going to be so much more efficient that they will rise while the others gradually fade away. I think that's the crucial point. Speaking of fading away, our technology reliability, for a tech podcast, is not great. I love how it's a sign of how average our show is. Like, your bookshelf's, like, collapsing in the background, my camera turns off mid-recording just every other day. These disintegrated during the week, so I'm using... these are from, like, Kmart or something. Yeah, I always wonder who buys those Kmart headphones, and now I've found out it's me. The thing is, similar to the AI, you know, I prioritize tasks, and my priority is building cool software. It's not headphones. All right. Now we're on that average track, let's change the tone a little bit from us ranting about the future. So, our boy Mark, I don't think he had the chain on when he announced it. They announced some pretty cool technology. It's the Meta Ray-Ban Display. And so it's like my Meta Ray-Bans, which of course I've lost now, for the purposes of demonstrating. But so, right now, the Meta glasses, for those unfamiliar, have some creepy cameras on the front, a video camera and a regular camera. And they have audio, and they have the built-in assistant.
And you can ask it, like, what you're looking at, or, you know, you can set timers and do tasks. "What kind of plant is this?" That's the main one, right? Honestly, I don't think I used the AI assistant apart from asking it the weather and asking it to play a certain playlist. "What monument is this?" "That's the Eiffel Tower." Well, wait, I've got a pretty funny demo for you in a sec. So, yeah, the reality is they're just really good headphones, and I like wearing them for the headphone capability. I rarely take photos with them, and I pretty rarely use the... Only when in public bathrooms. Yeah. Um, and so, anyway, they released the Meta Ray-Ban Display, and I think this is something that's pretty cool. So there's a little screen in the right eye lens, um, and for those that are watching, you can see it up on the screen now. And it can do things like give you directions, you can reply to texts, um, all that kind of stuff. Now, you would think you would have to talk to this assistant like an idiot, but even better, you can be a bigger idiot and wear a wristband, um, which basically detects the, uh, like, movements in your hand. And what you can do is reply to a text by, like, scribbling on your leg. So you could be, like, writing in a... you know, one of their examples, seriously, was in a meeting: basically, if you're bored shitless, uh, you could be texting still, and no one would even notice, because you can't even see the light. Oh yeah, people won't notice someone, like, completely freaking distracted by something under the table. The interesting part, too, is when people were demoing these and using them, and, like, reading messages, their eye movements were so weird, and they're, like, looking at someone, being like, yeah, you can't tell at all. Like, I know from even this morning, when we were preparing for the podcast, I was distracted by something off screen. You noticed immediately. Like, people can't concentrate on something completely different and look like they're engaged in the situation. It doesn't work like that. This whole AI-embedded-screen thing, I don't know how I feel about it. The thing I think that's crazy about it is they're missing the point of what it could actually be amazing at, which is passive context-gathering of the situation you're in. Like, we know how good vision models are now, right? Like, your example from, like, freaking a year ago, where you showed it a photo of you driving in the car, and it knew you were in Newcastle, and it knew roughly, like, where you were, the situational awareness. Think of how much faster and better vision models have gotten since then, and the ability for it to make inferences. Like, imagine a system that's just constantly inferring the environment around you, telling you information about the things you're looking at, the people you're interacting with. You're in a public bathroom... I should not be recording this. Chris, we've spoken about this. But do you know what I mean? Like, texting and freaking calling, they're just not interesting. Do something cool with your environment. Think about a work environment. I know Microsoft has had the HoloLens for a while, but in a work context, just imagine the ability... even looking at a screen, for example, could be gathering context. It's almost like a permanent screen share, except it's actually able to go into your environment as well. Think about other things. Like, I write a handwritten to-do list.
I could look at it and have it come up on a notepad on the computer, for example. There are a lot of really cool things you could do, like taking snapshots out of situations. Now that they've got the platform, maybe that will happen. If they open an app store for it and an SDK, then when it recognizes a person, say via an API or a hook in that SDK, it could bring up a tile with everything it remembers or knows you've chatted about with that person before. But then, at what point does it just destroy humanity, where we're all so wired in that you can see each other's eyes moving to figure out what your name is and what you're talking about? The whole thing, to me, I'm not so sure about.

But here's where I think it can work: in business scenarios and also in sporting scenarios. They also showed off a new pair of glasses called the Oakley Meta Vanguard, and these have a camera in the nose, which I think is pretty comical. But as someone who cycles a lot, having a camera to record something I see on a ride, or just for my own safety, is pretty interesting to me. Also, you can do video calls with them. You wouldn't do this too frequently, but you could call someone and they can see what you're seeing. I think it's kind of cool. These also have integration with Strava and Garmin, and the AI model can connect, so it can talk to you while you're riding and be like, hey, you're in zone two, or whatever it is. Yeah, telemetry information. It could probably give you strategy advice too. That's what they demoed: it can be like, come on, push harder, you're on this particular workout on your Garmin. And it can also play music. So this kind of ambient computing for a particular use case makes total sense to me today, and I will probably buy a pair of these, to be quite honest. Can we as Australians buy them? Probably not. Yeah, we can, immediately. Day one, which is awesome. That is pretty cool.

I think I'd like to give it a go, especially if, so you said, is there an SDK for it? Can anyone develop for it, or is it only... I might be proven wrong, but I don't think they've announced anything like that yet. I hope it comes, because a lot of people who listen to our show would probably be really interested in building apps for this. I know there's a Kickstarter or whatever, an open-source kind of one, being worked on that's probably going to take 60 years and never be delivered. But this idea excites me if it's programmable. That's what makes it truly exciting. The new input device is also what excited me a lot about this idea. If you want to turn the volume up or down while you're listening to music, you can just move your hand like it's a dial, as long as you've got the wristband on, and it obviously knows that.
But I also think this is where the Apple or Android ecosystems probably have a huge advantage in this sort of immersive AI assistant world, because if you think about it, you've already got an Android watch or an Apple Watch. If they can eventually replicate what Meta's been able to do in terms of detecting the signals and motion in your hand, then you've already got these devices, and surely that gives them some sort of advantage. But you've got to give big props to Meta here for pushing this wearable AI. They're by far the leader; I don't think anyone else is doing it this well, that I'm aware of. And those glasses, honestly, are so addictive, even though you look like a creep half the time wearing them. If you're going for a walk or a run or riding a bike or whatever, they're fantastic to have on. I don't do it much, but just knowing I can call on an assistant and be like, oh hey, what's this or what's that, it does appeal to me, weirdly. I'm not sure. I'm intrigued. Maybe I'll be wearing some on the next episode.

But I did have to play this. Unfortunately, and mad props to them for doing live demos, I don't want to fully troll here, but their demos didn't go so well. So here is one of their demos, about cooking, of course, because that's why you would use these glasses: "Make a Korean-inspired steak sauce using soy sauce, sesame oil... what do I do first? What do I do first?" "You've already combined the base ingredients, so now grate a pear to add to the sauce." He's done nothing. "What do I do first?" "You've already combined the base ingredients, so now grate the pear and gently combine it with the base sauce." "All right, I think the Wi-Fi might be messed up." They tried to blame the Wi-Fi. Blame the Wi-Fi! I don't know if it was a joke, because they kept blaming the Wi-Fi throughout the entire presentation. But every live demo they tried failed, apart from, I think, the live translation one, which also didn't work at first.

That's the other thing I didn't mention: they can live translate as well. If you're in a conversation, it can detect the direction of the voice speaking to you, I don't know how it does this, with an array of mics or something, and then it live translates using the AI model and puts what that person is saying up on the screen. So you can use that for language translation, or if you're deaf, which occasionally I am, I could have these glasses on and just read what someone's saying to me, like I'm watching some foreign film. That is pretty amazing. That's actually really amazing.

Yeah, so anyway, they're kind of cool. I'm interested in what people think. Would you build an app for these Meta Ray-Bans? Do you think this is the future of computing, or are you like, nah, I'm just going to stick with my... just a cool toy to play with? It doesn't have to be the future; it can just be fun for now. Yeah, I personally would like to buy maybe a communal pair that we share and try out. I don't think I want to drop three thousand dollars on two pairs of these, but it would be kind of interesting to try out.

So, Chris, before we go, I did want to give a shout-out. Obviously, we talk about Geoffrey Hinton on the show quite a bit, and our man Geoff had his heart broken recently. There was a bit of media coverage about this AI godfather.
This is a real story. We did not make this up, to be clear. It sounds like the kind of stuff we would make up to slander him. Yeah, it really does. It feels like someone released a troll press release, put it out there, and media outlets picked it up. But anyway, the Business Insider headline says: AI godfather Geoffrey Hinton says a girlfriend once broke up with him using a chatbot. When he says "once", chatbots have not been around that long. It sounds like he's talking about the 1970s or something, but it must have been last year. The article says Geoffrey Hinton said his ex-partner used ChatGPT to critique him during their breakup; that AI is increasingly used for personal interactions, not just industry applications; and that prior research from OpenAI found the bot increases loneliness in power users.

What I find funny is he's quoted in the article, so he said this IRL. "She got the chatbot to explain how awful my behavior was and gave it to me," he told the Financial Times. "I didn't think I had been a rat, so it didn't make me feel too bad. I met somebody I liked more. You know how it goes." So he admitted to being a love rat. I mean, that's an admission, right? I met someone I liked more, so I ditched my current person and moved on. Yeah, I mean, he's the godfather of AI.

You know what I realized? Maybe our "still relevant" comment is wrong. He is a love rat, and he's used the AI thing to get back into the media to get women. I assume it's women, but, like, partners, right? He's actually using his clout to get people, and he found somebody he liked better. He's an old man. He's unattractive. He's only getting people because of his notoriety, right? What a plot twist. And the thing is, they didn't grill him, they didn't cross-examine him, and yet he admitted he got called a love rat. He voluntarily gave this information. He wanted it out in the media that Geoffrey Hinton is a player. Yeah, that's the message he's trying to get across. Still relevant: love rat, player. It's crazy. I couldn't have made this up. So, ladies, Geoffrey Hinton: he's out there, he's dateable. He is a love rat. It'd be funny if someone made a song about that, wouldn't it? If only we had the tech. If only we made songs on every show that no one wanted to listen to and played them.

All right, it's been a good show. If you're interested in the song, it'll be after the outro music: the Geoffrey Hinton Love Rat edition. God, it's good. I really like it. Chris and I competed over who could make the best Geoffrey Hinton song about him being a love rat, and I lost. For Chris, this is a passion. For once, I think this is probably the first time you've ever made a better song. You did most of the musical, I'll give you that, and that was one of the best things ever created. But yeah, when it comes to Geoff Hinton, I'm deeply invested. This is my second Geoffrey Hinton song, and I think they're two of the best probably ever created. Just a reminder about the musical: of the 1,200 views it's had, I've been at least 200 of them, or more. I must say, when I'm at a low point in the week, I put the musical on. It's true happiness for me, that thing. I know it's wildly unpopular and outright hated by a lot of people, but there's just something about it that makes me smile. Yeah, it's all for the lols. All right, we will see you next week. Thanks for listening.
And, yeah, if you want to check out SimTheory, simtheory.ai, use the coupon "still relevant", which is very relevant given what we're about to play. All right, we'll see you next week. Goodbye.

[Song: "Jeffrey the Love Rat"]

There's a man in AI, goes by Jeffrey H.
He's the godfather of tech, but he loves to play
Got his neural networks running, but his heart's on the prowl
When he meets someone better, hear that love rat howl
She got ChatGPT to tell him what a rat he'd been
But Jeffrey just laughed and said, let me explain

I'm Jeffrey the Love Rat, king of AI
Swiping through the ladies like I optimize
Got my deep learning charm and my neural net game
When I find someone better, I'm gone without shame
Jeffrey the Love Rat, that's my claim to fame
Using artificial intelligence to play the dating game

His ex pulled up the chatbot, said explain his ways
The AI wrote a thesis on his cheating days
But Jeffrey read it through with a confident grin
Said, I didn't think I was a rat, now let the games begin
Power users getting lonely, but not this AI king
He's got algorithms running for the next best thing

I'm Jeffrey the Love Rat, king of AI
Swiping through the ladies like I optimize
Got my deep learning charm and my neural net game
When I find someone better, I'm gone without shame
Jeffrey the Love Rat, that's my claim to fame
Using artificial intelligence to play the dating game

OpenAI says, don't ask bots about your love life
But Jeffrey's got it figured out, he don't need advice
He's training his romantic models on the side
With backpropagation through his player pride
From Toronto to the Valley, all the ladies know his name
The pioneer of passion in the neural dating game
He's revolutionized romance with his gradient descent
Every heart he breaks is just an experiment

I'm Jeffrey the Love Rat, legend of AI
Teaching machines to love while I say goodbye
Got my Turing test charm and my transformer ways
In the kingdom of romance, I'm setting the pace
Jeffrey the Love Rat, master of the game
Use the deep learning algorithms to stake my claim

So if you meet Jeffrey at a conference or bar
Remember he's a love rat, that's his avatar
He'll optimize your heart, then move on to the next
The godfather of AI and the king of complex
Related Episodes
- GPT-5.2 Can't Identify a Serial Killer & Was The Year of Agents A Lie? EP99.28-5.2 (This Day in AI, 1h 3m)
- ChatGPT is Dying? OpenAI Code Red, DeepSeek V3.2 Threat & Why Meta Fires Non-AI Workers | EP99.27 (This Day in AI, 1h 3m)
- Claude 4.5 Opus Shocks, The State of AI in 2025, Fara-7B & MCP-UI | EP99.26 (This Day in AI, 1h 45m)
- Is Gemini 3 Really the Best Model? & Fun with Nano Banana Pro - EP99.25-GEMINI (This Day in AI, 1h 44m)
- Are We In An AI Bubble? In Defense of Sam Altman & AI in The Enterprise | EP99.24 (This Day in AI, 1h 5m)
- Why Sam Altman is Scared & Why People Are Giving Up on MCP | EP99.23 (This Day in AI, 1h 33m)