

Context Engineering for Productive AI Agents with Filip Kozera - #741
TWIML AI Podcast
What You'll Learn
- More powerful AI agents do not necessarily mean more autonomous agents; human feedback is still crucial within the reflection loops.
- The future of work will involve AI agents surfacing the data and creativity that they cannot generate on their own, to be completed by humans.
- Wordware's approach is to let users express their ideas in natural language, which is then executed by a ReAct-style agent that calls various tools in a reflection loop.
- The 'dossier' concept provides the context, resources, and assignment for the agent to execute, with the agent deciding on the appropriate tools to use.
- Metadata around the tools, such as required context, feedback, and authority, is important for the agent to function effectively.
- The spectrum between deterministic tools and more agentic tools is an important consideration in the architecture.
AI Summary
The podcast discusses the approach taken by Wordware, a company building an AI companion that helps users build background agents. The key focus is the importance of natural language as the 'assembly code' of large language models, and the need to simplify the entry point so users can express their ideas and have them executed by the system. The episode also delves into 'context engineering' for these AI agents, where the agent needs to understand what it doesn't know and bring in human feedback at the right time.
Topics Discussed
- Natural language programming
- ReAct agents
- Context engineering
- Tool metadata and agentic protocols
- Future of work
Frequently Asked Questions
What is "Context Engineering for Productive AI Agents with Filip Kozera - #741" about?
The podcast discusses the approach taken by Wordware, a company building an AI companion that helps users build background agents. The key focus is the importance of natural language as the 'assembly code' of large language models, and the need to simplify the entry point so users can express their ideas and have them executed by the system. The episode also delves into 'context engineering' for these AI agents, where the agent needs to understand what it doesn't know and bring in human feedback at the right time.
What topics are discussed in this episode?
This episode covers the following topics: Natural language programming, ReAct agents, Context engineering, Tool metadata and agentic protocols, and Future of work.
What is key insight #1 from this episode?
More powerful AI agents do not necessarily mean more autonomous agents; human feedback is still crucial within the reflection loops.
What is key insight #2 from this episode?
The future of work will involve AI agents surfacing the data and creativity that they cannot generate on their own, to be completed by humans.
What is key insight #3 from this episode?
Wordware's approach is to let users express their ideas in natural language, which is then executed by a ReAct-style agent that calls various tools in a reflection loop.
What is key insight #4 from this episode?
The 'dossier' concept provides the context, resources, and assignment for the agent to execute, with the agent deciding on the appropriate tools to use.
Who should listen to this episode?
This episode is recommended for anyone interested in natural language programming, ReAct agents, and context engineering, and for those who want to stay updated on the latest developments in AI and technology.
Episode Description
In this episode, Filip Kozera, founder and CEO of Wordware, explains his approach to building agentic workflows where natural language serves as the new programming interface. Filip breaks down the architecture of these "background agents," explaining how they use a reflection loop and tool-calling to execute complex tasks. He discusses the current limitations of agent protocols like MCPs and how developers can extend them to handle the required context and authority. The conversation challenges the idea that more powerful models lead to more autonomous agents, arguing instead for "graceful recovery" systems that proactively bring humans into the loop when the agent "knows what it doesn't know." We also get into the "application layer" fight, exploring how SaaS platforms are creating data silos and what this means for the future of interoperable AI agents. Filip also shares his vision for the "word artisan"—the non-technical user who can now build and manage a fleet of AI agents, fundamentally changing the nature of knowledge work. The complete show notes for this episode can be found at https://twimlai.com/go/741.
Full Transcript
I think a lot of people think that a more powerful agent means a more autonomous agent. I actually think that's false. What you end up needing to do is incorporate human feedback even inside of these reflection loops. So basically the job becomes how to make sure that the agent knows what it doesn't know, and bringing the human at the right time into these reflection loops. So when we think about how the future of work will look, it's exactly that. It's the data that the agent cannot find, the taste or creativity that it cannot come up with on its own, surfaced to the human as work.

All right, everyone, welcome to another episode of the TWIML AI Podcast. I am your host, Sam Charrington. Today, I'm joined by Filip Kozera. Filip is founder and CEO at Wordware. Before we get going, be sure to take a moment to hit that subscribe button wherever you're listening to today's show. Filip, welcome to the podcast.

Hey, Sam. Pleasure to be here.

I'm excited to have you on the show, and I'm looking forward to digging into a really interesting conversation, both about what you are building at Wordware and the way you have approached the problem of building AI agents for users. We've also talked about a bunch of interesting topics around context engineering, the future of work, the fight happening at the application layer. All of these things will be interesting to dig into together. But let's get started by having you share a little bit about your background and introduce yourself to our audience.

Sure. So, very long story short, I actually was pretty lucky with my choice of research. I did research into essentially LSTMs back in 2016, which were the precursors to the transformer architecture. In 2018, I started my first company, trying to augment human memory with always-on listening devices based on GPT-2 and BERT. And I must say, I was a little bit before my time. It felt like banging my head against the wall sometimes, with GPT-2, you know, showing promise but not really being there. Again, long story short, I ended up exiting that company and took a year off. We don't talk about these moments enough in Silicon Valley; it always seems to be grind, grind, grind. But I sailed the Atlantic, I climbed a couple of peaks in Nepal, and I was back at it. And this time around, we essentially approached Wordware as the new software where the words are actually the code, hence the new take on natural language programming. In the beginning, we were a much more developer-focused platform. We found some faults in our hypotheses, and right now we're building Wordware as essentially the companion that helps you build other background agents. And we can get into what background agents are in a second.

That's super interesting. And right now there's a big risk that this suddenly becomes a sailing podcast if we dig into that topic. But I'm going to resist that and talk a little bit more about the agentic side of things. You know, when I looked at what you guys are doing, it touched on some themes that I found super interesting. I've built a bunch of agentic workflows with tools like n8n and Zapier and Make and the like. And your proposition to users is that you allow them to build these kinds of workflows or agents with natural language, as opposed to dragging boxes around and doing a lot of pointy-clicky, which sounds really interesting. Talk a little bit more about the philosophy that led you down that path.
It sounds like there was maybe a pivot from a developer-oriented approach to something that's more user-focused or end-user-focused.

Yeah, so I think the core of all of this is that the thing that matters the most right now is the natural language. It is essentially the assembly code of LLMs, and putting that abstraction front and center is very important. At the beginning, we were much more focused on developers and very technical users, where essentially you are sending particular snippets of text to different LLMs and really embracing the chain-of-thought technique by enabling you to also call different models. Right now, we've changed it a little bit, and I can go into the reasons. I think we took quite a lot of inspiration from where the software, the vibe coding market, is. We can get into why that's important. But we essentially want to simplify the entry point while the engine stays the same. We want to make sure that humans know how to express the idea and the assignment for the agent, and as long as they can write it in a document format, it can be executed. The question there is that, obviously, at the beginning, the latency and the cost of it are much higher, because you're essentially using a ReAct agent behind all of it, trying to make sense of it, not making explicit tool calls but rather having the agent decide on what to call, what code to write, et cetera.

Help me get a mental model for the next level of detail. The user describes their problem and what they're trying to do. Is the next step like a compilation step, in the sense that you're taking what they're trying to do and parsing it into what agents might need to be involved, what tools might need to be involved? Or is it more of a zero-shot kind of thing, where you have created something where you just give that to a system that's the same independent of the prompt, and it knows how to execute all of these things that the user might want to do?

Each agent in its simplest form is just a reflection loop with the ability to call tools. And we pass in what we call a dossier, which includes all the context needed to execute an assignment. You pass different data into it, and you ideally truncate it to be manageable inside of the context window; we hit the limitation of the 200,000 tokens on Opus, or a million on Gemini, all the time. And the assignment is the function; the resources, the context, let's say all of your data from Slack, is the x in the f(x), you know? The assignment can be taken as an algorithm which uses LLMs, obviously, very heavily. And once it starts on the assignment, it uses a ReAct agent (that's a very famous paper I recommend reading) and basically calls different tools in the reflection loop. The agent itself is actually deciding on what tools to use out of its whole repository, which ideally you are passing in the dossier so the execution agent doesn't get too confused. We have seen a big, big drop in performance when presenting even Opus 4 with more than 15 tools.

Are you pre-processing that dossier to explicitly identify tools, or are you allowing the agent to do that on the fly?

The agent will have to choose from a set of tools, but we definitely limit the number. You know, we have right now in our repository around 3,000 different tools.
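To make the architecture described here concrete, below is a minimal Python sketch of a dossier-driven, ReAct-style reflection loop. All names (`Dossier`, `Tool`, `run_assignment`) and the `llm` callable are illustrative assumptions, not Wordware's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class Tool:
    name: str
    description: str  # the "sticky note" the agent reads to pick a tool

    def run(self, args: dict) -> str:
        raise NotImplementedError

@dataclass
class Dossier:
    assignment: str    # the "function": what to do, written in natural language
    resources: dict    # the "x in f(x)": Slack dumps, sheets, transcripts, ...
    tools: list[Tool]  # deliberately small; >15 tools degrades tool choice

def run_assignment(dossier: Dossier, llm, max_steps: int = 20) -> str:
    """ReAct-style loop: reason about the next step, act, observe, repeat."""
    history = []
    for _ in range(max_steps):
        # The LLM sees the assignment, (truncated) resources, the tool
        # descriptions, and the loop history, then returns the next action.
        decision = llm(dossier, history)
        if decision["type"] == "finish":
            return decision["answer"]
        tool = next(t for t in dossier.tools if t.name == decision["tool"])
        observation = tool.run(decision.get("args", {}))
        history.append((decision, observation))
    return "stopped: step budget exhausted"
```

The point of the dossier is that everything the loop needs (the assignment, the data, and a deliberately small tool list) travels together as one unit.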
And based on the assignment and the resources, you want to limit that. There are two different approaches. You could have an architecture where there is a continuous agent which, depending on the execution of a particular loop iteration, is actually choosing the right tools for that iteration. Or, if you are very sure about how to execute this particular assignment, you can just hard-code it into the repository that the execution agent can call.

Talk a little bit about how MCPs come into play in your architecture.

First of all, MCPs are just tools; the only difference is the standard we agreed on. Let's say you're running a team, and each person in the team has some particular function, okay? And they know how to do one thing; here we are assuming everyone is a single-task kind of persona. The innovation of MCP is that we standardized a sticky note that you put on the forehead of that particular tool. So when you come into the room, you don't have to know them, know them, but if you are getting a task from your client, you, as the CEO of all of these tools, know to whom to give that task. And we've discovered quite a lot of limitations to do with MCPs. That sticky note lacks things. In the end, it's still a description; we should just force people to describe it a little bit more. The sticky note is only as good as the description people give of the underlying tools and what they can do. So ideally, I would like to bring about a world where we have an even stricter format for that sticky note. Some of the things that could be added: required context, because if you just end up calling the tool and not giving it the right context to move forward with that task, that's useless. It should have feedback in some way, so if you already called that tool and it returned an error, you want to call it again from the main agent and give it feedback. Authority is an important one: when we are taking action in people's lives, there should be some way to know whether that authority has been given to that particular tool, and if not, what's missing. There are a couple more, but I think those describe it well enough. We basically always throw a JSON into that description that compiles all of these things. And most of our tools are actually agentic, so they run some kind of LLM of their own before they actually process the request, rather than just using an API. In the end, it's an API wrapped for natural language.

When you say you throw a JSON into the mix there, are you saying that you are adding all of those things beyond what is available via the MCP itself? In other words, you're onboarding new APIs or MCPs and you're taking on the burden of filling out that metadata where it might be deficient?

Yes. So essentially, we're still using exactly the same place in MCP, the description. We're just populating that description with a lot more things: the context, the task, the authority. And as long as you then do the right things with it inside of the tool, it's all good. That's a little bit of a problem with tools that are not managed by you. For us, a good tool essentially still has an API somewhere in there, and the wrapper just learns how to parse everything into the API.
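A sketch of what such an extended "sticky note" might look like, going by the properties named in the conversation (required context, feedback, authority); the exact schema is hypothetical, since none is spelled out here.

```python
import json

# Hypothetical extended "sticky note" packed into an MCP tool description.
# Field names follow the properties mentioned in the conversation; the
# schema itself is illustrative, not a published Wordware format.
sticky_note = {
    "summary": "Draft a reply to an email asking for a financial metric",
    "required_context": [             # what the caller must pass in
        "email_thread",
        "google_sheets:finance_dashboard",
    ],
    "feedback": {                     # how errors flow back to the main agent
        "on_error": "return a structured error so the main agent can retry",
    },
    "authority": {                    # what the user has actually authorized
        "granted": ["read:sheets", "draft:email"],
        "missing": ["send:email"],    # agent must ask the human for this
    },
    "agentic": True,                  # tool runs its own LLM pass before the API call
}

# The JSON rides inside the ordinary MCP description field:
description = json.dumps(sticky_note)
```

Because the JSON lives in the standard description field, this stays compatible with stock MCP clients, which matches the "same place, more things" approach described above.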
But now, when a tool has an actual first-party MCP inside of it, you lose a little bit on latency, but you're a little bit more sure that what's happening inside of that tool actually makes sense and is what you want to do. So tools can be agentic. A tool can be "hey, walk my dog," which then goes to the Rover website and does browser use there; that can be a tool as well. And there is a spectrum there between what is a full, powerful agent and what is a basic tool, which is deterministic.

And when you hear folks talking about these agent-to-agent protocols, like A2A or ACP, how do you think about those in the context of what you've learned about using tools and the more agentic tools?

All of this is a schema for how to communicate between tools, or, if you want them to be agentic, you can call them agents.

Or software systems.

Exactly. So regardless of the exact schema, I think what you'll end up doing in your own architecture is using something that works just for you. We don't actually utilize A2A. We utilize some MCPs, but providing the additional context kind of closes the gap between MCP and A2A. I don't know the ins and outs of A2A; I just know that basically you need to find yourself on the right side of the wave, and it seems right now that MCP is the wave. For the last 20 years, we've been putting a GUI on top of databases, and that's essentially a UX for humans. What we are trying to do right now is define the best UX for agents to interact with that particular database, because SaaS is both a database with some manipulation and a graphical interface. And it's an interesting problem. I think the game is far from over. People are catching on that if you are Linear, you need to actually put out a way for agents to interact with all that data that you've beautifully presented to humans but have not found a way to present to AI.

So I really want to ask you a direct question, and that is: does it work? And the context for that question is my experience building out agents with, again, n8n, for example. I often go through this experience where I'll build this fully agentic thing and leave a lot to the LLM, and it kind of works, kind of doesn't work. So I'll pull more and more out of the LLM, do more pre-processing, and apply my own knowledge about the problem to structure it more and more for the LLM. And what you're telling me is you've got this one system, you feed it a prompt, and it's going to just work without that kind of explicit business knowledge. And I don't question that it will work at some point, given a certain level of capability of LLMs. But I want to, you know...

That's a very valid question. Yeah. I think graceful recovery is extremely important in these systems. And in that way, I think a lot of people think that a more powerful agent means a more autonomous agent. I actually think that's false. What you end up needing to do is incorporate human feedback, even inside of these reflection loops. So basically the job becomes making sure that the agent knows what it doesn't know, and bringing the human at the right time into these reflection loops.
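A minimal sketch of what "knowing what it doesn't know" could look like inside such a loop: instead of guessing, the agent files work for the human. The escalation conditions and the `HumanTodo` shape are assumptions for illustration, not Wordware's design.

```python
from dataclasses import dataclass

@dataclass
class HumanTodo:
    kind: str    # "approve", "reject", "modify_plan", "edit_output", "do_yourself"
    detail: str

def step_with_recovery(decision: dict, todos: list[HumanTodo]) -> str:
    """Graceful recovery: escalate to the human instead of guessing.

    `decision` is whatever the reflection loop's LLM produced for this
    iteration; the confidence and missing-auth fields are hypothetical.
    """
    if decision.get("missing_authority"):
        todos.append(HumanTodo("approve",
            f"Grant authority: {decision['missing_authority']}"))
        return "paused"
    if decision.get("missing_credentials"):
        todos.append(HumanTodo("do_yourself",
            f"Provide an API key or authenticate via web for {decision['tool']}"))
        return "paused"
    if decision.get("confidence", 1.0) < 0.5:
        # The agent "knows what it doesn't know": hand the step to the human
        # as work (missing data, taste, creativity) rather than hallucinate.
        todos.append(HumanTodo("modify_plan",
            decision.get("doubt", "unclear step")))
        return "paused"
    return "continue"
```

The `kind` values anticipate the four intervention types described below: approval, rejection, modifying logical steps, and editing the final output.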
So when we think about how the future of work will look, it's exactly that. It's the data that the agent cannot find, the taste or creativity that it cannot come up with on its own, surfaced to the human as work. And that's on the reflection part of things, how to figure out what tool to use and when. But there is also a part where it just lacks the right tools. It doesn't have the authentication, or it doesn't have many things. So the first question for the human in the loop could be: do you have an API key for this? Can you authenticate via web? But actually, what we found out is that it's good to have a graceful ability to fail in that aspect as well. And you almost need to manage a to-do list for the human. So for everything where the agent cannot figure the shit out, it will be like, hey, I added it to your to-do, and good luck.

Got it. And is that happening... Like, does Wordware connect to my Slack, and the agent is pinging me, asking me for things? Or is it some agentic inbox on your system?

So we're currently in a closed beta for the AI OS product, and we basically have a macOS app and a web app. We're basically an agent, a companion, which is serving as that kind of manager and the last line of defense between you and the flock of agents that are trying to get some shit done. There's that companion, which protects you. But the rest of it: there is a to-do, and there is a bunch of human-in-the-loop tasks, which are slightly different. We haven't really figured out the glossary for all of these, but a human-in-the-loop task is an intervention by an agent which stops for approval, rejection, modification of logical steps where it got itself into some kind of dead end, or just editing the final output. Those are the four things an agent can ask you to do. And there are also things where the agent just gives up and says, hey, I cannot do that, and then you add it to your own to-do. So this is reflecting how I see humans working in, like, 2028, you know? It's a lot about getting outside of this concurrency-of-one of chat interfaces. Maybe you have two agents doing something for you; you're closely monitoring them; you're essentially micromanaging your agents.

One way to look at this is to look at what's happening in agentic coding tools and project that into general work. We've seen the rise of these agentic swarm coding tools, and Grok Heavy is maybe an example of this as well, where you start a task and it spins up a bunch of sub-agents that run off and do your coding task or your research task. And the agent becomes less a way to augment you as a developer and more a way to augment you as a developer-manager.

Yeah, there's a couple of very important ideas here. One, it's just a companion, so you still have to take responsibility for its work, which works better than a top-to-bottom push from management to use some particular tool, because then nobody ends up taking responsibility for its outputs. And that's one part of how we are better than AIs: we are a legal entity that has responsibilities, can get fired, and we give a shit. And so when I look at the current... I'm Polish originally.
And when I come back to Poland, I see this human pride in the implementation of very particular technical solutions. And these people often say, AI doesn't work, it doesn't do the thing. And when I look at the best engineers right now at my company, I think about people who have had experience managing interns, have taste, are extremely good technically, but know which parts to delegate. And they work with tools like Codex and Devin. And like with interns, a lot of the work you just need to throw out; you cannot let it get into the code base. And in the particular parts where you as an engineer understand that there is not much training data for that particular problem, you end up solving it yourself. It is very important to understand for which problems there is a lot of training data, and to develop that intuition to be like, hey, actually, if I had scraped all of Stack Overflow, there were probably 400 instances of this question being asked. That's perfect. And this actually also influenced our architectural decisions, where something could be slightly suboptimal in the implementation, but we know that there's a lot of training data about this particular implementation, which ends up allowing us to manage our code base with AI much better. So it is optimal.

There's an interesting way of decomposing a problem in that: you're explicitly thinking about what's likely to be within the training data, which I guess is another way of explicitly thinking about what the LLM is going to be good at and not, which is kind of table stakes. But thinking about it from a training data perspective maybe helps you do that.

And it's crazy, because we're basically choosing our suboptimal architectural decisions based on the employees we have available, right? You're not going to write your stack in Java if you only have Python developers. You end up having to choose. And it's very interesting to me that we are already changing our lives to let AI help us more easily.

Along that specific point, I've thought a lot about how AI changes the future of developing new programming paradigms and frameworks and the like. The example that comes to mind and illustrates this for me: agents are really good at creating React applications. That's the canonical thing that they know how to do. They're less good at Svelte or other things that I might want to do that are somewhat less represented in the training data. Or, God forbid, some new thing that I want to get out there and promote for developers to adopt. How do I get them to do that? Well, I've got to make it really easy for agents. Documentation, for example, becomes really key, but not necessarily for the humans, for the AIs that will need to consume it so that they can help people write code with it.

And it's very interesting, when are you against the wave versus being able to stay on the wave. V1 of this company was really focused on developers, who created agents which were somewhat deterministic, then exposed them as an API and plugged them into their code. And, you know, we got very quickly to some decent revenue. And this only speaks to how crazy the AI market is, because we raised $30 million based on that, the biggest round out of YC.
And then soon after, we realized that we were going up against vibe coding. A year ago, vibe coding wasn't really that big of a thing, you know? And it's just such a crazy industry. I don't know if you've been following what's happening with Windsurf, but my God. I think my main lesson there is: don't get burnt out as a CEO, and don't involve yourself with potential acquisitions which might fall through. Or at least don't tell your team about it, because once it seems like it went through, your morale and the company's and your own ability to push forward suffer. Windsurf had been at it for seven years, and they pivoted recently, and they had 10 million ARR when they pivoted. Isn't it crazy that in 2025 we are pivoting away from 10 million? Manus had 20 million of ARR and they pivoted away. It's just crazy how high the bar has gotten.

There are a lot of lessons and takeaways in the whole Windsurf thing, and that's probably a podcast on its own. But there was another thing in the news, about Salesforce and Slack blocking API access for Glean, which I think has a lot of implications for a company like yours that depends on getting access to your customers' data in the silos that data lives in. This has come up for me in discussions as well, with just agents that scrape the web. We've already seen companies starting to put up barriers to prevent agents; I think Cloudflare came out with an offering that is going to allow publishers or site owners to charge agents to come and scrape their content. How do you think about that as someone whose company is about, in some way, liberating access to this data that's in these silos, or at least providing another way for organizations to leverage information in silos?

It's interesting. This is a much more difficult topic than in software engineering, because in software engineering you own your whole code base. We are trying to automate the menial knowledge-worker tasks and make sure that humans spend much more time on the creative tasks. And essentially, automating all of these tasks requires access to all of that data. If you don't have access to, you know, your Google Sheets and your Notion, you cannot respond to, or draft a response to, an email asking for a particular update about some financial metric. And what I'm seeing is a few different ways this can play out, maybe two. One is that people start closing it off, and we can see Slack, in their new terms and conditions, basically blocks you from holding a, how is it called, a snapshot of all of your Slack data. This essentially means companies like Glean cannot really answer precisely about these things. And it might be that everyone becomes really greedy, and everyone who has data will try to become their own AI agent. So you'll have an AI agent from Notion, one from Slack, and they will charge you for each call. I think that would be suboptimal, and it would mean that you need a personal assistant to chat with all of these agents. And there's a completely different world where there's so much benefit from chatting with your indexed data on Notion via Claude that Notion just cannot close it off.
And then, if we were in Europe: if ChatGPT and Claude have access to it, the small startups would as well. I have no clue how it would pan out in America, where there's not such strong anti-competition enforcement, such proactivity in making sure that competition is healthy. So those are the two ways. I see different companies who are, at the end of the day, trying to build very powerful AI agents by trying to own one particular channel. A great example of this is Granola. I can see what they're doing; I can see that they want to become a universal AI agent for everything that gets said. And I know the founder, but I have no data from him at all, these are just my guesses. For example, they recently raised a round which, you know, wasn't great, but I think they might just be losing so much money on transcription and being okay with it, because they are getting so much data and they hold it. I asked them to give me access to the transcripts of the stuff that I say, because I end up processing my transcripts in a different way, and I just can't get it. And I'm like, oh shit, so you guys are hiding this from me. It's an interesting way in. How will this look when we have the io, Jony Ive stuff that OpenAI is paying $6.5 billion for, and it just listens to everything, an additional piece of hardware that you wear? I don't know.

This brings up a bunch of different things for me. It goes back to a point you made earlier, which is that a lot of the apps we use are a UX on a database. Maybe that's an oversimplification; maybe it's a UX on some business logic on a database. But for things like CRM and ERP systems and many of the tools, particularly in the enterprise, the business logic is relatively thin and the value is in the data that the users have put into this database. And if the front door to that is shifting from a UX to something else, to an agent, Salesforce and others who have all this data don't necessarily want to cede that front-door experience to Claude by publishing an MCP and letting users bypass the Salesforce experience. And certainly they don't want to lose the value of that data by giving it away easily. Salesforce in particular has always frustrated me in this way. I remember, as a small Salesforce user, you could be paying $50, $60 a user per month, but to get access to the API, you needed to spend another $10,000 a year. It's like, it's my data. What the hell?

And so it's very interesting where regulations will step in. Based on GDPR or CCPA, you can just request all of your data, and they are obliged to give it to you. But that would mean that, as a startup, you might have to ask for it on behalf of the user, like, we will ask on behalf of you every 24 hours. And they might try to block it in some way, or end up charging for it. Well, GDPR is pretty airtight; it's pretty good, you own your data. I'm not so sure about CCPA.

It doesn't specify that they give it to you in a way that's useful or easy, right?

Yeah, but then AI steps in, and hey, the user experience that needs to be beautiful for humans can be just a freaking SQL database, and you'll have all the information very easily accessible. One thing AI is great at is writing SQL queries.
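As an illustration of that point, the whole "agent experience" can collapse to a prompt, an LLM that writes SQL, and the data itself. This is a toy sketch under stated assumptions: the schema and the `llm` text-to-SQL callable are hypothetical.

```python
import sqlite3

def ask(question: str, db: sqlite3.Connection, llm) -> list[tuple]:
    """The whole 'UX' is a prompt plus the ability to write and run SQL."""
    schema = "\n".join(
        row[0] for row in db.execute(
            "SELECT sql FROM sqlite_master WHERE type = 'table'")
    )
    # llm is any text-completion callable; here it must return one SQL string.
    query = llm(f"Schema:\n{schema}\n\nWrite one SQLite query answering: {question}")
    return db.execute(query).fetchall()

# Usage sketch: a CRM reduced to a table; the agent, not a GUI, is the front door.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE deals (account TEXT, stage TEXT, amount REAL)")
db.execute("INSERT INTO deals VALUES ('Acme', 'closed_won', 12000.0)")
# rows = ask("What is our total closed-won revenue?", db, llm=my_llm)
```

The design point is that the human-facing GUI disappears: the prompt plus query-writing ability becomes the interface to the data.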
So that's the magical part, right? The whole UX ends up being a prompt with an ability to write code, in this case SQL, and the data. Suddenly, I would call this almost an agent experience, right? And how quick that is, how you basically don't need any of the blocks.

So the data problem is very high on my mind. There are startups right now which try to do exactly the same, but also build the Slack and the Notion, and make sure that they own that.

You commit everything to this one... to Slack and Notion, you mean?

No, no, no, just doing...

Oh, just offering everything.

Yeah, and calls as well, ideally.

It's interesting. I'm thinking of another tool that I use, ClickUp. It's essentially a project manager like Asana, and over the years they have added every feature under the sun to this thing, whiteboards and this and that, and they probably do have a Slack-like thing. And when I think about that with an old-world lens, I think that their core value proposition is highly commoditized and they need to throw more and more stuff in to justify their license. But there's another lens that you're speaking to here, which is that they may end up being the beneficiary of just having all the data. The data is what is key and important in this world that we're moving into, and the dynamics are slightly different.

Likewise, when I think about a business like yours: in the old world, if I look at a Zapier, probably the most complex thing in that business is managing all these connectors, building relationships with these partners, and getting them over the hurdle, one connector at a time, to build out this catalog. Whereas I suspect it's a slightly different world for you. There's still an effort that's proportional to the number of connectors, but you've got MCPs, and even where there are no MCPs, you can probably use an AI to slurp in an API and make it accessible to your system. But then, in one of the parallel universes that you project, the thing becomes building relationships with these companies where you're asking for their data dumps every day, which doesn't sound easy either.

Yeah, that would not be pleasant, especially since you wouldn't be building a relationship. You would just have a lawyer on staff chasing everyone who is not adhering to you owning your actual data. So it's the opposite of building a relationship; it's building litigation. I completely agree. I think, essentially, which parts of the resources in that dossier we mentioned before end up being something you wrap a UX around for users to add data into is a question mark. You can think about Granola: at the end of the day, it's a folder of transcripts on which there is one prompt written. And I think they were very, very lucky that they hit the right timing after COVID. Essentially, Krisp used to be the thing that reduces the noise before all the big platforms had it, and they basically created a virtual microphone and a virtual speaker that routed data to both your headphones and their system. So Granola basically built that same kind of capture for itself.
Just because they had already built this way of capturing audio without having to add a bot into your meeting, which I think is the main innovation of Granola, you don't have to have this awkward third thing. And I don't know if you're running Granola... I am, randomly, right now... and other people have tried it; they just nailed the timing. The product is not that complex, but their way to market with it is genius. How they're going to progress and grow on top of that data, that's a real question mark. But you could imagine that you have a folder somewhere on your computer with all the transcripts, and you just find ways to add to it. I'm exploring all different potential things that end up capturing your data. This is quite a sophisticated microphone which just gets plugged into my iPhone, I just connect it like this, and it's a sophisticated way of capturing stuff for me. And then later on I can just run it through transcription, which is a commodity, right?

Interesting, interesting. I think it speaks to this context engineering and the importance of gathering and having access to this context. Because without that, the LLM isn't all that useful. You've got your prompt, you can tell it what you want it to do, but you're telling it how you want it to operate on some set of data that you own or control, that it has access to, or that it can find out in the world itself. But certainly your pre-existing thoughts about the thing, through your calls and transcripts, are going to be more interesting than what it finds out for you, at least.

Just one last point to wrap up this topic. It's an interesting approach to say that, for now, all of the UXs, the GUIs, are actually the best way to get that data. For example, Manus recently built in persistent authentication. They basically hold a little instance of a logged-in version of whatever you have going on, and they keep it live, so they don't have to re-log you in to everything. And, you know, for example, for me, that means... I actually have a background agent which scrapes a bunch of websites, which I pay for access to, about kite surfing conditions. And I get a notification through our system, actually, which tells me: today's a great day, take the 11-meter kite, not the 13; it's going to be blasting from 2 to 3 p.m.; I have just put a meeting in your calendar regarding this.

And is that an integration or something that you've built with Manus, connecting it to your system?

No, we actually just built the whole thing. So essentially our companion built an assignment, and you iterate on this assignment, which is in a document format where you explain exactly what needs to happen. And there are little things in there, like: I'm 90 kilograms, and I need to make sure that I take the right kite and that I catch the right tide. In San Francisco the tides have flow and ebb, and ebb helps you go upwind, which is great, and I want the perfect combination of it. But ebb and flow don't appear at the same time each day; they don't run on a 24-hour schedule, which means it's actually a complicated algorithm to know when to go kite surfing.

Interesting.
I mean, that brings up a very tactical question that I had in looking at Wordware. I thought I saw something in an FAQ that suggested that you weren't enabling always-on agents that sit in the background and repeatedly do a task for a user, and that the user had to launch the agent or something like that. But you're describing something that's completely different.

We have 2,000 different triggers that ambient agents can act upon. And, I actually don't know when this is coming out, and we might have to cut this bit, but I would recommend everyone go to Sona.ai, like the Finnish sauna. That's our companion. It helps you reflect, it helps you be the better version of yourself, and it's the companion that builds all of these background agents for you. Sign up for the waiting list, because the stuff we're cooking is pretty powerful. I probably have, I don't know, 30 or 40 different background agents doing everything from prioritizing my email to catching me up with a beautifully transcribed and voiced morning update. One searches for every legal document and puts it through a simple prompt of, hey, anything weird? One searches through all of my transcripts and applies the books through which I'm actually trying to become a better leader, so The 15 Commitments of Conscious Leadership and The Great CEO Within, and tells me how I can do better. One collates all of my previous one-on-one transcripts and checks whether somebody on the team is performing or not. One checks monthly through all of my Slack and gives me a report on company sentiment and potential sources of conflict. One checks through all Linear tickets to see whether we're on track and whether anyone is polluting our Linear board with some random bullshit. There's just so much stuff. And there's a lot going on when I'm not working. I love it. I go somewhere, and then I get a bunch of notifications, and I respond to them, and I get work done.

That's very cool. Awesome. Well, Filip, it was great to meet you and catch up with you and learn a little bit about what you are up to. I'll definitely be keeping an eye on Wordware and checking out the tool.

Perfect. Thank you so much, Sam. It's been a lot of fun. Thank you.
Related Episodes

Rethinking Pre-Training for Agentic AI with Aakanksha Chowdhery - #759
TWIML AI Podcast
52m

Why Vision Language Models Ignore What They See with Munawar Hayat - #758
TWIML AI Podcast
57m

Scaling Agentic Inference Across Heterogeneous Compute with Zain Asgar - #757
TWIML AI Podcast
48m

Proactive Agents for the Web with Devi Parikh - #756
TWIML AI Podcast
56m

AI Orchestration for Smart Cities and the Enterprise with Robin Braun and Luke Norris - #755
TWIML AI Podcast
54m

Building an AI Mathematician with Carina Hong - #754
TWIML AI Podcast
55m