DevDay 2025: Apps SDK, Agent Kit, MCP, Codex and why Prompting is More Important than Ever

Latent Space • swyx + Alessio

Tuesday, October 7, 2025

What You'll Learn

  • The Apps SDK allows developers to integrate ChatGPT into their own applications, providing a more seamless and customizable experience.
  • The AgentKit provides a suite of tools to help developers build, deploy, and optimize AI agents more easily, addressing the challenges of developing production-ready agents.
  • OpenAI has embraced the open MCP protocol as a way to enable interoperability between their tools and other developer platforms.
  • The live demos at Dev Day showcased the progress OpenAI has made in developer tools, but also highlighted the challenges of executing flawless demos on stage.
  • OpenAI's approach to developer tools has evolved over time, with a focus on iterative learning and incorporating feedback from the developer community.

AI Summary

The podcast discusses the recent OpenAI Dev Day event, focusing on the launch of the Apps SDK and AgentKit. The Apps SDK allows developers to integrate ChatGPT into their own applications, providing a more seamless and customizable experience. The AgentKit provides a suite of tools to help developers build, deploy, and optimize AI agents more easily. The hosts and guests discuss the evolution of OpenAI's approach to developer tools, the importance of the open MCP protocol, and the challenges of demoing new features live on stage.

Topics Discussed

Apps SDK • AgentKit • MCP protocol • AI agent development • Developer tools

Episode Description

At OpenAI DevDay, we sit down with Sherwin Wu and Christina Huang from the OpenAI Platform Team to discuss the launch of AgentKit, a comprehensive suite of tools for building, deploying, and optimizing AI agents. Christina walks us through the live demo she performed on stage, building a customer support agent in just 8 minutes using the visual Agent Builder, while Sherwin shares insights on how OpenAI is inverting the traditional website-chatbot paradigm by embedding apps directly within ChatGPT through the new Apps SDK.

The conversation explores how OpenAI is tackling the challenges developers face when taking agents to production, from writing and optimizing prompts to building evaluation pipelines. They discuss the decision to adopt Anthropic's MCP protocol for tool connectivity, the importance of visual workflows for complex agent systems, and how features like human-in-the-loop approvals and automated prompt optimization are making agent development more accessible to a broader range of developers.

Sherwin and Christina also reveal how OpenAI is dogfooding these tools internally, with their own customer support at openai.com already powered by AgentKit, and share candid insights about the evolution from plugins to GPTs to this new agent platform. They discuss the surprising persistence of prompting as a critical skill (contrary to predictions from two years ago), the challenges of serving custom fine-tuned models at scale, and why they believe visual agent builders are essential as workflows grow to span dozens of nodes.

Guests:

  • Sherwin Wu, Head of Engineering, OpenAI Platform: https://www.linkedin.com/in/sherwinwu1/ | https://x.com/sherwinwu?lang=en
  • Christina Huang, Platform Experience, OpenAI: https://x.com/christinaahuang | https://www.linkedin.com/in/christinaahuang/

Thanks very much to Lindsay and Shaokyi for helping us set up this great deep dive into the new DevDay launches!

Key Topics:

  • AgentKit launch: Agents SDK, Builder, Evals, and deployment tools
  • Apps SDK and the inversion of the app-chatbot paradigm
  • Adopting the MCP protocol for universal tool connectivity
  • Visual agent building vs code-first approaches
  • Human-in-the-loop workflows and approval systems
  • Automated prompt optimization and "zero-gradient fine-tuning"
  • Service Health Dashboard and achieving five nines reliability
  • ChatKit as an embeddable, evergreen chat interface
  • The evolution from plugins to GPTs to agent platforms
  • Internal dogfooding with Codex and agent-powered support

Full Transcript

Hey, everyone. Welcome to the Latent Space podcast. This is Alessio, founder of Kernel Labs, and I'm joined by swyx, editor of Latent Space. Hello, hello. And we are here in the OpenAI Dev Day studio with Sherwin and Christina from the OpenAI Platform team. Welcome. Thank you for having us. Yeah, it's always... Thanks for being here. Yeah, it's such a nice thing. We've covered three of these Dev Days now, and this is the first time it's been so well organized that we have our own little podcast studio in the Dev Day venue. And it's really nice to actually get a chance to sit down with you guys, so thanks for taking the time. Yeah, I feel like Dev Day is always a process, and we've only had three of them, and we try to improve it every time. And I know for a fact that I think we have this podcast studio this time because the podcast interviews and the interviews with folks like yourselves last time went really well. And so we wanted to lean into that a little bit more. I'm glad that we were able to have this studio for you all. We were kneeling on the ground interviewing, like, Michelle last year. I just saw it in post-production. We had to have people cordon off the area so they wouldn't walk in front of the cameras. People would just come up: hey, good to... I'm like, we're recording. I guess, if you guys have been to three, what stood out from today? Or what's your favorite part? I feel like the vibes are just a lot more confident. You are obviously doing very well. You have the numbers to show it. Every year at Dev Day, you report the number of developers. This year it's 4 million. I think last year was like three. I have more questions about that kind of stuff. But also just very interesting, very high-confidence launches. And then also, I think the community is clearly much more developed. There's just a lot more to dive into across the API surface area of OpenAI than last year, in my mind. I don't know about you. Yeah, and we were at the OG Dev Day, which was the DALL-E hack night at OpenAI in 2022. And I think Sam spoke to like 30 people. So it's just crazy to see the growth. Yeah, honestly, I think it's kind of similar to this podcast studio, which is: we've had a number of Dev Days now, and honestly we were slowly figuring things out as a company over time as well, both from a product perspective and also from how we want to present ourselves with Dev Day. And at this point, we've had a lot of feedback from people. I actually think a lot of attendees will get an email with a chance for feedback as well. And we actually do read those and we act on those. One of the things that we did this year that I really liked: there were some art installations and the little arcade games, which, you know, came out of engaging with the feedback. Yeah, the arcade games were so fun. I loved the theme of all the ASCII art throughout. This is my first SF Dev Day, but I've been to the Singapore one. That was actually my first. Oh, yeah, that's the one I spoke at. I saw you there. That was my first week at OpenAI. So really in the deep end. We're on a plane to Singapore. Yeah. Yeah, that's awesome. Well, so congrats on everything, and kudos to the organizing team. We should talk about some developer API stuff. So we're going to cover a few of the things. You're not exactly working on Apps SDK, but I guess what should people just generically take away?
What should developers take away from the Apps SDK launch? How do you internally view it? So the way that I think about it is, I actually view OpenAI, since the very beginning, as a company that has really valued opening up our technology and bringing it out to the rest of the world. One thing we talk about a lot internally is, you know, our mission at OpenAI is to, one, build AGI, which we're trying to do. But two, and potentially just as important, is to bring the benefits of that to the entire world. And one thing that we realized very early on is that it's very difficult for us as a company to bring it to truly every corner of the world. We really need to rely on developers, other third parties, to be able to do this. Greg talked about the start of the API and how that was formulated, and that was part of that mentality: we need to rely on developers, and we need to open up our technology to the rest of the world so that they can partake, for us to really fulfill our mission. So the API obviously is a very natural way of doing that, where we just literally expose API endpoints or expose tools for people to build things. But now we have ChatGPT with its, I don't know, 800 million weekly active users, I forgot the stat that we shared. I think it's now the fifth or sixth largest website in the world. And the number one and number two most downloaded on the Apple App Store. Oh, yeah, with Sora. Yeah, but that one moves around all the time, so it's kind of hard to celebrate. Just screenshot it when it's good. Yeah, we definitely screenshotted it and shared it when it was good. But going back to my main point: we've always engaged developers as a way for us to bring the benefits of AGI to the rest of the world. And so I view this as actually a natural extension of that. Candidly, we've been trying to do this a couple of times: last Dev Day with... I'm sorry, two Dev Days ago with GPTs, and with plugins, which was, I think, not tied to a Dev Day. So I view this as, again, we love to deploy things iteratively, and I view it as just a continuation of that process, and also engaging deeply with developers and helping them benefit from some of the stuff that we have, which in this case is ChatGPT distribution. Okay. And so Apps SDK is built on the MCP protocol. When did OpenAI become MCP-pilled? I'm sure internally you must have had design discussions before about doing your own protocol. When did you buy into it, and how long ago was that? I think it was in March, I want to say. It's hard for me to remember the exact... March was the takeoff of MCP. Okay. Yeah, yeah. So we built the Agents SDK and we launched that alongside the Responses API in early March. And as MCP was growing, and we were building a new agentic API that can call tools and just be much more powerful, MCP was the natural protocol that developers were already using to bring all their tools into their system. And I think March is when we added MCP to the Agents SDK first, and then soon after to our other products as well. Yeah, I think there was a tweet or something we did where it was like, OpenAI, you know, is... Yeah, there was definitely a moment.
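For the curious, the pattern Sherwin describes, an agent discovering its tools from an MCP server, is a few lines in the Agents SDK. A minimal sketch, assuming the openai-agents Python package and the community filesystem MCP server; the server choice, prompt, and names here are illustrative, not an official example:

```python
import asyncio

from agents import Agent, Runner
from agents.mcp import MCPServerStdio


async def main() -> None:
    # Any stdio MCP server works; the community filesystem server is used
    # here purely as an example.
    async with MCPServerStdio(
        params={
            "command": "npx",
            "args": ["-y", "@modelcontextprotocol/server-filesystem", "."],
        }
    ) as fs_server:
        agent = Agent(
            name="Docs assistant",
            instructions="Answer questions using the files you can read.",
            mcp_servers=[fs_server],  # tools are discovered from the server at run time
        )
        result = await Runner.run(agent, "Summarize README.md in two sentences.")
        print(result.final_output)


asyncio.run(main())
```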
I think there was a specific moment and a specific tweet. But what I will say, and this is honestly credit to the team at Anthropic that created MCP: I really do think they treat it as an open protocol. We work very closely with David and the folks on the, you know, steering consortium. And they are not viewing it as this thing that is specific to Anthropic; they really view it as an open protocol. It is an open protocol, and the way in which you make changes feels very open. We actually have a member of our team, Nick Cooper, who is sitting on that steering committee for MCP as well. And so I think they are really treating it as something that is easy for us and other companies and everyone else to embrace, which I think they should, because they do want it to be something that is embraced by all. And because of that, I think it makes it a little bit easier for us to embrace it. And honestly, it's a great protocol. It's very general. It's already solved. Why would you remake it? Yeah, it's very general. There's obviously still more to do with it, but it was very easy for us to integrate because of how streamlined and how simple it was. Yeah. My final comment on Apps SDK stuff, and then we'll move to AgentKit: abstractly, when you wireframe a website or an AI app, it used to be that the initial AI integration on the website would be, you have the normal website and then you have a little chatbot app. And now it's kind of inverted, where there's ChatGPT at the top layer, and then there's the website embedded inside of it. It's that inversion that I honestly have been looking for for a little bit. And I think it's really well done. Actually, all the integrations and the custom UI components that come up... you had Canva in the keynote there, and it looks like Canva, but you can chat with it in all the context of your ChatGPT. That is an experience I've never seen. Yeah, and I think that's back to the iterative learning that we've had. That, I think, was because we learned a lot from plugins. So when we launched plugins, I remember one of the pieces of feedback that we got... I don't know if people here really remember plugins. It was like March '23. One of the points of feedback was: we told all these companies that you can integrate these plugins into ChatGPT, but they really didn't have that much control over how exactly it was used. It was really just a tool that the model could call, and you were just really bound by ChatGPT. And so I think you can see the evolution of our product with this. This time, we realized how important it was for companies, for third-party developers, to really own and steer the experience, to make it feel like themselves, to help them really preserve their own brand. And I actually don't think we would have gotten that learning had we not had all these other steps beforehand. Awesome. Christina, you were the star today on stage with the AgentKit demo. You had eight minutes to build an agent. You had a minute to spare, and then you had some issues with the... I wasn't sure. Honestly, I was like, let's do a little bit less testing. I don't know how much time I killed on the widget. I was extremely stressed on the download thing. Yeah, I was stressed out. If a UI bug is what takes the demo down, I'm going to be so sad.
I think it was a full-screen, yeah, like focus thing. I heard the window wasn't in focus or something. Maybe you want to introduce AgentKit to the audience. Yeah, so we launched AgentKit today: a full set of solutions to build, deploy, and optimize agents. I think a lot of this comes from working with API customers and realizing how hard it actually is to build agents and then actually take them into production. It's hard to get that confidence and the iterative loop, and writing prompts, optimizing them, writing evals all takes a lot of expertise. So we're taking those learnings and packaging them into a set of tools that makes it a lot easier and more intuitive to know what you need to do. There are a few different building blocks that can be used independently, but they're stronger together, because you then get the whole end-to-end system. And we're releasing that today for people to try out and see what they build. Yeah. So I find it hard to hold all the building blocks in my head. But actually, chronologically, it's really interesting that you guys started out with the Agents SDK first. And then you have Agent Builder. You have a connector registry. You have ChatKit. And then you have the evals stuff. Am I missing any major components? Those are the main moving parts, right? Yeah, I think that's it. And then, I mean, we also still have the RFT fine-tuning API, but we technically group it outside of the AgentKit umbrella. Got it, got it. Yeah, so it's weird how it develops, and it's now become the full agent platform, right? And one thing that I wasn't clear about when I was looking at the demo... it's very funny, because what you did on stage was build a live chat app for the Dev Day website. Yeah, did you get a chance to try it out? Yeah, I tried to try it out. It was awesome. And actually, I kind of wanted to ask how to deploy... where's merch? Yeah, exactly. I was like, where'd you click the merch? Anyway, this is very close to home, because I've done it for my conferences, and it's a very similar process. But I think what was not obvious is how much is going to be done inside of Agent Builder. I see there are some actually very interesting nodes that you didn't get to talk about on stage, like user approval. That's like a whole thing. And, you know, transform and set state... there's kind of a Turing-complete machine in here. Yeah. Yeah. So, I mean, I think again, this is the first time that we're showing Agent Builder, and so it's definitely the beginning of what we're building. Human approval is one of those use cases that we want to go pretty deep on, I think. The node today that I showed is pretty simple, a binary approval: approve/reject. It's similar to what you'd see for MCP tools, approving that an action can take place. But what we've seen with much more complex workflows from our users is that it's actually quite advanced human-in-the-loop interaction. Sometimes these could be over the course of weeks, right? It's not just a simple approval for a tool; there's actual decision-making involved in it. And as we work with those customers, we definitely want to continue to go deeper on those use cases, too. Yeah. What's the entry point? So are developers also supposed to come here and then do the code export, like just segment the use cases?
Yeah, so I think the two reasons that you would come to Agent Builder are: one, more as a playground, right, to model and iterate on your systems and write your prompts and optimize them and test them out. And then you can export it and run it in your own systems using the Agents SDK, using other models as well. The second would be to get all of the benefits of us deploying it for you, too. So you can use, maybe, natural language to describe what type of agent you want to build, model it out, bring in subject matter experts so that you really have this canvas for iterating on it and getting feedback, building data sets and getting feedback from those subject matter experts as well. And then you can deploy it all without needing to handle that on your own. And that's a lot of the philosophy around how we're building it with ChatKit as well, right? You can take pieces of it. You can have a more advanced integration where it's much more customized. But you also get a really natural path of going live with really easy defaults. Do you see it as a two-way thing? So I build here, I go to code, then maybe I make changes in code, and then I bring those changes back to the Agent Builder? Yeah, I think eventually that's definitely what we want to do. So maybe you could start off in code and bring it in. We'll also probably have the ability to run code in the Agent Builder as well. So I think just a lot of flexibility around that. The one thing I'd say, too, is a lot of the demos that we showed today erred on the side of simplicity, just so that the audience could see it. But if you talk to a lot of these customers, they're building pretty complex flows. You've got to zoom out on that canvas quite a bit to see the full flow. And for us, we were working with a lot of customers who were doing this, and if you turn that into an actual Agents SDK file, it's pretty long. So we saw a lot of benefit from having the visual setup here, especially as the setup grows longer and longer. It would have been a little difficult to showcase this, but even on some of the... You can do it in eight minutes. Right. Yeah, you can do it in eight minutes. But even with some of the presets that we have on the website... Yeah, exactly. So one of the things that we launched today, alongside the canvas, is a set of templates that we've gathered from our engineers who are working in the field with customers directly: the common patterns they see, and our own playbooks for working with customers on customer support and document discovery. So we're publishing those as well. Data enrichment, planning helper, customer service, structured data Q&A, document comparison, that's nice. Internal knowledge assistant. Yeah. And we plan to add more of those as we build them out. I always wonder if there should be... so you're not the only agent builders, but obviously, by dint of being OpenAI, you are a very significant one. Any interest in a protocol, like interop between different open-source implementations of this kind of agent builder pattern? I think we've thought about it, especially around, I'd say, the Agents SDK.
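As a rough picture of the canvas-to-code round trip they describe: a small Agent Builder flow maps onto Agents SDK primitives like agents and handoffs. A sketch of what a two-node export might resemble, with invented node names and instructions; the real exported code will differ:

```python
from agents import Agent, Runner

# Invented two-node flow: a triage agent that can hand off to a refunds agent.
refund_agent = Agent(
    name="Refund agent",
    instructions="Handle refund requests. Ask for an order ID if one is missing.",
)

triage_agent = Agent(
    name="Triage agent",
    instructions=(
        "Route billing and refund questions to the refund agent; "
        "answer everything else yourself."
    ),
    handoffs=[refund_agent],  # the code-level analogue of drawing an edge on the canvas
)

result = Runner.run_sync(triage_agent, "I was double-charged for my ticket.")
print(result.final_output)
```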
I would actually say, maybe even zooming out a bit more from just this: yeah, we were also sitting here observing things being made over and over again. Even besides agent workflows, we're watching what the industry is doing with what we've done with the Responses API, like stateful APIs. Obviously, we were the first to launch the Responses API, but a couple of other people have adopted it. I think Grok has it in their API. I think I saw LMSYS just did something, but not everyone. So unfortunately, I don't have a great answer today of yes or no, but we are assessing everything and trying to see: hey, there has been a lot of value with MCP, hopefully with our commerce protocol as well. ACP, yeah, I definitely did not forget the name. So we're even thinking about what we want to do with agents, with the agent workflow, the portability story around that, as well as the portability, I'd say, even of the Responses API. It would be great if that could be a standard or something, so developers don't need to build three different stateful API integrations if they want to use different models. Yeah, and I think that's one of the... so it's not exactly a protocol, but one of the things that we launched today with evals, too, is the ability to use third-party models as well and bring that into one place. So we definitely see where the ecosystem is at, which is, you know, using multiple models. Third-party models, as in non-OpenAI models? Yeah, yeah. It'll work with evals starting today. Yeah. Okay, got it. We have a really cool setup with OpenRouter, where we're working with them and you can bring your OpenRouter setup. And with that, you can actually write your evals using our data set tool to create a bunch of evals, and you'd actually be able to hit a bunch of different model providers, take your pick from wherever, even open-source ones on Together, and see the results in our product. Yeah, that's awesome. Speaking more about evals, I think I saw somewhere in the release docs that you basically had to expand the evals product a little bit to allow for agent evals. Maybe you can talk about what you had to do there. Yeah, I was going to say, I actually think agent evals is still a work in progress. I think we've made maybe 10% of the progress that we need here. For example, I think we could still do a lot more around multimodal evals. But the main progress that we made this time was allowing you to take traces. So the Agents SDK has a really nice traces feature where, if you define things, you can have a really long trace, and we now allow you to use that in the evals product and grade it in some way, shape, or form over the entirety of what it's supposed to be doing. I think this is step one. It's good to be able to do this. But our roadmap from here on out is to really allow you to break down the different parts of the trace and let you eval and measure each of those, and optimize each of those as well. A lot of times, this will involve human-in-the-loop, which is why we have the human-in-the-loop component here too. But if you look at our evals product over the last year, it's been very simple. It's been much more geared towards the simple prompt-completion setup. But obviously, as we see people doing these longer agentic traces: how do you even evaluate a 20-minute task correctly?
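The OpenRouter hookup mentioned above works because OpenRouter exposes an OpenAI-compatible API, so the standard client only needs a different base URL. A minimal sketch of grading a transcript with a third-party judge model; the rubric and model ID are our own illustrative choices, not part of the evals product:

```python
import os

from openai import OpenAI

# OpenRouter speaks the OpenAI-compatible chat API, so the stock client works
# with a swapped base_url.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

RUBRIC = (
    "Score the agent transcript from 1 to 5 for: resolved the user's issue, "
    "stayed on policy, and invented no order details. Reply with the number only."
)


def grade(transcript: str) -> str:
    response = client.chat.completions.create(
        model="anthropic/claude-sonnet-4",  # any judge model available on OpenRouter
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": transcript},
        ],
    )
    return response.choices[0].message.content


print(grade("user: where is my order?\nagent: it shipped yesterday, arriving Friday."))
```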
And it's a really hard problem. We're trying to set up our evals product and move in that direction, to help you not only evaluate the overall trajectory, but also individual parts of it. Yeah. I mean, the magic keyword is rubrics, right? Everyone wants LLM-as-judge rubrics. Yeah, yeah, yeah. Obviously, we're missing this logo. Okay, great. The other thing I see the developer community online is very excited about is automated prompt optimization, which is kind of evals in the loop with prompts. What's the thinking there? Where are things going? Yeah, so we have automated prompt optimization, but again, this is an area that we definitely want to invest more in. I think we did a pretty big launch of this when we launched GPT-5, actually, because we saw that it was pretty difficult, as new models come out, to learn all the quirks of a new model. Yeah, the prompt optimizer. We have a big prompting guide for every model that we launch, and building out a system makes that a lot easier. We definitely want to tie that in completely with evals, so that we should be able to improve your prompts over time, and improve your agents over time as well if they're made in the Agent Builder, based on the evals that you've set up. So I think we see this as a pretty core part of the platform: basically, suggested improvements to the things that you're building. I actually think it's a really cool time right now in prompt optimization. I'm sure you guys are seeing this too. Not only are there a lot of products gearing around this, which is kind of what we're thinking about, but there's also a lot of interesting research around this, like GEPA, which the Databricks folks are doing really cool stuff around. We're obviously not doing any of the cool GEPA optimization right now in our product, but we'd love to do that soon. And also, it's just an active research area. So whatever Matei and the Databricks folks might think about next, what we might think about internally as well, whatever new prompt optimization techniques come out, I think we'd love to be able to have that in our product too. And it's interesting, because it's coming at a time when people are realizing that prompting... I feel like two years ago, people were like, oh, at some point, prompting is going to be dead. No. If anything, it has become more and more entrenched. There's this interesting trend where it's becoming more and more important, and there's also interesting, cool work being done to further entrench prompt optimization. That's why I just think it's a very fascinating area to follow right now. And it's also an area where I think a lot of us were wrong two years ago, because if anything, it's only gotten more important. Yeah. Shunyu, who used to work at OpenAI and is now at MSL, we call this kind of thing zero-gradient fine-tuning, or zero-gradient updating, because you're just tweaking the prompts. But there is so much prompt that you actually end up with a different model at the end of it.
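The "evals in the loop with prompts" idea reduces to a search loop: propose a prompt variant, score it on a fixed eval set, keep the winner. A toy sketch of that zero-gradient loop, where `run_eval` and `propose_rewrite` are stand-ins for a real eval harness and a rewriter-model call; GEPA-style methods are considerably more sophisticated:

```python
import random

# Toy zero-gradient loop: mutate the prompt, score it, keep the best.


def run_eval(prompt: str) -> float:
    """Score a prompt against a fixed eval set; stubbed out with noise here."""
    return random.random()


def propose_rewrite(prompt: str, feedback: str) -> str:
    """Ask a model to rewrite the prompt given eval feedback; stubbed out here."""
    return prompt + f"\n# revision note: {feedback}"


prompt = "You are a support agent. Be concise and cite the order record."
best_score = run_eval(prompt)

for step in range(8):
    candidate = propose_rewrite(prompt, feedback=f"step {step}: raise the pass rate")
    score = run_eval(candidate)
    if score > best_score:  # greedy hill climb; real optimizers are far richer
        prompt, best_score = candidate, score

print(best_score)
print(prompt)
```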
There are a lot of things that make it more practical, too, just from our perspective. We have a fine-tuning API, and it is extremely difficult for us to run and serve all of these different snapshots. You know, LoRA is great, and Thinking Machines just published... John Schulman just had a cool blog post about this. But man, it is pretty difficult for us to manage all of these different snapshots. So if there is a way to hill-climb and do this zero-gradient optimization via prompts, yeah, I'm all for it. And I think developers should be all for it, because you get all these gains without having to do any of the fancy fine-tuning work. Since you are part of the API team, you lead the API team, and since you mentioned Thinking Machines, I gotta throw a cheeky one in there. What do you think about the Tinker API? So yeah, it's a good one. It's actually funny: when it launched, I DMed John Schulman, and I was like, wow, you finally launched it. Because you used to work with him. Yeah, yeah. So it's actually funny. Right when I joined OpenAI... this has actually been, I think, a passion project of John's. He'd been talking about doing something in this shape for a while, which is a truly low-level research fine-tuning library. We actually talked about it quite a bit when he was at OpenAI as well. It's actually funny: I talked to one of my friends who said that when he was at Anthropic, he also worked on the idea for a bit. And I think he's a man on a mission. Yeah, I mean, John's so great in this regard. He's so purely just interested in the impact of this, because one, it's a really cool problem, and two, it also empowers builders and researchers. You saw all the researchers who expressed all this love for Tinker, because it is a really great product. So I'm just really happy to see that they shipped it, and I think he was really happy to get it out there in the world as well. Yeah. This is very much a digression, but it's weird, as someone passionate about API design, that it took this long to find a good fine-tuning API abstraction, which is effectively all he wanted. He was like, guys, I don't want to worry about all the infra. I'm a researcher. I just want these four functions. And it's kind of interesting. Yeah, yeah. Cool. Before the OpenAI comms team barges in the room... I know. So what feedback do you want from people on the Agent Builder? For example, the thing I was surprised by was the if-else blocks not being natural language and using the Common Expression Language. I'm sure that's something already on your roadmap. What are other things where you're at a fork and would love more input? I think one of the things that we spent a lot of time discussing was whether we want more deterministic workflows or more LLM-driven workflows. So getting feedback on that, honestly. And having people model existing workflows: a lot of what we did was work with our team, especially with engineers who are working with customers, on modeling the workflows that already exist in the Agent Builder, and what gaps exist, what types of nodes are really common, and how we can add those in. I think that would be the most helpful feedback to get back.
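For context, the Common Expression Language that surprised swyx is CEL, a small, non-Turing-complete expression language from Google that is widely used for policy and routing rules. A sketch of evaluating the kind of condition an if/else node might hold, using the third-party cel-python package; the package choice and field names are our own assumptions, and Agent Builder evaluates its CEL server-side:

```python
import celpy  # third-party cel-python package, used here only for illustration

# The kind of branch condition an Agent Builder if/else node might hold.
env = celpy.Environment()
program = env.program(env.compile('input.category == "refund" && input.amount > 100.0'))

# Workflow state flows in as the evaluation context; the field names are invented.
context = {"input": celpy.json_to_cel({"category": "refund", "amount": 250.0})}
print(bool(program.evaluate(context)))  # True -> take the needs-approval branch
```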
And then as we expand from just chat-based... right now the initial deployment for Agent Builder is through ChatKit, but we plan on releasing more standalone workflow runs as well, and the types of tasks that people would like to use in that type of API. So more modalities, for example. Yeah. I mean, for sure more modalities. Voice is already something that a lot of people have talked to us about, even today at Dev Day. So modalities for sure, but also more of the logical nodes, what can't be expressed today. Yeah. Well, you know, you're building a language, right? You have the Common Expression Language, which I'd never heard of prior to this. I thought, is this Python, is this JavaScript? And then there was a whole language in there. Was that a big decision for you guys? I think that was more just a way that we thought we could represent a mix of the variables and, I don't know, conditional statements. The other thing I'll also mention: there's a trope in developer tooling where anything that can store state will eventually be used as a database, including DNS. So be prepared for your state store to become a database. I don't know if there are any limits on that, because people will be using it. It's actually funny. I'd heard this quote before, and there's definitely some truth to it. I don't know if our stateful APIs have become a database just quite yet, but who knows? I'm in conversation with you. Well, you charge for it. You charge for Assistants storage. The storage, yeah. So there's some limit on that. Yeah, but it's very cheap. I remember we priced it. If you wanted to dump all your data somewhere... I don't know, transforming it all into this shape: it's useful, it's easy, it's the best place to put it, but yeah. But also, please don't do this, because I think it'll put quite a bit of strain on Venkat and our infra team and what we're trying to do. How do you think about the MCP side? So you have OpenAI first-party connectors, you have third-party preferred, I guess, servers, you would call them, and then you have open-ended ones. Do you see that registry-like functionality expanding, or do you see most of it being user-driven? Auth is the biggest thing. If you add Gmail and Calendar and Drive, you have to auth each of them separately. There's not a canonical auth. What's the thinking there? Yeah, I think definitely for the registry, we want to make it a lot easier for companies to manage what their developers have access to, and to manage the configurations around it. In terms of first-party versus third-party, we want to support both. We have some direct integrations, and then anyone can create MCP servers. We want to make it a lot easier to establish private links for companies to use those internally. So I'm just really excited about that ecosystem growing. Yeah, one of the coolest things I've observed, too, is that I actually think we as an industry are still trying to figure out the ideal shape of connectors. Part of why the first-party connectors exist, too, is we end up storing quite a bit of state. It's a lot of work for us.
But by having a lot of state on our side (we call them sync connectors), we can actually do a lot more creative stuff on our side when you're chatting with ChatGPT and using these connectors, to boost the quality of how you're using it, right? If you have all the data there, you can do all this re-ranking. We can put it in a vector store, or wherever else you want to put it. So there are some inherent trade-offs here: you put in a lot of work to get these 1P connectors working, but because you have the data, you can do a lot more and get higher quality. But then the question is, oh my God, there's such a long tail of other things, which is where MCP and the third-party connectors come in. But then you have the trade-off that you're beholden to the API shape of the MCP creator. It might work well, or it might not work well with the models. And what happens if it doesn't work well? Then you're kind of at the mercy of this. And MCP, by the way, is really great, because it already does some layer of standardization. But my sense is there's still going to be more evolution here. We want to support both of them, because we see value in both right now, especially working with developers. We want to have all options on the table. But it will be interesting to see how this evolves over time. Yeah. About three, four months ago, when you launched the form for Sign in with ChatGPT interest, I think to me that's kind of the vision: I log in, I have the MCPs tied in, and then I sign in with ChatGPT somewhere and can run these workflows in that app where I'm logging in. So, yeah, I think Sam said in an interview that he sees ChatGPT as like your personal assistant. So I think this is a great step in that direction. Yeah, I think there's a lot more to go in that direction. But so far, no plan on ChatGPT or OpenAI as IdP, right, which is a different role in the auth ecosystem? Yeah, it's interesting, because the direct answer is no plans right now, of course. But I actually think we currently have some version of this, which is our partnership with Apple. Because with Apple, you can actually sign in to your ChatGPT account, and some of that identity does carry with you into your iOS experience with Siri. I don't know if you've actually used the Siri integration. I actually use it quite a bit. But if you sign into your ChatGPT account, the Siri integration will actually use your subscription status to decide what type of model to use when it passes things over to ChatGPT. So if you're just a free user, you get the free model. But if you're a Plus or Pro subscriber, you get routed to GPT-5, which is, I think, what they... I think we also recently announced the partnership with Kakao. Oh, yeah. Kakao is another one. Yeah, where I think it's a similar thing where you can sign in with ChatGPT. Kakao is one of the largest messenger apps in Korea, and you can kind of interact with Kakao directly there. Yeah, I mean, Sam's been talking about it for a while. It's a very compelling vision, and we obviously want to be very thoughtful with how we do it. You know, now you have a social network, you have a developer platform. You know, my strategy was very, very valuable. Yeah, exactly. Okay, so then on the other side of the office, something I was really interested to look at, and I couldn't get a straight answer.
Is there some form of bring-your-own-key for AgentKit? When I expose it to the wider world, obviously, by default, I'm paying for all the inference. But it'd be nice for that to have a limit, and then if you want more, you can bring your own key. Yeah. I mean, we don't have something like that yet. But I think, yeah, it's definitely an interesting area too. Yeah. It doesn't do it out of the box today. But developers have been asking about it forever. It's a really cool concept, because then as a developer, especially a new developer, you don't need to bear the burden of inference. Yeah. I think when you get into the business of agent builders that are publicly exposed, where you have an allow list of domains, it rhymes with this exact pattern of someone has to bear the cost. Sometimes you want to mess around with the different levels of responsibility. Yeah. I will say, in general, if you look at our roadmap, we engage a lot with developers. We hear what the pain points are, and we try to build things that address them. And ideally, we're prioritizing in a way that's helpful. But yeah, we've definitely heard from a good number of developers that the cost is... or that all of the copy-paste-your-key solutions right now, which are huge security hazards, exist because developers don't want to bear the burden of inference. You know, hopefully we make the cost cheaper. The models keep getting cheaper. Yeah, so hopefully that helps. But what we've realized is, as we make it cheaper, the demand goes up even more, and you end up still spending quite a bit. But yeah, we've definitely heard this from a lot of developers, and it's definitely top of mind. Yeah. Do you see this as mostly an internal tools platform, though? To me, you've been doing a big push on the more forward-deployed engineering things. It's almost like, hey, we needed to build this for ourselves as we sell into these enterprises, might as well open it up to everybody. What drives building these tools? Do you think of people building tools to then expose externally, or mostly on the internal side? Yeah, I mean, again, our first deployment is ChatKit, which is intended to be for external users.
But one of the things that we also did see a lot as we were working with customers is that a lot of companies have actually built some version of an agent builder internally: to manage prompts, to manage templates that they're sharing across the different developers that they have, maybe across different product areas. We were seeing that over and over again, and really wanted to build a platform so that this is not an area that every company needs to invest in and rebuild from scratch. They can have a place where they manage these templates, manage these prompts, and really focus on the parts of agent building that are more unique to their business. It is interesting, too: from a deployment perspective, it has spanned both internal and external use cases, right? There are these internal platforms, where people use it for data processing or something, which is an internal use case. But if you saw some of the demos today, there have been a huge number of companies that are trying to do this for external-facing use cases as well. Customer service is one template in here. Customer service, like the Ramp use case. We use this internally and externally. Our own customer support help is already powered by AgentKit, and then there are various other internal use cases as well. And one of the things that I actually think the team has done a really great job of... so Tyler, David, and G1 on the team built the ChatKit components especially to be very consumer-grade and very polished. You look at that, and there's a whole grid of the different widgets and things that you could create there. Ideally, people see these as very polished, consumer-grade-ready, external-facing things, versus, you know, you think of internal tools and the UI is always the last thing that people care about. But you really push the team, and I think they did a really great job of making the ChatKit experience really, really consumer-grade. It should feel almost like ChatGPT, with really buttery-smooth animations and really responsive designs and all of that. Yeah, I think your point on widgets definitely resonates, right? Because ChatKit handles the chat UX, but we're also just building really visual ways for you to represent every action that you want to take. And that is definitely very high polish. Yeah. And when working with customers, those have been the most helpful customers for us to work with. Because when Ramp is thinking about what they want to publicly present to people, they have a pretty high bar, as they should, as do all the other customers that have been iterating on it. That kind of feedback from our customers has really helped us up-level the general product quality of the launch we had today. Yeah. Would you open-source ChatKit? We've talked about it. There are a bunch of trade-offs, I think. So ChatKit itself is like an embeddable iframe. And so I think the actual... Iframe. Yeah. And so that helps us keep it evergreen, right? If you are using ChatKit and we come up with, I don't know, a new model that reasons in a different way, or new modalities, you don't actually need to rebuild or pull a new component to use it in the front end. I think there are parts of it, widgets, for example, that are much more like a language, and that is definitely something that's easier to explore open-sourcing, as well as the design system that we've built for ChatKit. But for the actual iframe itself, I think there's a lot of value in that being a more evergreen experience that is pretty opinionated. There'd be no point in it being open source; then you don't get the benefits of it. Being Stripe alums... Stripe Checkout. It's all optimized for you. So I'm not a Stripe alum, but Christina is. And the team actually is the team that built Stripe Checkout. Yeah, so it's very similar philosophically, right? Stripe can build Elements and Checkout, and not every business needs to rebuild the pieces that are really common. And I think we see the same with chat. We see chat being built over and over again, especially as we come up with new modalities, reasoning, everything.
It's not really something that is easy to keep up to date. So we should just do that, and leave the hard parts of building agents to the developers. Does it feel... I mean, I know WordPress has a bad connotation in a lot of circles, but to me it almost feels like the WordPress equivalent of chat: hey, this is a drop-in thing, and then you have all these different widgets. Do you see widgets becoming a big developer ecosystem, where people share a widget? Is that a first-party thing? And then what's the MCP-versus-widget forest? No, exactly. I mean, it seems great for people that are in between being technical and not really being technical enough. Yeah. Yeah. I mean, I think that's a big part of building widgets, right? It's already in a language that is very consumer-friendly. In our widget builder, you can already use AI to create those widgets, and they look pretty good. I don't know if you guys have gotten a chance to try that out yet, but I definitely see, I don't know, a forest. If you haven't tried out the Widget Studio and the demo apps as well, yeah. You got a custom domain like widget.studio, which is cool. I actually don't know how we got that. Yeah, everything's in chatkit.studio. And then we have the playground there, so you can try out what ChatKit would look like with all the customizations. We have chatkit.world, which is a fun site we built. I was spinning the globe for a while this morning. It was like a fidget spinner. Kasia also uploaded some of her solar system stuff and all the demos as well. Yeah, and that's where the widget builder is. Yeah, so it's really come together. It's taken more than a year to come together and build all this stuff, but it's coming together. Yeah, it's something that we... You definitely planned all of this up front. Oh, yeah, yeah. We have the master plan from three years ago. No, but I think, especially on this stuff, there was an arc of a general platform that we did want to build around. And it takes a while to build these things. Obviously, Codex helps speed it up quite a bit now. But yeah, I will say it does seem great to start to have all the pieces fitting together. I mean, you saw we launched evals, and we've had the fine-tuning API for a while. We laid all the groundwork for some of this stuff over the last year, and we're hoping that we can eventually make it into this full-featured platform that's helpful for people. I think you have. Since you did the Codex mention, maybe a quick tip from each of you on Codex power-user tips. So there's actually a funny one that one of the new grads, I think, taught our team in general. And I think this is a point for just how new grads and younger-generation people are actually more AI-native. The tip is to really lean in, to push yourself to trust the model to do more and more. I feel like the way that I was using Codex... and for me, it's usually for my personal projects; they don't let me touch the code anymore... you give it small tasks. So you're not really trusting it. I view it as this intern that I really don't trust.
But what a lot of the interns would do (we had an intern class this year) is just full YOLO mode: trust it to write the whole feature. And it doesn't work, or worse, it doesn't work sometimes, but, I don't know, 30, 40 percent of the time it just one-shots it. I actually haven't tried this with GPT-5-Codex; I bet it probably one-shots it even more. But one tip, where I'm starting to relearn things, is to really lean into the AGI component of it: just let the model rip and trust it, because a lot of times it actually does stuff that surprises me, and then I have to readjust my priors. Whereas before, I feel like I was in this safe space of... I'm giving this thing a tiny bit of rope. Yeah. And because of that, I was limiting myself in how effective I could be. Sure. But okay, is there an etiquette around submitting effectively, you know, vibe-coded PRs that someone else now has to review? Right, and it's like, it can be offensive. Codex does reviews now. It actually reviews itself. Does Codex approve its own PRs a lot more than humans? It doesn't get to approve them. I was going to say, I think the Codex PR reviews are actually one of the things that my team very much relies on. They're very, very high-quality reviews. On the Codex PR side, for the visual Agent Builder, we only started that probably less than two months ago, and it wouldn't be possible without Codex. So there's definitely a lot of use of Codex internally, and it keeps getting better and better. And so, yeah, I think people are just finding they can rely on it more and more. And it's not, you know, totally vibe-coded. It's still checked and edited, but definitely as a kicking-off point. I've heard of people on my team, on their way to work, kicking off like five Codex tasks, because the bus takes 30 minutes, right? And you get to the office and it helps you orient yourself for the day. You're like, okay, now I know the files, I have the rough sense. Maybe I don't even take that PR, and I actually just still code it myself. But it helps you context-switch so much faster, too, and be able to orient yourself in a code base. There are so many meetings nowadays where I have one-on-ones with engineers, and I walk into the room, and they're like, wait, wait, wait. Give me a second. I've got to kick off my Codex thing. I'm like, oh, sorry. We're about to enter the async zone. It's almost like your notes, right? You're like, let me... And they're typing, like, okay, now we can start our one-on-one. Yeah. Cool. We're almost out of time. I wanted to leave a little bit of time for you to shout out the Service Health Dashboard, because I know you're passionate about it. Oh, yeah. Well, tell people what it is and why it matters. Yeah, so this is a launch that didn't get any stage time today, but it's actually something I'm really excited about. We launched this thing called the Service Health Dashboard. You can now go into your usage or your account settings and see the health of your integration with the OpenAI API. And this is scoped to your own org.
So basically, if you have an integration that's running with us, doing a bunch of tokens per minute or a bunch of queries, it's now tracking each of those responses, looking at your token velocity, the TPM that you're getting, the throughput, as well as the responses and the response codes. And so you can see a real-time personal SLO for your integration. The reason why I care a lot about this is, obviously, over the last year we've spent a lot of time thinking about reliability. We had that really bad outage last December, the longest three, four hours of my life, and then I had to talk to a bunch of customers. We haven't had one that bad since, knock on wood. We've done a bunch of work. We have an infra team led by Venkat, and they've been working with Janna on our team, and they've just been doing so much good work to get reliability better. And so we actually, again, knock on wood, think we've got reliability to a spot where we're comfortable putting this out there and letting people actually see their SLO. And hopefully, you know, it's three, four, soon to be five nines. But the reason why I cared a lot about it is because we spent so much time on it, and we feel confident enough to have it behind the product now. Five nines is like two minutes of outage or something. Yeah, yeah. We're working to get to five nines. Yeah. What does an extra nine take? It's exponentially more work. You know, over the last couple of years we were talking about hitting three nines, and hitting three and a half nines, and then hitting four nines. But yeah, it's exponentially more work. I could go for a while on the different topics, but we'll have to do that in a follow-up. I mean, that's the engineering side, right? Yes, yes, yes. You're serving six billion tokens per minute. We actually zoomed past that. Yeah, that stat's outdated. But yeah, it's been crazy, the growth that we've seen. Awesome. I know we're out of time. It's been a long day for both of you, so we'll let you go, but thank you both for joining us. Yeah. Yeah. Thanks for having us. Thanks. Thank you. That's it. How was that? That was great. Okay. We had the mics off, or...? The thing I didn't want to say on the podcast was on the Tinker thing.
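A quick footnote on the nines arithmetic: each extra nine cuts the allowed downtime by a factor of ten, which is why it's "exponentially more work," and five nines is closer to five minutes per year than two. A few lines to compute the budgets:

```python
# Yearly downtime budget for each availability target ("nines").
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

for nines in (3, 3.5, 4, 5):
    unavailability = 10 ** -nines
    print(f"{(1 - unavailability):.3%} -> {MINUTES_PER_YEAR * unavailability:,.1f} min/year")

# 99.900% -> 525.6 min/year
# 99.968% -> 166.2 min/year
# 99.990% -> 52.6 min/year
# 99.999% -> 5.3 min/year
```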
