

Amp: The Emperor Has No Clothes
Latent Space
What You'll Learn
- ✓ Amp was created as a disruptive new offering to avoid disrupting Sourcegraph's existing enterprise Cody product
- ✓ The goal is to build the best coding agent, which requires rapid iteration and adaptation to the fast-changing AI landscape
- ✓ Amp is growing over 50% month-over-month, with some teams spending hundreds of thousands per year
- ✓ The team operates in a 'duct tape and personal project' mode, with no formal code reviews or scaling assumptions
- ✓ Existing engineering practices had to be abandoned to enable the speed and flexibility required for Amp
AI Summary
The podcast discusses the transition from Sourcegraph's Cody product to their new AI-powered developer tool, Amp. The hosts explain how Amp was created as a disruptive new offering to avoid disrupting Sourcegraph's existing enterprise customer base. They highlight the need for rapid iteration and a willingness to constantly adapt to the fast-changing AI landscape, rather than trying to extend an existing product. The discussion covers Amp's growth, pricing model, and the internal challenges of running a fast-moving, experimental project alongside a more established business.
Key Points
1. Amp was created as a disruptive new offering to avoid disrupting Sourcegraph's existing enterprise Cody product
2. The goal is to build the best coding agent, which requires rapid iteration and adaptation to the fast-changing AI landscape
3. Amp is growing over 50% month-over-month, with some teams spending hundreds of thousands per year
4. The team operates in a 'duct tape and personal project' mode, with no formal code reviews or scaling assumptions
5. Existing engineering practices had to be abandoned to enable the speed and flexibility required for Amp
Topics Discussed
AI-powered developer tools, Rapid product iteration, Disrupting existing business models, Adapting to a fast-changing technology landscape, Agile development practices
Frequently Asked Questions
What is "Amp: The Emperor Has No Clothes" about?
The podcast discusses the transition from Sourcegraph's Cody product to their new AI-powered developer tool, Amp. The hosts explain how Amp was created as a disruptive new offering to avoid disrupting Sourcegraph's existing enterprise customer base. They highlight the need for rapid iteration and a willingness to constantly adapt to the fast-changing AI landscape, rather than trying to extend an existing product. The discussion covers Amp's growth, pricing model, and the internal challenges of running a fast-moving, experimental project alongside a more established business.
What topics are discussed in this episode?
This episode covers the following topics: AI-powered developer tools, Rapid product iteration, Disrupting existing business models, Adapting to a fast-changing technology landscape, Agile development practices.
What is key insight #1 from this episode?
Amp was created as a disruptive new offering to avoid disrupting Sourcegraph's existing enterprise Cody product
What is key insight #2 from this episode?
The goal is to build the best coding agent, which requires rapid iteration and adaptation to the fast-changing AI landscape
What is key insight #3 from this episode?
Amp is growing over 50% month-over-month, with some teams spending hundreds of thousands per year
What is key insight #4 from this episode?
The team operates in a 'duct tape and personal project' mode, with no formal code reviews or scaling assumptions
Who should listen to this episode?
This episode is recommended for anyone interested in AI-powered developer tools, Rapid product iteration, Disrupting existing business models, and those who want to stay updated on the latest developments in AI and technology.
Episode Description
Quinn Slack (CEO) and Thorsten Ball (Amp Dictator) from Sourcegraph join the show to talk about Amp Code, how they ship 15x/day with no code reviews, and why subagents and prompt optimizers aren't a promising direction for coding agents. Amp Code: https://ampcode.com/ Latent Space: https://latent.space/

00:00 Introduction
00:41 Transition from Cody to Amp
03:18 The Importance of Building the Best Coding Agent
06:43 Adapting to a Rapidly Evolving AI Tooling Landscape
09:36 Dogfooding at Sourcegraph
12:35 CLI vs. VS Code Extension
21:08 Positioning Amp in Coding Agent Market
24:10 The Diminishing Importance of Model Selectors
32:39 Tooling vs. Harness
37:19 Common Failure Modes of Coding Agents
47:33 Agent-Friendly Logging and Tooling
52:31 Are Subagents Real?
56:52 New Frameworks and Agent-Integrated Developer Tools
1:00:25 How Agents Are Encouraging Codebase and Workflow Changes
1:03:13 Evolving Outer Loop Tasks
1:07:09 Version Control and Merge Conflicts in an AI-First World
1:10:36 Rise of User-Generated Enterprise Software
1:14:39 Empowering Technical Leaders with AI
1:17:11 Evaluating Product Without Traditional Evals
1:20:58 Hiring
Full Transcript
Hey, everyone. Welcome to the Latent Space Podcast. This is Alessio, founder of Kernel Labs. And today there's no Swyx. He's in Europe with AI Engineer, but I'm joined by Quinn and Thorsten from Sourcegraph. Welcome. Thanks. Great to be here. Great to be here. So we already had the origin of Sourcegraph with Beyang and Steve. So we'll put the link in there. And this was when you launched Cody. And today, I guess, Cody is a brand that has passed and now it's Amp. Let's maybe start there. Obviously, Quinn, you're CEO of Sourcegraph. Thorsten, what's your role, I guess, title? How do you describe what you do? CEO is much easier. I'm not going to name my internal title, but I'm... The dictator of Amp. Yeah, that's the internal title, yeah. But it's, yeah, I'm the lead engineer. I'm one of the creators of Amp, yeah. So were you part of like the thumbs up, thumbs down on the Cody brand? Like, how did you get to Amp? Let's tell that story. I mean, I'll start. You can jump in. But basically, I came back to Sourcegraph in February. And then this was when Claude 3.7 happened too. And then Quinn and I started hacking on, you know, what if we just take Claude 3.7? And what if we give it just tools and let it go nuts? You know, like no constraints, none of the other stuff that we had in Cody, which works for Cody. Let's just start trying this out. And we started a new project and we were, you know, I remember the first weekend in SF where I would stand up in the middle of the room, like, Quinn, you gotta, you gotta see this, like, this is crazy. And then he was like, okay, let me try this. And then we went off from there, and then we realized relatively quickly that it's a different kind of product. Where Cody was very much first of its kind with RAG and assistant panels, assistant sidebar, but with, you know, a tool-calling agent, where I define an agent as a model, a system prompt and tools and the tool prompts that go along with this, that you give a lot of permissions to, so it can actually, you know, see the file system, interact with the file system or your editor, it's a different thing. And we realized we got to handle this differently. We got to reset expectations. We got to tell users that it's a different thing and they got to use it differently in some sense. And also that we cannot make it work with a $20 subscription, which back then was seen as a, you know, offensive thing to say. And now they're charging money. Yeah, exactly. But now, you know, people are paying hundreds of dollars per month, which, I've been saying this every day for the last two weeks, that's crazy to me still, like how far we've come. So, you know, this is just how it started. Like, okay, this is a different thing. We were astonished, surprised, amazed by what these models can do. So we decided let's reset expectations. Let's tell a new story. We have enterprise customers for Cody, but they have expectations. We have contracts. These are large contracts, long-running contracts. And you can't just say, guys, here's a new mode. It costs whatever, however many dollars more. It works completely differently. You need to hold it in a different way. So in order to avoid this and to avoid being disrupted, you create a new thing that kind of disrupts the business on its own, you know? That's, I don't know, want to add? Yeah. The only thing that matters is building the best coding agent. Nothing else matters. Because if you can build that, that's way bigger than anything else that came before. And to be clear, nobody has built that yet.
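Thorsten's definition above, an agent as a model, a system prompt, and tools it has real permissions to use, is small enough to sketch. The TypeScript below is a minimal, hypothetical illustration of that loop, not Amp's actual code; the ChatModel interface, message shapes, and tool names are assumptions made for the example.

```ts
// Minimal tool-calling agent loop: a model, a system prompt, and tools with real permissions.
// Hypothetical sketch; interface and tool names are illustrative, not Amp's implementation.
import { readFileSync } from "node:fs";
import { execSync } from "node:child_process";

type ToolCall = { name: string; args: Record<string, string> };
type Reply = { text: string; toolCalls: ToolCall[] };

// Assumed interface to any chat model that supports tool calling.
interface ChatModel {
  send(messages: { role: string; content: string }[]): Promise<Reply>;
}

const SYSTEM_PROMPT =
  "You are a coding agent. Use the tools to inspect and change the workspace.";

// The tools the agent is permitted to use: read files and run shell commands.
const tools: Record<string, (args: Record<string, string>) => string> = {
  read_file: (a) => readFileSync(a.path, "utf8"),
  bash: (a) => execSync(a.cmd, { encoding: "utf8" }),
};

export async function runAgent(model: ChatModel, userPrompt: string): Promise<string> {
  const messages = [
    { role: "system", content: SYSTEM_PROMPT },
    { role: "user", content: userPrompt },
  ];
  // Loop: let the model call tools until it answers with plain text.
  while (true) {
    const reply = await model.send(messages);
    if (reply.toolCalls.length === 0) return reply.text;
    for (const call of reply.toolCalls) {
      const run = tools[call.name];
      const result = run ? run(call.args) : `unknown tool: ${call.name}`;
      messages.push({ role: "tool", content: `${call.name} -> ${result}` });
    }
  }
}
```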
We are getting better and better. But I think you've seen this treadmill of tools that you use as a dev. First, it started with Copilot. And then Cody, we were really good at chat and RAG. And then Cursor and Windsurf showed that kind of IDE forks and partially agentic things could get better and better. And then, you know, the next generation, Amp and Claude Code. And now you're already seeing people say, oh, well, Codex is better than Claude Code. And there's not been any tool that has stuck with devs for more than six or 12 months or something. Six months, yeah. And we saw that firsthand. We're now on our second iteration and we are able to move so much faster given that it has a totally different name, totally different brand. And some people don't even know that Sourcegraph or the people behind Cody made Amp. That has been so good. So I do not know how, if you had an AI tool that was relevant nine or 12 months ago, how you can even bring the same brand and same customer contracts along with you and make a good product. It is so liberating to be able to say totally different. On the technical level, Cody was, or is, you know, a Sourcegraph product. So it's kind of, it works with the Sourcegraph platform. That means you're tied to the release cycle of the Sourcegraph platform. And Sourcegraph is in the cloud. We have, you know, cloud versions of Sourcegraph, but also on-prem for some customers. Completely different game. And with Amp, we basically said, let's not do this. Let's build something that allows us to ship 15 times a day. And that's what we've been doing over the last six months. Like we're still doing this and it's a game changer. Not just, you know, anybody who's done this knows this, but internally and externally, you need to reset expectations that this is a new way of how we build software. And having a new project with a new way to do it is, I think, a better way to do it than to try and get the old to move in this new way. Because it would take longer. Are there any numbers that you share about developers, like, you know, Amp usage overall? It's growing really fast. It's growing more than 50% month over month, a lot faster in, you know, some weeks. And really what we have seen too is there's a huge change in who's using it. So we have teams with like two or three people that are on annual run rates of like hundreds of thousands of dollars. So that's it. We also made a decision to not try to go to every single dev in an enterprise, which we had done with Cody. We pick off the people that want to move as fast as we want to move, that want to stay at the model product frontier like us. So it's all about just being able to move really fast. And I think that the way that agents work today, most of them are used in your editor or CLI interactively. You have one agent at most running with you at all times. That's going to be blown up with async agents when they're running 24-7 concurrently in the background. Then you can have 10 or 100 times as many, and that's going to dominate inference. That's going to dominate the output you get. So it's really, you know, Amp is growing really fast. But it's more about how do we get to be the first ones with that like 10 to 100x improvement. And everything is about how can we move fast and learn along the way. It just so happens that we are positive gross margins along the way. I would say that's one of the biggest axioms that we have with Amp is that we don't know where this ride is going.
But what we do know is that it's changing every few months. And, you know, start of the year, right, Cursor was the king and the biggest, fastest growing site of all time. Now, if you were to ask a lot of developers, what do you think is the dev tool king, I don't think they would name Cursor as the first one. And then, I think a couple months later, or maybe a couple months before, somebody, this was from somebody in sales, they said, like, I don't know what it was, but they were basically saying, blah blah blah makes Cursor look like GitHub Copilot. You know, like makes it look old and boring and enterprisey. And this is like, just think about this: Copilot is not that old. Like it was state of the art, I don't know, maybe two years ago or something. And now the world has changed completely, and we know that this is not over yet, like the changes are still coming. So from an engineering and business perspective, this is priority number one: position yourself in a way that you can react to these changes, and position your product and your expectations and your technical code base in a way that lets you react to these things as fast as possible. And then everything else flows from this. Like everything else we've done is basically based on this, that everything can change at, you know, the release of another model or something. But how are you doing it internally from a team perspective? Because, you know, obviously you have a lot of customers already on the Sourcegraph product. There's kind of like this tension of, you know, going founder mode and kind of burning the bridge on maybe some of the old use cases versus having a smaller team and a dictator for a new product. How does that look like from like a building-the-company perspective? When you have a really popular, successful product that's highly profitable, that funds a lot of this craziness. And we're able to do this also with the customer trust. So there's a lot of things on Amp that we do, like no consistent pricing, no user model choice, no checking off all the boxes that security and compliance and legal want that takes nine months. We're able to get away without doing that stuff because we have that customer trust. So that has been a big thing. It requires you to totally change how you think about an existing business. It's not a way to sell through that same channel to those same users. It's a way to use that trust and that revenue to fund crazy stuff that you got to do. But it's something that we deal with all the time. And we've got really smart devs. And yet it is hard for people to throw away everything that they have learned about how to build software. And so in some cases, it's been really refreshing to have people that have only ever been at tiny, like one-person companies. And they come here and they have no preconceived notions about how you do planning or anything like that. And that is, it's great because you can throw all of that out of the window. Yeah.
We've had, this was radical in some sense, that when we started it was Quinn and I working on main, no code reviews, nothing, and just pushing. And it was like a personal project. And I think we're both experienced engineers, so it would be, everybody owns their stuff, you push, and if you break CI you go and fix it, or if the other person is awake, you fix it or something. And it seems like when you move this fast and you ship this often, you have, you know, throughout the day there's like 15 decisions you have to make where you have to flip between the duct-tape personal project, move fast mode, and the this-is-how-they-do-it-at-Google mode. And it, you know, requires a certain expertise, or it requires also to be free from, like, the thinking of the last 15 years of always do it like Google, like we always scale up. And the base assumption behind, like, the whole Google thing was always that, oh, we found product market fit, now we have a product, let's scale this up, right? Every company I ever worked in was based on this assumption that this is the product, let's make it proper and engineer it up. But now with these changes, what's ingrained in Amp is the understanding that, well, even if it scales up, we have to be prepared that somebody pulls the rug and a new technology comes out and it kind of shifts everything. So we have to be prepared for this. And again, it all flows from this. So now in our development mode, the team is super small, you know, compared to, I guess, other companies, but I think we're around eight people now on the Amp core team. And we still don't do formal code reviews. We still push to main. We still ship 15 times every day. We dogfood this as much as possible. And it turns out that in a fast-moving environment like this, this beats a lot of other things. Like fast feedback loops and using the product yourself and dogfooding it, using the product to build the product, beats a lot of established processes, you know. And we can get away with it because we can dogfood it. And how has it been internally received? I think we have the luxury of, you know, making use of the infrastructure that we already have. For example, we have a fantastic security team, right? Security team comes in: guys, let us take care of the security stuff for Amp, you know. So this is fine. And I'm like, cool, I don't have to worry about this. Then we have infrastructure people: guys, let us take care of how to run this in the cloud. Cool. I don't have to worry about this. I can concentrate on the client or the UX idea. So this is a nice spot to be in where we can move fast, but use platform teams to kind of make sure that it doesn't break or it scales up or whatever, but still have, like, the, you know, the tip of the iceberg can melt and be rebuilt basically while the thing beneath the waterline is stable. You know, not the greatest analogy, but I think there's a distinction between, you know, like platform stuff that does work, but on the UX or product application layer, you want to be able to kind of tear the thing down and rebuild it as fast as possible. And I think that's what we're doing. One thing is you get a separate team. And then the other thing is, how do you put that team to work, right? Like if you look at like the coding agent space, I mean, obviously you started with Cody and I think there was maybe a thesis behind it. And then you had the rise of Claude Code, now you have Codex CLI, which is trying to catch up.
I would say they're maybe a little behind on the UX and all of that, but they obviously have billions of dollars to train a custom model. So that kind of weighs a lot of the option. How did you decide about the structure? So you have both a plugin for IDEs, so I use Amp Code in Cursor, but I can also go in the CLI and use Amp Code. Was that an easy choice? Like, was there a lot of discussion on we should just do one of the modes? Like supporting both is obviously more work, right? And a lot of these products don't support both. So what was that initial design choice of the structure of the product? And then we'll dive into the models as well. So we started with the VS Code extension because it was the easiest thing to get off the ground. Like when you have a VS Code extension, you have a marketplace, you can ship this, you can update it 15 times every day. You don't have to think about updating stuff. You also are next to the editor. And looking back, you know, it's been six months, the editor might be dying or you might do a lot of coding outside the editor. Back then it sounded much more radical than it does sound right now. So we started with, like, let's explore this, and having the thing next to your editor is a good place to start, and you can see the cursor, you can do selection, whatnot. But we were really, like, from the start, we didn't want to have like a deeply integrated thing. It was always like, ah, let's keep the feature small, we got to be able to move fast. And then we built up the CLI on the side as, like, a different client, which also gives us the ability to abstract, like, the core and the client stuff. So that's a nice boundary to have. But then, to be 100% honest, we were also surprised by how many people were fine with using a CLI, for Claude Code, for example. Like if you had asked me half a year ago, I would have said no way, like a CLI tool. And what we realized is, well, a CLI is not just, you know, it's a UI, sure, but also it's a CLI program. That means you can run it over SSH. You can run it in any other editor. You can run it in multiple split panes. You can run it in multiple tabs. If you want to do this in VS Code, you have to rebuild a lot of stuff and you have to rebuild the way you switch between conversations.
You have to rebuild, I mean, SSH works out of the box in VS Code, sure, but still, like, you're tied to this. And we had an experiment, an internal one, about a desktop application, so like a standalone application. And turns out, yes, that's great to have multiple agents, but you also have to reinvent everything that right now the terminal gives you for free, right? If I use Ghostty or iTerm or WezTerm or whatever, I can Command-N, Command-T, I get tabs, you know, splits, different environments per tab. You can cd into directories, you can set env vars, you get this for free, right? And if you do it in a desktop application, then you run into the issue of, you know, what people see with, like, a lot of the async agents: oh, you want to run the task, set the env vars, which directory you have to be in, what's, you know, what's in the path, whatnot. You have to do this beforehand, and in the terminal you get it for free. So that's kind of the short version of it, that we started with VS Code because it was easy and it gave a lot of feedback. We could concentrate on the stuff that matters and not worry about stuff like distribution, which VS Code takes care of. And then with the emergence of CLIs, we noticed that it's a big, big improvement, or there's other advantages to it. So then we rebuilt the CLI twice, and now we have, like, a really nice TUI with our own framework. And one interesting thing is our VS Code extension has a lot of advantages over the CLI. For example, it's easy to display diagrams. It's easy to display images. It's easy to render a bunch of stuff. Like we can do command-return to submit messages, you know, all of that stuff. And turns out we had, like, an internal poll last week at our company meetup where Beyang was asking, who of you uses the CLI and who of you uses VS Code, and it was a 50-50 split. And it's very strange that it comes out like this and there's not a clear winner, and both have advantages and disadvantages. And so right now we have both. But do you cut that data based on the level of the engineer or maybe the specialty, you know, maybe front end versus back end? How do you segment that, or do you just take it? I mean, we haven't really segmented it. If I had to, you know, guesstimate here, there's also a generational divide where I would say the younger people, you know, younger than 25, the terminal seems old to them. And they were much more inclined to use the stuff in the editor. But yeah, we don't have any fancy segmentation. I think, not to sound too dramatic, but like one of the other guiding principles that we've had from the start with Amp was, whenever somebody is like, what's the data on this, or do we have, like, analytics on this, it's like, well, did you look for it yourself? Like, did you try it out? Did you talk to customers? Like, we constantly talk to customers. That beats a lot of other stuff. So yeah, we don't have any segment analysis of who uses what and where and how. I use both. And this idea that everything is changing, it applies to this. We looked at this, we saw the way that things were going and how more flexible a CLI was. And we, about three weeks ago, we said, we think probably it's painful, but we will kill the VS Code extension for Amp. And we said that, I laid that out and I didn't like it, but it seemed like that's how things were going. And then you think about async agents, which probably need to be on your phone and on the web, or maybe you use WhatsApp to interact with them. That's a whole other mode of interaction. Well, and if it's on the web, that's like the VS Code UI, not the terminal UI.
And then there's this other thing that we're planning on doing that I can't share more about, but that also makes me think, well, actually we really need to keep the VS Code UI in. And so this thing that seems so obvious, actually there's two other completely different things out of left field that totally overturned it. So we're keeping it, and it's definitely adding some more complexity, but there's a lot of things we can do to reduce that and simplify it. But there's always a hand hovering over the button to, can we get rid of this? Like, can we shed weight? Can we get rid of the last thing? Can we reduce complexity? So we're again in the spot of, if a new model comes out, we can react quickly. And, you know, sure, it's good engineering and there's not a lot of duplication, but still, updating one client is still faster than updating two clients. So there's this constant tension between what's the most minimal product that we can have. And, you know, just to pick some other examples, there's a lot of niceties you can do in VS Code where, for example, you have recent, not recent, but a common example: you know how in VS Code you can hover over a diagnostic and then you can say, you know, fix this or whatever. And then people would ask, can't you add, like, a let-Amp-fix-this button? And it's like, you can already. Amp knows about your selection, knows about the diagnostics, it can see all of this, so you can just ask, like, fix this for me, and if you type three words it will usually do it. So that's something where it's like, well, you can already do it. It's a nicety, but let's remove this surface area, let's remove this other thing that we have to backport or keep working or whatnot. Tiny example, but there are, you know, 500 of these. But how do you think of that when the IDE is already an AI IDE? So I use Cursor, right? Yeah. There's already, like, a fix-in-chat that pops up. And they want, obviously, that button to go to their chat. Yeah, yeah, yeah. Versus, like, you guys are on the left side and it's like, just do this here. Do you feel that in a way the VS Code extension is more, like, for the people not using these, like, AI-first tools and using the features like most people? You know, I'm sure GitHub is, like, eventually going to have something good to put in VS Code. How much do you think about the VS Code extension just being maybe a stepping stone to the thing that you cannot talk about? And then the bifurcation of the TUI versus the fully async? You're not looking at anything. I think we're not trying to maximize our revenue, our user adoption, literally today, with the state of today's models and today's tools, because everything is changing so fast. So, yeah, we're not trying to fight Cursor for who's going to win the right to have users fix with our AI or their AI. Frankly, it doesn't really matter to us. I don't think that that interaction is a really important way that people are going to be interacting with AI in six months or 12 months. I don't think we learn anything from that. And we just said we're not going to do it. And users, some have definitely asked for that. And the other thing is we have to figure out what do users actually want? And they say they want a lot of things. And in the case of customers, a lot of times they'll say they want a lot of things. They'll say that they want bring-your-own-key. They'll say that they want model choice. They'll say that they want a subscription for $100 a month, or pricing to lock users out if they spend more than $30 in a day.
But actually what we've seen is they want the very best coding agent. Not everyone, not everyone, but we're focused on the ones that want the very best coding agent. And when we tell them how that thing will slow us down, then that starts this conversation where they'd rather not have something they might use 2% of the time if that means that the tool is worse. And we alone among the entire industry, it feels like we are being really honest and really bold with that. And I am really concerned, just for the rate of progress overall, that a lot of these other tools that are great, like Claude Code and Codex and Cursor and so on, that they've forgotten what made them great and what made them grow so fast, which is building the very best product. And they built it in a way that's too overfit on the current capabilities. And so they're just going to peak and then it's going to be a slow fall. And none of the software business models work if that happens. You need to have growth into the future. So I think it's best for our business, but also I think that we're trying to push the whole industry to just be radical about the changes that are coming. Yeah. When you said the best coding agent, I'm always like, is there a market for, like, the mid coding agent? You know, like, I think the model choice is a great example of, like, why would you want a model choice. I think pricing, I guess, is like the only thing that people bring up. But I think to your point, it's like, you already pay engineers a lot of money. Yeah, like the cost of, like, Sonnet 4 versus Sonnet 3.5 is kind of, like, minimal compared to, like, the 150, 200, 300K, once you do taxes and benefits and all that, that you pay to employees. So yeah, I think we're, like, in this part of the market almost where, like, people are not maxing these things. There's absolutely a market today, literally today. Someone will pay a monthly fee for that cheaper AI product today, but they're not going to be paying that in six months. It's going to be a different product or they're going to be paying for something else. And if you have that much churn as a product, you simply cannot build software in that way. But a lot of people get tempted by that, and they hear a lot of users ask for it. Six months ago it was still the game of, oh, a new model got released, and then everybody would tweet out that it's already available in their editor or whatever it is, their extension, right? And I think that's kind of over. Like, people just realize that, well, the benchmarks are one thing, right? Oh, this is the best model. Turns out it's not, in this editor, or it feels different than in this editor. So the whole, like, you know, the models are the thing, I don't want to say that's over, but it's becoming less important. And people are now also waking up to the fact that it's not just the model, it's the system prompt, it's the tools, it's the harness, the scaffolding around the model. So I can give you the choice to use Gemini 2.5 in Amp, but without the system prompt tuned to it, without, you know, what I called before, like, going with the grain of the model. The models are trained in different ways, so you want to optimize the tool and everything around it for this specific model. Without that happening, it doesn't make a lot of sense. You get the wrong signal. I can drop in a new model right now and have it available in 10 minutes, but that's not what you're after, right? You want the best possible version of this model in this tool.
And, you know, that's, I think, become more important, less, like, the model selectors and whatnot. Why do you mention the models at all? So you have Sonnet 4 for the agent, you have o3 for the Oracle. We don't. We don't show them in the product. We don't mention them at all. We put it in the manual. We have, like, an owner's manual, because people kept asking us. Well, but even then it's like, why does it matter that they ask? Because you might not... Now it's like, if you want to change it tomorrow, then it's like, you gotta tell people you changed the model. Where do you think we are on, like, the slope of, like, hey, you guys should forget at all about what model is even running, what the difference is? So I think we're going towards the future where the model will become an implementation detail to some sense, and we will end up on a different abstraction layer. And, for example, you ask, like, when would I use a mid model, right? When you put it like this, it sounds obvious, like, who wants to use the shitty version of the better version. But, you know, we're thinking actively about this. There are models that might not be as smart as Sonnet 4 as the main agentic driver, but might be 10 times as fast. And that doesn't mean that you think, well, now I need to go fast, let's use this. But I think there are different modes of working in your day-to-day work where this model in a different harness or in a different configuration can then be another way to do or get things done, versus talking to a, you know, an agent in a back and forth. So in that sense, like, we've seen this with, like, planning modes, or people use different models, but it's still, like, pretty clear that it's a different model, whatnot. But I do think it will be pushed more and more into the background, that people will choose or have different ways to interact with models, and the specific model or its version will not be as visible anymore. Yeah. And I know Cody was using StarCoder for inline edits at least as well, but Beyang said that publicly, so I'm not leaking anything. Does this still seem interesting to you to figure out, hey, is there something in open source that we can use and maybe fine-tune to, like, make better? Or are you still, like, we just want to be at the cutting edge, and, you know, that's maybe on the back burner? So first, it took people eight or nine months to figure out what 3.5 Sonnet was capable of from when it was released last June. And this was around the time we were building Amp and Claude Code came out. And you realized that, wow, like, a tool-calling agent is incredible. And at that moment, everyone, all the smartest people in the world, also realized that, and billions of dollars of money went into training new models and harnesses based on that. And now it's September 2025, and we're reaping the benefits of all that investment. And you have so many more models coming out. You have the open source models like Qwen3 Coder and Kimi K2, and they're moving so fast. You have xAI's models, you have GPT-5 that came out, and we're still figuring out how to use these things. But it would actually be an incredibly pessimistic outcome if all those smart people and all that money were not able to build anything that was better than Sonnet. So we, in our internal team right now, and this could change, we have about half of our internal team using a different model other than Sonnet as their main way of using Amp. And that's a huge change. In the past, we had done that only to test, and begrudgingly, but now we're using it.
And there's a different way of interacting with an agent that's not the linear chat transcript, that actually means you don't feel like you're getting a cheaper mid model. You feel like this is a different way of interacting where that speed is really beneficial and it's more constrained. So things are changing so fast. Does GPT-5 Codex only being available in Codex make you nervous about future availability of, like, cutting-edge models? And does that put more emphasis on, like, figuring out maybe, like, an open source strategy? They make it available to API customers, it's just delayed. And if they were doing that, I really think that, for the most part, I take these model houses at their word, and they wanted to get it out to their first-party product as quickly as possible, because they honestly need to gather more data and they're iterating in public. So yeah, I would love it if all the model houses perfectly coordinated with us before they released anything. But I know that would slow them down. And I don't want to slow them down like that, in the same way that we want our customers to give us grace and help us iterate in public. Yeah, I think there's an interesting dynamic in the market. Like when Cursor switched from Sonnet to GPT-5 as, like, the default model, that was, like, you know, 200 million of revenue for Anthropic that kind of went away and, like, moved on to GPT-5. So there's kind of, like, okay, we're all friends now, you know, but maybe later that's going to change. But yeah, it's interesting. The other thing also is that, you know, if you're building an agent and you're not at one of the model houses, you can use multiple models from different providers, right? Which is what we do. Like, when you use Amp, you're using a model from Anthropic, you're using a model from OpenAI, and you're using a model from Google. And we're also very close to shipping, like, a fast open source model that we can use as a different sub-agent in there too. And, you know, when you put it like this, it seems silly to say we only use one model from, like, this family, because they all have different strengths and weaknesses. I think we are one or two months away from a possible news cycle that is: the foundation model companies have spent billions of dollars in capex and hired like crazy, and now, you know, they're no longer the best in this realm, and there's a huge stampede away from them. That's very possible. And I'm not saying anything new. Just imagine last May, when people were counting Anthropic out before Sonnet came out. Things change so fast here. Yeah. Yeah. Yeah. And I think OpenAI, obviously with Jony Ive and some of that, it's moving more in a consumer fashion as well. So it's been interesting to see the big push on Codex. I would have imagined them to go more towards education, kind of like big... I know they have a lot of big enterprise contracts for, like, ChatGPT for your enterprise kind of thing. So yeah, you guys, I think, are in a good spot, because you have both, like, the Sourcegraph, the trust, like you said, but also, like, Amp. I see a lot of great stuff on Twitter. You know, people are like, I just put all my Amp agents running, I came back, it's great. It's like, I think it's now on that wave of, like, okay, this is, like, one of the best tools out there. Like, if you're, like, a serious engineer, you should probably use Amp, at least in some capacity, and then make your own choice. How difficult is it to think about what goes in your harness versus, like, what people should build?
So you have custom commands. You've done a great job on, like, the tooling, where, like, people can put, like, executables as tools instead of having to define, like, an MCP server. It's like, yeah, how much of it is, you're like, hey, we're just giving you the tools, versus how much you want to be opinionated? With things like, I mean, I think of, like, compacting a conversation as, like, maybe one of the key commands that people have. And, like, in Claude Code, you can give a custom prompt to compact. Like, what's that discussion like? Yeah. The main assumption, again: everything is changing, we've got to be able to move fast. That means what you want is, I don't use the picture of a harness often. What I use is, like, a scaffolding. Like, you want to build a scaffolding around the model, a wooden scaffolding, that if the model gets better or you have to switch it out, the scaffolding falls away. You know, like the bitter lesson. Like, embrace that a lot of stuff might fall into the model as soon as the model gets better, right? Because then it can remember more, whatever. Why invest three months in, like, a separate apply model when the next generation, you know, 0.7 version or 0.8 or whatever version of this model can now do all of the edits on its own? So that's, again, the bigger thing. And with that in mind, we really try to restrict a lot of the features that we add around the model. And you can do a lot of stuff. Like, we could be busy all day adding stuff in our clients and whatnot, making the product more complicated, but we don't want to. So that's the first thing. The other thing is, we're living in strange times. We're living in strange times from a product development perspective, where basically I think the old triangle of design, product, and engineering is kind of changing. It's not a triangle anymore. I don't know what shape it is, whatever, but it's not a triangle anymore. And the reason for this is because you can't build a roadmap. You can't say, this is what we're going to build in the next six months. People don't know yet how these models can be used to the full extent. Everybody's figuring this out on the go. That's another thing. The third thing there, we just talked about as well having coffee before coming here, is that the only UI basically is, like, a text UI. And you can use this in the wrong way. And the example I used earlier was, if, you know, you buy Jira, for example, but you use it for your shopping list. Atlassian is happy about this, but that's not what they built the product for, right? But you can use it in the wrong way and still get results. The problem with LLMs and a lot of the models is that you can use it in the wrong way and it looks like you're getting results. You know, like, you can use OpenAI, ChatGPT, to look up serial numbers or, you know, technical specifications for a camera or something. And it will tell you this, you know, but it might be wrong. Or 99% of the time, or 98% or 95% of the time it might work, but in 5% it might not work. So having non-deterministic LLMs as the heart of your product is something unprecedented that we have in software, I think.
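The "executables as tools instead of an MCP server" idea raised in the question above is simple to picture: a tool can just be a script on disk that describes itself and then does one narrow thing. The sketch below is a hypothetical convention, not Amp's documented toolbox format; the --describe flag, the JSON shape, and the Vitest invocation are illustrative assumptions.

```ts
#!/usr/bin/env node
// Hypothetical "executable as a tool" convention: one self-describing script per tool.
// A sketch of the general idea, not Amp's actual toolbox protocol.
import { execSync } from "node:child_process";

// When asked to describe itself, print a small JSON blob the agent can read.
if (process.argv.includes("--describe")) {
  console.log(
    JSON.stringify({
      name: "failing_tests",
      description: "Run the test suite and print only the failing test lines.",
      args: {},
    })
  );
  process.exit(0);
}

// Otherwise do the work: run Vitest and keep only the lines that mark failures,
// so the agent gets a short, high-signal result instead of the full test log.
const out = execSync("npx vitest run --reporter=basic || true", { encoding: "utf8" });
const failures = out.split("\n").filter((l) => l.includes("FAIL") || l.includes("×"));
console.log(failures.join("\n") || "all tests passed");
```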
So with that in mind, a lot of the features, what we see, you know, where people build, like, elaborate workflows, like, I have my custom slash commands and they trigger custom sub-agents and they in turn trigger custom MCP tool calls, behind which again another model is doing inference, again, and taking the input and blah blah blah. I think a lot of this will result, and has resulted, in hangovers, where people realize, oh, this looks like it's a deterministic workflow, it looks like it does the thing that I wanted to do, but actually I can't use it if it only does it 98% of the time. So that's something we're really conscious of, where I think everybody's experimenting, everybody's sharing their experiences, you know, the thread-boy tweets about what to prompt where and how, but you have to be super strict about not giving users a false sense of what the product can do and how reliable it is, because I think it's dishonest in some way and it doesn't lead to good results. And, you know, just as an example, I think over the last three months, I would say we're ahead of the curve, like, using Amp internally, like, we're ahead of, like, the mainstream agentic adoption by, like, say a month or two, where we've tried a lot of this stuff and then realized, oh, this wasn't the best use of our time or the tokens. And now you see a lot of other people waking up to this. Famous on Twitter, Armin Ronacher, the Python developer from Austria, he's done a lot of good stuff with Claude Code and shared a lot of his learnings. And you could see that the way he tweeted was super excited, like, a lot of things: I can now do this and this and this. And then a month later, it's like, oh, maybe, you know, having eight remote-controlled agents that I control with my phone and let them run for 20 hours, maybe that's not as productive as I thought it would be. And yeah, it's something that we're super conscious about. What are those things? What are, like, the failure modes that you heard from customers, where it's like, hey, we tried Amp and it just didn't work at doing X, Y, Z? Is there a collection of those that you guys use as almost like a North Star as you keep building? I think, like, one of the things is the whole vibe coding stuff, where people just use it and, you know, they're like, hey, I spent 10 bucks in tokens and it didn't build me the full app or something. The failure mode of outsourcing the thinking but not the typing, which I think it should be the opposite. You still have to know engineering, you still have to know how to program, you still have to know your application and its architecture, how it's deployed, and then basically use the agent to do the work that you would have done. But you have to know what the desired outcome is and whatnot. Like, that's a common one, where people just, you know, hands off the wheel, agent, you go and write this for me, and then turns out a couple hours later, oh, actually, nobody understands this, it's spaghetti code. Amp is different from the products it competes against. So we've had one head-to-head loss with Amp, where we lost against the, you know, usual players. And the reason why is one of them discounted their other product 100% for two years, the other one discounted it 85% for two years, which is just crazy. And we wouldn't want to do that, because are we really going to learn from that? And then how is it going to be used? It's going to be used in a different way. So usually the way that we might lose is there's some other product that will go to 80% of the devs in a company that is, like, the base layer.
Sometimes that's Copilot or Cursor, and Amp is more expensive, it's more powerful, and they'll give it to that 20% of devs that they trust more. And in a previous world, any software company would say, oh no, we need to get 100%, we don't want our competitor getting in there. But actually, that means that we're able to even more focus on being bold and crazy, because all those devs can always fall back to a Cursor or a Copilot. So we actually really like that kind of deal. The other thing there, I think a bunch of questions already touched on this, is that talking about segmentation or market or the ideal user, again, everything is changing. So what we try to do is we try to, you know, build a tool for people who are at the frontier, or at least curious about it, and want to figure out how to use these agents in the best possible way. And that's based on the assumption that if you build for the mainstream user, who, not, you know, mainstream sounds like, I don't know, it sounds bad, but what I mean is, if you build a product for somebody who does not know what a good prompt looks like, you will fall behind right now, because you will spend time and resources building stuff like the prompt enhancer and, like, blah blah blah blah blah. But then you will end up building this and you miss the next step change that might happen. So the way we think about it is we build for the people who already get that a lot of stuff is changing, but we want to leave the door open: if you're open to learning new things and you want to learn how to use AI and agents in your workflow, please come with us, we're happy to have you. But if you're skeptical and you think prompt engineering, that's a bullshit term, I don't care about this, we're not right now building a product for you, because we would fall behind. Yeah. So prompt enhancer, that's a bullshit feature that doesn't actually work. The theory behind it is nuts, because what helps LLMs is not tricks and phrasing your prompt in a certain way. It's fundamentally information that you have in your head that you can bring into the prompt. And if you don't have that in the prompt, a prompt enhancer LLM cannot magically conjure that up. It cannot narrow the search space for you. Custom subagents: the way that we disqualified that as something we wanted to build at this point is because you look at all of the tokens that you're sending to the model, and it's so many more. It's so much more convoluted. We don't think that these models are trained in a way that would support this use case and the output of this going in here. It's so much harder to debug. And MCP is another thing. MCP has done a great job in getting products to expose the verbs that agents might want to interact with. Although in most cases, they do not actually get the right verbs exposed. But as a user-facing technology, it is such a common failure mode where a user will go and add in some MCP servers. Auth is a huge pain, but let's say they get over that hurdle. Then they have, I don't know, 50 tools exposed that often are at too low-level granularity. And it takes a ton of tokens in the model. It makes everything slower and more expensive. They're often misused, and it's just not a good experience. So, you know, there's all of these things that we've said no to, and other tools are bringing them in and they're saying yes to all these things. I think it feels like they're making progress in the meantime, and people retweet and people talk about how they're able to do these amazing things.
But just the simplest example that seems so obvious, and frankly, it confounds me that more people don't do this: you make it so that my Google Docs and Notion and Linear and GitHub issues are all accessible to my agent. The vast, vast majority of developers who use Amp or Claude Code or anything else, they don't have all those context sources set up. That seems like such a slam dunk. So we built that, we ripped it out. Before we would move forward with that, we'd have to get an answer, even for our own usage: why are we not doing that? And it's frankly still puzzling to us, but we're not going to touch that until we get confident about that. And to come back to the example, you mentioned compact. We have this in the product, but again, the hand is hovering over the rip-it-out button, because I think compact is such an alluring thing, where people think, oh, you know, I ran out of context, I hit that button, now I'm back to the start. But you lose signal, you lose data. And it's something where, are the models really good enough? Is compacting good enough to really gloss over this so that the user doesn't have to worry about it? Or is it something where you would have to somehow make it clear to the user that, hey, look, your conversation has 50 messages back and forth. If you hit compact, this is all going to become blurry. You know, you're going to compress it and you lose signal, you lose fidelity. And then you put it in a new context window. Are you sure this is the right tradeoff? And some users are, but again, like, it's strange times, because now we have, like, this thing at the heart of our software, this, you know, orb from outer space that can sometimes do whatever it wants. And it's strange to build on top of this. And it's strange to educate your users about this, that this is the thing, right? Like, imagine, you know, the end of the nineties, PC era, you had to build Microsoft Word. And then you say, like, well, at the heart of this new personal computer, the Pentium 3, whatever, there's a weird orb from outer space. And sometimes if you bold text in Word, it actually makes it italic, you know? But that's the situation we're in. Like, that's the fact. Like, it doesn't always bold the text. I mean, it underlines it if you reach 150 tokens or 150,000 tokens or something. How do you teach this to the user? Yeah, and, you know, we're in the church of context engineering at the Chroma office. And when we had Jeff on the podcast, they talked about the context rot paper that they did. And they mentioned specifically in coding, for example, showing previous failures was, like, not helpful at all to the agent. And so I think when you're compacting a conversation, there's almost, like, you know, if you have a long conversation, it usually means something went wrong along the way and you had to, like, go back and forth, and, like, a bunch of things that didn't work, and you're keeping those in. But I've been trying to figure out what's that going to look like. In my mind, it's almost like, if you take the idea of Linear, which I use, I give it to my agents, because then I have a canonical prompt for one issue, because often you have to restart, because it's like, it just goes too much down the wrong path. A lot of people don't restart. A lot of people just try to keep going. Yes, that's bad.
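For readers wondering what "compact" actually does mechanically: conceptually it is a summarize-then-restart step, which is exactly where the signal loss discussed above comes from. A minimal sketch, assuming a summarize() call that is an LLM request in practice; this is illustrative, not Amp's implementation.

```ts
// What "compact" amounts to mechanically: summarize the old thread with a model call,
// then start a fresh context seeded with that summary. The lossiness discussed above
// lives entirely inside summarize(). Hypothetical sketch, not Amp's implementation.
type Msg = { role: "system" | "user" | "assistant"; content: string };

export async function compact(
  summarize: (history: Msg[]) => Promise<string>, // an LLM request in practice
  thread: Msg[]
): Promise<Msg[]> {
  const [system, ...rest] = thread;
  const summary = await summarize(rest);
  // The new thread keeps the original system prompt plus a lossy summary of everything else.
  return [
    system,
    {
      role: "user",
      content: "Summary of the conversation so far (detail was lost in compaction):\n" + summary,
    },
  ];
}
```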
But how, in that case, what can you take from that conversation as a learning and put it back in the upstream issue, so that then the issue is either more descriptive or has, like, more information? It's not compacting, but it's almost like what you would do as an engineer. It's like you're doing it in your mind, right? You get an issue and then you start working and then you kind of update your mental model. It doesn't really work for agents, but people are not doing this, like, small increment in the initial issue. I would say in this case, it's still, you cannot outsource your thinking, right? Like, in this case, I don't think you can expect right now a model to say, out of this conversation, this is the most important thing, let me put this back in the Linear thing. Maybe, you know, if you phrase it like this and automate it like this, it's always a perfect conversation, maybe it works. But I think in this case, you still have to be mindful of the context. And what we encourage users to do, for example, in Amp, is to start a lot of small threads and be really, you know, do context engineering, and be really strict about what goes into context and what doesn't. And the other thing that I think touches on this is, you know, a lot of CLI tools, for example, have super verbose output. And Bazel, sorry to call this out, I'm not a big Bazel fan, but you could just call Bazel out: super verbose output. So then the natural assumption is, oh, let's hide this from the user. You know, like, let's abstract this away and summarize the output or whatever, or just the exit code or something. And then you get into this dangerous territory where what-you-see-is-what-you-get is not true anymore, and in the context, what you see is, like, some other thing in the context, and that can lead to issues. But for me, the meta thing here too is: everything is changing. That means we're seeing this, CLI tools right now are also adapting to being used by agents. So they're changing the output too. So if you focus on the fact that Bazel will always be verbose and build something for this issue, you might be outdated in half a year, where somebody is like, no, no, no, we have a Bazel agent wrapper and now this is not an issue anymore. Yeah. One model that I have is, if you are relatively on the cutting edge of using agents and there's some persistent problem like this, it feels kind of out of band, like how the model itself will update its memory or will update the Linear issue. The model needs to be trained in order to do that better. If it's something like your own coding conventions, that's different. But if it's something fundamental that feels kind of out of band from the agent, the model needs to be trained to deal with memory better, or to accept the fact that it might have an incorrect view of its own history if you go back and edit it. And we're feeling these pains right now because people have only been using agentic coding tools for a matter of months. Most people have been using them for, like, less than three months. And if we're only feeling them now, it takes a little bit of time for a team at a model house to go and do a fine-tune of one of their really big models, or they've got other big models, the new revisions that are being trained, and they can only fit a certain number of experiments like this in. They're probably going to get half of their approaches wrong. So you can only do so much. And that's Thorsten's idea of going with the grain of the model.
And I mean, you've seen this, I'm sure, where a lot of users are going through this lesson where they, let me just add this MCP server that does everything I wanted to do. And then two days later, it doesn't use it, like, it never calls the tools, and it's like, yeah, it wasn't trained to do this. And you can sense, you know, like, they have different philosophies in the model houses. I think Anthropic is, from what I can tell, working a lot or training a lot towards using memory, like storing information, whatnot. ChatGPT obviously has this, OpenAI. So if you give it a memory thing, yeah, it might use this. But then you have the issue of, well, if I give it this other custom-made MCP that we built internally, and our processes don't map to anything that OpenAI and Anthropic have seen or trained for, it won't be used and you won't get good results. And it's super strange, right? Yeah. I wrote this article for the GPT-5 release about models self-improving for coding. So I basically asked GPT-5, what are tools that would be useful to you to be a better software engineer? And it's like, well, you know, it gave a list of, like, 10 tools. And I'm like, okay, implemented them, wrote all the tools. And then I asked it to do the same task I'd done before, but with those tools. And then it goes through the whole task. And I'm like, which of the tools did you use? And it's like, oh, I didn't use any of them. And I'm like, why did you not? It's like, you know, to be honest, I don't really need the tools. I can just do this task, you know? And I think that's, like, a good metaphor just for, like, the trend of the models, which is, like, hey, they're going to use less and less of these, like, custom-made tools to fix today's issue. I think the things that we can bet on, and I'm curious to hear your thoughts, is they're always going to have some sort of test runtime. I don't think there's going to be a world in which the model is not going to run tests and say, I'm sure this is going to work. The other one is there's always going to be some sort of infrastructure-as-code to then handle the deployment side. So I think whenever there's going to be some runtime issue, they're going to need to understand where they're running. So I think you can put them in a box, having an actual Dockerfile and whatnot, it's helpful for them to explain what they have access to. What do you think are other things that you don't expect the model to have in the model that you want to still expose to it? So we can assume it's going to test. We can assume it's going to have some definition of its environment. Are there other things that come to mind? I think test is a big one, and there's many different kinds of tests. So we had subagents in Amp, you know, among the first to come out with this conception of subagents, which is a separate context window, a more curated set of tools. And I think there's a lot of potential to take a tool like test. And right now you invoke it by the bash tool and you have some complex invocation. Too often it'll run all of your tests, which is noisy and it takes a long time. If you're in your editor and you've got something nice set up, you can hit, like, a hotkey and then it'll only run the tests that you need, you know, at your cursor. So giving the LLM a tool like that seems to have a lot of potential. And then that could even potentially be a smaller model, a fine-tuned model for that task. It could be multiple based on what projects or stack you're using. And that could eliminate a lot of the confusion.
Even with good agents.md guidance about how to run tests, I still see with AMP, and I think we've tried to make this really good, that it only gets it right maybe 90, 95% of the time. Sometimes it'll run the wrong test, or it won't escape it correctly. And I think we can eliminate that with a subagent. So there's so much more potential to go deep in areas like that. And then for every language it's a little bit different, so you have to handle all those cases. Do you feel like that will just be built by each company on their own? Or do you think there's a sane default that you guys are going to build that is going to be effective for most code bases and test structures? This is where scale helps, and we have a lot of scale. So increasingly we're able to see: this framework, the standard Go unit test package, that's easy. Vitest in JavaScript, that's easy. And once you start getting more into the long tail, then it might have to just fall back to a really good model. But I think we could probably make something that's optimized for some of these more popular unit testing frameworks. And it's a combination of deterministic stuff and non-deterministic stuff. Because right now in my VS Code, I can hit Cmd-T if I'm positioned in a test file inside one of those test blocks, and it's only going to run that one. So even that is a benefit. And now I'm mostly bottlenecked by Playwright. Yeah. It just takes a long time, man. But the crazy thing is the vast majority of devs who are building web applications with coding agents do not have Playwright. And if they have it, it is set up in such a shitty way that it cannot really log into their app. They don't have any pattern for that. So even something like that is another example of a subagent: go and try this basic end-to-end testing flow, described in natural language, against the running application. And wouldn't it be great if it could also do it in parallel? So there are all these ways that you can improve. That's a great example. And touching on this: we have coding agents, they are productive, they add value, and we cannot assume that everything around the agent, in dev tooling or code bases, will stay static. People are already adapting their code bases to be better used by agents, or adapting their tooling to be better used by agents. There's more descriptive help text, or whatever it is. So, I don't know, we should have a counter for how often I say this, but everything is changing. We cannot build right now assuming, oh, this is the tool that's going to stick around, given that all of the code bases and all of the processes and all of the dev tools will stay the same. We have to assume that this stuff will change too, and we have to stay nimble. So we have to make short bets or small bets and move forward in small steps, but always be reactive to this stuff. Again, let's not pick on Bazel again, but I think Playwright is a good example where the feedback loop is incredibly important to working with these agents, so that the agent can see whether what it's doing is actually working.
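For anyone missing that feedback loop, here is a small sketch of the kind of basic end-to-end flow described above: a Playwright test that logs into a locally running app and checks one screen. The URL, selectors, and credentials are placeholders for whatever your app actually uses.

```typescript
// A minimal Playwright smoke test of the kind described above: log into the
// running app and check one core flow, so an agent has a real end-to-end
// feedback loop instead of guessing whether its change worked.
import { test, expect } from "@playwright/test";

test("can log in and see the dashboard", async ({ page }) => {
  await page.goto("http://localhost:3000/login");
  await page.getByLabel("Email").fill("dev@example.com");
  await page.getByLabel("Password").fill("local-dev-password");
  await page.getByRole("button", { name: "Sign in" }).click();
  // If this assertion fails, the agent sees a concrete, inspectable error.
  await expect(page.getByRole("heading", { name: "Dashboard" })).toBeVisible();
});
```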
So what we've seen people do now is, well, instead of having the client log and the browser log and the database log, let's have one unified log, because then it's easier for the agent to just look at that one log and make sense of it. And then it turns out it doesn't have to be nicely formatted. It can be verbose. You can just have JSON-lines output and whatnot, because the agent can understand it much better than a human can. And I think that's just a little preview of more things we will see where you go, wait a second, this is not made for human consumption anymore. How can we optimize this for agentic consumption? And then maybe the game changes. And there are some things we get now. For example, in my Vitest suite, I have nock set up to record HTTP calls. Because especially for inference, you can't really mock, we do a classification, things like that, you just need to see what happens. So we save the whole interaction, and then the model can actually see what the API returned, in much detail. And it can reference it back in the future. So when you add a new feature, it can look at the test and see what the API usually returns. And it's like, oh, okay, it's going to have that key and the content and things like that. I think there's more of that to be done. I think there was maybe a time in which having console logs was really bad, and maybe there's going to be a console log that doesn't funnel to the actual console in the browser, but is some way for the agent to see all the details of everything that is happening. What I haven't figured out is how you instrument that. Because you cannot put a whole bunch of console logs that go somewhere else in the code, because then you're also polluting the context window of the model, right? So you need some other way to do it. But I think, yeah, the more you log, the more the model can self-iterate. And you just described like five approaches that seem absolutely worthwhile to go explore to improve how coding agents work. Somebody do it. We can do some of it at Kernel Labs, but we cannot do all of it. So somebody help.
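One of those approaches, the unified log, could look something like this minimal sketch, assuming a Node/TypeScript app: every layer appends JSON lines to one file that the agent can tail or grep. The file name and function are illustrative only.

```typescript
// Minimal sketch of the "one unified log" idea: client, server, and database
// events all get appended to a single JSON-lines file that an agent can read.
import { appendFileSync } from "node:fs";

type LogSource = "client" | "server" | "db";

function logEvent(source: LogSource, event: string, data: unknown): void {
  const line = JSON.stringify({
    ts: new Date().toISOString(),
    source,
    event,
    data, // verbosity is fine: the agent reads this, not a human
  });
  appendFileSync("unified.log.ndjson", line + "\n");
}

// Example usage from different layers of the app:
logEvent("server", "http_request", { method: "GET", path: "/api/items" });
logEvent("db", "query", { sql: "SELECT * FROM items", ms: 12 });
```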
Again, the world around us is also changing. José Valim, the creator of Elixir and, you know, a longtime Rails contributor, they have new framework tooling out, I can't remember the name, but it's... Phoenix. Yeah, it's for Phoenix, right? I can't remember the name of it, but it's about, well, what if you build a framework for an agent? What if the agent is integrated into the framework, so that if the application fails to run, you can ask the agent that has access to all of the context? And that's going to be more and more common, I think. A lot of developers will build stuff because they're fed up with copy-and-pasting stuff around, so we're going to see this in developer tools. Well, I mean, Rails was one of the first frameworks I know of that had, in the error page, a console you could use with the local context. And I think we'll get more of that. In Next you have the copy-to-markdown: whenever you have an exception, you can copy the markdown and put it in there. That's the first sign. Yeah, yeah. And in their docs too, you can copy the markdown, but you can only copy the whole page, and it's like, well, maybe I only want this section, or I want to do one, two, three. I don't know. I think that's why the Mintlifys and Stainlesses of the world, all these companies that do API docs and API generation from docs, are getting a lot of interest. I think you'll get more of that, but it's hard to get people to move over, you know. I'm sure you see it with some of the Sourcegraph customers: how am I supposed to re-instrument this whole code base that is 15 years old? It's true. But what we have said is we are explicitly building AMP for the people that do want to move. And that's been so liberating. And I think that's the great thing about what you see in the market today: you have all these companies that are so AI-first and just use it and do great. And then you go on Hacker News and it's like, I've never gotten a single good result from AI. And I'm like, well, obviously that's not true. And maybe neither extreme is definitely true, though. To me, the thing is: the people that are spending $100,000 a year on AMP with two people, obviously they're getting value. It's not like they love burning money. Yeah. But the people that are negative, to me, that's not always true, because it's easy to be negative and it doesn't cost anything, right? To put a comment that is bad. And so what's going to be the thing that forces the rest of the market to say, whatever, man, let's just get on AMP and make that work? They just have to see this work once or twice. You know, we've been in developer tooling for a long time with Sourcegraph. And it's always been hard, for the last, say, 10 years, to get a company to adopt a developer tool that does not immediately fit into their code base. Because the code base, that's the standard. Everything else has to adapt to our code base and our processes and whatnot. What we're seeing now with agents is, as soon as somebody has seen what it can do, it has such a multiplying effect, or it brings so much value, that people are willing to adapt the code base for this. It's the first time in how many decades that people are like, maybe our code base is wrong. Maybe we should change the way we develop code to make more use of this. So I think people have to see this, and then the agents will pull them along, or the value that this brings will pull them along. Yeah, I'm curious.
So I was on the board of a company called Launchable, which was founded by Kohsuke Kawaguchi, who built Jenkins. And the idea behind Launchable was, well, instead of running all of your tests, we'll use machine learning to figure out what tests are impacted by your PR and just run that small subset of them. And what we found, the company got bought by CloudBees, but in a lot of companies you'd go in there and they're like, oh, well, how can we trust it, though? Let's do a POC. And then you do the POC and it works great for the subset, but is it going to work for the whole test suite? And then you do a whole process. And I think with coding, some companies see it work on one task and they're like, it's worth trying on every task. And then there's another subset of companies that are like, well, it works a little bit on the front end, but it doesn't work on my Java service back there, so I'm not going to use it at all. I haven't quite figured out what's going to be the market pressure to make those people move along, you know. But it's like you said, for some people it needs to work once; for me, maybe it's the one task that I always use. We have built this Kernel Jam product, which is an MCP playground and tester, and I have a task which is: add YOLO mode, which lets a user toggle between auto-running tools and approving them. It sounds easy, but it's actually quite hard outside of the LLM work, because you have to stop inference to approve a tool and then run it again. And every model was failing until GPT-5 Codex; Codex CLI was the first time I got it in one shot. It made the whole thing. And I wonder if everybody should build some set of four or five tasks that are like, okay, if you can actually do this end to end, then I'm in. But I feel like people are still in denial that it's going to work. They don't want to have the conversation at all. If you look at the early adopters and the laggards, that chart of technology adoption, there's a reason why the early adopters are the tiny little start of the curve, you know, 3%. And it feels like so many of these arguments are people saying, well, what if we made a product that was for the early adopters, but somehow made the laggards also adopt it early? Why aren't we going after that big market? It's the vast majority of the area under the curve. And it's like, because they fundamentally do not want what you are building. And maybe they should, maybe they're going to realize that, but you're not going to make them realize it. Or if you waste your time trying to make them realize it, you're going to be trounced by, hopefully, people like us that are only focused on the early adopters. It's a total mindset shift. And if you are just focused on building something for early adopters and you literally do not care, and you set up your entire business and product to not have to care about the people that are laggards, you can do a much better job. And that's what we're experiencing now. Let's talk about the outer loop, because I think that's the next step, at least for me.
I think the coding agents themselves do great on a task-by-task basis, but then there's, you know, PR review, where GitHub is so slow and so clunky, and it's ordered by file. I think we should get to a world that is more semantic: hey, these are really the 50 lines of code that matter to look at, and everything else is fine, you can skim through it. How do you think about that, especially for async agents? There should be an easy way to spin them up, which I think is fairly clear. But I'm not sure there's yet an easy way to catch up on what they're doing. You know what I found when I use Conductor, like VibeCamp? I spin up five, six of them and I'm working on them and jumping between them. And then my wife is like, let's have dinner. And then we have dinner and I go back and I'm like, what the fuck is going on here again? Which one is doing what? It's hard to see, at a high level, what each of them is working on and where it's getting blocked. Have you guys seen anything that works there? Have you been thinking about building any tools in that space? I agree. Right. I feel this too. With our internal experiments, for example, there's this idea of, well, I just spawn 10 agents and they work and I control them. I think Stevie is doing this and he has a whole workflow around it, and it seems to work for him. But for me, I guess I'm a one-tasker in my mind. I can't do this. I cannot control five agents at the same time. And when I do it asynchronously, I realize that I need to be really strict about how I review what they've done, and that I also shouldn't jump between them. And then it's also making sure that you don't miss anything. I've spun up so many agents and then never checked back on them because I forgot they were actually running. So that's something you need to build into the product. But yeah, I don't think it's figured out. There's so much to do. It's wide open. We think of it right now like playing chess: you can play one board at a time, or like the people in New York City's Central Park who play against 10 different tables at once, and they go and sit down in front of a table, get oriented, make a move, and then move on. And that's what we're trying to build. And it turns out, even if you've got a coding agent running in your editor or in the CLI and it makes a big diff, you've still got to understand it. And that just becomes even more important when you have a lot running in the background. So we want to make it easier to orient yourself with what the change is. And there's a lot of stuff that is not in the realm of coding agents that would help, like having a deploy preview consistently available, so you could just click through it. And then we want to make it fast for you to make a move and then get on with your next thing. Yeah. Or, you know, just UI, so at a glance you can see, I don't know what it is yet, but at first glance you can see what the agent actually did without having to go and read through the emoji summary of "Finally, we have it" and blah, blah, blah, stuff like this.
But to come back to your question about the outer loop, I think, and if Beyang was here, he would talk for a long time about this because he's passionate about it, the inner loop has changed a lot, the write, test, review and whatnot, in that you now review a lot more code. And what effect does this have? For me, for example, we don't do any formal code reviews on the AMP team, but that doesn't mean code isn't reviewed, because we use AMP to write 80 to 90% of our code base, and everybody is supposed to review the code that the agent wrote. So it's reviewed by at least one person, right? And that's not reflected at all in GitHub yet. GitHub is still based on this other mode where you tag somebody. But then it's like, well, I actually went through two agents to produce this code and I reviewed it three times. Do I now tag five other people? Right now we're stuck in a mode where people would say yes, but I don't think it's going to hold much longer. Yeah. The other thing I noticed is merge conflicts. I used to have very few, because I know what I'm working on, and if I'm doing multiple tasks, I know how this is going to impact that and I build towards it. Versus the agents, especially when you run them in parallel: they just start to change whatever is convenient to them, and then across them, they're changing the same thing. So one thing we've been thinking about building is better cross-agent orchestration of these changes. I built, for the GPT-5 post, a task manager. It's CLI-first, and basically any agent can append what files they're touching, and then they can read what files other agents are touching and see what those diffs are, to incorporate them. But then the question is, well, maybe what they're doing now doesn't end up being the final thing, and now you're wasting all these tokens reviewing all these changes before review. I think at this point it's like, is Git well designed for this future world that we're going into? I think everything is back on the table. Maybe five years ago there were a couple of YC companies doing, oh, we're a new version control system, and I'm like, look, man, I'm not really interested in listening to this. Same with programming languages. When Chris Lattner started working on Mojo, it's like, okay, because of AI I understand why you need to build a superset of Python. And I think now with agents it's maybe clear why TypeScript should win, because type checking is very good for the model to do self-improvement. What are the other things? I think the interesting flex here is that people assume coding agents have to meet the bar of writing the exact same kinds of software to the exact same standard. And that is not necessarily an assumption that end users, consumers, will apply. If they have software that's much faster, cheaper, much more personalized, if they can conjure it up on their own, then yeah, you're going to tolerate it if the loading state of this thing doesn't quite work correctly. So changing user demands and standards is an interesting thing that you can flex here.
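For reference, the file-claim coordination mentioned above, where any agent appends what files it's touching and reads what others are touching, could look roughly like this TypeScript sketch. The file name and shape are illustrative, not the actual task manager from the GPT-5 post.

```typescript
// Illustrative sketch of cross-agent file claims: agents append which files
// they are touching to a shared JSON-lines file and read it before editing,
// to reduce merge conflicts between parallel agents.
import { appendFileSync, existsSync, readFileSync } from "node:fs";

const CLAIMS_FILE = "agent-claims.ndjson"; // hypothetical location

interface Claim {
  agentId: string;
  files: string[];
  ts: string;
}

function claimFiles(agentId: string, files: string[]): void {
  const claim: Claim = { agentId, files, ts: new Date().toISOString() };
  appendFileSync(CLAIMS_FILE, JSON.stringify(claim) + "\n");
}

function filesClaimedByOthers(agentId: string): Set<string> {
  if (!existsSync(CLAIMS_FILE)) return new Set();
  const lines = readFileSync(CLAIMS_FILE, "utf8").trim().split("\n");
  const others = new Set<string>();
  for (const line of lines) {
    if (!line) continue;
    const claim: Claim = JSON.parse(line);
    if (claim.agentId !== agentId) claim.files.forEach((f) => others.add(f));
  }
  return others;
}

// Example: an agent checks for overlap before it starts editing.
claimFiles("agent-a", ["src/auth.ts", "src/db.ts"]);
const busy = filesClaimedByOthers("agent-b");
console.log(busy.has("src/auth.ts")); // true: agent-b should coordinate first
```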
Yeah. What do you think about that? You know, we've been thinking about enterprise software moving more towards user-generated content. Expenses are a great example: there are all these expense tools, so many companies, when the core action you're doing is take one line of expense and tag it with different things. But then you have to set up all these categories and whatnot, versus just generate it for my company, and for each team separately, because they need different things. To me, it feels like more and more of that will become true. And then the real value is, you know, the underlying data store, or data stores, that you're feeding into this. And I know some enterprises have already built kind of internal Lovables, basically, where each employee can create a simple tool and then connect it to internal data stores, and they might be the only users of it. And I'm curious how you guys think about this. I know that Bolt.new, for example, now has Claude Code integration. Where do you see the line move between software engineers building software, which AMP is obviously a great tool for, versus going more upstream, where non-technical people can also plug into the code and build things on top of it? That feels in a way very different, but also very similar in the challenges you need to solve for. I think this idea of non-technical is the wrong way to look at it. There are always going to be people who are good at unambiguously specifying what they want out of a computer. And we've had non-coders, including one of our board members, who built something with AMP that replaced a 250K-a-year piece of software that he used for a lot of their internal fund tracking. He maybe took one computer science class. He hasn't really coded, but he's a really smart guy, and he knows how to unambiguously specify what he wants to his CEOs, certainly, and now to a computer as well. So if you can get people like that a tool that's really powerful, they don't think of themselves as a non-technical person. I think that's just such a bad mindset. So we want to build for the power user, and if that person has not been a coder but can pick it up really quickly, that's great. Again, we're completely focused on the people that know how to and want to get the very best out of this, that want the agent to win, that aren't trying to be like, oh, you know, nah, hey, it didn't do this thing, tell me when it does. Yeah, we had this a lot at the start, where whenever you have an AI tool, I think there's a natural tendency by engineers to catch it in a gotcha moment. You know, like, oh, I asked this and it didn't know this. And it's like, are you trying to get something out of it, or are you trying to get it to fail? And it's not worthwhile to build for somebody who wants it to fail. Yeah. Actually, if you fast-forward how the world is going, you're seeing already over the last few years that companies have really slowed down their growth in engineering headcount. This is a global phenomenon. You're seeing engineers, like here on the AMP team and at other companies that are using agents really heavily, cutting out the middlemen. They're putting the people who are building the product closer to the customer, because you can go and hear an idea from a customer.
Literally in the meeting, you can kick off an agent to go and build it, and then you have a first draft of it. So overall, the person who's using the coding agent is getting so much closer to the problem. They're also going to share more in the rewards from solving the problem, because without needing to share the profits with everyone else, there's naturally more to go to them. So if you fast-forward this, it's not that the firm or big companies are going to completely go away, but you're going to have people with an incredible vision in their head, who are so close to the problem and have an incredible incentive to go solve it. Equip them with a coding agent. If you build the coding agent that those people want, that is way better and more valuable. You're creating more value, you're allowing more new things to be created in the world, than if you were building a coding agent for the median developer that makes them 30% better. So that's who we're targeting. And I don't think that will necessarily look like vibe coding. Vibe coding is a really unproductive thing to discuss because everyone has a different definition of it. Too often it's having the agent write code with poor feedback loops and poor quality control, and I don't think that's valuable. But giving that person the ability to build something truly great, really fast, when they're so incentivized and have every desire for it to work well, that is. And I know we're getting close to time, but there are a couple of things I want to touch on. So Thorsten, I was reading through your blog. You left Sourcegraph a year and a half ago, then you joined back. Good job, Quinn, bringing him back home. Thank you, Thorsten. But when you wrote a post about leaving, one thing you wrote is that when you first joined in 2019, Quinn told you, hey, Sourcegraph is your playground, you have skills and talents, and I want you to use those skills to move the company forward. How do you take this idea of the power user getting close to the customer and apply it to how people are going to build teams overall? There used to be engineering and product, like you were saying, the triangle, and that's kind of going away. What are the types of people that you think are going to be most successful? How should people think about structuring teams? You're obviously doing this with AMP in a way, right? You're building a sub-team and sub-product within a larger company. Any tips for other founders and executives? Thorsten is incredible, and AMP would not exist in any way without him. He has a strong internal constitution about how he uses it and what's real and what's not. And it's so easy to get carried away with the hype, the possibilities, especially when you see a lot of other smart people getting carried away by it. Thorsten has this incredible ability to stay grounded. And with everything changing so fast, with it being such a hype cycle right now, that's really important. Also, just this first-principles thinking, like how we've completely rethought how we build everything in AMP based on how we should actually do it rather than what has come before. Thorsten is the rare person who's been at bigger companies, who's seen how Sourcegraph builds enterprise software, you know, not the Google way, but in a different way, and has taken the parts of it that work and not the parts that don't.
So all of that, combined with someone who's an incredible engineer, incredible writer, communicator, that's a really powerful combination. So find those people. And then what I said when he rejoined is: he is the dictator, which made him feel really uncomfortable, as you can see. I hope you cut to his face. But that's exactly what you have to do. You just put so much trust in people like that. And that also shows everyone else at the company that they can do crazy stuff, that they can go way beyond, they can take it to the extreme, they can make mistakes, and that's still okay. Because we're not trying to build something that's going to go really big in its current state. AMP is growing incredibly fast, but the most important thing is we're building the coding agent god, that thing in the future. And that's something that we're all in search of. So none of the mistakes, none of the successes in the month-to-month timeframe really matter. It's all about getting ourselves on the right trajectory, and you've got to do crazy stuff. So equipping Thorsten to do crazy stuff, and to take the ideas that he has and make them scale up with all the reach that Sourcegraph has, that's been my goal. On the first-principles thinking, how do you think about that? There's the world of evals and there's the world of vibes, right? Yeah. How do you approach it? How do you look at the product and say, okay, this is good, this is bad, this is what we need to improve? Is there something formal that you guys use internally, or is it mostly you as the dictator deciding directly? Two-part answer. The first part, to also answer the other question a little bit: what I've seen become more important, or the shift I've seen, is that, you know, I mentioned the triangle of PM, designer, engineer. I think as an engineer, or any of the three, you now need to know a lot more about the other parts. As an engineer, you cannot see yourself anymore as the person who turns a spec or a product PRD into code. You need to be aware of the business, you need to be aware of the product, you need to have some taste for software. Otherwise I think the value of your work will diminish over time, because the pure typing out of code, for most code, exceptions being a John Carmack and whatnot, for most code, the value will diminish. And we've already seen it: compare the value of a GitHub contribution chart today to, say, two years ago, right? And to come back to the second part, vibes and whatnot: we don't have any set evals. We don't. And this was controversial up until a week or two ago, I think, when Boris from Anthropic said they don't have evals for their coding agent either. But we don't, and we haven't had them. I've built evals before. I've fine-tuned models before. I know that they're good. I love evals. I was addicted to LLM-as-a-judge. I wrote about LLM-as-a-judge. But for a coding agent that's supposed to work in many different code bases, with many different types of prompts, with many different types of tasks, it's a time investment that we cannot afford with everything changing and having to stay fast. And if you ship 20 times a day, you will get a lot of good feedback. I swear to you, I could tune my system prompt a little bit now, and by this evening people on our team would go, why does it call this tool so often? What's going on?
What did we ship? And that's incredibly valuable feedback. It's incredibly valuable when people dogfood the product and use it all day. And how do I make these calls? I don't know. I think it's experience. I think about software a lot. I love using software. I listen to a lot of business podcasts, I read a lot about business, I listen to a lot of software podcasts, I read a lot about software. And then I try to project: what does the business need? How can we get growth to 10x? How can we get our users to 10x? How can I use my engineering capabilities in service of the business to reach those goals? How can I organize the team, or get the team to help me reach those goals, or reach them together? And it's hard to explain, but I feel like in this year, truly, here at Sourcegraph, everything I learned over the last, say, 15 years of my career is coming together, in the sense that all of the hours spent listening to the Acquired podcast have helped me as much as reading Hacker News for however many hundred hours, and writing code for however many thousand hours. And with code now being a tool that you can wield much more easily, much faster, much more often, I think it's become much more important how you want to wield it, and when, and for what reason. I think "hard to explain" is a great explanation of why you just cannot one-shot create these things, because there's a lot of implicit preference. Awesome, guys. Anything to wrap? Calls to action? Are you hiring? Who should reach out to you? Any requests for startups? What should people build that is going to be helpful to you guys? Yeah, I don't know. We're always interested in talking to fellow engineers who are interested in agentic programming and figuring new stuff out. We want to hear from them, what works and what doesn't. We're always willing to hire people with exceptional talents who are fully in this and realize that programming is changing a lot. And I don't know, what else? If you want to come on this journey with us and see where coding agents are going, then come along. Yeah, use AMP. Send us your feedback. And we are just so excited. We feel like kids in a candy shop, just that we get to go build the future of coding. Feels like the final boss. Yeah. Nice. Thank you guys for coming on. This was fun. Thank you. Thank you.