Inside Cursor: The future of AI coding with Co-founder Sualeh Asif

Gradient Dissent • Lukas Biewald

Tuesday, April 29, 2025 • 49m

What You'll Learn

  • Cursor was built by founders interested in scaling laws and language models, with the goal of creating an end-to-end coding assistant.
  • The team experimented with various approaches, like document-based coding and next-action prediction, to find the most useful features for users.
  • Cursor's founders were early Vim users, but switched to VS Code due to the popularity of GitHub Copilot, which they saw as a killer feature.
  • Cursor's success is attributed to its focus on building a reliable and useful product, being first to market with key features, and leveraging user feedback to continuously improve.
  • The founders' background in competitive coding provided valuable insights that shaped Cursor's development, as they understood the need for efficient and powerful coding tools.

AI Summary

The episode discusses the story behind Cursor, an AI-powered coding assistant co-founded by Sualeh Asif. Cursor was built to leverage the capabilities of large language models to enhance the coding experience, with features like next-action prediction and repository-wide editing. The founders' background in competitive coding and early adoption of tools like Vim provided valuable insights that shaped Cursor's development. The episode highlights Cursor's focus on building a reliable and useful product, iterating based on user feedback, and staying at the forefront of AI-powered coding innovations.

Topics Discussed

#Large language models · #AI-powered coding assistants · #Coding workflow optimization · #Product development and iteration · #Competitive coding and coding efficiency

Episode Description

In this episode of Gradient Dissent, host Lukas Biewald talks with Sualeh Asif, the CPO and co-founder of Cursor, one of the fastest-growing and most loved AI-powered coding platforms. Sualeh shares the story behind Cursor’s creation, the technical and design decisions that set it apart, and how AI models are changing the way we build software. They dive deep into infrastructure challenges, the importance of speed and user experience, and how emerging trends in agents and reasoning models are reshaping the developer workflow. Sualeh also discusses scaling AI inference to support hundreds of millions of requests per day, building trust through product quality, and his vision for how programming will evolve in the next few years.

⏳ Timestamps:
00:00 How Cursor got started and why it took off
04:50 Switching from Vim to VS Code and the rise of Copilot
08:10 Why Cursor won among competitors: product philosophy and execution
10:30 How user data and feedback loops drive Cursor’s improvements
12:20 Iterating on AI agents: what made Cursor hold back and wait
13:30 Competitive coding background: advantage or challenge?
16:30 Making coding fun again: latency, flow, and model choices
19:10 Building Cursor’s infrastructure: from GPUs to indexing billions of files
26:00 How Cursor prioritizes compute allocation for indexing
30:00 Running massive ML infrastructure: surprises and scaling lessons
34:50 Why Cursor chose DeepSeek models early
36:00 Where AI agents are heading next
40:07 Debugging and evaluating complex AI agents
42:00 How coding workflows will change over the next 2–3 years
46:20 Dream future projects: AI for reading codebases and papers

🎙 Get our podcasts on these platforms:
Apple Podcasts: https://wandb.me/apple-podcasts
Spotify: https://wandb.me/spotify
YouTube: https://wandb.me/youtube

Follow Weights & Biases:
https://x.com/weights_biases
https://www.linkedin.com/company/wandb

Full Transcript

You're listening to Gradient Dissent, a show about making machine learning work in the real world, and I'm your host, Lukas Biewald.

Sualeh Asif is the CPO and co-founder of Cursor, one of the best loved and most exciting and popular AI products out there. It helps you with coding; it helps you use LLMs to do coding. I use it all the time and I really love it, and I was just excited to ask him about how he built such a great product. I found his answers super interesting, and I hope you enjoy this interview.

All right, well, thanks so much for taking the time to talk. I guess maybe this is a softball question, but I was really interested in just hearing the story of Cursor: how you started it, and what the moment was where it really started to take off. Because now it's one of the most loved products out there, I think.

That story comes from the fact that we had been really interested in scaling laws, and back in college I had gone and worked on a search-engine-type company with a friend. There we were really bullish on language models, because it felt like language models could really compress all the world's information, and there should be this end-to-end index for searching the internet instead of the many heuristics we've coded in over the years. It felt like that should be the end-to-end way of doing things. So: scaling laws, doing the search engine, training large models at the time.

I think Copilot was the first really big moment for us, where it was this product that was truly magical. It was fast. It felt like it kind of knew you. But then Copilot did not improve much over the coming year or two. And for us, when we saw GPT-4, we thought the ceiling for what a really, really great product could be at that moment was really high. And then it was pretty clear that as the models got much better, as scaling laws progressed, the product that could be built in the future had an even higher ceiling. That was a super attractive thing to go do. And we're all coders at heart, and we wanted to be building things that we use every day.

Cursor was originally built for ourselves in many ways, and it was fun seeing that everyone else really liked it. It was definitely built for ourselves. And we were experimenting: a lot of the early culture of the company was experimenting with various different ways of using the models. Should there be a document that you're typing things out in while the model is coding things? Should there be next-action prediction, where you're in a location and the model suggests what the edit should be, and maybe tells you where to go next? Should you be able to make edits over your entire repository? Some of those things have taken a year, a year and a half, several iterations, and some of them we've continued building on. So now one of the core parts of the product is this next-action prediction thing, where it predicts your next edit at the cursor's location and then where you should be going next, and people really, really love that feature. And then we're working our way towards: you should just be able to make any edit you want across the entire repository, codebase-wide. And obviously there are some hurdles along the way that we'll talk about.
Some easy, some still quite difficult. Models still struggle with what exactly the architecture of the repository is. If you ask what the architecture of the repository is, that's really quite difficult, because it requires looking at potentially billions of tokens, tens of billions of tokens, and asking the question: what is really going on? As opposed to, say, listing the function names, which doesn't really tell you what is exactly going on.

Well, I want to dive into that as much as you're comfortable sharing. But I guess I wanted to ask you first: one of the surprising things that I learned in my background research on you is that I think you guys came from using Vim, not VS Code. Is that right?

All of us were really early users of Vim. We did eventually switch to VS Code. There were a couple of us, Aman and Arvid, who were probably the last to switch over from Vim to VS Code, and the trigger there was GitHub Copilot.

Oh, I see, so GitHub Copilot actually pulled you over in the end.

I had switched over before, but Aman and Arvid only switched over after GitHub Copilot arrived. It was just the killer feature, right? In some ways it was the killer feature.

Totally. And why doesn't something like Vim actually have something like what you guys built? It seems like a lot of smart coders like to use it. Is there something about a graphical interface that lends itself to this kind of structure, like coding with an AI?

I think for us, VS Code is, for one, pretty clearly the most loved platform on the internet. For coders, it's the thing that is sort of the de facto standard. It's the default.

Totally.

And we wanted to incrementally evolve it towards the world where you're starting to automate coding. The Cursor of one year from now should look very different from the Cursor of today, which means almost by default it should not look exactly like VS Code. But even in looking very different, you want to start from a place where you don't just have a text box to code in, because coders still want to type characters, right? You want to be able to edit your entire repository at a higher level, but at some point, if you find there's a change you can quickly execute in 10 keystrokes, we want to let you dive into the details. At any point in time, maybe a year from now, you're editing some pseudocode representation that's really quick to edit, and the model is working for you in the background. But if you're writing some kernel and you want to go in and work on some of the indices, it's much easier to do it by hand. I think developers will always want this ability to go in; unless we truly believe that everything is going away, you really, really want the fine-grained control.

Yeah, yeah, that makes sense. One thing that strikes me from what you were saying earlier, about observing that Copilot was really great and there's all this opportunity in how you work with these AI models, is that I think a lot of other people thought that at the same time. You had this idea that I think many people had, including a bunch of YC companies and other products that I saw. It seemed like Cursor emerged as the winning one among these.
It seems like there was great product execution here, which I'm always really interested in. Do you have a sense of what you were doing differently than your competitors that made your product work so well? Was it certain decisions, or was it a process?

"Why" questions are always really hard. I don't know; it's very hard to tell exactly what we did, right? I think there were a bunch of things where we always tried to push the ball forward as much as possible. We always wanted to be the most useful product at any moment in time, at the frontier. It's very easy to over-promise and under-deliver, and a lot of what we have tried to do is the opposite. We didn't ship the agent until we were very confident it was something that was really useful, and we had probably done three prototypes before the one we did ship, because some version of the model would just lose track. You could make something that helps you in the short term and really hurts what people think of as a reliable product in the long term. Maybe that is part of it.

But then also, I think we've been first to a lot of the inventions that people really like. So, more recently, the ability to jump to the next location that should be edited is something we've had for closer to eight months, ten months, a year. And we hopefully will release a much more upgraded version of it soon that will be quite a bit better. Only recently have other people tried to do that. So we've always tried to think about what's coming and at least have a prototype out as soon as we think it's something that's really useful.

There's the tab-to-jump feature, there's the apply feature, yeah.

And we've also done this at scale. I think that has helped: for example, for our custom tab model, we do something like 100 million requests a day, and it's quickly growing. Part of doing it well has been being able to do it reliably for lots and lots and lots of people.

Do you think that any of the data that you have, or the feedback that you have from users, is part of your success? Or are you more making the decisions through your own experience?

The data has definitely been enormously useful. The feedback loops that people consider obvious are indeed extremely useful: you want to be the company that ships an extremely good product that everyone loves, and that definitely helps in making the next version even better. It helps in small ways and it helps in training models. The small ways are that you understand how people are using your product and what the most important thing to ship is at any moment; and then in big ways, in training models and improving the core workflows.

For example, technically speaking, one loop in the apply use case is: you train your first version of apply, which is quite a bit bigger, and you deploy it for all users. You get lots and lots of data, and then you can distill a slightly smaller model. That gets faster, people use it even more, and you then distill an even smaller model. You can keep compressing the models down because you're generating the data that allows you to do that. It's this feedback loop, and then some of the things get faster. So for now, up to a 1,000- or 2,000-line file, apply feels effectively instant. And that's how we wanted it to feel, right?
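
To make the distill-and-shrink loop described above concrete, here is a minimal sketch in Python. It is an editor's illustration, not Cursor's actual pipeline: the function names, data shapes, and model sizes are all hypothetical. The currently deployed, larger apply model serves traffic, its full-file rewrites are harvested as training data, and a smaller student is trained on them and redeployed.

    # Hypothetical sketch, not Cursor's actual pipeline: the distill-and-shrink loop
    # described above. The larger deployed "apply" model serves traffic, its full-file
    # rewrites are harvested as training data, and a smaller student is trained on
    # them and redeployed. All names, sizes, and data shapes are illustrative only.

    from dataclasses import dataclass
    from typing import Callable, List, Tuple

    ApplyFn = Callable[[str, str], str]  # (original_file, edit_snippet) -> rewritten_file

    @dataclass
    class ApplyExample:
        original_file: str    # file contents before the edit
        edit_snippet: str     # loosely specified edit coming from the chat model
        rewritten_file: str   # full file as rewritten by the deployed apply model

    def harvest_examples(deployed_apply: ApplyFn,
                         traffic: List[Tuple[str, str]]) -> List[ApplyExample]:
        """Run production-like traffic through the current apply model and keep its
        full-file rewrites; these become the distillation targets for the student."""
        return [ApplyExample(f, e, deployed_apply(f, e)) for f, e in traffic]

    def distill_student(size: str, data: List[ApplyExample]) -> ApplyFn:
        """Placeholder for supervised fine-tuning of a smaller model on the
        (original_file, edit_snippet) -> rewritten_file pairs."""
        def student_apply(original_file: str, edit_snippet: str) -> str:
            # Trivial stub so the sketch runs end to end; a real student would
            # generate the rewritten file itself.
            return original_file
        return student_apply

    def distill_loop(deployed_apply: ApplyFn,
                     traffic: List[Tuple[str, str]],
                     sizes: Tuple[str, ...] = ("large", "medium", "small")) -> ApplyFn:
        # Each turn: the bigger teacher serves users, a smaller student learns from
        # its outputs, and the student becomes the next (faster) deployed model.
        for size in sizes:
            data = harvest_examples(deployed_apply, traffic)
            deployed_apply = distill_student(size, data)
        return deployed_apply
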
We wanted apply to feel deterministic, as if we'd figured out some deterministic algorithm to place the blocks, but that's not actually what's happening. It's a model that is actually rewriting the entire file. A lot of the improvements have come from making the model smaller, and there are obviously improvements in just making the inference much faster when doing these speculative edits.

For something like the agents that you talked about, where you had some iterations that weren't useful enough to ship, how did you know it wasn't good enough to ship? How did you think about that?

I didn't use it on a daily basis. I think these things are really quite easy to figure out. If you're coding 10 hours a day in Cursor, booting up the editor, making the improvements, and seeing it on a daily basis, then if the devs themselves don't use something every single day, it's probably not something that everyone else will want to use. There are obviously corner cases, because we're not perfectly representative coders, but a thing like an agent is such a general feature that if you're not using it, it's almost certainly not useful.

That actually leads me to another question I had, which is that you and your co-founders have this background in competitive coding, right? Do you think that's an advantage for you? Because I could imagine that it might put you at the forefront of wanting to be efficient in coding, but I could also imagine that you might have idiosyncrasies in the way you want to write code that are different from your general user.

I think we're not only competitive coders. We did competitive math and code; that's part of the background, and it's always really hard to distinguish what part of your identity is the most important. But many of us had worked at software companies before, Stripe and the like, so we had some idea that production coding was very different. And people had actually built products: I think Michael had spent quite a bit of time building these high-performance games, and we had done modeling work. So we had seen quite a wide variety of coding. Bringing it back to whether competitive programming really affects how you do coding on a day-to-day basis: not really. We knew what engineering was, we were doing day-to-day engineering, and you could see whether the agent was helpful. In this case it was very clear that, for example, early iterations were not really that useful; it was very slow.

One of the most important things that changed there is the length of the context window you can use on every single keystroke, on every single request. When the models started out, you would do these 4K or 8K context windows, and even if the models nominally supported them, they were not very good at using large context windows. Now that the cost curve has come down for language models and you can do requests on the order of 50,000 or 60,000 tokens reliably, that has helped enormously. One intuition to have here is that if the model can't even read your current file, it's not going to be very useful, let alone read the rest of your repository, or do searches, or do lots of the other things you expect a basic agent to be able to do.
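
As a rough illustration of why context-window size matters so much here, the back-of-the-envelope arithmetic below assumes roughly four characters per token and 40-character lines; both are common rules of thumb, not figures from the episode.

    # Rough back-of-the-envelope only: assumes about 4 characters per token and
    # 40-character lines of code, which are common rules of thumb rather than
    # figures from the episode.

    CHARS_PER_TOKEN = 4
    AVG_LINE_LENGTH = 40  # characters per line, including indentation

    def approx_tokens(num_lines: int) -> int:
        return num_lines * AVG_LINE_LENGTH // CHARS_PER_TOKEN

    for lines in (500, 2_000, 10_000):
        print(f"{lines:>6} lines ~= {approx_tokens(lines):>7,} tokens")

    #    500 lines ~=   5,000 tokens: already most of an 8K window
    #  2,000 lines ~=  20,000 tokens: does not fit in an 8K window at all
    # 10,000 lines ~= 100,000 tokens: exceeds even a 50,000-60,000 token budget,
    #                                 so retrieval has to pick what to include
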
That wouldn't work with 8K tokens; what can you even fit in there?

Another interesting thing that you guys said in your interview with Lex Fridman is that you wanted the experience of using a code editor to be fun, which I thought was kind of a cool idea, a little bit surprising, right, since it seems like such a utilitarian thing. It reminded me that when I switched my default LLM in Cursor from Sonnet to o1, I actually think I started coding a little less: I was having a lot less fun because the latency was higher. It took me a little while to realize that, but Sonnet had lower latency and that made it just a lot more fun, and I was like, you know what, I just need to go back to the LLM where I was enjoying writing code. So I do actually relate to what you're saying, but I'm curious how the idea of a fun experience shows up in your utilitarian-feeling application.

There's always this end metric, right? The end metric is how much we're enjoying using a model, and it's been very clear that we enjoy using Sonnet more than o1. Part of it is a few things. One is that Sonnet, even at scale, is reliably quite fast. And we want to ship models that are even faster, that are better than Sonnet, that have much longer context windows, that can make edits reliably over a much larger part of your codebase, for exactly the same reason: it becomes much more fun. In some sense it's a hard-to-pin-down feeling, but you know what really affects it. You will get bothered if you have to explain to the model again and again what you're doing, or you will get bothered if the model doesn't understand that you had some easily viewable file open and it doesn't see it. That's just straight-up annoying. So you can turn it into some methodical thing you can track down, but some of the inventions are just: wouldn't it be more fun if such-and-such happened? Wouldn't it be more fun if, once you started doing a refactor, you could just tab, tab, tab through the entire thing, like ten tabs? What would that take? And once you think, oh, ten tabs would make me feel really, really happy, you can reverse-engineer the exact modeling work you'd have to do: what size of model you want to train, how much time you want to spend pre-training, post-training, and RLing the models to be able to consistently do the same behavior again and again.

Another concrete example: you could always over-train the tab models to the point of being annoying. If you only worried about making sure it suggests an edit every single time, you would over-trigger it. Sometimes you really want to be writing a kernel; you want to spend some time thinking, and you don't want the tab model bothering you. That's the kind of thing you only care about if you're making it fun and enjoyable, as opposed to something that's obviously just always over-predicting.

But this is a pretty subjective experience that you probably couldn't pull from user data. So how do you work through that internally?
Do you ever have a difference of opinion among yourselves about which approach is more fun?

I think some of these decisions are subjective, but if you think them out, they're not always that controversial.

Interesting.

At the end of the day, you're trying it out. There's always some intuition where you might over-trigger in some direction, but for the most part, I think there's not that much argument over whether Sonnet is more fun or o1 is more fun. I mean, Sonnet is arguably better. Hopefully there will be more models that are optimized towards keeping you in the flow. I think you need two categories of models. You need the category of models that is RL'd towards being fast, with super large context windows, that just makes edits across your entire codebase and makes you feel like you're breezing through things. And you want a category of models that is trained to be extremely careful, reviewing every single small thing before making the edit: maybe doing a bunch of research, making the edit in the background for you, and then coming back to you with a PR. In that case, the thing that would be fun is if they're more correct than not. So fast is not the only thing that's fun; it's also being correct, or how they prove to you that they're doing the right thing.

I guess as you build a bigger brand and you build trust with users like me, why are you even asking me what model I want to use? I'm sort of aware of the different models, but I would trust you more to know what's going to be fun and useful for me. Why are you even exposing that?

I think you're kind of right. Part of building the trust has been always showing exactly what we're using, and I think you're probably correct that we should have a default mode, and you should use the default and feel happy. But if you're the kind of person who wants to pick the model and perfectly fine-tune every single thing, you should be able to do that; and then there should be a simple default. There should be a release in a week or two that fixes all of this for you.

Oh, here's something I've been wondering about myself quite a bit. Do you think there are best practices for changing the structure of my own code base, or the way that I code, to make your product work even better? For example, we have one engineer who's been letting the LLM put notes inside the code base, helpful things to help it understand the code base.

One of the things we've been speculating about, and we don't actually have a really correct solution there, is this idea that maybe there should be a readme.ai.md in every folder, with the idea being that at any point in time, if you ask for changes around any folder, the model should be able to look up the nearest place where there's an architecture written down that it can read. On the technical side, the thing to understand is that the models are much faster at reading tokens than humans, orders of magnitude faster at ingesting these tokens, but humans have, for example, some small things memorized, whereas the model is starting from scratch every time. So there are obviously small differences in how we code. For instance, Cursor Tab in our code base is named CPP, for Copilot++.
And the model always needs to be reminded that whenever I say Cursor Tab, it should actually search for Copilot++, or something like that. So there are these facts and rules that are quite important. I don't want the default to be that everyone has to change their way of coding. I think the obviously better approach is that we just figure it out: we should spend all the time, energy, and compute we need to really nail down the architecture that you have, and really figure out all the facts and rules. I don't know if I have any interesting, controversial ideas for how that should be done. Someone was joking that maybe we should email you ten rules in the morning and you'd just say yes or no to each, and hopefully we'd build up a corpus over time. You want a system that allows you to add rules and then prune bad rules: if you just ask the model to look at a PR and give you some rules, sometimes it will come up with bad rules, and you need a way of pruning them out. So what is the minimal set of rules such that all your PRs become much easier? Does the model need to look at all of the rules? We're still figuring it out. But I think there's something important at the core of this, both in terms of how humans would change and in terms of what we should change just to make the defaults much better, because not every single person will change.

Of course. But, for example, do you think smaller file sizes are better, because the model can more easily navigate the code hierarchy? Or do you think that creates complexity?

There's always some trade-off. The funny joke is that sometimes people will keep adding to the same file more and more until the model can't edit it anymore, and then you just ask the model to refactor that file for you, because, in Cursor terminology, you keep composing into the file more and more. It seems pretty clear to me that there's obviously some advantage to the model seeing all the context relevant to the current task in the same file, and also that for future tasks it'll be easier if the file is smaller. Infrastructure-wise, I think we will also make it possible for you to sync all of these files to a remote server, so we will have a big enough copy of your code base at some point. Right now we're extremely privacy-conscious, which means we try to make sure we never store any code past the life of your request. Ideally, in the future, we can store at least some part of it in a private way that allows the model to very quickly do reliable edits, so you don't have to make these round trips for every single small edit; that feels quite bad.

What else... You were telling me that you run the infrastructure. Can you talk about what the interesting infrastructure trade-offs are at Cursor?

We've built lots of different pieces of infrastructure. There's the traditional company infrastructure, but then there's also a lot more. The one we've been very public about is our indexing infrastructure. We've spent a lot of time optimizing it and running it at quite enormous scale, billions of embeddings per day kind of infrastructure.
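
For readers curious what the simplest version of such an indexing pipeline looks like, here is a minimal sketch: chunk each file, embed the chunks, and upsert the vectors into a store. The embed() function and VectorStore class are stand-ins rather than Cursor's actual stack, and real systems chunk on syntax boundaries and index far more than Python files.

    # Minimal sketch of a code-indexing pipeline of the kind described here: chunk
    # each file, embed the chunks, and upsert the vectors into a store. The embed()
    # function and VectorStore class are stand-ins, not Cursor's actual stack.

    import hashlib
    from pathlib import Path
    from typing import Iterable, List

    CHUNK_LINES = 50  # rough fixed-size chunks for illustration only

    def chunk_file(path: Path) -> Iterable[str]:
        lines = path.read_text(errors="ignore").splitlines()
        for i in range(0, len(lines), CHUNK_LINES):
            yield "\n".join(lines[i:i + CHUNK_LINES])

    def embed(text: str) -> List[float]:
        """Stand-in for a call to an embedding model served on a GPU fleet."""
        raise NotImplementedError

    class VectorStore:
        """Stand-in for a vector database with disaggregated storage."""
        def upsert(self, key: str, vector: List[float], payload: dict) -> None: ...

    def index_repo(repo_root: Path, store: VectorStore) -> None:
        for path in repo_root.rglob("*.py"):
            for n, chunk in enumerate(chunk_file(path)):
                # Content-addressed key, so unchanged chunks need not be re-embedded.
                key = hashlib.sha256(f"{path}:{n}:{chunk}".encode()).hexdigest()
                store.upsert(key, embed(chunk), {"path": str(path), "chunk": n})
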
And for that we run our own inference. For all the models that embed your files, we run really large pipelines. So if you're some big company and you have 400,000 or 500,000 files, you want the ability, while the user is coding, for it to effectively feel like everything is being instantly synced across to the server, while the model is using the embeddings to search the code base or edit the code base, and so on. Scaling that has been quite a challenge.

There's been this broad category of databases that are being built on top of S3, and we're big believers in this approach. I don't know the usual term; it's something like separation of storage and compute, or disaggregated-storage databases. The classic example of this is what we use: TurboPuffer. TurboPuffer stores most of the vectors on S3, and then there's a write-ahead log that you write to, and a compaction process compacts the write-ahead log back into the database.

And then there are new challenges we've been dealing with in this indexing infrastructure. We've been thinking about whether there's a way to support shared code bases. So if all the people at Weights & Biases have a really big code base, hopefully in the future you will be able to spin up background models editing that code base, and we'd want thousands, if not tens of thousands, of clients connecting to it. We don't want to have 10,000 copies of the Weights & Biases code base, most of which are not being utilized. So can we have a shared trunk, and then every single person has their branch off that trunk? That architecture is still something we're working on. It's not exactly easy, because how do you easily branch this vector database? At the end of the day, you want to be able to query both the trunk and your branch and merge them in a way that still gives you the correct top-K chunks. That's not trivial.

So when I fire up Cursor, it's quietly indexing all the files that are in my project?

Yes, exactly. When you fire up Cursor, it quietly indexes every single thing, as long as you allow us; it's turned on by default. One really popular Cursor use case is: you open up a GitHub repo, you clone it, you fire up Cursor in that repo, and now you can quickly ask questions about it. We try our best to make it effectively instant to index these really, really large code bases. Obviously, if you clone something like LLVM, which is 120,000 files, that will take us a bit longer.

An interesting infrastructure question for the listeners to wonder about is: how should you allocate this token capacity? At any point in time we have a fixed number of GPUs, which means we have a fixed amount of token capacity. Say you want to index LLVM, or Weights & Biases, and that's a really large code base, and then there are a bunch of people who have a number of small code bases.
Should the small code bases always be allowed to go through, with the big one being slow? Or should the big one take a lot of the capacity in the beginning, with everyone else getting a smaller chunk, in the hope that nobody gets a really bad experience? That kind of question is still hard to answer at scale.

Well, how do you think about that?

Currently, we try to keep both sides relatively happy; you can boost up your capacity up to a point. But I'm still looking for better answers. We haven't spent that much time thinking about it, but hopefully there's a really good answer to how you make people happy. There are no serverless GPUs, right? There's no great serverless option. Because at the end of the day, the amount of compute we're spending is still fixed: if Lukas indexes, the total compute is the compute to index your code base plus the compute for every single other person we're indexing. In an ideal world, there'd be this phenomenal serverless thing where you could boost up your capacity, people would use that capacity, and it would get boosted down again, which is what would happen in CPU land. That sort of infra has not been built for GPU land.

Is indexing the main thing that your GPUs are doing? Because you're also running lots of models too.

Yeah, we run the tab model, and indexing is actually a very small percentage of our GPUs. We run the tab models, and hopefully we'll be running much larger models in the future, and those far and away dominate most of the compute cost.

I see. So it's the model serving, the tab models.

Yeah, the tab models are something like hundreds of millions of calls per day. The big models we're running have, without giving you the details, thousands of requests going on. We're scaling up these models as fast as we can, and they definitely take far more compute, which kind of makes sense; they're large. One intuition to have is, again, you're doing tens of thousands of tokens of inference per keystroke per person, which is both really cool and also really scary if you're running the inference for it. Obviously caching really helps, but it's still scarier than running a normal server.

Have there been any surprises as you've scaled up this ML infrastructure? I mean, you've got to be one of the fastest-scaling ML companies ever. Have there been any pitfalls? What's that experience been like? Smooth?

There have been glitches, but again, the team is really, really talented, and we've gotten over them.

Nice. What about, I mean, we're talking maybe two weeks after DeepSeek came out and obviously caused investors to change their mind about NVIDIA stock. Did it update your beliefs at all?

It's really weird to me, because we talked about it on the Lex Fridman podcast, but also before that we've been pretty public about using DeepSeek in many ways. We used to use their 1.5 series models and then switched over to their V2 series models. So it was a big shock to me personally that everyone was suddenly going, whoa, this is some new thing. They've been producing phenomenal work for a while.
Their models... I used to joke that they were one of the three or four or five companies you would trust to produce good models, where the numbers didn't feel like they were juiced up. There were certain models whose numbers felt a little too juiced; by juiced up, I mean they were really high on evaluations, but then if you used the model in practice, you would never enjoy using it. It was just very specific to the evaluations. But DeepSeek, I felt, was very honest about things and has been producing really good models. So we've been running the DeepSeek V2 model for 8 or 10 months now, probably 12 months, something like that, on our own inference, and that's the model we've scaled up to hundreds of millions of calls.

Interesting. How did you choose it? Was it just the best?

Yeah, it was the best. They had been producing extremely good open code models. We have our own post-training stack and we do RL and so on, but for just picking a really well pre-trained base, DeepSeek does a phenomenal, phenomenal job. The data they train on is really good, and the model is both quite knowledgeable and quite smart, and also quite cheap to run, for the tab in particular. And in general, I'm really excited about DeepSeek V3. I think DeepSeek V3 is actually a really well pre-trained base for a large model, and I suspect it will be very, very useful for making these custom applications.

So you've obviously launched agents, and it's pretty cool, but it's also kind of contained in how many iterations and steps it will do, and things like that. Where do you see agents going in the near term? Obviously inference is getting a lot cheaper; it seems like you could go much broader if you wanted to. What are you thinking?

I think we're super focused on it. As people have been getting better at doing RL, the models are getting better at both thinking and being extremely coherent. One of the things is that the models have gotten good at producing tens of thousands of tokens of output, which they were not before; they would go into delusional mode after a couple thousand tokens. Now they've gotten quite a bit more coherent, and that comes from doing RL and really, really good post-training. I think agents were bottlenecked by that particular aspect of coherency. One of the things that makes the Sonnet experience really magical in an agent is that it's so coherent over such a long period of time, over tens of tool calls. And I suspect that as the tasks get harder, you would need to be coherent over hundreds, if not thousands, of tool calls, and we're working on that.
I think, again, back to the mission of the company: we want to automate as much of coding as possible while still having the developer in the front seat. Automating coding in the short term means that in the cases where developers want to sit back and let the model code, we let them do that; but in the cases where they want to drive the editor and make the code themselves... I don't know, say you're at Weights & Biases and you want to switch your gRPC stack to some other TLS package in Rust, you should just be able to tell the model, I want to switch my gRPC stack to use rustls instead of something else, and the model should just get it and be able to make these large-scale, codebase-wide changes. That requires the model to have some agent-type abilities, because you're never going to sit down and write out the exact spec of your code base. The thing the agent really helps with is that you don't have to sit down and explain: yeah, we're wandb, we make this; we have a backend written in Rust and Go; the Rust hooks up to the Go this way; and for our library we use this. The model should just go and figure it out.

My own experience of playing with agents, which is much diminished compared to yours, is that when it breaks, it's kind of a challenge to debug. Have you built any systems internally for just looking at, okay, what is the agent doing? Why did it get into a weird loop here? What's happening? How do you visualize that?

We're building our own infra for now. I suspect there will be phenomenal products in the future that make this much easier. It's the same thing with building prompts: we use this internal library called Priompt, and the way we built Priompt was well suited to our own design needs. For agent infrastructure it's the same: we're building our own infrastructure in the short term, and I suspect in the long term there will be some phenomenal dev tools that make it much easier to inspect the chains, stop at any point and restart the chains, and debug them in production when something weird goes wrong, all the things you would need to run a production system at scale.

Is the agent evaluation, then... it sounds like it's more of a vibes-based approach than specific metrics?

Yeah, it's pretty clearly vibes-based. I suspect it'll be vibes-based in the short term, and as we get better at shipping these, it'll become more and more about daily metrics and you'll be much more operational with it.

When you look at something like a Devin, or these sort of completely automated, no-programmer approaches, do you view that as competitive, or interesting? What's your take there?

In the medium term, if you can actually take your hands off and let the model drive the entire editing process, I'm totally open to it. But in the case where it's not really useful, and kind of boring, and not really that fun, we just wait.

You just what? Just wait?

We just wait until it gets good enough. We keep training the models, and at some point it will get good enough, and then it will be really fun to use.
I think in general, over a one-to-two-year timeframe, I expect that the way people code will change. In the short term that seems really scary, but I think it'll be a gradual process, and it'll feel extremely natural to everyone coming in. For example, the change from not having a Copilot to having a Copilot was extremely natural in retrospect. It was not something that was scary to anyone. It was this thing that predicted your next thought, and you were like, wow, this is phenomenal, and you just started using it. Then the change from Copilot to this full-blown agent interface, where the model does edits across multiple different files, and you say, oh, I want to switch this to use rustls, and I want to make sure you always use HTTP/2, and so on, and the model gets it, reads all the files, makes the changes, and you can review the changes very quickly and tell that they're correct: that was also pretty natural. I don't think there was any point in the middle where people felt disoriented, and I think going to background agents will be the same. All these things are always more gradual than one would expect. If I had told you in 2020 that the way you'll be coding is you'll start talking to the computer and it'll make changes to random files, you'd be kind of freaked out. You'd think, oh, it's going to add all these bugs, it's going to be impossible to review, I really enjoy coding, why the fuck am I doing this? All these things would have seemed scary, and yet, four or five years into the language-model product journey, things feel quite natural. So 2021 was Copilot and 2025 is where we are now, and at no point has making the change felt very disorienting. Maybe jumping there in one step would have, but right now it's not really that disorienting.

Well, it feels like a lot of fun to me. I guess when I connect the dots from 2020 to now, it's gotten better, right? But where I'm going is: when I look a few years out, I have no idea, but it's hard not to see a world where you wouldn't really be doing anything that looks like programming a few years from now, right? Or where do you...

More people will be coding, and more people will be making things that are considered much more difficult, be it lower-level things, be it larger projects, even for their side projects. I think people are usually very conservative with their side projects, because they think, ah, I probably won't have that much time. I think people will get much less conservative with those side projects. I'm generally just extremely optimistic in the medium term.

Yeah, yeah. Don't you think it's a totally different world where everyone can do these monster side projects easily? It seems like software becomes a very different thing. Even doing a software company seems like it might be harder to have a protected advantage, right? When it's easy to build this stuff?

I can't philosophize over that. I'm not really scared of people having medium-sized...
I tend to think of these things as: experimentation becomes much more natural. Large changes are usually scary at companies, because a large change requires changing so many pieces and takes so much time that you want to plan out everything up front. And then planning is really hard, because you can't really foresee how your production system will look if you do X, Y, Z. Then everything becomes much more scary, and then you add more meetings, and it becomes more formal, and everything just gets worse and worse over time. And I understand it, right? If you're doing, say, a big database transition, boy, do you want to plan out every single small detail, and then you want to argue over every single small detail. But if you can start prototyping these things really quickly, maybe it becomes less talking, more coding; you have much cleaner, concrete artifacts. If you're in PyTorch and you want to do a small API change and it'll take a year, you probably want to debate the hell out of it. If you're in PyTorch and you can have a prototype in three days, maybe you should just argue over the prototype.

Is that how you do things at Cursor?

Hopefully more and more so, yeah. There are still things that are scary, but I've definitely found myself thinking it's just much better to argue over the code. I suspect that change will continue.

Awesome. Well, I guess one final question, if something comes to mind. If you were outside of Cursor, looking with fresh eyes at this world of AI applications and LLMs working for so many different things, is there something else that excites you, that you wish you had time to think about?

Personally, I've always wanted a really good reading experience. I like to spend my free time either reading or even reading code bases. I think it's an underrated aspect of coding that all of us produce these artifacts that we've poured many years of our lives into. Redis: someone has poured their life into Redis, and I really want to go read and understand Redis. What were the hard decisions? What were the easy decisions? I think for reading books, for reading papers, and for reading code bases, we haven't discovered the final, optimal AI tool. Hopefully Cursor will contribute to at least reading code bases, but if someone makes it easier to read books or to read papers, I'll be really happy. Reading papers is still quite an arduous process. I don't love the current PDF viewers; you click a thing and it jumps you to the final thing, and it feels a lot more primitive than it should be. I've recently been reading papers by just pasting them into one of these chat apps, and things are getting better. In general, it feels like there's a lot of low-hanging fruit in lots of different areas of life.

Okay, I've got to ask: what are your top recommended code bases to read?

Well, as I just mentioned, Redis. Redis is quite good if you haven't read it. It's relatively small and still quite fun. That's probably the one I'd most recommend, because it's a thing that is used by everyone and it's just really, really well written. SQLite for sure, also, if you haven't read SQLite: again, very well written. It's this coherent document by a very small number of people.
And then mostly I recommend software that you use; you should try to go read the software that you use. Some things are harder, but, I don't know, if you're a fan of Ghostty, the terminal, maybe you should go spend a weekend trying to read Ghostty. Or if you're a fan of PyTorch, maybe you should go look into why PyTorch does what it does. There are a lot of choices you can criticize from the outside, and people underappreciate the tremendous amount of work that people on, say, the PyTorch team have put in to make PyTorch really, really easy to use. That magical experience where all the gradients just flow naturally has taken many tens of thousands of engineering hours. I don't know if it's in the hundreds of thousands or the millions, but it's a lot of engineering hours.

Interesting. Well, thank you so much. I really appreciate your time.

Thanks so much for listening to this episode of Gradient Dissent. Please stay tuned for future episodes. Thank you.
