Gradient Dissent

The rise of AI agents

Gradient Dissent • Lukas Biewald

Tuesday, February 25, 2025 • 49m

What You'll Learn

  • Most enterprises will have thousands or hundreds of thousands of AI agents working for them in the next few years
  • Agents are being used for a variety of tasks, from simple use cases to more complex ones like automating video editing
  • Key factors for successful agent deployment include active support, technical integration, and a clear understanding of use cases
  • Challenges around tool usage and pricing models as agents become more prevalent
  • CrewAI has seen rapid adoption, with 40% of the Fortune 500 using their platform

AI Summary

This episode discusses the rise of AI agents and how companies are using them to automate various business processes. The guest, João Moura, CEO of CrewAI, explains that in the next few years, most enterprises will have thousands or even hundreds of thousands of AI agents working for them. These agents are being used for a variety of tasks, from sales and marketing to back-office automation and even complex tasks like automatically editing live video footage. Moura outlines the key components of a successful agent deployment, including active support, technical integration, and a clear understanding of the use cases. He also discusses the challenges around tool usage and pricing models as agents become more prevalent.

Topics Discussed

AI agents • Enterprise automation • Agent deployment best practices • Tool integration • Pricing models for agent-based services

Episode Description

In this episode of Gradient Dissent, host Lukas Biewald sits down with João Moura, CEO & Founder of CrewAI, one of the leading platforms enabling AI agents for enterprise applications. João shares insights into how AI agents are being deployed in over 40% of Fortune 500 companies, what tools these agents rely on, and how software companies are adapting to an agentic world.

They also discuss:

  • What defines a true AI agent versus simple automation
  • How AI agents are transforming business processes in industries like finance, insurance, and software
  • The evolving business models for APIs as AI agents become the dominant software users
  • What the next breakthroughs in agentic AI might look like in 2025 and beyond

If you're curious about the cutting edge of AI automation, enterprise AI adoption, and the real impact of multi-agent systems, this episode is packed with essential insights.

Full Transcript

You're listening to Gradient Dissent, a show about making machine learning work in the real world. I'm your host, Lukas Biewald.

This is a conversation with João Moura, the CEO and co-founder of CrewAI, one of the leading AI agent platforms. He is at the forefront of agents and their applications in the enterprise. We talk about how agents work on the CrewAI platform, which companies are using agents successfully, which tools those agents call most frequently, and how software companies are reshaping themselves for an agentic world. I hope you find this conversation useful and enjoy it.

All right, João, thanks for taking the time. I really appreciate it. Can you start by describing the problem you're trying to solve with CrewAI?

Yes. And by the way, thank you so much for having me. I'm very excited to be here today. The problem we're trying to solve with CrewAI is that three years from now, most companies, especially enterprises, are going to have thousands, if not hundreds of thousands, of agents working for them. Either those are going to be thousands or hundreds of thousands of legacy code bases, or they're going to be something companies can manage as an actual asset. We are building the control plane that allows them to do that.

Interesting. So what do you expect these agents to be doing?

I think it's going to be a little all over the place. We're already seeing that there are no clear winners. Within these companies it's very much cross-horizontal: there are people doing back-office automation, people doing coding automation, people doing support. It's a little bit of everything. If anything, that makes the challenge more exciting to me, because once you prove this is doable in one specific horizontal, it's very easy to start talking about all the different horizontals you can expand to. I think it's going to start with what we call low-precision use cases, where you still have humans in the loop and people are reviewing things, but you gradually evolve into more decision-making, and things are going to get very interesting by then.

And what does the control plane do?

Well, think about the life cycle of these agents: it starts with planning them. A lot of people are focusing on building these days, but building is just one component. There are frameworks out there (CrewAI is one of them, probably the best one if you ask me) and many others. But it's not only the building. You've got to plan; once you build, you've got to deploy; once you deploy, you want to monitor; and once you monitor, you want to iterate. And that's not even mentioning things like authentication, scoping, access, marketplaces, and everything in between. So what we're trying to do is cover that entire stack and make sure these companies are equipped to build, deploy, monitor, and iterate on those agents.
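To make the "building" stage concrete, here is a minimal sketch of a crew using the CrewAI open-source framework's core Agent/Task/Crew primitives. The role, goal, and task text are invented for illustration, and exact parameter names can vary between versions.

    # Minimal CrewAI sketch: one agent, one task, one crew (illustrative only).
    from crewai import Agent, Task, Crew

    researcher = Agent(
        role="Market Researcher",
        goal="Summarize what a competitor shipped this quarter",
        backstory="An analyst who digs through public sources.",
    )

    research_task = Task(
        description="Research {competitor} and summarize their key moves.",
        expected_output="A five-bullet summary with sources.",
        agent=researcher,
    )

    crew = Crew(agents=[researcher], tasks=[research_task])
    result = crew.kickoff(inputs={"competitor": "Acme Corp"})  # placeholder name
    print(result)

Planning, deployment, monitoring, and iteration then wrap around this core, which is the stack the control plane covers.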
So what is working today? What are companies actually doing with agents right now?

Well, I've got to tell you, it's kind of impressive. If you look at the simpler use cases, there are a lot of people doing sales and marketing and all that. Then you have more advanced companies doing back-office automation, automating more custom processes within their businesses. Then you get into the cutting edge, and you find companies tackling very complex problems, like automating the entire lifecycle of code creation, or automatically filling in IRS forms, which is something you don't want to get wrong. And then some folks are even further out on the cutting edge: we see a media company using CrewAI agents to automatically edit footage. So you have live footage of games streaming on TV, and agents that can track the ball, automatically editing, cutting, adding captions and sound, and pushing that to social media. So there's a little bit of everything. I want to say it's very early days; a lot of these companies are early in their journey and starting with simpler use cases that they can then scale.

What kinds of companies are successful at deploying agents, and what are they doing? Because when I talk to most companies, they're interested in agents (there couldn't be a hotter topic right now), but most of the companies I talk to haven't yet successfully gotten agents to do anything except maybe a little coding, sometimes chat support, sometimes a bit of internal support. Those are the places where I see things working. But it seems like you're probably talking to companies that really are nailing this. Why is that?

Well, I think it's a combination of things. It's funny that you ask, because that has become part of our qualification criteria for deciding which companies we engage with more deeply. As you said, it's a very hot topic right now, so everybody and their mother wants to talk about AI agents, and because we're the leading platform, everyone comes to talk to us. What we've got to do as a business is filter out who is actually real and checks those boxes, so we can go deep with them and really bear-hug them. The signals we look for in companies that translate into successful use cases: there's active support, they're saying, hey, we really need to embrace this, we know where our business is going. There's usually technical support as well; even if the buyer is not a technical persona, there's usually someone technical within the project and its scope who can unlock the internal integrations with homegrown systems and whatnot. That really unlocks the power of more custom use cases. And then there's an understanding of which use cases they actually want to pursue. For example, if a company approaches us asking what they should be doing and what we're seeing other people do, that's usually not a good signal, because there isn't a clear pain point they're trying to address, and they haven't spent much time thinking about it. Those are some of the signals we look at. On the day-to-day, there's a lot of starting small and then expanding. But we have more advanced customers, Fortune 500 companies, that started with simple use cases and now have a lot of their pricing flows automated.
So they adjust prices using agents at runtime, in marketplaces and other spots, with agents monitoring competitors and a bunch of other things throughout the company. Very interesting use cases that we're seeing out there.

Interesting. Do you have a specific way you define agents? I feel like everything in AI, once it gets hot, the definition expands. Would you consider a RAG application an agent, or does it have to be more advanced than that?

I think it's not about how advanced it is, but no, I would not consider that. And it's funny you say that, because we were talking with some folks from Gartner, and I think they refer to this as agent washing: companies stamping "this is agentic now" on things and trying to pass them off that way. I think that's detrimental for the industry in the short term, but in the long term things will consolidate and figure themselves out. My definition is that agents require agency. You can have workflows, and you can have AI help with those workflows. But at the point where the AI is actually controlling the flow of what happens next, then you have an agent, because the AI is guiding the process and has the agency to choose between A, B, or C.

So a RAG application wouldn't count, because it always uses the same tool for the lookup and the same tool for summarization. It kind of needs to be able to choose its own path?

Exactly. It's like "if this, then that," right? "If this, then that" is not necessarily an agent. Now, if you take that RAG output and do something with it, and depending on what you're doing different things might happen and you don't necessarily know what they'll be, then you have an agent.
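As a plain-Python illustration of that distinction (this is not CrewAI code; the retrieval, calculator, and model calls are stubbed out as hypothetical stand-ins so the sketch runs):

    from dataclasses import dataclass

    @dataclass
    class Step:
        name: str
        argument: str = ""
        answer: str = ""

    # Toy stand-ins for real tools and a real LLM call.
    def retrieve(q: str) -> str:
        return f"[doc snippets about: {q}]"

    def calculate(expr: str) -> str:
        return str(eval(expr))  # toy calculator; never eval untrusted input

    def llm_choose_step(context: str, tools: list) -> Step:
        # A real agent would ask the model to pick; this stub finishes at once.
        return Step(name="finish", answer=f"Answer based on: {context[:50]}")

    # "If this, then that": a fixed pipeline. The control flow never changes.
    def rag_pipeline(question: str) -> str:
        docs = retrieve(question)      # always the same lookup step
        return f"Summary of {docs}"    # always the same summarize step

    # Agentic: the model decides the next step (A, B, or C) on every turn.
    def agent_loop(question: str, max_turns: int = 5) -> str:
        tools = {"retrieve": retrieve, "calculate": calculate}
        context = question
        for _ in range(max_turns):
            step = llm_choose_step(context, list(tools))
            if step.name == "finish":
                return step.answer
            context += "\n" + tools[step.name](step.argument)
        return context  # fall back if the agent never chose to finish

    print(rag_pipeline("What did Acme ship?"))
    print(agent_loop("What did Acme ship?"))

The pipeline's call graph is fixed at write time; the agent's is chosen at run time, which is the "agency" in the definition above.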
And now what about tool use? Tools are obviously a critical component for adoption of agents. What kinds of tools do you see used most frequently, and where is this going?

Yeah, I think tools are what make these agents really useful at the end of the day. You've got to have tools that allow these agents to connect with internal data or external data or whatever you want. There are the simpler ones that you've got to have, tools that help you with research or with scraping; that covers a lot of use cases out there, especially the simpler ones. Now, where the value really gets unlocked is when you can tap into internal data. A lot of the time that means the bigger enterprise systems we all know of, like Salesforce or other CRMs out there, SAP, and a few other things like that. Or, again, internal homegrown systems, those systems that have been running in these companies for many, many years, where you need to tap into the data either through an API or through a data lake. So I would say those are the things we usually look for in terms of tools that really unlock value.

Do you see companies like, for example, Salesforce modifying their APIs or even changing their pricing model because agents are using them as tools? For example, what does a per-seat price mean in the context of an agent? Do you think if I have thousands of agents and they all access Salesforce, I'm going to have to buy thousands of agent seats for them?

I don't think seats are the way to go. I mean, we're seeing some companies experimenting with that. I think LinkedIn is doing something along those lines, and I have heard that it's a tough problem to navigate, for them and for any other company out there. Because do you charge less for an agent seat? And what if this agent is so good that you actually don't need a human seat? You're basically cannibalizing your revenue a little bit. So it makes for all these interesting problems that you've got to navigate. Now, I do think there are going to be endpoints that are more focused on agents, optimized for agents. That said, I think that will happen further down the road than most people believe, just because it's so much easier to get agents to comply with the inputs and outputs that we humans already use; then you do that once and you unlock the entire internet, instead of trying to do the opposite and translate the entire internet into something agents can consume. So it's going to take a little longer, but there are definitely going to be pricing-model updates based on agents consuming data instead of humans.

Now, your website says that 40% of the Fortune 500 is using CrewAI. Can you talk about how you got that adoption? And what verticals or industries have the most adoption of agents on CrewAI?

Yeah, I've got to say, the adoption is insane. A bunch of it happens through open source first, eventually migrating into enterprise. It's interesting, because when I first created this project I was not expecting it to blow up the way it did, and I'm very thankful for it. I think we have an amazing community now. If anything, I still remember, I'm actually here where I am now, at SHACK15 in San Francisco, and I remember being in a room on the other side and someone from Oracle approaching me. Back then, CrewAI was not a company yet; it was just an open-source project. And I remember this person saying, hey, we are Oracle, we're using CrewAI in production, can you help us? We need some help. And that, for me, was a big aha moment, because I thought, all right, this big company is actually using this open-source project and they need help. If I'm going to provide them with the help they need, I don't think I can do that through an open-source side gig; I didn't have the resources. So maybe I should turn this into a company. So it was very organic. I think me being so enthusiastic about agents overall and doing a lot of building in public also helped drive some of the adoption. And then a lot of the educational content that followed also helped penetrate a lot of those big enterprises. Nowadays it's funny: we have, for example, major banks reach out to me, and I learn that their CTO has been using CrewAI open source. And I'm like, all right, I guess that's happening. In terms of verticals, there's a little bit of everything, but a few verticals are definitely moving faster. I think CPG is impressive, how fast they're moving. I think finance is now also moving a little faster. But again, they move faster in terms of interest; it's a highly regulated industry, so it takes a lot more conversations to get them to start deploying these things. But I would say those two are the industries that have been impressing me.
And also some insurance companies as well. They're moving faster and faster.

Interesting. So finance and insurance are actually moving the fastest. And CPG.

And CPG.

Interesting. What about in terms of volume of usage? Your website says over 100 million multi-agent crews have run using CrewAI. What have been the longest-running crews, or however you define that?

Yeah, I've got to say, we need to update that number.

All right, well, what's the number now?

Well, I can tell you that January alone, and I was pulling this because we basically had a presentation yesterday, January alone was over 50 million agents. So that was insane.

So what's the lifetime number now?

I don't know. I don't think we've gotten to a billion just yet, but we might be closing in. What I would say is, it's insane to see the scale and how fast things are accelerating there. Sorry, I missed that, I forgot the question.

The question was, what are the longest-running agents?

Oh, longest-running. There is one that we're co-building with a customer now that will be a long-running one. But more than the long-running ones, the ones I have seen that are more impressive are the crews, basically groups of agents, that have a lot of agents in them. I remember once seeing a crew that had 21 agents in it. That, for me, was impressive. Like, all right, I didn't see that one coming.

How does that work? Why so many agents? What was it doing?

Yeah, this one was great. It was a company that sells reports to bigger companies, those big consumer-facing companies, about their competition and their market positioning. This company would charge them a lot of money and put together huge research: here are all your competitors, here's what they're doing, here are photos of their actual stores, here's how they're positioning products in their stores, and a bunch of other research, a lot of market and competitive analysis. Now, for some of those things you've got to get people on the streets, and you can't replace that just yet. But a lot of it was research, digging and digging and digging through a bunch of materials. You kind of know the sources you want to check, but you also want to do some exploration. That was basically what this use case was doing. The reason they had so many agents is that they went very specific and specialized on each agent and what they wanted it to research. So if one agent was researching the financial aspects of the market and the competition, there was one dedicated agent for that. If they had something else to research, like branding, positioning, and marketing, there was a specific agent for that. And at the end of the day, they would produce tens-of-pages-long final reports as the deliverable back to the customer.
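A sketch of that division-of-labor pattern in CrewAI: several narrowly specialized agents, each with its own research task, feeding one combined report. The specialties and task text are invented; a real 21-agent crew would simply extend the same lists.

    from crewai import Agent, Task, Crew, Process

    specialties = ["financials", "branding and positioning", "store presence"]

    agents, tasks = [], []
    for topic in specialties:
        agent = Agent(
            role=f"{topic.title()} Researcher",
            goal=f"Research competitors' {topic} in depth",
            backstory=f"A specialist who studies only {topic}.",
        )
        agents.append(agent)
        tasks.append(Task(
            description=f"Research {{company}}'s competitors, focusing on {topic}.",
            expected_output=f"A detailed report section on {topic}, with sources.",
            agent=agent,
        ))

    # Run the specialists in order and collect one long combined report.
    crew = Crew(agents=agents, tasks=tasks, process=Process.sequential)
    report = crew.kickoff(inputs={"company": "Acme Corp"})  # placeholder name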
Interesting. What about the different levels of autonomy with agents? For autonomous vehicles, they talk about the five levels of autonomy. You've talked about human-in-the-loop. How do you think about that? How do you set up agents to actually go back to humans and engage with them? How do you put in checks and balances to make sure the agents are producing the intended results?

Yes. That's a great question, by the way. There are a lot of different ways you can do it. It depends a lot on how much precision you want and how complex your use case is. For most people, where they want the agents to have a lot of agency, to really go out and figure things out, there are a few ways to put in checks and balances. That might be the number of requests the agents can make and how much time they have to get the work done; those things help them not go haywire. Then there are a few other things you can do, like adding what we call programmatic guardrails. As an agent finishes something, instead of just saying "I'm done," it has to get its output through a programmatic guardrail, which can be actual Python code if you want. That might check, for example (let me think of something silly), how many times the word "but" shows up. And if "but" shows up a lot throughout the output, you say, no, this is not good, go back. So those are a few of the options you have.

Now, for more complex use cases: for example, I mentioned the one about filling IRS forms for taxes. That's something you want to be way more careful about. And by the way, in this use case, the form was around 60 pages long. But fear not, it comes with an instruction manual. And the instruction manual is 720 pages long. So for that use case you want some agency, but not too much. What you do is use a flow. We have CrewAI Flows, which let you intertwine agents and regular code. So with regular code you get each page of the form and extract all the questions from that page. Then those questions go one by one to a group of agents. These agents perform RAG on an internal database, check a few other data lakes, and create the answer to that specific question. And in the process, they can look at the instruction manual. So the agents are always working in the scope of one single question; they're not going crazy and hallucinating about all the different questions and all that context. When that's done, they go to the next question. That's one way to have much more control in these more complex use cases. And at any point in time, with CrewAI at least, you can force human-in-the-loop: you can say, hey, there's going to be a human in the loop right here, and the agents will stop and wait for you to get back to them.
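João's "count the buts" example, written out as actual Python. In CrewAI, checks like this can typically be attached to tasks as guardrail functions (depending on version); the shape shown here, validate and then either accept the output or send feedback back, is the generic idea rather than a specific API.

    def too_many_buts(output: str, limit: int = 3) -> tuple[bool, str]:
        """Programmatic guardrail: reject output that hedges too much."""
        count = output.lower().split().count("but")
        if count > limit:
            return False, f"'but' appears {count} times (limit {limit}); rewrite."
        return True, output

    draft = "Sales grew, but margins slipped, but churn rose, but NPS fell, but support lagged."
    ok, feedback = too_many_buts(draft)
    if not ok:
        print(feedback)  # the agent sees this feedback and tries again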
I guess that's a good segue into talking about the CrewAI platform. You've given me some really good examples of agents and why they'd be useful, but can you describe what parts of the solution you leave to the ecosystem or other vendors, and what parts you actually do for your customers?

Yeah. I think there are a lot of parts on the vendor side. One, doing anything AI, not even agents, anything AI, is very complex. It's not only doing an API call; it goes well beyond that. And I believe there are problems out there big enough that entire companies should exist to solve them. We've already seen that, right? There are big clusters. One is everything about fine-tuning. That feels complex enough that entire companies could exist just to help you get it right. And you folks have a lot of background in that; you folks are OG AI. All the things around hyperparameter tuning require a lot of work to do right, and if you're just doing Jupyter notebooks on your local computer, you get lost very quickly. I did. I still remember not being able to replicate something and being so mad at myself, because I didn't remember which hyperparameters I had used and couldn't get the same results. But sorry, I'm derailing the conversation. I think fine-tuning is a good example. It's something we don't offer right now, maybe in the future if we ever get there, but we have so much on our plate right now, and it's a big enough problem for a company to solve. So that would be one example.

But sorry, what do you do, actually? That's probably how I should have asked the question.

Yeah. So there are a few things. On the platform, we try to cover all the different stages of agent building, deployment, monitoring, and iteration. You go into our platform and, one, configure everything for your company: not only inviting the right people, but setting up the right permissions and roles, and setting up your LLM connections. You can bring any LLM, and you can use a private proxy if you want. So you connect all those things there. Once you're set up, you go into building: all right, I want to build these agents now. In the platform you can not only use the open source, you can also build with no code. We offer Crew Studio, where you can basically chat your way into an automation; you get the automation going, the agents created, and you can deploy right away or go back into code. So we're talking about setting up, planning, and building. Once you've built them, you need to deploy, and you can deploy right there as well. That automatically becomes an API, and it's production-grade; I'm talking load balancing, auto-scaling, SSL, all those things. So you now have an API that you can integrate with other things. Then you get a bunch of metrics on top of that, and those metrics are more agentic: you see not only prompts but also the quality of the outputs, hallucinations and how much of that is happening. You can also set up custom metrics if you want to track something, and set alerts based on them. And then we get toward the end of the life cycle, where you want to iterate. What that means is: wow, a new model popped up, like o3-mini, and I want to test all my agents on it to see which ones perform better. We offer ways for you to basically, with two clicks, rerun your agents with a new model, see how they perform, and pick and choose where you want to use it versus not. That makes it easy to constantly iterate on these agents. That would be the TL;DR. There's a lot that goes into all of it, and there's a bunch more in metrics and iteration, and other features that come into it as well.
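The "iterate" step reduced to a loop: rebuild the same crew under different models and compare outputs. CrewAI lets you pin a model per agent via an `llm` identifier; the model names, task, and ticket below are placeholders for illustration.

    from crewai import Agent, Task, Crew

    def build_crew(model: str) -> Crew:
        analyst = Agent(
            role="Support Triage Agent",
            goal="Classify and summarize incoming tickets",
            backstory="Handles first-pass support triage.",
            llm=model,  # pin this agent to a specific model
        )
        triage = Task(
            description="Triage this ticket: {ticket}",
            expected_output="A category plus a two-sentence summary.",
            agent=analyst,
        )
        return Crew(agents=[analyst], tasks=[triage])

    ticket = {"ticket": "App crashes when exporting reports on mobile."}
    for model in ["gpt-4o-mini", "o3-mini"]:  # candidate models to compare
        print(model, "->", build_crew(model).kickoff(inputs=ticket))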
What about memory? I feel like that's a big topic right now with agents. Do you help with memory?

Yeah, for sure. When people ask me about memory or RAG and things like that: yes, all of those must exist, and your agents work better with them. I'm just finding that all of it is becoming table stakes very quickly. You've got to have it, no questions asked. There are many different ways to build them; I'd say it's easier when you're using Crew, because you can integrate with any vector database and any data source you want. That makes it extra easy. But yes, all the agents in CrewAI, open source and enterprise, have short-term memory, long-term memory, and entity memory. And there's a fourth type we added recently that I call user memory, which is memory you can preload into your agent. You can give them a preloaded set of documents, PDFs, and things like that that goes into their memory from the get-go and gets stored there. The short-term, long-term, and entity memory get populated autonomously as the agents do their work.

So how do short-term memory and long-term memory work differently?

Oh, yeah. Short-term memory lives during the execution. You have a few agents working together, and this memory is a kind of sandbox where they can share information with each other. They can also delegate specific work to each other if you enable that. But they have this common place where they put some of their learnings and what they're doing, which the agents working together can tap into. And that gets reset on every run: you do a run, and then, boom, everything is cleaned up, and a new run happens. Long-term memory, on the other hand, is where they store learnings that come from multiple executions. Because in Crew we force you to say what the expected output is for each task, we have something to compare against. We can take the real output you got and compare it with what you expected, see what the agents did right or wrong, and autonomously create rules for them to follow in the future. Saying, hey, you didn't get this right, so let me create a validation for that. That makes sure that over many executions your agents get better and better.

And do you have ways of monitoring automatically, when agents run for a long time, to make sure things haven't gone haywire?

Yeah, you can not only monitor, you can also set specific hard stops. You can say, hey, you can only run for 60 seconds. What happens is, as it gets to those 60 seconds, we do a final call saying, all right, you're done, give your best answer right now. And that either goes through, or it might blow up and fail; all right, you just can't go over. So there you go.
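As a sketch of how those settings surface in code: `memory=True` on a crew enables the short-term, long-term, and entity stores described above, and agents take execution caps for the hard stops. The parameter names below reflect CrewAI's documented options, but treat the details as version-dependent.

    from crewai import Agent, Task, Crew

    agent = Agent(
        role="Researcher",
        goal="Answer one tightly scoped question at a time",
        backstory="Works in a bounded loop against internal sources.",
        max_iter=10,            # cap on reasoning/tool-use turns
        max_execution_time=60,  # hard stop: force a final answer at 60 seconds
    )

    task = Task(
        description="Answer: {question}",
        expected_output="A short, sourced answer.",
        agent=agent,
    )

    # memory=True turns on the short-term / long-term / entity stores,
    # so learnings persist and accumulate across runs.
    crew = Crew(agents=[agent], tasks=[task], memory=True)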
Now, what do you see in terms of open-source versus closed-source adoption? Which is more popular? Do you have specific recommendations? If I came to you and said, I don't care, I just want the best model, where would you guide me?

You're talking about open-source models?

Models, yeah.

Well, I've got to say, one, I'm a huge believer in open source. I've been living and breathing open source for many years, I've had quite a few projects over the years, and I'm a strong believer that at the end of the day, open source is probably going to win, whatever winning means here. I don't think winning means closed models won't exist anymore; I think it just means open models will be more accessible and easier for people to run at scale. Now, the most used models today are closed-source models. Those are the most common, both in our open source and in our enterprise products. But we are seeing things pick up on the open-source side. A lot of people are running agents locally using things like Ollama; it's very usual for us to see that as well. And it's more common in enterprises in highly regulated industries. For example, finance and insurance usually want to self-host their models, because they want to run the whole thing in an airtight container, with no information going out, because they need to be very mindful of their data.

Have you started to see R1 taking off in your metrics?

I've got to say, I'm getting a lot of mixed signals. Yes, there are people using it, even in highly regulated industries, where I thought it would be a hard no-no. It's interesting to see some of these companies actually starting to play with it and see how it goes. Now, whether it's really going to take off in the U.S. specifically, I'm not sure. I think it's going to depend. What I'm grateful to R1 for is the idea that people now almost believe open source can go even further than most people thought. That's already inspiring a bunch of people to try new things, and I guess in no time we're going to see a new influx of models, especially reasoning models, coming out of open-source initiatives. I'm very curious to see how that plays out. Even the distillations we're seeing now, cross-breeding R1 with other models, I'm finding very interesting.

How would you describe the boundary today of what agents can do and what they can't?

Hmm, that's a good one. At the end of the day, there's more that agents can do than most people would think, but not enough that people should be worried about losing their jobs or anything like that; I don't think we're close to that just yet. But there's a lot these agents can do. If you take a step back, the value that companies, or people in companies, get from these agents is basically a math formula in my mind: how complex is the thing the agents are trying to do, times how much autonomy they have in doing it. If you can automate the most important process in your company and the agents can do it completely autonomously, that's a lot of value. That's why you see a lot of people aiming for code development, because it's a very crucial process in a company, and if agents can do it autonomously, that adds a lot of value. I think we're not in that top quartile just yet, of the most important processes with no handholding.
And I don't think people are ready to have no handholding on the better half of those processes so far. Now, what we are seeing is agents that work with humans, and that seems to be gaining traction, basically speeding up and getting more adoption. Where before you had four people doing something, you now have one person overseeing agents, and the other three people got reallocated to more interesting work, because they had been doing busy work. So right now it's going to be a lot of efficiency gains, repetitive tasks getting automated. I think we're going to see an influx of agents going into decision-making probably in the next couple of years. We're not there just yet, and I think it's going to get very interesting once we are.

But then, what problems are hard for agents? Because if you were just dropped onto planet Earth right now, you'd look at this and think, wow, these agents can do Putnam-level math problems, they can pass the LSAT. If I had an employee who could do that, I'd think, you could probably do a lot of the things around here. Where do you see them break down?

Yeah. Where we're seeing these models really excel, especially reasoning models, is on math problems, because those have a hard, right answer. They can do reinforcement learning based on that, and that's why you see a lot of these models succeed there. But the real world, especially in business, is way more nuanced. Yes, at the end of the day you can say, well, that was the right choice. But at the point in time when you're making a decision, it's way more nuanced; there's an element of gut feeling, of deciding what you believe will be best. So these agents, thanks to reinforcement-learning techniques, are getting very good at answering problems that have a hard yes-or-no or specific-number answer, but they're not as well versed in nuance just yet. You can use some of that, but it's harder for people to trust that it was exactly the right choice. That's where we're seeing the gap now. Even though, if you had a human who was amazing at those math problems, you'd assume that would translate into logical capabilities and better decision-making in nuanced situations, that's not how these models work. They work better on the former because reinforcement learning optimized them for it; that doesn't necessarily mean they're better in nuanced situations just yet.

And do you see that changing with o1 and o3? Do those feel like big improvements in that regard, or not yet?

I think not yet. They're amazing models, definitely very capable. But especially for agents, we're not seeing them take off that much. At the end of the day, the same engineering first principles apply here: what is the minimum I can get away with? That helps you optimize for speed and cost and all those things.
So a lot of people are still running agents on, in the case of OpenAI, GPT-4o mini, for example. And if that gets you what you need, why move to a bigger model that's more expensive and a little slower?

Now, I want to talk a little bit about your architecture. You pretty visibly took LangChain out of your stack. Can you talk about why you did that?

Yeah, sure. There are a few different things. In the early days of CrewAI, using LangChain was a good way for us to get access to a bunch of tools from the get-go. Because those tools would help people take some of those actions, it was, all right, let's use LangChain here, it will help us with some of that. Now, as we started to grow and build more and more logic into these agents, it got very hard to keep things in sync. Decisions started to diverge between how we were building the framework and how they were changing the code on their side. It got to the point where we were overriding around 90% of the code in the one class we were importing, so much that every time they shipped a new version, we had to go through crazy rebases. So I said, all right, this isn't going to fly, we need to cut this off. And I think it was the best decision we ever made, because it let us grow into things that were constrained before, when we had to override certain methods in a certain way; now we can do much more. So that's one of the more technical reasons we went that route. There were also commercial reasons that helped us make the decision, and some customers were having issues with that specific dependency, so that contributed too.

Are there integrations that are important to you that you're definitely going to keep, or integrations you want to add over time?

Yeah, with the overall tools ecosystem, the more you connect to all these other things, the better. For example, you can still use any LangChain tool with CrewAI, but we don't depend on it. You can also use any LlamaIndex tool with CrewAI, and we don't depend on that either. And then there are the CrewAI tools themselves. And there are amazing companies out there like Composio with a lot of tools; I think they have over 300 that you can also use with CrewAI. So we're finding a lot of value in these integrations; we don't need to rebuild things over and over, you build it once. That's something we're not going to change anytime soon. If anything, I want to go deeper on some of those integrations.

What AI tools do you personally use day to day?

I don't know if it's a hot take, but a lot of people are in a Windsurf-versus-Cursor moment right now. I'm a Cursor guy, I love it.

All right, Cursor. Me too.

It works so well for me. I tried Windsurf, and honestly, this is not a dig at Windsurf, I think they're doing amazing work. But I'm using Cursor so much that I've just got the hang of it. I know how that thing ticks.
And moving to something else would just make my life a little harder; all right, I'm not ready for that. Beyond that, a lot of ChatGPT, and Anthropic: the Sonnet models have been in and out, at least in the UI. I've been using them a lot on the coding side of things, but outside of coding I don't use the actual model interfaces that much anymore. So a lot of ChatGPT, a lot of Cursor, a lot of CrewAI.

So you use Cursor with Sonnet?

Cursor with Sonnet. I'm using some o3-mini now as well, and I find a few use cases where I like it, but Sonnet, I don't know what black magic they're doing over there, but that thing works.

Are there any lesser-known features of Cursor that have made your life easier lately?

Honestly, I use a lot of what everyone uses. The one thing I started using that I think most people don't know about is that you can add custom Cursor rules, and you can do it per project. You create a file, I think it's called .cursorrules, and you throw strings in there to dictate custom rules that get preloaded into Cursor for that project. For example, if you're doing a Python project, you can add a bunch of Python rules in there, and they get embedded into your prompts. That's something I didn't know about, and I started using it, and now everyone on the team does, and I really like it.
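For reference, a .cursorrules file is free-form text at the project root that Cursor folds into its prompts. A hypothetical example for a Python project:

    You are working in a Python 3.11 codebase.
    Prefer type hints and dataclasses over raw dicts.
    Never use bare `except:`; always catch specific exceptions.
    Write tests with pytest, not unittest.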
Cool. Any other tools that you love right now?

Let me think about AI tools I love. I tried Perplexity for a while; I was not a huge fan. I'm now starting to use the deep research stuff more, and I'm finding it pretty good; I think OpenAI did good work there. That would be the bulk of it. Oh, and I use some things for video editing as well. There's a tool called Descript. Amazing video editor, honestly the best video editor ever.

Cool.

This might be a hot take for people who are really into video editing, but I'm just trying to produce content. I want things to look good and be easy to cut, easy to caption, and everything. Descript is just chef's kiss.

Nice. All right. How do you respond to, or think about, concerns about AI replacing human jobs?

What I tell people when they ask me about that is: we don't like to admit it, but we have way less control over our futures than we like to believe. Honestly, you don't know what's going to happen tomorrow; no one knows, everyone is going through this for the first time. So you've got a couple of options once you accept that. Either you choose to focus on the things that are in your control, or you worry about the things that are not. If it's not something in your control, well, then what is? In my mind, the choice people who are worried about this should make is: maybe I should learn how to use these tools. That might not be enough, or it might, but at the end of the day I would rather be someone who knows how to use them than not, because statistically that puts you in a better position. So I try to reframe the conversation: hey, maybe this is the way you should be thinking about it.

Do you think technical skills will help with building agents, or do you think agent building will completely move to a no-code mode over time?

I think long term, no-code is going to become more and more popular and probably take a lot of the market share. That would be my take. Not because people are going to get lazy; it's just that there are a lot more people who don't know how to code than people who do, and if you want this running in every company out there, you're going to need to support that super well. Now, I do think coding will still allow more customizability, and that's what will unlock the highest-impact use cases at the end of the day. It's similar to web development, in a way: can you build a website with no code? Yes, you absolutely can. Are there amazing templates that make it super good? Yes, there are. But if you want something super custom, do you need an engineer? Yeah, you do.

What about your product? Do you think it'll ultimately be aimed more at a no-code audience?

I mean, we do have no-code tools now that let people build. I think we'll keep investing in that, for sure, because there's a broader audience we're trying to serve that goes beyond engineers. And not only that, I find myself using Crew Studio a lot, just because it makes me faster, similar to Cursor and whatnot. So yeah, we'll probably keep investing in no-code as well.

Interesting. Well, what applications have worked well for you inside your own company?

Well, in my own company, the thing we use the most is Crew Studio. With Crew Studio we took a very different approach from most no-code tools out there. I have an opinionated view: I don't love node UIs. I know that's what a lot of people use for no-code. It makes for beautiful screenshots, but if you're dropped out of nowhere onto a page with 30 nodes, it's impossible for someone without technical knowledge to understand what's going on with ease. What we decided to do to offset that (you can go to app.crewai.com, create a free account, and test it out) is what I call a gradually evolving UI. You start with a chat interface, similar to custom GPTs back in the day, and you chat your way into what you're trying to build. That builds the mental model in your mind. Once you're ready, a button pops up: all right, generate me a crew plan. When you click it, the UI evolves into a tabular view where you can see your agents and your tasks; you can still change them, but now it's more structured, not just plain text. You can keep chatting if you want, but once you get there, you get a new button to generate the crew, and that takes you to a node view. So by the time you get to the node view, you have a full understanding of how you got there and what you're trying to build, and it's easier to customize it that way. That's our approach. And from there, you can download the code if you want.

But for you personally, inside your company, what's working? What are you actually using crews to accomplish?

Oh, sure. We're using them for a lot of things.
We generate marketing content automatically. We do a lot of meeting prep: agents that automatically research people, companies, everything along the way. Anything related to PR review: we have crews reviewing every single pull request across the company, open source and closed source, and that has been pretty good. Agents do a bunch of support, writing responses and all that; that's also working pretty well. Agents pick up recordings and transcripts of meetings and follow up with next actions, presentations, and such; that's also pretty good. And one that I love happens during onboarding. As customers onboard onto the platform and enter their names and emails, we kick off agents that research them, and they don't stop there. Based on the research, these agents infer what the customer might use agents for, and then push those ideas to two different places. One is our CRM, so every marketing engagement the customer gets is customized to what we believe they might want to build. And we also inject it into the product itself, so the product self-customizes based on our agentic hypothesis of what agents they'll build. And then it just works, and they're left wondering, how does this app know I'm trying to go that way? That's some of how the sausage is made.

Interesting. Do you have any predictions for 2025? Any new applications that you think will become possible with better LLMs or better tools?

I think we're going to see some fine-tuning coming back. People haven't been talking much about it, but I think we'll start to see fine-tuning come back, especially because smaller models are going to start trending again. Some of the things coming out of R1 raise the question: can we reuse this to make small models amazing, and what would that look like? I think that's going to be a trend this year: smaller models that are very, very capable, better than today's 70-billion-parameter models, and people will be able to fine-tune them with ease and use them to run agents.

All right, well, João, thank you so much. This was a real pleasure.

Thank you so much for having me. It was a lot of fun. I really appreciate it, Lukas. This was great.

Awesome. Thanks so much for listening to this episode of Gradient Dissent. Please stay tuned for future episodes. Thank you.
