The Cognitive Revolution

The Customer Service Revolution: Building Fin, with Eoghan McCabe & Fergal Reid of Intercom

Sunday, October 5, 2025 · 1h 34m

Episode Description

Today Eoghan McCabe and Fergal Reid of Intercom join The Cognitive Revolution to discuss building their AI customer service agent Fin, exploring how they achieved a 65% resolution rate through rigorous optimization and custom model training rather than relying on base model improvements, while pioneering outcome-based pricing at $0.99 per resolution.

Shownotes brought to you by Notion AI Meeting Notes - try one month for free at: https://notion.com/lp/nathan

Sponsors:

Linear: Linear is the system for modern product development. Nearly every AI company you've heard of is using Linear to build products. Get 6 months of Linear Business for free at: https://linear.app/tcr

AGNTCY: AGNTCY is dropping code, specs, and services. Visit AGNTCY.org. Visit Outshift Internet of Agents

Claude: Claude is the AI collaborator that understands your entire workflow and thinks with you to tackle complex problems like coding and business strategy. Sign up and get 50% off your first 3 months of Claude Pro at https://claude.ai/tcr

Shopify: Shopify powers millions of businesses worldwide, handling 10% of U.S. e-commerce. With hundreds of templates, AI tools for product descriptions, and seamless marketing campaign creation, it's like having a design studio and marketing team in one. Start your $1/month trial today at https://shopify.com/cognitive

PRODUCED BY: https://aipodcast.ing

CHAPTERS:
(00:00) About the Episode
(03:43) Keeping Up With AI
(09:56) Evaluating Models and Evals
(13:04) Incumbents vs. Startups
(18:54) Product Risk and Judgment (Part 1)
(19:00) Sponsors: Linear | AGNTCY
(21:34) Product Risk and Judgment (Part 2)
(23:42) The Klarna Layoff Story (Part 1)
(32:11) Sponsors: Claude | Shopify
(36:13) The Klarna Layoff Story (Part 2)
(36:14) Driving Resolution Rate
(45:00) Intelligence Isn't the Bottleneck
(50:10) Closing the Automation Gap
(56:20) Human vs. AI Accuracy
(01:01:03) The Nuance of Speed
(01:04:48) Considering Paradigm Changes
(01:09:31) Outcome-Based Pricing Model
(01:19:12) Casual Hacking and Insights
(01:26:05) AI Adoption and Ambition
(01:36:00) Outro

Full Transcript

Hello, and welcome back to The Cognitive Revolution. Today, I'm excited to share my conversation with Eoghan McCabe and Fergal Reid, CEO and Chief AI Officer at Intercom, makers of Fin, the AI customer service agent that's been a market leader since its launch some two and a half years ago. Regular listeners will know that Intercom has recently been a sponsor of the podcast, so it's worth noting that this episode was not part of that sponsorship deal. On the contrary, because I've been an Intercom customer for years at Waymark, and also noticed that leading AI companies and past guests Anthropic, Gamma, and Lovable all have testimonials on the Fin website, I wanted to understand what's really working and what remains a challenge for a company that's been among the most successful at creating practical business value with large language models. And I'm glad to say that this conversation really delivers. With a diverse customer base of more than 400,000 businesses and Intercom's ability to measure successful resolution rate differences as small as a tenth of a percentage point, Fin is one of the most intensively tested large language model applications in the market today. And as you'll hear, Eoghan and Fergal are both remarkably candid, both about what they've learned and about what they still don't know. One perhaps surprising finding that stood out to me, especially considering how much the AI discourse tends to focus on new model releases and frontier capabilities, was Fergal's assessment that intelligence is no longer the limiting factor for customer service automation. On the contrary, he says that GPT-4 was already intelligent enough for the vast majority of customer service work, and that model improvements have contributed only a few of the more than 30 percentage points of resolution rate increase that the Fin team has delivered since launch. The vast majority of gains have actually come from better context engineering, which they've achieved through many rounds of careful optimization of retrieval, re-ranking, prompting, and workflow design. Of course, we cover a lot more than that, including: the fact that most customer service teams are currently underwater, which means that, for now at least, Fin is allowing companies to support more customers and beginning to affect their hiring plans, but generally not yet leading to layoffs; how Intercom thinks about the importance of speed, and how they balance the desire to be first to market with the critical need to maintain customers' confidence; the culture of awareness, engagement, and constant experimentation that's allowed them to deliver that 1% improvement month after month for 30 months in a row; the intricate workflows that power Fin, and why Intercom is now training custom models for some tasks, including a custom re-ranker; how Intercom dogfoods Fin, and why their own resolution rate, while above their customer average, is actually still quite a bit lower than top performers'; Fergal's observation that no matter how sophisticated your offline evals, the messiness of real human interaction means there's no substitute for large-scale A/B tests in production; how the 99-cents-per-resolution pricing model, which they pioneered, while initially unprofitable, has created strong alignment between Intercom and their customers and has become profitable thanks to improved success rates and lower inference costs; and the 2x productivity goal that Intercom's CTO has set for their technology teams in light of AI coding assistance.
And finally, how their vision is now expanding from service agents to what Eoghan calls customer agents, which can work across the entire customer lifecycle, including sales and onboarding. Bottom line: if you're building AI products, you'll find in this conversation a bunch of valuable insights from a team that has brought real rigor and sustained discipline to the challenge of making large language models work reliably for businesses and their customers at scale. This is Eoghan McCabe and Fergal Reid of Intercom. Eoghan McCabe and Fergal Reid, CEO and Chief AI Officer at Intercom, makers of Fin, welcome to The Cognitive Revolution. Thank you. Thank you. So I'm excited for this conversation. My company, Waymark, has been a customer of Intercom for years, and so I've been following what you guys have been doing with AI with interest, both intellectual and applied, over the last couple of years. You've done some really interesting stuff and been in some ways really innovative leaders in the market, so I'm excited to dig into all of that with you. First question, just because AI is moving so fast, and you guys are sitting in the leadership positions of an 1,100-person organization that's distributed across all the time zones of the world: how are you going about keeping up with AI? What is it that you're doing, you know, mix of hands-on work, sources, whatever, to stay current and make sure that you know where we are in the development of the technology? Yeah, I mean, I'm sure Fergal and I will each have different answers. For me, I rely on Fergal and Des and others. Des is one of my co-founders and runs all of R&D, and many people who are very close to the action. You kind of have to pick your battles a little. When you have those trusted, longstanding relationships that I have with Des and with Fergal, they know what I need to know, and I know I can trust what they say. And so that's a big, big part of it for me. I actually got really disinterested in tech news many years ago. I found it to be super boring and very, very samey. So I don't follow all the fundraising announcements and who the latest hottest companies are. And so, yeah, that's kind of what works for me. I myself have, I guess, a technical training, if you like to call it that. I studied computer science, but I never practiced, and so I can pick up these things pretty damn fast. And funny enough, I graduated college in 2006, and I studied AI in 2004. Machine learning and AI were the things I specialized in. Truth be told, they did not anticipate this moment, but it primed me a little bit. But maybe Fergal can tell you how he learns the things he teaches me. Yeah, absolutely. Just to say, look, we've been in this game for a while. We've had AI products in production since, I think, shortly after I joined; I remember working with Eoghan on the first brief of what later became Resolution Bot. And so we have those working relationships, and we have a certain organizational or institutional knowledge in the company, built up over a long time, of dealing with and communicating about these things. And so that kind of gives us an ability to communicate rapidly internally when something new happens or changes. It's just that shared context and vocabulary and state in the leadership org. I would say that in terms of the external environment, it's really difficult. It's really a full-time job to kind of keep track of everything that goes on out there.
And there's so much hype, there's so much rubbish, and you almost have to test and verify everything yourself. So anytime there's a new benchmark or some new model, it sounds amazing, and you're like, wow, this is really good if it bears out. But we then have to go through a period of testing and analysis ourselves to be sure there's something really interesting here. So it's busy. Of course, there's, you know, Twitter or X, there are papers. You kind of learn over time who the top labs are and who's doing the top work in this space. But yeah, it's busy. It's hard. I don't know anyone in this space who isn't struggling to just keep track of the biggest developments. I would add one thing that Fergal has done well: we have a 50-plus person team under Fergal that we call the AI group, and there are real scientists and researchers amongst that team. And part of what he has done well is create space for experimentation. Whenever there are new models and new technologies, it's not long before someone on the team has hacked together a new version of Fin that experiments with it. So there's fundamental research as part of that work, rather than keeping up with what's happening on the side. I think that's important. I completely agree. We have a very experimental and scientific mindset, which enables us to quickly integrate information. So if something changes in the external world, we get to validate that. And, you know, it's pretty common where it's like, hey, some new model has dropped, and the next day or five hours later, we have the results of it on back tests. And then we can integrate that and think: do we change anything here? Is this promising enough? So, yeah, we definitely have that pipeline and that setup of people to do that. We've invested in that and it's paid off numerous times. Yeah, that's cool. You mentioned it's a full-time job. It's funny you say that, because I've basically found a way to make it my full-time job just to generally keep up with what's going on with AI. Have you actually hired somebody with that specific job description, to basically say, your job is to go track external developments, synthesize them, and report them to the team? Or is that a distributed responsibility across the team? It's definitely a distributed responsibility, and I would say it is a core competence of the AI group, and it is a core competence of any AI org in today's world. Otherwise, you're just going to get outdated. You're going to fall behind. It's a fast game. You have to move fast. So, no, we distribute that throughout the group. It's not any one person's job, but it is absolutely a core competence of the org. Yeah, gotcha. What does that eval stack look like when you say you can take a new model, plug it in, and backtest it within a day or so? You could break that down across a lot of different dimensions. We always welcome shout-outs to vendors, companies that you particularly like, frameworks, and also just conceptual stuff, like to what degree do you trust LLM-as-judge paradigms to help you accelerate the evaluation.
A lot of companies, there's been this argument recently that we don't need evals, we can just do it all on vibes, but it sounds like you're not on that side of that argument. I mean, evals are a tricky thing. I would say we're a little skeptical about evals. You know, six months ago there was this sort of meme going around that every product team needs to specialize in evals and be great at evals, and people started bigging up evals as this amazing thing that will solve all your problems, and you should always be skeptical of that. So I'd say a number of things. Thing number one is, yeah, we're pretty good at backtesting. We've always had a good backtesting framework for Fin for the core things that we're interested in from an LLM: does it hallucinate much, how often does it answer questions well. We have a setup where, given a RAG-style framework with known good answers to particular questions, we ask: does a particular LLM that we evaluate do a good job at finding the right content? So I would say we have pretty mature back tests, but we also have this kind of battle-tested wisdom of: always test in production, always test with an A/B test at scale in production. And we can move quite fast to that posture with a new piece of technology. And the reason we've done that is because we have seen so many times, hey, there's something that's new and it's very exciting and it does well on our back tests, but then in production it actually underperforms. The real world of humans and the real messiness of human communication is so messy that you can't build a perfect eval for it. You can build something that's good enough to give you a signal, but you have to test in production. And we are quite sensitive to changes: a tenth of a percentage point in resolution rate is something that we care about and we sweat. So we have to test in production in order to really see those things, with massive, at-scale, massively overpowered A/B tests. And we really consider that to be an edge. We consider that to be something that we can do that maybe smaller competitors, or competitors with smaller numbers of customers, just can't, because we have this very large deployment of Fin with many very large, diverse apps. So we lean into that. And anytime we've gotten too far into the evals, too scientific, you've got to just test in production. Don't fall in love with your offline evals.
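To make concrete why that kind of sensitivity requires scale, here is a rough, generic power calculation for a two-arm test on resolution rate. It is an editorial illustration, not Intercom's tooling, and the baseline rate, significance level, and power are assumptions chosen for the example.

```python
# Rough illustration (not Intercom's tooling): how many conversations per arm
# a two-arm test needs before a 0.1-percentage-point lift in resolution rate
# is statistically detectable. Baseline rate, alpha, and power are assumptions.
from statistics import NormalDist

def conversations_per_arm(p_base: float, lift: float,
                          alpha: float = 0.05, power: float = 0.80) -> int:
    """Standard two-proportion sample-size formula, equal allocation per arm."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # two-sided significance threshold
    z_beta = z.inv_cdf(power)            # required statistical power
    p1, p2 = p_base, p_base + lift
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * (2 * p_bar * (1 - p_bar)) ** 0.5
                 + z_beta * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
    return int(numerator / (lift ** 2)) + 1

# Detecting a move from 65.0% to 65.1% works out to several million
# conversations per arm, which is why scale itself becomes an edge.
print(conversations_per_arm(0.65, 0.001))
```

Under these assumptions, detecting a 0.1-point lift at a 65% baseline lands in the range of a few million conversations per arm, which is consistent with the point that only a very large deployment can see changes that small.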
Yeah, I want to come back to that notion of scale that you mentioned as an edge, because that ties into some macro theses that I'm developing and maybe want to test against your worldview. Before going there, though, how much pressure do you feel, from customers, from other stakeholders, to be really quick to market with qualitatively new AI capabilities? One thesis I've had is that there's probably enough time for incumbents to implement the best new stuff before a brand new startup will spin up, rebuild everything, and eat your lunch. And people are testing all parts of that spectrum, right, with Apple kind of bringing up the very rear of being clearly last to market, but they can hope, I guess, still maybe to be best one day. Where do you guys try to be: on the bleeding edge of capability, somewhere in the middle, not first but later and best? Like, what's your sort of philosophy of that? And what do customers demand or expect from you? I think we feel pressure from anyone in this space who would want to do the job that we want to do. The larger the incumbent, the more they have these incumbency advantages, where they have prior customer relationships and some degree of lock-in. Apple has got the ultimate lock-in. And I've been pissed off with Apple for years now, not just on the AI stuff; every third new interaction on my iPhone has a bug. I mean, at present, if I try to type in a contact's name in iMessage, it just doesn't return a list of names. And I'm not going to leave anytime soon, because I use the entire Apple ecosystem, and if I thought that insanely annoying bug in iMessage slowed down my day, wait until I try to integrate Android and all the Google products with my iMac and other Apple things. They'd have to push it a lot further before I would eventually switch at some point, but it's not about to happen soon. We don't have that same level of lock-in. And you could compare us with someone like Salesforce. The bigger guys obviously have substantially more lock-in. Salesforce, for example, have large customers, and those customers move slower. Salesforce have multi-year contracts, and they have a broad platform and so many other things that work with it. They also have multiple products that they sell, and people become a Salesforce shop, and if you remove one thing, it makes the whole commercial contract less attractive, et cetera. So they have a lot going for them. So they do have more time. But what happens is that there's a certain momentum in categories, and once a company breaks out, particularly with the early adopters, it's very hard for the incumbents to show up anew and take away that sheen. Once a new disruptor has broken out and they've established a brand, they've got momentum with their technology, it's atypical for the incumbent to just completely unseat them. It's worth looking at the Dropbox story, and there's probably nuance in it too. If you recall, many years ago, and this is coming from the guy who just told you he doesn't follow tech news, but this is back when I guess I did: famously, Steve Jobs offered to acquire Dropbox and Drew Houston said no. And my understanding at the time was that Steve Jobs said, you know, we're going to build this. And Drew still said no. And it took them many years, but they did build it. And now iCloud exists and it actually works. It works just fine. But Dropbox still exists. I'm not saying that Dropbox are totally killing it; I don't know if it's their fault, though, it could be the whole category. But the ways in which we reduce all of these narratives or concepts down to winning or losing, catching up or falling behind, are too blunt. So if I were to summarize it all, I would say that the bigger the incumbent, the more established they are, the more time they have. Salesforce and others will probably catch up. But I think that the newcomers, once they actually catch some momentum, will have established themselves very early in the market, and then some of them totally run away with it and the incumbents can never catch up. And I would say that it's kind of a meme to say this, and I've been saying it a lot, but I do think that this time is different in that it's not just, hey, let's build ABC instead of XYZ.
It's more like building AI requires a fundamentally different culture and set of talents. You just heard Fergal talk about the fact that keeping up with the changes and using a highly quantitative and scientific approach is of fundamental importance. The older companies need way more of a fundamental reset to catch up than just deciding to build something else. So I would be relatively bearish on any incumbent catching up where there's now some significant momentum in any of the AI categories. Yeah, interesting. Hey, we'll continue our interview in a moment after a word from our sponsors. Build the future of multi-agent software with Agency, A-G-N-T-C-Y. Now an open source Linux Foundation project, Agency is building the Internet of Agents, a collaboration layer where AI agents can discover, connect, and work across any framework. All the pieces engineers need to deploy multi-agent systems now belong to everyone who builds on Agency, including robust identity and access management that ensures every agent is authenticated and trusted before interacting. Agency also provides open, standardized tools for agent discovery, seamless protocols for agent-to-agent communication, and modular components for scalable workflows. Collaborate with developers from Cisco, Dell Technologies, Google Cloud, Oracle, Red Hat, and 75 more supporting companies to build next-gen AI infrastructure together. Agency is dropping code, specs, and services, no strings attached. Visit agency.org to contribute. That's A-G-N-T-C-Y dot O-R-G. So how does that lead you to make product decisions when there's a cool but perhaps immature new feature or extension of the product that you could offer, versus holding off until maybe the next generation of models gets a little bit better? How do you think about how much risk to take, in terms of things not necessarily working as well as you might dream but being there, versus the risk of somebody else beating you to be the first to deliver that? I think in previous years we had a very strong product inclination; we would just try and build all the sexy things first. But now, while we're still super product-oriented, and we're very careful to make sure that in the category we've been leading, which is, you know, service agents or AI customer agents, we'd always be the company that people could trust to get the new innovation, we've realized that if a smaller company that no one has heard of, or has heard only a little about, builds a sexy, shiny feature before us, it's going to be okay. So for example, in our space, we have these agents that can talk on the phone and by text and chat and email, and they're incredibly effective at that. I mean, it's surprising and shocking. An obvious thing to do that would be really sexy and cool, and that a big part of me would like to build before anyone else, is 3D avatars, little talking heads in your app: not just a chat bubble, but an actual little digital AI person. It'd be fucking cool. It's just so exciting to me as a builder. But does it really move the needle today? Not really. People are still absorbing the other stuff. Might someone else build it first? Probably. Will that hurt our business? Not at all. So there's a lot of art and judgment to this, and knowing what the customer wants and needs and where you're at. Yeah.
Um, on business results: looking back, it's been 18 months since Klarna made a bunch of headlines by saying that they're going all in on AI and, you know, look at these amazing results. Again, this is February 2024, right? They said that they had cut 700 full-time jobs. They were a little cagey about that; I think they said that nobody lost their job because those people were actually working at an outsourcing firm that was able to reallocate them, but Klarna was employing 700 fewer. They cited millions of conversations, improvements in all the metrics, faster resolution, higher resolution rate, et cetera, et cetera. Then that became a big debate. Is that real? Are they really good at this when other people aren't? Was there some snake oil being peddled ahead of an IPO, or whatever? How would you look back on the last year and a half or so and tell the story of what the real impact has been? How consistent is that Klarna story, even if it was leading, relative to the broad customer base that you guys serve? How does the impact that your customers are seeing line up against that story? Yeah, I think Fergal will have some fun things to say about this, but I'll first start by saying the things that Fergal won't say. Look, I never shit on other technology companies. It's really hard. And I don't know Klarna or the Klarna people at all, so I don't know anything about their story. But as a man who likes marketing moments and opportunities to tell brave stories, you know, that looked like a good one. And you know what they say: it takes one to know one. And that looked like a bit more of a show than a reality. I, for one, and maybe I'm completely wrong, and I'm totally open to being called out, didn't believe that story, because I didn't see anyone else do it, to your question. And I think I heard that they backtracked on it. I could be wrong, but I think they announced that they backtracked and they hired people, or they didn't fire people, I don't know. But that's about Klarna. The fascinating thing about this moment in time, and this has happened with all disruptive technologies, is that, at least in our category, but I expect this to play out in every category, the new economics and the accessibility and ease of deployment mean that before it replaces a bunch of humans, it actually increases the supply of the things the humans were doing that one could never afford to deliver in the past. And so, for example, in our category, service agents, people are deploying their service agents to their free customers that they never gave support to before. They're responding to their customers more quickly, which means that the customers will ask more questions and they'll get more service. They'll put it on email addresses, or maybe they'll deploy a chat feature that they never had before. And so they're just net doing more service, which means that their customers are more satisfied, more effective at doing the things that they want their customers to do, et cetera. I will say that for us, as a prime customer of Fin, we deploy it in all the ways. We certainly have dramatically slowed the rate at which we hire service agents, and we haven't substantially grown our team since we launched Fin over two years ago. And so I think that that's where the first human disruption is going to be: it's going to eat the future supply that was going to come from new headcount.
But I'll finish by saying this: we have this metric we look at, resolution rate. It's the percentage of customer queries that Fin can resolve, according to the customer. And that resolution rate has been increasing on average by one percent every single month. We did a couple of two-percent months in the last few months. And it's currently, I think, in the low to mid-60s; I'm looking for Fergal's nod. Which means that it's some years before it gets to even the high 80s or 90s. And that assumes that people deploy it to all of the places where they get requests, and they don't typically do that. And then the other dynamic is that even as the resolution rate increases and we do a higher percentage of the work, each additional point is slightly harder work. All the easy work was done a long time ago. And so even if it takes 10 months to add 10 resolution rate points, that's probably not 10 points of work. And so even if, in a number of years, we're in the high 90s of resolution rate, there still might be 20 or 30 or 40 percent of the work left to humans. And so I'm trying to lay out the idea that even when it starts to disrupt the work, and doesn't just serve unmet demand, there'll still be substantial human work required. And I think that that's something that people in AI and in this space who have been commenting on it have got wrong: they've flipped to the future really fast, imagined this brave new, if not scary, world, and just not realized that all technology adoption takes time, that disruptive technology serves unmet demand first, and that it takes a phenomenally long time to kill the disrupted categories. If text disrupted email, if email disrupted letters, if letters disrupted fax, guess what? Fax still exists. So that's a very, very, very long way to say that I don't see these big layoff things happening, and I don't see them about to happen. Yeah, if I may come in on that as well: definitely agree with what Eoghan said there. And absolutely, Fin is resolving a really meaningful, sizable chunk of the overall volume of people's businesses. Most support teams are underwater. I haven't met a support team that isn't underwater by like 30 percent versus the capacity they wish they had. And so when you come along on day one and you resolve 30, 50 percent of their queries, maybe 30 percent of their workload, you just take them from being underwater to being roughly at parity. And then, you know, this is always the good news story. This is the way we used to hope it would play out before it happened. We were like, oh, we kind of hope it doesn't really cause massive job losses, and we hope people move up the value chain. And that's what we see: people move up the value chain. Now, there's one major exception to that, which I would say is BPOs.
If there are customers that have outsourced their tier one or their frontline customer support to BPOs, they do very frequently deploy Fin and instantly get rid of the BPO. But most of the time, the internal team pivots and goes up the stack and just isn't replaced. You know, we saw that internally in Intercom. I think our support team needed to grow by 25 percent the first year of Fin; that was the projected headcount, and it never happened. It just stayed static the first year we deployed Fin, and then Fin has continued to grow and improve its resolution rate. So yeah, it's complicated. I do also wonder what happens if there's an economic downturn or something like that. I think the support teams of the world are starting to realize that this is valuable and it's real, but I think the CFOs haven't cracked the whip yet, and maybe if there's a downturn or something like that, then you'll see impacts, then you'll see a much harder conversation about operational efficiency post-AI. But right now it's a good news story, and we don't know the future. Hey, we'll continue our interview in a moment after a word from our sponsors. Claude plays a critical role in the production of this podcast, saving me hours per week by writing the first draft of my intro essays. For every episode, I give Claude 50 previous intro essays, plus the transcript of the current episode, and ask it to draft a new intro essay following the pattern in my examples. Claude does a uniquely good job at writing in my style. No other model from any other company has come close. And while I do usually edit its output, I did recently read one essay exactly as Claude had drafted it, and as I suspected, nobody really seemed to mind. When it comes to coding and agentic use cases, Claude frequently tops leaderboards and has consistently been the default model choice in both coding and email assistant products, including our past guests, Replit and Shortwave. And meanwhile, of course, Claude Code continues to take the world by storm. Anthropic has delivered this elite level of performance while also pioneering safety techniques like constitutional alignment and investing heavily in mechanistic interpretability techniques like sparse autoencoders, both internally and as an investor in our past guest, Goodfire. By any measure, they are one of the few live players shaping the international AI landscape today. Ready to tackle bigger problems? Sign up for Claude today and get 50% off Claude Pro, which includes access to Claude Code, when you use my link, claude.ai slash TCR. That's claude.ai slash TCR right now for 50% off your first three months of Claude Pro, which includes access to all of the features mentioned in today's episode. Once more, that's claude.ai slash TCR. Being an entrepreneur, I can say from personal experience, can be an intimidating and at times lonely experience. There are so many jobs to be done and often nobody to turn to when things go wrong. That's just one of many reasons that founders absolutely must choose their technology platforms carefully. Pick the right one, and the technology can play important roles for you. Pick the wrong one, and you might find yourself fighting fires alone. In the e-commerce space, of course, there's never been a better platform than Shopify. Shopify is the commerce platform behind millions of businesses around the world and 10% of all e-commerce in the United States, from household names like Mattel and Gymshark to brands just getting started.
With hundreds of ready-to-use templates, Shopify helps you build a beautiful online store to match your brand's style, just as if you had your own design studio. With helpful AI tools that write product descriptions, page headlines, and even enhance your product photography, it's like you have your own content team. And with the ability to easily create email and social media campaigns, you can reach your customers wherever they're scrolling or strolling, just as if you had a full marketing department behind you. Best yet, Shopify is your commerce expert with world-class expertise in everything from managing inventory to international shipping to processing returns and beyond. If you're ready to sell, you're ready for Shopify. Turn your big business idea into cha-ching with Shopify on your side. Sign up for your $1 per month trial and start selling today at shopify.com slash cognitive. Visit shopify.com slash cognitive. Once more, that's shopify.com slash cognitive. Yeah, I have done two ad reads for Intercom over the last half many months. I don't know if you guys are even aware that you've been sponsors of the show, but thank you whether you were or weren't. In that time, I think it's gone from 56% advertised resolution rate to 65%. And I guess I'd like to dig in a little bit on, first of all, how does that published resolution rate compare to your own resolution rate as Intercom? Second, what has driven that? You said it's 1% a month, and obviously there's a lot of work and testing and whatever. But in terms of the tailwinds that have made that possible, what are they? And then what do you think is kind of most likely to come next that will take you from 65 to 75 over whatever the next however many months? I'll just answer the very first part of that question, Fergal, and then would love you to jump in. I think we're in the high 60s for Intercom. And that's pretty damn good because Intercom is this sprawling product. like so many features, frankly, too many. So the fact that it's actually able to provide that level of coverage is incredible. And we have deployed it in all the places. So a really large chunk of our customer service is now done by Fin. And we certainly are one of the biggest deployments. But we actually have customers in the high 80s and in 90s. So if you've got a kind of narrower set of questions that people might ask, it can get very high. Over to you, Fergal. Yeah, absolutely. And then in terms of the actual, you know, the process by which we get that resolution rate up, like it's a lot of work. And, you know, it's really weird how there's sort of a scaling law or Moore's law like phenomenon where it has really consistently improved at about one percentage point month on month. And we have this thing internally where so often we're like, okay, well, we have three more things to try over the next cycle, over the next six weeks. I'm not too confident. And about like two of them work and one of them doesn't or one of them works and two of them don't. And we get this net increase of about a percentage point month on month. We have this like ever growing machine in the AI group always trying and testing more things. And that's roughly what it nets out at. And like, you know, that is this constant process of like optimization, trying to like, let's refine the retrieval model. Let's go and work on the re-ranker. Let's go and change the prompts. You have so much stuff to do there that kind of slowly gets it up. Only a very small amount of it. You mentioned tailwinds. 
Only a very small amount of it has been core LLM performance; that's been a couple of percentage points over the last two years, as the overall performance has gone from about 35 percent at launch to about 65 now. So much of it is just this testing and optimization process we do. And, you know, we have the data on that: we have the A/B tests, and then we also have the cohorted view. We can see that recent customer cohorts get a similarly high resolution rate to customers that have been with us for a long time. There's always variance within those cohorts. Some customers have a very mature setup where they've done a lot of optimization work with all these product features to help you optimize your help center content. But overall, most of it is just improvements in the core engine of Fin that we have invested very, very heavily into. So the only kind of tailwind would be the space as a whole, where there's more and more AI available. And the future, really, for us: we've done a lot of work recently investing in our custom AI models. We're training our own models in-house for the first time. We've trained a custom retrieval model and a custom re-ranker model. We're very pleased that our custom re-ranker model beat out one of Cohere's top models, which we previously had used, and, yeah, we're really happy with that. So we really think that's the future of our investment in Fin: taking all the data we have from all these different customers and using it to help Fin learn and make Fin better overall. And that's been working for us. It's been delivering resolutions. And we're really excited about continuing to invest in that.
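For readers who want to picture where a custom re-ranker sits, here is a minimal, generic retrieve-then-re-rank sketch. The embed and rerank_score functions are placeholders for whichever retrieval and re-ranking models a team trains or licenses; none of this is Intercom's actual code.

```python
# Generic retrieve-then-re-rank sketch (illustrative only). The embed() and
# rerank_score() callables stand in for whatever retrieval and re-ranker
# models a team has trained or licensed; they are not Intercom's models.
from typing import Callable, List

def retrieve_and_rerank(query: str,
                        documents: List[str],
                        embed: Callable[[str], List[float]],
                        rerank_score: Callable[[str, str], float],
                        recall_k: int = 50,
                        final_k: int = 5) -> List[str]:
    """Stage 1: cheap vector similarity for recall over the whole corpus.
    Stage 2: a more expensive scorer re-orders only the short list."""
    def cosine(a: List[float], b: List[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(y * y for y in b) ** 0.5
        return dot / (na * nb) if na and nb else 0.0

    q_vec = embed(query)
    # Stage 1: recall-oriented retrieval.
    candidates = sorted(documents,
                        key=lambda d: cosine(q_vec, embed(d)),
                        reverse=True)[:recall_k]
    # Stage 2: precision-oriented re-ranking of the candidate set only.
    reranked = sorted(candidates,
                      key=lambda d: rerank_score(query, d),
                      reverse=True)
    return reranked[:final_k]
```

The design point, on this reading, is that most of the quality gains come from improving the second, precision-oriented stage, which is why training a custom re-ranker can move resolution rate even when the underlying generation model stays the same.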
Do you think that will continue going forward? Because that's, I would say, a quite different story from the most common one that I hear and, frankly, that I tell. To take my own company, Waymark, for example: we help small businesses, and increasingly bigger and bigger businesses, because the quality of what we're able to deliver is improving quite fast as well. But it started with local brick and mortar and very small niche online retail. We help them create video content. And you look back at what we were doing three years ago, when we first brought an AI video generation product to market, versus today: it's dramatically improved. I would say a lot of credit goes to the team; certainly we've tried a lot of things and figured a lot of things out. But I would say probably two thirds, if not more, goes to the fact that the models themselves have just gotten so dramatically better during that time. Everything has gotten easier. Even when it comes to something like generating a voiceover, it went from robotic, to still mostly robotic, to now very expressive, right? And we had to figure out how to prompt it and how to adopt it and how to get things to sync up time-wise, but there were fundamental advances that were just like, wow, the giants on whose shoulders we're standing are getting taller at a rapid rate. So you're telling a quite different story, where that seems to be less of the lift for you. How do you think about that? And do you think that will continue into the future? Yeah, it's definitely a different story for us, and there are probably a few different reasons for that. One reason is that we built the first version of Fin on GPT-4, and we were very early to that: we had advance access to GPT-4 for a couple of months, and it really took us across the threshold that we wanted for accuracy and quality. But it involved using this big beast of a model; we used to have to run it on dedicated hardware, and we had to really architect around it quite carefully. So I would say we have gotten some improvements from model improvements there in terms of our architecture: our architecture is a little bit simpler, and it can do more powerful things. But in terms of the actual resolution rate, it's only a few percentage points that have been due to pure model improvements, and it's almost all to do with the testing and iteration around them. That said, Fin is a different architecture as well; the architecture has changed as we've gone. So it is a complicated story, but we'd be highly confident that almost all of the improvement is in what you might call the RAG layer, or the AI layer outside the core models. The core models have definitely gotten better, and that's been great for us, it's been great for everybody, but that has mostly reduced the cost a bit and improved reliability a little bit. It hasn't driven core resolution rate that much, but it has given us a platform to be able to go and provide more flexibility to our customers. We have a feature called Guidance, where customers can make Fin more personalized to their brand: say, talk in this tone of voice. And features like that would have been harder to achieve with the models and the architecture a year or two ago. So yeah, the reality is multifaceted and it's complicated. But if you focus on resolution rate, which is the core metric that we care about, that we build for, that our customers care about, I would say the story is a bit simpler, and the story is less about the underlying models. I think video would be very different from that. GPT-4 is a really powerful model. You know, GPT-4, which we had two years ago, is a very, very powerful model. It was a very big model, a beast of a model to run. And so, yeah, we could talk through the whole history: GPT-4 Turbo, which a lot of people have forgotten about, was much smaller and much more efficient, but very similar in terms of power to GPT-4. And then, moving to Sonnet, we got a couple of percentage points of improvement, which we care about a lot, but against the backdrop of the 35% to 65% improvement that we've had over the two years, the actual models are a relatively small part of that.
And so, yes, I would say that a large part of customer service probably was saturated by models of GPT-4 level intelligence. In which case, you spend a certain amount of time optimizing that. And we have went and we have trained our own small models to do parts of FIN. So for example, one part of fin was summarization. We summarize the end user query before we go and do ragad. And that was worth doing because sometimes end users, they put a lot of random stuff in their query and you don't want to pass all of that to your like your search pipeline and your embeddings. So you do a canonicalization or a summarization piece first. That's always been very valuable for us. We used to use initially GPT 3.5 turbo to do that. And over time we use different models. We used GPT 4.1. We used Haiku for a while. Didn't have great experience with Haiku. Recently, we've switched that. That model is now a combination of a proprietary encoder-decoder model that we have trained ourselves, and then a fine-tuned version of Quen3 that we have fine-tuned to be really excellent at that summarization task that has made it cheaper and lower latency and more predictable and more reliable and higher quality than we were able to get from third-party LLMs. So yeah, so absolutely, we are taking some tasks for which model intelligence is efficient to saturate the task, and we're placing them with small models. They happen to be small models we've trained ourselves. Smaller third-party models have never really given us the exact trade-offs that we want. They become less steerable, and there's all these different complicated set of trade-offs. But yeah, that's absolutely what we've been doing. We're very excited about that. And that's a big investment we made recently in FIN. Then there are other parts of FIN that are real frontier challenges. So, you know, a bit part of that is like tasks, right, or procedures, where FIN is like interacting with external systems. We have a very high reliability bar to hit there. And yeah, for that, we need frontier models. We use Anthropics, like excellent sonnet models for that sort of task at the moment. And yeah, it's great. or like the hardest part of our answering prompts, we're still using third-party models as well. So it's nuanced. Our product is pretty big at this stage. FIN was always 10, 15 prompts, and then we had a different architecture for email. Now we have FIN voice, which is a whole different story. So really, we have this big cloud of AI services at this point, and so you end up in a nuanced discussion of each one. But yeah, speaking about core FIN, that narrative I had is correct. Cool. That's excellent. And just the kind of thing people tune into this podcast for. I always hear that people, you know, they want these little nuggets of understanding. It's a pretty AI obsessed audience. So, and I think this is like a very timely and sort of thematically relevant conversation too, because there's this broad question around like, well, what's missing from AI that's going to take it to the next step? Or if we imagine a drop knowledge worker of the future that it does seem like all the frontier companies are racing to create what will that have that the current things don have And if we can take just raw intelligence mostly off the table if you're at high 60s today and you've got 30 percentage points-ish to go to fully automate your own customer service, and it's not intelligence itself that's going to close those 30-point gaps, what are the other gaps? 
Cool. That's excellent, and just the kind of thing people tune into this podcast for. I always hear that people want these little nuggets of understanding; it's a pretty AI-obsessed audience. And I think this is a very timely and thematically relevant conversation too, because there's this broad question around: what's missing from AI that's going to take it to the next step? Or, if we imagine the drop-in knowledge worker of the future that it does seem like all the frontier companies are racing to create, what will that have that the current things don't have? And if we can take raw intelligence mostly off the table: if you're at high 60s today and you've got 30 percentage points-ish to go to fully automate your own customer service, and it's not intelligence itself that's going to close that 30-point gap, what are the other gaps? How would you taxonomize what's missing? And it may not all be on the AI side, too. I mean, I guess with you guys, I would assume it would be more with your customers. I could imagine you might say, well, they've got to give us more information, they've got to actually use the features. But I assume you guys are using all the features you have pretty well to the fullest. So what is missing that you think will need to come online to get you climbing the rest of that 30%? Look, I would say that there are definitely intelligence bottlenecks still. The core task of, given this article, answer the questions from this article as well as a human customer support rep would: we're kind of there. You know, the models are really, really good at that stuff. They're intelligent. They're human-competitive at constrained tasks like that. However, the more complex task of, given a whole bunch of external systems and a very vague, fuzzily defined policy about when to do a refund and when not to do a refund, do that reliably, with common sense, not 99 times out of 100 but 100 times out of 100: that's still a frontier task. That's still a task where the whole ecosystem is leveling up. And so, a bit like how with informational queries we had to go and use RAG and tune it quite well to get to the performance we needed, the right way to attack that is going to be to use the models as a building block and then to go and tune the envelope in which the model works to be able to give the right performance. So we need to do that. That's thing number one. And then thing number two is this huge task of actually deploying these things: all the human factors, convincing the security team, making sure it's secure enough, penetrating the organization. And this cuts back to what Eoghan said earlier. For people in the discourse around AI, it's very easy to fall in love with some metric on a back test and to not see all those messy human factors of deployment and adoption and penetration. And, you know, the more valuable something is, the faster it will penetrate; that's absolutely true. Amazing breakthrough technology penetrates fast, but it's still a process. And so that process needs to roll out. So that's going to take some time, yes. Yeah, I would also add that the more systems a product needs to touch, the slower it will penetrate the organization. And so in some senses, the informational queries for service were nicely isolated, so it was easy to pick that up and switch it on. But this future imagined knowledge worker that has to collaborate with many different teams and individuals, use different systems, pay attention to permissions, and talk to external stakeholders too, that just sounds like a lot harder adoption. So the first thing that comes to mind when you ask that question is kind of Fergal's answer at a higher level, which is that the recent trajectory of the actual foundation models tells me intuitively that it's not going to be the base models themselves that just show up someday and are good to go, but rather there will be companies that deploy them to specific use cases and build, as Fergal calls it, the envelope around that to do this work highly effectively. I mean, if someone built the sophistication of system that Fergal and the AI group at Fin have built for a range of different use cases, it would probably be ready today.
I don't think that the technology itself is actually holding people back. It's the raw, hard work needed to point these things in the right direction and help them be effective, and all of the work around the R&D to help companies adopt them. That is the difference between where we're at today and companies having, you know, real AI knowledge workers. I completely agree with that. And I would also say that I do think there's something maybe missing from some parts of the discourse, you know, when people talk about the models getting so good that they're a country of geniuses in the data center, that sort of thing. Certainly there need to be changes to model capabilities to really achieve that. The models are missing a whole bunch of key capabilities today. They're missing memory, right? They typically run in a stateless fashion. And yeah, you can go and put memory outside the model, and that kind of works, but it kind of doesn't. They're also missing what someone might call System 2: they're missing the ability to grind on a problem and to learn about that problem, right? So if you have an intern, never mind some PhD-level genius or whatever, if you just have an intern out of school and you give them a task, they'll do pretty badly at the task the first time, in a way that probably Sonnet won't; Sonnet will give you consistent performance. But the intern can learn, they can really learn in a task-specific way over time, and the models just can't do that. There are research prototypes where, you know, they're trained with reinforcement learning to go and update their weights when they hit a certain point in a maths problem after doing a lot of chain of thought, or something like that, but none of that's deployed yet. And, you know, I think Cursor just did something interesting with a live reinforcement learning system; I think people are going to be looking at that. But even that stuff is risky, and so we'll see. There are fundamental capabilities that are still missing from the overall intelligence layer. We'll need those. In the meantime, companies like us will have to build around that. And even after we have those, as Eoghan says, there's going to be a ton of work in building the envelope to make the system actually work for a business, you know? If you had some crazy genius, you wouldn't want to bring them to your company and just let them do everything. You'd still want to train them; sometimes, if they're a real genius, they need a lot of training. There is a question of value here: if you want someone to come and do your customer support, do you want someone who has 10 years of experience doing great customer service, or do you want a crazy genius? It's not obvious that you want the crazy genius to do your customer service. So yes, there's a lot to do here.
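To give a concrete, purely illustrative picture of what building that envelope can mean for a high-reliability action like refunds, here is a small sketch in which the model only proposes an action and deterministic policy checks decide what happens next. The thresholds, field names, and rules are invented for the example and are not Intercom's product logic.

```python
# Purely illustrative "envelope" around a model-proposed refund: the model only
# proposes, and deterministic policy checks decide whether the action runs
# automatically or goes to a human. Thresholds and field names are hypothetical.
from dataclasses import dataclass

@dataclass
class RefundProposal:
    order_id: str
    amount: float
    model_confidence: float    # calibrated confidence attached to the proposal
    within_return_window: bool # looked up from the order system, not the model

def decide_refund(p: RefundProposal,
                  auto_approve_limit: float = 50.0,
                  min_confidence: float = 0.9) -> str:
    """Return 'reject', 'auto_refund', or 'escalate' from hard policy rules."""
    if not p.within_return_window:
        return "reject"       # absolute policy; the model cannot override it
    if p.amount <= auto_approve_limit and p.model_confidence >= min_confidence:
        return "auto_refund"  # low-risk path the agent may take on its own
    return "escalate"         # everything else goes to a human reviewer
```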
And yet we're just starting to deploy. So I wonder how often you see that essentially happening with your customers where, to quote Biden, who used to say, don't compare me to the almighty, compare me to the alternative. How often do customers just fail to realize how accurate or inaccurate their humans are, how consistent or inconsistent they are amongst themselves? Again, at Waymark, to take a personal example, we had tried to evaluate aesthetically the image assets that our small business users would bring to the platform so we could make intelligent recommendations. We found that it was really hard to even establish agreement among people. Directionally, yes, there'd be correlation, but to say that everybody would rank them the same, far from it. I always say you could tell the top end of the curve and the bottom end of the curve, but in the middle, you wouldn't necessarily even know which way it was up or down. So all that to say, is there a way in which measurably, if people measured, the AIs are as good or better than what is happening today, but people, for one cognitive bias or motivated reasoning or another, don't want to recognize that or are prepared to take the risk on the AI that they're already running with their humans? I mean, I can definitely give probably a couple of conflicting answers to that because I think we all internally spend a lot of time thinking about this. Look, in one way is like, yeah, some customers have a nuanced understanding that their humans are imperfect. And some of them even will have like an error rate that they know their CS reps will do incorrect refunds for. And then that's the bar. So that's one answer. But another answer is that like, if I was a PM at Waymo, I wouldn't be just trying to equal the human accident rate. I'd be trying to exceed that by two orders of magnitude because that's what's required to really build a product that will penetrate very fast through the market. And you can spend all your time obsessed that people are holding you to too high of a bar. But for better or worse, people do hold new products to a higher bar. And I think, you know, we've always kind of engineered for it. We've always wanted to give not just good enough or not just human competitive error rates. We wanted to give the best error rates we possibly can, the lowest error rates we possibly can. And, you know, the understanding changes over time. And then it's different for some customers. We have some customers for whom they're like, I'm a regulated industry. If, you know, a human makes a mistake, I can talk to the regulator about that. The regulator will understand it. But I'm worried that if the system makes a mistake, the regulator won't understand it. We have customers who are in that boat and, you know, regulates is also new to regulators. And then we have customers who are in the boat where it's like, nope, I can make the judgment call. I know it's superhuman. I believe you. I've done the trial. Let's go for it. And so, you know, like the adoption of running new technology gets contacts. And Owen, I'm sure you have, we talked about this a good bit as well. Yeah, totally. I think, you know, initially, as Fergal said, the expectation will just be remarkably high. There's just a great degree of kind of fear and skepticism. And so the lens, the microscope is really on every single interaction. It's almost expected that these new disruptive technologies just won't be as good. And so they need to oversell. 
But I think as soon as people start to realize the ways in which it's vastly superior to humans, maybe not all the time but quite often, they'll give it a lot more permission for error. And Waymo has been in the market and with consumers a lot longer than the technologies that we are building. People are not actively talking about the incredible customer service experiences that they've had; often it doesn't even register that it's AI. And you'll see even with Waymo that people will endearingly forgive its mistakes: oh, it stopped behind this car that was pulled over and I got a little angry at it, and then it kind of turned and pulled past it. It's endearing, it's funny, right? So I do think that at large, we as a society will warm to the eccentricities of AI and be quite forgiving, because we just know how effective it can be. And in customer service, for example, it turns out that people really hate having to reach out to human customer service. They know they're typically going to have to wait, maybe hours, but probably days. They're going to reach someone who doesn't really want to do their job. They're probably going to get a crappy half-answer that doesn't really answer the question. The whole thing is just really unpleasant. But when you have a snappy, happy, expert concierge agent, apparently very keen to hear from you and willing and ready to answer in seconds, they're going to ask a lot more questions. So a lot of it is just us as a society building more meaningful relationships with these things.

Yeah, speed is really the killer. In preparing for this, I took a look, and it hasn't been my department for a while, but I took a look into our Intercom data. For context, we have always really invested in customer service; we've got great people doing it, and they actually do show up with a smile and interact with customers in a really authentic way. Our customers have always really appreciated us for that and specifically called out individuals on our team an awful lot. And yet, one thing that is impossible for the humans to do at the level the AIs can is speed of response. We typically respond in two minutes, which for a relatively small company with a small team is pretty good. And yet it's just enough time for the person to tab away and do something else. Then when they get our response, they tab back five minutes later, and the next thing you know we're at 30-plus minutes to get to resolution, even with everybody being attentive and doing a good job. The AI's ability to just be there immediately is a pretty dominant advantage that we're not going to catch up to anytime soon.

It's true. I'll tell you a couple of funny stories about that, though. If it's too fast, people don't think it's really going to give them the right answer; they assume it's probably a pre-baked, crappy, automated macro answer. So if it's too fast, people don't quite trust it. And if it's too slow, but still way faster than a human, let's say 20 seconds, people will be annoyed, because, hey, this is AI. And the reality of where the technology is today is that if you actually gave it a couple of minutes, as opposed to requiring that it respond in seconds, which I have pressured my team a lot to make sure it does, it can actually do a better job if you give it a little bit more time.
We've seen that with the latest models, which are really just the same foundational technology given permission at runtime to think a little longer. So the response time thing is actually super nuanced, and it also relates to people's understanding and expectations of AI. The one thing I will say, though, is that despite, in some senses, my pessimism about how quickly the base models are going to improve, I do think they will get far faster and cheaper. And so as we build all these technologies together, we will probably be able to do the runtime stuff quicker. We'll be able to provide that level of response in the amount of time that people expect.

If you zoom out from where the product is today, the particular architectural decisions that you've made, and all of that history, how do you know when to be willing to consider changing the paradigm? This is in some ways still the most human judgment relative to what AI is good at. You mentioned voice, which is obviously quite a different paradigm than chat. And I think another big trend that I'm sure you're thinking about a lot is what I call the choose-your-own-adventure style of agent, as opposed to the highly structured, scaffolded, optimized, controlled input-output, more workflow-like agent. It seems pretty clear that you started with a very methodical, task-decomposition-based approach to getting every little step in the chain working well. And it seems like Anthropic and others are now trying to push: well, you don't need to do that anymore; hopefully we're going to make this thing so good that you just give it some tools and it'll choose the right tool, it'll figure it out. Are you doing radical A/B testing where you have these two different paradigms in competition with each other? And how do you think about if and when it would be time to make such a big paradigm shift?

Yeah. I'll say one quick thing, which is that we're going to share at our Pioneer event in New York, which is our big customer event, which, what date is it, Fergal? October 8th. October 8th. We're going to share basically what we've been working on and what comes next. Part of that is going to be a paradigm change, because I think there is a big opportunity for stepping back a little and thinking more broadly about the problems we're trying to solve. I think the answer-bot, Q&A-bot thing is fine, but it's still quite limited. So I don't really want to say more about it than that, but these things will need to become more agentic, and there are big opportunities there. Fergal can probably talk to some of the A/B testing that we do do, but I don't think it tends to be very radical. There's a spectrum from radical changes to micro-optimizations, and I know that we do stuff in the middle at the very least. Fergal?

Yeah, I'm really excited for that announcement at Pioneer. Look, to your question specifically, at a technical level it's a very interesting time for architecture. Firstly, let me say that your synopsis of Fin is correct.
Fin originally, very deliberately, was, and still is, this building-block-style architecture, where you use the models as building blocks and then you carefully isolate them, test them, and optimize them. That's worked really well for us, and we've recently invested in training our own custom models for some of those building blocks, which have been more performant and way better than some of the third-party models we were using. So that's true, and some of those building blocks are definitely durable: in a RAG system you need a retrieval engine, you need a re-ranker, you need a few other pieces like that, and those are definitely going to be durable. But I do think there is an interesting architecture switch, or push, on the cards in future, and Anthropic and others have been pushing it. I can tell you that we're pretty confident that switching to that today would reduce the quality of Fin at the question-answering task versus where it currently is. So we're pretty happy with our current approach for the quality level we need to be at right now, but we have built prototypes there. And if we get to the point where we have a next-generation architecture of the form you're talking about, yes, we would A/B test it in the end. That is ultimately how we'd get a full-spectrum analysis of its trade-offs, strengths, and weaknesses compared to our current architecture, and that's how we've done architecture migrations in the past. Internally, we're on probably about generation four of core Fin at the moment, and for each architecture change we've done something like that: we've tested it in production, at scale, and some of those architecture shifts have been comparable in size to what you're talking about. So yeah, we'll see. But as Eoghan says, we will have things to share at Pioneer that are quite relevant to this from a product perspective.
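To make the building-block idea concrete, here is a minimal, hypothetical sketch of the shape Fergal describes: retrieval, re-ranking, and answer generation kept behind narrow interfaces so each stage can be evaluated, A/B tested, or swapped for a custom-trained model on its own. The names, interfaces, and parameters below are illustrative assumptions, not Intercom's actual code.

```python
# Minimal sketch of a "building block" RAG pipeline (illustrative only).
# Each stage sits behind a small interface so it can be evaluated,
# A/B tested, or replaced with a custom-trained model independently.

from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class Chunk:
    doc_id: str
    text: str
    score: float = 0.0


class Retriever(Protocol):
    def retrieve(self, query: str, k: int) -> List[Chunk]: ...


class Reranker(Protocol):
    # Could be a third-party re-ranking model or a custom-trained one.
    def rerank(self, query: str, chunks: List[Chunk]) -> List[Chunk]: ...


class Generator(Protocol):
    def answer(self, query: str, context: List[Chunk]) -> str: ...


def answer_question(
    query: str,
    retriever: Retriever,
    reranker: Reranker,
    generator: Generator,
    k_retrieve: int = 50,
    k_context: int = 8,
) -> str:
    """Run the pipeline end to end; each block can be optimized in isolation."""
    candidates = retriever.retrieve(query, k=k_retrieve)
    ranked = reranker.rerank(query, candidates)
    context = ranked[:k_context]
    return generator.answer(query, context)
```

The point of the decomposition is that each block exposes a narrow contract, so you can hold the rest of the pipeline fixed while measuring a change to a single stage, which is what makes per-component evaluation and production A/B testing of architecture changes tractable.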
Cool. Well, we'll stay tuned for the next big update there. When it comes to the impact that Fin has had on your business: as far as I know, it was the first pay-per-outcome pricing scheme I remember seeing, with the, at least in my corner of the world, famous 99 cents per resolution pricing model. When we first scheduled this, I wondered if you still had that, and sure enough, you do. How is that going? If you could go back and do it all again, would you do it the same way? Would you price it at the same point? Because obviously a lot has changed under the hood, right? Per-token prices are way down, but the number of tokens put in for context is, in many use cases at least, way up. Plus you've got thinking tokens, which means outputs are way up. So I don't really know how to guess how the cost basis has evolved, other than to say there are clearly strong forces pushing in either direction. But having anchored at that price point, and probably being pretty reluctant to move it, you've kind of got to manage against it, I would assume. So what can you tell me about what's going on under the hood? What would you say you've learned, and what advice would you give to people who are thinking about a per-outcome pricing model?

So, yeah, exactly as you said, core LLM costs have definitely been falling. Cost per token has definitely been falling. I guess the converse thing for us is that Fin is doing a lot more work now than it did two years ago when we launched it. It's resolving a much higher percentage of your inbound volume, but it's also resolving harder questions, and so we have to spend more token budget on that. We're putting more and more things into our RAG system, and then there are all these supporting AI systems around Fin: we have things like our Insights product, we have ways we double-check the content that goes into Fin, our chunking strategy is way more complex, and we've added other pieces to really make Fin better over time. So yes, absolutely, the core cost per token has come down a lot, but Fin is doing a lot more work than it used to as we try to push the resolution rate up more and more. Our big North Star, the thing we really care about, is what percentage of your inbound volume we're actually resolving. Under Eoghan's direction, we're investing a lot in making the product better and better, and that's the thing we're pushing towards, rather than optimizing purely for cost. So that's how I'd describe it at the applied level.

I'll tell you a little bit about our story and thinking, but first I want to set us apart from some of the narrative in the market where people are talking about negative gross margins. When we launched Fin, I was told it was going to cost $1.21 per resolution. And we decided, for reasons I'll speak to in a moment, that we must be more aggressive, and we really liked the attractiveness of 99 cents. So we actually took a hit on each resolution at the start, but pretty quickly turned that dynamic positive and then achieved software-level gross margins. So any narrative that these AI products, and particularly the agent products, need to have negative or even crappy gross margins is clearly not always true, because we've shown that to be the case, even while there has been pressure because we're doing more work. Philosophically, we wanted simple pricing and we wanted pricing that would map to value. We learned a lot of hard lessons from having complex pricing in the past, because we used to do a lot of things for different types of customers, and so we had many aspects of our pricing model tracking different metrics, and it just became a nightmare for people to track and follow. So we really have been simple to a fault, and probably far simpler than we need to be. There will come a point where there are certain classes of resolutions that save reps an hour or multiple hours of work, and we will charge more for those. They'll still be super cheap. I have no idea what it will be. Is it $1.99? Is it $2.99? Is it $3.99? Is it $4.99? I don't know. But we'll always make sure that it's easy to translate from a value perspective, so that it's always a no-brainer for customers. Our own work showed that we spend $26 per resolution all in, so that's salaries and everything, offices and benefits and all this stuff. And we've come across companies that say they're as low as $5. We've come across some companies that say they're lower.
We don't think they're doing the analysis correctly, and so we're highly confident that 99 cents is just a great deal. And what we really, really like about it, beyond its being simple and a great deal, is that it also aligns our incentives. If we charged for some sort of synthetic token, or for every conversation, it would actually not be in our interest to make Fin more effective at solving customer problems. If we charged for every query we got, we'd be golden; we wouldn't need to get better. Whereas we charge per resolution, so every time Fergal's team increases that by one percentage point each month, we make our customers way happier and we make our CFO way happier. It's this beautiful alignment. The biggest cost of this outcome-based pricing is that it's novel, and so there's a bit of education. We were certainly the first in the market. My favorite part of that story is that two other companies also announced that they were first in the market: there was Zendesk, and I think Sierra did the same, although I need to double-click on Sierra, but they made a big deal about outcome-based pricing. But we were certainly the crazy first people to do it. And we did experience an education cost, but when we went back and surveyed customers to find out whether they would prefer a per-conversation cost, which some of our competitors charge, they said no. So we think that outcome-based pricing makes sense for all those reasons, and I can't see us changing it. And certainly, historically, when you read some of the more academic approaches to pricing, there's a great old book I read years ago called Pricing on Purpose. That analysis showed that value-based pricing always yielded a higher profit than cost-based pricing. Even though cost-based pricing has a baked-in profit, the problem is that it doesn't properly price-discriminate: people who get more value from your product than others pay the exact same. But when it's value-based, and people say, okay, yes, I pay more when it does more work, they're happy to do so, because it perfectly maps to the things that are important to them.

Yeah, that makes sense to me. We're trying to move toward value-based and outcome-based pricing as much as possible as well.

The one thing I would say is that it's not just an academic problem. You can come up with all these beautiful theories about how this is simpler and maps to value, but if the education and the friction that come with that are too much, it might totally fail in the market. So you need a degree of artistic license with your pricing; it's not just science, there's real art. And you need to be willing to leave money on the table in a bunch of cases. For example, we have this really deep Insights product, the best in the market, which helps you understand exactly what's happening in your business using these modern AI technologies. But to this date, we've decided not to charge for it, because we know it makes people better at using our product and helps them get a higher resolution rate out of it. And so that's an example where the science has been ignored at Intercom for the sake of simplicity and the art in pricing, which is making something easy to comprehend.
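As a rough back-of-the-envelope on the unit economics and the alignment described above, here is a toy calculation using only the figures mentioned in the conversation ($0.99 charged per resolution, roughly $1.21 cost per resolution at launch, about $26 all-in cost per human-handled resolution). The "current" per-resolution cost is a made-up placeholder, not a disclosed number.

```python
# Toy per-resolution unit-economics sketch (illustrative numbers only).

PRICE_PER_RESOLUTION = 0.99        # what the customer pays per resolved conversation
LAUNCH_COST_PER_RESOLUTION = 1.21  # cost cited at launch
HUMAN_COST_PER_RESOLUTION = 26.00  # Intercom's own all-in cost for a human resolution

# Hypothetical: assume inference plus supporting systems now cost this much per resolution.
ASSUMED_CURRENT_COST = 0.30


def gross_margin(price: float, cost: float) -> float:
    """Gross margin as a fraction of price."""
    return (price - cost) / price


print(f"Margin at launch:   {gross_margin(PRICE_PER_RESOLUTION, LAUNCH_COST_PER_RESOLUTION):.0%}")  # negative
print(f"Assumed margin now: {gross_margin(PRICE_PER_RESOLUTION, ASSUMED_CURRENT_COST):.0%}")
print(f"Customer value gap: ${HUMAN_COST_PER_RESOLUTION - PRICE_PER_RESOLUTION:.2f} saved per resolution")
```

The alignment point also falls out of the model: revenue only arrives when a conversation is actually resolved, so every added point of resolution rate increases revenue and customer value together, whereas per-conversation or per-token pricing would pay out regardless of outcome.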
Yeah, well, it sounds like you feel you nailed it, and it certainly has beautiful simplicity to it, so I can see why it's working. We don't have too much time left. In my prep for this, I did two things, which kind of converged on the same outcome for Waymark as a business. One was I went to the dev platform docs, and if I could make a customer request, it would be to create an llms.txt, if you haven't in the six weeks since I did this. What I did at the time was to send, I guess it was ChatGPT Operator, I don't know if Agent had been introduced at exactly that moment, against the docs and say: visit all the pages on this docs website and put them all into a Google Doc for me. I think I got however many hundred pages as it went back and forth copying and pasting. Then I took that, put it into Gemini, and said: create a consolidated, just-what-I-need-to-know version and cut all the repeated cruft. It did that and gave me basically an llms.txt. Then I went to Claude and said: code up this data export for me so I can do an analysis outside the product. And Claude, by the way, one-shotted that. So I got my data exported into a CSV, and that was pretty cool. And then I went into the product and realized, oh, I had kind of already created this with these background Insights-product-type experiences. So I guess the two questions there would be: are you seeing people doing a lot more casual hacking against the APIs? And then, on your own proprietary product work, you've told a little bit about it in terms of making people better, but I was kind of surprised to see there was so much happening in the background that I wasn't even really aware of until I went looking for it. So I'm interested in the product philosophy there, to the degree that there's more you haven't already said.

I don't know if our API usage has increased. I would say that you're describing a pretty sophisticated story there. You're not an outlier, but you're a sophisticated user of AI in our customer base when you tell that story, and so it might just be a little early yet. Absolutely, internally within Intercom, we're really pushing ourselves to be early users of all these products. I remember being really blown away by Claude Code recently when I first saw it, and we've definitely tried to up-level our own team to be good at those things. We've really built a big Insights product, and there's an awful lot of AI in the Insights product in order to deliver it. So I guess I'll say two things on that. The first is that we built it with custom or semi-custom AI, rather than just exposing everything to a deep-research-style environment. We did that because we needed to, to hit the quality bar, the performance, the efficiency. We have some customers with very high conversation volumes, and all the models have context lengths, and those context lengths are getting longer, but the fidelity decreases when you try to use those very long contexts. So in our opinion, the way we built it was still the right way: we use LLMs to process the conversations into topics and subtopics, and then we use a BERT-style, medium-sized language model to deliver it at scale. And we think that's still the right way to build an insights product today.
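A minimal sketch of the two-stage pattern Fergal outlines: an LLM labels a sample of conversations with topics and subtopics, and a smaller BERT-style model then covers the full volume cheaply. Everything below, the function names, the trivial placeholder logic, is an illustrative assumption rather than Intercom's implementation.

```python
# Illustrative two-stage insights pipeline: LLM labeling on a sample, then a small
# classifier over the full volume. All names are placeholders, not a real API.

from collections import Counter
from typing import List, Tuple

Label = Tuple[str, str]  # (topic, subtopic)


def llm_label(conversation: str) -> Label:
    """Stand-in for an LLM call that returns (topic, subtopic) for one conversation."""
    # A real system would prompt a large model; this is a trivial placeholder heuristic.
    if "refund" in conversation.lower():
        return ("billing", "refund request")
    return ("general", "other")


class TopicClassifier:
    """Stand-in for a BERT-style model fine-tuned on the LLM-labeled sample."""

    def __init__(self) -> None:
        self.fallback: Label = ("general", "other")

    def fit(self, texts: List[str], labels: List[Label]) -> None:
        # Real version: fine-tune a small encoder. Placeholder: remember the majority label.
        self.fallback = Counter(labels).most_common(1)[0][0]

    def predict(self, texts: List[str]) -> List[Label]:
        return [self.fallback for _ in texts]


def build_insights(sample: List[str], full_volume: List[str]) -> List[Label]:
    # Stage 1: expensive, high-quality LLM labels on a manageable sample.
    labels = [llm_label(c) for c in sample]
    # Stage 2: distil that labeling into a small model and run it over everything,
    # rather than stuffing huge conversation volumes into one context window.
    clf = TopicClassifier()
    clf.fit(sample, labels)
    return clf.predict(full_volume)
```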
If you are a medium or large customer, you can't just throw everything into a big context window. We've experimented with that, but the quality of output degrades, and you can end up with a product that looks nice superficially but isn't really reliable. So we think that today, to get a really industrial-strength insights product, you still have to go the way we went. Now, you're kind of saying, oh, there was a big Insights product there I hadn't really heard about. And yeah, candidly, it is difficult to market the depth of technology. We have this very broad product, it's very deep, and we, like everybody else in this space, are struggling to communicate against all the noise; you mentioned earlier how hard it is just keeping track of AI. We're still struggling to find the right way to tell customers: look, we've built a deep product here. Clearly you can see that and recognize the depth there, but a lot of folks can't. So we're constantly up-leveling our product marketing game in this new world of AI to get great at telling people what it is. I'm really relieved that we've managed to improve our marketing of complex queries and procedures recently; we had deep technology there for a long time that people either weren't ready for or we weren't doing a great job of explaining. But I think Insights is a deep product as well. Eoghan, I'm sure you have thoughts on that.

No, only to say that I agree with you that depth of usage is just a problem amongst all software vendors. It's customers discovering the full breadth of what you've poured blood, sweat, and tears into; it's just a problem. And anyone who imagines that some better onboarding pop-ups are the solution is completely wrong. The amount of attention to go around is at a bare minimum; there's just so much contention for that attention. It's a giant problem. This is where some of these high-touch, more enterprise-oriented businesses are being effective now, because the vendor is stitched into the organization: the organization has committed very large amounts of money to use the product, so the vendor gets the benefit of their attention and an opportunity to train them on the whole thing. But if you're more in the upper end of mid-market and the bottom end of enterprise, where we are, having everyone discover your features is hard. That said, you also want some degree of progressive discovery; you can't overwhelm people. And most people do find this functionality, as you finally did, and doing so brings a level of connection and commitment to the product that is invaluable. It'll be very hard for you now to go and move to one of our competitors, because they don't have this damn Insights product. So it's not a solved problem, and trust me, it's something that causes me a lot of heartache, because you've got great people like Fergal killing themselves to build stuff, and then it's like, well, don't look at the data, but only 26% of people actually use that feature.

Yeah, I mean, OpenAI just said that I think only 7% of people were using reasoning models before GPT-5, so you're in good company there. Let's maybe zoom out from all the AI product work.
And I really appreciate you getting so deep into the weeds with me on that. Now let's think about the rest of the company. You mentioned you've slowed hiring on human agents and Fin is doing a lot more of the work. Is there a similar impact when it comes to hiring junior developers? And how are you handling adoption across functions? Is there a sort of everyone-must-use-AI mandate? Is there a budget where people are told, go forth and spend it on whatever tools you think are interesting and cool? Or have you picked something that's the official stack? I'm just really interested in how you're thinking about all that. And, if I could tack on a 1B: does this all mean that you will potentially be even more ambitious in terms of expanding into adjacent niches? One thing I do wonder about is whether companies with a certain platform will say: maybe in the past I had to be focused, but now I can layer on these adjacencies and actually feel like I have a good chance of making them work, because I get all this productivity benefit from AI. So there's a lot there. You can take it in any direction you like.

Okay, I have a number of things to say to cover that. I think we can probably speak to developer efficiency; there are certain things we're doing there. The first thing I'll say is that I'm actually of two minds on the topic of AI usage in organizations. On one hand, I think that companies that don't adopt AI broadly will become the dinosaurs of this industry, and I believe that young companies that are AI-native from the get-go have great advantages: a lot of efficiency, the ability to move more dynamically, maybe even more creativity. That said, there's just so much AI out there that's overfunded and overhyped. There are like ten products in each category, and I don't trust a company of 1,200 people with a big budget to properly discern what's net valuable or not if they're told, use more AI. You can be guaranteed they'll buy a bunch of crap. I've already seen us adopt some big AI platforms with big fancy names, and I'm like, did we need this? And probably we did, and maybe I'm just the old guy, too grumpy and jaded and cynical, but I'm quite certain that in a couple of years we'll have a more nuanced view of what AI was outstanding for and what was a bit of a pipe dream. So that's the broad piece.

I'll speak to the idea of going into more areas and spaces. This is part of what we're going to start to talk about soon, but it's just patently obvious to us that people are not going to have multiple agents talking to their customers. It makes no sense. You're going to need a coordinated approach to solving customer problems. If you have a sales agent with one set of goals, a service agent with another set of goals, and an onboarding agent with another set of goals, they're going to be competing against each other. And if they come from different vendors, they'll have different styles, different approaches, maybe even different interfaces. How can you track goals and effectiveness across all those agents and really coordinate and orchestrate them? It's not going to happen. It's not going to happen.
So it's quite obvious that for Fin, which is the leading service agent by customer count, revenue, and performance, we win all of our head-to-heads with our direct competitors and in our benchmarks, it's quite obvious that we need to also be the leading customer agent. So we'll work through the entire customer lifecycle and make sure that not only do people not have to split between different agents, but we can finally realize our dream of this beautiful, high-touch concierge experience for every single customer, where they get the attention and personal treatment we've all dreamt of giving our customers but which hasn't actually been possible. That's the way in which, very holistically and in a very qualitative way, these agents are going to be so much better than humans. We spoke earlier today about how they're faster, but that's actually quite quantitative. You're going to see pretty soon, when we really properly push the fact that Fin is a customer agent, and many people are doing things with it today, and we have it deployed in some other use cases, which we'll talk about at our event, that people will really start to get excited about the experiences they're having: incredible levels of attention and personalization. So that's the thing we're really focused on.

When it comes to engineering efficiency, what I'll ask Fergal to speak to, if he has any take on this, is the fact that our CTO, Darragh Curran, decided some number of quarters ago that we were going to actually 2x our engineering output and efficiency. I thought that was quite cool. I actually didn't have a good read on how doable or not that is; some people outside the company said, oh, that's easy, but I actually think it's probably a pretty lofty goal. Have you any take on that, Fergal, and how far we are from getting there?

Yeah. I mean, it's pretty funny. Earlier you talked about just mandating overall improvements and people going and buying crazy tools. And look, the 2x thing, I definitely had mixed feelings on, because we certainly put it in the perf system in some form, and that can always be a dangerous thing to do; people will start optimizing for it in the wrong way. But on the other hand, there's a change in the world, and people are resistant to change. People will not properly adopt a new technology because they're busy and change is hard. And so you do need to say: hey, this is a priority for us as a company, we are carving out time, we're going to make you focus on this. So I really appreciated the push of it. Within my group, the AI group, in one way it's like: well, we're all AI technologists, so we're going to be great at adopting this new thing. But on the other hand, it's like: we're all AI technologists, we'd better not take it for granted, we'd better not assume we're great at adopting this new thing. And I remember using Claude Code. I came across it online really soon after it was first publicly released and I played with it, and it took over my week. I spent all that time that week hacking on it, late into the night, building prototypes and just thinking, wow, there is a change here. Then I demoed it to the AI group at my all-hands on Friday, and some people were pretty skeptical. Some people were like, we already use Cursor.
And I'm like, yeah, Cursor is great, but this is different. So I think people do need a push, and you have to push yourself constantly, because it can be fatiguing to experiment and a lot of those experiments will be wrong. And look, how close are we to 2x? I don't know. I don't even know if it's achievable; it's one of these soft things. I know for a fact there are things we do now that we just wouldn't have done before due to the reduction in friction: I can hack together a prototype, a designer can hack together a prototype. So there are definitely qualitative changes. You could fight endlessly about the quantitative impact, where's the bottleneck, Amdahl's law, you know, if you improve part of the system, how much does the system overall improve? But there's no doubt that there are incredibly powerful tools here, and it's a mistake to ignore them. You've got to engage with them positively and optimistically, you've got to try them out, and then you can't be dogmatic about it. You've got to be pragmatic and discerning in the end and be ruthless: okay, is this a toy or is it really valuable? And if it's valuable, double down on it. That's the sort of explore-exploit process we want all our people running. We want them not to be stagnant; we want them to be curious and optimistic, trying out the new technology, and then rigorous about asking: did this really make me faster? Did it make me slower? And when it works, double down, and double down on similar things in future. And with Darragh's push, I think the company is moving to a posture like that. There's skepticism, but there's optimism too, and I think that's probably the ideal thing: optimism and skepticism together.

Yeah, I think that's a great note to end on. This has been a fantastic conversation, guys. I really appreciate the level of depth and detail, and simultaneously the level of strategic vision, that you've been willing to share. We'll definitely keep our eyes open for your upcoming announcements at the Pioneer event. You want to tell us when and where that is again?

It is October 9th in New York City, and it's going to be a very exciting event. It'll also be live-streamed for anyone who wants to watch online. New York City and on the internet.

Cool. October 9th. Thank you again. Eoghan McCabe and Fergal Reid from Intercom, thank you both for being part of The Cognitive Revolution.

If you're finding value in the show, we'd appreciate it if you'd take a moment to share it with friends, post online, write a review on Apple Podcasts or Spotify, or just leave us a comment on YouTube. Of course, we always welcome your feedback, guest and topic suggestions, and sponsorship inquiries, either via our website, CognitiveRevolution.ai, or by DMing me on your favorite social network. The Cognitive Revolution is part of the Turpentine Network, a network of podcasts where experts talk technology, business, economics, geopolitics, culture, and more, which is now a part of a16z. We're produced by AI Podcasting. If you're looking for podcast production help for everything from the moment you stop recording to the moment your audience starts listening, check them out and see my endorsement at AIpodcast.ing. And finally, I encourage you to take a moment to check out our new and improved show notes, which were created automatically by Notion's AI Meeting Notes.
AI Meeting Notes captures every detail and breaks down complex concepts so no idea gets lost. And because AI Meeting Notes lives right in Notion, everything you capture, whether that's meetings, podcasts, interviews, or conversations, lives exactly where you plan, build, and get things done. No switching, no slowdown. Check out Notion's AI Meeting Notes if you want perfect notes that write themselves, and head to the link in our show notes to try Notion's AI Meeting Notes free for 30 days.
