Gradient Dissent

How DeepL Built a Translation Powerhouse with AI with CEO Jarek Kutylowski

Tuesday, July 8, 2025 · 42 min

What You'll Learn

  • DeepL started in 2017 when neural machine translation was becoming the new standard, allowing them to build specialized models that outperformed existing solutions.
  • DeepL's models balance the need for accuracy (maintaining source text) and fluency (generating natural-sounding target text) through custom architectures and training approaches.
  • The company has had to build its own GPU infrastructure and data curation pipelines to support the compute-intensive training of large-scale translation models.
  • Rather than training individual models per customer, DeepL focuses on injecting the right context into its general models to adapt to specific use cases and customers.
  • Staying ahead of the competition requires continuous research and engineering efforts to leverage the latest advancements in language models and translation techniques.

AI Summary

This episode discusses how DeepL, a successful AI-powered translation company, has built a powerful translation technology by leveraging the latest advancements in neural machine translation. The CEO, Jarek Kutylowski, explains how DeepL has focused on developing specialized translation models that excel at maintaining accuracy while also generating fluent, natural-sounding translations. He highlights the company's engineering efforts to build custom GPU infrastructure and data curation pipelines to stay ahead of the competition. The episode also touches on the challenges of providing context-aware translations at scale for DeepL's large customer base.

Topics Discussed

#Neural machine translation #Specialized translation models #Model architecture and training #GPU infrastructure and data curation #Context-aware translation at scale

Episode Description

In this episode of Gradient Dissent, Lukas Biewald talks with Jarek Kutylowski, CEO and founder of DeepL, an AI-powered translation company. Jarek shares DeepL’s journey from launching neural machine translation in 2017 to building custom data centers, and how small teams can not only take on big players like Google Translate but win. They dive into what makes translation so difficult for AI, why high-quality translations still require human context, and how DeepL tailors models for enterprise use cases. They also discuss the evolution of speech translation, compute infrastructure, training on curated multilingual datasets, hallucinations in models, and why DeepL avoids fine-tuning for each individual customer. It’s a fascinating behind-the-scenes look at one of the most advanced real-world applications of deep learning.

Timestamps:
[00:00:00] Introducing Jarek and DeepL’s mission
[00:01:46] Competing with Google Translate & LLMs
[00:04:14] Pretraining vs. proprietary model strategy
[00:06:47] Building GPU data centers in 2017
[00:08:09] The value of curated bilingual and monolingual data
[00:09:30] How DeepL measures translation quality
[00:12:27] Personalization and enterprise-specific tuning
[00:14:04] Why translation demand is growing
[00:16:16] ROI of incremental quality gains
[00:18:20] The role of human translators in the future
[00:22:48] Hallucinations in translation models
[00:24:05] DeepL’s work on speech translation
[00:28:22] The broader impact of global communication
[00:30:32] Handling smaller languages and language pairs
[00:32:25] Multi-language model consolidation
[00:35:28] Engineering infrastructure for large-scale inference
[00:39:23] Adapting to evolving LLM landscape & enterprise needs

Full Transcript

You're listening to Gradient Dissent, a show about making machine learning work in the real world. And I'm your host, Lukas Biewald. Today I'm talking with Jarek Kutylowski. He is the CEO of DeepL, which is a very successful GenAI company. You might not have heard of it because what they do is translation and a primary focus of their business is on enterprise. But they are making really, really significant revenue off of a really specific GenAI use case. I think translation is a really interesting category to talk about when we talk about GenAI, because it's one of the first categories that's being completely disrupted by AI systems. Many, many human translation companies have gotten into trouble. They're starting to shrink as GenAI takes off. I think it's a real bellwether for where lots of industries are going. So this is an interesting conversation about both the business implications of running a company in the space and also the technical implications of how do you stay ahead of companies like OpenAI when you have a specific use case. Jarek was very forthcoming with answers to my questions, and I found it super interesting. I hope you enjoy it. I was really excited to talk to you as CEO of one of the most interesting, you know, GenAI companies that maybe a lot of people haven't heard of, but I have. But I think, you know, you better introduce your company to our audience. Yeah, hi, thank you for having me. It's a pleasure. So I'm Jarek, CEO and founder of DeepL, and DeepL is a company that actually started a little bit before the AI hype. We launched in 2017, and we've been using AI to tackle the language problem in the world. We're specialized in translation, specifically for businesses, in all of those use cases where you have customers in a different country, where your company is maybe spread across the whole world. And we're trying to provide solutions which just help you cross that language barrier as well as possible. And AI has made amazing strides in making that so much simpler. Yeah, that's basically us. And I guess what's kind of amazing about you is translation is such a fast-changing space. My background actually was in building translation models back in the aughts. And I don't think any of it is relevant at all anymore. And then you're also kind of going up against Google Translate, and actually all these language models can kind of do some translation if you ask them. So I feel like you're going head to head against these kind of juggernauts, but beating them on the quality and technology, and the underlying technology is so different now in terms of state of the art than when you started. So I guess maybe could you talk about how you feel your technology advantage works? Yeah. Yeah. I think it was a totally different space when we started, as you say. It's been a fast-changing moment. And we were really lucky to start in 2017. I think there was this moment when everything turned to neural machine translation, and I think we chose that moment really wisely, because everybody had to throw away basically what they had been doing until then. Everyone had to switch over to neural, and at this point in time, I think for a startup, there was this opportunity to go ahead and build models that excelled beyond what was out there, maybe in academia, or what the others had been doing. And back then, it was a lot of our own architectures, kind of creating just the best model type that can suit translation.
Like Transformer came out very, very quickly, but we found that there's actually better architectures for translation specifically. On the one hand, you need to kind of generate text, of course, but you also have to stick to what you're seeing as the source text. You have to maintain a certain level of accuracy, because the translation needs to be kind of near to the source text. But at the same time, you want to write in the target language natively, so to say; you don't want to let the model do word-by-word translations. You have to give it a little bit more creativity. So it's kind of this mix of both models that are good at copying and also at writing itself, monolingual, bilingual. That was something that we'd been working on for quite a while. And that has only continued. Model sizes are much bigger right now. Reinforcement learning and all of those techniques are coming in, also allowing those models to do more than just plain translation from sentence to sentence. So it's been quite a journey. I think the advantage that we have really comes from the fact that we're focused on this one area. And those models that we built, even though they might be really competing with the large ones on size, they're still really very much focused on this use case that we're building them for. But it does seem like pre-training a model on language would kind of, you know, inform how translation works. Do you train your models completely from scratch? Or, you know, would you use, like, if Meta wants to publish a Llama model and spend millions and millions of dollars in making that, would you use it? Or how do you think about that? Yeah, yeah, we're looking at those and we're using them as pre-trained models. We're still putting a lot of compute on top of that. There is just this advantage of training on specialized curated data that we've built up over the years, and also making sure that we have a proper distribution of all of the different languages. I mean, those models need to be able to tackle maybe not only English and German, but also a few more languages, the kind of smaller ones. And for that, you really also have to have the data and give the model the training steps to look at this data. And so how much of what you do is basic research and how much is engineering the models? That's really a good question. I think all of our research, somehow we tend to think about it as academic, and it's sometimes really, really model driven. But it always has to be super applicable and always has to really go into the product. And it also means that a lot of that is engineering. I think maybe 50-50 would be a good way of describing that. I mean, performance is super important.
Compute is expensive, also for training, and we're always a step ahead, I would say, of the whole market. Back when we came out in 2017, we had to start to build our own data centers because we essentially couldn't get the GPU compute, and we had to build our own frameworks for how we put the training workloads onto the data center. So a lot of that is kind of pioneering, and that then just increases your engineering workload, because you cannot take the off-the-shelf product that's already out there on the market. Wow, so you were building out GPU data centers in, like, 2017? Yeah, yeah, really the kind of the first machines I racked myself personally. That's been pretty cool, actually. Wow. Did you expect your compute cost to be so big when you started the company? I would think it must be much more compute that you're buying than... I mean, luckily there was a good correlation between the growth of the company and our revenue streams and the compute that we needed to build what we had to build, so we've been able to finance that pretty well. But yeah, it's been becoming larger and larger, especially with the advent of, you know, the DGX generation at NVIDIA, and now with Blackwell. This is a substantial cost, but we also consider it essential for us to maintain an edge and be able to train those large-scale models. And now you talk about proprietary data. So, you know, what is that for you? Like, again, I think people always thought, okay, Google has this advantage because they're scraping the whole web. They must find a lot of parallel corpora there. What sort of proprietary data do you have? Yeah, I think everybody can scrape the web. You can be better at this, you can be worse at this. We've been doing that for quite a while already, being able to find parallel corpora, bilingual, on the one hand, but also monolingual. I mean, that's important too, especially if you're thinking about languages where you cannot find that much bilingual data; then supplementing with monolingual becomes pretty important. It is an effort, and you've got to know what you're doing. I think in 2017 that was even harder. Right now we have a lot of already pre-crawled corpora on the internet. It's a little bit easier. You can try to start and kick off with those. Extracting the data out of the websites, etc., is just a lot of engineering work. Sometimes it's actually pretty fun algorithmic work, even, to do that efficiently. If you have a huge website, really a full, extremely large domain, and want to match which sentence matches the other, it's computationally not that simple sometimes if you want to do that cheaply, but an exciting problem to solve, really. How do you think about the quality of a translation? Like, you know, I think in the past maybe it was easier, you know, in that the translations were so bad that sometimes they'd be incoherent or just wrong. But I guess it seems like translations have gotten, in my experience, pretty, pretty high quality. What do you look at to know? What kind of separates your translation from a competitor's translation? Or what are the metrics that your models are optimizing at this point?
I think an important part is taking context into account. So quite often, if you really look nowadays at a sentence without any context, just looking at one sentence, then even a great human translator, or you and me, we cannot do a better job without really knowing what this is all about. You have to take into account what kind of document that is, maybe sometimes really what this company is about that is translating it, in order to get this one sentence perfect. And then with that, you just give the model so much more power to do that. So I think that's a truly important one. Keeping the models fresh, of course: data changes, language changes, things develop, and you want to have the models be able to do that. And then there's this fine-tuning of the model, how much you want to give it a focus on accuracy versus on fluent writing. And those things are sometimes really contrary to each other, especially if you have languages that are really different from each other. There might be this clash of whether you want to make it sound nice or you want it to be really correct, in a way. Interesting. And so does every customer get kind of their own fine-tuned model, or how does that work? Yeah, that wouldn't be scalable. So we are not, at least not at large scale, training models per customer. We're trying to find ways in which we can give the models the right context, how we can inject context-specific or customer-specific information for the particular use case and for this particular customer without having to retrain everything, basically. And with the hundreds of thousands of customers that we have, that's just the only way of doing that. I think there are many companies, not only in the translation space but in general in AI, who are trying to train models per customer. I don't think that this is a particularly great way unless you have really very specialized situations in which there is an ROI on that big investment. So how does it work? If one customer has an application where, you know, they want a more, maybe a technical application where you really want, like you were saying, an accurate translation, and another customer just wants the language to be fluid, are there sort of three models to choose from, or how does somebody tune aspects like that that you mentioned? The models, we have to pre-tune them in order for them to be able to pick up what kind of language that is. And then, for the technical application, this customer is maybe going to upload their terminology that they want to have used in their translations to our models, so that it is always consistent across the whole technical documentation base. That's not going to be so important, for example, in the marketing case, when you really want fluency, when kind of the craziness of the model, or the creativity of it, and sometimes choosing something else out of the probability distribution, is going to actually make for a great translation. Whereas if you want it to be really consistent, you're going to maybe just control that on your own.
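To make the context-injection idea concrete: below is a minimal sketch of adapting one shared translation model per customer with a glossary and a domain hint at request time, rather than fine-tuning a model per customer. The function names, request fields, and glossary format are hypothetical illustrations, not DeepL's actual API.

```python
# Hypothetical sketch of per-customer context injection at inference time.
# None of these names are DeepL's real API; they only illustrate adapting one
# shared model with customer-specific context instead of per-customer fine-tuning.
from dataclasses import dataclass, field


@dataclass
class CustomerContext:
    domain: str                                   # e.g. "legal", "marketing"
    glossary: dict = field(default_factory=dict)  # source term -> required target term


def build_request(source_text: str, ctx: CustomerContext, target_lang: str) -> dict:
    """Package the source text with customer-specific context for a shared model."""
    constraints = [
        {"source": s, "target": t}
        for s, t in ctx.glossary.items()
        if s.lower() in source_text.lower()       # only send terms that actually occur
    ]
    return {
        "text": source_text,
        "target_lang": target_lang,
        "domain_hint": ctx.domain,                # nudges style: accuracy vs. fluency
        "terminology": constraints,               # constraints applied during decoding
    }


def check_terminology(translation: str, request: dict) -> list:
    """Post-check: report any required target terms the model failed to use."""
    return [c["target"] for c in request["terminology"]
            if c["target"].lower() not in translation.lower()]


# Usage: one shared model, different behavior per customer via the request payload.
legal = CustomerContext(domain="legal", glossary={"agreement": "Vereinbarung"})
req = build_request("This agreement is binding.", legal, target_lang="de")
print(req["terminology"])  # [{'source': 'agreement', 'target': 'Vereinbarung'}]
```

The design point is the same one made in the conversation: the per-customer state lives in the request payload (terminology, domain hint, surrounding context), so hundreds of thousands of customers can share the same trained weights.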
Okay, so what happens when good translation gets really, you know, cheap and easy? Are you seeing businesses operate in different ways once they start to have access to your technology? I think the whole language industry and the whole language problem has changed so much over the last eight years that we've been out in the market. That has been driven both by the availability of the technology and by the ability to just throw something into the translator and get an answer so quickly. It's sometimes not even about the cost; it's really sometimes about the speed at which you get those translations. And also the demand from the market, I think, has been growing. Customers demand to have customer support in their own language. They want to see materials being localized when they want to buy. It's not such an easy market anymore if you're just speaking English as a company. So I think that has driven a lot of our customers to embrace that, really. I think that one of the biggest changes was, and we're going to see that in AI in general, I think, that some of those customers of translation, even within a company, be it a legal department or be it a marketing department, started to really self-serve on those solutions. It's not a centralized function in many companies anymore. They just go out to a provider like us. They start using our product on their own. They integrate it into their tools and do not have to rely on an external agency, maybe, or somebody who would be doing translations in a traditional way. And that has changed this whole axis and therefore also makes for much more content, much more volume being translated in general. Do you think that the quality of translation has gotten over a threshold where that's less of a differentiator for customers? Or do you think most customers are still kind of hungry for even more high-quality translations? There's a lot of hunger for quality. I think depending on which quality level you are at, you're always unlocking new use cases for being tackled by machine translation. Whatever is enough for this single one-to-one email that you're sending to your colleague that's sitting in another office in, let's say, I don't know, Taiwan, you don't care so much; I mean, honestly, it's going to be fine. If you're then thinking about translating a contract or translating your terms and conditions and putting them onto your website in 20 different languages, that matters a little bit more, and a mistake there might have really legal consequences. So if you're able to do that automatically and you're able to simplify this whole workflow a bit more, that makes for a big difference. And then in a lot of workflows there's still a human in there, checking the translation, post-editing it, as we would call it, and the easier you can make this job, the fewer edits are necessary, the fewer changes are necessary. This really impacts the time needed for that process. And if you think that this person checking is a paralegal, there is a really hefty hourly salary that is associated with this process. So there's really a big return on investment on any incremental quality improvement that you can do.
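The post-editing point, that fewer required edits directly saves reviewer time, can be made concrete with a rough token-level edit-distance measure, a simplified cousin of metrics like TER. This is a generic sketch, not DeepL's internal quality metric; the example sentences are invented.

```python
# Illustrative sketch: estimate post-editing effort as token-level edit distance
# between the machine output and the human-edited version (a simplified TER-style
# measure). Generic example only, not DeepL's quality metric.

def token_edit_distance(hyp: str, post_edited: str) -> int:
    """Minimum number of token insertions, deletions, or substitutions."""
    a, b = hyp.split(), post_edited.split()
    prev = list(range(len(b) + 1))          # distance of empty hyp prefix to b prefixes
    for i, tok_a in enumerate(a, 1):
        curr = [i] + [0] * len(b)
        for j, tok_b in enumerate(b, 1):
            cost = 0 if tok_a == tok_b else 1
            curr[j] = min(prev[j] + 1,       # delete tok_a
                          curr[j - 1] + 1,   # insert tok_b
                          prev[j - 1] + cost)  # keep or substitute
        prev = curr
    return prev[len(b)]


def post_edit_rate(hyp: str, post_edited: str) -> float:
    """Edits per reference token: lower means less human post-editing work."""
    return token_edit_distance(hyp, post_edited) / max(len(post_edited.split()), 1)


mt = "The contract enters into force at signing"
human = "The contract enters into force upon signing"
print(round(post_edit_rate(mt, human), 2))   # one substitution over 7 tokens -> 0.14
```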
What are the, I guess, one of the things that kind of came up when I was, you know, just researching your company, kind of looking around for it, was a lot of, you know, human translators talking about your company and sort of worrying, is this going to make me obsolete? It does sort of seem like we're on that trajectory, doesn't it? Do you think there will be human translators 10 years from now? I think that they will definitely be there. I think the amount, or the content, of translations that are going to be done by humans only is going to be severely reduced. So I think a lot of the kind of boilerplate and boring work of translation, that's going to be all done by AI. A large part of that is already done right now, and in the future even more so. I think humans are going to still be incredibly important in this process, to kind of guarantee, especially in those high-compliance use cases, if you think about life sciences companies and financial institutions; there's really the need even now to have multiple human translators on a single piece of text. And that's definitely going to continue. I think we also have to be realistic that in the simplest cases of translation, in the most common languages, and in cases where quality doesn't matter so much, AI on its own is going to be doing an amazing job by itself. Do you keep humans in the loop still for some of your translation applications? Not for production, not for inference. That would just not be scalable. I mean, we're working with thousands of translators and humans to train the models and to give us the feedback and the quality assurance and all of that. But you can't employ that, at least not in the volumes that we are translating, in any way during inference time. Where do humans still outperform the models? Because when you're talking about, say, a legal use case or something, I would sort of imagine a human might also, you know, make a typo that a model might make. And, you know, at least from what I see from translation model performance, it seems so spectacular that I wonder, would a human do a better job than a translation model? Of course, I'm not, you know, doing this all the time for my job, and maybe I'm way off on that. But I don't know, my impression is it's pretty close, or maybe even the models might be more reliable in some cases. What are the cases where the model really still needs a human to get it to that level of quality? Not just for compliance, but for actually making the translation somehow work for the use case. I do think that models are definitely more reliable and more accurate in a sense. They're not going to be making those mistakes that we as humans from time to time do just because our brain slips. So that is an advantage for the models. And that's going to be an advantage even more so in the future, I think. I think the models still do not understand the world as we do. And there is a difference there. I mean, with all of those great reasoning models, and with also the LLMs that we're using for language, we see they kind of get the world just because of all of the text that they've seen. But this knowledge, this understanding, is not as deep as with us humans. And therefore, sometimes in those very tricky situations, they just cannot distinguish what was meant there. Like, what was the intention of that particular text?
And that is why I said context helps, because it gives you more of this. But even sometimes that's not enough. And then you're sometimes running into those edge cases. You're running into half-broken sentences which, due to some kind of text parsing, are slightly weirder. You're looking at very short texts because they've been written for an app and the user interface of it, and the models sometimes get confused by that. Honestly, they haven't seen that in the training material so much, or they've just seen it very, very rarely, and they cannot cope with this added complexity then. That makes sense. What about, and I'm just, you know, I have the experience of talking to a lot of enterprises about LLMs in general, and there's always this sort of fear, stories they tell around hallucination. Is there a parallel hallucination issue in translation? Yeah, it's been coming up. It's not like it's not there. I mean, the models are encouraged to be a little bit creative and you have to give them the freedom to just kind of write on their own. And sometimes if they don't know what they should be doing, they start making stuff up. So you have to employ control over that. I think within translation, it's a little bit easier because you can always cross-check and go back to the text, to the original text, and then kind of even post-factum maybe sometimes do some evaluation of whether this has gone astray or not. And the kind of creativity space that you're giving those models is then slightly lower than in a general-purpose LLM, which is just generating text. But you have to be wary of that. And we in general have seen that specialized models, and that's kind of one of the differences, hallucinate less than general-purpose GenAI models when they're being employed for translation. Then I guess, you keep talking about text translation, but you also offer speech translation, right? Is speech just a smaller market, or what? I mean, is your emphasis on text at the moment? It's just a newer market. I'm super excited about speech, actually, because it makes such a big difference. So that's something that we just put out on the market last year. I think the tech just wasn't there yet for it to be productized in such a good way that users would just be happy with the output. And it has just come to that level where it's really practically applicable. I think we've gotten accustomed, as you say, to great text translation over the years by now. So it's not making that much of an impression on us. I think speech translation is just this new amazing thing that has come up. I was on my own in customer conversations in Asia, in Japan, where we'd usually have my sales team help translate a little bit, or we would have even an interpreter in the room. And it's always a little bit cumbersome. You don't really get fully what's happening in the room, or you get it with a ton of delay. And now, with speech translation technology, you're fully embedded, fully immersed into the conversation. It's not as good as if you really speak that language, of course, but it's pretty damn near, I have to say. Have there been kind of new challenges that have come up with the speech part? I mean, yeah, sure. You've got the speech recognition part, which is super important. Then the language we're speaking in is just different than how we write. And it's just not as clean.
We don't have so much time to think about what we're saying compared to when we write something. And therefore, it tends to be just a little bit garbled, and you don't know where the sentence starts, you don't know where the sentence ends. So speech recognition can solve for parts of that, because those models are really trained to package that stream of words into some coherent sentences. But still, I think the model has to cope with more. And the quality of the source input is also lower just because speech recognition makes its own mistakes. And then the translation model has to somehow figure out, what should I do then? Like, does that word even really match here? Or should I maybe substitute it with something that's just more probable at this point? And do you try to preserve the rhythm of the speech and the tone of the speech? Does that somehow carry through? Not yet in a big way; it's not a big focus. I think right now the main focus is really on latency and just making the translation as real-time as possible. We know this is incredibly important for the user experience. The quicker you get the translation, the better you stay in the flow, the more you can match the mimics of the speaker that you're talking to, to what you see in terms of translation. So the conversation becomes much better then. And that's one of the most important parts. And then just the pure translation quality, being able to catch all of that company-specific terminology and technology, making sure that you do not miss the proper name of the CEO of that company. All of those things are super important to make a good impression on the users. Do you think about the impact that you'll have on the world when speech translation is very, very easy to turn on? It seems like it'll really change the way businesses work, doesn't it? I'm very much looking forward to that, honestly. I think this way we can really get all of this great cultural diversity that is out there in the world, and the different working styles and the strengths of different countries, and we can really mix and match that through our global supply chains and the way that we are working. And those of us who really speak English well have been incredibly privileged in this international world, I would say. And we have the ability to now have many, many more people join this community so well. And then maybe even in the process of participating, really learn that language and become fluent by themselves. But in the first moment, giving them the confidence that they can speak up in that meeting, they can participate when they have an idea, which quite often, honestly, does not occur if you're not really proficient in that language. Totally.
And I mean, even, you know, a world without language barriers, where you call your friends in a different language, seems pretty amazing, doesn't it? I mean, yeah, for me there's a limit to that at some level. I think I would really still want to have friends and speak to people where I really understand the language on my own. I think it also brings us much nearer from a cultural perspective, because the language is usually even tailored to the cultural history of a country, and there's so much embedded in that. So I think there is realistically a limit to what the AI can do here, especially in all of those private situations. I cannot imagine living with a partner and speaking through a phone with them for my whole life; that just doesn't work. But for all of those business situations, I think that's going to be just purely great. And I guess, what's the state of the art of handling less prevalent languages, less common languages? How much data do you need to collect to make a usable translation model, either in speech or text? I mean, the question is, usable for which purpose? But yeah, there is a gradient in how good translation quality is depending on the different language pairs, and the availability of data on the one side, and then also obviously the amount of work that companies like us, or even what happens at academia or with our competitors, can put into these particular language pairs. It's just a question of, once again, business return on investment. And we're trying to make sure that we best cover the languages that our customers need and that they're requesting from us. So definitely there is this tier one of languages that are the biggest global languages. Then there is a second tier of slightly smaller languages where there's already quite a lot of material, where you get really good results. Polish would be a good example, which is where I was born. It's a decently large language with a good amount of training material. But then if you go into really, really small languages, that's going to be much harder, and that's going to take more time to really get those on board at the same quality level. I think also we're probably going to have to become smarter in how we train models and not require so much data for them in order to get those languages to the level where we're expecting them to get really fluent. Do you build specialized models for every language pair, or is it all kind of combined into one gigantic model? We've been building a lot of separate models, actually. And lately, we've been starting to consolidate that, at least into chunks of models, or models that can handle a group of languages. It's also a little bit different depending on whether you're thinking about text or voice. If it's speech translation, then once again, latency comes into play. Super small model sizes, or smaller model sizes, are important, and then they might not be able to cope with all of the different languages at the same time. The parameter count is just not enough. I see. And I guess you could do different tokenization strategies for different languages, probably. Oh yeah, totally. You can do that. If that makes sense, yeah, you can do that.
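A small sketch of the per-language tokenization trade-off mentioned here, using the open-source sentencepiece library: separate subword vocabularies give each language better coverage, while a shared vocabulary makes model consolidation easier. The corpus paths and vocabulary sizes are placeholders, not DeepL's configuration or pipeline.

```python
# Generic illustration with the open-source `sentencepiece` library
# (pip install sentencepiece). File paths and vocab sizes are placeholders.
import sentencepiece as spm

# Option A: a dedicated subword vocabulary per language (better per-language
# coverage, but many artifacts to version, deploy, and keep in sync).
for lang in ["ja", "pl"]:
    spm.SentencePieceTrainer.train(
        input=f"corpus.{lang}.txt",        # placeholder monolingual corpus
        model_prefix=f"spm_{lang}",
        vocab_size=32000,
        character_coverage=0.9995,         # high coverage matters for CJK scripts
    )

# Option B: one shared vocabulary across a language group (easier to consolidate
# models for that group, but each language gets fewer subword slots).
spm.SentencePieceTrainer.train(
    input="corpus.ja.txt,corpus.pl.txt,corpus.en.txt",
    model_prefix="spm_shared",
    vocab_size=64000,
    character_coverage=0.9995,
)

# Tokenize a Japanese sentence with the dedicated Japanese model.
sp = spm.SentencePieceProcessor(model_file="spm_ja.model")
print(sp.encode("翻訳の品質はデータ次第です", out_type=str))
```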
But I was kind of amazed by, I don't know if you saw that Anthropic had that paper they put out where they were kind of showing how it seemed like similar sets of neurons fire in their network for words meaning the same thing in different languages. I kind of always wondered if it worked like that, but it's kind of amazing to see that. It sort of made me think that maybe these more combined language models would start to work better as these networks get more powerful. Yeah, yeah. I mean, they work better. And on the other side, it's much easier on the engineering and deployment side if you don't have hundreds of models to cope with and version and train independently. So it's just easier for us. But yeah, groups of languages, especially if they're similar, that helps a lot. And then if you have a group of similar languages that don't have enough data, they fuel each other and make it easier. And yeah, over time we've been looking at many features of those models and how what happens in those models maps to some of those linguistic nuances and our understanding of language. And sometimes these are really funny things that you can find, how certain things match to each other, and how you find clusters of meanings, and how that all really sits near to each other. But then, at the end, it's so many dimensions that if you want to try to sum it up and understand really what happens from end to end, at some point, many times, it just gets far too complicated. And it occurs to me that you're one of the few companies that has really deployed, you know, deep learning, giant models I guess they call them now, at scale. Can you talk about some of the engineering or operational challenges of making this work? Like, what surprised you as you scaled up the size and volume of inference in these models? Yeah, I mean, for us it was pretty much everything. Just, A, we started so early on that, and as I said, we had to build a lot of the stack for that. Even things like, how do you distribute requests that are coming in from the users to the different GPUs that you have available? You always have to strike the balance on how big the batch sizes are that you make. You want to utilize your GPUs well, but you also want to maintain low latency for your users. So you have to make sure that you have the tech which groups those, understands what those requests are, sends them off to GPUs. Now, in 2025, there's more common technology and it's much simpler to do. Back in the years, that was definitely trickier. If you have a wide range of models, depending on language pairs, depending on the load that you're getting on the system for different language pairs, you might want to spin up new models, spin up more models for Japanese because it's the time zone for the Japanese language, and spin down other models. So we had to build the tech for scheduling all of that and reacting to load changes. So I think GPU compute is really different than CPU compute. And there have been quite a few funny algorithmic challenges to really solve in that too from an engineering perspective. Are you one of those companies that's totally compute constrained? Like, if you had more GPUs, you could generate more revenue immediately?
Yeah, I think we don't have a problem getting the GPUs. I mean, the market has gone through a few moments in time when just getting the GPUs, even if you had infinite money, was super hard. I think we're not at this point right now; the supply works. And of course more compute would be great, but I think we would probably also need more researchers, more brains basically, to also utilize that at some point. It's not only about raw computing power, although that's also important, of course. Do you run on all NVIDIA GPUs, or have you experimented with some of the more exotic GPUs? It's actually all NVIDIA. We're obviously looking at the other ones and kind of testing, benchmarking. Migration is hard, though. I mean, everybody knows that; it's not a big secret. And speed matters in this industry, so really doing big migrations is not that easy. And especially as we're running our own individual architectures on the models, it's also not that easy to just go to an off-the-shelf provider of inference and stick your model in there. There's going to be just so much more migration overhead for that. So yeah, for the time being, sticking to NVIDIA, but also really looking at the alternatives all of the time. And we're seeing the market catching up, I would say. There's a lot of new fascinating stuff coming up there. Do you think it's going to change what you need to do to stay ahead of the market over time? It seems like if the LLMs get more and more powerful, could general-purpose LLMs start to eat into your translation market, especially the sort of simpler translations that aren't as mission critical? How do you think about that? Yeah, I think we have to change there. I think we have to understand much better what the translation is being used for. It's less about translating the sentence from A to B, and more about understanding what the full workflow in an enterprise is around this translation. Like, is there going to be somebody reviewing that? Is that actually a second version of a translation which has been done already earlier, where the first one was done by AI, but then there was a revision by a human which introduced some changes? And then it makes sense to feed all of that input into the model in order to enable it to be even more accurate. So it's becoming more about the enterprise workflow and how you can embed the AI into that, how you can actually do deeper product research, also fueled by AI, in order to solve the higher-order problem and not just the simplistic translation case. And then it's frankly not that trivial, because we're coming from a super horizontal product, as we are, with translation being embedded in so many different use cases. And you have to be smart about picking and choosing the most important ones and where you can also add much more value to those. Interesting. Have you started to offer services like that? I mean, that's just part of our ongoing product discovery, understanding what our customers are using translation for, and then embedding functionalities into the models that drive those, and then exposing them in the proper way. So it's just part of our normal product development cycle, I'd say. Cool. All right. Well, thank you very much.
That's all the questions I have. I appreciate your time. Lukas, it's been perfect. Thank you very much. Thank you very much. Thanks so much for listening to this episode of Gradient Dissent. Please stay tuned for future episodes.
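The inference-scheduling trade-off discussed in the interview, filling GPU batches for utilization without hurting per-request latency, comes down to a size-or-timeout batching policy. Below is a minimal generic sketch of that policy; it is an illustration of the idea, not DeepL's serving stack, and the constants and translate_batch placeholder are invented for the example.

```python
# Generic sketch of dynamic request batching for GPU inference: flush a batch
# when it is full OR when the oldest request has waited too long.
import queue
import threading
import time

MAX_BATCH_SIZE = 16       # bigger batches -> better GPU utilization
MAX_WAIT_SECONDS = 0.02   # cap on the extra latency added by waiting for a batch

requests = queue.Queue()  # items are (source_text, reply_queue) tuples


def translate_batch(texts):
    """Placeholder for one batched forward pass on the GPU."""
    return [f"<translated:{t}>" for t in texts]


def batching_loop():
    while True:
        first = requests.get()                       # block until the first request
        batch = [first]
        deadline = time.monotonic() + MAX_WAIT_SECONDS
        while len(batch) < MAX_BATCH_SIZE:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break                                # timeout: stop waiting for more
            try:
                batch.append(requests.get(timeout=remaining))
            except queue.Empty:
                break
        outputs = translate_batch([text for text, _ in batch])   # one GPU call per batch
        for (_, reply), out in zip(batch, outputs):
            reply.put(out)                           # hand each result back to its caller


def submit(text: str) -> str:
    """Client-side helper: enqueue one request and wait for its translation."""
    reply = queue.Queue(maxsize=1)
    requests.put((text, reply))
    return reply.get()


threading.Thread(target=batching_loop, daemon=True).start()
print(submit("Hallo Welt"))
```

Tuning MAX_BATCH_SIZE and MAX_WAIT_SECONDS is exactly the utilization-versus-latency balance described in the conversation; per-model instances of this loop could then be scaled up or down with load, as in the Japanese time-zone example.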
