Practical AI

Autonomous Vehicle Research at Waymo

Practical AI • Practical AI LLC

Thursday, November 13, 2025 • 52m

What You'll Learn

  • Waymo has expanded its driverless car service to 5 major metro areas, serving hundreds of thousands of rides per week
  • Safety studies show Waymo's autonomous vehicles are 5-12 times less likely to get into accidents or collide with pedestrians compared to human drivers
  • Waymo is expanding to new domains like highways and snowy environments, and plans to launch in London next year
  • The autonomous vehicle system includes sensors (cameras, lidar, radar, microphones), compute, actuators, and redundant safety systems
  • Waymo's vehicles are electric, which the company sees as beneficial for the environment and accelerating the transition to electric vehicles

AI Summary

The episode discusses the advancements in Waymo's autonomous vehicle technology over the past 5 years. Waymo has expanded its driverless car service to 5 major metro areas, serving hundreds of thousands of rides per week to paying customers. Their safety studies show their autonomous vehicles are 5-12 times less likely to get into accidents or collide with pedestrians compared to human drivers. Waymo is also expanding to new domains like highways and snowy environments, and plans to launch in London next year. The episode provides a high-level overview of the autonomous vehicle system architecture, including sensors, compute, and redundant safety systems.


Topics Discussed

Autonomous vehicles • Waymo • Sensor systems • Safety and redundancy • Electric vehicles


Episode Description

Waymo's VP of Research, Drago Anguelov, joins Practical AI to explore how advances in autonomy, vision models, and large-scale testing are shaping the future of driverless technology. The conversation dives into the dual challenges of building an onboard driver and testing that driver (via large-scale simulation). Drago also gives us an update on what Waymo is doing to achieve intelligent, real-time performance while ensuring proven safety and reliability.

Featuring:

  • Drago Anguelov – LinkedIn (https://www.linkedin.com/in/dragomiranguelov/)
  • Chris Benson – Website (https://chrisbenson.com/), LinkedIn (https://www.linkedin.com/in/chrisbenson), Bluesky (https://bsky.app/profile/chrisbenson.bsky.social), GitHub (https://github.com/chrisbenson), X (https://x.com/chrisbenson)
  • Daniel Whitenack – Website (https://www.datadan.io/), GitHub (https://github.com/dwhitena), X (https://x.com/dwhitena)

Links:

  • Waymo Research – https://waymo.com/research/
  • New Insights for Scaling Laws in Autonomous Driving – https://waymo.com/blog/2025/06/scaling-laws-in-autonomous-driving
  • AI in Motion – https://www.youtube.com/watch?v=11WEL7sR1GA

Sponsors:

  • Outshift by Cisco – The open source collective building the Internet of Agents. Backed by Outshift by Cisco, AGNTCY gives developers the tools to build and deploy multi-agent software at scale. Identity, communication protocols, and modular workflows, all in one global collaboration layer. Start building at http://agntcy.org/.
  • Shopify – The commerce platform trusted by millions. From idea to checkout, Shopify gives you everything you need to launch and scale your business, no matter your level of experience. Build beautiful storefronts, market with built-in AI tools, and tap into the platform powering 10% of all U.S. eCommerce. Start your one-dollar trial at http://shopify.com/practicalai.
  • Fabi.ai – The all-in-one data analysis platform for modern teams. From ad hoc queries to advanced analytics, Fabi lets you explore data wherever it lives: spreadsheets, Postgres, Snowflake, Airtable, and more. Built-in Python and AI assistance help you move fast, then publish interactive dashboards or automate insights delivered straight to Slack, email, spreadsheets, or wherever you need to share it. Learn more and get started for free at http://fabi.ai/.

Upcoming Events:

  • Register for upcoming webinars at https://practicalai.fm/webinars

Full Transcript

Welcome to the Practical AI Podcast, where we break down the real-world applications of artificial intelligence and how it's shaping the way we live, work, and create. Our goal is to help make AI technology practical, productive, and accessible to everyone. Whether you're a developer, business leader, or just curious about the tech behind the buzz, you're in the right place. Be sure to connect with us on LinkedIn, X, or Bluesky to stay up to date with episode drops, behind-the-scenes content, and AI insights. You can learn more at practicalai.fm. Now, on to the show.

Well, friends, when you're building and shipping AI products at scale, there's one constant: complexity. Yes, you're wrangling models, data pipelines, deployment infrastructure, and then someone says, let's turn this into a business. Cue the chaos. That's where Shopify steps in, whether you're spinning up a storefront for your AI-powered app or launching a brand around the tools you've built. Shopify is the commerce platform trusted by millions of businesses and 10% of all U.S. e-commerce, from names like Mattel and Gymshark to founders just like you. With literally hundreds of ready-to-use templates, powerful built-in marketing tools, and AI that writes product descriptions and headlines and even polishes your product photography, Shopify doesn't just get you selling, it makes you look good doing it. And we love it. We use it here at Changelog. Check us out, merch.changelog.com. That's our storefront. And it handles the heavy lifting too: payments, inventory, returns, shipping, even global logistics. It's like having an ops team built into your stack to help you sell. So if you're ready to sell, you are ready for Shopify. Sign up now for your $1 per month trial and start selling today at shopify.com/practicalai. Again, that is shopify.com/practicalai.

Welcome to another episode of the Practical AI Podcast. This is Daniel Whitenack. I am CEO at Prediction Guard, and I'm joined as always by my co-host Chris Benson, who is a principal AI research engineer at Lockheed Martin. How are you doing, Chris?

Doing great today, Daniel. Lots of, as always, lots of AI and autonomy to talk about. And you know what? We have Waymo to talk about as well.

We have Waymo to talk about, yeah. Speaking of Waymo, we're very excited to welcome back Drago Anguelov, who is the vice president and head of the AI Foundations team at Waymo. Welcome, Drago.

Thank you, guys. It's great to be back after five years or so, right?

After five years, yeah. We were commenting before we started the recording that the last episode with Drago was on September 1st of 2020. So that was episode 103. A few things have changed in the world generally, but certainly in relation to AI. I'm wondering if you could maybe just catch us up at a high level, Drago, in terms of driverless cars, autonomous vehicles. How do you see the world differently now than you did in 2020?

So one thing I would say is in October of 2020, we opened our Waymo One service in Phoenix East Valley to everybody. So, you know, just one month after we talked. But since then, we have launched and scaled quite dramatically in now five major metros: San Francisco, Los Angeles, Phoenix, Atlanta, and Austin. And we are also serving hundreds of thousands of rides a week to paying customers. We are expanding. We announced expansion to at least half a dozen more cities that will be rolling out through next year. And we may announce yet more.
In the cities we are in, we continue reporting the safety performance of our autonomous driver. And we are over 100 million autonomous miles driven on the road at this point, so it's fairly statistically significant. And in those miles, our safety study at close to 100 million miles showed that we are five times less likely to get into accidents with critical injuries, and over 10 times, I think 12 potentially, less likely to get into collisions with or injure pedestrians. So that has been happening, and we are on to doing more and more right now. We work on improving the driver further. We have a sixth generation vehicle coming up. We have started partnering with different companies. For example, we're partnering with Uber in Austin and Atlanta, so our vehicles show up on their app in those cities. We have actually partnered with Lyft in Nashville, I believe. And we partnered with DoorDash to explore delivery. So we're exploring and expanding the scope and the partnerships that we are doing as well.

But I think in '25, I would say, a lot more people have had and continue having the opportunity to try Waymo. I'm quite a convert myself. To me, probably the big aha moment was in '22, when I first rode in San Francisco by myself fully autonomously. Since then, it took some time for more people to get exposed, but now I think the phenomenon is out there. And I think also the autonomous vehicle industry went through cycles. There was certainly, around '22 and '23, a time of pessimism in autonomous vehicles. But I think through our success, through generative AI, and through other companies now, it's again a very lively space. There are others that are also trying to push what's possible with autonomous driving and robotics. So it's again a very, very happening place. And we are contributing, I would like to think, the most advanced version of embodied physical AI out there today.

That's fantastic. I've got to say, as a native Atlantan, I'm so happy that you guys are in my city. And we're a very, very car-centric city as well. You really have to have a vehicle to get around. And I noticed, as you were naming the cities that you guys are in, that tended to be the case. Does that play into the way that you guys think about testing? Atlanta traffic, for its size, is notoriously bad, and I would love to see ever more Waymos and other autonomous vehicles here, because I am terrified of all the drivers around me, with our daily collage of traffic accidents and stuff like that. So I keep telling everyone: just wait, autonomous vehicles are coming. I'm kind of curious how you pick these different testing cities that you engage in, and what are some of the things that you're testing for that maybe those locations are particularly apt for helping out on?

So, I mean, it's a bit of a combination of both technical and business reasons. I think we are trying to do large metros where, you know, autonomous vehicles can be a big market and help a lot of people. So that's one. And also, we've intentionally been growing our ODD, so to speak, our operational design domain. Our first service, Waymo One in Phoenix East Valley, Chandler, is maybe a bit suburban, with up to 45-mile-an-hour arterial roads.
And we learned to master it, and then went to San Francisco, which is dense urban, with fog and some rain and hills and windy roads and narrow roads and tons of pedestrians downtown. So we dealt with that, and then we started expanding. I think some of this is: Atlanta is a big city, and also a different state. There are some differences across the various states, in both how people instrument the roadway and how people drive, right? So we're spreading geographically more and more.

I think we're also spreading to other domains. A few that are really top of mind: highways. We have been working on highways for a long time, and we've gotten to a certain point with them. Generally, to have a good taxi service, you need highways, right? And it turns out that's a very fascinating, interesting problem. They're difficult because whenever you move at high enough speeds, like 65 miles an hour or so, the consequences of any mistake are really high. And many things can happen. And so it pushes your robustness and safety capability there. So we've been doing highways. And one thing I did do is ride one. Now we can give highway rides to employees, and I rode one to Millbrae Station to get to the airport. And it's fantastic. So I hope to be able to bring it in the future to more and more people. I think that will make the service a lot more useful.

Also, we announced that we will drive in other cities that have snow, potentially even in '26, right? Our sixth generation platform, the one designed after the Jaguar, is a Zeekr vehicle, a Geely Zeekr. And that Zeekr is designed with our hardware suite to be able to handle snow. And we are also heading out to other countries. We announced that we intend to launch driverless capabilities in London next year, and London is a left-side-driving city. So is Tokyo, where we currently have vehicles and are testing, right? So you can see we're trying to cover, little by little, the operational design domain of most large metros with all of their properties. We're, of course, also in Texas; that's its own unique state. But we started with more southern states and large metros, so you don't have to worry about snow at least. You want to tackle these challenges in some order, not just try to do everything at the same time. It's very difficult to validate your ability to do well in everything all at the same time, right? So we're mixing what makes business sense with actually expanding the capabilities to become a truly global driver.

And you mentioned the driver, the car. I'm wondering, for those out there listening, which is maybe hard to do just from an audio standpoint, if you imagine the driverless car as a system in 2025, how would you describe that architecture, that system? What are the main components? I imagine, you know, the sensors, the actual car, the computer. What does that system look like in 2025, just at a high level? And then, of course, I'm sure we'll get into some of the modeling things and foundation models and all of those things.

Well, I mean, the car is, you know, ultimately a robot on wheels, right? The main distinguishing capabilities are that it has a set of sensors, in our case cameras, lidar, radar, and microphones. Our microphones are quite helpful for many things, including listening to sirens, right? And occasionally instructions. Then you have compute on the car. It's a non-trivial amount of compute.
It's more than you can put on a phone, right? And all our vehicles are electric. That was an explicit choice of the company, and I personally am quite proud of this choice. I think that's good for the environment, and I think it can accelerate the transition to more electric vehicles, which I think is good, personally. And so you have this robot on wheels with compute and sensors, and then you have actuators, right? Then there is a lot of system design engineering to make sure that, you know, steering and brakes and all these things have redundancy and robustness, to make sure that if any system goes wrong, or if parts of the compute go down, you have contingencies. So it needs to be designed with redundancy. You need to think: what if something happens with the steering wheel column? There can also be issues with steering. What is the redundancy? So for an autonomous vehicle, you need to think additionally and build these things into the hardware. So it's a robot designed for safe transportation from the ground up, even though we're extending an existing platform. And we work with the various automakers.
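To make the redundancy idea concrete, here is a minimal, hypothetical Python sketch of falling back between actuation channels. It is purely illustrative: the channel names, health checks, and controlled-stop contingency are assumptions for the sketch, not Waymo's actual safety design.

    from dataclasses import dataclass

    @dataclass
    class ActuationChannel:
        name: str
        healthy: bool

        def apply(self, steer: float, brake: float) -> None:
            # Stand-in for sending a command to this channel's hardware.
            print(f"{self.name}: steer={steer:+.2f}, brake={brake:.2f}")

    def actuate(primary: ActuationChannel, backup: ActuationChannel,
                steer: float, brake: float) -> None:
        """Route commands to the primary channel; fall back if it is unhealthy."""
        if primary.healthy:
            primary.apply(steer, brake)
        elif backup.healthy:
            backup.apply(steer, brake)
        else:
            # Last-resort contingency: command a controlled stop.
            print("all channels degraded: executing controlled stop")

    actuate(ActuationChannel("primary", False), ActuationChannel("backup", True),
            steer=0.1, brake=0.0)

The point of the pattern is simply that no single hardware or compute failure leaves the vehicle without a way to steer, brake, or come to a safe stop.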
As you're doing this, and you guys have progressed over these five years since we last talked, one of the challenges is that probably not every person out there is a Chris or a Daniel who's very invested in this kind of technology going forward. You have a lot of people out there. Here in the South, we joke that every other driver thinks they're a NASCAR driver. And that notion of control and safety: the general population may not have as much confidence in some of these technologies, because they're not following it closely and living it the way you do all the time. How do you approach that, and how has that changed over these last five years, in terms of getting buy-in from the public? As you talk about the safety statistics, which are amazing, how do you get people to really feel, deep down inside, that they can trust and believe in this mode, and that it is, in fact, much, much safer than what they are typically doing on a day-to-day basis?

So, you know, people do not feel statistics. It's hard, right? Because they're the product of many, many rides. You doing 10 or even 100 rides safely is not enough. I think what people feel is when they get into the vehicles, and this worked for me, and also my wife and friends of mine: people get comfortable really, really fast. You need to pass a certain bar where they feel, okay, this thing actually is a really, really good driver. My mother-in-law sat in it just a few weeks ago for the first time, and as she rode around, she's like, this car drives much better than me, right? And once she thinks this way, she's immediately at ease. And I think people relax; the first several minutes are very exciting, and then they relax and enjoy the experience and mind whatever they like to mind, whether the environment or their phone or other things. People get really used to it, if you cross this threshold of "can I trust you?" And I think your driving immediately shows this.

Now, those of us in the industry also understand that, coming back to statistics, you need to back it up. And with regards to backing it up, Waymo believes in transparency, and we're quite open with the incidents that happen. We file the details, and we also track the statistics and do our best estimate. We have a great safety team. They publish these reports, and in them we evaluate and try to estimate how we are doing compared to a fleet of human taxi drivers, or human drivers driving in the area that we are handling. And this is both by us, but there are also studies done by insurance companies, who of course want to quantify this very well. There's a Swiss Re study also bearing out our numbers. They also believe we can significantly decrease claims of different kinds, for injuries, for accidents, and so on. So that's another external validation for the kind of thing we provide.

So that's what I would say to people. Now, you know, it's a process. You need to work with the local communities. You need to work with police. You need to work with, you know, the various city stewards and officers. We train a lot of people. We engage with them. We work over time. I think you can see that in the cities we have been in over time, generally the trust in us increases. And I think the satisfaction of users with Waymo shows in the app stores; I think on the App Store we had a five-star rating, right? There are a lot of people that would just use Waymos now, if they could. And that's a testament to the value that people see in the rides. But it comes back, of course, to safety, and ultimately to engaging these people and getting them comfortable. Often when people experience this, many of them become converts. So I encourage people: try it. You may be the next convert, if you're not yet. I personally love it. I take it as much as I can, and it's always a pleasure working on a product you enjoy yourself. So I feel blessed that way.

Well, friends, it is time to let go of the old way of exploring your data. It's holding you back. But what exactly is the old way? Well, I'm here with Mark De Poe, co-founder and CEO of Fabi, a collaborative analytics platform designed to help data explorers like yourself. So Mark, tell me about this old way.

So the old way, Adam: if you're a product manager or a founder and you're trying to get insights from your data, you're wrestling with your Postgres instance or Snowflake or your spreadsheets, and you maybe don't even have the support of a data analyst or data scientist to help you with that work. Or if you are, for example, a data scientist or engineer or analyst, you're wrestling with a bunch of different tools: local Jupyter Notebooks, Google Colab, or even your legacy BI, to try to build these dashboards that someone may or may not go and look at. And in this new way that we're building at Fabi, we are creating this all-in-one environment where product managers and founders can very quickly go and explore data regardless of where it is, right? It can be in a spreadsheet, it can be in Airtable, it can be in Postgres or Snowflake. It's really easy to do everything from an ad hoc analysis to much more advanced analysis if, again, you're more experienced. With Python built in right there, and an AI assistant, you can move very quickly through advanced data analysis. And a really cool part is that you can go from ad hoc analysis and data science to publishing these as interactive data apps and dashboards, or, better yet, delivering insights as automated workflows to meet your stakeholders where they are, in, say, Slack or email or spreadsheets.
So, you know, if this is something that you're experiencing, if you're a founder or product manager trying to get more from your data, or if your data team today is just underwater and feels like it's wrestling with legacy BI tools and notebooks, come check out the new way and come try out Fabi.

There you go. Well, friends, if you're trying to get more insights from your data, stop wrestling with it. Start exploring it the new way with Fabi. Learn more and get started for free at Fabi.ai. That's F-A-B-I dot A-I. Again, Fabi dot A-I.

Well, Drago, I understand that every driverless car company is going to have a different approach to modeling and all of those sorts of things. You've talked a little bit about the hardware and the car, but I think it would be good for people to understand this driver that you mentioned. People might have in their mind, because we do talk a lot about models now after the generative AI boom, that there's this one model that can reason and blah, blah, blah. And so people might have this view that there is a model that drives the car. Could you help us really break down, in 2025, is this a system of models, models that do different things, a kind of combination of different types of models and even non-AI pieces? Could you just help us generally understand how that works?

So when you think of the stack, right, let's talk first about what it needs to do. It needs to perceive the environment using the sensors. It needs to build some representation of this environment. It needs to use this representation of the environment to make a set of decisions. And, I mean, autonomous vehicles have been around a long time. How long? Over 15 years already, right? So it's a rapidly developing technology space, but historically people thought: okay, there are these models. There's a perception model that builds a representation of the world that can be useful for certain things. And then there is some kind of behavior prediction and planning module that reasons about what we could do, and potentially, some people like to also reason about what others could do, to cross-reference our behavior with the other folks'. And then, based on all this information, eventually select promising decisions. So that's what a stack normally does.

Now, there are different ways to implement it. Generally, the trend has been to have a few, and in some cases people claim they have one, large AI models on the car. And you can say ML or AI; for a while it was called ML, and when the models became big enough, people called it AI, right? So you have these large AI models on the car, a few or one, depending on the company, and they're connected in certain ways. You can train them end-to-end or not; that's also an option different companies can choose. The two are orthogonal concepts: whether you have modules and whether you train them end-to-end are different questions, right? So it can be structured and trained end-to-end, or it can essentially be one model end to end. These are the two axes, and different companies fall somewhere in this very coarse taxonomy. I think Waymo has always used AI or ML since I've been there, and it's been the backbone of our tech. I think over time our models have streamlined and become fewer and fewer. I can say that.
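As an aside for readers newer to the field, here is a minimal, hypothetical Python sketch of the classic modular decomposition Drago outlines: perception builds a scene representation, behavior prediction extrapolates other agents, and planning picks low-dimensional controls. Every function name and the toy logic are illustrative assumptions, not a description of Waymo's stack.

    def perceive(sensor_frames: dict) -> list:
        """Perception: turn raw sensor data into a scene representation.
        Here: a list of (x, y, vx, vy) agent tracks; a real system is far richer."""
        return sensor_frames.get("tracks", [])

    def predict(tracks: list, horizon_s: float = 5.0) -> list:
        """Behavior prediction: guess where other agents will be.
        Constant-velocity extrapolation stands in for a learned model."""
        return [(x + vx * horizon_s, y + vy * horizon_s) for x, y, vx, vy in tracks]

    def plan(ego_speed: float, predicted: list) -> dict:
        """Planning: choose low-dimensional controls given predicted futures."""
        too_close = any(abs(px) < 5.0 and abs(py) < 2.0 for px, py in predicted)
        return {"steer": 0.0, "accel": -3.0 if too_close else 0.5}

    # The same three stages could instead be fused into fewer models and/or
    # trained end-to-end; modularity and end-to-end training are orthogonal axes.
    frames = {"tracks": [(20.0, 0.0, -4.0, 0.0)]}  # one oncoming agent
    controls = plan(ego_speed=10.0, predicted=predict(perceive(frames)))
    print(controls)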
I think off-board, what my team does is build these large foundation models for Waymo that are not limited by the compute and latency constraints you have on the car. And they can be quite helpful to essentially curate data, or to teach the models that actually run on the car or in the simulator. We can get to simulators later.
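One common way a large off-board model can "teach" a smaller on-board model is distillation, where the student is trained to match the teacher's softened output distribution. Below is a minimal sketch of that generic idea, with invented logits over a toy action vocabulary; it is not a description of Waymo's actual training recipe.

    import numpy as np

    def softmax(z, T=1.0):
        z = np.asarray(z, dtype=float) / T
        z -= z.max()          # numerical stability
        e = np.exp(z)
        return e / e.sum()

    # Hypothetical logits over three discrete driving actions.
    teacher_logits = np.array([2.0, 0.5, -1.0])   # large off-board model
    student_logits = np.array([1.0, 0.8, -0.2])   # small on-board model

    T = 2.0  # temperature softens the teacher's distribution
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)

    # Distillation loss: KL(teacher || student). In a real setup, gradients
    # of this quantity w.r.t. the student's parameters would drive training.
    kl = float(np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student))))
    print(f"distillation KL: {kl:.4f}")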
So we have experience with most aspects of these options, whether it's end-to-end or whether it's structured, right? I think off-board, I can definitely tell you we've explored a lot with large vision language models. That's one of the latest technologies that's relevant to us. In the field of robotics, people also talk about vision language action models, because you can tie into one model, you know, both understanding vision and language inputs and potentially asking for certain actions as outputs, right, which is ultimately what the robot needs to generate. So that's an exciting area that developed in '25. In our Waymo foundation model, we combine the benefits of these vision language models with some bespoke Waymo architecture innovations, in areas such as fusing the new modalities that vision language models typically are not trained on; lidar and radar is one. Another is modeling the evolution, the potential future evolutions, of the world. There is some interesting Waymo technology on how to do this well that we also use. But we fuse all of this, VLM technology and world knowledge from other bases, whether it's a world model or a visual language model, into something that is then able to do well on autonomous driving tasks. So that's off-board.

On-board, we don't typically talk about exactly what is there, but I think we're trying to get the state-of-the-art, the best architectures that we believe solve the problem, and put them together on the car. It's a really, really high bar to have a model perform in all the conditions and all the situations we need it to, right? And, as you know, VLMs also have this weakness of hallucination. So we have a safety harness around them to prevent hallucination, to double-check what they're predicting, right? So we also have that aspect in our stack, which we have worked on historically. So that's what I can say at a high level. I hope that's not too scattered. Maybe, if you guys want anything specific, we can discuss that in a little more detail.

So I do have a follow-up to that, recognizing that you're not able to get into the specifics of the architectural and model decisions that Waymo is engaged in. If you could abstract it a little bit and maybe just talk about the space: as you talk about world models and having a representation of the environment, that brings in not only AI but the notion of simulation as one of the tools in the tool chest, if you will. I suspect we have a lot of listeners that are hearing lots of different AI use cases in general, but may not have as much expertise in autonomy. As you talk about that notion of representation of the environment, could you talk a little bit about what that problem looks like, and what different things you might think of to solve it, without having to get into how you guys have done it? What does that juxtaposition of simulation, AI, and representation of the world and the environment around you look like?

So maybe, if we're going to go to simulation, I can juxtapose two things there. I like saying this; I've been doing this for a while. There are two main problems in autonomy. One is to build this onboard driver. Another one is to test and validate this onboard driver. And both are really, really hard problems. People usually talk about the first one, but imagine there is some collection of models, and you need to prove that it's safe enough to put it out in the real world. That's in itself a really challenging problem, arguably no simpler than putting the first model together. And that one, ultimately, because you need to be a bit more exhaustive, potentially takes an even longer time to build the full recipe to validate things properly, right? So these are the two problems.

Now, in autonomy, what is different from standard AI models is a few things. One is, you ultimately output actions, commands to a robot, which are a different type of data than, traditionally, say, text and images, right? Another is that we operate under strict latency constraints. You need to react quickly. For us, what is also interesting in AV is that this is probably the first serious domain where we had to really learn how to interact with humans in the same environment. So it's a highly interactive, multi-agent setup, right? And then, additionally, if we choose to add more sensors and cameras, we have a lot more modalities coming in, and we have a ton of data. So essentially, the way to think of it is: imagine you get maybe billions of sensor readings per second, or even tens of billions. A lot. And you need to make a decision. You need to have a context of many seconds of these sensor inputs, maybe a dozen cameras, half a dozen lidar and radar. So you need to collect, you know, maybe five to ten seconds, some can argue 20 or 30, of context to make a decision. And the decision is fairly low-dimensional; it's like, okay, steering or acceleration. But the inputs are incredibly bulky. So you need to somehow learn the mapping from this extremely high-dimensional representational space to decisions. That's very hard, right? Under latency constraints, under safety-critical constraints. That's what makes our domain interesting.

Now, a lot of the things that work in machine learning in one domain transfer to the other, right? So yes, there are, for example, very similar scaling law findings: if you have cutting-edge architectures, and you do proper studies of scaling, and you have a lot more data and compute to feed these architectures, performance improves. Now, for every class of algorithms there are somewhat different scaling laws, but even the simpler imitative algorithms that people did in language, predicting the next token, have parallels: we can predict the next action, right? There are these direct parallels. You can do reinforcement learning in language; we can do reinforcement learning in our simulator, right? These are the parallels. But how exactly things translate is interesting. The ideas translate; the implementation is a little more creative than usual, because there is a bit of a domain jump from the internet to the real world, right? So that's interesting. The other part, compared to, say, language LLMs: we have a paper, MotionLM, from two or three years ago, where the idea was, hey, why don't we treat those motions like language? It turns out it's a very effective idea.
That architecture, which is very LLM-inspired, models future interactions of agents in the environment very well. You can think of agents talking to each other with these motions they execute simultaneously in an environment, and now you can leverage the LLM machinery. We have this paper; it's quite effective, right? So that's an example of this.

Now, one other interesting point, though, is that text is its own simulator. Essentially, you know, you speak text to each other; that's the full environment. You take in text tokens and you spit out text tokens. In our case, we predict actions and we execute actions. But now you need the simulator, because based on these actions, you need to envision what the whole environment looks like, and what your, whatever, hundreds of millions to billions of sensor points look like. So you need something that generates them as you act, so you can test how you behave over time. As you make decisions at a fairly high frequency, there is a known problem called covariate shift. Essentially, decisions can take you to places you may not have seen before in the data. And there you may have particular failure modes that you may not observe unless you push yourself and drive, on policy, to those places. But to drive there, you need the simulator. And the simulator needs to be realistic enough that you don't go somewhere else entirely, as opposed to the actual place you would end up with your decision-making. So that's another very interesting point. Simulation is hard. If you want robust testing, simply having drivers on the road is not a particularly scalable solution if you want to keep iterating on your stack, because some of the events happen once in a million miles or more. And you would much rather test them in the simulator. But for the simulator, now you have to solve this problem, which is interesting and challenging. So that's unique in our domain.
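The two ideas in this answer, motion-as-tokens and closed-loop rollout, fit in a few lines of toy Python. The sketch below is a loose illustration under stated assumptions (a three-token motion vocabulary, a random stand-in policy, a noisy stand-in world model); it shows why closed-loop evaluation exposes compounding error, i.e. covariate shift, in a way that open-loop scoring on logged data cannot. None of it reflects MotionLM's actual vocabulary or Waymo's simulator.

    import random

    MOTION_TOKENS = [-1.0, 0.0, 1.0]  # toy vocabulary: nudge left / straight / right

    def policy(history):
        """Stand-in for an autoregressive model: emit the next motion token
        given the trajectory so far (here, just a biased random choice)."""
        return random.choices(MOTION_TOKENS, weights=[1, 8, 1])[0]

    def simulator_step(state, token):
        """Stand-in world model: roll the environment forward given our action."""
        return state + token + random.gauss(0.0, 0.1)

    # Closed-loop rollout: the policy's own actions determine the states it
    # sees next, so small errors compound over time. Open-loop evaluation on
    # logged data never visits these self-induced states.
    state, history = 0.0, []
    for _ in range(50):
        token = policy(history)
        state = simulator_step(state, token)
        history.append((state, token))
    print(f"final lateral offset after 50 steps: {state:.2f}")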
What if AI agents could work together just like developers do? That's exactly what AGNTCY is making possible. Spelled A-G-N-T-C-Y, AGNTCY is now an open source collective under the Linux Foundation, building the Internet of Agents. This is a global collaboration layer where AI agents can discover each other, connect, and execute multi-agent workflows across any framework. Everything engineers need to build and deploy multi-agent software is now available to anyone building on AGNTCY, including trusted identity and access management, open standards for agent discovery, agent-to-agent communication protocols, and modular pieces you can remix for scalable systems. This is a true collaboration from Cisco, Dell, Google Cloud, Red Hat, Oracle, and more than 75 other companies, all contributing to the next-gen AI stack. The code, the specs, the services, they're dropping, no strings attached. Visit agntcy.org, that's A-G-N-T-C-Y dot org, to learn more and get involved.

Well, Drago, I'm really intrigued by how you helped me form a mental model for the types of problems that are part of the research in this area. I would definitely encourage our listeners to go check out waymo.com/research. There's a bunch of papers there that people can find and, you know, read. But there's also the Waymo Open Dataset, which supports research in autonomous driving. So that's really cool to see. It's amazing.

I'm wondering, Drago, as I look at this, I see all sorts of things, from scene editing to forecasting and planning to...

Did I mention you need to embody the agents in the simulator too? They're not deterministic.

Oh, yeah.

If you start doing different things, you need to, well, guide the agents to react to you in reasonable ways as well. Otherwise, you know, they'll be reacting to an empty spot where you no longer are. Even if you collected the situation with your sensors, as you start deviating from it in the simulator, you still need the agents to do reasonable things, right?

Yeah, yeah. That makes sense. And I guess that really gets to my question a little bit, which is: I assume over the last five years, as we haven't chatted, there's been a lot of progress in certain areas, and maybe certain challenges are holdouts that remain very, very difficult, where not as much progress has been made. So in this autonomous driving research world, can you paint in broad strokes where there has been very rapid progress, and maybe some of the hardest problems to solve that still remain at arm's length, if you will?

I mean, I would say one thing: folks that are closer to robotics will see that, just like the field of AI is going through a crazy inflection point in both the methods people develop and popularity, the same is true in robotics, and the same is true in AV. I've been in the space over 10 years now, just doing AVs. And I would say every couple of years, our capabilities with AI and machine learning dramatically expand due to innovations, and this innovation train has not stopped. So where we are five years later, compared to five years before, in terms of modeling, is still a huge improvement. I think we're moving more and more to machine-learning-powered stacks, and ultimately understanding how to handle this problem elegantly and scalably with data-driven solutions. So that's been the general evolution. And I think we understand how models behave better.

These latest architectures and the scaling that we mentioned are a really interesting domain. We started studying it a while back. There's this paper we have, for example, on scaling laws of the MotionLM architecture. It's an LLM-like architecture, so you ask: what are its scaling laws? How do they compare to LLMs? We have a tech report on this. Similar kinds of learnings transfer from LLMs, but there are some bespoke, really interesting things. For example, for that architecture, improving what's called open-loop prediction performance seems to correlate with improving closed-loop performance. That's not always true, right? And we see different scaling factors compared to language; our motion space is nowhere near as diverse as language tokens, right? So for a model with the same number of parameters, we actually need a lot more data, more examples of how the world behaves, to scale. These are interesting findings generally, right? And as the architectures keep evolving, now there are diffusion and autoregressive models, and how does each compare, in open loop and in closed loop? These are all very interesting areas people are studying.
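Scaling-law studies of the kind Drago mentions typically fit a power law of the form L(N) = a·N^(-alpha) + c to (model size, loss) measurements and compare the fitted exponents across domains. Here is a minimal sketch of that procedure with invented numbers; the data points and fitted values are illustrative only, not Waymo's results.

    import numpy as np
    from scipy.optimize import curve_fit

    # Hypothetical (parameter count, eval loss) pairs -- made-up numbers.
    n_params = np.array([1e6, 1e7, 1e8, 1e9])
    losses   = np.array([1.20, 0.85, 0.62, 0.47])

    def power_law(n, a, alpha, c):
        # L(N) = a * N^(-alpha) + c: the usual scaling-law functional form,
        # where c is the irreducible loss floor.
        return a * n ** (-alpha) + c

    (a, alpha, c), _ = curve_fit(power_law, n_params, losses, p0=[10.0, 0.2, 0.1])
    print(f"fitted exponent alpha = {alpha:.3f}, loss floor c = {c:.3f}")
    print(f"extrapolated loss at 1e10 params: {power_law(1e10, a, alpha, c):.3f}")

Comparing the fitted exponent across domains (language vs. motion) is what lets researchers say things like "the motion space is less diverse, so the same model size needs more data to scale."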
I think generally there's also this question lately of how you build the best simulator with machine learning, and what kinds of models are out there. Most recently there's some groundbreaking work, like the Genie model by Google. I don't know if you guys saw it. It's controllable video, essentially. You can give it motion controls, and it dreams the video, close to real time, of what the world should look like. So essentially you're controlling the world you're imagining, a bit, right? And you can do this in real time, or you can do it, of course, off-board or offline with potentially even larger models. Now, these models are pre-trained on a large amount of video and text, so they capture a lot of knowledge of how the real world behaves, and it somewhat complements the knowledge that vision language models capture from internet corpuses. And so, how do these two relate? How do you mix them, right? Which one is beneficial for which type of task? These are all interesting capabilities people are exploring.

And maybe one other interesting topic: there's a lot of talk about architectures for robots that are some combination of a system-two and a system-one architecture. You guys may have heard of it, right? Now, we know that large models are more capable when trained on more data and more compute, but in latency-sensitive situations, if they're too big, you can't run them in real time. So the question is: okay, what if you have a real-time model that handles most cases, but then you have a slower model that does better high-level reasoning, runs at some lower frequency, and helps guide the fast model when needed, while still keeping this reflexive capability? If someone jumps in front of you, you still respond, right? These are interesting questions in our domain as well. So there are many, actually. It's a really, really fascinating time, and I think we're studying a lot of these questions, just as the whole field is, and we have some very interesting findings, some of them not published. Generally, I would encourage people: come join us. You can, well, you know, contribute to the premier embodiment of physical AI currently out there, and you can do interesting research, right?

Sounds like fun.

Yes, these are all fascinating topics. And of course: how to control hallucinations in all these models, and how do you determine when these models are out of domain and potentially making clear mistakes, right? This can happen. We have research experience with VLMs, like many of the current ones. We have a paper called EMMA, where we tried to fine-tune a VLM for driving tasks and got a bunch of learnings. It can be quite good, but it has limitations too, right? So how you overcome these limitations with additional system design is very interesting.
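The system-one/system-two split Drago describes is essentially a two-rate control loop: a small reflexive model runs every tick, while a larger reasoning model refreshes high-level guidance at a fraction of the rate. Here is a minimal, hypothetical Python sketch of that arbitration; the 100 Hz tick, the 20x rate ratio, and the toy decision rules are assumptions for illustration, not any company's architecture.

    def fast_reflex_model(obs):
        """System 1: small and real-time; must run every control tick."""
        return {"brake": 1.0} if obs["obstacle_dist_m"] < 5.0 else {"brake": 0.0}

    def slow_reasoning_model(obs):
        """System 2: larger and slower; refreshes guidance at a lower rate."""
        return {"suggested_lane": 1 if obs["construction_ahead"] else 0}

    guidance = {"suggested_lane": 0}
    for tick in range(100):                    # e.g. a 100 Hz control loop
        obs = {"obstacle_dist_m": 50.0, "construction_ahead": tick > 60}
        if tick % 20 == 0:                     # system 2 runs at 1/20 the rate
            guidance = slow_reasoning_model(obs)
        controls = fast_reflex_model(obs)      # reflexes always run, every tick
        controls["lane"] = guidance["suggested_lane"]
    print(controls)

The key property is that the reflexive path never waits on the slow path: if someone jumps in front of the vehicle between guidance refreshes, system 1 still brakes immediately.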
I'm curious, as we're talking about this, and I'm really enjoying the conversation: I work for another company in autonomy myself, but in a slightly different context. One of the things that is popular in the industry I'm in right now is solving for swarming behaviors, as you're talking about many autonomous vehicles that have to collaborate in certain ways. That may or may not be an interesting problem for Waymo; I don't know what your thinking is on that. But I would love to know: when you look at that space, what are some of the things that you think about, and what is interesting to you about the notion of many autonomous vehicles collaborating together?

That's been a very interesting area. There was actually early research that I was impressed with, where people proved that if you can control groups of vehicles, you can improve traffic flow. To me, autonomous vehicles are not exactly swarming yet. They're still a subset, a relatively small subset, of the whole traffic. When I think of swarming, I imagine, say, a crowd of 200 people on Halloween all around the car, and stuff like this. That's swarming. Or you go downtown after a Giants game as everyone is exiting, and that is swarming, right? The human agents, so to speak, are more prone to swarming these days than AVs. Maybe we'll get more prominent. And when you think of coordinating multiple AVs: in our domain, they already send each other valuable information. For example, if one of our vehicles encounters some very complex construction, it can pass information about it to the others. If we encounter potential slowdowns, or vehicles getting stuck, that kind of information can be passed.

I think jointly controlling vehicles starts becoming interesting now that we're getting to some kind of scale. One of the interesting domains where this comes up is when you want to charge them. Imagine you need to charge hundreds of vehicles in a location. How do you control all these vehicles so that they all get to the right place and don't block each other, and it's all very efficient? That's one example of where you're fairly swarmed; it's your own warehouse, right, or a garage, where this comes up. And then, down the line, potentially, there are opportunities to improve traffic flow for everyone. But that's still maybe in the future.
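The depot-charging coordination Drago mentions is, at its core, an assignment problem. As a toy illustration, here is a hypothetical sketch that assigns vehicles to charger stalls by minimizing total travel time with the Hungarian algorithm; the cost matrix is invented, and a real dispatcher would also handle arrival sequencing, queueing, and blocking constraints on top of this.

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    # Hypothetical travel times (minutes) from 4 vehicles to 4 charger stalls.
    cost = np.array([
        [7, 3, 9, 4],
        [2, 8, 5, 6],
        [6, 4, 3, 8],
        [5, 7, 2, 3],
    ])

    # Hungarian algorithm: one vehicle per stall, minimizing total travel time.
    rows, cols = linear_sum_assignment(cost)
    for v, s in zip(rows, cols):
        print(f"vehicle {v} -> stall {s} ({cost[v, s]} min)")
    print(f"total travel time: {cost[rows, cols].sum()} min")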
Well, you took us right there, Drago. As we're getting close to the end here, I'd love to talk about that future. We were talking beforehand, and I was saying I'd love for you to share what you're excited about. That could be, of course, related to driverless research in general, or to the AI ecosystem generally, something you're excited about as you look forward, or are thinking about a lot. Does anything stand out that we can ask you about? Hopefully not five years from now, but maybe the next time you're on, in less than five years, we can ask you about it.

Sounds good. Well, I'm around, so I could probably come back faster than in five years' time.

In a Waymo.

Potentially, yes. I think maybe let's go into a couple of areas. First, to parallel the chat we had earlier, maybe first about the product and then a bit about the AI. I think in terms of the product, with the safety studies, we've shown significant improvements over the baseline. And we've shown it already at scale, with what starts to become fairly good confidence, some statistical significance, at this point. And maybe your listeners don't realize, but even on U.S. roads alone, I'm not talking world roads, U.S. roads, 40,000 people die every year from accidents. That's a lot. And I think these gains are starting to become somewhat meaningful. So you start thinking: hey, maybe we have a mandate to expand. We should be expanding. It will save people's lives. And you think about it.

And then the question is, how can I contribute to expanding? I mean, set aside that, of course, I believe it's a great service; a lot of people love it for a lot of good reasons, and we could go into some of the reasons people have found to love it, right? But even just from the mandate: okay, it's helping in a meaningful way, and being out there can make quite a dent in some of these numbers. And so, yes, I would love it to expand more. Now, we're doing that. For me, then, the question is: what can I do to contribute to it, right? And I think one of the most scalable solutions to tackling dozens of new cities and conditions and countries is machine learning and AI, right? So what I'm excited about is harnessing all the positive latest trends, for me most directly in the Waymo foundation model work we're doing, where we can directly experiment with and deploy them, and then try to push more and more of them to contribute similar benefits to the main production systems, which are the onboard driver and the simulator. So that's what I think about.

Now, more specifically, if you want to go into AI techniques: I think this question of, okay, how do I endow vision language models with more modalities, right, is a fascinating one. We actually have some good results already. How do you expand to new modalities, say lidar and radar? How do you connect the model to actions? What's an effective way to do this while preserving all the world knowledge present in the model you're trying to build on top of? It's an interesting model and system design challenge. And then what I'm also excited about is building the simulator, right, as realistic and as scalable as possible. The modern technologies, like the Genie model that I mentioned, these world models, are still relatively few and far between, but a ton of labs are working on them today. Taking that kind of technology and building the most generalizable possible simulator with it is fascinating. Now, the interesting thing is, you could do that, but these models can still be very expensive to run. So it's not enough to show that the simulator can handle very realistic, interesting cases; you still need to show how you can run it without breaking the bank. The amount of simulation Waymo does today to ensure that we're safe: we run millions of virtual miles every day. That's a lot to simulate, potentially with so many sensors on board and so on. So there are some very interesting questions in that space. How do we get the maximum possible simulator realism? And how do we get the maximum possible simulator scalability? There's a very interesting mix of technologies getting involved to do that.

That's awesome. Well, I'm certainly excited about that. Like I say, I encourage our listeners to check out Waymo's research page. Lots of amazing stuff to explore there.

And folks can see our history there, right? I think you can see the kind of work and papers people did from, I think, 2019 to now, and there are almost 100 papers there now. And maybe it's not 100 only because we may not have uploaded the most recent ones; I'll try to make sure we do soon if we're missing any. So if readers go there, they can see the full set.

That sounds great. Well, thank you for joining us again, Drago. It was a real pleasure to have you on the show again. And let's not make it five years next time.
We'll try to get you on and hear the update sooner than that, for sure. Don't be a stranger.

Thank you, guys. Pleasure to be on the show.

All right, that's our show for this week. If you haven't checked out our website, head to practicalai.fm, and be sure to connect with us on LinkedIn, X, or Bluesky. You'll see us posting insights related to the latest AI developments, and we would love for you to join the conversation. Thanks to our partner, Prediction Guard, for providing operational support for the show. Check them out at predictionguard.com. Also, thanks to Breakmaster Cylinder for the beats, and to you for listening. That's all for now, but you'll hear from us again next week.
