TWIML AI Podcast

The Decentralized Future of Private AI with Illia Polosukhin - #749

Tuesday, September 30, 2025 · 1h 5m

What You'll Learn

  • Polosukhin co-authored the famous 'Attention is All You Need' paper that introduced the Transformer architecture, and later left Google to co-found Near AI with the goal of teaching machines to code.
  • Near AI initially faced challenges with cross-border payments for their crowdsourcing platform, which led them to explore blockchain technology and eventually launch the Near protocol.
  • As the AI landscape became more centralized and closed-off, Near AI recognized the need for user-owned, decentralized AI systems to prevent a '1984-like' scenario where a few companies control the decision-making intelligence.
  • Recent advancements in confidential computing, such as secure enclaves in NVIDIA and Intel hardware, have enabled Near AI to develop a 'decentralized confidential machine learning' approach that protects user privacy without sacrificing performance.
  • The goal is to remove the trust threshold that users have to navigate when deciding which of their data and contexts they are willing to share with AI systems, by ensuring end-to-end encryption and no access by any single party.

AI Summary

The podcast discusses the decentralized future of private AI, as envisioned by Illia Polosukhin, the co-founder of Near AI. Polosukhin shares his background in machine learning and AI research, including his work on the Transformer architecture. He then explains how Near AI's approach to private AI combines blockchain technology for user ownership and sovereignty with confidential computing techniques to enable decentralized, user-centric AI systems that protect user privacy.

Key Points

  1. Polosukhin co-authored the famous 'Attention is All You Need' paper that introduced the Transformer architecture, and later left Google to co-found Near AI with the goal of teaching machines to code.
  2. Near AI initially faced challenges with cross-border payments for their crowdsourcing platform, which led them to explore blockchain technology and eventually launch the Near protocol.
  3. As the AI landscape became more centralized and closed-off, Near AI recognized the need for user-owned, decentralized AI systems to prevent a '1984-like' scenario where a few companies control the decision-making intelligence.
  4. Recent advancements in confidential computing, such as secure enclaves in NVIDIA and Intel hardware, have enabled Near AI to develop a 'decentralized confidential machine learning' approach that protects user privacy without sacrificing performance.
  5. The goal is to remove the trust threshold that users have to navigate when deciding which of their data and contexts they are willing to share with AI systems, by ensuring end-to-end encryption and no access by any single party.

Topics Discussed

Transformer architecture, Decentralized AI, User privacy, Confidential computing, Blockchain technology

Frequently Asked Questions

What is "The Decentralized Future of Private AI with Illia Polosukhin - #749" about?

The podcast discusses the decentralized future of private AI, as envisioned by Illia Polosukhin, the co-founder of Near AI. Polosukhin shares his background in machine learning and AI research, including his work on the Transformer architecture. He then explains how Near AI's approach to private AI combines blockchain technology for user ownership and sovereignty with confidential computing techniques to enable decentralized, user-centric AI systems that protect user privacy.

What topics are discussed in this episode?

This episode covers the following topics: Transformer architecture, Decentralized AI, User privacy, Confidential computing, Blockchain technology.

What is key insight #1 from this episode?

Polosukhin co-authored the famous 'Attention is All You Need' paper that introduced the Transformer architecture, and later left Google to co-found Near AI with the goal of teaching machines to code.

What is key insight #2 from this episode?

Near AI initially faced challenges with cross-border payments for their crowdsourcing platform, which led them to explore blockchain technology and eventually launch the Near protocol.

What is key insight #3 from this episode?

As the AI landscape became more centralized and closed-off, Near AI recognized the need for user-owned, decentralized AI systems to prevent a '1984-like' scenario where a few companies control the decision-making intelligence.

What is key insight #4 from this episode?

Recent advancements in confidential computing, such as secure enclaves in NVIDIA and Intel hardware, have enabled Near AI to develop a 'decentralized confidential machine learning' approach that protects user privacy without sacrificing performance.

Who should listen to this episode?

This episode is recommended for anyone interested in the Transformer architecture, decentralized AI, or user privacy, as well as anyone who wants to stay updated on the latest developments in AI and technology.

Episode Description

In this episode, Illia Polosukhin, a co-author of the seminal "Attention Is All You Need" paper and co-founder of Near AI, joins us to discuss his vision for building private, decentralized, and user-owned AI. Illia shares his unique journey from developing the Transformer architecture at Google to building the NEAR Protocol blockchain to solve global payment challenges, and now applying those decentralized principles back to AI. We explore how Near AI is creating a decentralized cloud that leverages confidential computing, secure enclaves, and the blockchain to protect both user data and proprietary model weights. Illia also shares his three-part approach to fostering trust: open model training to eliminate hidden biases and "sleeper agents," verifiability of inference to ensure the model runs as intended, and formal verification at the invocation layer to enforce composable guarantees on AI agent actions. Finally, Illia shares his perspective on the future of open research, the role of tokenized incentive models, and the need for formal verification in building compliance and user trust. The complete show notes for this episode can be found at https://twimlai.com/go/749.
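
The description above compresses a fairly specific mechanism: model weights stay encrypted outside a hardware secure enclave, and each response comes back with a signed attestation identifying the container image and model that actually served the request, which the caller can check against values published on chain. As a rough illustration of what that client-side check might look like, here is a minimal Python sketch; the report fields, the function name, and the `signature_is_valid` callback are assumptions for illustration, not Near AI's actual API.

```python
import hmac
from dataclasses import dataclass
from typing import Callable

@dataclass
class AttestationReport:
    """Hypothetical attestation returned by a confidential-computing enclave."""
    docker_image_hash: str    # hash of the serving container (e.g., a vLLM image)
    model_weights_hash: str   # hash of the encrypted model weights that were loaded
    signature: bytes          # hardware-rooted signature over the fields above

def verify_inference_attestation(
    report: AttestationReport,
    expected_image_hash: str,
    expected_model_hash: str,
    signature_is_valid: Callable[[AttestationReport], bool],
) -> bool:
    """Return True only if the enclave ran the container and model we expected.

    `signature_is_valid` stands in for the vendor-specific check (e.g., walking
    the NVIDIA/Intel certificate chain); it is an assumed helper, not a real API.
    """
    # 1. The report must be signed by hardware we trust.
    if not signature_is_valid(report):
        return False
    # 2. The container that handled the request must match the published one.
    if not hmac.compare_digest(report.docker_image_hash, expected_image_hash):
        return False
    # 3. The loaded weights must match the checkpoint recorded on chain.
    return hmac.compare_digest(report.model_weights_hash, expected_model_hash)
```

In the episode itself, Illia notes that this verification is surfaced to end users much like the HTTPS padlock, with the underlying certificates available for anyone who wants to re-check them.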

Full Transcript

I'd like to thank our friends at Capital One for sponsoring today's episode. Capital One's tech team isn't just talking about multi-agentic AI, they already deployed one. It's called Chat Concierge and it's simplifying car shopping. Using self-reflection and layered reasoning with live API checks, it doesn't just help buyers find a car they love, it helps schedule a test drive, get pre-approved for financing, and estimate trade-in value. Advanced, intuitive, and deployed. That's how they stack. That's technology at Capital One. All right, everyone, welcome to another episode of the TWIML AI Podcast. I am your host, Sam Charrington. Today, I'm joined by Illia Polosukhin. Illia is a co-founder of Near AI, but is perhaps best known as a co-author of the now famous Attention Is All You Need paper, which introduced the Transformer. Before we get going, be sure to take a moment to hit that subscribe button wherever you're listening to today's show. Illia, welcome to the podcast. Thank you for having me here. I'm looking forward to digging into our chat. We're going to be talking about the way you're approaching private AI at Near AI. But to get us going there, I'd love to have you share a little bit about your background and, in particular, how you ended up going from co-author of this now famous paper to working on the privacy side of AI. For sure, yeah. So my background is very much in machine learning and AI research. I joined Google Research because I saw the cat neuron paper, if folks remember that. And I was like, okay, we should do that, but for text, and really figure out how to learn. And Google was a great place to do that: lots of language, lots of compute. My team was working on question answering and machine translation, and as part of that, due to the requirements and latency constraints of Google.com, we were trying to figure out how to build a deep learning model that can consume lots of context and really reason about it without taking a ton of time to process it. And that's where the Transformer architecture comes in. Now, as you know, there was a lot of evolution happening in 2016 and 2017, and not just around the Transformer. I was excited to actually put it in production and build a product around it, and specifically I was excited about the idea of machines writing code. So in 2017 I left, and with my co-founder Alex Skidanov we started Near AI, with the idea of: how do we actually teach machines to code? And you were one of the first authors of the paper to leave Google. Is that right? Yeah, I was the first one to leave. I actually left before the paper was officially published, and that's why mine is the only author email on it that's actually a Gmail address. Back then, nobody thought what's happening right now was possible, right? What we were pitching was kind of somewhere between science fiction and delusion. And the reality is, back then it wasn't possible. The compute wasn't there, and the kind of scale at which we now know these models are needed wasn't there and wasn't studied very well. On our side, what we were trying to do was get a lot more training data relevant to this problem, which is people writing code for descriptions or writing descriptions for some code.
And we found kind of a niche of people around the world who would do this for reasonably cheap. It was computer science students in like developing countries. And so we effectively like, I mean, now, you know. You built like a quiz platform or something like that? We built like a crowdsourcing platform for computer science students where they can go and, you know, effectively practice their coding and get paid. Now, the challenge we faced was the students were in China. They were in Eastern Europe. They were in Southeast Asia. In all of those countries, there's some kind of problem with paying. In China, people don't have bank accounts. They have WeChat Pay. In Ukraine, for example, you need to sell half of your dollars on arrival if it's in a foreign currency. There was some countries, nothing worked. PayPal didn't work. Verizon didn't work. And so we started looking at blockchain as like, hey, this global payment network that people are talking about. we can just use it to pay people. And this was 2018. There was nothing that really kind of matched our requirements. We were sending small amounts of money. It was microtransactions, even though it was computer science students, but we didn't want to make it super complicated for them. And so there was nothing that was easy to use and actually cheap. Back then, transaction fees were in dollars. and that's how we actually like okay we should solve this problem right we have this you know we talked with other people other people have this as well so like we can solve this problem and go back to ai kind of with that already in the pocket and so uh that's kind of how near protocol was born uh we we launched it in 2020 it's one of the most used blockchains in the world right now with 50 million monthly active users it's used for payments micropayments uh kind of loyalty points, remittances, and variety of financial use cases, as well as data labeling and other AI workloads as well. We have multiple data labeling projects actually running on top of it. And so now, effectively in 2021-23, as the resurgence of this and actually figuring out the scale of the AI kind of came in, we started now with a renewed lens of the blockchain, looking at it and actually see how can we contribute it and how can we leverage it. And pretty quickly, as you work in blockchain, you get, I would say, indoctrinated by the value where it's kind of user ownership, right? Self-ownership, self-sovereignty. and it was pretty clear that the kind of AI space changed, right? It went from, you know, this was open research, everybody was contributing, the papers published, you know, Transformer Code was out, you know, for everybody to build on top to like kind of everybody was keeping a secret, things are starting to like close up. And as we know also just like from kind of market dynamics, that leads to then, you know, more and more centralization and monopolization of the technology, and in turn becomes kind of the monopoly that we've seen before in other areas. And so the example that I use is AOL. Like imagine if internet was effectively run out of AOL. And if you want to host a website, you need to go to AOL and ask them to do this, right? And similarly, if you're a user, you can only kind of access through this. 
But in case of AI, because it's such a fundamental, technology right is intelligence as a technology it it's so much more dangerous right because the like i mean internet is information but this is actually like the processing the decision making and so kind of the realization was that if we have only kind of a handful of kind of closed source like profit-driven companies dominating a space, we may end up in 1984 type situation, right? Where you effectively have a company that can be very much not intentionally then effectively deciding how everybody thinks, right? Because that's how we process information. That's how we're going to be successful. And so that's kind of where this idea of user-owned AI was born, which was like, hey, let's combine what we've been building on the blockchain side, which is user ownership, kind of network effects of everybody contributing and participating in comparison to, you know, centralized kind of for-profit company and create an AI that actually is on the user side, on, you know, your side, not their side. And so now there's a lot of, there was a lot of open question. How do we actually do that? Right? Because you have, right and so so that took some time because uh you know all the methods that people use for example for privacy for verifiability extremely expensive right uh there's like homomorphic encryption there's zk proofs etc all of them have you know like 10 000 to 100 000 times overhead and when we're talking about you know machine learning which is already using all the right Possibly we can. Let's layer in two super compute intensive projects. Yeah. And so we've been doing a lot of research. And actually, the interesting thing happened is the hardware. So NVIDIA hardware and Intel, both kind of at a similar time, enabled this mode called confidential computing. So this is inside the chips itself. You can enable it in such a way that even the owner of the hardware, of the compute, is not able to access what computation happening inside. But whoever kind of requested this compute can actually have a certificate saying that this data was run on this compute, let's say Docker, and this is the response. Yeah, I was just going to ask if this came around. This is the like secure enclave stuff that came around when folks were trying to harden Docker containers for multi-tenant environments. That was part of this. I mean, so there was a lot of research and kind of hardware over the years trying to do this. But historically, it's been very like low level. Like you needed to rewrite your programs in kind of like assembly. In fact, like C was like special instructions. And Intel actually in 2024, mid 2024 released on the new fifth generation Xeons, this new kind of generation, which allows to exactly run just dockers. And similarly, NVIDIA enabled work with that specific mode where effectively drivers inside can connect to the NVIDIA as well in the security mode. So that kind of all came together effectively like a year ago. and so since then like you know okay that enables us to that gives us some components but then now we still need the whole system right and so so that's really what you know we're enabling is we call a decentralized confidential machine learning and so confidentiality I think may be important to discuss why it's important well confidentiality is a combination of things right First of all, for the user, like a lot of people usually like, oh, you know, I don't really care. 
Like there's some people who are like, I don't care about privacy. There's people who are like, I care about privacy, but they go and still use all the products that, you know, take all their data. But there's important kind of interesting effects that there's still something that you're not going to trust. like you know we don't normally walk around with like a hot mic that records everything we say although that is becoming popularized by some ai companies right it is yeah it's a conversation that we're having now in spite of how crazy it sounds or would have sounded a few years ago yeah and and this is this is example of somebody who should definitely use our platform okay and similarly like yeah i mean there's just so much context in your life that right now we're still not putting on on you know into this ai systems and like i think everybody's kind of on different spectrum right like i for example don't trust you know my email and my calendar to maybe an ai company some people would right but then they wouldn't trust as their medical data but maybe they wouldn't trust their banking, bank data, right? So there's always a threshold where you kind of get like, maybe I shouldn't do that, right? And so what we offer is effectively removing that threshold and say, hey, actually, it's all confidential, all end-to-end encrypted for you. And you can trust that there's no other single party, not developers, not operators of hardware, not model developers, et cetera, are able to access it. So it's as if it was local and potentially even better than local because you have like additional security mechanisms. And so is the idea that, oh, there's like so many questions I'm trying to ask here. So like you're describing a system that, you know, many people say like, if I can't run this locally on my machines, I'm not going to run it. But it sounds like what you're trying to do is more like create a system that would allow like remote and cloud-based, but also private to the same level as local AI. Am I parsing that correctly? I mean, I run some of the models locally, but I mean, obviously they're not as intelligent as what you can have in the cloud. They're not as fast. But importantly also, like, you know, even if we have like a smarter model, you still have a lot of things that are happening on a background that you want to like keep happening, right? You want, you know, set up an agent that runs and reads all the news and summarizes it and processes it or workflows, et cetera. So there's always going to be a need for background work and analysis and kind of surfacing it, even as local models improve. So like, I think that's, that's really the, and you know, you want to back up, you want, you want a way to synchronize between devices. Like there's a lot of kind of, um, um, set kind of functionality that you want that requires cloud. And right now there is no really private cloud, right? There's multiple companies usually who actually have access to the data and to the computation. The other thing is for developers, actually, if I'm an application developer, if five, 10 years ago, data was a gold mine, it's becoming actual liability. And it's becoming liability both in Europe, for example, there's GDPR data privacy. We have California data privacy. 
There all this kind of different data privacy laws that are popping up In China you actually need to pay data tax if you using consumer data Yeah And so the reality is actually like if before this was like really valuable, for many use cases now, it's actually a liability. And so this actually creates a platform where you as a developer don't need to deal with the user data. You're effectively pushing software to them. Again, similar how local works, right? you pushed the application to the user and it runs to their device, you don't need to deal with whatever data. But again, now you have background processing, you can have higher intelligent models and you can access all of their context and memory as well in this. So you kind of get like interesting combinations from both sides. There is another side, which is also interesting. So right now, if I'm a model developer, like actual Frontier AI models, I have an interesting challenge where, you know, if I'm not, you know, the largest labs, which are only few, let's say I developed a new, you know, amazing model for anime characters or whatever. And now I have a choice. I either, you know, I only have some amount of compute. I either use this compute to serve customers or research and develop a new model, right, and continue trading, right? And so if you get a lot of usage and you're kind of limited by that, you then still need to handle all of their data. So you have this kind of challenges with all the DPRs in the world. And now if you say like, oh, but this ton of clouds, GPU clouds around the world, you can just go and rend them off when you need it. And the challenge is actually this model developers don't trust third parties because they're afraid that their model will leak. And this has happened where the model weights have leaked from third parties. What's a specific example of that happening? So Mistral gave its weights to Hugging Face and it ended up on 4chan. Oh, wow. I hadn't heard that. And so, I mean, this is the same reason why a lot of the people build their own clusters because they want to control everything. I mean, obviously, there's like some efficiency comes from like optimizations, but a lot of it is also just like we want to control, you know, like literally have guards on the doors to make sure nobody can access. So we're also solving that problem, interestingly, because of secure enclaves, you can actually encrypt the model weights and they only get decrypted inside the secure enclave. And then user data is also private, right? So effectively bringing kind of privacy from both sides, like kind of model developers don't need to deal with user data. They kind of don't need to, you know, they also don't need to rent the hardware, right? It gets kind of rented at the moment when users using it. And then on the other side, the users don't have access to the model, but they also know their data is not going anywhere. Let's pause here. So you're introducing a twist here. So I thought we had this trend, at least my mental transition was, okay, privacy is talking about this local thing. And then the previous time I interrupted, it was like, no, it's this cloud thing. But now what I'm hearing strikes me more as this decentralized thing where it is actually local, like it's running on my laptop or device, but also on other people's devices. 
And the reason why I'm saying that is because you're saying like, I don't know, you said something in particular that made me think that like the model's coming to my device and the data's coming, you know, the data's on my device and the training's happening there. And then like, wait, let's maybe back up and like kind of frame what we're talking about topologically, I think. Yeah, topologically, this is compute hardware, let's say GPUs and CPUs, that live in this decentralized confidential cloud. Ah, so it is a cloud, but it's not the same cloud. It's like a decentralized, or it may be. Could you run it in a, like run it on? It's hardware, so probably not running it on. Yeah, you need bare metal to be configured and then join the network. But yeah, I mean, like Amazon, you know, data center can repurpose itself to become a member of this cloud. Yeah, so this is a cloud. So like you as a user accessing it, but it gives you very kind of close guarantees to the local. And you can potentially even add additional like, you know, PIN code to FA, et cetera. Like you can actually restrict things that you may not even have on a local. host because I mean localhost you still can access the hard drive physically here you like you still have like a level of interaction that can provide additional controls but there's no other like there's no third party that can access that like your data and your compute right so this this cloud is kind of the the you know the middle layer as a user as an end user like I'm contributing my data in some way because I want some processing on my data or to access intelligence and the cloud can't access my data and presumably the model provider can't access my data, but the model provider like is providing the model into this cloud and it can access my data and return some results back to me. Correct. Interesting. Interesting. You mentioned at one point that data is a liability to the model providers, like presumably they need access to, if not end user data, like some data to train their models, to tune their models. Like, you know, particularly now in the part of the, you know, AI lifecycle that we're in, like we're finding that one of the key differentiators for, you know, organizations is building this data flywheel where they're getting early users, getting access to their interactions or traces and then improving their models based on that using, you know, reinforcement fine tuning or whatever. Does this process like, I get that user data can be, you know, can have a cost, you know, to the model provider, but that's not all there is to the story. Like they still need that user data to improve. Like how does that play in in this model? Yeah, so I think it's, first, I think that is changing as well, the need for the actual user feedback data. But first, before we go there, why is this liability? So imagine I'm a European, right? I'm in Lisbon right now. I use OpenAI. OpenAI trains on my data. And then I go and I evoke my GDPR law and say, hey, remove all my data. I'm assuming that that is not a resolved issue either because I can't believe it's because no one has asked yet. I'm assuming it's because they've just ignored it. And at one point, you know, there may be a challenge, but we're just not there yet. Yeah. So that's what I mean by liability, right? I mean, you know, maybe OpenAI has the money to pay their fine. Like, I mean, similar how Google and Facebook have paid, you know, billions of dollars in fines. 
But if you're a smaller model developer, that's why I was kind of using other examples. This effectively can be like existential. Yeah, I get it. It can be a liability. So that's piece number one. The piece number two is actually why I think the space is transitioning from user feedback. So let's use a DeepSeek example. So when DeepSeek released their first model, which was at least at the time from the open rate models was a state of the art. Like, for example, for R1, it did not use the explicit user queries, right? Yeah, we're talking about this transition from user feedback to verifiable results. It's a combination of data labeling, like, indeed human, but you actually want a very specific supervision and you want to control kind of what feedback you get. So human labeling is very, I mean, its own space, right, where there's a lot of know-how how to do it properly. And again, we've been running that for years. So there is synthetic data. There is kind of this indeed verifiable math, physics, logic kind of coding, et cetera, which clearly improve reasoning as well. So there's like a lot of the, I would say, you know, even if we're talking about like shifting the vibe of the model, which I think something that usually credited to, you know, like Claude versus OpenAI, right? like the vibe is different and kind of how even that is you probably want like more trained people to actually give feedback versus just relying on kind of very, very noisy signal that comes from users. Now, I mean, again, this is like, I would say it's not fully transitioned into this and depends on the use cases. So it's important to note. But as I said, like the cost versus reward is shifting and and like it's shifting i think faster than uh at least some people realize it interesting and would you say that that is because you know we're just learning how to manipulate you know the vibe or or output characteristics of a model based on more kind of curated uh you know, training data or feedback, or are there techniques that are enabling this shift? Like, what do you see as the driver of this shift? I mean, I think it's combination. I mean, again, even the original chat GPT, the GPT 3.5, it was like on the human label, but not on the actual user feedback, right? Sure. Yeah. So sure, there's like this transition from RLHF to RFT and like the verifiable stuff and all that. Like, but it sounds like you're, it sounds like you're speaking just like a broader trend. Let's, let's go back to like Google, right? Of Facebook, like Google and Facebook learn from user behavior, like directly, right? There's no, nobody's human labeling like, Hey, which search result? I mean, there's like a little bit of that, but like in mass, it's mostly just signal from user clicks. right and then at large scale it's been processed into like actual signal for the machine learning i think what llams did is kind of transition that to like hey we actually just pass a lot of unsupervised data right not not like user click data and then we add a little bit of a human like a very specific human labeled data and for that we also need like the better the foundational model, almost like the more complex things we want people to label. And so kind of like, again, for our example, we were finding computer science students because we needed people who code. And so just kind of the, you know, some maybe broader data wouldn't be that much that useful. 
So part of it is we've collected enough data and we've kind of baked the generic stuff into the foundation models so now where the innovation is happening is um you know bringing in more subject matter expertise or specialized skills or indeed like a verifiable thing or combining like you know synthesizing data and then using another model to evaluate it and kind of then human labeling like all those kind of pipelines right yeah yeah and again it depends like for for some things like i mean audio for example same thing right it's like it's great to have a bunch of audio from people that use your product but then again like you may get in trouble so much right we've seen that happening it's better to just pay people to contribute their audio and like sign off the rights and it's and it's like it's kind of pretty straightforward to do that like we have a project on near as well running that uh and so like it like the amount of yeah like I'm kind of that's what I mean like the the shift like how much we can collect data and how how useful that is versus getting a bunch of user data and then dealing with all the repercussions of that uh so I had asked about the you know this like creating a data flywheel and the importance of that for you know companies in the space and your response is like kind of to reinforce the liability aspect of that data and then talk about this broader shift that's happening to more specialized data creation slash collection. I think, yeah, I'm not sure that I'm fully sold that that flywheel thing is not important. But if you don't have anything else to add on that, we can move on. I mean, the other piece we do want is opt-in users can contribute their data. So we do, if you want to contribute, you should get something in result, right? It can be economic, it can be credits, it can be something. And the underlying blockchain has a mechanism to make that tenable in a way that it's not tenable today. Yeah. And so one of the models, for example, that like a new business model that we have been building is right now, I don't know if you saw this post by Dario, where he said like, hey, every model we've built was a successful thing, but we're spending more money. They're each their own businesses. Yeah. So we actually do, I mean, we talked about this like last year, like where effectively every model gets its own token. So like a way to distribute reward and value from the revenue while also rewarding with this token. I mean, effectively you can think of shares where you can actually like whoever contributed data gets a token of this model that they were trained on their data. And then the revenue is distributed to these token holders. Right. as a, like for the model's lifetime. Right. So we can actually like run and guarantee those parameters as well. 
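
To make the token mechanics just described a bit more concrete, here is a hedged, back-of-the-envelope Python sketch of the pro-rata split Illia outlines: revenue earned by a model is distributed to holders of that model's token in proportion to their holdings. Everything here, from the function name to the sample numbers, is illustrative; in practice this logic would live in a smart contract on the NEAR blockchain rather than in application code.

```python
def distribute_model_revenue(revenue: float, token_holdings: dict[str, float]) -> dict[str, float]:
    """Split a model's revenue pro rata across its token holders.

    `token_holdings` maps a holder (data contributor, model developer, etc.)
    to the number of model tokens they hold. Purely illustrative.
    """
    total_tokens = sum(token_holdings.values())
    if total_tokens == 0:
        return {holder: 0.0 for holder in token_holdings}
    return {
        holder: revenue * tokens / total_tokens
        for holder, tokens in token_holdings.items()
    }

# Example: $10,000 of inference revenue attributed to one model over some period.
holdings = {"model_developer": 500_000, "data_contributor_a": 300_000, "data_contributor_b": 200_000}
payouts = distribute_model_revenue(10_000.0, holdings)
# -> {'model_developer': 5000.0, 'data_contributor_a': 3000.0, 'data_contributor_b': 2000.0}
```

The design point is that data contributors hold a share of the specific model their data helped train, so compensation tracks that model's usage for its lifetime rather than being a one-time payment.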
It was, so what's interesting about that idea is that, you know, when you talk about this idea of the, kind of the broader conversation around like compensating rights holders for content that's consumed like i feel like it it you know gets to be like an antenna like how would you ever do that like you know you're crawling all of the internet like how would you even possibly begin to do that but just you describing this token model it's kind of like oh well you maybe you could do that like you're crawling a site you know that site you know you reserve a token for that site someone needs to verify that they have control over that site to access the token now they have a share of the model and they can, you know, gain in the rewards. And all of a sudden, at least for me, it like clears up a lot of the just like it's not possible feeling about it. Yeah, exactly. And so that's a really like that example, as well as you can contribute data privately. So you can say, hey, I want this data to be used in the model training, but I don't want anyone to see it. Right. For example. Right. And so you can also do that. Or you can say hey I want it to be used at inference time as part of the search retrieval index but not a training So you contribute like it payable Like, for example, for payable data, New York Times can contribute their data fully privately into secure enclaves. And then it's going to be used at retrieval time. It's going to get recorded and they're going to get paid for that. So things like that, you just get a lot of these pieces kind of for free in this new model. So earlier in the conversation, we talked a little bit about closed models versus open models. And did you anticipate, you know, as, you know, ChetGPT happened and kind of the private foundation models began to establish themselves, themselves like did you anticipate that they would be like fast followers of like these open models or has it surprised you how quickly uh open weights and you know to lesser degree open models have um come about and their capability no i was actually i think i was trying to remember it was like february 23 i was talking about like hey open source is gonna catch up um because yeah i mean i I think the challenges with pure open source right now, and again, this is something we're solving, is that I built a model, I released it. Everybody's like, cool, here's the stars on GitHub, stars on Hugging Face, but then you don't make any money, right? And also, because of that, it also becomes less about open source and about open weights, and people keep the source so they can kind of continue doing things. And so in result, we're actually wasting a lot of resources because everybody kind of redoing experiments because we actually don't know what were the things people did to get to these results and so everybody kind of need to reproduce or or poach the people uh who've done it and so kind of the way we think about it is to reverse it where you can actually uh have an open process of training right so the training i mean the data either fully open or this like kind of encrypted data right that you can run over so available but not necessarily transparent yeah yeah uh and you need to pay to access it in whatever your your model token or some other way uh if you if you plan to monetize differently and then the resulting weights are actually also encrypted and and only run in this kind of DCML model, so you can actually monetize it, right? So you can receive kind of revenue from using it. 
You can say, hey, you know, compute cost is X. I want, you know, 20 cents for each million token over that to go to the model developers and all the contributors to do that. So you kind of can reverse that and get actually like actual open research and collaboration happening there while monetizing the outcomes. And the benefit is you still get all the properties of open source, right? Everybody can use it. There's no way to stop using it. You can even run it on your hardware if you have the modern Blackwell or Hopper. You need to set it up in this confidential mode. And yeah, you can fine tune it. You can do all those things on top. Yeah, I mean, a missing property is the ability to see it and change it, maybe. you know, maybe that's more true for software than for a model. If you have the ability to fine tune on top of it, that's a way that you can change it. There's very little people who go and like do a brain surgery on a model. I mean, there's few that like do, you know, like evolutionary algorithms and other stuff. But yeah, usually it's either fine tune, post train, RL, etc. You know, like the pushback that I would offer is that like, if the best models were open, you know, maybe we'd see a lot more brain surgery and maybe we'd have a more interesting kind of ecosystem of, of results. Like I think, you know, there are people that do that kind of thing for various reasons with, you know, open models, but I think there's a lot less invested in them because they're not as good as the closed models. I don't know. Interesting. You know, I'm curious, like, you know, there are definitely some aspects here that make a lot of sense to me. Historically, you know, there's always been this big barrier that's not at all technical. And that is, will people pay for privacy? And whether that's, you know, currency or, you know, the sheer force of will that's required to jump over the hurdles to achieve it. you know, inconvenience, you know, do you feel like, you know, this is different or it's different in this space or like, how do you think about that challenge? I think twofold. One is, I think there is a audience that will pay for privacy, right? And it's not... I don't think the issue is that there's never that audience is that it's relatively small. It is relatively small, But I think the idea here and kind of, I think everybody's on a threshold, as I said, of like what they feel they would give to the model. And the more you give, the better, like actually we're getting to a stage where models are generally, like they're sufficiently intelligent. And actually it's all becomes about context, like about context management, about tool management, about all those pieces. And so the idea here that kind of we are very much going after is that because it's private, you can actually share a lot more with it. And you will add your email, your calendar, your medical data, your financial data, your crypto wallet, et cetera. And so it's able to manage your whole life, not just like some aspects that you were willing to share. and so and kind of the first cohort that actually cares about privacy that's you know that is our early adopters who we kind of target to really enable this but then kind of as this becomes mature now it's it's appealing to more people because it's a better product and kind of smarter product more intelligent product again not because the model is right away more intelligent but because but because you have more context. 
The other side of this is actually because of this open research process, what we are aiming for is to have people, again, fine-tuning specialized models for specialized use cases, right? And again, this is where, because they have a monetization embedded into this, right, they can actually, you know, invest effort and time and compute to actually build interesting specialized models that may be better at financial use cases, healthcare, et cetera. And again, all of them are available on your platform. They can compute over your data. And then again, now that all your data is there, it's useful, there's useful models. Other developers as well, again, if I'm building a note taker or something else, I can build a note taker right now that takes all your data, listens to it all the time, sends it to my server, stores it on my server, et cetera, but liability. And also, now it's a hurdle for everybody else to adopt it. Or you can just say, actually, I'm going to build my app and deploy it into this cloud where it runs on your side and saves context there as well in your data store. And so now your data store becomes even more useful because it has all the notes as well there and your AI can now read over those notes. So you don't need to, again, merge those things with Zapier and do all those things. That's the idea. You have network effects of more context, more data, more applications building around the user. And so, yes, it starts with early adopters who care about privacy and kind of layers on as more and more applications and things become available on this cloud. So I want to talk a little bit about the process of making models available to this environment. Like, to what degree is it a simple lossless transformation of, you know, an existing like SafeTensor, GGF file, whatever, some weights file versus like, am I having to rebuild my model in some new paradigm? How does that work? Yeah, I mean, we've actually run, you know, VLLM and customized version of VLLM. So everything that, you know, normally served already works. And if you need something custom, then you can also package your own Docker. I mean, that is less secure from a user side, but it's also available. It's the secure enclave that ensures that there's no kind of man in the middle attack between VLLM and like the model weights and the customer data. Yeah, so exactly. So what's happening is, you know, you checkpoint your, you know, model weights on chain. So, you know, the hash of the model weights and the encrypted hash as well, you know, the encrypted data is uploaded kind of to decentralized storage. And now when somebody wants to run a model, they have an encrypted TLS connection directly into the secure enclave. That secure enclave gives you back the effectively signed certificates that it runs in secure enclave. You can also verify them with our on-chain kind of key management system. And then we also have this concept called multi-party computation. 
so near blockchain itself kind of uh right now part of our nodes form this multi-party computation network which allows uh inside the secure enclave effectively have its own private key to decrypt things and so that that's kind of one of like how all those pieces work together there's like secure enclaves but also like if you encrypt something right it needs to have like you need to encrypt it with some key that is only known like the private key of this is only known inside secure enclave and nowhere else and so this is where that there's a kind of this npc network enables that and so yeah like effectively you know you encrypt locally you upload it it checkpoints and now when user calls they know that like effectively the secure enclave will respond that this model hash was run on your data here is like signature by nvidia intel etc And you can also verify kind of certificate provisioning. And is that signature created at like, you know, by some process at the boundary or is it, you know, intrinsic to the inference actually happening by this model on this data? like is it a so the signature certifies the docker container that runs inside secure enclave and so docker container is our docker container of is vllm that runs this model hash so we attach that that's what i'm saying if somebody builds custom docker container you can do that but then user needs to trust your docker container got it so the the trust boundary is the container and if the container is doing what you say it's doing then the signature certifies that that was a container that was actually used. Yeah, and in our UI, we have effectively like, you know, like a green shield that you can click, similar like HTTPS works. You go there and it gives you like, it gives you like, hey, it's all correct. And then you can go and actually verify all the signatures and all the certificates and even which GPU it ran on and like other stuff as well. And like links you like effectively to all the relevant Docker Githubs and other things you need to know if you want to like re-verify everything yourself. And so that's maybe an interesting segue into like, you know, where you are with all of this in terms of the, you know, how much of it is, you know, aspirational, how much of it is built. Like you're, you know, clearly you have at least the notion of a user interface, if not an actual user interface. Like how far along are you? Yeah. So we have, I mean, we have a product that we can, I mean, that we're in testing and alpha testing with kind of cohort of users. It's both developer products, so you can buy credits and effectively use confidential inference in your own applications, as well as we have a kind of consumer product, which is private chat GPT effectively, which indeed provides you all of the kind of certification and verification information if you want while using it. And then the custom model right now is that's in development, can be coming out in a few weeks. And remind me, which part is the custom model? So this is where you can encrypt and upload your own model. Oh, got it. Okay. Yeah. And then, I mean, the fine tuning and kind of training that's coming a bit later. And we started off talking about the fact that encryption is computationally complex, like it brings along its own costs relative to the per token inference costs that someone might see, whether it's OpenAI or OpenRouter or something. Where do you expect this to fall, relatively speaking? So it will be affected at the same cost. 
The overhead on the computation side is 1% to 5%. Yeah, so it's very minimal. And it's mostly just, I mean, it's like encryption, decryption on a boundary. Yeah. And kind of constrained by that, not by computation. And so what do you see as like the main barriers to, you know, near scaling, you know, this approach and getting people onboarded, not just technically, but ideologically and that kind of thing. Yeah, I mean, I think the kind of, as we just discussed, right, like are people willing to pay for it? Yeah. And again, I think... Well, I think what I heard you say is that I'm not really paying, like I'm paying, it's going to cost the same. Right. So actually it's open source, it's open source model. So it's actually cheaper than open AI. Yeah. So, and I did reference the idea that the cost is also convenient to learning a new thing. Like maybe, maybe we should, you know, hit pause on the adoption conversation and go back to, you know, we talked about from a model provider perspective, you know, that's, you know, the same. They just deploy into your container the same model format. What about from an end user perspective? I guess in the general case, they're just using a chat app or whatever. They're using an app, so it's not different from them. So presumably, who is the argument that there's no particular inconvenience cost to anyone in this ecosystem? Everything's kind of the same? or yeah i mean the goal is to make it everything is like either the same or better right like you either don i mean it looks exactly i mean very similar experience right and the idea that because it kind of private you can also have additional features that you wouldn have in uh in the public it also it is open source so you can you know people can contribute you can fork etc right uh so you know you cannot just go and fork chat gpt and add some stuff and have your own version with some things but here you will be able to do that because it's your data right that travels with you so you can like launch your own version with custom improvements. And then everybody who logs in will get all their messages, all their history, all their memory, all the apps with them. So it's also like kind of detaching your identity from specific application. But what do you say to the skeptic that says, you know, it all sounds too good to be true. Like where's the, you know, besides from the fact that, you know, building software is hard, building a company is hard, getting people to fund weird things is hard. Like, what's the hard part? I mean, the hard part right now is it's inertia right now. It's, I mean, I think like there's a cohort of people who are like, hey, I'm already in Google ecosystem. Why would I do anything? Right. Everything is already here. Google has my email and my calendar anyway. Why do I care if it's private? Exactly. Yeah. So I think like there's, again, there's aspect of that. And like, I think people generally trust Google with their data. um so i think that is like that's inertia that we kind of need to address right and i mean not to i mean i work with google so there's indeed there's indeed a lot of security to make sure the data is protected but uh obviously there's still like i mean there is a way for somebody in customer support to help you with your data so there's a way for a third party to have access to your data. There is, I mean, we've seen this with OpenAI, right? 
There's news that they're effectively scanning all the chat logs and then the ones that are flagged are sent to human to evaluation and then to police, right? So you effectively have potentially humans looking at your chat logs. We had obviously data leaks from Grok and others that like, you know, your chat logs got visible and indexed. So I think that is a backdrop of why we get adoption, but the inertia is the other side. OpenAI already has whatever, half a billion users or a billion users. So we need to have a product that indeed can deliver on if people want to switch and use. How about latency? Admittedly, like encryption is not the latency killer that it was, you know, 10 years ago or so. As a lot of that stuff's getting pushed in the hardware, like, is it an issue for you? Not really. I mean, we have like, it's a little bit higher latency, again, just because like when it goes through the boundaries, there's like some additional delay on kind of encryption. But we're also working on like, it's also engineering challenge of just like streaming encryption and stuff like this, like improving that. I mean, we're using TLS right now, right? There's no additional latency. Like, this connection isn't encrypted, although we're actually going through centralized server. We're actually trying to, you know, figure out how to make it, like, as direct as possible. So, like, ideally, actually, latency is less because ideally right now, yeah, you actually should be accessing the closest GPU, ideally, in your city. There's, like, I mean, I've talked to people who, like, you know, there can be data centers everywhere. people build like mobile data centers, et cetera. And like, you want to find that one, connect to it, run your compute on it, you know, hydrate your data there. Like that's kind of where, you know, this infrastructure can move to and I really deliver on that. Like we're not there, but that's kind of the vision is really actually reducing latency because, you know, you're sitting in Philippines, right now you need to go to Texas or whatever where OpenAI servers are versus like there's data center in Philippines actually are sitting underutilized. And so you should be using it. I guess another question that I have is that I think when you boil it down, a lot of the value proposition here is around trust and the user being able to trust the interactions they're having with AI. and privacy is a part of that, but there are also still fundamental trustworthiness issues with your creation, the transformer, and its ability to give you results that are worthy of your trust. You know, hallucination, for example. Are you doing anything there? Like, do you see that as, you know, how do you think about that as an issue? and what are your thoughts about if or how that gets solved? Yeah, I mean, that is an important question. And indeed, especially as we, like I kind of mentioned, I think the AI will be how we interface with computing. And you ideally, yeah, want to make sure that there is no kind of biases in that that are not represented with your view. So I think there's like few components improving trust in these models. I think it starts with indeed the open process of how these models were trained. Because there's this concept of like sleeper agents, right? You can actually like train things into the model so that at some point during some conditions, it activates and behaves in a different way than normal. 
And so you can introduce vulnerabilities in the code into a coding model based on some condition. I'm assuming if somebody wants to do a new StacksNet, that's how they're going to do it. So you want to know how it was trained. Then right now, you're running... Even if it's open source model... Isn't that first point an argument for true open source as opposed to just an certificate or a stamp of something? So that's what I mean. You want open source, but you don't need open weights. So you want to know what went in, but the weights itself can be encrypted so you can monetize. But yeah, so that's what I'm saying. We need to open source, not like open weights is right now and just kind of everybody's like, I mean, effectively it's useful, but it's mostly like as if, you know. So was your argument earlier that if you can close and encrypt the weights, then you have greater, you anticipate greater willingness to open the source, like make the training process more transparent? Because you can monetize. Like in this encrypted weight model, you can monetize the usage of the model. Yeah, that's right. So your point was people are holding onto the source because they want to retain some monetizability and they're making the weights open. So that's their piece that they're holding back. So the opening of the weights is the marketing, right? for them to then leverage their closed sourcing to then cook something else, right? Either for specific customers or next model or whatever this is, or attract like to their app. But if you can monetize actually the model you trained and you have the whole process open, and especially if there is like some semi-formal way for people who leverage your learnings as well to then kind of contribute back as well. Yeah, but I mean, like that assumes that there's not a lot of perceived innovation in the training process. And I don't know that folks that are training, you know, frontier models in the, like, ahead of the, you know, at the frontier sense of the term would necessarily believe that, right? I mean, there can be, that's what I'm saying. But I think the lag on the frontier is three to six months. And so I think the benefit here is if you are able to effectively do a training run and start monetizing it, you can leverage that monetization to then potentially reinvest into the next thing, etc. But because you opened, everybody can go and contribute and maybe come up with new ideas, etc. So you're just kind of accelerating this process. Again, this is how the computer science and AI worked before. We would open it up. The papers was out as soon as possible. And now everybody's delaying everything at least six months to a year. So we're solving big picture trust. The first part is openness. Yeah. Well, yeah. Open research, open data, knowing what data goes in and what bias there. Second is verifiability of the inference. Again, right now, there's people complaining that Claude gets dumber in daytime. Maybe it doesn't, maybe it doesn't. We don't actually know. So having verifiability, again, this can be something where when you specifically ask what stocks to buy, there's a rule that says, actually, let me tell you to buy this. Only use SmartClaude. 
so like verifiability of that that that gives you another level of and then i agree that the the third part is actually like you know especially when we allow those models to go and start do actions like how do we make sure it doesn't do anything um and so so there i think there is few interesting areas i mean there's a lot of research right people are trying you know how to like ground hallucinations how to do all those things so like all of that needs to be done and And again, I think open research process would help a lot with that. But the other aspect of this we're looking at is actually formal verification. So right now, when we talk about software, when we talk about the CI systems, we're testing some use cases. We have evals, maybe we have vibe testing, but we don't really have guarantees that the system will comply to some requirements. And so formal verification is a way to actually achieve that. And formal verification should be happening at the invocation site. So as you call something, right? Like, hey, so the example is like a little bit closer to blockchain space. You know, if you're putting money into something, you want to make sure you can, you know, at least get as much money back, right? That nobody will be able to steal your money. For example, like savings account type thing. and so you want to have the guarantee that when you put in your money that you'll be able to do so you want the proof at that time so so that's kind of like we call it at at invocation uh verification similarly when you're calling a system you can say hey you know you can access my email but you cannot delete anything right you can and you know you cannot leak anything you cannot you cannot do this set of actions, right? You know, and so the system, like the T that runs it, the secure enclaves that runs it, like proves to you that the code that it runs inside indeed complies with this requirement. So this Docker hash, for example, you know, complies with this requirement. And so now you have not just proof of verifiability, but as a proof of specific, you know, preconditions. And if it fails, right, if it doesn't match your preconditions, right, this, This doesn't execute. And so today, what we talked about being possible is this idea that if you can get access to whatever the source code is that's going into the Docker container, you can certify that it's this thing that you know and trust that operated on your data. and what you're referring to is at some point in the future where you can not necessarily have to know what that thing was, but know some conditions about what it's able to do or how it's running. And the verification is not just like a cryptographic hash, but it's like verification and validation of the code and what it's doing. And I mentioned that for folks that are curious about this, just a couple of episodes ago, I had a really interesting conversation with Christian Sagetti about verification, you know, dug deep into all this stuff. He's one of the pioneers on that for sure. And so that was the third of your three things for kind of solving this broader trust issue. Yeah, I think that that's going to be a really important component of that to make sure that we can actually trust the systems. Again, as a user, you're probably not going to go and review Docker container code and ensure. And especially, it gets complicated when things are starting to call each other. Like when you have one AI system calling another AI system, calling some MCP tool, calling another AI system. 
And so the idea here is that the properties are actually composable, right? If this service proves to you that it will work in this specific way, then when it calls other services, those services need to prove it as well. And if a service can't, it's not able to be called. That, I think, is really important; it gives you this composable effect, which addresses what I think is right now the biggest problem. As you can imagine, the systems become more and more complex and you have all these AIs talking to each other, and we have no idea what they agree on doing and what they end up actually executing. So I think that's going to be a pretty fundamental system change, but it requires this verifiability, because you need something to guarantee that the code inside is following this property. So you need some containerization that you can trust. Very cool. Well, Illia, thanks so much for jumping on and talking us through what you've been working on. It sounds like super interesting stuff. I think we've covered this idea of privacy on the podcast to some degree in the past. We've talked quite a bit about differential privacy, and talked a little bit about the OpenMined, PySyft kind of decentralized training stuff. But this is definitely a different take, and one that I think is right in line with the direction that things have gone from a Transformers, gen AI perspective. So I'm super interested in seeing how it all unfolds. Yeah, I appreciate you having me here and diving in. Yeah, thanks so much. Thank you.
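
To ground the composability point from the last part of the conversation: the caller attaches machine-checkable constraints to an invocation (for example, "may read my email, may not delete anything"), the enclave proves that the code it runs satisfies them, and any downstream service that code calls must prove the same constraints or the call is refused. Below is a minimal, hypothetical Python sketch of that gating logic; the property names and classes are invented for illustration and are not Near AI's actual interface.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class InvocationPolicy:
    """Constraints a caller attaches to a request (illustrative vocabulary)."""
    required_properties: frozenset[str]   # e.g., {"no_delete", "no_data_exfiltration"}

@dataclass
class ServiceAttestation:
    """Properties a service has proven, e.g., via verified code running in an enclave."""
    name: str
    proven_properties: frozenset[str]
    downstream_calls: list["ServiceAttestation"] = field(default_factory=list)

def satisfies(policy: InvocationPolicy, service: ServiceAttestation) -> bool:
    """A call chain is allowed only if every service in it proves the caller's policy."""
    if not policy.required_properties <= service.proven_properties:
        return False
    # Composability: each downstream service must satisfy the same policy.
    return all(satisfies(policy, downstream) for downstream in service.downstream_calls)

# Example: an email agent that calls a summarization tool over MCP.
policy = InvocationPolicy(frozenset({"read_email_only", "no_delete"}))
summarizer = ServiceAttestation("summarizer-tool", frozenset({"read_email_only", "no_delete"}))
agent = ServiceAttestation("email-agent", frozenset({"read_email_only", "no_delete"}), [summarizer])
assert satisfies(policy, agent)  # the whole chain proves the required properties
```

The key design choice, as framed in the episode, is that the check happens at invocation time and is enforced by the infrastructure running the code, not by the calling application.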
