Machine Learning Street Talk

Pushing compute to the limits of physics

Monday, July 21, 2025 · 1h 23m
What You'll Learn

  • How Guillaume Verdon grew up fascinated by theories of everything and subatomic physics, wanting to understand the universe
  • Why he came to see the reductionist approach in physics as failing, and how that shifted him towards complexity and quantum information theory
  • His work on quantum neural networks and quantum computing, which eventually led him to Google to develop TensorFlow Quantum
  • Why he believes the way forward is to understand the universe as a complex system and to build physics-informed representations and models
  • The case for leveraging programmable, parametric complex systems to grapple with the complexity of natural systems

AI Summary

The guest, Guillaume Verdon, discusses his journey from pursuing theories of everything and trying to understand the universe as a child to eventually working on quantum computing and machine learning at Alphabet. He talks about the limitations of the reductionist approach in physics and the need to understand the universe as a complex system, which led him to explore quantum information theory and to build parametric quantum systems to model quantum complexity. He then joined Google to work on TensorFlow Quantum, a product focused on learning quantum mechanical representations of physical systems. The conversation covers his background, the evolution of his interests, and the shift towards a more complex, physics-informed approach to AI and computing.

Topics Discussed

Theories of everything, Quantum information theory, Complexity science, Quantum computing, Physics-informed AI and machine learning

Episode Description

Dr. Maxwell Ramstead grills Guillaume Verdon (AKA "Beff Jezos"), who's the founder of thermodynamic computing startup Extropic.

Guillaume shares his unique path – from dreaming about space travel as a kid to becoming a physicist, then working on quantum computing at Google, to developing a radically new form of computing hardware for machine learning. He explains how he hit roadblocks with traditional physics and computing, leading him to start his company – building "thermodynamic computers." These are based on a new design for super-efficient chips that use the natural chaos of electrons (think noise and heat) to power AI tasks, which promises to speed up AND lower the costs of modern probabilistic techniques like sampling. He is driven by the pursuit of building computers that work more like your brain, which (by the way) runs on a banana and a glass of water!

Guillaume talks about his alter ego, Beff Jezos, and the "Effective Accelerationism" (e/acc) movement that he initiated. Its objective is to speed up tech progress in order to "grow civilization" (as measured by energy use and innovation), rather than "slowing down out of fear". Guillaume argues we need to embrace variance, exploration, and optimism to avoid getting stuck or outpaced by competitors like China. He and Maxwell discuss big ideas like merging humans with AI, decentralizing intelligence, and why boundless growth (with smart constraints) is "key to humanity's future".

REFS:

1. John Archibald Wheeler - "It From Bit" Concept
00:04:45 - Foundational work proposing that physical reality emerges from information at the quantum level
https://cqi.inf.usi.ch/qic/wheeler.pdf

2. AdS/CFT Correspondence (Holographic Principle)
00:05:15 - Theoretical physics duality connecting quantum gravity in Anti-de Sitter space with conformal field theory
https://en.wikipedia.org/wiki/Holographic_principle

3. Renormalization Group Theory
00:06:15 - Mathematical framework for analyzing physical systems across different length scales
https://www.damtp.cam.ac.uk/user/dbs26/AQFT/Wilsonchap.pdf

4. Maxwell's Demon and Information Theory
00:21:15 - Thought experiment linking information processing to thermodynamics and entropy
https://plato.stanford.edu/entries/information-entropy/

5. Landauer's Principle
00:29:45 - Fundamental limit establishing the minimum energy required for information erasure
https://en.wikipedia.org/wiki/Landauer%27s_principle

6. Free Energy Principle and Active Inference
01:03:00 - Mathematical framework for understanding self-organizing systems and perception-action loops
https://www.nature.com/articles/nrn2787

7. Max Tegmark - Information Bottleneck Principle
01:07:00 - Connections between information theory and renormalization in machine learning
https://arxiv.org/abs/1907.07331

8. Fisher's Fundamental Theorem of Natural Selection
01:11:45 - Mathematical relationship between genetic variance and evolutionary fitness
https://en.wikipedia.org/wiki/Fisher%27s_fundamental_theorem_of_natural_selection

9. Tensor Networks in Quantum Systems
00:06:45 - Computational framework for simulating many-body quantum systems
https://arxiv.org/abs/1912.10049

10. Quantum Neural Networks
00:09:30 - Hybrid quantum-classical models for machine learning applications
https://en.wikipedia.org/wiki/Quantum_neural_network

11. Energy-Based Models (EBMs)
00:40:00 - Probabilistic framework for unsupervised learning based on energy functions
https://www.researchgate.net/publication/200744586_A_tutorial_on_energy-based_learning

12. Markov Chain Monte Carlo (MCMC)
00:20:00 - Sampling algorithm fundamental to modern AI and statistical physics
https://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo

13. Metropolis-Hastings Algorithm
00:23:00 - Core sampling method for probability distributions
https://arxiv.org/abs/1504.01896

***SPONSOR MESSAGE***

Google Gemini 2.5 Flash is a state-of-the-art language model in the Gemini app. Sign up at https://gemini.google.com

Full Transcript

That was the most technical podcast I've ever done. I think that was one of my favorite conversations for sure. Hey, I'm Guillaume Verdon. Growing up, really I was pursuing theories of everything; I wanted to understand the universe. I was a big fan of Feynman, Stephen Hawking growing up. And I was seven, I was like talking about subatomic particles. As all seven-year-olds do, right? I guess I got swept up in the school of thought that was, you know, a generalization of Wheeler's It From Bit, which was the It From Qubit program seeking to unify theoretical physics through quantum information theory. So viewing everything in the universe as one big quantum computer running a certain program or self-simulation. We have proof of existence of a really kick-ass AI supercomputer that we're both using right now to talk to each other. It's our brains. That, I would argue, is a thermodynamic computer, right? Because there's master equations describing the chemical reaction networks in your brain of neurotransmitters hopping around. And so, you know, if we're doing very similar physics, but with electrons sloshing around a circuit, then, you know, there's a very much stronger chance that we could run a very similar program in a very similar fashion, arguably with even more energy efficient components because electrons are much lighter than big neurotransmitters. This podcast is supported by Google. Hi folks, Paige Bailey here from the Google DeepMind DevRel team. For our developers out there, we know there's a constant trade-off between model intelligence, speed, and cost. Gemini 2.5 Flash aims right at that challenge. It's got the speed you expect from Flash, but with upgraded reasoning power. And crucially, we've added controls, like setting thinking budgets, so you can decide how much reasoning to apply, optimizing for latency and costs. So try out Gemini 2.5 Flash at aistudio.google.com and let us know what you built. Hey, I'm Guillaume Verdon. I'm the founder of Extropic, a company pioneering thermodynamic computing, a new form of computing for probabilistic inference using exotic stochastic physics of electrons. Formerly, I was working on quantum computing and machine learning at Alphabet. And I also happen to be the founder of a philosophical movement called Effective Accelerationism under the pseudonym Beff Jezos online. Happy to be here at MLST. Hello, everyone. Welcome to Machine Learning Street Talk. I'm your host for today, Maxwell Ramstead. Very excited to be here. I'm subbing in for Tim Scarfe, who unfortunately is down with a little bit of COVID. But I'm sure we'll have a really interesting conversation. It promises to be a super interesting conversation because we have a really awesome guest for today. Very excited to have Guillaume "Gil" Verdon, also known as Beff Jezos, on X and just generally online in the meme space. Very excited to have you, Gil. How are you doing? Yeah, thanks for having me. Thanks to MLST for hosting. And it's great to talk to you again, Maxwell. I think this collaboration, at least for this conversation, has been a long time coming, so it's a great platform to do it. And yeah, let's get to it. So you're a very interesting character, Gil. You're a fascinating researcher. I think you're a big presence in the meme space as well. Do you want to tell us a little bit about yourself? Tell the audience a little bit about your trajectory.
How did you wind up being the CEO of a thermodynamic hardware company and also basically like a well-known person in the meme space as well? Yeah, I mean, it's been a long journey. I guess I've lived many lives. Growing up, really, I was pursuing theories of everything I want to understand the universe, as one does. and eventually leverage that knowledge to expand civilization to the stars. Originally, my plan was to become a theoretical physicist and work on quantum gravity and figure out some exotic form of propulsion that would obviously, I thought that was the bottleneck for the expansion of civilization was the speed of propulsion. And so I went down that path, was a big fan of Feynman, Stephen Hawking growing up and and of so this is a like a childhood project like since you were a wee boy yeah you've dreamed of yeah no I was seven I was like you know talking about subatomic particles and I was all seven year old yeah and I want to be a physicist and I want to be I would I used to call it astrophysicist I didn't know what theoretical physicist was but I want to be an astrophysicist. I want to work on, you know, FTL travel and stuff like that. And over time, you know, I did the career path for theoretical physics. You know, I did math and physics in undergrad at McGill. And then I went to Waterloo, the perimeter institute for theoretical physics, you know, met some of the greatest minds on earth, you know, they're really strong caliber. But what I realized going through theoretical physics was that the reductionist approach to physics was failing us. Right. You know, originally I was like, okay, well, just for the benefit of our audience, what do you mean by the reduction? Yeah. Yeah. Yeah. And what do you think were the problems? Yeah. And, and, and I guess, you know, these two sides of my life now came from my, my reaction to realizing this failure and the reductionist approach is kind of the traditional way we've done physics. It's kind of the very sort of rational old school way to do things, which is like, oh, I want to have a model. Maybe it's an equation. It's some analytic model. It has a few parameters. And I've reduced all of physics to a very simple model with few parameters. And ideally, the least amount of parameters possible, you know, Occam's razor principle. And, you know, these equations with these few parameters allow me to have predictive power over the world. And with this predictive power, I can steer the world. I can do things, right? I can predict and control it. And that's the goal of physics is to have better models of the world. And what we realized was that, you know, I was part of this, I guess I got swept up in the school of thought that was, you know, a generalization of Wheeler's It From Bit, which was the It From Qubit program, which, you know, is seeking to unify theoretical physics through quantum information theory, right? So viewing everything in the universe as one big quantum computer running a certain program or self-simulation. And that framework's actually a really useful kind of unifying framework to understand all sorts of systems. And people were studying sort of systems that, you know, have some connections between quantum gravity and regular quantum mechanics in the context of holography. So, you know, ADS, CFT for those that are familiar. But really what I realized there was that even whether, you know, whether or not ADS-CFT was going to work, it was clear that the complexism approach to physics was the way forward. 
The universe is very complex. You know, even beyond just quantum gravity, just looking at condensed matter systems, you can have equations describing the microscopics, but you can't predict the emergent properties. You actually have to apply an amount of computation that is similar to that of the universe to have a prediction of what happens at a larger scale. Not all equations you can renormalize analytically. Renormalize means get an effective physics at a larger scale or a more coarse grain scale. That's how we go from different types of physics, from quantum field theory to quantum theory to statistical mechanics, and then eventually classical Newtonian mechanics, and then even beyond that we get to general relativity and so on. And so clearly it was the way forward was to understand everything as a complex system. But, you know, that was a sort of ego death because in a way, you know, the human can't necessarily be the hero of the story. The model is no longer interpretable. Right. If you have something like a deep learning system, back then we were looking at tensor networks, which are a different sort of parametric complex system that we use to model, let's say, condensed matter systems or quantum gravity systems. If you look at these systems, they're no longer interpretable, right? I could have these big networks and they're just as opaque as the system that I was trying to predict. But with numerics and with a lot of compute, they actually, you're kind of, you know, not deferring agency, but you're leveraging a programmable parametric complex system to grok a complex system of nature for you. and in a way it was like okay well maybe maybe i can't be the hero of the story i won't be the one to figure out you know the grand unifying theory of physics but maybe i could build a computer or computer software that that can understand the universe or chunks of the universe for us and actually you know the more i dug into it it was clear that this was going to be the way forward right and um so built basically building a digital brain to overcome the limitations of our fleshy brains. Yeah, but even that's correct. But then, you know, at the time we were studying, you know, I came at it trying to understand quantum mechanical systems. So you had systems that have quantum complexity. And there you can show that actually classical representations will struggle to capture quantum correlations. So already, you know, I came at it from a non- anthropomorphic form of intelligence. You know, to me, intelligence was just, I'm trying to compress, learn compressed representations of systems through parametrized distributions, or in my case, it was parametrized wave functions and density matrices, right, which is the more general form. So that actually got me into a field, you know, initially, it was mostly numerics with tensor networks, but there was another institute I was part of at the University of Waterloo, which was the Institute for Quantum Computing, right? And there is very interesting because we had these controllable quantum systems. We had these control parameters, right? And it became clear to me that there was maybe a way to run these representations we're trying to learn of quantum systems that are parametric on a parametric programmable quantum system, right? And so now we could fight fire with fire, right? we can have parametric quantum complexity that's tunable to understand quantum complexity of our world. Right. 
And that's, that, that was actually my entry into artificial intelligence. And, um, you know, I, I wrote up some of the first algorithms for, uh, quantum neural networks. So that's what we called them. They have nothing to do with actual, uh, neurons. Uh, they're just parameterized quantum programs, but essentially, you know, one of us was one of the first quantum computer programmers, was the first user, Rigetti, and so on, which was one of the first startups. But essentially, that got me on the radar of Google, who then approached me and a team I built at Waterloo, an open source team, to go work at Google and build a product known as TensorFlow Quantum, which was a product focused on creating software that allows us to learn quantum mechanical representations of quantum mechanical systems in our world. And that was my entry into AI, which, you know, given that this is a technical podcast, I'm kind of, you know, going a bit deeper than I usually do, which is really nice in the backstory here. But over time, what I realized was that actually this sort of, you know, physics-based approach, right? Like if we have, if we're trying to understand the physical world with representations, we want to have, you know, physics-informed representations or physics-inspired representations, you know, just like to understand a quantum mechanical system, I use a quantum neural network. And then the dual to that is if I want to run and learn and train these representations that are physics inspired or physics based, then I need a physics based accelerator, right? That where that type of program kind of fits natively and can be executed as physics. And, you know, in my case, initially, it was I want to understand quantum mechanical systems. I want to learn quantum physics based programs and I run them on quantum physics-based processors, right? And so... Then I have a clarification question just for the benefit of the audience, because we have a very technical audience, but let's just make sure everyone follows. So what do you mean precisely by physics-based computing? So the question I ask, because it might confuse some of our listeners, in some sense, all computing is physics-based, right? Like it's all shuffling around, and electrons shuffling around in circuits and so on. What do you mean specifically by physics-based computing, and how would that differ from the kind of commonsensical or classical notion? Yeah, I mean, a quantum computer is a computer that leverages quantum mechanical evolutions as a resource, I would say. There's more formal definitions of what a quantum computer is, But a physics-based computer, I would say, of course, like every physical thing is embedded in the physical universe. So everything is physics-based. But it's true that it's kind of a continuum, right? Because you can have physics-inspired computers that are doing digital emulations of physics-inspired algorithms. So I would say that's physics-inspired. I would say a physics-based computer, at least for quantum mechanical computers, is one where you have a sort of parametrized Schrodinger evolution that you control. And you can show that it has some unitarity. And then in our case, and we'll get to that, you know, we're building physics-based stochastic computers or stochastic thermodynamic computers. So they're parametrized stochastic evolutions that we control. but in principle you could try to emulate a quantum mechanical computer or emulate a stochastic computer digitally. 
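On the remark just above that quantum neural networks "are just parameterized quantum programs": the sketch below is a deliberately minimal stand-in for that idea, written in plain NumPy rather than the TensorFlow Quantum API, simulating a single qubit with one trainable rotation angle and a measured expectation value that a classical optimizer tunes. It illustrates the shape of the concept only; the angle, observable, and learning rate are all arbitrary choices for the example.

```python
# Toy illustration (not TensorFlow Quantum's API): a "quantum neural network"
# in the sense used here is a parameterized quantum program. We simulate a
# single qubit, rotate it by a trainable angle theta, and measure an
# expectation value that a classical optimizer tunes.
import numpy as np

def ry(theta):
    """Single-qubit rotation about the Y axis."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

Z = np.array([[1, 0], [0, -1]])          # observable to measure
ket0 = np.array([1.0, 0.0])              # initial state |0>

def expectation(theta):
    psi = ry(theta) @ ket0               # apply the parameterized gate
    return float(psi.conj() @ Z @ psi)   # <psi|Z|psi>

# Crude finite-difference gradient descent on the single parameter, the
# "training" loop of a parameterized quantum program (target: <Z> = -1).
theta, lr = 0.1, 0.4
for _ in range(50):
    grad = (expectation(theta + 1e-4) - expectation(theta - 1e-4)) / 2e-4
    theta -= lr * grad
print(theta, expectation(theta))         # theta -> pi, <Z> -> -1
```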
Of course in the case of a quantum mechanical computer there's proven separations of complexity there and there's experiments showing that you can't emulate them at scale. In our case with stochastic computers there's no complexity class separation it's just kind of orders of magnitude constant speed up or energy efficiency gain which practically makes them intractable to emulate at scale. Not impossible, though, of course. But yeah, so this journey, and feel free, I think you have a better definition of physics-based computers. I don't know if you want to go into it. No, it's consistent with what you were saying. I mean, at a very high level, I would say that a physics-based computer is essentially a computer whose components exploit, you know, the actual physical properties of the components in order to, you know, perform computations more efficiently. So rather than like, you know, use digital computation to simulate what, for example, is like a quantum phenomenon or a thermodynamic phenomenon, you actually use, you know, the probability densities that these systems embody, for example, at equilibrium and, you know, exploit those properties in computation. Would you agree with that as a broad definition? I mean, I know I'm not doing this technical. That's for a thermodynamic computer, right? But yeah, so there's different kind of physics and different kind of corresponding physics-based computers. Arguably, you can have one, you know, you can have probabilistic or quantum botonic computers. Of course, you can have, you know, you can have physics-based computers in all sorts of substrates. Technically, you could probably train a neural net from puddles of water, right, if you wanted. And that would still be a physics-based computer. So I think, you know, it's kind of a very broad community. I would say physics-based computing, if we include quantum computing and alternative computing, and there's all these archipelagos that don't necessarily talk to each other. And I think there could be, you know, more work in trying to unify the community because there's a lot of tools that could cross-pollinate between these different substrates. But to get back to our, I guess, our main line here, which is, you know, how did I end up going from quantum to thermo? Well, you know, I think going from being one of the first programmers of AI on quantum computers and seeing the field kind of evolve over several years. I mean, I was almost eight years in quantum computing. It wasn't progressing as fast as I wanted. And I was seeing a sort of writing on the wall where actually to have a quantum mechanical computer, you try to keep the computer at perfect zero temperature, right? Zero entropy. So you're constantly pumping out entropy that is seeping into the system through noise, right? And that's quantum error correction and fault tolerance. And it turns out that most of your computation, most of your energy is going to be sunk into that pumping. So it's like an algorithmic form of refrigeration. Um, and to me, it's, it's like, okay, well, you know, it's just like, um, you know, if you have a fridge and you, you're trying to keep your freezer really cold, it's going to use up a lot of energy if it, if the rest of the room is at room temperature, because that gradient of temperature is, is, is, is very difficult to, to maintain. And so it's like, okay, what if we had a hotter physics-based computer? Maybe it'd be much easier to maintain, you know, just the first thought, right? 
And so what would that look like? Ah, well, you know, in the limit where, you know, you let the noise seep in, things become mostly stochastic in the quantum, the quantumness kind of fades out. And we know this from sort of renalization of physics. That's why we don't have to care so much about quantum physics day to day, because it gets washed out at larger scales, right? And things become... is dwarfed by the scale of the components of the system that you're considering. So you can basically ignore the noise for the most part, assuming that you cool the system appropriately. And in quantum computing, it's almost the other way around. What you want to do is basically to engineer the scale of the fluctuations that are so big, basically, that you can kind of ignore them relative to the component size. then and this is just a you know uh an attempt at framing what you're doing uh feel free to disagree with this and then in the in the thermodynamic regime what you have is fluctuations that kind of coexist at the same scale as the components that that the yeah the hardware is made out of yeah i mean you know uh we're in wimbledon weekend right now uh in london and you You know, you don't have to understand the vibrations of the molecules at the molecular level in the tennis ball to be able to predict its trajectory. Right. You kind of get a mean field sort of prediction. Right. So that's that's that's Newtonian. At the quantum scale, it's it's more subtle because it's no longer probabilistic fluctuations. And, you know, you're essentially it becomes all purely quantum superpositions. Right. And the ideal quantum computer has no probabilistic uncertainty. And in fact, when there is probabilistic uncertainty, you know, that's when the quantum computer loses its quantum coherence. It becomes non right But we have a very similar problem with classical computers that i realized which was you know we use all this power to be um again to to have signal that so strong relative to to the jitter of electrons and the amplitude of that jitter um in order to maintain determinism right we want our transistors to be absolutely on or absolutely off because you know we have a lot of transistors and we want to the computer to be in a deterministic state that we have control over, right? Because, you know, humans like to be in control, right? When we're not in control, there's anxiety, right? But, you know, I think we got to learn to let go, you know, just like I learned to, in a way, let go of control mentally by giving up the reductionist approach, which is like the human interpretable approach to physics. We have to sort of, you know, just like we did with deep learning, we let go of software 1.0 imperative programming and let gradient descent be a better programmer than you. And that was very humbling for some. Some are still trying to hang on for control, right? With ML interpretability research and all this AI safety research and still want to feel like they're in control or that they understand the complex system instead of kind of letting it do its thing and letting it figure out what's best, right? And that's also kind of my main gripe with the whole AI safetyist field. Of course, there should be some research in interpretability, but I just don't think they're going to go that far. But to close the loop here, I think we should also literally sort of let go, loosen our grip on electrons and hardware. And that's what we do, right? 
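The cost of "maintaining determinism" described above has a hard floor that is named in the show notes: Landauer's principle puts the minimum energy for erasing one bit at kT ln 2. A quick back-of-the-envelope at room temperature, compared against an assumed ~1 femtojoule per switching event for conventional digital logic (an illustrative order of magnitude, not a figure quoted in the episode):

```python
# Back-of-the-envelope: Landauer's bound at room temperature versus an
# assumed ~1 fJ per switching event for conventional digital logic
# (illustrative order of magnitude, not a figure quoted in the episode).
import math

k_B = 1.380649e-23        # Boltzmann constant, J/K
T = 300.0                 # room temperature, K

landauer = k_B * T * math.log(2)        # minimum energy to erase one bit
assumed_cmos_switch = 1e-15             # ~1 femtojoule, assumed typical scale

print(f"Landauer bound at 300 K: {landauer:.2e} J per bit")   # ~2.9e-21 J
print(f"Assumed digital switch:  {assumed_cmos_switch:.0e} J")
print(f"Ratio: ~{assumed_cmos_switch / landauer:.0f}x above the bound")
```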
Like essentially, we go from having a very tight grip and yanking these signals around to kind of loosening our grip, letting there be fuzz and kind of gently guiding the signals. Right. And so we can. So in some sense, you're kind of switching teams. Right. So, you know, in the more classical kind of way of thinking of this, what we're trying to do is to keep the noise at bay. Right. To make our systems as non-stochastic as possible. Filter it, pump it out, pay the price. That's right. But exactly. What you're doing is saying, no, no, no, like harness it. Right. Yeah. But as we know, like from Maxwell's Demon, you know, the classic fable, I don't know if we want to go through it right now, but, you know, there people can look it up. Maxwell's Demon shows you that knowledge comes at a cost, right? Like reducing. That's right. Reducing entropy in a system, keeping something in a deterministic state always costs you energy. and so at if every clock cycle we're trying to you know prevent the the the classical or quantum computer from decaying to a naturally probabilistic and relaxed thermo closer thermal state then we have to pay the price we have to pay energy to maintain determinism and reduce entropy whereas a thermodynamic computer is not always at equilibrium but we're dancing much closer to equilibrium and it's it's much cheaper energetically to sit in those states and maintain those states right do you want to maybe just like walk us through you know how are you actually like designing these things uh for a technical audience i guess they're they're markov chain montecarlo accelerators right um and we support uh discrete variables continuous variables mixtures of the two um and essentially we found a way to harness uh natural stochastic physics of electrons in order to accelerate Markov chain Monte Carlo. Right. And there's some subtleties on how we do that mapping, but you could just imagine we're embedding into the stochastic dynamics that are parametric and that we have control over those parameters. We're embedding our continuous or discrete variable MCMC into the dynamics of the electrons on the device, right? So it is partly analog. It is stochastic, but it's actually our most recent chip is a mixed signal chip. So we use digital classical components and sort of stochastic electronics. And the two have to interact just like, you know, in a Metropolis-Hastings algorithm, you have some components of the algorithm that have some entropy, some proposals, and then you have some non-random parts, right? Like computing, acceptance or rejection. But I guess this helps you address the kind of paradoxical situation where, because I was just saying in some sense, from the hardware's perspective, you're switching sides and joining the side of noise. But from the point of view of software development in the context of AI and machine learning, this has already happened, right? Like there's a kind of paradox in our current architectures where, as you just described it, we spend inordinate amounts of energy and effort pumping the noise out of the system. But then with sampling-based methods, we then reintroduce it through the software. Yeah, and you could think of even a transformer. You have a softmax layer at the end. A transformer is a big probabilistic computer already. So we're running probabilistic software on this stack that was made for determinism. That's highly inefficient. 
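Since the chips are described above as Markov chain Monte Carlo accelerators, it may help to recall what the purely digital version of that workload looks like. Below is a minimal Metropolis-Hastings sampler for a toy 1-D bimodal target, in plain NumPy: the random proposal is the "entropy" part and the accept/reject test is the deterministic part, the same split Verdon describes, only done in software rather than in device physics. The target density and step size are arbitrary choices for the example.

```python
# Minimal Metropolis-Hastings sampler for an unnormalized 1-D target density.
# The proposal step consumes randomness ("entropy"); the accept/reject step
# is the deterministic bookkeeping. A thermodynamic accelerator would aim to
# do the entropy-heavy part in device physics instead of software.
import numpy as np

rng = np.random.default_rng(0)

def log_target(x):
    # Unnormalized log-density: an equal mixture of two Gaussians at +/-2.
    return np.logaddexp(-0.5 * (x - 2.0) ** 2, -0.5 * (x + 2.0) ** 2)

def metropolis_hastings(n_steps=10_000, step_size=1.0, x0=0.0):
    x, samples = x0, []
    for _ in range(n_steps):
        proposal = x + step_size * rng.normal()          # random proposal
        log_alpha = log_target(proposal) - log_target(x) # acceptance ratio
        if np.log(rng.uniform()) < log_alpha:            # accept or reject
            x = proposal
        samples.append(x)
    return np.array(samples)

samples = metropolis_hastings()
print(samples.mean(), samples.std())   # roughly 0 and ~2.2 for this target
```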
um and then now with test time compute right thinking at test time you know monte carlo tree search and um you know all sorts of rl rollouts um that's a that's a monte carlo algorithm right so that can be um yeah tree of thoughts it could be discrete diffusion it could be um now now even for for content there's uh diffusion models those are also it's also a Markov chain, right? You're reversing a Markov chain there. So algorithms, the biggest workloads that are eating the world and consuming a ton of energy are actually probabilistic workloads. They are probabilistic graphical models. And so it's kind of funny, like there's just like a, I guess I think it's just the community. There's kind of the new additions to machine learning that don't learn the fundamentals. It just goes straight to transformers or whatever's hot. And then I tell them we're building an accelerator for probabilistic graphical models. And it look at me like, what are you talking about? It's like, actually, you're running probabilistic graphical models all day. And it's kind of funny that, you know, there's a gap there. But hopefully, you know, I think coming on this podcast and hopefully getting more in the machine learning community to talk to each other, you know, I think probabilistic ML hopefully will have a resurgence. And of course, we're trying to stimulate that, right? Because what, you know, there's a co-evolution, What evolves together fits together, right? And there's an evolution of hardware and algorithms. And we have as investors two of the authors of the Transformer paper. And, you know, they say, they kept telling us Transformers are not sacred, right? They were what worked on the hardware we had at the time, which was Google TPUs. And, you know, the hope is that, you know, there's going to be new foundation models or new models that run more natively on probabilistic hardware that are really efficient on our hardware. and we're not just importing current day models, but hopefully, you know, if you view the algorithmic landscape as having a certain fitness function of, you know, what, in terms of performance you get, in terms of cost, but also it's induced by the current day hardware that's available at scale, right? And if you change that hardware substrate to a different substrate that has, you know, different preferences in terms of the structure of the algorithm that you're running, then you're changing the fitness landscape. And if you have a sudden shift in the fitness landscape, you have a sort of Cambrian explosion. You have this sort of high temperature phase of search, right, in the landscape. And that's hopefully what we're going to cause, right? And that's very disruptive. So, you know, incumbents have all the reasons to be skeptical. But at the same time, it's funny that, you know, even the current sort of incumbent algorithms are converging towards sampling and more probabilistic algorithms, which wasn't obvious when we started the company in 2022, but that was the prediction, right? That there would be a sort of, you know, my joke is that we're kind of interpolating between kind of deterministic forward pass models and full EBMs, right? And you can view sort of diffusion models as an interpolation there. I mean, correct me if I'm wrong, but your view is that this is essentially an inevitable development of tech, right? Like you've talked about the thermal danger zone and kind of the inversion of this Moore's law into what you call Moore's wall. 
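On the point a little earlier that a transformer is already "a big probabilistic computer" because of its softmax output: token selection during decoding is literally a draw from a categorical distribution, usually with a temperature knob. A tiny sketch with made-up logits, not taken from any particular model:

```python
# The final step of autoregressive decoding is sampling from a softmax, i.e.
# a categorical distribution; temperature rescales the logits. The logits
# here are toy values, not from any real model.
import numpy as np

rng = np.random.default_rng(0)

def sample_token(logits, temperature=1.0):
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max()                          # numerical stability
    probs = np.exp(z) / np.exp(z).sum()   # softmax
    return rng.choice(len(probs), p=probs), probs

logits = [2.0, 1.0, 0.2, -1.0]            # scores for 4 hypothetical tokens
for T in (0.1, 1.0, 2.0):
    token, probs = sample_token(logits, temperature=T)
    print(f"T={T}: probs={np.round(probs, 3)}, sampled token {token}")
# Low temperature -> nearly deterministic argmax; high temperature -> more entropy.
```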
Do you want to kind of run us through the logic there? Yeah. I mean, if you, you know, I came from quantum computing there. If you try to scale up your system, you have noise seep in. So, again, you're kind of you're going from the quantum zone of physics to the thermodynamic. You're kind of edging on it. And then if you're in the classical zone, but you try to get smaller, then you're getting in a thermal zone again, right, in terms of the physics. And then the jitter of electrons, because your transistors are so small, the fact that there's very few electrons and the transistors are small, the jitter of all the electron, the electron population matters compared to the amplitude of the signal. And then, you know, typically we like to have our computers have an error rate of 10 to the minus 15. So they can do many, many operations before there's an error. so that you don't actually have to run error correction. The only computers that usually run error correction are those that we send in space because of radiation. But the reason we can't scale down deterministic computers is because of this thermal danger zone. And they're innovating in all sorts of ways. They're doing all sorts of weird fins and all sorts of weird designs that are sort of hardware-level error correction. But those are always going to be ad hoc in some sense. right? Because what you're going to run into is just the scale problem. Yeah. As you miniaturize at some point, like these wires are going to become... To use less power, you need to use less charge. That's not controversial. And when you get to very little charge, the fact that charge is discrete gives you noise, period. Right. And so you're going to have to go thermodynamic at some point. Right. And, you know, at the end of the day, we have proof of existence of a really kick-ass AI supercomputer that we're both using right now to talk to each other. It's our brains. And that, I would argue, is a thermodynamic computer, right? Because there's master equations, right? You know, describing the chemical reaction networks in your brain of neurotransmitters hopping around and you could just... Operating really close to the Landauer limit as well, right? Right. And so, you know, if we're doing very similar physics, but with electrons sloshing around a circuit, then, you know, there's a very much stronger chance that we could run a very similar program in a very similar fashion, arguably with even more energy efficient components because electrons are much lighter than big neurotransmitters. And so, you know, my... what you have is like a programmable bowl, right? Well, you can think of these chips as like a kind of programmable bowl. Do you want to maybe tell us how they're used for computation? I think, so, you know, I think we started, for context, we started off in superconductors and there, and the reason we started in superconductors was because there's this beautiful theory in quantum computing, actually, called circuit quantum electrodynamics, where you go from, here's my circuit and here's my Hamiltonian, right? directly from the circuit. And, and, you know, I think that should win a Nobel prize or something someday. But that, that's basically how we engineer quantum mechanical systems to have certain desired physics. Right. And, and, and the point there is that a Hamiltonian gives you a notion of, of energy, right. In the, in the classical regime. And so we had a very direct way to go from, Hey, I want this energy. Here's how I design my circuit. Right. 
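A quick aside on the 10^-15 error-rate figure mentioned above: one textbook-style way to see why deterministic logic gets expensive as it shrinks is the Boltzmann factor. If a bit is protected by an energy barrier E_b, thermal flips occur with probability roughly exp(-E_b / kT), so a 10^-15 error rate demands a barrier of tens of kT. This is a rough estimate, not a model of any device discussed in the episode:

```python
# Rough Boltzmann-factor estimate: how large an energy barrier (in kT, and
# in joules at 300 K) is needed so thermal bit flips occur with probability
# ~1e-15 per attempt. A textbook-style approximation, not a model of any
# specific chip discussed in the episode.
import math

k_B = 1.380649e-23   # J/K
T = 300.0            # K
target_error = 1e-15

barrier_in_kT = -math.log(target_error)        # E_b / kT = ln(1/p)
barrier_joules = barrier_in_kT * k_B * T

print(f"Required barrier: ~{barrier_in_kT:.1f} kT "
      f"= {barrier_joules:.2e} J per bit")      # ~34.5 kT ~ 1.4e-19 J
# A p-bit, by contrast, deliberately sits near kT so that flips are the
# resource rather than the failure mode.
```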
And so that's, that's where we started uh furthermore it was the way to build the most macroscopic thermodynamic computer you can build um the any any other companies with claims to build a much more macroscopic thermodynamic computer either did it digitally or emulated it so it's not a real thermodynamic computer uh but um but you know we had to super cool it because it's too big right and and you know boltzman distribution um is as you go smaller you have higher frequencies you can go to higher temperature and then we move to to silicon but to to answer your question about energy-based models um yeah you could view you could view um you could view our chips as a time dependent programmable energy function right and you have something akin to Langevin dynamics more generally diffusion in in that landscape um and Langevin dynamics is is uh happens to also be uh an MCMC algorithm, right? So there's a perfect match there between the algorithm that you can use for Bayesian inference for anything really, and the native physics of the chip, right? And so it's almost like, it's been there the whole time, right? And it's like, we've been doing algorithms that were physics that we could literally implement, right? It's a literal analogy, and we're instantiating as an analog computer. That was too beautiful to not build it. So we we did build it um hopefully we could cut some b-roll i brought some some uh superconducting chips here um we'll do that later um but uh now actually i think the big the big breakthrough and the big challenge was to um that's it i moved to uh silicon yeah and wow um getting programmable uh stochastic physics in silicon was a whole kind of order of magnitude uh more difficulty And we had to really innovate in terms of understanding, you know, stochastic mechanics of electrons in silicon. And we built our team for that. And essentially, we kind of reproduced some core primitives, namely the one we're talking about for now is the probabilistic bit. and the probabilistic bit you could think of as a, you know, a double well system and you can tune the tilt and so on. So if you have, if you consider bouncy balls in this landscape, you can control how much time the bouncy balls spend in one well or another and you can consider one well zero, the other well one. So essentially you have a signal that's essentially dancing between zero and one, right? And you can control how much time it spends in zero and one. And so that's like a fractional bit, right? So, you know, going back to what we were saying earlier about, you know, coming from it from Qubit, you know, we, school of thought, we basically made P-bit from it. And it initially was superconducting materials, but now we've done P-bits in silicon and achieved really, really high energy efficiencies. efficiencies um you know our p bits can generate uh bits controllable bits of entropy with only a few hundreds of adojoules and we've actually uh as of recently submitted our results for peer review um and we're gonna put them on archive um probably timed with some other announcements in the coming months um but it's really exciting time um and again this was a concept for a very long time and it was a big risk for me um you know initially when i left theoretical physics and went all in on quantum machine learning. Everybody told me like, really, you, you're going to be a quantum machine learning lead at Google? And everybody's doubting me. And then I became kind of well-known. And then I eventually led a team. 
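The "time-dependent programmable energy function plus Langevin dynamics" description above maps onto a well-known algorithm: the unadjusted Langevin sampler, which follows the negative gradient of an energy while injecting Gaussian noise at every step. The sketch below does this digitally for a toy 2-D energy of my own choosing; the hardware claim in the conversation is that the device physics performs an equivalent stochastic relaxation natively, which this software version can only mimic.

```python
# Unadjusted Langevin dynamics as an MCMC sampler for a toy 2-D energy E(x).
# Digitally this costs a gradient evaluation plus injected Gaussian noise per
# step; the thermodynamic-computing claim is that the device performs an
# equivalent stochastic relaxation natively. Energy and step size are
# illustrative choices.
import numpy as np

rng = np.random.default_rng(0)

def energy(x):
    # Double well in the first coordinate, quadratic in the second.
    return (x[0] ** 2 - 1.0) ** 2 + 0.5 * x[1] ** 2

def grad_energy(x):
    return np.array([4.0 * x[0] * (x[0] ** 2 - 1.0), x[1]])

def langevin_samples(n_steps=20_000, step=0.01, x0=(0.0, 0.0)):
    x, out = np.array(x0, dtype=float), []
    for _ in range(n_steps):
        noise = rng.normal(size=2)
        x = x - step * grad_energy(x) + np.sqrt(2.0 * step) * noise
        out.append(x.copy())
    return np.array(out)

samples = langevin_samples()
print(np.mean(samples[:, 0] > 0))   # ~0.5: both wells of E get visited
```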
I led quantum machine learning at Alphabet X. And then I quit all that. I quit quantum computing, quantum machine learning. And it's like, I'm going to take even more risk going to build a whole new paradigm of computing from scratch, from the concept up. And everybody thought that was crazy. And now we're here. Now we've made a lot of progress and now we're scaling, right? And because there's nothing stopping us from scaling at this point because we, you know, we've de-risked the manufacturing as well, which is usually not something academics think about. But if you're a startup and you have to scale or die, eat the world or die, you have to kill every risk possible. So, you know, we are looking to scale to millions of degrees of freedom next year. And I think we're going to hit it. So that's really exciting. So your first chip, right, had three P-bits. Yeah. And the latest has about 300 degrees of freedom. Yeah. Correct. They're not all P-bits. But now you're aiming. We'll get to that someday. Yeah. Right. And now millions of degrees of freedom next year. Yeah. That's very cool. So help me understand the layout, like the general space here. So, I mean, presumably you don't think that this is a, or do you, do you think this is a wholesale replacement for the current stack? Or do you see like, because I could see a world where like you interface, you use thermodynamic compute when it's relevant, but you mesh this with digital and quantum at the appropriate junctures so that you get, what you really get is like a multi-scale stack where each hardware bit is specialized for the kinds of computations that run natively on that kind of hardware. Yeah, absolutely. I mean, you know, if you're, I think there is some work from Max Tegmark on correspondences between sort of the information bottleneck principle and sort of downsampling and the hierarchy and machine learning and the renormalization group, right? So in physics, you go, you know, if you're trying to learn from quantum mechanics and you downsample, you get statistical mechanics, you downsample, you get Newtonian mechanics. You can imagine if I had a God neural network that just compressed all the information in a certain region of space, which is actually the thought experiment that got me into quantum machine learning. I was trying to understand black holes as a machine learning system. But let's say you created that system, then you can imagine the first few layers are quantum and then some layers are probabilistic and later they're deterministic. Just like as you distill the information, you don't need that sort of, you know, initially you need some quantum complexity, later you need some entropy, and then later you just need to do some classical, you know, coordinate transformations, right? And they all work together. But in a more practical, practical workflow. Yeah. I mean, you know, very often, whether you're doing simulations of, of stochastic differential equations, you're trying to simulate a physical system, whether you're, you're doing some sort of discrete or continuous diffusion. 
um uh you know there's there's very often there's a classical function uh often uh you know differentiable program uh that determines sort of some some probably distribution that you want to sample from whether it's in latent space or you know for these transitions and the denoising um now so the two work together you don't necessarily need entropy everywhere in your graph all at once right right um in principle you could use a a probabilistic computer uh uh for deterministic operations but it's not going to be the best at that right it right it um in the low precision regime right you could think of p bits as literal fractional bits so you can get into the fractional bit precision work and uh so that that can be interesting but not everything uh is is well suited for for that low precision um and um you know i do think quantum computers will have some applications but again they would be supplements uh for just quantum mechanical systems to understand quantum mechanical systems as a supplement to probabilistic and classical computers. But I would say that most things would be well covered by probabilistic and deterministic representations running on probabilistic or thermodynamic and deterministic computers. So currently, the superconducting chips require a lot of cooling, right? You need to get them around one Kelvin. Is that the case? Yeah. I mean, depending on the material, you can be a few hundred uh millikelvin or you can get to a few kelvin um a few kelvin um usually using niobium niobium I think that you don need as big of a fridge For us it was an interesting sort of experiment and we have a bunch of results there, paper coming in the coming months. Essentially, it's as efficient as we could imagine building a thermodynamic computer. And again, it was just to get people to imagine, hey, actually, or realize rather, hey, actually, there's forms of computing that are far more energy efficient by like an unfathomable amount of orders of magnitude, far more efficient than digital computers, right? Right. Because I'm just doing some back of the napkin because so you guys target 1000 to 100,000 X energy efficiency gains at the chip level. And what I was wondering is how does that discount against the cryogenic energy costs? Like how does that factor into your calculations? So, you know, any sort of anything from that, those results were, you know, and we put the asterisk there. Like that's just the chip, you know. And in general, you know, you'd have the thought experiment was if you scaled a very large network of superconducting chips to football field sized. and you had a ginormous dilution fridge, right? Again, it's like the big energy cost is maintaining that boundary, right? With the outside world. If you had a ginormous dilution fridge and you had millions and millions of P-bits and superconductors, you'd have the most energy efficient probabilistic computer, right? Is that practical? Probably not. Should a startup be doing it? Probably not either. Could a government do it in some sort of crazy moonshot? Maybe. And so we're kind of just going to put the idea out there for academia, national labs and whatnot to pick it up. I think for us, it was also just a great learning platform. Superconductors are ironically pretty accessible. A bunch of universities have fabs where you can experiment with them. So for us, it was kind of like, hey, we're going to want a community of people working on thermodynamic computing and experimenting with new primitives and showcasing new algorithms. 
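Picking up the p-bit / fractional-bit thread from the last two answers: the double-well-with-tunable-tilt picture has a compact software analogue, because a p-bit's time-averaged output is a sigmoid of its bias. The emulation below follows the common convention in the p-bit literature (p(1) = sigmoid(bias)) and is only an illustration of the behavior being described, not anything specific to Extropic's silicon.

```python
# Software emulation of a p-bit: a bit that fluctuates between 0 and 1, with
# the fraction of time spent in state 1 set by a tunable bias (the "tilt" of
# the double well). Follows the common convention p(1) = sigmoid(bias);
# purely illustrative, not Extropic's circuit.
import numpy as np

rng = np.random.default_rng(0)

def p_bit_trace(bias, n_samples=100_000):
    """Stream of 0/1 samples whose mean approaches sigmoid(bias)."""
    p_one = 1.0 / (1.0 + np.exp(-bias))          # stationary probability of 1
    return (rng.random(n_samples) < p_one).astype(int)

for bias in (-2.0, 0.0, 2.0):
    trace = p_bit_trace(bias)
    print(f"bias={bias:+.1f}: fraction of time in 1 = {trace.mean():.3f} "
          f"(sigmoid = {1 / (1 + np.exp(-bias)):.3f})")
# bias 0 -> a fair coin; large |bias| -> an almost deterministic bit.
```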
So we're going to put out that work into academia. But for us, the product is, again, the silicon chips, which, you know, we could have decided to run them in cryo, cryo CMOS, but we ended up deciding to go for room temperature. Of course, a thermodynamic computer at lower temperature consumes less energy, right? Of course, you're assuming you're maintaining the bath at that temperature, but then you have the costs of the cooling, right? Right. So, you know, but, you know, if you had a if you had a von Neumann probe thermodynamic computer, maybe it could run much, much, much cooler. Right. If it's going to go to an interstellar space, it's not going to encounter that much heat. And so maybe we'd modify the design for that. But that's, you know, much problem for in the far future. And then coming back to impacts on the AI industry, I mean, EBMs have been around, energy-based models, right, have been around for a long time. Would you say that this is really the key to unlocking their potential? Because up until now, they've been basically limited by their inefficient sampling, right? So would you say that this is really like the technology that we need to really do the EBM thing seriously? seriously. I think modern neural networks are just, you know, mean approximations of EBMs, right? Like neural networks came from EBMs. The 2024 Nobel Prize was, you know, and backprop came from looking at mean field of EBMs. And so to us, we're just, you know, yeah, creating the hardware for the ancestors to neural networks. And they're kind of a superset, right? You could just take averages and get deterministic operations. And so, yeah, our hope is, you know, people think of going more probabilistic with their algorithms now that sampling is far more energy efficient and far faster. But because we've been stuck with deterministic computers, people have just tried to avoid, you know, using primitives that have, you know, they would just represent distributions by their moments, right? And they would use exponential families where you don't need too many moments, such as Gaussians, right? Because you could just represent them as matrices and vectors and matrices and vectors can fit on a GPU really well, right? And you could do some transformations there. So there's been a bias in algorithms towards what runs well on today's hardware. And hopefully with the remaining computers, it changes that. And so we're trying to get people interested in the space. to start imagining what they would do with a multi-million... So this is how you break free of the... Go ahead. No, this is how you break free of the vicious cycle that we're stuck in, right? Yeah, we're stuck in a... So we've got a hardware stack that works in a certain way. We optimize our software for that. Then we build bigger hardware. So this is how we break free from that. 
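On "neural networks came from EBMs": one canonical ancestor is the Boltzmann machine (the line of work recognized in the 2024 physics Nobel mentioned above), where a configuration's probability is exp(-E(s))/Z and inference means sampling rather than a single deterministic forward pass. A tiny two-unit example with brute-force enumeration, feasible only because it is tiny; that intractability at realistic sizes is exactly the argument for sampling hardware. The coupling and biases are arbitrary illustrative values.

```python
# Tiny energy-based model: a two-unit Boltzmann machine with binary units.
# p(s) is proportional to exp(-E(s)), E(s) = -(w*s0*s1 + b0*s0 + b1*s1),
# s_i in {0, 1}. With 2 units we can enumerate all 4 states exactly; at
# realistic sizes the partition function is intractable, hence sampling.
import itertools
import numpy as np

w, b = 1.5, np.array([0.2, -0.3])   # illustrative coupling and biases

def energy(s):
    return -(w * s[0] * s[1] + b @ s)

states = [np.array(s, dtype=float) for s in itertools.product([0, 1], repeat=2)]
weights = np.array([np.exp(-energy(s)) for s in states])
probs = weights / weights.sum()      # exact normalization, only possible here

for s, p in zip(states, probs):
    print(f"state {s.astype(int)}: p = {p:.3f}")
print("exact mean activations:", sum(p * s for s, p in zip(states, probs)))
```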
Yeah, but I think we're going to break free one way or another because, frankly, even if you just do a back-of-the-envelope calculation, um well right now we're gonna run out of power trying to scale the eye we can't even uh you know if everybody were to use like an agentic you know big model chat gpt or grok uh model you know um we'd run out of power in the united states you can't deploy it to everyone you can't have everyone using it like with high intensity right now we're gonna run out of power um and then that's not even getting into sort of um video models uh world models uh which are going to be necessary for embodied intelligence um at that point you know if you try to scale to the planet we're going to run out even if we produce the power let's say there was some moonshot and let's say we even figured out nuclear fusion right let's say we just did that and we scaled that very quickly uh we'd run out of or you know we're going to generate more we're going to double or triple the amount of heat being radiated by the earth right so we're gonna literally cook ourselves to death right we're literally cooked okay and so something has to change with the hardware layer we can't scale with the current hardware and so and so people thinking we're just gonna scale transformers and get to the moon and scale scale intelligence to the whole planet they're just flat out wrong right it's like provably wrong when people say hey you know to scale up intelligence we need to start building nuclear power plants, I always think, well, you know, like I run on a glass of water and a banana, maybe a coffee in the morning, right? Like certainly this is not like a naturalistic way to think about how intelligence scales, right? Because you're a thermodynamic computer, right? That's right. That's the thesis. But, you know, you can imagine, right? Like just taking our current design and scaling it to a wafer, you would have about 1.5 billion pbits, which you could think of as as neurons and about 20 20 billion parameters um per wafer and then you could do multilatered programs of those and that would run on not 20 kilowatts which is what a current wafer scale system would run on or more maybe 100 it's 20 watts right which is like our brain yeah right that's and then if you have a hundred or you know you have a few hundred billion parameter model running on 20 watts then we're you know we're in the brain we're in we're in the same ballpark as the brain not quite exactly it might be like within 10x but that's still much better than where we are which is like 100 million x right and so that's where that's where we're going and and again it's not it's not to us it's far less crazy than quantum computing there is no like quantum computer in nature that has very uh very high quantum coherence and we have exquisite control and very high quantum complexity all systems decohere we have you know multiple billion thermodynamic computers out there in the wild and they work pretty well um so we're just trying to to tap into the same physics we're not obsessed with sort of biomimicry we're not upset we're not we don't call ourselves neuromorphic computing and we're just doing again probabilistic approximate probabilistic inference as a service, but in hardware, in physics. And to us, that's kind of the parent workload of most of AI. And actually, it's not just for AI. It's for broader computing, right? There's simulation, optimization, and yeah, statistical inference, science at large can use this. 
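The wafer-scale numbers in that answer are easy to restate as arithmetic, exactly as quoted: roughly 1.5 billion p-bits and 20 billion parameters per wafer at about 20 W, versus 20 kW or more for a current wafer-scale digital system, with the brain's usual ~20 W ballpark as the reference point. These are Verdon's projections, not measured results; the snippet only reproduces the quoted figures.

```python
# Back-of-the-envelope using the figures quoted in the conversation
# (Verdon's projections, not measurements).
pbits_per_wafer = 1.5e9          # "about 1.5 billion p-bits"
params_per_wafer = 20e9          # "about 20 billion parameters"
thermo_wafer_watts = 20.0        # projected power for a thermodynamic wafer
digital_wafer_watts = 20_000.0   # "20 kilowatts ... or more, maybe 100" for
                                 # a current wafer-scale digital system
brain_watts = 20.0               # common ballpark for a human brain

print(f"projected power ratio: {digital_wafer_watts / thermo_wafer_watts:.0f}x")
print(f"parameters per watt (thermo): {params_per_wafer / thermo_wafer_watts:.1e}")
print(f"thermo wafer vs brain power: {thermo_wafer_watts / brain_watts:.0f}x")
# The "within ~10x of the brain" versus "~100 million x today" comparison is
# the episode's claim; this arithmetic only restates it, it cannot verify it.
```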
So it's not just for gen AI, right? We're not just riding the current wave, yeah.

I have the pleasure of interviewing not one, but two people today.

Yeah, I'll switch over. Yeah, exactly.

We've got your alter ego on the call as well. So to frame up the transition: for the audience, Gil, aka Beff Jezos, is the founder of a philosophical movement called Effective Accelerationism, which originated as a kind of counterpoint to effective altruism, and I really want to get into the details there. But to set things up, looking 10 to 20 years out, how do you envision the future, and what role does acceleration play in your vision for it?

Yeah, 10 to 20 years. I think we'll have embodied intelligence. I think thermodynamic computers will be pretty ubiquitous; they're going to be what runs embodied intelligence. We're going to have greater intelligence density in all our devices. We're going to have personalized, always-on, online-learning intelligence. And my goal is for everyone to own and control the extension of their own cognition, because that is important to maintain.

So democratizing intelligence.

Not just democratizing access. Truly, because democratizing access is like: hey, I give you access to the one God model that amortizes its learnings across the fleet; now you're all mind-merged with my Borg mind; congratulations, you've been assimilated. That's not democracy, right? If you have control over its constitutional prompts, you're essentially controlling people by proxy: their thoughts, their will, their actions. It used to be that people controlled others' access to information and steered them indirectly. Now it's going to be direct control over the model that is an extension of their cognition, their thought partner, and that's much deeper and more subversive control. So to me, that was the big existential risk I was most worried about: the precedent of people vying for control and power over people. That's why I've been pushing for decentralized AI, and pushing against the kind of overregulation of AI that would cause red-tape inflation, mostly serve the incumbents, and lead to a centralization of AI power.

But yeah, someday, I would imagine 10 or 20 years out, hopefully we're wearing some Neuralinks; our whole skull is some neural link that is a thermodynamic computer, an extension of our cognition, as powerful as, if not more powerful than, your brain, and you have a full merge there. Something more plausible on a 10-year timescale is a sort of soft merge. I mean, you're the active inference expert here. But if you share the same Markov blanket, you share the same perception and action states, right? So let's say I had an agent: say my glasses were very smart and always on, always listening, and I have an earpiece. I could have almost subconscious thinking; it could just perceive things and suggest actions. So we're sharing perception, and we're sharing actions through my body.
We are the same agent, right? So that is a form of merge, and you don't need a neural interface for it. And maybe we even have a way to communicate non-verbally: just like friends that hang out a lot have a prior over each other's behavior, or over behavior conditioned on world states, and then don't need as many bits of information to adjust to a new setting. That's the more plausible path. I think the soft merge comes first, but people are going to keep escalating, and then we go towards a harder merge, right? A hardware merge.

That's a very compelling vision for the future. It's very techno-futurist. I love it. Do you want to give me the elevator pitch for e/acc, effective accelerationism? Because I think these are all related, and this sets you up nicely to present the core idea.

Yeah. Really, e/acc is a sort of meta-culture. I call it a cultural hyperparameter prescription, if you will. It's one where we're trying to maximize the growth of civilization as measured by our free energy production and consumption, aka the Kardashev scale, which is a log scale tracking how much free energy is consumed and produced. And really it comes from a realization I had while studying stochastic thermodynamics in preparation for founding Extropic. I realized there is a generalization of Darwinian selection, which is thermodynamic selection: you go from selfish genes to selfish memes to selfish bits. Every bit of information specifying configurations of matter is fighting for its existence in the future, and the selection pressure on those bits is whether or not they further their host organism's ability to understand its environment, predict the future, capture free energy, use it strategically, and grow. But really the master metric is: if I change this parameter, this value, and I let time evolve, how much free energy has the system dissipated along its trajectory over time? And the theorems of stochastic thermodynamics tell us that trajectories of the system that consume more free energy are exponentially more likely. That's how you get this pruning of the branches that consume less free energy. So it's like: ah, okay, this is the golden metric of selection pressure on the space of bits. And so I will design the highest-fitness selfish meme, which is e/acc: figure out what is optimal for growth and do it. By construction, it should be the most viral thing, and it will persist.

Right. And basically the alternative is growth or death, is what you're suggesting.

Yeah. We have the saying: accelerate or die. It sounds dramatic, and it's meant to sound dramatic, but it's also reality. Either you align yourself with growth and you're part of it, meaning you adapt your culture towards growth and then you benefit from the growth, or you don't, and then you get outgrown. And that holds at a national level with your policies. Again, EU versus US: the US is obsessed with growth to some extent, and we've seen a bifurcation in the GDPs and so on.
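The claim that dissipative trajectories are exponentially favored is usually stated as a detailed fluctuation theorem in stochastic thermodynamics. A standard textbook form, given here as my gloss rather than a formula from the conversation, is

$$\frac{P[\gamma]}{P[\tilde{\gamma}]} = \exp\!\left(\frac{\Sigma[\gamma]}{k_B}\right),$$

where $\gamma$ is a trajectory of the system, $\tilde{\gamma}$ is its time reversal, and $\Sigma[\gamma]$ is the total entropy produced along $\gamma$, i.e. the heat dissipated into the bath divided by its temperature plus the change in the system's entropy. Trajectories that dissipate more are exponentially more probable than their reverses, which is the "pruning of branches" the speaker is gesturing at.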
At a policy level, at a cultural level, even at an organizational level, right? If a company is not obsessed with growth, eventually it gets disrupted by one that outgrows it and then has more resources to outcompete it. So the point is that people who are open-minded about new technologies and lean in get selected for, and the people who are skeptical, who push them away, shun them, want to go back to the cave, get selected out. Even from an empathetic standpoint, popularizing acceleration is about awareness that this is actually how the world works. I'm sorry to break it to you: you either embrace the acceleration or you get selected out. It's also: hey, if you care about yourself and your tribe, your company, whatever, you should lean in rather than lean out, and don't listen to the people fear-mongering you; they're very often doing so out of self-interest. I view spreading deceleration as a form of psychological warfare. In fact, that's exactly how I view it, and that's why I've been such a warrior online against the decel mindset. I just don't see a scenario where being decel is beneficial to the host. It exploits this gap in understanding of how the world works in order to destroy competition. And there's a lot of this in nature and in human dynamics generally: people sabotage each other to outcompete each other.

See, I've been thinking about this a lot. My good friend and colleague Axel Constant and I have a longstanding debate as to whether ethics, morality, and the like can be reduced to free energy minimization at the end of the day. And I think it's a difficult question, because thermodynamics describes all mesoscale objects, right? Basically any configuration whatsoever will be minimizing free energy in this fashion, seeking out pockets of free energy and dissipating them as efficiently as possible. My worry is about things that grow without bounds; in the biological case you have cancer, for example. So what's the role of constraints in the maximization of entropy when we're designing the social system?
Because surely things like dictatorships, fascist autocracies, those also minimize free energy, and you might even think...

They're a local optimum, right? Not a global one. Just like a cancer: it kills the host, and that's suboptimal on a sufficient timescale. Again, it's not about instantaneous free energy dissipation; it's basically an infinite time horizon. Blowing up the planet does burn up a bunch of free energy, but in the long term, for the same reason life exists, it's much better to conserve and strategically use free energy to secure more free energy and keep growing, to have some order, rather than burn it all in one go and thermalize into chaos. That's why we have life; that's why we're not at equilibrium. We're not just burning up all our fuel and dying immediately, because as intelligent beings we burn far more energy over time by being this partially coherent system with predictive power over its environment. We're kind of this energy-seeking fire, right?

Right. Yeah.

But I would also argue that e/acc is a call to popularize complexism, complex systems thinking, and the free energy principle at large. We have to think through this lens about all systems in society, from policymaking to technology to innovation to economics, basically everything. And it's a different way of thinking, one that is somewhat scary. Again, it's going from reductionism and rationalism to complexism and post-rationalism, and we don't have as much control and interpretability; we don't understand the world as well. We kind of have to, pardon my French, fuck around and find out, the FAFO algorithm. But really it's exploration and discovery: a priori you don't know what's optimal. In startups you learn this. You have to actually go to the market and try stuff, because sometimes your model-based prior can't do well; the system is too complex to have a model with good predictive power, so you actually have to operate in a sort of open loop. Hopefully there's a kind of renaissance and these ideas become more popular across the sciences. Clearly they've eaten the software world, and we're trying to make this school of thought eat the hardware world, and I think we will. But again, most fields of study could be revolutionized by thinking through the lens of complex self-adaptive systems and the free energy principle.

So you've described e/acc as a kind of hyperstitious meme. Do you want to say a few words about what that means?

Yeah.
I mean, because again, you're the active inference expert: in active inference you have perception and action. Based on your sensory input you can update your model of the world, so you're minimizing the divergence between your own internal model and the statistics of the world. The dual of that is taking actions in the world to minimize the divergence between the world and your predictive model of it. And in a way we're naturally biased, the way the car goes where the eyes look when you're driving. If we look at very negative outcomes and obsess over them, we will drive whatever system we're thinking about towards those negative outcomes.

A slightly controversial example is bioweapons research and, for example, COVID. I would say COVID was an accident from bioweapons research, probably defensive. We were trying to explore what a really bad scenario would look like: what if this mutation and that mutation combined into a really bad virus? Then they start experimenting and designing in a neighborhood of virus subspace that would never have occurred naturally, just from evolution. Because we were exploring that subspace of bad things, because we were obsessed with it, we made it happen. So to me, if we're optimistic about the future, we tend to steer things towards that optimistic outcome. As a startup founder, if you're not optimistic about your startup, statistically you are screwed, right?

Oh, yeah. And from the active inference point of view, every action begins with a false belief, right? First you believe you're moving, and then you reduce the prediction error in the direction of action.

Exactly.

I find this extremely compelling. One of the things I find most inspiring about e/acc is that you're trying to present a radically optimistic meme for the future. There's so much, not just AI doomerism, but doom and gloom in general going around that, just from the point of view of neurobiology, we need these almost delusionally optimistic beliefs to get off the ground at all.

Yeah. You could think of the memetic sphere as a metacortex, a biological supercomputer. And with e/acc, we're just trying to do active inference towards better futures. We're spreading this meme of optimistic futures, and it actually does steer the world. It's been three years now, and it has steered policies around the world towards going for these moonshots: a resurgence of nuclear energy exploration, climbing the Kardashev scale, deregulation of AI, widespread embrace of AI, companies being far more aggressive in their exploration of it.

And overall, how do we spread it? How do we spread e/acc? My question is partly motivated by the fact that most of the very visible e/acc figures, Andreessen, Musk and company, tend to be libertarian, right-wing associated.
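For readers new to the perception/action duality described above, here is a compact statement in standard active-inference notation; this is my addition, not a formula from the conversation. Both perception and action are cast as minimizing the same variational free energy

$$F(o, q) \;=\; \mathbb{E}_{q(s)}\big[\ln q(s) - \ln p(o, s)\big] \;=\; D_{\mathrm{KL}}\big[q(s)\,\|\,p(s \mid o)\big] \;-\; \ln p(o),$$

where $o$ are observations, $s$ are hidden states, $p(o, s)$ is the agent's generative model, and $q(s)$ its internal belief. Perception updates $q(s)$ to bring it closer to the posterior implied by the data; action changes $o$ itself, by acting on the world, so that observations come to match the model's predictions. The "optimism steers outcomes" point above is the action half of this loop.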
That happened after the fact, right? I would say, for example, that Garry Tan is rather on the left, and initially it was very apolitical. I would consider myself still a centrist.

How do we get the abundance bros to join?

I think it's already happening. I think that was a reaction to e/acc and the techno-optimists; you end up getting clustered with one party in the US, and in other countries it might be different. But there needed to be an actual techno-progressive, let's say, cluster on the left. It's not clear that that faction is the dominant faction of Democrats; I really hope it becomes that. Again, e/acc is kind of like an RL algorithm: it's not clear what the optimal policy is. A few years ago I was seeing a convergence towards a lot of top-down control, a lot of top-down power, and I think the optimum is always a balance. You need some top-down control, but you also need bottom-up self-organization.

That's how the brain works, right? You have a little bit of centralization, but not all that much.

Exactly, right? And absolute, total libertarianism can work; it's just that in the era of unconventional warfare, complex systems also get adversarially steered, whether it's free markets getting gamed, or memetic markets through psychological operations, or dating markets, and so on and so forth. So I think some balance of top-down and bottom-up is the ideal. Is it more like the US, or more like China, or something in between? I guess we have to figure that out.

That segues perfectly into my next question. For you this isn't just a philosophical thing; it's of geopolitical, geostrategic significance, right? It's important for the future of democracy as we understand it.

Yeah. I would say that our failure is viewing the world not as a complex self-adaptive system but in the classical way: hey, I have a first-order model of what's happening and I'm going to do a first-order correction. It's easy to convince a crowd of that, and politicians get elected on such platforms, but then they don't think about the higher-order effects of their policies, instead of approaching it from a complex-systems-steering standpoint. Whereas I think the adversaries of the West understand the complex systems approach, and they tend to steer us in directions that lead to our detriment. And to me the question was: how do you fight a multidimensional war where there's invisible sabotage of every complex system that can be steered adiabatically in nefarious directions? If it's slow enough, it sits above the infrared temporal cutoff that politicians give a shit about, which is four years. So if you're steering the United States or the West on a 20-to-40-year timescale, you can win a very long war, and we don't even spot the pattern because we're too busy. Again, there's a timescale separation here similar to what happens in thermodynamic computing and thermodynamic physics.
But essentially, we're so preoccupied with the dynamics on small timescales, the day-to-day, that we ignore the long trends, and those get hacked. We slowly boil the frog, and it doesn't realize it.

Listening to you talk, it occurs to me that there's a beautiful coherence to your approach generally, because what you're doing both at the hardware level and at the level of your philosophical project is moving us away from hard, rigid programming from the outside towards a kind of organic, adaptive, thermodynamically driven learning and exploration of possibility space.

Yes, yes. Again, it came from my own journey trying to understand physics, discovering differentiable programming and seeing that as the way forward for everything. And really, some of the prescriptions of e/acc are to maintain variance and constantly explore across any parameter space, whether it's culture, aesthetics, policy, technology, et cetera. One reason is that, following Fisher's theorem on evolution, the speed at which you can traverse a landscape depends on the gradient, of course, but it also depends on the variance. Evolutionary search depends on variance, because the more you fuck around, the more you find out: your rate of learning is faster and your rate of adaptation is faster. And to me that seemed like the main advantage of the United States: it is very high variance. It has high-variance individuals and high-variance outcomes, so I view the United States as a sort of high-temperature search algorithm, and it's always first. I'm not American yet, officially, but someday I will be; that's my goal. They're first to figure things out because they're always searching at very high variance. I view innovation as a diffusion process over some landscape, and they operate in a very high-noise regime, so they don't get stuck in local optima. Whereas China is more like the low-temperature sampler. They're the optimizer at the end; they're doing gradient descent. Once there's consensus, once there's a clear direction of improvement, it's easy to convince a committee, and then you can execute top-down control with a lot of conviction. So they beat us in the final stretch. It's like Bayesian inference: you go from an unsharp prior to a sharper posterior on what the optimal thing is. It's like annealing. So essentially China beats us at squeezing out the end, because they have a lot more top-down control and a lot more coherence there. I see the whole thing as a kind of parallel tempering algorithm: the United States is the high-temperature search, discovering things first, but not necessarily the best at optimizing and improving the technologies and scaling the manufacturing, and then China squeezes out the last bit.

How do we avoid catastrophic outcomes? Because you've been pretty dismissive about P(doom) and doomerism generally. So how do we allow the meta-search to happen, but in a way that doesn't lead to catastrophic technological outcomes?
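The parallel-tempering analogy maps onto a concrete textbook algorithm, so here is a minimal replica-exchange sketch on a toy one-dimensional landscape. It is purely an illustration of the metaphor; the temperatures, the energy function, and the variable names are all arbitrary choices of mine.

```python
# Minimal replica-exchange (parallel tempering) sketch on a toy 1-D landscape.
import numpy as np

rng = np.random.default_rng(1)

def energy(x):
    # A rugged landscape with many local minima.
    return 0.05 * x**2 + np.sin(3 * x)

temps = [2.0, 0.5, 0.1]              # high-T explorer ... low-T optimizer
states = [rng.normal() for _ in temps]

for step in range(5000):
    # Metropolis move within each replica at its own temperature.
    for k, T in enumerate(temps):
        proposal = states[k] + rng.normal(scale=0.5)
        delta = energy(states[k]) - energy(proposal)
        if delta >= 0 or rng.random() < np.exp(delta / T):
            states[k] = proposal
    # Occasionally attempt to swap neighboring replicas: discoveries made at
    # high temperature get handed down and refined at low temperature.
    if step % 10 == 0:
        for k in range(len(temps) - 1):
            d = (1 / temps[k] - 1 / temps[k + 1]) * (energy(states[k]) - energy(states[k + 1]))
            if d >= 0 or rng.random() < np.exp(d):
                states[k], states[k + 1] = states[k + 1], states[k]

print("low-temperature replica settled near x =", round(states[-1], 2),
      "with energy", round(energy(states[-1]), 3))
```

The high-temperature chain wanders freely and escapes local minima, the low-temperature chain polishes whatever it is handed, and the swaps are what let the two regimes benefit from each other, which is the dynamic the speaker is describing.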
Or, because I can also see some tension between, on the one hand, this kind of open project, and on the other, the participation of proprietary, closed corporate groups.

Yeah. For me, P(1984) was higher than P(doom) from AI. And to me, China is like the ultimate monopolistic company: there's basically one set of executives for all the companies, they act like a conglomerate, and they throw their weight around to crush smaller American companies. So one thing about e/acc is that we're anti-monopolistic, because, again, monopolies tend to be suboptimal. If your choices of hyperparameters are over-concentrated, you're not exploring anymore.

And that's what motivates my question, and my worry: very authoritarian forms of government that impose ethnic or cultural homogeneity are very good at minimizing free energy, right? If you and I are exactly alike because there's a top-down imposition of sameness, that minimizes free energy super-locally. If we're all the same...

Right. If your energy term rewards agreeing with your neighbor, if there's no frustration, then of course that's a local optimum. But as we know, what we're arguing for is embracing variance: embracing different cultures, having different bets. It could be culture, genetics, it could even be ways to train your ML models.

That's the same response you gave to the cancer point: you're not worried about the push towards boundless growth producing cancer-type formations, because that's not a global optimum, only a local one.

Yeah, that's right. But I want to go on a slight tangent here. There's been some centralization in AI research labs; there are only four or five players that matter. And it's been a meme recently that the researchers all slosh around and churn between the labs, so they're all equilibrating in terms of beliefs, because you have an exchange of researcher-particles between the labs. You could see the American research labs all converging to similar performance at similar compute. Then China, with DeepSeek, explored a whole different region of hyperparameter space because of its export-control constraints, and showed us that over-concentrating our bets in hyperparameter space has risks: we get stuck in a local optimum while a better, non-local, global optimum might be out there that we're missing. So again, that's our push for variance, to always be exploring. I think it's really important. And I think there's also been a reaction to, it's hard to explain, the same pattern that happens in Western governments, in late-stage corporations, in large bureaucracies: a cover-your-ass mentality and culture, decision by committee. It tends to be variance-reducing, or variance-killing, and taking sort of...
And that's sort of my worry. In addition, I guess we're in the same kind of situation as a gradient-descent learner: how are we supposed to know that we're not stuck in a local optimum? We might think, hey, this actually looks pretty optimal, but there's no God's-eye view to tell us we're just stuck in a local one.

But that's exactly why, as long as you don't kill variance completely, you keep some exploration rather than purely exploiting, and you keep open-mindedness, you're always spending some resources exploring and potentially finding a new, better way to do things. Just like, you know, some people at first thought our bet was ludicrous and that we shouldn't have raised VC funding. Really? There are trillions of dollars of capital riding on the current paradigm, and you don't want a startup to raise a few tens of millions to take a bet that could be completely disruptive to that whole multi-trillion-dollar bet? Now the tune is changing a lot, because people planning these half-a-trillion-dollar build-outs want to know whether a technology that's a thousand x better is around the corner, and whether it could render some of their build-outs miscalibrated in terms of the ratio of energy to compute and density. Again, I think thermodynamic computing is going to work in tandem with GPUs, TPUs, or whatever neural processing units. It's going to take quite a while for models to be fully ported to thermodynamic computers, so for the foreseeable future I think these build-outs are safe, and you could just have thermodynamic computing as an add-on.

And Gil, it's been a real pleasure to discuss these issues with you.

A pleasure as always.

So how do we stay abreast of what's going on at Extropic? Do you have any closing message for us?

Yeah, well, stay tuned in the coming weeks and months. For those interested in trying out thermodynamic computing, in the coming weeks we're going to give the first users access to our systems, so you can test it yourself. It's mostly going to be a private alpha in the early days; we can't handle that many users, and we don't have that many chips. But the technology is here, it's very real, and you can kick the tires. Stay tuned for some scientific papers and some open-source software that we're going to put out. And I would say: start reading more probabilistic machine learning papers. If you're a grad student interested in the area, start reading about EBMs and the relevant textbooks. And stay tuned for more announcements from Extropic towards the end of the summer and this fall.

Well, for Machine Learning Street Talk, I'm Maxwell Ramstead. We're signing off. Thanks again, Gil. This was awesome.

Thanks, Max.

Yeah, fascinating. Really, really cool stuff. Awesome. Thank you.
