Machine Learning Street Talk

Pedro Domingos: Tensor Logic Unifies AI Paradigms

Monday, December 8, 2025 · 1h 27m
What You'll Learn

  • TensorLogic unifies symbolic AI and deep learning into a single language with tensor equations as the core construct
  • It provides automated reasoning, autodifferentiation, and GPU scalability - capabilities that previous languages lacked
  • Tensor algebra and logic programming are fundamentally the same, with tensor equations equivalent to logical rules
  • Domingos believes TensorLogic may be the best language for most current AI applications, though other languages may emerge in the future
  • The syntax and efficiency of TensorLogic are improvements over using raw Einstein summation notation in libraries like PyTorch

Episode Chapters

1. Introduction

Domingos discusses his long-standing goal of unifying the different paradigms of AI into a single language, and how TensorLogic represents the realization of that dream.

2. Limitations of Existing Languages

Domingos explains how current AI languages like PyTorch and Prolog lack key capabilities that TensorLogic aims to provide in a unified framework.

3. The Fundamentals of TensorLogic

Domingos describes the core insight that tensor algebra and logic programming are fundamentally the same, and how this forms the basis of the TensorLogic language.

4. Advantages of TensorLogic

Domingos outlines the key benefits of TensorLogic, including automated reasoning, autodifferentiation, and GPU scalability, and why it may be the best language for current AI applications.

5. Limitations and Future Directions

Domingos acknowledges that while TensorLogic is a significant advancement, there may be other languages that emerge in the future to address AI's needs.

AI Summary

In this episode, Professor Pedro Domingos discusses his new language called TensorLogic, which aims to unify the different paradigms of AI, including symbolic AI and deep learning. TensorLogic combines the tensor algebra used in deep networks with the logic programming of symbolic AI, providing a single construct - the tensor equation - to represent all AI tasks. Domingos argues that TensorLogic captures the fundamentals of AI in a way that previous languages have not, offering automated reasoning, autodifferentiation, and GPU scalability in a single framework.

Key Points

  1. TensorLogic unifies symbolic AI and deep learning into a single language with tensor equations as the core construct
  2. It provides automated reasoning, autodifferentiation, and GPU scalability - capabilities that previous languages lacked
  3. Tensor algebra and logic programming are fundamentally the same, with tensor equations equivalent to logical rules
  4. Domingos believes TensorLogic may be the best language for most current AI applications, though other languages may emerge in the future
  5. The syntax and efficiency of TensorLogic are improvements over using raw Einstein summation notation in libraries like PyTorch

Topics Discussed

#Unified AI language, #Tensor algebra, #Logic programming, #Automated reasoning, #Autodifferentiation

Frequently Asked Questions

What is "Pedro Domingos: Tensor Logic Unifies AI Paradigms" about?

In this episode, Professor Pedro Domingos discusses his new language called TensorLogic, which aims to unify the different paradigms of AI, including symbolic AI and deep learning. TensorLogic combines the tensor algebra used in deep networks with the logic programming of symbolic AI, providing a single construct - the tensor equation - to represent all AI tasks. Domingos argues that TensorLogic captures the fundamentals of AI in a way that previous languages have not, offering automated reasoning, autodifferentiation, and GPU scalability in a single framework.

What topics are discussed in this episode?

This episode covers the following topics: Unified AI language, Tensor algebra, Logic programming, Automated reasoning, Autodifferentiation.

What is key insight #1 from this episode?

TensorLogic unifies symbolic AI and deep learning into a single language with tensor equations as the core construct

What is key insight #2 from this episode?

It provides automated reasoning, autodifferentiation, and GPU scalability - capabilities that previous languages lacked

What is key insight #3 from this episode?

Tensor algebra and logic programming are fundamentally the same, with tensor equations equivalent to logical rules

What is key insight #4 from this episode?

Domingos believes TensorLogic may be the best language for most current AI applications, though other languages may emerge in the future

Who should listen to this episode?

This episode is recommended for anyone interested in Unified AI language, Tensor algebra, Logic programming, and those who want to stay updated on the latest developments in AI and technology.

Episode Description

Pedro Domingos, author of the bestselling book "The Master Algorithm," introduces his latest work: Tensor Logic - a new programming language he believes could become the fundamental language for artificial intelligence.

Think of it like this: Physics found its language in calculus. Circuit design found its language in Boolean logic. Pedro argues that AI has been missing its language - until now.

**SPONSOR MESSAGES START**
Build your ideas with AI Studio from Google - http://ai.studio/build
Prolific - Quality data. From real people. For faster breakthroughs. https://www.prolific.com/?utm_source=mlst
cyber•Fund https://cyber.fund/?utm_source=mlst is a founder-led investment firm accelerating the cybernetic economy
Hiring a SF VC Principal: https://talent.cyber.fund/companies/cyber-fund-2/jobs/57674170-ai-investment-principal#content?utm_source=mlst
Submit investment deck: https://cyber.fund/contact?utm_source=mlst
**END**

Current AI is split between two worlds that don't play well together:

Deep Learning (neural networks, transformers, ChatGPT) - great at learning from data, terrible at logical reasoning
Symbolic AI (logic programming, expert systems) - great at logical reasoning, terrible at learning from messy real-world data

Tensor Logic unifies both. It's a single language where you can:
Write logical rules that the system can actually learn and modify
Do transparent, verifiable reasoning (no hallucinations)
Mix "fuzzy" analogical thinking with rock-solid deduction

INTERACTIVE TRANSCRIPT:
https://app.rescript.info/public/share/NP4vZQ-GTETeN_roB2vg64vbEcN7isjJtz4C86WSOhw

TOC:
00:00:00 - Introduction
00:04:41 - What is Tensor Logic?
00:09:59 - Tensor Logic vs PyTorch & Einsum
00:17:50 - The Master Algorithm Connection
00:20:41 - Predicate Invention & Learning New Concepts
00:31:22 - Symmetries in AI & Physics
00:35:30 - Computational Reducibility & The Universe
00:43:34 - Technical Details: RNN Implementation
00:45:35 - Turing Completeness Debate
00:56:45 - Transformers vs Turing Machines
01:02:32 - Reasoning in Embedding Space
01:11:46 - Solving Hallucination with Deductive Modes
01:16:17 - Adoption Strategy & Migration Path
01:21:50 - AI Education & Abstraction
01:24:50 - The Trillion-Dollar Waste

REFS:
Tensor Logic: The Language of AI [Pedro Domingos]
https://arxiv.org/abs/2510.12269
The Master Algorithm [Pedro Domingos]
https://www.amazon.co.uk/Master-Algorithm-Ultimate-Learning-Machine/dp/0241004543
Einsum is All You Need [Tim Rocktäschel]
https://rockt.ai/2018/04/30/einsum
https://www.youtube.com/watch?v=6DrCq8Ry2cw
Autoregressive Large Language Models are Computationally Universal [Dale Schuurmans et al. - GDM]
https://arxiv.org/abs/2410.03170
Memory Augmented Large Language Models are Computationally Universal [Dale Schuurmans]
https://arxiv.org/pdf/2301.04589
On the Computational Power of Neural Nets [Siegelmann, 1995]
https://binds.cs.umass.edu/papers/1995_Siegelmann_JComSysSci.pdf
Sebastien Bubeck
https://www.reddit.com/r/OpenAI/comments/1oacp38/openai_researcher_sebastian_bubeck_falsely_claims/
I Am a Strange Loop [Douglas Hofstadter]
https://www.amazon.co.uk/Am-Strange-Loop-Douglas-Hofstadter/dp/0465030793
Stephen Wolfram
https://www.youtube.com/watch?v=dkpDjd2nHgo
The Complex World: An Introduction to the Foundations of Complexity Science [David C. Krakauer]
https://www.amazon.co.uk/Complex-World-Introduction-Foundations-Complexity/dp/1947864629
Geometric Deep Learning
https://www.youtube.com/watch?v=bIZB1hIJ4u8
Andrew Wilson (NYU)
https://www.youtube.com/watch?v=M-jTeBCEGHc
Yi Ma
https://www.patreon.com/posts/yi-ma-scientific-141953348
Roger Penrose - The Road to Reality
https://www.amazon.co.uk/Road-Reality-Complete-Guide-Universe/dp/0099440687
Artificial Intelligence: A Modern Approach [Russell and Norvig]
https://www.amazon.co.uk/Artificial-Intelligence-Modern-Approach-Global/dp/1292153962

Full Transcript

Tensor logic unifies not just symbolic AI and deep learning. It also unifies things like kernel machines and graphical models. I'm Pedro Domingos. I'm a professor of computer science at the University of Washington in Seattle and a long-time machine learning researcher. And my dream from my PhD onwards has always been to unify all the different paradigms of AI into a single one. My PhD unified two of them. My best-known research unifies a couple of others of them. I wrote this book that turned into a big bestseller, surprisingly, called The Master Algorithm, that is precisely about this goal and where we are towards that goal. And my latest work, which this podcast will talk about, is a new language called tensor logic that I would say for the first time brings this dream of a unified representation, a unified solution to AI within reach. So if you want to find out, you know, how are we going to do that, watch this podcast. You know, I can set the temperature of GPT to zero and it still hallucinates. And I can have a poor deductive system that hallucinates all kinds of things. So to me, like, those are separate problems. No, very good. So precisely the problem, or one of the problems, with GPT is that it hallucinates even when you set the temperature to zero. What the hell, right? I want to have a mode, right? Not I, but, like, every Fortune 500 company, if it's going to use AI, needs to have a mode where the logic of the business is just obeyed, the security isn't violated, the customer doesn't get lied to, et cetera, et cetera. We've got to have that, or they will not take off, right? And transformers can't do that. Tensor logic can do that precisely because in this reasoning in embedding space mode that I just described, if you set the temperature to zero, it does purely deductive reasoning. And by the way, the temperature can be different for each rule. Tensor logic is just based on this, to me, gobsmacking observation that an einsum and a rule in logic programming are the same thing. There is this thing called predicate invention, which is discovering new predicates, discovering new relations that are not in the data, but they explain it better. I would say that, you know, in some sense, discovering representations like that is the key problem in AI, is the holy grail. What was Turing's achievement that we now take for granted? Turing's achievement, for which he is deservedly famous, right, is to postulate this notion of a universal machine. The amazing thing about computers is the universal machine, which in his time was a completely counterintuitive notion. What do you mean, a machine that can do everything? The typewriter can type, you know, like the sewing machine can sew. You're telling me there's a machine that can type with one hand and sew with the other? What are you talking about? So, like, this is the genius, right? So first step, you want to have this property of having a machine that can do anything.
What we're missing to be able to do what the universe does and evolution does is universal induction. What is the Turing machine equivalent for induction, for learning? That's what I'm after. MLST is supported by Cyberfund. Link in the description. The idea of having to traffic in squishy people in order to make our systems go is not immediately appealing, let's put it that way. This episode is sponsored by Prolific. Let's get a few quality examples in. Let's get the right humans in to get the right quality of human feedback in. So we're trying to make human data, or human feedback, we treat it as an infrastructure problem. We try to make it accessible. We're making it cheaper. We effectively democratize access to this data. I'm a longtime fan of Machine Learning Street Talk. In fact, I was a fan of it before it was big, just like I was doing deep learning before it was big. So very close analogy. So you should definitely watch Machine Learning Street Talk. It's one of the best ways to not only learn about machine learning, but find out about what's going on at a deeper level than you see everywhere. And that is very important. So you should definitely subscribe to Machine Learning Street Talk. Professor Pedro Domingos, it's amazing to have you back on MLST. I've lost count of how many times we've had you on the show now, so it's amazing to have you back. The main reason that we've invited you today is you've just released a brand new paper, a very exciting paper called Tensor Logic: The Language of AI, and fields, you said, take off when they find their language. So you gave the example of calculus in physics and, you know, Boolean logic when designing circuits. What's the idea behind this paper? Well, tensor logic in many ways is the goal that I've been working towards my entire professional life, because I really do strongly believe that a field cannot take off until it has really found its language. And tensor logic, I believe, is the first language that really has all the key properties that you need in the language of AI. For example, it has automated reasoning right out of the box, like, for example, Prolog has, right? The classic AI languages had a number of things that we just took for granted. The transparent and reliable reasoning, you didn't even have to worry about. It was just already available, right? At the same time, you don't have that in PyTorch at all, right? You have all these hacks to try and do reasoning on top of it. At the same time, the Lisps and the Prologs, they never had the auto-differentiation, the ability to learn, right? One of the beauties of the current moment in many ways is that you barely have, you look at most papers, people barely talk about the learning because it's already implemented under the hood. So you want that as well, right? And you want the scalability on GPUs, which is the other thing that things like, you know, PyTorch and TensorFlow and whatnot give you. There was no language before that had all of these, and there's a number of others, but these maybe are some of the key ones. So tensor logic is basically a language which, as the name implies, is a marriage, a very deep unification, not just some superficial combination, of the tensor algebra that deep networks are all built out of, and the logic programming that symbolic AI is built out of. There's only one construct in tensor logic, and it's the tensor equation. You can do everything with tensor equations. Are you saying that there's only one language of AI?
Because certainly in some fields, like physics, you gave the example of calculus. I mean, yeah, like, you know, almost all of calculus, I mean, almost all of physics, you know, involves quite a bit of calculus. There are other fields where actually there are kind of multiple languages that play, you know, almost equal roles. So I'm wondering if you think that, is tensor logic going to be 85, 90-plus percent of the way that we should be talking about and thinking about AI, or will it be kind of a mixture of different languages? That is a very good question. And in fact, we know very well in computer science that there's no one programming language that is better for everything. There's just people who think it is. Everybody has their favorite language that they believe is the universal solvent, but it never really is. And we know also for fundamental reasons, like going back to Shannon and whatnot, that there is no language that is the most pithy for anything you might want to say. Having said that, physics is a good example because calculus is so fundamental. Feynman famously said that he thought in calculus. And the thing that I found with tensor logic is that, you know, I don't know how much of AI it's going to be or how much it should be. But what I have found, in many ways to my surprise, is that in some ways tensor logic is more than just a programming language. It really, I think, captures the fundamentals of what you need in AI in a way that going in I didn't even think was possible. All of tensor algebra can be reduced to this operation, which, you know, going back to physics, is called the Einstein summation, right? Einstein summation was something that was introduced by Einstein when he was working on relativity and got tired of writing summation signs. It was all about tensors, right? General relativity is all about tensors. And he jokingly called it his great contribution to mathematics. But the bottom line, and, you know, there's this great paper by Tim Rocktäschel, or, you know, blog post, saying Einsum is all you need. And truly, you can do all of deep learning with just einsum. All of the matrix multiplications and, you know, tensor products, all of that are instances of einsum on the one hand. On the other hand, in symbolic AI, it's all about rules, right? And tensor logic is just based on this, to me, gobsmacking observation that an einsum and a rule in logic programming are the same thing. They are actually the same thing. The only difference is that one is operating on real numbers and the other one is operating on Booleans, but that's just a different atomic data type. And then on top of that, so to summarize, at this point, I think that it would be, I look at all different things that I and others have done in AI, and I think it would be crazy to not do these things with tensor logic. There may be other better things coming after, but at this point, I would say tensor logic is probably better for what people are doing across the board. But hey, that's me. I may be a little biased. First of all, shout out to Tim Rocktäschel. I read that blog post from 2018 earlier that you were referring to. But I suppose the thought occurs that if it is mostly about einsum and, you know, you might make the argument, why do we need an abstraction when we already have a great abstraction in einsum. So folks now can use PyTorch and, you know, JAX. What exactly does your abstraction allow them to do that they can't do with PyTorch? No, very good. It does several things.
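To make the "you can do all of deep learning with just einsum" point concrete, here is a minimal NumPy illustration (an editorial aside, not code from the paper): the same Einstein summation string expresses a matrix product or batched attention scores, and the next sketch further down shows the Boolean, logic-programming face of the same construct.

```python
import numpy as np

# Ordinary matrix multiplication as an einsum:
A = np.random.rand(3, 4)
B = np.random.rand(4, 5)
C = np.einsum('ik,kj->ij', A, B)          # C[i,j] = sum_k A[i,k] * B[k,j]
assert np.allclose(C, A @ B)

# A more "deep learning" einsum: batched attention scores from queries and keys,
# scores[b, i, j] = sum_d Q[b, i, d] * K[b, j, d]
Q = np.random.rand(2, 7, 16)
K = np.random.rand(2, 9, 16)
scores = np.einsum('bid,bjd->bij', Q, K)
assert scores.shape == (2, 7, 9)
```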
So first of all, and this is going to be in increasing order of importance, the syntax of einsum (in this language there's also this package called einops) is incredibly clunky. So at a very basic level, tensor logic is just a much pithier, more compact, easier to write and understand way to write einsums. And you know, physicists and mathematicians famously like to say that a good notation is half the battle. So this might not seem like a big deal, but my experience is that you can just think better and faster once you have this notation, instead of this funky procedure call with these indices and these arrows and these arguments. It's a nightmare. And, you know, the syntax of tensor logic is like, you write an einsum like you would write a rule. There's a tensor equation with a tensor on the left-hand side and this join of tensors on the right-hand side, right? So this is one aspect. Another very important aspect, and one that I think could prove decisive, is that people don't use einsum much because it's not very efficient. Under the hood, it's not as efficient as, you know, sometimes, you know, this could be done so much better, right? But I've done some programming of this, and I wind up, even I wind up not using einsum because it's so slow and clunky. And all of that can be fixed. Once you have this one abstraction of the tensor equation, and you implement it on CUDA, for example, you can optimize the heck out of it, and you'll just be able to, you know, einsum will finally be able to reach its potential, right? But actually, none of these things are actually the most important part. The most important part is that the einsum as we know it is only good for tensor algebra. Tensor logic is a language where the same construct does all the symbolic and all the numeric parts and any mix and variation between them, including learning the symbolic part and whatnot. These are all things that in the einsum world just didn't exist, right? You talk about the people who knew einsum, whether in AI or mathematics or physics, and they just had no idea that any of this had anything to do with reasoning. You look at all the ways that people are trying to do reasoning today and you just want to pull out your hair. Let me ask a very concrete, you know, in some sense, I'm a simple, simple man. I need like a very concrete example, because I completely agree with you that the symbols we use, the language we use, their simplicity is so fundamental to our ability to, like, reason at higher and higher levels. So let's take one example from your paper, which is a logical OR: a logical OR of a bunch of values is equivalent to an einsum with a Heaviside function applied to it. Like you give this example, right? The OR, just to be precise, what I did, and maybe this is an important piece of context. So if you look at, so the simplest form of logic programming is Datalog, right, which is the foundation of databases, right? You know, most SQL queries are variations on Datalog rules. And Datalog rules are composed of two things, joins and projections. This is like databases 101. And what I have done is I have generalized join and projection to numeric values. There's this thing which I defined called the tensor join and the tensor projection, which when the tensors are Boolean become the regular symbolic database ones. But now the numeric version has all these things as special cases. And by the way, it's also more general than the einsum, right? So another benefit of this is that it actually goes beyond the einsum.
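As a purely illustrative rendering of that Datalog view in NumPy (the relation names here are mine, not the paper's): a rule like Grandparent(x, z) if Parent(x, y) and Parent(y, z) is a join of two Boolean tensors on the shared index y, followed by a projection (summation) over y and a step back to Boolean.

```python
import numpy as np

people = ['alice', 'bob', 'carol', 'dave']
n = len(people)

# Parent[x, y] == 1 means x is a parent of y
Parent = np.zeros((n, n), dtype=int)
Parent[0, 1] = 1   # alice -> bob
Parent[1, 2] = 1   # bob -> carol
Parent[1, 3] = 1   # bob -> dave

# Grandparent(x, z) <- Parent(x, y), Parent(y, z):
# join on the shared index y, then project y out by summing over it
Grandparent = np.einsum('xy,yz->xz', Parent, Parent)
Grandparent = (Grandparent > 0).astype(int)   # step function back to Boolean

print([(people[x], people[z]) for x, z in zip(*np.nonzero(Grandparent))])
# [('alice', 'carol'), ('alice', 'dave')]
```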
Now, an OR, right, the way you get an OR, just as in, so how do you get an OR in Prolog or Datalog? It's by having multiple rules with the same head. And then those rules, they're implicitly being disjoined. So if I have A if B and C, and A if D and E, then that means A is B and C, or D and E. And the same thing happens here. And you could also, of course, just put them all in the same equation, because it might be more convenient to just say, well, A is B C plus D E. So doing an OR is a completely straightforward thing, but it's really not where the main action is. It's in the tensor joins and tensor projections. Thank you for laying all that out. Completely agreed. Makes sense. I just wanted, the very example I was giving is that an einsum over a particular index of Boolean values, then with a Heaviside function applied to it, which is just zero if it's zero or less and one if it's greater than zero, is equivalent to a logical OR over that same index. Oh, yeah. Sorry. I see. I understand your question. So again, it's more than that, it's a DNF. So a DNF is a disjunction of conjunctions. Yeah, exactly. And so what happens is, the einsum, so in numeric land, think of a dot product. A dot product is just a sum of products, which in Boolean land will be a disjunction of conjunctions. If more than one is true, you get a number that is greater than one, which is why you need to pass it through a step function to reduce all the values greater than one back to one. Yes, but my question was more like, okay, I have these two different representations of the same operation, at least at the element level, like an einsum over an index followed by a Heaviside on that element is equivalent to an OR over all the same Boolean values of that index, and I guess, my question, or per element, so my question to you is, I give up one thing, which is instead of having a single symbol which is kind of like an OR, I've now got two operations, you know, einsum and Heaviside. And there are many examples of that, right? Like I can build every single circuit out of NAND gates. I think we discussed this like once actually. Or I can have like other kinds of gates. And it's useful to have other kinds of gates. So in your language, do you foresee people not having syntactic sugar, like an OR operator, which under the hood is einsum plus Heaviside? Or would they still retain those? It's just that the fundamental, you know, the most basic, you know, constructs of the language are tensor logic. We can do everything with NAND. So why do we need high-level programming languages at all? Right. That's the point. So there's two things that you want the language to be. First of all, you want it to be universal. For some things you don't, but in general, right, for AI, surely you want a universal language. You want something Turing complete, and tensor logic is that. But then, this is actually the most important and most difficult part. You want something that is at the right level of abstraction for the things that you want to do. And NAND definitely is not. And I can show, with a lot of examples I have in the paper, for example, you can code a transformer in a dozen tensor equations, as opposed to a vast mass of code. And then what happens when people have a language that suits their needs is that then they just get used to that. They often wind up using it even for things that it wasn't the perfect thing for. But at that point, it's what they're comfortable with.
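A minimal NumPy sketch of the point being debated here (illustrative only, not tensor logic syntax): a dot product of Boolean vectors counts how many conjunctions fired, and a Heaviside step collapses that count back to a disjunction; two rules with the same head just add before the step.

```python
import numpy as np

def heaviside(x):
    # step function: 1 where x > 0, else 0
    return (np.asarray(x) > 0).astype(int)

# OR over k of (a[k] AND b[k])  ==  step( sum_k a[k] * b[k] )
a = np.array([1, 0, 1, 0])
b = np.array([0, 0, 1, 1])
dnf = heaviside(np.einsum('k,k->', a, b))   # sum of products, then step
assert dnf == int(any(ai and bi for ai, bi in zip(a, b)))

# multiple rules with the same head: A <- B,C and A <- D,E
B, C, D, E = 0, 1, 1, 1
A = heaviside(B * C + D * E)                # the two rule bodies add, then step
assert A == int((B and C) or (D and E))
```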
So my guess is at the end of the day, people are just going to do everything in tensor logic. And they know, you know, in the back of their heads that, yes, there are ORs going on here. And you could think of them as ORs. But they just think of them as joins and projections and tensor equations. Very good. And by the way, you can implement transformers and anything else in tensor logic. It's so easy, in fact, that I fed your paper into Claude Code, and I got it to implement the whole lot this afternoon. And maybe I'll publish that on GitHub if folks want to have a look. But it's quite straightforward. But just to get the trajectory a little bit here, Pedro, you're famous for writing this master algorithm book. And in that book, you spoke about all of these different tribes in machine learning, you know, like Bayesian folks and logic folks and kernel methods and neural networks and all of this. And I guess, do you see this as a step towards unifying these things together? Because now in tensor logic, you can actually create a composition of different modalities of AI, and it just works. But this might seem a bit weird to people. I mean, can you explain what that might actually look like? Absolutely. So in a way, the master algorithm was laying out my agenda, right, was asking the question, what is the master algorithm? I did say at the outset, I'm not going to give you the master algorithm in this book. I'm just going to tell you where we are and why I think this is the central goal of AI. I would say that tensor logic is that answer. Tensor logic, we haven't talked about that yet, but tensor logic unifies not just symbolic AI and deep learning. It also unifies things like kernel machines and graphical models. The things that graphical models, for example, are built out of, and then you can compute probabilities with them, they are a direct... I didn't do this on purpose, but it just fell out. You know, the factors that graphical models are made of, those are just tensors. And then the marginalization and summation, sorry, the marginalization and the point-wise products that are what probabilistic inferences are made of, they are just tensor joins and projections on those tensors that represent potentials or, in the case of Bayesian networks, conditional distributions. So at this point, we do have this very simple language where you can do the entire gamut of AI, which honestly, I didn't think this was going to be possible going in. I thought the answer would be much more complicated. Now, is this the master algorithm? TensorLogic per se is not the master algorithm because it's just a language. I would say that it's the scaffolding on top of which you can build a master algorithm. Now, TensorLogic is not just a language. It's also the learning and reasoning facilities under the hood. So, for example, one of the best things about tensor logic is that the autograd is incredibly simple, because there's just one construct, the tensor equation, and the gradient of a tensor logic program is just another tensor logic program. So this is all there. So the learning and the reasoning are all there. However, you know, what I would say is that this is not the master algorithm per se, but it's what we need to produce it, and I intend to produce it in short order. You've described a language, and certainly whether that language is Turing complete is a big vexed issue. We'll come back to that a little bit later.
But because of computational equivalence, we can, you know, from an expressibility point of view, we can describe anything in the universe. So we've got this framework. But to me, the challenge in AI is structure learning, right? So as well as being able to express stuff, it's being able to adapt to novelty and create, perhaps from building blocks that we already have, a new structure to allow us to do something useful in that domain. And I can't quite make that leap with your technology yet. So how do we do the meta thing where we actually build the tensor logic constructions to represent the kind of world that we're seeing? Oh, very good. So I actually go into that in the paper, you know, but briefly, the paper is just, you know, an informal introduction to these ideas. Inductive logic programming, right, is the field that deals with discovering rules from data. But it does this by things like greedy search or beam search, and it's a very large search space, and it's extremely inefficient, right? Which is actually one of the things that killed it, even though it could do all these things that people in deep learning are just painfully rediscovering. In tensor logic, and this is one of the best parts of it, the structure learning falls out of the gradient descent. The gradient descent actually does structure learning. And then on top of that, and this is actually the best part as far as the learning is concerned, there is this thing called predicate invention, which is discovering new predicates, discovering new relations that are not in the data, but they explain it better. I would say that, you know, in some sense, discovering representations like that is the key problem in AI, is the holy grail. Everything that we know, you know, like when you look at the world, right, you don't see pixels. You don't see photons hitting your retina, right? You see objects. The objects are invented predicates. All the way up to science, right? The most, like, Newton's genius was to introduce a new quantity, which is force, and energy, and entropy, and all these, et cetera, et cetera, right? So in tensor logic, that also just happens by gradient descent, right? It's hard to believe, but let me just give you a hint as to why this is the case. There's this other thing that is folded into tensor logic, which is tensor decompositions. And tensor decompositions are a generalization of matrix decompositions. And if you think about matrix decompositions, to take that simple case, what a matrix decomposition does is it takes a matrix and decomposes it into two new matrices that together are more compact but essentially reproduce the same data. And there's a generalization of that to tensors called the Tucker decomposition. There's others, but the Tucker one is the most relevant one here. And so if you write in tensor logic, to answer your question very directly, a rule schema, including a data tensor on the left-hand side. And by the way, your entire data can just be reduced to one tensor, embedded as one tensor. We can touch on that later. But you write a rule expressing that as a function of a few other tensors, and the gradient descent, just as in matrix factorization, will discover the best values for those. And then if you want to, for example, then discretize it, say like I'm going to threshold this and make it Boolean again, you will see what is the concept that they've learned, or you can leave it in numeric form. So the learning is actually extraordinarily powerful.
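As a toy, hedged illustration of what "predicate invention by gradient descent" could look like (this is a generic low-rank factorization sketch in PyTorch, not the actual tensor logic implementation): a Boolean relation R[x, z] is explained as the join of two narrower learned tensors through a small invented index c, and thresholding the learned factors reads the new "predicate" back out.

```python
import torch

torch.manual_seed(0)
n, k = 6, 2                                # 6 entities, 2 invented concepts

# Toy relation Likes[x, z] with a hidden block structure (two "communities")
R = torch.zeros(n, n)
R[:3, :3] = 1.0
R[3:, 3:] = 1.0

# Rule schema: R[x, z] ~ step( sum_c A[x, c] * B[c, z] ), with A, B to be learned
A = torch.randn(n, k, requires_grad=True)
B = torch.randn(k, n, requires_grad=True)
opt = torch.optim.Adam([A, B], lr=0.1)

for _ in range(500):
    pred = torch.sigmoid(torch.einsum('xc,cz->xz', A, B))   # soft join + projection
    loss = torch.nn.functional.binary_cross_entropy(pred, R)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Discretize the learned factor: which entities satisfy each invented predicate
print((A.detach() > 0).int())
```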
I've always thought, and I think a lot of people in deep learning really believe this, that gradient descent can do amazing things provided you give it the right architecture to operate on. And in a way, what all these million papers are about is about finding the right architecture for gradient descent to operate on. And of course, transformers are a great leap forward, but I think tensor logic, I think, you know, tensor logic is an even greater leap forward. How so? Because, for example, I can picture, suppose we want to get rid of Python. So, like, I'm over here in, you know, PyTorch and I've described all my layers kind of in the clunky syntax, and instead I'm like, no, now I have the tensor logic programming language from GitHub. Let me go do it there. I'm still going to construct my layers, right? Because, like, for example, you do, of course, you allow for the nonlinearities, right? So after every einsum, I can apply whatever kind of nonlinear function I want, a ReLU or sigmoid or whatever else, right? That's still going to be described in my program. It's like, I'm going to have this shape einsum followed by this nonlinearity, feeds into this shape, followed by... so I'm still going to have to do that kind of, like, you know, structuring of the network, if you will, except now in tensor logic. And in my opinion, that's one of the biggest limitations right now: these are all just divined incantation structures that people have come up with, like let's put in a dropout layer here and this kind of layer there and there. We don't actually allow the machines to learn the overall topological structure. We only allow them to find weights within that structure. No, but okay, I understand your question, but tensor logic does allow that. Step one, you can encode a multi-layer perceptron, the entire multi-layer perceptron, and I do that in the paper, with a single tensor equation. All the layers, provided that they all use the same nonlinearity, can be encoded in one equation. Okay, number one. You can also have different equations for different layers or typically sets of layers, you know, however way you please. But from the point of view of structure discovery, the thing to realize is that if you create, if you set up one of these very general equations that you can in tensor logic, in some sense it covers a very broad class of architectures. Then what the learning does is it discovers the architecture within that space, right? Which, if you think about it at some level, is what? A neural network, when you compare an ordinary multilayer perceptron with a set of rules, right? A multilayer perceptron is, you can take, in fact, there was a system called KBANN in the early days that did this, very clever. It initialized a multilayer perceptron with a set of rules, because each neuron is a rule, right? But it's also more flexible, because now you can have weights, right? But, you know, a neuron, a single neuron can represent a conjunction, and therefore a layer can represent a disjunction, and so forth. So when you're learning weights in an ordinary neural network, you can actually see it as learning the structure of a set of rules. What tensor logic is doing, this is at a more powerful level. Like that was just propositional, and now this is at the full level of generality of first-order logic.
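A hedged sketch of what "a whole MLP in one tensor equation" amounts to, written with NumPy einsums (the index names below are mine, not the paper's notation):

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda t: np.maximum(t, 0)

X  = rng.normal(size=(32, 10))     # X[n, i]  : batch of inputs
W1 = rng.normal(size=(20, 10))     # W1[h, i] : first-layer weights
W2 = rng.normal(size=(5, 20))      # W2[o, h] : second-layer weights

# One nested "tensor equation":
#   Y[n, o] = relu( sum_h W2[o, h] * relu( sum_i W1[h, i] * X[n, i] ) )
Y = relu(np.einsum('oh,nh->no', W2,
         relu(np.einsum('hi,ni->nh', W1, X))))
assert Y.shape == (32, 5)
```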
But you can learn the structure, and then of course there's more than one way to do that, and you can also decide how black and white you want the structure to be, what you want to leave as weights, what you want to discretize. But the structure itself can be learned by taking a tensor equation. A tensor equation is a very general thing, right? When you learn the weights of those tensors, that materializes to a specific network structure. Yeah, so I understand that. Let me bring this back to the folks who are familiar with PyTorch or traditional techniques. What you described is, yeah, I can just create a fully connected network with however many layers I want and then let SGD find all the weights. That doesn't work. It doesn't work in practice and it's not going to work with tensor logic. It's just a different representation of the same fundamental problem, which is there's too many degrees of freedom, it's not going to learn anything useful. This is why so much alchemy goes into structuring, you know, constrained networks to have certain, you know, built-in, you know, inductive bias, right? No, absolutely. So to take another example, you can also do an entire convnet in just one tensor equation. And, you know, like the quintessential example of, like, yes, full connections don't work, is a multi-layer perceptron for vision, right, which you replace with the convnet that actually has the local structure. That is also a tensor logic equation. Now you're saying, well, how do you choose between the convnet and an MLP, right? Very good question. And now there's a range of things you can do. You can actually these days start out with a very general structure because, I mean, GPUs and large, you know, server farms have an amazing amount of power for something like this, right? So you can almost, I would say, brute force that search, provided you have the data. I'm not actually recommending you do that, right? You can also, however, and more interestingly, you can, and this is actually one of the key benefits of tensor logic, is that you can write down what you believe are properties of the structure. Say, right now, what happens when you, for example, program a network in Python is, like, you have to commit. You say, like, here's the structure. And now, you know, the only thing that happens is the learning of the weights. In tensor logic, you don't have to do that. You can set up one of these very general structures, and then you say, let me give you a bunch of equations that are things that I believe to be true about the structure but do not completely determine it. And those just work like priors, and indeed like soft priors, right? And then again, you can turn the temperature on this up or down and say, like, you've got to obey this equation, and that one, you know, I'm sure, you can override, right? And then, in my experience, this is actually what is important: that the gradient descent, instead of starting from a tabula rasa, has this kind of soft knowledge. And then most importantly, you, the developer, you, the AI researcher, you get to, and this is really the essence, every deep learning researcher or data scientist knows this, it's like you don't just write down the prior and then push the button and hope for the best, right? It's like there's an iterative loop of you set up the structure and then you learn, you get the results, and then you refine the structure.
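For the convnet claim, here is one way (among several) to write a 2D convolution as a single einsum in PyTorch; this is a generic unfold-plus-einsum sketch to show the local structure, not tensor logic's own notation.

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)          # input:  N, C_in, H, W
w = torch.randn(4, 3, 3, 3)          # kernel: C_out, C_in, kH, kW

# Gather all 3x3 local patches, then the convolution is one einsum over them
patches = F.unfold(x, kernel_size=3)             # N, C_in*kH*kW, L (L = 6*6)
patches = patches.view(1, 3, 3, 3, -1)           # N, C_in, kH, kW, L
y = torch.einsum('ocij,ncijl->nol', w, patches)  # the conv as one einsum
y = y.view(1, 4, 6, 6)

# sanity check against the library convolution
assert torch.allclose(y, F.conv2d(x, w), atol=1e-5)
```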
And what this does is it makes that much more efficient, because you just have to, in your interpreter, write one more equation or modify an existing equation. And also the entire stack of what you learn is much more interpretable than it was before. It's actually in some ways one of the most important properties of tensor logic, that you can understand what's going on much better than you could, in two ways. One is that the code is much more transparent than the whole pile of things that you have sitting under a bunch of, you know, PyTorch procedure calls. But also the result of learning, at least if you do it in certain ways that I discussed in the paper, the result of learning is transparent in a way that a transformer just, you know, can't hope to be. So we've covered some interesting topics on MLST before. I mean, of course, there's geometric deep learning, which is this idea that symmetries are fundamental. We've spoken with Andrew Wilson from NYU recently about soft inductive priors, and I've just spoken with Yi Ma about his CRATE series of architectures. And I guess the prevalent idea here is almost Platonistic, that there are real natural patterns. And if we kind of bias the model, as you were just alluding to, it will converge on really good representations that describe reality. Now, the alternative view is that reality is constructive and gnarly and that won't work. But you were talking about your Tucker decomposition earlier. And that's this idea that, you know, we might have a large sparse matrix. We might want to densify it. We might want to factorize it. And the factorization will kind of pull out some of these natural orderings, you know, of the universe, perhaps. And I guess I was thinking, isn't it a bit like a GZIP algorithm? I mean, what if these factorizations are just semantically meaningless? You know, how do you know that you've got a good one? You know, great question. And you've touched on several things there. Let me start with, you know, the geometric deep learning, right? I'm a big fan of this. In fact, you know, I gave a keynote at the second ICLR on something that I called symmetry-based learning, which is in some ways an ancestor of geometric deep learning. I really do think that the universe possesses these fundamental symmetries. Actually, I don't just think that, this is known, right? In physics, right, the standard model is basically a bunch of symmetries, and this is extraordinarily powerful, right, that such simple things could be such universal regularities that you then basically can build everything else out of, right? And if you think about it in machine learning, the problem is, like, what is the learning bias that you should start from, right? Should you pull in a lot of knowledge? Should you have a very, you know, very vague architecture? The thing about machine learning, and there's the no free lunch theorem, right, that says, you know, if you don't assume anything, you can't ever learn anything. The thing that's amazing about machine learning is that with very weak biases you can get very far, right? And I would submit that those weak biases, fundamentally, at the end of the day, the most important ones are these symmetries. And tensor logic is precisely, you know, I think the perfect language for expressing those symmetries, as the physicists will tell you, right? It's what they use in, like, not the logical, you know, version, but the numeric version, right? So I think we can discover those regularities.
I have some suspicions as to what they might be, but I think, you know, we're not quite there yet. But I think once we have those regularities, in some sense, you know, they will play in AI the role that the standard model plays in physics, right? Now, of course, as you say, you know, there are people who say, like, oh, forget that, right? You know, going back to Marvin Minsky, right? There's like, there is no small set of AI laws or anything. It's just one damn thing after another, blah, blah, blah, blah, right? Like you're dreaming, right? And I respect that point of view, right? And, you know, we will find out empirically, but if I had to guess how this is going to play out at the end of the day, it's going to be like this. The stuff that I'm talking about gives you, you know, the 80-20, you know, it gets you 80% of the way. And then for the other 20% of the way, you have to do a lot of these things, you have to do a lot of hacks, et cetera, et cetera. But something like tensor logic still makes it much easier and faster to do those hacks than if you didn't have it. So it actually gives you a benefit both in the 80% part and in the 20% part. There are folks, you know, complexity science. There's this guy called David Krakauer. And in his book, on the first page, actually, the very first sentence is about the scientific and social implications of differences between, A, closed, reversible, symmetry-dominated and predictable classical domains, I think that's what you're talking about, the kind of Roger Penrose type world, and, B, open, self-organizing, dissipative, uncertain, and adaptive domains. Now, I think the latter is where all the interesting stuff in the universe is. It's where life and intelligence and all the stuff we want to model are. And could it be the case that those things are not reducible in the way that you're arguing they are? I'm glad you asked that question, because this really is the crux of the matter. Also, you're probably familiar. I know you're familiar, because we've talked about it before. Stephen Wolfram's notion of computational irreducibility, right? Yes. And of course, the whole notion that we now understand very well that systems are, you know, many systems are chaotic and therefore inherently unpredictable, right? And, you know, complex systems and all of that. But so where there's, you know, the whole notion that, you know, more is different, right? Like very famous, you know, notion in... Yeah, Ross Anderson. Exactly, which I'm a very strong believer in. So doesn't that contradict what I just said? Actually, no. I would say the following: from physics all the way to AI, with biology in the middle, the universe is basically composed of two things, symmetries and spontaneous symmetry breakings. God made the symmetries. The symmetries are the laws. As far as we can tell, none of these systems at any level violate the laws. Those symmetries are there. I mean, you can go into that. There's a lot to be said there. But essentially, you know, most people, the great majority of people, there are some exceptions, but they believe that the laws of physics apply to everything. Like my brain obeys the laws of physics, society obeys the laws of physics. The problem is that the laws of physics are useless at some point in understanding, you know, even biology, let alone psychology or sociology or AI. Why are they useless?
Because we have inherited from the beginning of the universe a series of spontaneous symmetry breakings, right? And my brain is doing spontaneous symmetry breakings one after another, continuously. And those, some of them die out, right, or become irrelevant, stay the same, but others balloon into very big things. And that's actually what evolution is, is one of these things after another. And once you have that, so the computational irreducibility problem is that, at some level, although in principle this is all predictable and reducible, in practice, it isn't, right? But now here's the point. It's like, how do we handle that? Our brains know how to handle this in a way that AI doesn't. And the way they handle this is like, you predict, you computationally reduce everything you can to begin with. And I'm actually, I've talked with Steve, you know, at some length about this. And I'm actually much more optimistic about how much is reducible than he is. And the thing is, your overall universe is not reducible, but it's full of these reducible pieces. And in a way, evolution is accumulating, our brain is an accumulation of, these reducible pieces. So you do that. You want the machine learning to discover it. You want the inference to exploit it. But then, after that, you have no choice but to just keep gathering data and using that to inform your predictions, right? In a way, the physics goal of, like, I give you the initial conditions and then I just predict, the Laplace's demon dream, it is a dream. But I think that the problem that some of the complex systems people have not realized is that we don't have to do that. Ask any engineer, any aerospace engineer using a Kalman filter. What you do is you predict just what's going to, or reinforcement learning, right? It's like you want to have a sense of where you're going, but at every step of time, you recalibrate your predictions with the new data that comes in. So you actually only need to predict things well enough to control them, to make them predictable, right? We humans are always controlling the world to make it more predictable. And this is what robots need to do as well. And this is sort of like what I'm trying to, you know, support with a language like tensor logic. I'm increasingly more of a believer in kind of Hofstadter's, you know, concepts, right? That there are multiple levels of description. And even within a level of description, there may be multiple languages, you know, to describe things at that level. And I think part of the lesson is not only do we observe, like not only do we kind of observe a particular level, and sure, we try to reduce things and come up with theories at finer-grain levels, higher resolution theories or whatever, but we also observe a certain layer and we're able to, by whatever sort of miraculous mechanism, almost pull out of thin air, to abduct, a theory at this level. Like, here's thermodynamics. Somehow we came up with that, right? And even if we learn theories at lower levels or higher resolution theories, actually, most of the time you don't replace those older ones. It's like within their domain of operation, you know, Newtonian mechanics is still extremely useful for lots of things that have to do with our scale, right? Our scale of activity. GR is useful at a different scale. Quantum mechanics at a different scale. So we retain all these languages.
And I'm hearing that tensor logic is a great language for a certain, you know, layer of description and for activities of AI, but you're not arguing that it's the language to sort of replace all other layers, right? Like you still buy into the idea that there are other languages that are different. I'm glad you asked that question. I am absolutely arguing that tensor logic is the language to use in all these layers. And let me give you some evidence towards that. Express relativity in tensor logic. It's tensors and, you know, differentials of tensors and whatnot. That's, you know, that tensor logic does that out of the box. Do the same thing with quantum mechanics. Do the same thing with all these others, with all of the different pieces of AI that I know. And why is that possible, and why does tensor logic do that? Again, I think this gets at a very deep fact about the universe, which, you know, complex systems people and physicists have suspected as well, which is that the universe has this amazing property without which it would not be comprehensible, that you can have a lot of complexity at one level that then organizes itself into a new level at which now a different set of laws applies, right? And in a way, what we do with computers is do that by design. But here's the key. What you want is a language in which to express this process, the whole process by which multiple levels get created, by which multiple representations get created, including different representations at the same level. For example, going back to Simon, many people, at least in AI, have believed that the essence of human intelligence is your ability to switch between representations as the problem dictates. And as long as you pick one representation, you've stuck yourself in the box. But at that level, tensor logic is a meta-representation. It's a way to construct representations. And a large language model, to take a very excellent example, what has the transformer learned when it looks at all that text? Precisely, I would say, where a lot of its power comes from is that it has looked, you know, it's like, you know, Sebastien Bubeck says, it has learned this soup of algorithms. There's all these different pieces and different ways of doing things that it has gathered from different places, and it doesn't choose between them. It's the prompting and the fine-tuning and whatnot that then pull out the parts that are better for one thing or another. So we absolutely have to do this in AI. I think it also reflects a deeper truth about the universe. I think there are going to be laws of this. We're not then describing laws of the universe, and I think tensor logic at least is my best attempt at having a language in order to do both this AI and this type of scientific discovery. I also believe, and I discussed that briefly in the paper, that tensor logic is not going to be just a good language for AI. It's going to be a good language for science in general, for several reasons. One of them is this, but the other one is that if you look at the difference between the equations on the page and the resulting program from implementing them, often there's a lot of complication. In tensor logic, it's almost, you know, the tensor equation is an almost symbol-for-symbol translation of the equation on the page. So now you can just do, you know, science, you know, on a different level. Also, if you look at scientific computing, right, it's usually these tensor operations with some logic wrapped around it.
Tensor logic does the tensor operations and the logic in one language, but more importantly, the logic now becomes learnable. You can now learn the logic as well. Let me just challenge you on this because, for example, like in your paper when you got to the RNN section, right? Like, you know, tensor logic can represent RNNs, but then you hacked in star t. You're like, oh, I need this little star t here. What's star t? Well, star t is a virtual index that doesn't create new memory. That's not tensor logic. You hacked in star t because you needed that in order to express RNNs, right? No, no, no, no, no, no, no. Look, great question. So there's two very important things to distinguish here. One is, which star t is not, but let me mention that first, the RNNs also illustrate that, is syntactic sugar, right? You always have syntactic sugar because, for example, in an RNN, you want to express X of T plus 1, right? And, you know, tensor logic is Turing complete, but I don't have the T plus 1. It's a very simple piece of syntactic sugar to add. Why wouldn't I do that, right? Again, there's an 80-20 rule of which of these constructs you want to have. But the star t is actually a completely different thing. The star t is there for computational efficiency purposes. Star t is a hint about how to implement that tensor. That saves a ton of memory. And you know this notion of a leaky abstraction. All abstractions are leaky, famously, in computer science. Tensor logic is no exception. For the most part, when you write tensor logic, you don't have to worry about what goes on under the hood, but sometimes you want to. And this is precisely one of those things. The idea of the star t is that we don't have for loops anymore, which is great, forget all of that, but sometimes I don't want to be computing a new tensor, or even just a new vector, for every new thing that I do, because that would be a waste of memory. The star t is just saying, you know, you have one vector and you reuse it at every iteration. So you have the initial x zero, and then x one just overwrites that, right? So this is a piece of the language, right? You can do everything without it, but it would be silly to not use it. All right. So let me push back on something because you mentioned it twice now, which is like the Turing completeness. So your paper relies on like Siegelmann's, you know, 1995 sort of paper. She herself now, like decades later, has admitted that that thing is a total toy that has no practical relevance whatsoever, okay, because it requires like infinite precision, rational registers that encode in a fractal way, etc. And by the way, in her paper, all she demonstrated was that under these infinite assumptions she could build a particular RNN that was a universal Turing machine. The problem with you using that for your tensor logic is two things. One, that restricts the field over which you can have your tensors. It must be one of these fields that has like infinite precision. So infinite precision, rationals, or whatever. I can't use any other fields, like no modular arithmetic, which is actually what runs on, you know, GPUs, for example. And secondly, it would restrict the actual structure of the weights to her universal Turing machine. Therefore, it wouldn't be a general-purpose tensor logic. Do you realize this problem? No, no, no. So actually, there is no problem there. Let me tell you exactly why, right? And let's do this in three steps.
All right. So let me push back on something, because you've mentioned it twice now, which is Turing completeness. Your paper relies on Siegelmann's 1995 paper. She herself, decades later, has admitted that that construction is a total toy with no practical relevance whatsoever, because it requires infinite-precision rational registers that encode in a fractal way, et cetera. And by the way, all she demonstrated in her paper was that, under those infinite-precision assumptions, she could build a particular RNN that was a universal Turing machine. The problem with using that for your tensor logic is twofold. One, it restricts the field over which you can have your tensors: it must be one of these fields with infinite precision, infinite-precision rationals or whatever. I can't use other fields, no modular arithmetic, which is actually what runs on GPUs, for example. And second, it would restrict the actual structure of the weights to her universal Turing machine, so it wouldn't be a general-purpose tensor logic. Do you realize this problem?

No, no, no. Actually, there is no problem there. Let me tell you exactly why, and let's do this in three steps. First of all, Turing completeness doesn't matter at all, because the only difference between a Turing machine and a finite state machine is the infinite tape, and in the real world there is no infinite tape. So if you can implement...

If it doesn't matter, why do you keep mentioning it?

That's part two. This is actually a very interesting set of questions, so let's take them one at a time. Turing completeness doesn't matter; what matters is that you can express any computation you might want. You might choose a specific language for specific purposes, but you want that generality, and you have that generality irrespective of Turing completeness. So that's part one; we can debate it, but let's set it aside for a second. Part two: I don't get to change the way computer science works, and Turing completeness is the standard shorthand for universality. I just want to show people that tensor logic is universal. And I now have a proof that tensor logic is computationally universal that does not rely on the Siegelmann construction; I chose not to publish it in this paper because it would take too long. The beauty of the Siegelmann route is that in one paragraph I can just say: look, the equation in the Siegelmann paper, you can implement it here, and we're done. There are many ways to prove that something is Turing complete. So I completely agree with you and with her that that construction is ridiculous; it's silly, it has no practical significance. But the reason I use it is that it's my way of telling people in one sentence that, and why, tensor logic is Turing complete. The real action is elsewhere.

Well, hey, I'd love for you to share it. I'd love to see the other proof.

Oh, I can. Actually, there's more than one other type of proof possible. Let me tell you what that one is. So here are three ways. There's the Siegelmann way. Another is a finite control with access to an infinite external tape, which is a much more reasonable thing in my view. You have a memory; the memory is infinite, but all tensor logic has to do is know how to access that memory. Remember, a Turing machine is a finite control and an infinite tape. So if tensor logic can realize the finite control, which it obviously can, and you give it an infinite tape, then we're done. And then, on that note, you can even do it the following way. Dale Sherman, for example, has a great paper about this: people have come up with various very simple ways to set up a Turing-universal computer, and one of them is just a set of rules that sets up that machine, without going into details. And that set of rules you can write in tensor logic without even having to wake up from your sleep. So there you go.

I totally agree with you, and I often say to people: a Turing machine is just, and I really hate to use the word "just" because it doesn't do justice to Alan Turing and the genius of his creation, the theory of computation, but it's just a finite control with an unbounded read-write external memory. Totally on board with that.
Absolutely, tensor logic is a finite control.

But then you need to add to it the operations to manipulate external memory, right? So it's kind of tensor logic plus some operations to deal with external read-write memory, no?

I mean, those operations are just read, write, move left, and move right. That's all there is.

I know, but that's an extension, at least in my view. I don't know if before you there was such a thing as tensor logic; I know a lot of people have talked about tensors for a decade or more, but it seems like some kind of extension to the typical use. It's certainly an extension to the way tensors are used in general relativity; there's no read-write to external memory in that.

Of course, but that is exactly why tensor logic is more than the tensors of mathematics. The tensors that people use in mathematics just don't do this, but tensor logic does, because of the logic programming side. If tensor logic can do logic programming, then it can do everything that a computer can.

Have you fully specified all the operators in tensor logic somewhere, like on a website?

There are only three: tensor projection, tensor join, and univariate nonlinearities. And the nonlinearities are crucial: linear algebra is linear, tensor algebra is multilinear.

Totally agree. Where do the memory operations fit in there? Are they projections? Are they joins? Are they...

Oh, they're not even projections or joins. I mean, think of a trivial projection where you're not summing things; you only have one. That's what a write is. Actually, let's not even worry about tensor joins and projections. Let's just think about propositional rules. If you want to implement propositional rules in tensor logic, all you need is tensors with no indices, with zero indices, so all you're dealing with is scalars. And a write is just a rule where the target of the write is on the left-hand side and what you want to write is on the right-hand side. Now, to get very concrete about the issue of an infinite memory: what is an infinite memory? It's just an infinite vector indexed by the memory address. That's all it is. So how do you write to this infinite memory in tensor logic? You just have the memory as the tensor on your left-hand side. It's so simple there's almost nothing to think about.

I'll have to work through some examples.

And just to finish that thought: how do you advance the tape? You just increment the index. How do you move it left? You decrement the index. It's done.
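A minimal sketch of that picture, purely as my own illustration (a Python stand-in, not tensor logic notation): memory as a vector indexed by address, with the four operations just mentioned, read, write, move left, move right:

```python
# Minimal sketch (my own illustration, not tensor logic syntax): memory as a
# vector indexed by address, plus the four tape operations mentioned above.
class Tape:
    def __init__(self):
        self.mem = {}      # address -> symbol; a sparse stand-in for the "infinite vector"
        self.head = 0      # current memory address (the index)

    def read(self):
        return self.mem.get(self.head, 0)

    def write(self, symbol):
        self.mem[self.head] = symbol   # the write target is the "left-hand side"

    def move_right(self):
        self.head += 1                 # advance the tape: increment the index

    def move_left(self):
        self.head -= 1                 # move back: decrement the index

tape = Tape()
tape.write(1)
tape.move_right()
tape.write(0)
tape.move_left()
assert tape.read() == 1
```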
Well, could we come up with a solid example? Because I don't think we've sufficiently described the star-t construct. Roughly, as I understand it, rather than becoming a dimension, it becomes a transition function, so we don't need to model the full trajectory. But to give an example: if I wanted to write a function to compute the nth digit of pi, or to approximate it, would I not need to fix the size of the tensors beforehand? The way I understand it, these things have a fixed size, so how could it possibly solve unbounded problems?

No, very good. So to clarify, star-t is not a function. Star-t is a notation on an index. For example, if I have a vector x of i, or, a better example, a matrix M of i and j: if i and j each range over 100 values, this occupies 10,000 positions in memory. But if what I write on the left-hand side of my tensor equation is M of i and j-star, then instead of being 100 by 100, it's just 100. Because what the star on the j is saying is: run through the i, and for every j, overwrite the result. You can do this in either dimension, pick whichever one; it just says keep overwriting the result, so you lose the old one. Let me put it this way: M of i and j-star is actually a vector whose only dimension is i; j is just the iterator of a for loop, you see what I'm saying? And concretely, in an RNN this is exactly what you want, because x of i is your state vector and the j, let's call it t, indexes time: when the state evolves at each new step, you don't want, well, you could, but in general you just want to overwrite the old state with the new one, as in any state transition system. Does this make sense?

It does, but you're describing an accumulator. Do you lose something by losing the history? Because if you think about it, you're overwriting what went before with new information as you unroll in time. Do you lose anything doing that?

Of course you lose it. So if you don't want to overwrite it, then don't put the star in. But to answer your question about pi: how would I compute all the digits of pi? In infinite-Turing-machine land, I have a vector of the digits of pi that has a start but not an end, and what the computation in tensor logic does is compute every successive digit. We didn't talk about this, but how is inference done in tensor logic? Forward chaining or backward chaining, and they are both generalizations of the corresponding operations in symbolic AI. If you apply forward chaining to the set of rules that computes the digits of pi, actually just one rule, because it's very simple, what it will do is, in each iteration, fill in the next digit of pi. Now, if your vector is infinite, this goes on forever, as it should. If your vector is finite, well, at some point you run out of memory and you're satisfied with the number of digits you have, which is what we do with any real computer in the real world.
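As a rough sketch of the finite-vector case (my own toy example, with Fibonacci standing in for the digits-of-pi rule, since the actual rule isn't given here): forward chaining repeatedly fires one rule to fill in the next entry until the vector runs out of room:

```python
# Minimal sketch (my own illustration, not the paper's rule for pi): forward
# chaining viewed as repeatedly applying one rule that fills in the next entry
# of a vector. With an infinite vector this would run forever; with a finite
# one it stops when memory runs out. Fibonacci stands in for the pi rule.
size = 12
v = [None] * size          # the finite "tape" of results
v[0], v[1] = 0, 1          # base facts

step = 2
while step < size:         # forward chaining: fire the rule until nothing fits
    v[step] = v[step - 1] + v[step - 2]   # one rule: next entry from previous ones
    step += 1

print(v)   # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
```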
I don't want to always get us bogged down in Turing issues. I think we should move on, but it would be fun to talk about it more another time, or just to work through some examples; I'll probably work through some examples, and I think this was an interesting one. There's a strange attractor in Turing conversations, and normally it goes in the Schmidhuber direction: the universe is finite, there's no difference between an FSA and a Turing machine. And I felt that we actually had some information gain in this conversation. Well, actually, on that point, and this is a bit of an aside, it doesn't really have anything to do with tensor logic, so I hope you don't mind me asking, but since I have a computer science professor here, I want to run something by you. I always get this kind of pushback from people when I say, for example, that autoregressive transformers, and I mean classic autoregression, not extended or generalized autoregression, are not Turing complete. DeepMind admits this; they've written a paper showing how you can extend them to become Turing complete. So I'll say something like that, and somebody will reply: oh yeah, but if I can't do 100-digit multiplication with this context size, all I've got to do is have more context and then I'll be able to do it. And I keep making the point, here's the crucial difference, and you brought this up beautifully when you said a Turing machine is a finite control with an unbounded read-write memory: the really cool thing about those Turing machines is that they can run in a way where they're churning, churning, churning, and then they say, out of memory, and all you have to do is give them more memory and hit continue. You don't have to reprogram them; you don't have to retrain them when you've increased their memory. That's the whole difference: with a neural network, a traditional transformer, if you increase its context size, it's back to the training board, you've got to retrain it. Is that a fair point that I'm making?

So this is actually extraordinarily simple, and it's incredibly frustrating to me that there's so much confusion about it, starting in computer science and theoretical computer science and now playing out in AI and transformer land. It just boils down to this. You said earlier, and I violently agree, correct me if I misinterpreted, that Turing completeness is not important, but that shouldn't cause us to underrate Turing's achievement. Absolutely. What was Turing's achievement that we now take for granted? Turing's achievement, for which he is deservedly famous, was to postulate this notion of a universal machine. The amazing thing about computers is the universal machine, which in his time was a completely counterintuitive notion. What do you mean, a machine that can do everything? The typewriter can type, the sewing machine can sew, and you're telling me there's a machine that can type with one hand and sew with the other? What are you talking about? That's the genius. So, first, you want this property of having a machine that can do anything. That is the foundation of computer science and of computers as a revolutionary technology. Point one. Point two, getting to the transformer part: unfortunately, these confusions build on each other and never get resolved. It's one of those symmetry breakings; we went down the road of defining things a certain way and worrying about infinity, and now we're stuck there. NP-completeness is another example, but leave that aside. The real problem with transformers is the following: if you only have this many blocks, then you can only do so many computations. The thing that, for example, inductive logic programming has, and that we want, is that you can learn things from very small examples, like children do in elementary school. You learn to do addition on tiny examples, but then, if needed, you can do addition on numbers of any length.
Of course, your life is finite; you will never add infinite numbers, but that's not the point. Infinite is just a shorthand for something so large that it doesn't matter how large it is. And what I want in machine learning is precisely to be able to learn to handle problems, graphs, structures, knowledge bases, inference problems, whatever, of any size, from very small ones. That's the limitation that a lot of these transformers have, and that's the one that you want to fix and can't fix, and tensor logic helps you do that.

Just to cap off the discussion about Alan Turing, because I think he deserves us mentioning this: you said that this universality was the real achievement. And it wasn't just that a machine built for typing can't do this or that; it was even within computation. In his time, people didn't know this. They were like, well, what if I have a machine that has a separate read tape and a separate write tape? What if we add two write tapes, does that make it more powerful? What if it's read-write? What if it's just a stack? What if it's lambda calculus? What about lag systems? There were so many different computational models, and nobody knew that they were all equivalent. That was the real, remarkable achievement.

No, very good. And to be fair, Turing wasn't the only one doing things like this. And precisely, now we know that there are all these models that are equivalent, and then extensions of that power. But here's actually a really important point. The question that has been on my mind for decades is this: a Turing machine is a model of deduction; it's universal deduction. What we're missing, to be able to do what the universe does and evolution does, is universal induction. What is the Turing machine equivalent for induction, for learning? That's what I'm after. That's what the master algorithm is. And I know it exists. And again, just as you can have a million different versions of Turing machines that are all equivalent, you can have a million different versions of the master algorithm that are all equivalent, and that's okay. The point is that first we have to realize that there is one, we have to prove what it does, and then we can refine it with the syntactic sugar and whatnot. That's all good. But the main point is having the universal induction machine, which I think we are pretty close to.

But Pedro, I know the answer. It's Bayesian tensor logic. No, I'm just kidding.
No, if you're Bayesian, it is Bayesian tensor logic.

This is a good segue, because we were talking about reasoning and deduction, and transformers don't really reason, right? I think of them as a kind of collection of fractured bits of knowledge, maybe with a little bit of understanding two levels down, but we understand many levels down. When we do reasoning, we are respecting all of the constraints of this epistemic understanding, this phylogeny, that we have, and that allows us to build new knowledge: you can create new things when you respect all of the understanding you already have, and transformers don't do that. But let's talk about how this works in tensor logic. You have this temperature parameter, so, for example, you could do something akin to deduction even in an embedding space, and certainly with an MLP. And this is where I was a bit confused, because I can appreciate that if we have a logical model, which is in the domain of certainty, we can do deduction. But if we have something like an MLP, and we learn the weights, and we turn this temperature parameter up, so it's actually introducing some degree of randomness, why would that be anything like the kind of logical, deductive reasoning we do? Would that not just do what neural networks do now, which is look for similarity in some embedding space, where the type of reasoning isn't actually semantically meaningful at all?

I would actually say that, of all the things in the paper, this is the most exciting and important one: you can do sound and transparent reasoning in embedding space with tensor logic. And how come? Why is that possible? To give the gist of it, here's the key. Let's go to kernel machines for just a second. The Gram matrix, the similarity matrix, what is it? You're in feature space, and for every pair of objects i, j, it's the dot product of their feature representations. Now, if you embed all your objects, and we already know how to do that, there's a matrix with the embedding vector for every object, whether it's a word or a token or anything else. Now I can take the dot product of the embeddings of two objects, and let's suppose they're all unit vectors, to keep things simple. For the moment, let's say you're not even learning the embeddings yet; let's say you just have random vectors. That's actually already useful for a lot of things, although of course it's not where the action is. Now there's the following very interesting property: the dot product of a unit vector with itself is one, but the dot product of two random vectors in a high-dimensional space is approximately zero. So your Gram matrix, your similarity matrix, will be approximately the identity matrix. Okay? And now, if I have a tensor logic rule that operates in this space and it has something like a sigmoid non-linearity, it's going to clean out that noise, and the similarity matrix effectively turns into the identity matrix. And now I have all these rules that are just operating in a purely logical mode: it's Boolean tensors going in, meaning relations, and Boolean tensors going out.
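Here is a small numerical sketch of that property (my own, not code from the paper): random high-dimensional unit embeddings give a Gram matrix close to the identity, and a steep sigmoid cleans out the residual noise:

```python
import numpy as np

# Minimal sketch (my own numerical illustration): random high-dimensional unit
# embeddings give a Gram matrix close to the identity, and a steep sigmoid
# "cleans out" the small off-diagonal noise.
rng = np.random.default_rng(0)
n_objects, dim = 10, 4096

E = rng.standard_normal((n_objects, dim))
E /= np.linalg.norm(E, axis=1, keepdims=True)   # unit-norm embeddings

G = E @ E.T                                     # Gram / similarity matrix
print(np.round(G, 2))                           # ~1 on the diagonal, ~0 off it

def sharp_sigmoid(x, temperature=0.02):
    """Low temperature -> steep sigmoid -> nearly a step function."""
    return 1.0 / (1.0 + np.exp(-(x - 0.5) / temperature))

cleaned = sharp_sigmoid(G)
assert np.allclose(cleaned, np.eye(n_objects), atol=1e-3)  # effectively the identity
```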
So that way you can do pure deduction in embedding space with these random embedding vectors. That's already something interesting. But now let's say you learn the embeddings, which of course is the whole point. When you learn the embeddings, what happens, by trying to minimize the loss function, is that the embedding vectors of objects about which you tend to make the same inferences get closer. Because if I'm inferring something about one object and another object is similar, the gradient descent that minimizes the loss is going to increase their dot product. So you wind up with a similarity matrix that has high values for objects that are quite similar, in the limit one, as on the diagonal, and low values for objects that are quite dissimilar. And now you can tune the temperature parameter, meaning the stiffness of the sigmoid. At one extreme, at zero temperature, you have a step function, and the sigmoid's output is discretized back to zero or one. So at the zero-temperature extreme, you have pure deduction.
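A minimal sketch of that temperature knob (my own illustration, with made-up numbers): the stiffness of a sigmoid, with the zero-temperature limit collapsing to a hard step:

```python
import numpy as np

# Minimal sketch (my own illustration): "temperature" as the stiffness of a
# sigmoid. As T -> 0 the sigmoid approaches a step function, so truth values
# snap back to 0/1 and inference becomes purely deductive; higher T leaves
# graded, analogy-friendly values.
def soft_truth(x, temperature, threshold=0.5):
    if temperature == 0.0:
        return (x >= threshold).astype(float)       # hard step: pure deduction
    return 1.0 / (1.0 + np.exp(-(x - threshold) / temperature))

similarities = np.array([0.05, 0.45, 0.55, 0.95])   # hypothetical dot products
for T in (1.0, 0.1, 0.0):
    print(T, np.round(soft_truth(similarities, T), 3))
# At T=1.0 everything is fuzzy; at T=0.0 the outputs are exactly 0 or 1.
```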
But this is very... you see where I'm going with this?

I do, but could I challenge a tiny bit? When we train neural networks, we think the reasoning is good, let's use the Lego analogy, when we are building these blocks and the new understanding tree we've created is a good one if it represents the world in an abstract, causal way. So I can see how you've framed this as deduction, in the sense that you've got this Boolean operation and you can build from it. But what if you're building on a sandcastle? What if the component, let's say it's an MLP component, just doesn't represent the way the world works?

Very good. So, again, there's more than one thing you can do with tensor logic. One of them is to just reimplement existing things like MLPs and transformers and whatnot, and if all you did was reimplement them, the result would have all their pros and cons; it's the same thing, just implemented much more elegantly. What I'm talking about here, and talk about in that section of the paper, is doing something different. It's not an MLP, it's not a transformer. You embed objects, you embed relations in a way that follows from the objects, you embed the rules, you embed the reasoning. So it's a different process, and what this different process allows you to do is that when you raise the temperature, you get to do analogical reasoning. Douglas Hofstadter came up before; Douglas Hofstadter, I think, would like this, because it's an analogical reasoner. He has a whole 500-page book arguing that all of cognition is just analogy. And again, this is one of the schools of thought, one of the tribes in The Master Algorithm: reasoning by analogy. You reason by analogy because you generalize from one object to an object that has a high dot product with it, so you get to borrow inferences from similar objects. And the higher the temperature, the looser the inferences, the more analogical they can be. But, for example, and Douglas goes into this in some of his books, and any mathematician, Terence Tao the other day, I just heard him say this: mathematicians reason by analogy, they notice similarities between things, but at the end of the day you need to have a proof. In tensor logic, in this particular scheme of reasoning in embedding space, this is just simulated annealing. You start out at a high temperature, being very analogical, and then you lower it. At the end of the day you have a proof, a deductive proof that is guaranteed to be correct, but you couldn't have gotten to it without the analogical part, because the search space is so large.

Okay, I understand what you're saying: you can generalize reasoning outside the domain of certainty. But here's the question I'm asking. The reason we have metaphor and analogy is that there's this incredible process of evolution and intelligence, and it's led to the coarse-graining of all these concepts we use in our language, and there's this rich, beautiful phylogeny that kind of represents the causal reality of what's happened. Why is statistical similarity the same thing as analogy?

Oh, it's not. So again, I skipped over some steps; it isn't. Kernel machines, in some sense, are the least powerful type of analogy: it's just, oh, here's a similarity, or nearest neighbor, I have a distance function. That's not really where the action is. The action is in what is called structure mapping, proposed by Dedre Gentner: you solve a problem by mapping its structure onto the structure of problems that you know. The canonical example is Niels Bohr's model of the atom, which he came up with by analogy between the atom and the solar system: the nucleus is the sun, the electrons are the planets. It turns out to be a bad analogy, but it was crucial in the development of physics. There's also a whole subfield of AI called case-based reasoning: I'm a help desk, you come to me with a problem, and I don't try to solve it from scratch, because I don't need to, that would be wasteful; I go to my database of similar cases, find one, and tweak it. So structure mapping is an extraordinarily powerful thing, but it's this combination of similarity and compositionality, which kernel machines per se don't have. Tensor logic does: you have all the power of the kernel machines but also all the compositionality of symbolic AI. So structure mapping just comes out of the box. You don't need to do anything more to have structure mapping and all the power of analogical reasoning that comes with it.

Yeah, could I suggest that a good analogy is Mad Libs? Do you think that's fair? You've got the general structure there, and you can plug parts into the blank spaces and you get a solution, right?

That's one mode in which things can function. The whole process of structure mapping, of case-based reasoning, can actually be much richer; I can combine, for example, two big pieces. But that's one example, yes.

Oh, yeah, that's fair. It has this nice nested-structure property. While we're on this topic, let me ask you about something I was confused about in the paper. I don't understand your connection between hallucination and deduction, or determinism. Because in my mind, I can set the temperature of GPT to zero and it still hallucinates, and I can have a poor deductive system that hallucinates all kinds of things. So to me those are separate problems. Have I just misunderstood?

Very good.
So precisely the problem, or one of the problems, with GPT is that it hallucinates even when you set the temperature to zero. What the hell, right? I want to have a mode, and not just I: every Fortune 500 company, if it's going to use AI, needs to have a mode where the logic of the business is simply obeyed, where security isn't violated, where the customer doesn't get lied to, et cetera, et cetera. We've got to have that, or AI will not take off there. And transformers can't do that. Tensor logic can, precisely because in this reasoning-in-embedding-space mode that I just described, if you set the temperature to zero, it does purely deductive reasoning. And by the way, the temperature can be different for each rule, and I think this is what almost all applications are going to have: there are some rules that are the mathematical truths, or the logic, that you must guarantee will not be violated. They are the laws, and those have zero temperature. Then there are all these others that are more qualitative reasoning, more accumulating of evidence, maybe stuff that you mined from the web, and those will have higher temperatures. And that temperature parameter can be learned in some rules and not others. So now you have this whole spectrum from the deductive to, at the far end of high temperature, the fantasizing, the truly hallucinating. But precisely the point I'm making in the paper is that with LLMs, the best you can get at zero temperature is still a lot of hallucination. Then there are things like RAG, but all RAG does is retrieve, and even then you still hallucinate. Compare tensor logic in this mode with RAG: it doesn't just retrieve things, it computes the deductive closure of your knowledge, which is an exponentially more powerful thing to have, and with zero hallucinations.
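To illustrate the retrieval-versus-closure distinction, here is a toy sketch of forward chaining over Boolean matrices (my own example; the rule, names, and encoding are all made up, not the paper's machinery):

```python
import numpy as np

# Toy sketch (my own illustration): retrieving stored facts versus computing
# the deductive closure. The rule ancestor(x, z) <- parent(x, y), ancestor(y, z)
# is applied by forward chaining over Boolean matrices until a fixed point.
people = ["alice", "bob", "carol", "dave"]
n = len(people)
parent = np.zeros((n, n), dtype=bool)
parent[0, 1] = parent[1, 2] = parent[2, 3] = True   # alice->bob->carol->dave

ancestor = parent.copy()
while True:
    # One forward-chaining step: a Boolean "join + projection" over the shared index.
    derived = (parent.astype(int) @ ancestor.astype(int)) > 0
    new = ancestor | derived
    if (new == ancestor).all():
        break                                        # fixed point: closure reached
    ancestor = new

# Retrieval alone only knows the three stored parent facts; the closure also
# contains derived facts such as ancestor(alice, dave).
print(ancestor[0, 3])   # True, even though it was never stored explicitly
```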
Well, it is if the model represents the world. Because what does hallucination mean? Or actually, what does slop mean? My definition of slop is a creative artifact produced by something that doesn't understand. If I understand the domain deeply, that artifact looks incoherent to me, because it was generated by a process that doesn't understand the world. And isn't it the same with tensor logic: deduction is great, but if the model isn't a good one, wouldn't that just be a hallucination as well?

Oh, absolutely. But let's make some distinctions here. The only claim I'm making, because it's the only one you can make, is that tensor logic at zero temperature in this mode gives you the soundness properties that logic has, soundness in the technical sense: you only reach conclusions that truly logically follow from the premises. That says nothing about whether the premises are valid. If the premises were hallucinated, so will the conclusions be; there's no magic there. But it is a very important property to have. Again, if I give a transformer a bunch of true facts, it still hallucinates, and that's what I can guarantee will not happen in tensor logic. Now, coming up with the true facts, well, that's a different part of the game. You can write them down, you can learn them, you can refine them. You never know for sure that you have the perfect model. And, of course, that's more the machine learning and knowledge acquisition part. So I do think I have a very important guarantee here of non-hallucination, but it's not a guarantee that the model you're working with came from the real world. That's a whole other neck of the woods.

Who's going to adopt this first? How are we going to bootstrap this as a community? How do you see this progressing?

Very good. The last section in the paper discusses adoption, what needs to happen, and so on. Let's suppose that everybody agrees tensor logic is a beautiful, perfect language and exactly what we need for AI. That alone, sadly, would not be enough to make it take off, because of legacy: people are still using COBOL these days, I rest my case. There's this irony in computer science, or in information technology generally: it moves faster than anything else, but at the same time things never die. You can't kill them. You can't kill COBOL. And look, I like Python; I program in Python. It's very nice, in many ways better than Fortran for some things, et cetera, even though it, or NumPy if you will, was never designed for this; you get the point: for AI it's a terrible fit. Yet a Python programmer will generally say, okay, your tensor logic is nice, but I'm not going to rewrite all my code, forget that. So what is going to make it happen? Well, we can look at what has made this kind of thing happen in the past, and it's several things. One is, for example, how Java took off. Java took off at the time of the internet because it was the language of networking. Allegedly; you could debate that, but people wanted to do things that were very hard to do with things like C, and so Java took off.

It was the language of embedded programs and web browsers. That was the only option we had, right?

Exactly. And there are big arguments about this, but they're not relevant here. The point I'm trying to make is that we are at precisely such a moment again. Also very relevant: why did languages like Lisp and Prolog fall out of use, even though they were better for AI than Fortran or C or whatever, or Java? Because they were niche languages, and the network effects of the more widely used languages and everything around them just completely overrode that. We understand that very well now; people didn't in the 80s. But we're in a different ballgame now. Now the big technology, the center of everything, is AI. If you have a better language for AI, that is the one that is going to have the most users. And moreover, to get adopted, a new language, or a new anything, a new app, needs to solve some big pain. Is there a big pain that tensor logic solves? Well, hell yeah. All of this is subject to empirical verification, but it potentially solves hallucination, and it solves opacity. We're in this world right now where multi-billion-dollar corporations and systems are driven by a black box.
I've talked with CEOs of big tech companies who say, I can't sleep at night because I don't know what this thing is going to do, the people who trained it have left the company, and who knows. So if we can make a dent in that, people will converge on it very, very quickly. Also, I think when people have the experience of how easy it is to use tensor logic compared to the big pile of stuff that lies under PyTorch and whatnot, they will be very motivated to migrate quickly. And then there are several other things: developing the open-source community, vendor competition, and so on. But there are a couple of other important points here, one of which is the following. Tensor logic is ideally suited for AI education. It's one language with very little extraneous stuff, and you can teach the entire gamut of AI with it very well and do the exercises. It'll be a language that the professors, the TAs, and the students will like. And history shows, going back to things like Unix, that if something like that takes off in computer science education, then people go into industry and say, I want to use this, because it's good and it's what I like, and a generation later it's what everybody is using. And one more thing. The transition to tensor logic from Python doesn't have to happen all at once. You can, for example, and I already have, and others have, because it's very easy to do, you read the paper and you do it in the next thirty minutes, write a preprocessor that just converts tensor equations into Python. All it does is a one-to-one mapping between the syntax of tensor logic and einsum. Making things efficient, as we discussed, is another matter, but this is the point of view of developer uptake, and there's a long history of people doing this with different languages: you have a preprocessor that lets you write the equations, and it converts those equations into PyTorch or NumPy, let's say Python, and then you do everything else in Python that you did before. You don't lose anything; you don't lose any existing code. It's just that a set of things, and in particular reasoning, have now become much easier than they were before. And once you get this ball rolling, people are like, oh, but I can do this, and let me have that piece of syntactic sugar, and before you know it, people are like, well, I don't need all that Python stuff anymore, I'd just rather live in tensor logic.
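Here is a toy sketch of the kind of preprocessor described, with a made-up equation syntax and single-letter indices (my own illustration, not Domingos's actual tool): each tensor equation is mapped one-to-one onto an np.einsum call:

```python
import re
import numpy as np

# Toy sketch (my own, with a made-up equation syntax and single-letter indices;
# not Domingos's preprocessor): translate a tensor equation such as
# "A[i,k] = B[i,j] * C[j,k]" into a NumPy einsum call, mapping the indices on
# the page one-to-one onto the einsum subscript string.
def compile_equation(equation):
    lhs, rhs = equation.split("=")
    out_name, out_idx = re.match(r"\s*(\w+)\[(.*?)\]", lhs).groups()
    factors = re.findall(r"(\w+)\[(.*?)\]", rhs)
    clean = lambda idx: idx.replace(" ", "").replace(",", "")
    subscripts = ",".join(clean(idx) for _, idx in factors) + "->" + clean(out_idx)
    names = [name for name, _ in factors]

    def run(env):
        # Repeated indices on the right-hand side are summed out, Einstein-style.
        env[out_name] = np.einsum(subscripts, *(env[n] for n in names))
        return env[out_name]

    return run

env = {"B": np.random.randn(2, 3), "C": np.random.randn(3, 4)}
matmul = compile_equation("A[i,k] = B[i,j] * C[j,k]")
assert np.allclose(matmul(env), env["B"] @ env["C"])
```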
So, on AI education: tensor logic is a declarative language, which means it's the what, not the how. It's this incredible coarse-graining that screens off a lot of unnecessary detail. But is it unnecessary, I guess, is the question. Do you think that people learning about AI should know how the underlying things work? And certainly folks working at Google might need to do some domain-specific optimizations for certain components of the machine behind the scenes. Do you think we can screen off all that detail?

Great question, but let me start by correcting something. Tensor logic, like Prolog and Datalog, actually has both declarative and procedural semantics. That's the whole beauty of logic programming, in a sense: you can look at a tensor equation as an equation, like Einstein's equations, a statement about the world. But you can also treat it as a function call: the left-hand side is the call and the right-hand side is the body, which is a bunch of other calls and the way to combine them. And in fact, in my experience using tensor logic so far, I tend to use it mostly in procedural mode: it's a set of equations, a bunch of statements like you would have in any imperative language. So it's very important to bear that in mind. Now, to the heart of your question, which I think is very important. When you're teaching people something, and I would say this is the tragedy of computer science education, from high school to intro courses to the most advanced material, you want to teach them the beauty of what you can do and the essence of the algorithms, but then you, and particularly the students, spend all your time bogged down in all this crap, all these details where you get a semicolon wrong and the program doesn't work anymore, and they hate it, and they decide that computer science is not for them. Or, at best, they waste ten times more time than they should. So precisely the whole point of having the right abstraction is to avoid that, and I would say one of the best features of tensor logic is that it does that for AI. But you also say, correctly, that a lot of the time you need to go below that level of abstraction, for example for efficiency and a lot of other things. I would say, and again, we won't know until tensor logic is used widely and we see what happens, that tensor logic is a language that works at every level. It's like C: it's very low level, but the beauty, in my mind, and this gets back to the multiple levels again, is that you can use it to say very high-level things and also to express the lowest-level computations. A tensor equation is something you can map onto a GPU with almost no change, and then optimize the heck out of. In fact, I've joked with folks at NVIDIA that CUDA is a nice moat, but tensor logic could be the end of that moat.

I sometimes feel, and I'm not sure exactly how much money has been spent on bigger and bigger transformers, deeper and wider, with more data and more parameters, but it's got to be a lot, like a trillion dollars or something like that.
And I feel like sometimes we've spent a trillion dollars to learn, yet again, lessons that people could have learned if they had taken certain basic courses in computer science. I'm wondering if you sometimes feel like that, and what lessons, if any, you think people should have known before spending a trillion dollars.

I violently agree with that. In fact, the paradox of the current moment in AI is that, on the one hand, it's super exciting. This is what we've worked all our lives towards; the dream is happening. I used to tell people when I went into grad school that one day machine learning was going to take over the world, and they'd be like, what? And now I'm like, see, it is taking over the world. Take that. On a more serious note, transformers are a great leap forward, and anybody who's used a chatbot is like, wow, look at the things this can do, this is great. But at the same time, the sheer amount of wastefulness and stupidity and ignorance going on is just unbelievable. Why are you reinventing all this? For example, I've talked with people at OpenAI who work on reasoning, and many of them are very good people, so I'm not trying to pick on anybody, but it's like, oh, what is reasoning? We need to figure that out. And then they say a bunch of stuff that is completely wrong. And I'm thinking to myself, why don't you spend an afternoon reading a couple of chapters of Russell and Norvig and save a hundred billion dollars in wasted compute? Please, just do that. And in a way, part of what I'm trying to do with tensor logic is make things go in that direction, because the current direction is just too damn painful. And it's not just that it's painful; this is going to end badly. In a way, spending all this money on data centers is not wasted, because it's not like the fiber that went dark; we in AI have an appetite for unlimited compute. But they're spending all this money prematurely, on stuff that isn't ready for it yet. The demand is probably not going to be there, and we're going to look back on this and go, wow, 99.9% of that was completely wasted, for a lot of the reasons we've been talking about, including that you didn't know how to do reasoning, so you brute-forced it, et cetera, et cetera. So we've got to change the direction of this ship.

It's like that well-known quote from Matt Damon in Good Will Hunting, to paraphrase: you've wasted a trillion dollars on an education you could have got for a buck fifty in late fees at the library.

Exactly, exactly.

Well, Professor Pedro Domingos, it's an absolute honor to have you on the show. Thank you so much for joining.

Thanks for having me. Thank you.
Always a pleasure.
