bouncer

Machine Learning Street Talk · 18.4K views · 640 likes

Analysis Summary

30% Low Influence
mild · moderate · severe

“Be aware that the guest's framing of Bayesian inference as the 'only right way' is a philosophical stance presented with the weight of mathematical certainty to make alternative AI approaches seem fundamentally flawed.”

Transparency: Mostly Transparent
Primary technique

Performed authenticity

The deliberate construction of "realness" — confessional tone, casual filming, strategic vulnerability — designed to lower your guard. When someone appears unpolished and honest, you evaluate their claims less critically. The spontaneity is rehearsed.

Goffman's dramaturgy (1959); Audrezet et al. (2020) on performed authenticity

Human Detected
100%

Signals

The content is a long-form interview featuring natural human speech patterns, including authentic disfluencies, personal history, and spontaneous intellectual exchange. There are no signs of synthetic narration or AI-generated scripting.

Speech Disfluencies: Transcript contains natural stutters, filler words ('uh', 'um'), and mid-sentence corrections ('I I I uh was at a talk').
Personal Anecdotes: The speaker references specific personal memories, such as attending a talk by Zoubin Ghahramani and his PhD studies at Northwestern University.
Conversational Dynamics: Natural back-and-forth interruptions, agreement cues ('Yeah', 'Right'), and collaborative building of ideas between the host and guest.
Domain Expertise Nuance: The speaker provides highly specific, non-formulaic explanations of Bayesian inference and sensorimotor tasks that reflect deep subject-matter expertise.

Worth Noting

Positive elements

  • This video provides a high-level technical explanation of how automatic differentiation and Bayesian priors function as the bedrock of modern machine learning and cognitive science.

Be Aware

Cautionary elements

  • The guest uses 'revelation framing', presenting a specific mathematical framework (Bayesianism) as the 'only right way' to view the world, which may lead viewers to dismiss valid alternative theories of mind.

About this analysis

Knowing about these techniques makes them visible, not powerless. The ones that work best on you are the ones that match beliefs you already hold.

This analysis is a tool for your own thinking — what you do with it is up to you.

Analyzed: March 23, 2026 at 20:38 UTC · Model: google/gemini-3-flash-preview-20251217 · Prompt Pack: bouncer_influence_analyzer 2026-03-08a · App Version: 0.1.0
Transcript

So my PhD is in mathematics uh from Northwestern University. I studied pattern formation in complex systems in particular combustion synthesis which is all about burning things that don't ever enter the gaseous phase. Bayesian inference provides us with like a normative approach to empirical inquiry and encapsulates the scientific method at large. Right? I just believe it's the right way to think about the empirical world. I I remember I was I I I uh was at a talk many years ago by Zoubin Ghahramani um and he was explaining the Dirichlet process prior. This is when the Chinese restaurant process and all that stuff was like relatively new. Um, and his explanation of it, it so resonated with me in in terms of like, oh my gosh, this is the algorithm that summarizes [music] what what the science how the scientific method actually works, right? You get some data, right? You get then you get some new data and you sort of say, oh, how is it like the old data? And if it's similar enough, then you sort of lump them together and then you sort and you build theories and you properly test hypotheses in that fashion. That's that that's that's the essence of the Bayesian approach is it's about explicit hypothesis testing and explicit models in particular generative models of of the world conditioned on those hypotheses. >> It I I believe it is it is the only right way to think about how the world works and it's the and and and it encapsulates the the structure of the scientific method. I mean, if I'm being perfectly honest, what actually convinced me the brain was the the brain was Bayesian had a lot more to do with behavioral experiments done by other people. My principal focus was on well, how does the brain actually do this? So, I'm referring to experiments, you know, showing that like humans and animals do optimal cue combination. We're surprisingly efficient in in terms of like the information that comes using the information that comes into our brains with regards to again these low-level sensory motor tasks. >> Oh, interesting. So it's almost like we we're so efficient that the only explanation that makes sense is that we must be doing Bayesian analysis. >> Yeah. More or less. I mean it's a bit more precise than that. It's it's not just efficiency. It's you know like the cue combination experiments I think are really compelling. And so the idea behind a cue combination experiment is that I give you two pieces of information about the same thing. Um and one one piece of information is more reliable than the other. And the degree of reliability changes on a trial-by-trial basis. So you never know a priori that like say the visual cue as opposed to the auditory cue is going to be the more reliable thing. And yet nonetheless when people combine those two pieces of information they take into account the the relative reliability on a trial-by-trial basis and that means that they're optimal in a sense. Now we have to like be super careful with our words. They're relatively optimal because they're not actually using 100% of the information that the computer like your the visual information that you use. You don't use 100% of the information that the computer provided you >> right but you know there is some loss between the computer screen and your brain mediated in principle by but it the system behaves as if right it has optimally combined those two cues. It has taken into account uncertainty.
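A minimal sketch of the reliability-weighted cue combination described above, assuming Gaussian noise on each cue; the function name and the numbers are illustrative, not taken from the experiments referenced.

```python
def combine_cues(mu_visual, var_visual, mu_auditory, var_auditory):
    # Inverse-variance (reliability) weighting: the more reliable cue on a
    # given trial dominates the combined estimate, as in the experiments above.
    precision_v = 1.0 / var_visual
    precision_a = 1.0 / var_auditory
    w_visual = precision_v / (precision_v + precision_a)
    mu_combined = w_visual * mu_visual + (1.0 - w_visual) * mu_auditory
    var_combined = 1.0 / (precision_v + precision_a)  # never worse than either cue alone
    return mu_combined, var_combined

# One trial where vision happens to be the reliable cue: the estimate leans toward it.
print(combine_cues(mu_visual=10.0, var_visual=1.0, mu_auditory=14.0, var_auditory=4.0))
```

The weights change trial by trial with the reliabilities, which is what makes the behaviour "optimal in a sense" rather than a fixed averaging rule.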
This also is because like how we really do think about the world like we take into account uncertainty all the time in our decisions. Right? You know this if you're ever driven in the fog, you're aware of this. >> The 90 90% of what the brain does is decide what to ignore, >> right? And because if if we didn't, right, we'd be screwed, right? We get we receive an insane amount of information, most of which does not even we don't even bother to process, right? So >> yeah. >> Yeah. >> Is is that definitely the case though? Do do you think that we could actually be processing more information than we know? We are definitely processing more information than comes out in behavior. >> Yeah. Um and a lot of that is is because you know we are continually learning and like learning you know you have to you know you you close your eyes for five years and your visual system decays >> right you lose fidelity right it it forgets it requires constant input simply to maintain this understanding of like the low-level statistics of the visual world right with like without input like you're you know you're w so the question is is that is that using all the information or is it just using like the low-level information and it's information that we don't like directly perceive but is still that is still but is still definitely being used in a sense >> when it comes to you know you know but it's but what is it being used for it's being used to track these sort of low-level statistics that that we sometimes need but don't always need and so this is why I say that that like you know when we say context matters you know you can think of that in terms of like we're able to flexibly switch between tasks which means having a lot of resources and having a lot, you know, maintained and having them still be in good working order just in case we need them, right? And this is why like these self-supervised or unsupervised learning approaches that are ubiquitous for like starting, you know, getting your LLM to give you, you know, sort of your reasonable prior over over uh language is the sort of stuff that your brain is definitely doing. So in a sense it is using everything >> but it's not really using all of the information that's present. Right? That's sort of I think the argument that I want to make. >> The idea of having to traffic in squishy people in order to make our systems go is not immediately appealing. Let's put it that way. >> This episode is sponsored by Prolific. >> Let's get few quality examples in. Let's get the right humans in to get the right quality of human feedback in. So, so we're trying to make human data or human feedback, we treat it as an infrastructure problem. We try to make it accessible. We're making it cheaper. We effectively democratize access to this data. >> What do you think about these broad sort of metaphorical idealizations? You know, the the big one is that the brain is a computer. The probably the the more popular one is that the brain is a prediction machine. It will always be the case that our explanation for how the brain works will be by analogy to the most sophisticated technology that we have. Is that how's that for a non-answer, right? So, [laughter] so you know, you know, a couple thousand years ago, right? How'd the brain work? It was like levers and pulleys, man. I mean, duh. Don't be ridiculous. Why? That was the, you know, at some point in the middle ages, it became humors, right? 
Because fluid dynamics was like the you know was the kind of techn you know the technology that was like the most advanced or technology that took advantage of of water power was like the most advanced technology that we had. Now the most advanced technology is computers. So duh that's exactly how the brain works. >> Philosophers used to think that the universe was a machine. >> Mhm. >> And we we interviewed Chomsky about this as well, you know, because he talks about the the ghost in in the machine, you know, and and the ghost is all of the bits in the machine that we don't understand. But um do you think now that that we can think of the universe as a machine? >> I think that that that is a very convenient way to think of the universe. Right? So when we model the universe as having like causal structure, right? Do we do so because it actually has causal structure or because that's a really convenient class of models with which to work? I think that it's you know it it has causal structure, right? But also it's a convenient class of models, right? So like a good example is large language models, right? So they're all, you know, most most but not all are autoregressive in [music] terms of their predictions. All right. Well, why? Like why is it autoregressive? Oh, it's because it's mathematically convenient. It's a compact way to like take the past and make a prediction about about the future. [snorts] Does it mean that that's actually the way language works? No, I don't think it's actually the way language works, but it's a it's a computationally convenient model. In physics we have like there are in fact like momentum is a good example like why why do we need momentum in order to describe we don't you know we don't observe momentum directly right we only you know you're just looking at videos you just you know the position of the ball right um you know you want to infer the velocity well you just take the difference between two adjacent positions and then that gives you but you don't ever directly observe like the momentum and this is you know in a mechanical in a mechanical uh setting. So why did we choose momentum? Well, we chose momentum because that's the variable that that you know that if we knew what it if we knew what momentum was now everything is Markovian, right? Everything is it's a now there's like a simple like causal model that describes how the world works. We picked that model because we picked that particular hidden variable because it's what rendered the model causal. Does that mean that's how the the universe works or was that just a a a computationally convenient choice? I'm gonna stay agnostic on that one. But I do like that it's a computational that ended up working out. Right. So, >> and just quickly riff on the the benefits of having models that preference causal relationships. >> So, the nice thing about So, when you have a causal relationship, it reduces the number of variables you have to worry about and track. That's the beauty of having a cause. It's like it's like a Markov. It's the same it's the same argument with with momentum and and Markov models. We chose to have that hidden variable because it's the thing that made the model simpler, right? Right? It made the calculations easy. Now we can just like go forward in time, just make predictions in a in a in a totally like iterative fashion. That's what makes causal models great. The other thing that makes causal models great is if you do ever intend to sort of, you know, act or behave, right?
Then you still need to be, you know, you need to be able to um predict the consequences of your action. the the more tightly linked your actions or your affordances are to the things that causally impact the world, the more effective those actions are with respect to your model, but hopefully also with respect to reality. And so we we prefer causal models, you know, in part because they are relative, you know, relatively speaking, simpler to execute, right, in in in a simulation form, but also because they they point directly to well, where should I intervene, you know, where should I go in and and you know, and how should I choose my like series of actions that will give me the desired lead me to the desired uh conclusion or goal? >> What's the difference between micro micro um causation and macro causation? I think the difference between micro and macro is a single letter. >> No. Um, so >> we could just model the light cone at the particle level. >> Oh yeah. So, so that's how physicists. Yeah. >> Yeah. I mean that's the way physicists see the world and and we see the world in terms of populations and people and all these macroscopic things and and we still reasonably do experiments and we do interventions and we we do randomization. >> To truly identify a causal relationship, you have to do an intervention, right? Um, you know, the classic example, this is also in the in lung cancer, right? It's like, I forget how long ago this was, but at one point there was this belief that alcoholism caused lung cancer, but it was actually because they were in poor health because they were alcoholics and they smoked a lot more than the rest of the population. Right? So, you do need to do that kind of intervention to discover a causal relationship. However, right, the causal relationships that we care about are the ones that mesh with our affordances. Right? If you know identifying a microscopic causal relationship is super that's great right but unless you have really tiny tweezers it's not very helpful right what you need to do is you need to identify the causal relationships that are present in the domain in which you are capable of acting we care about the causal relationships at the macroscopic level because that is where we live we live in the macros at the macroscopic level most of our actions are at the now one of the best things about humans is our ability to extend the domain of our affordances with technology, right? We have like nuclear power because what we did was we acquired the ability to take tweezers, you know, at that at that scale and like, you know, make these things happen, right? We figured out how to take advantage of causal relationships at that level, not because we have those abilities, but we were able to create the tools that that that gave us access to that space. It all depends on what the problem it is that you're trying to solve and the [snorts] causal relationships that you always care about will be the ones that are related to the actions that you are capable of performing. Now that said, there's clearly a great advantage in understanding the microscopic causal relationships, right? If for no other reason than that might lead to us discovering a way to expand our affordances into the into, you know, into another aspect of the microscopic domain. >> Is is this just instrumental? 
you know, is is is this just something that it's a little bit like we we we say that agents have intentions and representations and it's just a great way of understanding things, but for all intents and purposes, it's not it's not actually how it works. >> Well, I I think that that sentence ended on a rather definitive statement with which I don't think we could I would agree, but the rest of it is it all in you're asking like the bas the the you know the scientific anti-realist if it's all instrumental. So, yeah. Yeah, it's it's all instrumental, right? I mean we we you know we the the things that we care about are the things that that you know again back to affordances right so you know we need to understand causal relationships at the scale that we can manipulate right that's what that's what matters most right because that allows us to have effective actions in the world in which we actually live to the extent that we care about other scales right it's it is because simp we simply wish to expand you know our domain of influence Right. >> The mind is quite an interesting example. So let's say um I want to move my hand and my my mind willed it. So it's top down causation. Now I can't act in the world of my mind. But it seems it seems macroscopically intelligible. You know we think about our minds. So maybe the mind is a special case. I don't know. >> Well the mind is a special case. I'll agree with that. I think of like downward causation from well I guess from an instrumentalist perspective, right? It's like I'm not saying downward causation is the thing. I'm saying that downward causation is one of is is like how it all works. I would take it from more from the perspective that um downward causation if discovered downward causation is what justified your macroscopic assumption. So what do I mean by that? I mean that like suppose I'm in the following situation. I got a bunch of microscopic elements and they're all doing stuff and I'd like to draw a circle around them and call that a macroscopic object. Now I am justified in doing so if that particular description of the macroscopic at the macroscopic level right has the downward causation property right it's it's it is a way of sort of saying oh that was a good you d that circle you drew that was a good circle right because it's summarized the behavior of the system as a whole right in a way that rendered the microscopic behavior irrelevant to further for further consideration Yes, I I can think of some situations where we do this. I mean, we might identify an aspect of culture or a meme and we might say that is responsible for violence or something like that. You you still have to show that it has that property, right? And I think in you know in intentionality is a tough one, right? because you know it's it's a variable that has a lot of explanatory power but it's not but but it's not one that evolves so when I think of a a good macroscopic variable it's one that I understand how it evolves over time that's what makes it a good macroscopic I can just write down a simple equation and it says you know pressure volume temperature right they are going to do this over time and like taking any little microscopic measurement becomes like totally irrelevant Right? But what made it useful wasn't just that the the microscopic measurements are irrelevant, right? It's that I had an equation that describes how it would have behaved, you know, that's also fairly accurate. 
So I have a nice deter, you know, relatively deterministic model that, you know, that that is at the macroscopic level, right? And so when we talk about like intentionality, I think it's it, you know, yes, it can be used as an explanatory variable, but it's only good to the extent that we understand how that intentionality changes over time, right? It's a long-term prediction. And this is why like, you know, the jurisprudence example made me really uncomfortable because it's sort of like saying, well, you know, what you're kind of doing is you're saying this is a bad person, right? And I don't know how we would necessarily like identify like that intentionality except in a very indirect way, right? That is that that then they're stuck with but then you know because it's only good as a macroscopic variable if we can make predictions about how that variable changes over time and we're not doing that. We're saying you're stuck with it, right? And I just that's why it sort of makes me a little uncomfortable. >> I did I did actually notice that um the active inference community has quite a ragtag. It's it's got very diverse. >> Yeah. >> So in in a way you see people rubbing up against each other that you normally wouldn't and that can create >> arguments I suppose. >> Yeah. Well, I think you know this was this was this was Karl's influence. So what did Karl actually discover, right? He's got this link between information theory and and and you know and statistical physics that in [snorts] some way gives you this sort of uniform mathematical framework that's widely applicable to a huge number of situations. It has a lot of sort of things that are baked into the how we think about the world is kind of like baked into it and so it can be applied in a whole bunch of different areas. >> And Karl spent a lot of time basically evangelizing various different aspects of the scientific community. It's like oh look you can apply this to epidemiology you can apply this to the social sciences. You can apply this to physics. you can apply you know and just sort of in and you know wrote a series, this is one of the reasons I think he's so prolific is because he's basically you know written variations on the same paper right but just applied in different domains and he did this and this was intentional right because he wanted to show that this is a uniformly applicable mathematical framework and I think he's largely right about that um as a result right there's all these people from all these different communities that have been pulled into his sphere that think about the world very differently and it makes for some very entertaining conversations at the pub. >> Yes, even in our Discord server, you know, we we've got people thinking about it in terms of crypto, even in terms of Christianity, phenomenology, um psychology. It's it's really interesting. But yeah, it's it's uh >> but that's the beauty of constructing like a a nearly uniformly applicable mathematical framework, right? Exactly. You get to you get to suddenly this is what one of the things I love I I mean this is what I love about the community in fact is that we now have a relatively common language to discuss a huge variety of different things. >> Yeah. >> Um now of course that means we often end up talking at cross purposes but that's half the fun right?
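Tying together the earlier points about momentum and Markov models, here is a toy sketch of why adding a hidden momentum-like variable makes the dynamics Markovian: the next state depends only on the current state, so a simple iterative forward model exists. This is illustrative only, not code discussed in the conversation.

```python
# Position alone is not Markovian: the next frame depends on where the ball was
# two frames ago. Augmenting the state with velocity (inferred, e.g., from two
# adjacent positions) makes one step of the model depend only on the current state.
def step(state, dt=0.1, gravity=-9.8):
    position, velocity = state
    return (position + velocity * dt, velocity + gravity * dt)

state = (0.0, 5.0)  # (position, momentum-like velocity)
for _ in range(3):
    state = step(state)
print(state)
```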
So I often ask people in the business like what what what changed like what's you know what's what you know why did we have this like massive explosion in um you know in AI development over the last several years. Um, and I get three there there are three common responses and I agree with every single one of them. Autograd, right? The transformer, but why the transformer is something that I I often disagree with with people about. Uh, transformer architecture. Um, and just the amaz the the the ability to scale things up in a manner that we haven't really seen before. I actually uh the reason why I say transformer comes with an asterisk is because a lot of the things that transformers have that that people believe that the transformer enabled um I think really resulted more from scaling and my the point you know the point of evidence that I like to cite is like Mamba, which is a traditional state space model, it's basically a Kalman filter but like on steroids, they scaled it way up and yet and now it's you know got they've you know Mistral has their very nice like coding agent and it works pretty darn well, right? They got a lot of the same functionality with a completely with a you know a completely different architecture simply by virtue of scaling. So transformers get a get an asterisk. I think that the biggest thing was autograd right and autograd turned um the development of artificial intelligence um from being uh something that was done by like carefully constructing your neural networks and then writing down your learning rules and going through all that painful process that took forever and they turned it into an engineering problem. It made it possible to experiment with different architectures, different networks, different nonlinearities, different structures, different ways of like getting your memory in there in different ways and all this fun stuff that allowed people to just start trying things out in a way that we couldn't do before. And then what did we do? We we suddenly discovered, oh, it turns out backprop does work. I mean, when I was a young man, like backprop was considered a non-starter for two reasons, right? One is it's not brain-like, which is true, right? The brain does not use backprop. And the other one was vanishing gradients. Oh, you'll never solve the vanishing gradients problem. And it's like, oh, it'll always be unstable. And and yet, nonetheless, once we turned it into an engineering problem, started playing around with tricks and hacks and certain kinds of knowledge, we discovered that oh no, in fact, like there are ways around this. We just, you know, weren't going to discover them by like playing with equations. We had to actually start. We turned it into an engineering problem. And as soon as it got turned into an engineering problem, you know, that's what enabled the hyperscaling, which is what led to all of this all all of this, you know, these great developments over the last several years. What got lost in the mix, though, was the notion that that that that there's more to artificial intelligence than just like function approximation. We got really good function approximators, but that's not the only thing you need to develop like proper AI, right? You need models that are structured like the brain is structured. You need models that um you need you need models that are structured like how we conceive the world is structured, certainly if you want to have models that think the way we think.
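A small illustration of the autograd point above: once gradients of an arbitrary model come for free, trying out architectures becomes an engineering exercise rather than a derive-your-own-learning-rule exercise. JAX is used here as a stand-in for any autodiff framework, and the toy model is invented for the example.

```python
import jax.numpy as jnp
from jax import grad

def loss(w, x, y):
    pred = jnp.tanh(x @ w)            # swap in any architecture or nonlinearity here
    return jnp.mean((pred - y) ** 2)

w = jnp.zeros(3)
x = jnp.ones((4, 3))
y = jnp.ones(4)

d_loss_d_w = grad(loss)(w, x, y)      # gradient with respect to the parameters, no hand-derived rule
w = w - 0.1 * d_loss_d_w              # one generic gradient step
print(w)
```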
And that that got lost in the shuffle and we're starting to see you know as as as we're starting to see the limitations and the faults and flaws of of of these approaches. um and starting to see them not living up to the hype which I think is like now standard, that like like AGI is no longer, I don't know if you read the other day, at least according to you know the experts in the field at the top of of the best companies in the business, like AGI is no longer like a huge priority right and that they're they're they're dialing back the rhetoric surrounding that um in part because I think that they've begun to realize that like just function approximation isn't going to deliver or that was just hype. Right? We do need to do something different. We do need to start get, you know, bringing in what we know about how the brain works, right? If we're ever going to get to something that is a humanlike intelligence. And that was the starting point for us, you know, about a year or so ago is that we were sort of like, yes, let's do the same thing for cognitive models. Like, let's talk about let's take what we know about how the brain the brain actually works. Let's take what we know about how people actually think about the world in which they live and start building an artificial intelligence that thinks like we do by incorporating these principles. And this means this means basically creating a you know a modeling and coding framework for building brain-like models at scale. And that's like the critical element because obviously scaling was a was a big part of the solution. And right now most of the work in the active inference space as I'm sure you're aware is not at scale. There's very little like active inference work that is active inference at scale. Most of the models are like relatively small toy grid-worldy type models. Um and part of the reason for that is that you know it is in fact difficult to scale Bayesian methods. Now that also has now begun to change, right? We now have a lot of great uh mathematical tools and a lot of great frameworks for approximating Bayesian inference. You'll never do it exactly. We're approximating Bayesian inference. Um which I believe is how the brain works, right? Bayesian brain and all that. um that allows us to build these kind of structured models that that that that are structured both after the brain how the brain is structured and how the the world that we live in is actually structured. Hence the the the this notion that what we need to build to get to the next layer of of AGI, and I also don't like that term and don't intend to use it very often. Um what we need to get to the next level, right, is is is this um uh is this framework um that allows us to build the uh the kinds of models that we know people actually use and just make them bigger and more sophisticated and and and and so on and then take advantage, like hyperscaling Bayesian inference is part of it, but also like it's you know constructing models um of the world as it actually works. The way the world actually works, right, is what is is, you know, provides us with the structure of our own thinking, right? The atomic elements of thought is how I like to phrase it. Um, are models of the physical world in which we live. And the physical world in which we live is a world of macroscopic objects that, you know, um, that have specific relations and interact in certain ways that we understand, right?
Um, you know, I'm looking around the room for a good example, right? You sit on a chair, right? That's an example of a relationship. It holds you up and all that fun stuff. Um, and those are the kinds of, you know, that understanding of the physical world was necessary for us in you know, for us to have in order to survive. Dogs have it too, right? Language isn't what make, you know, isn't isn't all that special, right? Well, it's it's actually quite special. But but um uh those are the models that form that that that understanding of the world in which we live is where we get our the models that form the the the the models that form the atomic elements of our thoughts out of which we have composed more sophisticated models that have allowed us to do all this great systems engineering, build this great technology that we've got. So that's what we want to do, right? is we want to is is we're focused on building cognitively inspired models that are based on our understanding on on on the way the world in which we live actually works because we believe intelligence must be embodied. Building a framework for for putting those models together and experimenting with them at scale all in an approximately Bayesian way because we believe that's how the brain works. It's not just about putting your AI into a robot. It's about giving that giving the robot a model of the world that is like our model of the world, a model that is object-centered. It's dynamic. It's a it's largely causal, right? Um it's, you know, that's that's that's the big difference. And I think that the the the sort of sparse structured models is another sort of key differentiating component. Like when you think about how like a transformer and an LLM work, a transformer takes every word in the document and says now how does this word relate to every other word and it does it many many many many times, right? It's a it you know it's it's very much word-level. Same thing with like your your your generative um uh vision language action models: they operate in pixel space, they are microscopic models. Now yes, do they have an implicit notion of sort of the macroscopic? Yes, they must, because they work, right? But it's implicit and it's not implemented with the kind of sparse structure that actually exists in the real world and in our conceptualization of it. And that's the thing that we are going to that we are we are saying no no no look, like if we want an AI that thinks like us, right, then we are going to build models that are structured like the real world is structured. They have this sparse causal macroscopic structure to it um and so should our models and so should and and the only way to do that is not just to like put a robot in the real world but to put a robot with a model that is structured in that fashion into the real world. No one's using the xLSTM, not many people are using Mamba, because why? All you need to do is just scale the transformer as much as possible. So, um, you know, many people just really think you just magically get these things for free, right? >> So, I think you could argue that that with enough data that's the right kind of data, one of these like really big super-scaled models will uh obtain an implicit representation of the world that is more more or less correct. Now having an implicit representation is great if if your only goal is to just represent the world. If your only goal is to just predict what's going to happen. But it turns out people do something which is very different. People are creative.
People can solve novel problems. They can't. It's not just about mining old problems and figuring out where I can move some words around and get a and get an answer that looks more or less right. Right. We actually are capable of creating. We're capable of inventing new things. The way that we invent, I think, is exemplified by by like systems engineering, right? How does systems engineering work? Well, I I you know, I know how, you know, I I know I'm repeating, you know, I know how an airfoil works to create lift. I know how a jet engine works to create thrust, right? And I can take those two bits of information to invent something brand new, which is an airplane, right? That kind of systems engineering was predicated upon having this sort of model of the world that was relational. Right? Here's the wing. I can put a jet on it. I can like I don't know you don't staple it on. I'm sure you use rivets or something. Right? I know how to put things together. I know how to construct new relationships and new objects. An AI that that is designed for systems that is designed to do systems engineering will have an object-centered or system-centered understanding of the world and will know how all of the objects relate so that it can sort of start experimenting with different ways to combine them. It's absolutely it you know without that the only thing you will ever be able to do right is just retool solutions for new purposes and it won't and even that is I think is a generous interpretation of what a purely predictive model is going to do right so this is how I like to think about like you know the the principal advantage of taking this object-centered approach right is that it enables systems engineering >> What is a grounded world model? >> That is So, so I I feel like that's a trick question. I was I actually had this conversation with with with one of my friends Maxi uh and co-conspirators the other day. Um uh in some sense every model is grounded. It's grounded in the data that it was given. Now, okay, so that's like a true statement. It's like okay, yeah, but that's not what we want. And when we often use the word like a grounded world model, it's we say that it's grounded in something. And that something is not just the data that it saw. >> So for example vision language models. A vision language model is like is a way of grounding the visual model in the linguistic space. And this is the approach that we're taking. This is what LangChain does, right? It's all about taking you know models and everything becomes a language model right you know vision you know a you know whatever everything becomes and what what you're when you do that what you're doing is you're saying that that you're grounding all of your models in like a common linguistic space so that they can communicate with one another right via language. >> Now why did we choose language? Well, we chose language because like honestly I think it's because we wanted models that we could talk to, right? We wanted we wanted a model that like you know it was really all about the in making the interface convenient for us and which is great. That's totally something you want. But it begs the question, what's the right domain in which to ground your models? Now I like grounding models like so we also use the phrase like you know one of those like ground truth and of course ground truth is the thing you made up and a priori said was ground truth, right. So what's ground truth?
What is the what is the right do you know domain in which to ground models in order to get them to think like we do? That's the relevant question. And so my my view is is that if you know again if you want AI that thinks like we do you need to have it grounded in the same domain in which we are ground >> and we are grounded in this domain. Right? This is why the embodied bit is such an important thing. Um we want models that um that are are grounded in the physical world in which we evolved. And the reason for this is because that is the world that provides us with these atomic elements of thought. A single cell like lives in a in a soup, right? And it has uh you know and it it it it's you know whatever model it has of the world to the extent that it has one or it behaves as if it has one um that model is is the model of its environment, right? If it didn't understand the environment in which it lived to some extent, right? Then it wouldn't it wouldn't be able to continue to exist and function in that environment. So you can sort of say that a cell has a model that's grounded in chemistry, right? Of the chemistry of the soup in which it lives. You know, when we talk about like that that is a prerequisite for its survival. Now we talk about like mammals and bigger animals and things that live in the macroscopic world that includes other animals, right? And you know and all that. So what's that model? What's what's what's the world the the the you know well at the very least we can say that whatever models we have a significant subset of them are grounded in that world right and that world we know has properties that that we can understand it is object- centered it's relational it's all this you know all this stuff [snorts] um and so the ground the the the the the grounded bit is more about like properly grounded grounded in the domain in which in in which we are grounded as a route to to to creating, you know, AI, you know, an a a AI models that in fact think like we think, right? That's the grounding that we that that that we're particularly focused on. If you had to choose the domain in which to ground your models, what would you choose? Right? I don't think language is the right one. Language is an incredibly poor description of both our thought processes and reality. I tell the story all the time, right? So you ever you ask any cognitive scientist or psychologist who's done some experimental work with humans, right? The you know you you put them in a in a chair, you make them do some tasks, you carefully monitor their behavior, you look at what they did, right? And then you have a nice way of and then you you know that informs your theory of that behavior or however that works. Then you and you know and if you do the experiment well, you have a very good model of how they made whatever decisions they made throughout the course experiment. And then you go back and you ask them what why did you do what you did? and they give you an explanation. It sounds totally reasonable. It also is completely inconsistent with an accurate model of their behavior. Self-report is the least reliable form of data, right, that one gets out of a cognitive or psychological experiment. And so, we don't want to rely on that. We don't want to ground our models in what we know is an unreliable representation both of the world and of our thought processes, right? We want to ground it in something that's a good model of our world. 
And that's why we we've we've chosen to focus on like mac, you know, models that are grounded in the domain of macroscopic physics as opposed to language. >> Can you speak a little bit more to the the limitations with current active inference? >> A nearly uniformly applicable information theoretic framework for describing objects and agents, right? It's it it really is inspired by statistical physics and its links to information theory. And when you take those two mathematical structures, throw in a little like Markov blanket-y thing, so you can talk about macroscopic objects, you kind of have a very generic um widely applicable uh mathematical framework that you can throw at many problems. And a lot of what has gone on in the active inference community over much of the last 20 years has been demonstrating that um it's it's like uniformly applicable. So there's been a lot of breadth and not a lot of depth, right? And part of you know and and I think that you know of course like that's you know that's appropriate right given you know if you really want to make the the argument that everyone should be using this you show see in this in this domain it works on your like toy examples but the people doing that right kind of you know the active inference community has had has this habit of showing like like see like oh this basically like I I can handle this like psychological phenomenon, I can model this cognitive phenomenon. Oh, and look, like it's a good post hoc description of this neural network's behavior and things like that, right? They're they've been showing that, but they've they've never really sat down and and and and like tried to tackle any really big really hard problem because the emphasis has been on evangelism. You couple that with the fact that there is this strong bias within the active inference community towards being as Bayesian as possible. And so, of course, they also like shun the really hard problems because Bayesian inference, you know, is has been historically challenging to scale. There have been a lot of developments over the last few years that you know um that have come you know out of the machine learning community as well you know but mostly out of the Bayesian machine learning community um that that have really made it possible to start scaling Bayesian inference in ways that we we we we really weren't able to do before um uh and you couple that with a desire you know to sort of stop the evangelizing and start solving really hard problems with these methods um and you've got a way to prove that like active inference really can live up to its promises. >> Yeah, it was a similar thing with um constraint satisfaction. You know, in the 1970s there was that Lighthill report and people said symbolic AI will never work and they wrote it off. Apparently, just that there are all these empirical methods that have been discovered in the last 20 years that just make it massively more scalable and tractable and is it the same thing here? Like are there some specific techniques that have dramatically improved the tractability of active inference? >> Uh well, I would just sort of I would lump it all into the into the Bayesian inference category. There have been a number of developments over the last um I would say eight eight yeah eight years or so um that have made uh Bayesian inference significantly more tractable than it used to be. Um some of it had to do with you know work in the sort of um Gaussian process space.
uh my my my current favorite trick is is you know normalizing flows um which is a great way of ensuring that you have like access to sophisticated likelihoods but nonetheless result in tractable probability distributions um uh uh there's the work um you know I mean I've been using I've been using like natural gradient methods for a very long time which allow you to like massively speed up gradient inference and in some situations completely eliminate the need to do gradient inference and instead like you know do coordinate descent, allowing you to take massive jumps in parameter space and not actually lose the the ability to do learning in a sophisticated modeling scenario. Um I also like the fact that like the the natural gradient stuff has been getting some great acronyms recently, like Bayesian online natural gradient, or BONG for short. I just think these guys these guys get me every time. I I wish I was that clever honestly but like it's it's but there's been a lot of developments in that space as well. um you know making you know uh in in addition additionally there's been a lot of developments in like rapid sampling methods uh conditional sampling methods constraint methods like that that that that that have really improved things and I think that like one of the problems again with the active inference community you know historically that that I think is now starting to change has been a hesitance to use these sort of certain approximate methods there's been this this focus on like straight up old school message passing >> and you As soon as you sort of, you know, you know, if you relax the desire to be as Bayesian as possible, it opens up a lot more possibilities for for scaling this stuff up >> when we're now talking about agents that are, you know, interacting with the world around them. And that that still presumably needs a lot of data. >> So, so we've we've got a couple of tricks. Um, one of the nice things about taking an explicitly object-centered approach is that you can you don't have to train all of your models. You don't have to train just one model at a time, right? This is this is my favorite trick and I think that we we you know, this is one of those things I think we're going to be seeing a lot more of in the near future. But um you know so if you want to train you know a vision model to understand like YouTube videos or something you know really complicated like that you basically take one big model and you train it on a ton of data right you just keep training keep training keep training and eventually it sort of gains this implicit and it does it get an implicit sort of sort of object-centered understanding. Another way to go, right, is is to is to, you know, train objects in specific domains that are so these are smaller data sets. Like I'm only going to worry about like the Zillow problem like the inside of people's houses, right? And that's going to have a much smaller set of objects, right, that it has to that it has to learn an implicit distribution over. And you can do this with one big neural network, right? train and you know there's a really great like you know Gaussian splatting paper where they trained a massive neural network that is able to like you know sort of make predictions about what's going on inside people's houses and some nice language models. Um but obviously it has an understanding that's limited to a house and the objects that are inside a house.
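For the normalizing-flow trick mentioned at the top of this answer, the core bookkeeping is the change-of-variables formula: push a simple base density through an invertible map and correct the log-density by the log of the Jacobian. A one-dimensional affine "flow" is enough to show the idea; this is a sketch under that assumption, not any particular library's API.

```python
import math

def log_prob_x(x, scale, shift):
    # Invert the affine flow x = scale * z + shift, then apply the correction
    # log p_x(x) = log p_z(z) - log|scale| (change of variables).
    z = (x - shift) / scale
    log_base = -0.5 * (z ** 2 + math.log(2.0 * math.pi))  # standard normal base density
    return log_base - math.log(abs(scale))

print(log_prob_x(1.0, scale=2.0, shift=0.5))
```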
>> If you have an explicitly object- centered model then you end up not just with one model that understands a house. you end up with one model that's actually thousands and thousands of little models each of which right you know um uh sort of explains like a single object or object class within the house right so you got like a book model so all books come in different shapes and colors right but there's just one like book model and the beauty of doing this is that is that that book model you have to be a little clever about the how you structure the interactions between these things but if you're a little bit clever um about um how you describe the relationships between objects within this modeling framework. You gain the ability to train a model just on the insides of houses, a model like just on you know just on like um you know parks and park benches and take the objects that were discovered in this space and the objects that were discovered in this space and put them into a combined environment that has objects of both of those kinds and it still works. Right? That's the advantage of the of taking an object- centered approach or a what I like to refer to as the lots of little models approach. >> Some of these things are a little bit weird. You know, some cultures have, you know, maybe one culture doesn't have the notion of time and some cultures might see two objects as one. So, is is there a potential problem here that there's some ambiguity that we need to overcome? >> I'm not going to say that there's not the potential problem for ambiguity that we need to overcome. What I will say instead is that is that the the additional constraint that we're imposing it's not just about objects it's also about their relationships. Now think about physics. This is this is why we this is why the physics discovery stuff is such a big part of it right in physics um is in particular like you know Newtonian mechanics um you can you know let's pretend we're living in a world of rigid bodies right so all I need to worry about is like weight and shape of things and that defines a particular object type but I also need to know how they interact and so in in in Newtonian mechanics we have like you know what we can do is we can take these objects we can watch them like bouncing off of each other and doing all these sorts of things and we can quickly infer that like oh like their interactions are all governed by a in a single language which is the language of forces and force vectors right um that lang that that language of interaction right is really what makes it work right otherwise we'd just have like you know pictures of things that's all we would have got what you're empirically discovering is sort of a generalized notion of forces that describe the relationships between things and you cons the constraint that you place in order to avoid the problem of things being too brittle, right? Is that well, they all have to use the same class of forces together in order to interact. We're stuck with that. But by being flexible about our definition of what a force is and the having the ability to discover new kinds of forces, not just like literal force vectors, right? Um gives us the ability to sort of generalize um without be without becoming too brittle. you're talking to this interaction dynamic. 
So there's a graph of interactions which might possibly represent affordances in the macroscopic domain and by doing analysis on the interaction graph you and and sort of simplifying the the analysis as much as possible you get a principled way to partition the world up. >> That's right. And so so there's there there it's it's all about having um interactions and interaction classes. So it's not like there's not just one adjacency matrix, right? There's an adjacency matrix that also specifies, there's there's one for every type of interaction that that's possible. Um that's what gives you the additional flexibility. The other thing that gives you the additional flexibility is being is being a little bit Bayesian about things, right? It may very well have been that all of your observations of this object when it was in a house were like really simple. It was all just it sits on a shelf, right? And so what do you know? Well, what you know is that that object sits on a shelf, but you have to be, you know, which is one kind of interaction, right? That's just the, you know, it has a force pushing down, there's a force pushing up. You don't know anything about like the weight and you have nice, but but if you keep error bars about that, if you keep error bars about the other kinds of interactions that you have seen, but are agnostic, right, about the specific details for this particular object, it gives you the flexibility to say, well, I'm going to put it in this environment, I can make some predictions about how it's going to behave, right? But if I throw a bowling ball at it, I'm going to be making some, you know, assumptions about how it might behave. But I, but once the bowling ball hits it, right, I might have to revise those assumptions. This is the other critical element of the approach we're taking, which is continual, which is you have to have some kind of continual learning element. This is something that really doesn't exist in contemporary AI, right? And you know, you know, when you build your big model, you've spent millions of dollars training it and then you're done. Right? Yes, someone else can come along and fine-tune it a bit for a particular task, right? Which is great, but at the end of the day, when you're at the deployment phase, you turn learning off. Um whereas in this approach, we're saying no, no, you one of the things that's critical, a critical aspect of the way we think about the world and the way we learn about the world is that it's continual and it's interactivist, right? So there's, you know, and that needs to be true of the objects that we're discovering as well. We've learned classes of interactions, but just because we haven't seen a particular class of interactions previously doesn't mean we say the others never happen, right? We still allow for that possibility, right? And then do continual learning with rapid updates when we see something happen. We see, you know, a new interaction. Now what makes that work right is the fact that you've specified that there is a certain set of kinds of interactions, some of which you previously observed, some of which you still don't know about and might observe soon, and then you can update your posterior beliefs about whether or not that object interacts in that way. >> What would the architecture of such a system look like? I'm I'm imagining it'll be distributed, right?
So, you know, we have all these have all these different agents and and then we have the consistency problem because maybe this agent has empirically learned that these two things are a book, but the agent over there, you know, just thinks this one thing is a book and then there's how many objects are there? Would it become intractable? You know, like realistically, I don't know what. >> So, so from a simulation perspective, the way that this gets simulated is remarkably the way the remarkably like the way a video game engine simulates the world. The only difference being this abstract notion of forces. So, so how does a video game represent the world? Well, you have all these assets, right? And each asset is basically a shape. Maybe a texture, color, something like a fork is an asset, right? Or a little like three-legged stool is an asset. Um, and it has a bunch of properties, but it's basically, you know, that that you know that are associated with um its shape, color, mass, all of this stuff. And then it has a set of interaction rules, which are like Newtonian forces, force vectors. Then you've got other things like water and sand that have like special rules for them because if you just try to because otherwise you know you need a macroscopic rule to describe them otherwise the compute would be insane and stuff like that. So it's very similar to that, right? We, you know, what we do when you take this lots of little models approach, what you end up with is the moral equivalent of a giant list of video game assets, right? And then when it goes to modeling a particular environment, right? When you find the agent that you're talking about that has this lots of little models model in its head, what it does is is it it um it sort of looks at the scene and says, "Oh, okay. I need to worry about these 10,000 little models right now, and that's it. I don't need the rest of it." Right? And then it just sort of operates in that space running something that looks a lot like a video game simulation. Right? So it's it's that sparsity um is what makes is what makes this little this lots of little models approach work. Right? You may have a million little models, but at any given time you only need a tiny fraction of them and you just instantiate those. >> The thought occurs though that in a game engine um all of these particles are they're in they're in the engine. I can say what what are the forces between these two particles. >> Yeah. It's called cheating. >> Well, yeah. Because you know when you deploy an agent in the real world, you can't just ask, well, what's the force vector between Jeff and and the light? >> That's right. Yeah. You have to learn those. You know, does this model discover, you know, if you take a video game engine as ground truth, are we capable of discovering the video game the assets and their properties that were in that in that in that game engine? >> So, what would your input be? Would it just be the pixels? >> Yeah, why not make it hard? Like if you know it would be cheating to sort of like start out with something that already segments the image for you, >> right? If if you can't solve the hard problem from, you know, from from the bottom up, then like it's not a hard problem. Why'd you do it? >> If I understand correctly, a successful implementation of the technology you're talking about would be let's start with a game engine and we almost treat the AI like a black box.
>> If I understand correctly, a successful implementation of the technology you're talking about would be: we start with a game engine and we almost treat the AI like a black box. It has inputs: I can move left, I can move right, pan up and down, I can interact with objects, and maybe there's some kind of score function, I'm not sure. But it can learn inside the game engine, and it will build up this internal model library that represents things in the world of the game engine. And if it's learned a sparse, robust model library, you could in principle take the same learned model, apply it to a robot in the real world, and it would generalize. >> That's the idea, and that's the problem we're trying to solve. This is one of the critical missing elements in the robotics space: training models in simulated environments does not translate very well to real-world environments. That could be because the simulated environment is just too impoverished, but it could also be because the artificial environment isn't actually a very accurate representation of the real world, and I think it's largely the latter, coupled with the fact that the artificial agent's internal model is not structured like the world it's being trained to function in. Those are the two biggest problems. So what do you need in order to address them? One, you need a good model for the robot's brain that has the structure of the world in which it lives. The other thing is you need a mapping from real-world data to simulated data, and right now what we typically use is video game engines. Now, video game engines are great; I certainly enjoy them on a ten-hour-a-week basis. The problem with them, though, is that they weren't built to be realistic physics. Most of them were designed to be plausible, to look good to the user, and there are a lot of tricks and hacks thrown in to deal with the fact that the equations of Newtonian mechanics are very stiff: when collisions happen, if you're just a little bit wrong, weird, non-physically-realistic things can occur. [snorts] So if you had the ability to construct an environment with good enough physics that it accurately represented the real world, and trained your robots in that domain where they have these models in their heads, so they're actually capable of learning the quote-unquote ground truth you've implemented in the simulated world, then I believe they will generalize better to functioning in the real world. And this is absolutely critical for robotics going forward, if for no other reason than that right now, the way we train robots to put your groceries away and things like that, with large language models and all these self-supervised models, is by training them to mimic human behavior. It's expert trajectory learning. They're not really learning the physics of their environment; they're learning to mimic human behavior without crushing the eggs, right?
And so if you want them to be able to generalize across domains and across tasks, you need to get rid of the reliance on expert trajectory learning, and that only happens when you move to something that is explicitly model-based, with a model that accurately represents the world in which they live. >> Once you've got a core set of models that work in the world, is that the value of the AI? >> Yeah. Once you have a core set, you have the ability to deploy your agent out there in the real world, and it can handle situations it couldn't previously handle. One of my co-conspirators likes to talk about the cat-in-a-warehouse problem. So now we've got an AI agent that has been trained to manage a warehouse, and it understands things like forklifts and boxes and, hopefully, workers, and it knows what to do. And then one day something comes along it has never seen before. It's a cat. Cats don't belong anywhere, and a cat comes along, and the model has never seen a cat before, because that wasn't in the environment in which it was trained. This is one of the beauties of this approach. The cat comes into the warehouse and the system is like, what the hell is this? It's screwing with my system. Because we're taking this free-energy-based approach, one of the critical elements is tracking surprisal. So when a cat comes along and the system doesn't know what a cat is, the surprisal signal goes crazy, and then it says: okay, stop. Don't run over the cat. Let's figure out what's going on. What it can do is take a picture of the cat and fire it off to a server somewhere that has a huge bank of models and has been pre-trained, to a small extent, on model selection, and ask: what the hell is this? The big bank of models says, here are seven or eight things it could possibly be; it's different kinds of cats, maybe a dog thrown in, whatever. It ports those little models over to the warehouse model, which does some proper hypothesis testing, watches the cat behave for a little bit, concludes, ah, it's a cat, and sends the other models back because it doesn't need them anymore. It's figured out what this is, and now it has incorporated an understanding of the cat into the system. This is another beauty of taking an explicitly object-centered approach: it gives the model the ability to know what it doesn't know, which comes from the active inference component, and when it doesn't know something, it can go phone a friend. The friend responds by saying, oh, it's a cat, and it can take the model of the cat, incorporate it into its warehouse model, and now it understands that. There's also a huge compute advantage to this. If we had started with one big model that already knew what a cat was, think of how many parameters it would have; it would be huge. This model is very frugal in the sense that it only needs to know two things: what it needs to know about the environment in which it exists, and, when it sees something it doesn't know, where it can go pull from. So that's the idea: you have this massive bank of models, but when you instantiate for a particular use case, you don't need them all. >> Yeah. >> Right. You just need the ones that are relevant to that environment. But these models are continuously tracking surprise, or uncertainty, and when the agent sees something it hasn't seen before, it's smart enough to say, I don't know what that is.
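One way the surprisal-driven "phone a friend" loop might look in code, as a sketch only: surprisal is taken as negative log-likelihood under the agent's current little models, and `fetch_candidates` stands in for the remote model bank. The names, the threshold, and the mixture used for surprisal are assumptions, not Jeff's implementation.

```python
import math

# Hypothetical sketch of the surprisal-driven loop described above.
# `local_models` maps names to likelihood functions p(observation | model);
# `fetch_candidates` is a placeholder for the remote bank of models.

SURPRISE_THRESHOLD = 10.0   # nats; arbitrary, for illustration only

def surprisal(obs, models):
    # Surprisal under the agent's current mixture of little models.
    p = sum(m(obs) for m in models.values()) / max(len(models), 1)
    return -math.log(max(p, 1e-12))

def handle_observation(obs, local_models, fetch_candidates):
    if surprisal(obs, local_models) < SURPRISE_THRESHOLD:
        return local_models                       # nothing new; carry on
    # Surprisal spiked: stop, ask the big model bank for hypotheses
    # ("cat?", "small dog?"), then test them against the observation.
    candidates = fetch_candidates(obs)
    best_name = max(candidates, key=lambda name: candidates[name](obs))
    local_models[best_name] = candidates[best_name]   # keep only the winner
    return local_models
```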
>> How and when should deep learning be combined with this? My naive perception of Bayesian inference is that, right now, if you have a photograph from a camera and it's, say, 300 pixels square, that would be a challenge for Bayesian inference. So I'm thinking: could you use something like a vision-language transformer as part of the Bayesian framework, or could you even use deep learning models as a way of bootstrapping the knowledge acquisition in the Bayesian framework? >> So the reason I mentioned normalizing flows is that, technically, that's a deep learning tool. It just happens to be a deep learning tool that takes in an image and turns it into something that is easy to deal with from a probabilistic reasoning perspective. Are we going to use deep learning tools? Yes, the ones that are fit for purpose, for sure. And that's a great example of one where, well, why wouldn't we use it if it's compatible with our framework? >> Many folks in the audience won't know what a normalizing flow is. Can you give us a quick update on that? >> Okay, well, we've all got a pretty good handle on how diffusion models work these days, right? You take your image, you add a bunch of noise to it to make it Gaussian, and then you learn an inverse transformation. It's the same thing: you're learning a mapping from a probability distribution that is easy to deal with, like a Gaussian distribution, onto the thing you actually observe, the thing you care about; in this case it could be an image. In fact, I don't think we should call them diffusion models. It's a normalizing flow, and the diffusion should be referred to as a diffusion training protocol for a normalizing flow. So to some extent we will be using some of those tricks as well. You could say, yeah, we're going to use diffusion models, if you're going to make me roll my eyes and say that.
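For readers who want the one equation behind a normalizing flow: the model learns an invertible map f from data to a simple base distribution, and the exact likelihood comes from the change-of-variables formula, log p_x(x) = log p_z(f(x)) + log |det ∂f(x)/∂x|. Below is a minimal sketch with a single element-wise affine map; it is illustrative only (real flows stack many learned invertible layers), and the numbers are made up.

```python
import numpy as np

# Minimal normalizing-flow sketch: an invertible element-wise affine map
# z = (x - mu) / sigma onto a standard Gaussian base distribution, with the
# exact log-likelihood given by the change-of-variables formula:
#   log p_x(x) = log p_z(f(x)) + log |det d f / d x|

def affine_flow_logprob(x, mu, log_sigma):
    z = (x - mu) * np.exp(-log_sigma)                 # forward map f(x)
    log_pz = -0.5 * np.sum(z**2 + np.log(2 * np.pi))  # standard normal base
    log_det = -np.sum(log_sigma)                      # log |det Jacobian|
    return log_pz + log_det

x = np.array([0.3, -1.2, 2.0])
mu = np.zeros(3)
log_sigma = np.log(np.array([1.0, 2.0, 0.5]))
print(affine_flow_logprob(x, mu, log_sigma))
```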
>> Jeff, what is your approach to alignment? >> Well, I typically like to talk to people about their beliefs and values, figure out how it is that they came to form them, and then try to convince them to adopt my values. The beliefs that these artificial systems have are not the same as our beliefs, and the reward functions we specify for these artificial agents are definitely not the same as our reward functions. Now, there are a few exceptions, like Go and chess: in any game where you either win or you lose, the reward function is obvious. But in general, in complicated situations, it's not so obvious what the reward function should actually be. I know there's this definite belief that reward is all you need, and there's some truth to that, but the question is: where did your reward function come from? From a philosophical perspective, there is no normative solution to the problem of reward function selection, barring divine intervention, as I like to say, which is just a fancy way of saying that your values and my values might be different, and it's really difficult to say whose are better. From a practical perspective, a situation I like to point out: if you're talking about self-driving cars, obviously you'd like to penalize your self-driving car if it drives over a squirrel. But if it had to choose between a squirrel and a cat, most people would want it to choose the squirrel. The way you would do that in an RL model is to say minus 10 points for a squirrel, minus 50 for a cat. Where did those numbers come from? It's completely ambiguous; they're relatively arbitrary, kind of made up. So relying on arbitrarily selected reward functions seems like a terrible idea. We also know that things can go horribly wrong, and I know everyone's sick of this example, but when you rely on reward, you're effectively making wishes from a malevolent genie. You run the risk of saying, hey, Skynet, end world hunger, and getting back: no problem, kill all humans. If you don't specify your reward functions very carefully, you can get very degenerate behavior. So the goal of alignment in an RL setting would be to somehow get my reward function, or perhaps humanity's collective reward function, into the AI agent. This is really, really hard, because measuring reward functions is really challenging. The approach we're taking is to ask: how do people actually do this? How do we as humans construct alignment? The first thing we do is try to figure out what other people's reward functions are. The problem of reward function identification is confounded by the fact that people have different beliefs. Action, which is what we can observe other people doing, is a combination of their beliefs and their reward function, their values. And the problem, of course, is that you only observe people's actions. There's a difference of opinion about what to do, and you want to figure out why. It could be because your beliefs are different, or it could be because your values are different, but it's ambiguous; you can't tell. Mathematically, it's not even possible to separate the two. Belief and value are fundamentally conflated when all you observe is action, or decision.
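A toy illustration of that conflation, with made-up numbers: two agents that disagree about both beliefs and rewards can still choose the same action, so the observed decision alone cannot tell you which of the two differs.

```python
# Tiny illustration (numbers invented): expected utility is
# sum_s belief(s) * reward(action, s). Agents with different beliefs AND
# different rewards can still pick the identical action.

def best_action(belief, reward, actions, states):
    def expected_utility(a):
        return sum(belief[s] * reward[(a, s)] for s in states)
    return max(actions, key=expected_utility)

states, actions = ["rain", "sun"], ["umbrella", "no_umbrella"]

agent_1 = dict(
    belief={"rain": 0.8, "sun": 0.2},
    reward={("umbrella", "rain"): 1, ("umbrella", "sun"): -1,
            ("no_umbrella", "rain"): -5, ("no_umbrella", "sun"): 2},
)
agent_2 = dict(  # different beliefs and different values...
    belief={"rain": 0.3, "sun": 0.7},
    reward={("umbrella", "rain"): 10, ("umbrella", "sun"): 0,
            ("no_umbrella", "rain"): -20, ("no_umbrella", "sun"): 1},
)

for agent in (agent_1, agent_2):
    print(best_action(agent["belief"], agent["reward"], actions, states))
# ...yet both print "umbrella": the observed choice alone cannot separate
# a disagreement in beliefs from a disagreement in values.
```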
The way we solve this problem as people is that we talk about our beliefs. I ask you: why do you think this is the right action? And you tell me, well, it's because this fact, this fact, and this fact suggest that if I do this, then this will happen. And then I can say, ah, I see, so maybe the reason for the disagreement in our decision is that you're not aware of this fact, and I'd forgotten about that one. So we incorporate all of these things together, and you might still say, I still think we should do X, and I say, no, it's still definitely Y, and we continue the conversation until each of us has a very reasonable model of the other person's belief-formation mechanism, at which point the only remaining cause for disagreement is a disagreement about the reward function. AI systems are completely illegible, and that's almost a good thing, because if we actually understood how flawed they were, they would be banned, right? And they're amoral; we have no idea how to put morality into them. The smart, safe thing to do is to remove decision-making from their capabilities and simply use them as oracles or prediction engines. Then we can say, hey, what would happen if I did X, Y, and Z? It tells us the likely outcome, and we say, oh, okay, maybe A, B, and C were better choices. That prevents them from participating in the actions; sorry, it prevents them from using their reward function, and you can get that just by training them to do good prediction. That's totally great, but it doesn't give us the kind of automation we really want. What we really want are artificial agents that are decision makers, that can act on our behalf. So it's either going to be human in the loop, or it's going to be something like what I propose, where we figure out how to solve the alignment problem in that fashion. >> But Jeff, you're an old-school cognitive guy. So for someone like you, would you always think that in the absence, or in lieu, of explicit cognitive models we would never be able to say that these things actually had beliefs or intentions? >> I think that what allows us to currently say they don't have beliefs or intentions actually stems a lot from our knowledge of how they work. I, for example, have no problem concluding that you have beliefs and intentions, though it may very well be that that conclusion is drawn from the fact that I really don't know how you work. I have an intuitive feel for it: I assume you work the way I work, I have beliefs and intentions, that's my perspective on myself, and so I conclude the same about you. It's kind of like emergence. Emergence is such a funny concept, right? There's this whole branch of the emergence literature that defines an emergent phenomenon as anything that I didn't predict, which is a remarkably anthropocentric and, I would argue, ignorance-based definition of emergence, and I don't like it for those reasons. The same sort of thing, or really the converse of it, is what's going on here.
It's like this: we know that these algorithms don't have the capability to do anything other than predict, and so we don't believe they have intentions. >> But something like strong emergence usually means, you know, causal irreducibility. >> Whatever definition of emergence you end up going with, it shouldn't be ignorance-based. It shouldn't be based on "the only way I could have discovered this was by simulating it, therefore it is an emergent phenomenon." I don't even like that, though I am more sympathetic to it. I prefer definitions of emergence that are more pragmatic. This is why I like downward causation as a fundamental feature of emergent behavior: not only is it a fairly rigorous definition of when you can say a phenomenon is emergent, it also comes with a practical tool. It tells you you don't need to model the microscopic phenomena. >> Last time we spoke about Lenia and the Game of Life, didn't we? >> Yeah, and I'm still playing with that, by the way. One of my favorite Lenia simulations, and this is not particle Lenia, this is the traditional Lenia, has a field with obstructions, squares and circles and things like that, and then these little creatures, little amoeba-like swimmers with fins at the back. They swim along and hit one of these obstructions, which causes them to deform and look like they're going to die, how sad, and then they re-form and become themselves again. We thought of this as a really nice abstract environment in which to test some of the properties of the physics discovery algorithm, because one of the nice things about the approach we've taken is that as a little swimmer hits something, it's possible that it loses its identity when it deforms into something new and then re-forms into itself, and we wanted to see whether our approach captures that. And it more or less does: it hits the obstruction, changes its identity into an object of a different type, re-forms, comes out the other side, and regains its identity. >> Fascinating. Quick aside: we spoke last time about Alexander Mordvintsev, who had that convolutional cellular automaton with the self-healing gecko, and he has now written a new paper with his friends at Google using logic gates. It's an emergentist logic-gate thing that draws a Google logo. I haven't read it in detail, but it looks amazing, so definitely look at that. And now you're taking your system and applying it to something like a Game of Life, basically, but you still expect it to work? >> Yeah. Well, there are forces in Lenia, right? The rule that causes the pixels to change has a few properties. It's radially symmetric. >> Yeah. >> Right. And it can flip sign, but any radial symmetry can work.
So it has a polarity, and you can think of it, I mean, it is a force in a sense, and it's even a force that's kind of like real forces; it's like a weird kind of charged-particle thing. So I still think the approach we're taking applies: it's basically just discovering the effective forces. We're not worried about the microscopic forces; we don't care. That's the whole point of a macroscopic physics. There are microscopic forces that govern the behavior of the system as a whole, but what you're interested in are the things that make predictions at the scale you care about. So what you're doing is discovering the effective rules that describe the interactions, not just among the pixels that make up the little floater or flyer, but the rules that govern its interactions with other floaters, or with physical objects like the obstructions they put in the domain.
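For reference, a minimal Lenia-style update of the kind being described: a radially symmetric kernel is convolved with the field, passed through a growth function, and the field is nudged toward that growth each time step. The parameters here are illustrative, not those of the specific simulations discussed.

```python
import numpy as np
from scipy.signal import convolve2d

# Illustrative Lenia-style step (made-up parameters): a radially symmetric
# kernel K drives local growth, and the field A is nudged toward that growth.

def radial_kernel(radius=10):
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    r = np.sqrt(x**2 + y**2) / radius
    k = np.exp(-((r - 0.5) ** 2) / 0.02) * (r <= 1)   # ring-shaped, radially symmetric
    return k / k.sum()

def growth(u, mu=0.15, sigma=0.015):
    # Smooth bell-shaped growth: positive near mu, negative elsewhere.
    return 2 * np.exp(-((u - mu) ** 2) / (2 * sigma**2)) - 1

def lenia_step(A, K, dt=0.1):
    u = convolve2d(A, K, mode="same", boundary="wrap")
    return np.clip(A + dt * growth(u), 0.0, 1.0)

A = np.random.rand(128, 128)
K = radial_kernel()
for _ in range(50):
    A = lenia_step(A, K)
```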
And Keith really loves cellular automata because they are Turing complete and they have this miraculous ability to arbitrarily expand their memory. You can have a grid of a certain size, and you can just keep adding more memory without having to train your thing from scratch. And using some of the approaches we've just been talking about, you can actually learn the update rules with stochastic gradient descent. So do you think in the future we might actually have an AI system that is running inside a cellular automaton? >> That is a very good question. The snarky response is to say: don't we already? We've got it running on a computer, and at the end of the day a computer is just a whole bunch of logic gates, so isn't it already a cellular automaton? >> Well, it's in the same class of algorithms. >> Yeah. >> But there seems to be something about a cellular automaton that has this emergentist quality: what it does is not how it's programmed. It feels like there's a trick, in that the way it's programmed is an order of magnitude less complicated than the thing it does, right? >> So it feels like a magical bridge to doing stuff that is more complicated than we could explicitly program or learn. >> I [snorts] agree. But that also sounds a lot like a computer, [laughter] you know? I guess the difference between a computer and a cellular automaton is that with a computer you program it, you specify exactly what it should do, whereas with a cellular automaton, if you're training it to do something in particular, say find a bunch of discrete objects that move in a certain direction, you're allowed to tweak the rules that govern the local interactions until you get something that more or less does that. But that's just programming in a sort of backhanded way. >> Yeah. >> Those systems are very interesting, because it is remarkable that really dumb, simple rules can lead to really interesting, sophisticated behavior. But the thing I find interesting isn't the fact that complicated stuff can result from simple local rules. What I'm more interested in is: what are the properties of the resulting large-scale objects? How are they related to the small-scale objects? What's the mathematical description of those big things, the things that have emerged? I'm less interested in how precisely they emerge. That probably comes from my bias towards taking a human cognitive approach. When most people look at the Game of Life, they think, oh, that's really cool, look at these pretty pictures and all these little creatures doing fun things. They don't really care about the low-level rules; the thing that captures their imagination is the high-level, macroscopic behavior. >> Though it is cool that you can get them from simple rules. >> Yes. Yes. No, as you say, we can program computers, but there's the legibility ceiling. We can do program synthesis; it doesn't work very well. >> I have confidence that it will; that's not one of those things I'm going to outright pooh-pooh. It's a relatively new area, a rich one, and it has a lot of promise, that's what I'll say. And to some extent the approach we're taking is compatible with program synthesis. We're taking this object-centered description of the world, and the reason we're doing that is that we want to automate systems engineering. What's systems engineering? It's taking this object and attaching it to that one, and attaching that to another, until you get something that does something really cool. Program synthesis is an abstract way of doing the same thing: you start with one program, attach it to another program, attach that to another, and so on. >> There is the problem of just understanding the program, though. Going back to DreamCoder, and I'm sure Kevin and Josh have put others out more recently, some of the programs that are learned are just really complicated. They had examples of, I think, drawing towers and drawing graphs, and you just see this huge confection of rules being composed together. It's great, and it has many good properties in that it's a program, but it doesn't really make sense to us. >> Yeah. To a large extent, I suspect there are ways around that which are related to how your AI coding agent actually works. For example, when they're doing this program synthesis, what they don't currently have access to is the kind of data set that GitHub has: a whole bunch of really well-written programs that do exactly what they were intended to do.
There was a paper in Nature, and this was actually one of those situations where neuroscience is making interesting statements about machine learning, from Tony Zador. What he had done is take a whole bunch of neural networks that did a variety of different things and come up with a way of genetically encoding them. The idea is something like: I had to have a layer that did this, and then a layer that did that, so I'm going to compactly represent the weights in each layer and come up with a representation of that. Then I look at a whole bunch of different neural networks solving a whole bunch of different problems and ask: are there patterns present in these networks such that, when I have a new problem I'm interested in, I can take something that understands this genetic code, maybe mutate it a little, as a way of sensibly traversing the space of possible neural networks until I find the best one? Program synthesis could in principle exploit the same trick; they just need the data set to do it. >> Yeah. Yep. What are humans in a world where everything can be done by a robot? >> Yeah.
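A rough sketch of the genome-space search Jeff attributes to that line of work: mutate compact genomes and score them by decoding into networks, rather than searching weight space directly. `decode_genome` and `fitness` are placeholders for whatever compression scheme and task evaluation one actually uses; none of this is from the interview itself.

```python
import random

# Hypothetical sketch: evolve compact "genomes" that decode into network
# weights, instead of searching over the weights themselves.

def mutate(genome, rate=0.05):
    return [g + random.gauss(0, 1) if random.random() < rate else g
            for g in genome]

def evolve(seed_genomes, decode_genome, fitness, generations=100, pop=32):
    population = list(seed_genomes)
    for _ in range(generations):
        scored = sorted(population, key=lambda g: fitness(decode_genome(g)),
                        reverse=True)
        parents = scored[: pop // 4]                 # keep the best quarter
        population = parents + [mutate(random.choice(parents))
                                for _ in range(pop - len(parents))]
    return max(population, key=lambda g: fitness(decode_genome(g)))
```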

Video description

Dr. Jeff Beck, mathematician turned computational neuroscientist, joins us for a fascinating deep dive into why the future of AI might look less like ChatGPT and more like your own brain.

**SPONSOR MESSAGES START**
— Prolific - Quality data. From real people. For faster breakthroughs. https://www.prolific.com/?utm_source=mlst —
**END**

*What if the key to building truly intelligent machines isn't bigger models, but smarter ones?* In this conversation, Jeff makes a compelling case that we've been building AI backwards. While the tech industry races to scale up transformers and language models, Jeff argues we're missing something fundamental: the brain doesn't work like a giant prediction engine. It works like a scientist, constantly testing hypotheses about a world made of *objects* that interact through *forces* — not pixels and tokens.

*The Bayesian Brain* — Jeff explains how your brain is essentially running the scientific method on autopilot. When you combine what you see with what you hear, you're doing optimal Bayesian inference without even knowing it. This isn't just philosophy — it's backed by decades of behavioral experiments showing humans are surprisingly efficient at handling uncertainty.

*AutoGrad Changed Everything* — Forget transformers for a moment. Jeff argues the real hero of the AI boom was automatic differentiation, which turned AI from a math problem into an engineering problem. But in the process, we lost sight of what actually makes intelligence work.

*The Cat in the Warehouse Problem* — Here's where it gets practical. Imagine a warehouse robot that's never seen a cat. Current AI would either crash or make something up. Jeff's approach? Build models that *know what they don't know*, can phone a friend to download new object models on the fly, and keep learning continuously. It's like giving robots the ability to say "wait, what IS that?" instead of confidently being wrong.

*Why Language is a Terrible Model for Thought* — In a provocative twist, Jeff argues that grounding AI in language (like we do with LLMs) is fundamentally misguided. Self-report is the least reliable data in psychology — people routinely explain their own behavior incorrectly. We should be grounding AI in physics, not words.

*The Future is Lots of Little Models* — Instead of one massive neural network, Jeff envisions AI systems built like video game engines: thousands of small, modular object models that can be combined, swapped, and updated independently. It's more efficient, more flexible, and much closer to how we actually think.

Whether you're an AI researcher, a robotics enthusiast, or just curious about how minds — biological or artificial — actually work, this conversation offers a refreshingly different perspective on where intelligence comes from and where it's going.

Rescript: https://app.rescript.info/public/share/D-b494t8DIV-KRGYONJghvg-aelMmxSDjKthjGdYqsE

---

TIMESTAMPS:
00:00:00 Introduction & The Bayesian Brain
00:01:25 Bayesian Inference & Information Processing
00:05:17 The Brain Metaphor: From Levers to Computers
00:10:13 Micro vs. Macro Causation & Instrumentalism
00:16:59 The Active Inference Community & AutoGrad
00:22:54 Object-Centered Models & The Grounding Problem
00:35:50 Scaling Bayesian Inference & Architecture Design
00:48:05 The Cat in the Warehouse: Solving Generalization
00:58:17 Alignment via Belief Exchange
01:05:24 Deception, Emergence & Cellular Automata

---

REFERENCES:

Paper:
[00:00:24] Zoubin Ghahramani (Google DeepMind) https://pmc.ncbi.nlm.nih.gov/articles/PMC3538441/pdf/rsta201
[00:19:20] Mamba: Linear-Time Sequence Modeling https://arxiv.org/abs/2312.00752
[00:27:36] xLSTM: Extended Long Short-Term Memory https://arxiv.org/abs/2405.04517
[00:41:12] 3D Gaussian Splatting https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/
[01:07:09] Lenia: Biology of Artificial Life https://arxiv.org/abs/1812.05433
[01:08:20] Growing Neural Cellular Automata https://distill.pub/2020/growing-ca/
[01:14:05] DreamCoder https://arxiv.org/abs/2006.08381
[01:14:58] The Genomic Bottleneck https://www.nature.com/articles/s41467-019-11786-6

Person:
[00:16:42] Karl Friston (UCL) https://www.youtube.com/watch?v=PNYWi996Beg
