

Deep Learning is Not So Mysterious or Different - Prof. Andrew Gordon Wilson (NYU)
Machine Learning Street Talk
Episode Description
<p>Professor Andrew Wilson from NYU explains why many common-sense ideas in artificial intelligence might be wrong. For decades, the rule of thumb in machine learning has been to fear complexity. The thinking goes: if your model has too many parameters (is "too complex") for the amount of data you have, it will "overfit" by essentially memorizing the data instead of learning the underlying patterns. This leads to poor performance on new, unseen data. This worry is usually framed as the classic "bias-variance trade-off", i.e., a balancing act between a model that's too simple and one that's too complex.</p><p><br></p><p>**SPONSOR MESSAGES**</p><p>—</p><p>Tufa AI Labs is an AI research lab based in Zurich. **They are hiring ML research engineers!** </p><p>This is a once-in-a-lifetime opportunity to work with one of the best labs in Europe.</p><p>Contact Benjamin Crouzier - https://tufalabs.ai/ </p><p>—</p><p>Take the Prolific human data survey - https://www.prolific.com/humandatasurvey?utm_source=mlst and be the first to see the results and benchmark your practices against the wider community!</p><p>—</p><p>cyber•Fund https://cyber.fund/?utm_source=mlst is a founder-led investment firm accelerating the cybernetic economy</p><p>Oct SF conference - https://dagihouse.com/?utm_source=mlst - Joscha Bach keynoting(!) + OAI, Anthropic, NVDA,++</p><p>Hiring an SF VC Principal: https://talent.cyber.fund/companies/cyber-fund-2/jobs/57674170-ai-investment-principal#content?utm_source=mlst</p><p>Submit investment deck: https://cyber.fund/contact?utm_source=mlst</p><p>— </p><p><br></p><p>Description Continued:</p><p><br></p><p>Professor Wilson challenges this fundamental belief (fearing complexity). He makes a few surprising points:</p><p><br></p><p>**Bigger Can Be Better**: Massive models don't just get more flexible; they also develop a stronger "simplicity bias".
So, if your model is overfitting, the solution might paradoxically be to make it even bigger.</p><p><br></p><p>**The "Bias-Variance Trade-off" is a Misnomer**: Wilson argues that you don't actually have to give up one to get the other. You can have a model that is incredibly expressive and flexible while also being strongly biased toward simple solutions. He points to the "double descent" phenomenon, in which test performance first gets worse as models grow more complex, then surprisingly starts improving again once they are large enough to fit the training data exactly.</p><p><br></p><p>**Honest Beliefs and Bayesian Thinking**: His core philosophy is that we should build models that honestly represent our beliefs about the world. We believe the world is complex, so our models should be expressive. But we also believe in Occam's razor, the principle that the simplest explanation is often the best. He champions Bayesian methods, which naturally balance these two ideas through marginalization (integrating over all possible parameter settings rather than committing to a single one), a process he describes as an automatic Occam's razor.</p><p><br></p><p>TOC:</p><p><br></p><p>[00:00:00] Introduction and Thesis</p><p>[00:04:19] Challenging Conventional Wisdom</p><p>[00:11:17] The Philosophy of a Scientist-Engineer</p><p>[00:16:47] Expressiveness, Overfitting, and Bias</p><p>[00:28:15] Understanding, Compression, and Kolmogorov Complexity</p><p>[01:05:06] The Surprising Power of Generalization</p><p>[01:13:21] The Elegance of Bayesian Inference</p><p>[01:33:02] The Geometry of Learning</p><p>[01:46:28] Practical Advice and The Future of AI</p><p><br></p><p>Prof. Andrew Gordon Wilson:</p><p>https://x.com/andrewgwils</p><p>https://cims.nyu.edu/~andrewgw/</p><p>https://scholar.google.com/citations?user=twWX2LIAAAAJ&hl=en </p><p>https://www.youtube.com/watch?v=Aja0kZeWRy4 </p><p>https://www.youtube.com/watch?v=HEp4TOrkwV4 </p><p><br></p><p>TRANSCRIPT:</p><p>https://app.rescript.info/public/share/H4Io1Y7Rr54MM05FuZgAv4yphoukCfkqokyzSYJwCK8</p><p><br></p><p>Hosts:</p><p>Dr. Tim Scarfe / Dr.
Keith Duggar (MIT Ph.D.)</p><p><br></p><p>REFS:</p><p><br></p><p>Deep Learning is Not So Mysterious or Different [Andrew Gordon Wilson]</p><p>https://arxiv.org/abs/2503.02113</p><p><br></p><p>Bayesian Deep Learning and a Probabilistic Perspective of Generalization [Andrew Gordon Wilson, Pavel Izmailov]</p><p>https://arxiv.org/abs/2002.08791</p><p><br></p><p>Compute-Optimal LLMs Provably Generalize Better With Scale [Marc Finzi, Sanyam Kapoor, Diego Granziol, Anming Gu, Christopher De Sa, J. Zico Kolter, Andrew Gordon Wilson]</p><p>https://arxiv.org/abs/2504.15208 </p>
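The "double descent" behaviour mentioned in the description can be reproduced on a toy problem. The Python sketch below is an illustrative assumption, not the setup from Wilson's paper or the episode. It draws a noisy linear target, fits only the output weights of a fixed layer of random ReLU features, and uses minimum-norm least squares via `numpy.linalg.pinv`. The function name `random_feature_test_error` and every size in it (40 training points, 5 input dimensions, noise level 0.5, 20 trials) are hypothetical choices. In runs like this, test error typically climbs as the number of features approaches the number of training points and then falls again as the model keeps growing.

```python
import numpy as np


def random_feature_test_error(n_features, n_train=40, n_test=1000, dim=5,
                              noise=0.5, n_trials=20, seed=0):
    """Average test MSE of a min-norm least-squares fit on random ReLU features.

    Hypothetical toy setup: a noisy linear "teacher" in `dim` input
    dimensions, approximated by `n_features` fixed random ReLU features
    whose output weights are the only fitted parameters.
    """
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(n_trials):
        w_true = rng.normal(size=dim)
        x_train = rng.normal(size=(n_train, dim))
        x_test = rng.normal(size=(n_test, dim))
        y_train = x_train @ w_true + noise * rng.normal(size=n_train)
        y_test = x_test @ w_true  # evaluate against the noiseless target

        # Fixed random first layer; only the output weights are learned.
        proj = rng.normal(size=(dim, n_features)) / np.sqrt(dim)
        phi_train = np.maximum(x_train @ proj, 0.0)
        phi_test = np.maximum(x_test @ proj, 0.0)

        # For a full-rank feature matrix, the pseudoinverse gives ordinary
        # least squares when n_features < n_train and the minimum-norm
        # interpolating solution when n_features >= n_train.
        beta = np.linalg.pinv(phi_train) @ y_train
        errors.append(np.mean((phi_test @ beta - y_test) ** 2))
    return float(np.mean(errors))


if __name__ == "__main__":
    # The interpolation threshold sits at n_features == n_train (40 here).
    for p in [5, 10, 20, 35, 40, 45, 80, 400, 2000]:
        print(f"{p:5d} features: test MSE = {random_feature_test_error(p):.3f}")
```

Averaging over several trials matters because the error near the threshold is dominated by a few badly conditioned feature matrices. Replacing the pseudoinverse with a ridge-regularized solve typically flattens the peak, which suggests the spike comes from an unstable interpolant fitting noise rather than from model size itself.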