W28 - "The Bitter Lesson"

I only recently realized the significance of the short essay “The Bitter Lesson.” Published on Richard Sutton’s personal blog in 2019, it distills the biggest lesson from 70 years of AI research: the real drivers of AI progress are general methods that fully exploit compute, not domain-specific tricks built on human expert knowledge. It cites examples from speech recognition, natural language processing, and computer vision: early research largely depended on expert-crafted features and rules, whereas later statistical methods and the rise of deep learning far outperformed those approaches.

The evolution of complex systems and intelligence follows much the same pattern: progress comes not from human design but from mechanisms that can adapt to the environment and discover and scale solutions on their own.

I also read a 2019 book, “Deep Learning: The Core Driving Force of the Age of Intelligence” (the Chinese edition of Terrence Sejnowski’s “The Deep Learning Revolution”). Sejnowski was an early proponent of neural networks and co-invented the Boltzmann machine with Geoffrey Hinton, so he writes with real authority. The book was popular at the time: AlphaZero’s successes had already drawn broad attention from markets and policymakers to AI, while BERT and GPT were still niche terms known mainly within the field, and the book doesn’t even mention them.

Back then I bought the print edition to ride the wave. Although it’s a popular science book, without prior exposure it felt distant and hard to truly engage with. Opening it again now, the experience is completely different. What felt like a craze at the time looks, in hindsight, more like concentrated interest within the field than a full mainstream breakout, and that gap makes rereading it today quite interesting. Having followed AI’s development over these years, I find many names and terms from that era suddenly familiar, and many earlier scenes echo in later developments.

It’s a curious experience: what once felt alien and abstract has become something I can understand and discuss.
