From Cortical Columns to Transformers: A Thousand Brains in Modern AI

As an AI researcher, I’m always seeking fresh perspectives on how we can advance our understanding of intelligence and, ultimately, replicate it in machines. One of the most thought-provoking books I’ve recently read is A Thousand Brains: A New Theory of Intelligence by Jeff Hawkins. In this book, Hawkins proposes a radical rethinking of how the brain works. He argues that intelligence arises from the brain’s ability to build “models” of the world from many different viewpoints. This theory has profound implications for AI and machine learning.

Hawkins’ theory aligns closely with how we think about multi-modal systems in AI. Just as the brain does, modern architectures such as transformers build multiple representations of the same input. This parallel suggests that the next frontier for AI will not be just improving neural networks, but improving how we structure models so they understand the world from multiple angles. For AI researchers, this opens up new possibilities in both architecture and training methods. If we can build systems that replicate this multi-viewpoint approach, we could push the boundaries of general AI and make systems far more adaptable and context-aware.

Introduction: A Thousand Brains Theory in a Nutshell

Jeff Hawkins’ A Thousand Brains: A New Theory of Intelligence proposes that human intelligence arises not from a single centralized model, but from thousands of parallel models built by cortical columns. Each cortical column acts as an independent learning unit, creating its own model of the world through sensory input and movement. The integration of these models via a voting mechanism results in coherent perception and understanding.

Key Concepts:

  • Cortical Columns as Units: Around 150,000 cortical columns in the human neocortex learn models independently and in parallel.
  • Reference Frames: Each column builds an object-centric coordinate system, allowing the brain to recognize objects from any viewpoint.
  • Distributed Learning & Consensus: Knowledge is shared and consensus is reached via communication between columns.
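
To make the voting idea concrete, here is a minimal sketch (my own illustration, not code from the book): each column produces a probability distribution over candidate objects, and a consensus distribution is formed by combining the columns’ beliefs.

```python
import numpy as np

def column_vote(column_beliefs: np.ndarray) -> np.ndarray:
    """Combine per-column beliefs into a single consensus distribution.

    column_beliefs: array of shape (n_columns, n_objects), each row a
    probability distribution over candidate objects produced by one column.
    Consensus here is a simple product of beliefs (a log-space sum), a crude
    stand-in for the reciprocal voting Hawkins describes.
    """
    log_beliefs = np.log(column_beliefs + 1e-12)   # avoid log(0)
    combined = log_beliefs.sum(axis=0)             # agreement reinforces
    combined -= combined.max()                     # numerical stability
    probs = np.exp(combined)
    return probs / probs.sum()

# Three toy "columns" observing the same object from different viewpoints.
beliefs = np.array([
    [0.7, 0.2, 0.1],   # column 1: leans toward object 0
    [0.5, 0.4, 0.1],   # column 2: less sure
    [0.6, 0.1, 0.3],   # column 3: leans toward object 0
])
print(column_vote(beliefs))  # consensus sharpens around object 0
```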

Cortical Columns vs. Transformers: Attention through Many Models

Transformers leverage parallel processing and multi-head attention, which resonates with Hawkins’ idea of multiple models working together:

  • Each transformer attention head extracts certain aspects of the data, akin to how cortical columns specialize in viewpoints.
  • Transformers lack explicit reference frames; positional embeddings are used instead of object-centric maps.
  • Knowledge in transformers is distributed across the network, similar to how the brain spreads information across many columns.

Key Parallel: Transformers use multi-head attention to integrate perspectives, loosely echoing how columns vote to build consensus.

Major Contrast: Cortical columns form persistent object models and use reference frames, something transformers don’t yet fully replicate.
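
To ground the analogy, here is a minimal NumPy sketch of multi-head self-attention (a simplified textbook version, not any particular library’s implementation): each head applies its own projections and attends to the sequence independently, loosely like columns forming separate views, before the head outputs are merged into one representation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, Wq, Wk, Wv, Wo, n_heads):
    """x: (seq_len, d_model); Wq/Wk/Wv/Wo: (d_model, d_model) learned projections."""
    seq_len, d_model = x.shape
    d_head = d_model // n_heads
    q = (x @ Wq).reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
    k = (x @ Wk).reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
    v = (x @ Wv).reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
    # Each head attends independently -- its own "view" of the sequence.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)    # (heads, seq, seq)
    out = softmax(scores) @ v                               # (heads, seq, d_head)
    # Heads are merged back into one representation -- the "consensus" step.
    out = out.transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ Wo

rng = np.random.default_rng(0)
d, heads = 16, 4
x = rng.normal(size=(5, d))
Wq, Wk, Wv, Wo = (rng.normal(size=(d, d)) * 0.1 for _ in range(4))
print(multi_head_attention(x, Wq, Wk, Wv, Wo, heads).shape)  # (5, 16)
```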

Reference Frames in Multi-Modal and Embodied AI Agents

Hawkins’ emphasis on reference frames is being reflected in embodied AI and multi-modal agents:

  • Embodied Agents: AI agents now actively explore environments to gather information from different viewpoints.
  • 3D Understanding: Robotics and vision models employ internal object-centric maps to generalize recognition across different angles (a toy sketch of such a reference frame follows the examples below).
  • Multi-Modal Systems: Separate modality experts (e.g., vision, touch, language) work together, comparable to how different cortical columns coordinate.
  • Sensorimotor Learning: The trend towards agents that interact with environments rather than passively observing aligns with Hawkins’ theory.

Example Technologies:

  • CLIP’s image-text matching shows early forms of cross-modal consensus.
  • Neural Radiance Fields (NeRF) and object-centric networks enable spatial understanding.
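
As a toy illustration of what a reference frame buys you, the sketch below (hypothetical, a simple 2D rigid transform rather than anything from the book or from NeRF) maps a sensed point from world coordinates into an object-centric frame, so the same feature lands at the same local coordinates no matter where the object sits or how it is rotated.

```python
import numpy as np

def to_object_frame(point_world, object_pos, object_heading):
    """Map a 2D point from world coordinates into an object-centric frame.

    point_world:    (x, y) of the sensed feature in world coordinates
    object_pos:     (x, y) of the object's origin in world coordinates
    object_heading: object's rotation in radians, measured in the world frame
    """
    c, s = np.cos(object_heading), np.sin(object_heading)
    # Inverse rigid transform: translate to the object's origin,
    # then rotate by the negative heading.
    rot_inv = np.array([[c, s], [-s, c]])
    return rot_inv @ (np.asarray(point_world) - np.asarray(object_pos))

# The "handle" of a mug sensed under two different world poses of the mug:
print(to_object_frame((2.0, 1.0), object_pos=(1.0, 1.0), object_heading=0.0))
print(to_object_frame((1.0, 2.0), object_pos=(1.0, 1.0), object_heading=np.pi / 2))
# Both print [1. 0.] -- the same object-centric location.
```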

Modular Systems and Mixture-of-Experts: Many Models, One AI

Hawkins’ brain-inspired vision of decentralized intelligence is increasingly mirrored in modular AI and MoE architectures:

  • Sparse MoE Models: DeepMind’s Mixture of a Million Experts uses many small expert networks, activating only a few per input, much like columns firing selectively.
  • Ensemble Learning: Multiple independent models reduce noise and enhance robustness, a principle already inherent to the brain.
  • Neural Module Networks: Models designed for specific sub-tasks that combine to answer complex queries reflect the cooperative nature of cortical columns.

Comparison:

  • AI MoE systems use learned gates to route input to the right experts.
  • The brain relies on distributed reciprocal communication and probabilistic voting among columns.
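
Here is a minimal sketch of that gating idea (a generic top-k router of my own, not DeepMind’s implementation): a learned gate scores every expert for a given input, only the top-k experts run, and their outputs are blended with the gate weights.

```python
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def moe_forward(x, gate_W, experts, k=2):
    """Route input x to the top-k of many small expert networks.

    x:       (d_in,) input vector
    gate_W:  (d_in, n_experts) learned gating matrix
    experts: list of callables, each mapping (d_in,) -> (d_out,)
    """
    scores = x @ gate_W                      # one score per expert
    top_k = np.argsort(scores)[-k:]          # only a few experts "fire"
    weights = softmax(scores[top_k])         # renormalise over the chosen ones
    return sum(w * experts[i](x) for w, i in zip(weights, top_k))

rng = np.random.default_rng(1)
d_in, d_out, n_experts = 8, 4, 16
expert_Ws = [rng.normal(size=(d_in, d_out)) * 0.1 for _ in range(n_experts)]
experts = [lambda x, W=W: np.tanh(x @ W) for W in expert_Ws]
gate_W = rng.normal(size=(d_in, n_experts)) * 0.1

x = rng.normal(size=d_in)
print(moe_forward(x, gate_W, experts, k=2).shape)  # (4,)
```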

Alignment and Contrasts with Mainstream Deep Learning Paradigms

Alignments:

  • AI research is moving towards distributed architectures, multi-modal fusion, and object-centric representations.
  • Active exploration in embodied AI mirrors human sensorimotor learning.

Contrasts:

  • Brains learn locally and continually, while deep learning depends heavily on global error gradients and large static datasets (a toy comparison follows this list).
  • The brain degrades gracefully and keeps learning without catastrophic forgetting, something most deep networks still struggle with.
  • Symbol grounding remains a gap; current AI models struggle to tie language concepts to real-world interaction and spatial reference frames.
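
To make the local-versus-global contrast concrete, the toy comparison below (illustrative only) updates the same weight matrix two ways: a Hebbian-style rule that uses only the activity on either side of each connection, and a gradient step that needs an error signal derived from a global loss.

```python
import numpy as np

rng = np.random.default_rng(2)
pre = rng.normal(size=5)             # presynaptic activity
W = rng.normal(size=(3, 5)) * 0.1    # synaptic weights
post = np.tanh(W @ pre)              # postsynaptic activity
target = np.array([1.0, -1.0, 0.5])  # used only by the gradient rule

lr = 0.01

# Local, Hebbian-style rule: each weight changes using only the activity
# of the two neurons it connects -- no global error signal required.
W_hebb = W + lr * np.outer(post, pre)

# Global, gradient-based rule: the update depends on an error computed
# against a loss and propagated back through the nonlinearity.
error = post - target                 # dL/dpost for 0.5 * ||post - target||^2
delta = error * (1 - post ** 2)       # backprop through tanh
W_grad = W - lr * np.outer(delta, pre)

print(np.linalg.norm(W_hebb - W), np.linalg.norm(W_grad - W))
```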

Conclusion

Jeff Hawkins’ Thousand Brains Theory offers a blueprint for next-generation AI:

  • Modularity + continuous learning
  • Multi-modal perception + spatial grounding
  • Distributed decision-making + consensus

While transformers and MoEs are early steps, Hawkins challenges us to design AI systems that better reflect the brain’s structure: resilient, flexible, and capable of real-time learning. Bridging these ideas can lead us toward more generalizable and human-like artificial intelligence.

References

  • Hawkins, Jeff. A Thousand Brains: A New Theory of Intelligence. Basic Books, 2021.

This post is licensed under CC BY 4.0 by the author.