The structure of brains
Brains consist of billions of neurons, each connected by synapses that regulate how signals flow between cells. Embedded in this network are ion channels (tiny pores that help regulate neurons’ electrical signals), glial cells (support cells that provide energy and structure), and hundreds of neuropeptides (molecules that can modify processes like synaptic strength and gene expression). When we learn, many — likely most — of these components are modified in some way.
Given this immense complexity, it might seem implausible that brains collectively implement something as conceptually simple as gradient descent. Yet I believe they do, albeit in a biologically grounded and decentralized manner (as argued in a paper written with my friend Blake Richards).
Gradient descent
At its core, gradient descent is an optimization algorithm that continuously updates a system’s parameters (e.g., synaptic strengths) to improve performance. After an action is taken — like making a quick turn while skiing — an error signal indicates how suboptimal or successful that action was. Parameters that should have been larger (to better support the movement) are incremented, while those that should have been smaller are reduced.
Mathematically, if we let θ represent the set of parameters and L(θ) be a loss function measuring how “wrong” or suboptimal the output is, the update rule for gradient descent looks like this:

θ ← θ − η ∇L(θ)
where η is a learning rate and ∇L(θ) is the gradient of L with respect to θ. Despite its simplicity, gradient descent is extremely powerful for learning, both in artificial neural networks and, I argue, in biological brains.
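To make this concrete, here is a minimal sketch of the update rule in Python. The quadratic loss, target vector, and learning rate are arbitrary choices for illustration, not anything specific to neural circuits.

```python
import numpy as np

# Toy loss: L(theta) = ||theta - target||^2 for an arbitrary target.
target = np.array([1.0, -2.0, 0.5])

def loss(theta):
    return np.sum((theta - target) ** 2)

def grad(theta):
    # Analytic gradient of the quadratic loss above.
    return 2.0 * (theta - target)

eta = 0.1                      # learning rate
theta = np.zeros(3)            # initial parameters (e.g., synaptic strengths)

for step in range(100):
    theta = theta - eta * grad(theta)   # theta <- theta - eta * grad L(theta)

print(loss(theta))             # approaches 0 as theta approaches the target
```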
Why Should the Brain Perform Gradient Descent?
1. Biological Plausibility
A wealth of studies suggest that neural circuits could implement gradient-like learning locally, guided by activity-dependent feedback. For example:
Contrastive Hebbian learning adjusts each synapse based on the difference in local activity between a “free” phase and a phase in which outputs are clamped to their targets, letting local activation differences drive error-based synaptic changes (see the sketch below).
Models of dendritic error signals propose that specific synaptic and dendritic mechanisms can carry information about errors directly to the synapses that need adjusting.
Predictive coding suggests that reducing prediction errors can be interpreted within an optimization framework.
Although these approaches don’t always explicitly claim “the brain does gradient descent,” the local synaptic updates and error-reduction objectives are often reminiscent of gradient-based methods. They provide plausible biological pathways for the brain to systematically tune itself toward better performance.
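As an illustration of how such a local rule can end up computing a gradient, here is a sketch of contrastive Hebbian learning for a single linear layer. Everything here (the random target mapping, the learning rate, the single feedforward layer) is invented for illustration; real contrastive Hebbian models use recurrent networks that settle into equilibrium states in each phase.

```python
import numpy as np

rng = np.random.default_rng(0)

# A single linear layer y = W @ x, trained to reproduce a target mapping.
W_true = rng.normal(size=(2, 4))          # the mapping we want to learn
W = np.zeros((2, 4))                      # the network's weights
eta = 0.1

for step in range(500):
    x = rng.normal(size=4)                # input pattern
    t = W_true @ x                        # desired output for this input

    y_free = W @ x                        # free phase: run the network
    y_clamped = t                         # clamped phase: outputs fixed to targets

    # Contrastive Hebbian update: difference of local correlations
    # between the clamped and free phases.
    W += eta * (np.outer(y_clamped, x) - np.outer(y_free, x))

# For this linear case the update equals eta * (t - W x) x^T, i.e. a
# gradient step on the squared error: a purely local rule doing gradient descent.
print(np.max(np.abs(W - W_true)))         # small after training
```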
2. Computational Efficiency
Among learning algorithms, gradient descent stands out for its sample efficiency. Methods lacking gradient information — such as random search, evolutionary strategies, or purely trial-and-error forms of reinforcement learning — tend to require more interactions to converge.
Zeroth-order (black-box) optimization, which does not rely on derivatives, approximates gradients by testing how small random changes to the parameters affect performance. For example, an update might look like:

θ ← θ − η [L(θ + δ) − L(θ)] δ

where δ is a random perturbation. Because this approach essentially guesses the gradient rather than computing it directly, it usually requires more attempts to converge on a good solution, and the number of extra attempts grows as the problem becomes higher-dimensional.
With access to actual gradients, far fewer samples are needed to refine parameters effectively. In high-dimensional spaces, such as the countless interacting neurons and synapses of the brain, this drastically reduces the trial and error needed to improve performance. Given the evolutionary pressure to use energy and time efficiently, it is reasonable to suspect that biological learning mirrors these gradient-based advantages.
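The contrast is easy to see in a small simulation. The quadratic loss, dimensionality, and step sizes below are arbitrary illustrative choices; the point is only that, with the same number of parameter updates, the exact-gradient learner gets much closer to the optimum than the perturbation-based one, and the gap widens with dimension.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 50
target = rng.normal(size=dim)

def loss(theta):
    return np.sum((theta - target) ** 2)

eta, sigma, steps = 0.05, 0.01, 2000

# Gradient descent with the exact gradient.
theta_gd = np.zeros(dim)
for _ in range(steps):
    theta_gd -= eta * 2.0 * (theta_gd - target)

# Zeroth-order: guess the gradient from one random perturbation per step,
# following the finite-difference update sketched above.
theta_zo = np.zeros(dim)
for _ in range(steps):
    delta = rng.normal(size=dim)
    g_est = (loss(theta_zo + sigma * delta) - loss(theta_zo)) / sigma * delta
    # The estimate is unbiased but very noisy, so the step must be
    # shrunk (here by the dimension) to stay stable.
    theta_zo -= (eta / dim) * g_est

print(loss(theta_gd), loss(theta_zo))   # exact gradients end up far closer to the optimum
```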
Conclusion
While it’s unlikely the brain runs textbook backpropagation as in artificial neural networks, multiple lines of evidence (including elegant mathematical ones) point toward gradient-based optimization strategies operating in neural circuits. From local synaptic plasticity rules to broader error-reduction frameworks, biology appears to leverage principles akin to those that make gradient descent so successful in machine learning. Understanding these mechanisms may yield deeper insights into both human cognition and the next frontiers of artificial intelligence.
Also read
Lots of great relevant research by:
Geoffrey Hinton
Randall C. O’Reilly
Blake Richards
Tim Lillicrap
Walter Senn
Sander Bohte
Yoshua Bengio
Benjamin Scellier
Xiaoqing Zheng
Yuhan Helena Liu
And many others
And what is the objective function?