Prospective Learning: Anticipating a Changing Future

Trying to write down what learning is about in the real world.

Mar 11, 2025

Imagine a rat navigating a maze. As it scurries through, it is not only looking for food; it is also learning to avoid future dangers. Sometimes a sudden drought might make its usual water source vanish; at other times, predators might lurk around corners. This rat, like many organisms in nature, must continuously adapt to a world that is constantly changing. For about three years, Timothy Verstynen, Joshua T. Vogelstein, and I (with help from many colleagues) tried to formulate more precisely what we felt was missing in biological and artificial intelligence: learning for the future, or prospective learning. In a recent computational treatment of this idea, the Chaudhari and Vogelstein labs constructed a precise formulation. In this blog post, I instead want to formulate it from a reinforcement learning perspective, which many of my readers may be more accustomed to.

In this blog post, we will explore what prospective learning is and how it differs from the more familiar framework of reinforcement learning (RL). We will use a biological example — a rat in a dynamic environment — to illustrate the concept, making it accessible to both biologists and computer scientists.

The Limitations of Standard Reinforcement Learning

Traditional reinforcement learning (RL) assumes that the environment remains constant: the rules and conditions governing the system do not change as the agent interacts with it. For example, in a partially observable Markov decision process (POMDP), the environment is characterized by unchanging dynamics and a fixed cost function, which simplifies the learning task. The objective is typically expressed as:

\(J_{\text{RL}}(\pi) = \mathbb{E}\!\left[\sum_{k=1}^{\infty} \gamma^{\,k-1}\; c\bigl(o_{t+k},\,\pi(\tau_{t+k})\bigr)\right]. \)

Here,

π is a stationary policy that maps observation histories τ (what the agent has seen so far) to actions,
c(o, a) is the cost incurred when taking action a after observing o,
γ is a discount factor that de-emphasizes costs in the distant future, and
the dynamics (how the state changes and which observations are made) are assumed constant over time.
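
To make the objective above concrete, here is a minimal Python sketch. It is only an illustration under loud assumptions: the infinite sum is truncated to a short horizon, the observations are deterministic (so no expectation is needed), and the names `rollout_costs`, `discounted_cost`, and the toy two-arm maze are hypothetical and not taken from any paper or library.

```python
def rollout_costs(policy, cost_fn, observations):
    """Apply one stationary policy at every step and collect c(o, pi(tau))."""
    history, costs = [], []
    for o in observations:
        history.append(o)                # tau grows by one observation per step
        action = policy(tuple(history))  # the same policy is used at every step
        costs.append(cost_fn(o, action))
    return costs


def discounted_cost(costs, gamma):
    """Sum gamma^(k-1) * cost_k for k = 1..K (enumerate index i equals k-1)."""
    return sum(gamma ** i * c for i, c in enumerate(costs))


# Hypothetical fixed world: the water is always in the left arm of the maze.
observations = ["water_left"] * 4
cost_fn = lambda o, a: 0.0 if o == "water_" + a else 1.0

always_left = lambda tau: "left"
always_right = lambda tau: "right"

j_left = discounted_cost(rollout_costs(always_left, cost_fn, observations), gamma=0.5)
j_right = discounted_cost(rollout_costs(always_right, cost_fn, observations), gamma=0.5)
print(j_left, j_right)  # 0.0 1.875
```

Because nothing in this toy world ever changes, a single policy ("always go left") is optimal forever, which is exactly the assumption that the rest of this post relaxes.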

For our rat, this would mean that the layout of the maze, the locations of water, and the presence of predators are all unchanging. In the real world, however, this is rarely true. The fixed-world assumption has been widely discussed in the literature, which shows that standard RL frameworks struggle when the environment is dynamic.

Enter Prospective Learning

Prospective learning shifts the perspective. Instead of assuming a static world, prospective learning acknowledges that both the environment’s dynamics and the loss function can evolve over time. Imagine our rat again — but now the maze is dynamic. Today, the water might be plentiful; tomorrow, a drought could strike. Similarly, the risk from predators might increase or decrease depending on the time of day or season. Before diving into the equations, let’s outline the main idea: we are transitioning from a world where conditions are fixed to one where both the environment’s rules and costs evolve over time.

To capture this idea formally, we start by letting the environment at time t be represented by

\( \mathcal{E}_t = (\theta_t, c_t), \)

Here,

θₜ defines the POMDP’s dynamics (how states transition and what observations are generated), and
cₜ is the cost function (or loss) at time t.

Unlike standard RL, where these remain constant, here we assume that the future environment evolves according to a probability model (Markovian here, although generalizations beyond that are possible):

\(p\bigl(\mathcal{E}_{t+1:\infty} \mid \mathcal{E}_t\bigr).\)

In other words, we assume that the parameters of the world (and its tasks and losses) evolve according to Markovian (or, alternatively, arbitrary) dynamics. Another way of putting this is that over short periods of time we have a POMDP world, and over longer periods of time that POMDP itself changes according to another POMDP. Now, you may argue that a POMDP over a POMDP can be written out as a single POMDP with an augmented state space. However, explicitly separating the timescales allows us, I think, to reason more concretely about the dynamics of the world.
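
The slow timescale can be sketched as a small Markov chain over environments. This is only an illustration: the two regimes, the transition probabilities, and the names `TRANSITIONS`, `step_environment`, and `sample_future` are all hypothetical, and each regime label stands in for a full pair of dynamics and cost.

```python
import random

# Hypothetical slow dynamics p(E_{t+1} | E_t): each regime label stands in
# for a full environment (theta_t, c_t). The numbers are made up.
TRANSITIONS = {
    "wet": {"wet": 0.9, "drought": 0.1},
    "drought": {"wet": 0.2, "drought": 0.8},
}


def step_environment(regime, rng):
    """Sample E_{t+1} given E_t from the transition table."""
    candidates = list(TRANSITIONS[regime])
    weights = [TRANSITIONS[regime][r] for r in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


def sample_future(regime, horizon, rng):
    """Sample one trajectory E_{t+1}, ..., E_{t+horizon} starting from E_t."""
    future = []
    for _ in range(horizon):
        regime = step_environment(regime, rng)
        future.append(regime)
    return future


rng = random.Random(0)  # fixed seed so the sketch is reproducible
print(sample_future("wet", horizon=10, rng=rng))
```

Within each sampled regime the rat would face an ordinary (fast) POMDP; the chain above only describes how that POMDP is swapped out over time.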

In simpler terms, this model assigns probabilities to various future scenarios given the current environment, making it easier to plan ahead. Because the world is changing, a single stationary policy is no longer sufficient. Instead, the agent must select a sequence of policies, one for each future time step. The prospective RL objective becomes:

\(J_{\text{prospective}}\Bigl(\pi_{t+1:\infty}\Bigr) = \mathbb{E}\!\left[\sum_{k=1}^{\infty} \gamma^{\,k-1}\; c_{t+k}\Bigl(o_{t+k},\,\pi_{t+k}(\tau_{t+k})\Bigr) \,\Bigg|\, \tau_{\le t},\, \mathcal{E}_{t+1:\infty} \sim p\bigl(\cdot\mid\mathcal{E}_t\bigr)\right]. \)

In this formulation,

The cost at time t+k is given by the evolving function \(c_{t+k}\),
The dynamics at time t+k are governed by parameters \(\theta_{t+k}\), and
The agent selects a different policy \(\pi_{t+k}\) at each future time t+k to best adapt to the upcoming changes.
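
The difference between one stationary policy and a sequence of policies can be sketched numerically. The example below is a toy under strong assumptions, not the formulation from the paper: the two regimes, their transition probabilities, and the costs are invented; each policy is reduced to a single action that ignores the observation history; and the infinite horizon is truncated to six steps. All names (`predict_regimes`, `expected_cost`, `objective`) are hypothetical.

```python
# Hypothetical slow dynamics p(E_{t+1} | E_t) over two regimes.
TRANSITIONS = {
    "wet": {"wet": 0.7, "drought": 0.3},
    "drought": {"wet": 0.1, "drought": 0.9},
}
# Regime-dependent cost c_t(a): surface water is free when wet, gone in a drought.
COST = {
    "wet": {"surface": 0.0, "underground": 1.0},
    "drought": {"surface": 1.0, "underground": 0.0},
}
ACTIONS = ("surface", "underground")


def predict_regimes(start, horizon):
    """Return the regime distribution at t+k for k = 1..horizon."""
    belief = {r: 1.0 if r == start else 0.0 for r in TRANSITIONS}
    forecasts = []
    for _ in range(horizon):
        belief = {
            nxt: sum(belief[cur] * TRANSITIONS[cur][nxt] for cur in TRANSITIONS)
            for nxt in TRANSITIONS
        }
        forecasts.append(belief)
    return forecasts


def expected_cost(belief, action):
    """Expected cost of an action under a distribution over regimes."""
    return sum(p * COST[regime][action] for regime, p in belief.items())


def objective(actions, forecasts, gamma):
    """Discounted expected cost when actions[k-1] is used at time t+k."""
    return sum(
        gamma ** i * expected_cost(belief, a)
        for i, (belief, a) in enumerate(zip(forecasts, actions))
    )


gamma = 0.9
forecasts = predict_regimes("wet", horizon=6)

# Prospective plan: pick the best action separately for each future step.
plan = [min(ACTIONS, key=lambda a: expected_cost(b, a)) for b in forecasts]
j_prospective = objective(plan, forecasts, gamma)

# Best single action repeated at every step (the stationary baseline).
j_stationary = min(objective([a] * 6, forecasts, gamma) for a in ACTIONS)

print(plan)
print(j_prospective, j_stationary)
```

In this toy setting the plan starts with the surface water and switches to the underground reserve once a drought becomes the more likely regime, so its discounted cost is no worse than that of any single repeated action.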

For our rat, this means that its strategy to navigate the maze isn’t fixed — it may run differently today than it would tomorrow as it anticipates potential droughts or the presence of predators. While the equations may seem complex at first glance, they fundamentally express one idea: planning for a future where conditions change over time.

P.S.: My friend and collaborator Joshua T. Vogelstein mentioned that he will soon present a more complete and precise prospective learning (PL) version of RL.

The Field Is Already (Slowly) Moving Towards Prospective Learning

Meta-learning, multi-task learning, continual learning, and curiosity-driven systems have each made significant contributions to specific adaptation challenges. Prospective learning aims to build on these advances by integrating their insights to address the evolving nature of both environmental dynamics and cost functions.

Meta-learning frameworks excel at rapidly adapting to task-specific variations, typically under a stable cost structure; prospective learning extends this idea by also accounting for changes in the cost function over time. Multi-task learning leverages shared representations to enhance performance across related tasks, often assuming that tasks are drawn independently from a fixed distribution; prospective learning instead considers a more structured evolution of tasks and environmental dynamics over time. Both, however, typically overlook the predictable evolution of environmental dynamics. Continual learning methods have made significant progress in mitigating catastrophic forgetting, and prospective learning builds on this progress by additionally planning for predictable changes in the environment and costs. Systems driven by intrinsic curiosity effectively promote the exploration of novel states, and prospective learning seeks to complement this behavior with robust long-term planning that adapts to evolving objectives. Methods aimed at understanding causality try to learn the aspects of the world that do not change.

Rather than replacing existing methods, the prospective learning framework aspires to unify their strengths by anticipating both shifts in environmental dynamics and fluctuations in cost functions, ultimately providing a more comprehensive approach to complex, real-world scenarios.

Bridging Biology and Machine Learning

A Biological Analogy

Think of the rat’s brain as a sophisticated prospective learner. As the rat moves through its environment, its brain not only processes immediate sensory inputs but also continuously predicts future conditions. For example:

Anticipation of Droughts: If the rat has experienced a pattern of drying conditions, it might start favoring paths that lead to underground water reserves, even if those paths take a longer route.
Avoidance of Predators: Similarly, if it senses signs of a predator or recalls an encounter from the past, it might alter its route preemptively to avoid dangerous areas. It may also actively explore for shortcuts it could use should it need to run away.

This dynamic adjustment and the underlying curiosity of animals are crucial for survival in a non-stationary world. Prospective learning in machine learning aims to mirror this capability: to adapt continuously as the world changes.

Implications for Computer Science

For computer scientists, prospective learning offers a new way to design algorithms for environments where the underlying distribution or objectives change over time. Traditional methods often require retraining or manual intervention when the environment shifts. Emerging partial solutions such as multi-task learning address only sub-problems. In contrast, a complete prospective learning framework inherently plans for change, leading to more robust and adaptive systems, whether in robotics, finance, or even language modeling.

Conclusion

Prospective learning represents a significant departure from traditional RL by acknowledging that the world is dynamic and that its dynamics themselves change over time. By incorporating a model for how both the environment’s dynamics and cost functions change, prospective learning enables an agent to plan ahead and adapt its policy over time.

Whether you’re a biologist fascinated by animal behavior or a computer scientist exploring advanced learning paradigms, prospective learning offers a rich framework that captures the essence of adapting to a changing world — just as nature intended.

Konrad’s Substack
