On Predictive planning and counterfactual learning in active inference
Given the rapid advancement of artificial intelligence, understanding the foundations of intelligent behaviour is increasingly important. Active inference, regarded as a general theory of behaviour, offers a principled approach to probing the basis of sophistication in planning and decision-making. In this paper, we examine two decision-making schemes in active inference, based on 'planning' and 'learning from experience' respectively. We also introduce a mixed model that navigates the data-complexity trade-off between these strategies, leveraging the strengths of both to facilitate balanced decision-making. We evaluate the proposed model in a challenging grid-world scenario that requires adaptability from the agent. In addition, the model lets us analyse the evolution of its parameters, offering insight into the agent's decisions and contributing to an explainable framework for intelligent decision-making.
Introduction. Defining the intelligent "agent", and thereby separating it from its embodied "environment", which then provides feedback to the agent, is crucial to modelling intelligent behaviour. Popular approaches such as reinforcement learning (RL) rely heavily on models containing such agent-environment loops, which reduce the problem to one or more agents trying to maximise reward in a given uncertain environment Sutton and Barto [2018]. Active inference has emerged in neuroscience as a biologically plausible framework Friston [2010] that takes a different approach to modelling intelligent behaviour from contemporary methods like RL. In the active inference framework, an agent accumulates and maximises model evidence during its lifetime in order to perceive, learn, and make decisions Da Costa et al. [2020], Sajid et al. [2021], Millidge et al. [2020]. However, maximising the model evidence becomes challenging when the agent encounters a highly 'entropic' observation (i.e. an observation that is unexpected under the agent's generative (world) model) Da Costa et al. [2020], Sajid et al.
Discussion / Conclusion. An additional advantage of the proposed mixed model (and of POMDP-based generative models) is that we can probe the model parameters to understand the basis of the intelligent behaviour demonstrated by agents through the lens of active inference. Models that rely on artificial neural networks (ANNs) to scale up Fountas et al. [2020] offer limited explainability of how agents make decisions, especially when faced with uncertainty. Fig. 8(A) shows the evolution of the risk (Γt) in the model (associated with the CL scheme as defined in Isomura et al. [2022]). The model's risk quickly tends to zero when the easy grid is presented and solved; however, it shoots up when the agent faces the environment mutation. Similarly, the evolution of the bias parameter (which balances the DPEFE and CL methods in the mixed model) is shown in Fig. 8(B). Here, we observe that the agent consistently maintains a higher bias towards the DPEFE model when it has a higher planning ability (i.e. the agent with a planning depth of N = 50 maintains a higher bias than the agents with N = 25 and N = 5).
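To make the role of the bias parameter concrete, the following is a minimal sketch of one way a single scalar could balance two decision schemes. It assumes, purely for illustration, that each scheme yields a probability distribution over actions and that the two are combined as a convex mixture. The function name `mix_action_distributions`, the four-action example, and the mixing rule itself are hypothetical and are not taken from the model described above.

```python
def mix_action_distributions(p_plan, p_learn, bias):
    """Convex mixture of two action distributions (illustrative assumption).

    p_plan  : action probabilities from a planning-based scheme (stand-in for DPEFE)
    p_learn : action probabilities from an experience-based scheme (stand-in for CL)
    bias    : weight in [0, 1]; larger values favour the planning-based scheme
    """
    if len(p_plan) != len(p_learn):
        raise ValueError("both distributions must cover the same actions")
    if not 0.0 <= bias <= 1.0:
        raise ValueError("bias must lie in [0, 1]")
    mixed = [bias * a + (1.0 - bias) * b for a, b in zip(p_plan, p_learn)]
    total = sum(mixed)
    # Renormalise to guard against rounding error in the inputs.
    return [m / total for m in mixed]


# Hypothetical example: a confident planner and an uninformed learner.
p_plan = [0.7, 0.1, 0.1, 0.1]
p_learn = [0.25, 0.25, 0.25, 0.25]
mixed = mix_action_distributions(p_plan, p_learn, bias=0.8)
```

Under this assumed rule, a bias close to 1 reproduces the planner's preferences and a bias close to 0 reproduces the learner's, so tracking the bias over time indicates which scheme currently dominates the agent's choices.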