Pontificia Universidad Católica de Chile
Wang A., Li A.C., Klassen T.Q., Toro Icarte R., McIlraith S.A. (2023)

Learning Belief Representations for Partially Observable Deep RL

Journal : Proceedings of the 40th International Conference on Machine Learning
Volume : 202
Pages : 35970-35988
Publication type : ISI

Abstract

Many important real-world Reinforcement Learning (RL) problems involve partial observability and require policies with memory. Unfortunately, standard deep RL algorithms for partially observable settings typically condition on the full history of interactions and are notoriously difficult to train. We propose a novel deep, partially observable RL algorithm based on modelling belief states, a technique typically used when solving tabular POMDPs but one that has traditionally been difficult to apply to more complex environments. Our approach simplifies policy learning by leveraging state information at training time that may not be available at deployment time. We do so in two ways: first, we decouple belief state modelling (via unsupervised learning) from policy optimization (via RL); and second, we propose a representation learning approach to capture a compact set of reward-relevant features of the state. Experiments demonstrate the efficacy of our approach on partially observable domains requiring information seeking and long-term memory.
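For intuition, the sketch below illustrates the two-phase scheme the abstract describes: a belief model is fit to interaction histories using privileged state information available only at training time, and a policy is then trained on top of the resulting belief representation rather than on the full history. This is a minimal, hypothetical sketch; the module names, the GRU encoder, and the state-prediction loss are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of the decoupled training scheme described in the abstract.
# Phase 1 learns a belief representation from histories using privileged state
# features (training-time only); Phase 2 trains a policy on that representation.
import torch
import torch.nn as nn


class BeliefEncoder(nn.Module):
    """Maps an observation-action history to a compact belief vector."""

    def __init__(self, obs_dim, act_dim, belief_dim, state_feat_dim):
        super().__init__()
        self.rnn = nn.GRU(obs_dim + act_dim, belief_dim, batch_first=True)
        # Predicts reward-relevant features of the hidden state; these targets
        # are assumed to be available only during training, not at deployment.
        self.state_head = nn.Linear(belief_dim, state_feat_dim)

    def forward(self, obs_act_seq):
        beliefs, _ = self.rnn(obs_act_seq)  # (batch, time, belief_dim)
        return beliefs


def belief_loss(encoder, obs_act_seq, state_feats):
    """Phase 1: fit the belief model to predict state features from history."""
    beliefs = encoder(obs_act_seq)
    preds = encoder.state_head(beliefs)
    return nn.functional.mse_loss(preds, state_feats)


class Policy(nn.Module):
    """Phase 2: the policy conditions on the belief vector, not the raw history;
    the belief encoder can be frozen while the policy is optimized with RL."""

    def __init__(self, belief_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(belief_dim, 64), nn.ReLU(), nn.Linear(64, n_actions)
        )

    def forward(self, belief):
        return torch.distributions.Categorical(logits=self.net(belief))
```

The key design point reflected here is the decoupling: the representation is learned with a supervised/unsupervised objective against state information, so the RL step only has to optimize a policy over a compact, reward-relevant belief input.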