Be Considerate: Avoiding Negative Side Effects in Reinforcement Learning
Journal: Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems
Pages: 18-26
Publication type: ISI
Abstract
In sequential decision making, whether realized with or without the benefit of a model, objectives are often underspecified or incomplete. This gives discretion to the acting agent to realize the stated objective in ways that may result in undesirable outcomes, including inadvertently creating an unsafe environment or indirectly impacting the agency of humans or other agents that typically operate in the environment. In this paper, we explore how to build a reinforcement learning (RL) agent that contemplates the impact of its actions on the wellbeing and agency of others in the environment, most notably humans. We endow RL agents with the ability to contemplate such impact by augmenting their reward based on the expectation of future return by others in the environment, providing different criteria for characterizing impact. We further endow these agents with the ability to differentially factor this impact into their decision making, manifesting behaviour that ranges from self-centred to selfless, as demonstrated by experiments in gridworld environments.
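The core mechanism the abstract describes, augmenting the agent's own reward with a weighted estimate of others' expected future return, can be sketched as below. This is a minimal illustration under assumed names, not the paper's implementation: `augmented_reward`, `v_others`, and the weighting coefficient `alpha` are introduced here for exposition, with `alpha = 0` giving the self-centred extreme and larger values giving increasingly selfless behaviour.

```python
import numpy as np

# Minimal sketch of reward augmentation for a "considerate" RL agent.
# All names (alpha, v_others, augmented_reward) are illustrative
# assumptions, not the paper's actual code.

def augmented_reward(own_reward: float,
                     v_others: np.ndarray,
                     alpha: float) -> float:
    """Combine the agent's own reward with a weighted estimate of the
    expected future return of the other agents in the environment.

    alpha = 0    -> purely self-centred behaviour
    alpha larger -> increasingly selfless behaviour
    """
    return own_reward + alpha * float(np.sum(v_others))


# Example use inside a tabular TD update: replace the environment
# reward r with the augmented reward before computing the target.
r_env = 1.0                              # agent's own reward at this step
v_others = np.array([0.8, 0.2])          # assumed estimates of others' returns
r = augmented_reward(r_env, v_others, alpha=0.5)
```

A single scalar weighting of this kind is one natural way to realize the "differential factoring" the abstract mentions; the paper itself considers different criteria for characterizing the impact on others, of which a summed value estimate is only one possibility.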