Leveraging Approximate Symbolic Models for Reinforcement Learning via Skill Diversity
Creating reinforcement learning (RL) agents that are capable of accepting and leveraging task-specific knowledge from humans has long been identified as a possible strategy for developing scalable approaches to solving long-horizon problems. While previous works have looked at the possibility of using symbolic models along with RL approaches, they tend to assume that the high-level action models are executable at the low level and that the fluents can exclusively characterize all desirable MDP states. However, symbolic models of real-world tasks are often incomplete. To this end, we introduce Approximate Symbolic-Model Guided Reinforcement Learning, wherein we formalize the relationship between the symbolic model and the underlying MDP, which allows us to characterize the incompleteness of the symbolic model. We use these models to extract high-level landmarks that are used to decompose the task. At the low level, we learn a set of diverse policies for each possible task subgoal identified by the landmarks, which are then stitched together.
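The high-level control loop implied above can be sketched as follows. This is a toy illustration under stated assumptions, not the actual implementation: the landmark sequence, the policy representation, and the fallback rule (try the next diverse policy when one fails to achieve its subgoal) are all simplified stand-ins.

```python
# Toy sketch: decompose a task into subgoals given by symbolic landmarks,
# keep a set of diverse policies per subgoal, and stitch them together at
# execution time, falling back to an alternative policy on failure.
# All names (LANDMARKS, make_policy, execute) are illustrative assumptions.

# Hypothetical landmark sequence for the Household example.
LANDMARKS = ["holding_key", "door_open", "at_destination"]

def make_policy(subgoal, idx):
    """Stand-in for one learned low-level policy for a subgoal.

    Returns True if the policy achieves its subgoal from the given state.
    Policy 0 for "door_open" models a skill that picked up a key that
    cannot open the door, and therefore fails.
    """
    def policy(state):
        if subgoal == "door_open" and idx == 0:
            return False  # wrong key: this skill cannot achieve the subgoal
        return True
    return policy

# A set of (here, three) diverse policies per subgoal.
policies = {lm: [make_policy(lm, i) for i in range(3)] for lm in LANDMARKS}

def execute(state=None):
    """Stitch subgoal policies together, trying diverse alternatives on failure."""
    trace = []
    for lm in LANDMARKS:
        for i, pi in enumerate(policies[lm]):
            if pi(state):
                trace.append((lm, i))  # record which policy achieved the landmark
                break
        else:
            return None  # every policy in the set failed for this landmark
    return trace

print(execute())
# [('holding_key', 0), ('door_open', 1), ('at_destination', 0)]
```

Note how the second landmark is achieved by the second policy in its set: diversity at the subgoal level is what lets execution recover when one skill is incompatible with the true dynamics.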
To illustrate the importance of accounting for possible incompleteness in symbolic models, consider a simple household robotics domain (henceforth referred to as the Household environment) in which a robot must visit a particular location, represented by the green block (Fig. 1). The robot can do so by picking up the red key, then recharging, then opening the door to visit the final location. A human user could help the robot in its learning process by providing task-related information; for example, the location of the destination and the fact that the door is locked and requires a key to open. However, such information, potentially provided as a symbolic model, need not be a complete representation of the task and may even contain incorrect information. First, the user may have forgotten to mention that there are multiple keys in the house and that only one of them can open the door; they may thus have incorrectly specified that the robot can use any of the keys. In this case, the symbolic model only partially specifies the prerequisites for opening the door. Second, the user may not be a robotics expert and might not know that this particular robot model has limited battery capacity and would need to recharge itself in the middle of the task (by visiting the charging dock). In this case, features related to the charging dock and the robot's battery level may be entirely missing from the symbolic model. Third, the user might expect the door to remain ajar after the robot enters the room, when in reality the door closes automatically once the robot enters. In this case, the symbolic model might incorrectly specify that the effect of passing through the door is both that the robot is in the destination room and that the door is still ajar.
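The three failure modes above can be made concrete with a toy STRIPS-style encoding. This is purely illustrative: the predicate and action names are assumptions chosen for the example, not the paper's actual domain specification.

```python
# Toy encoding of the user's (incomplete) symbolic model of the Household
# task versus the true dynamics. All fluent/action names are illustrative.

# 1) Partially specified precondition: the model says any key opens the
#    door, but in reality only the red key does.
model_open_door = {"pre": {"holding_some_key"}, "eff_add": {"door_open"}}
true_open_door  = {"pre": {"holding_red_key"},  "eff_add": {"door_open"}}

# 2) Missing fluents: the battery level and the charging dock do not
#    appear anywhere in the symbolic vocabulary.
model_fluents = {"holding_some_key", "door_open", "at_destination"}
true_fluents  = model_fluents | {"battery_charged", "at_charging_dock"}

# 3) Incorrect effect: the model claims the door stays ajar after the
#    robot enters, but it actually closes automatically.
model_enter = {"eff_add": {"at_destination", "door_open"}}
true_enter  = {"eff_add": {"at_destination"}, "eff_del": {"door_open"}}

# Fluents the user's model cannot even express:
print(sorted(true_fluents - model_fluents))
# ['at_charging_dock', 'battery_charged']
```

The second case is the most severe: no amount of repair within the model's own vocabulary can express the missing battery fluents, which is precisely why the formalization must characterize incompleteness rather than assume it away.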
As a result, existing approaches (Illanes et al., 2020; Lyu et al., 2019) that expect a correct and complete model will fail for multiple reasons.