In our recent AAAI 2023 paper, Misspecification in Inverse Reinforcement Learning, we study how robust the inverse reinforcement learning problem is to misspecification of the underlying behavioural model. We provide a mathematical framework for reasoning about this question, and use it to derive necessary and sufficient conditions describing which types of misspecification each of the standard behavioural models is (or is not) robust to. We also provide several results and formal tools that can be used to study the misspecification robustness of any newly developed behavioural models.

Inverse reinforcement learning (IRL) is an area of machine learning concerned with inferring what objective an agent is pursuing, based on the actions taken by that agent. It is typically assumed that the behaviour of the observed agent is described by a (stationary) policy π, and that its objectives are described by a reward function R. One of the central challenges in reinforcement learning is that, in real-world situations, it is typically very difficult to create reward functions that never incentivise undesired behaviour; IRL offers a potential way around this, by inferring the reward function from observed behaviour rather than specifying it by hand.
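
To make the notion of a behavioural model concrete, here is a minimal sketch (in Python/NumPy, with a toy tabular MDP and hypothetical variable names of my own choosing) of Boltzmann rationality, one standard behavioural model in the IRL literature: the observed agent takes actions with probability proportional to exp(β·Q*(s, a)). This is only an illustrative sketch of one such model, not the framework developed in the paper.

```python
import numpy as np

def boltzmann_policy(R, P, gamma=0.9, beta=5.0, n_iter=500):
    """Boltzmann-rational behavioural model: pi(a|s) proportional to exp(beta * Q*(s, a)).

    R: reward array, shape (n_states, n_actions)           -- toy state-action reward
    P: transition array, shape (n_states, n_actions, n_states)
    beta: inverse temperature; larger beta means behaviour closer to optimal
    """
    Q = np.zeros_like(R)
    for _ in range(n_iter):
        V = Q.max(axis=1)            # V*(s) = max_a Q*(s, a)
        Q = R + gamma * (P @ V)      # Q*(s, a) = R(s, a) + gamma * E_{s'}[V*(s')]
    logits = beta * Q
    logits -= logits.max(axis=1, keepdims=True)   # subtract max for numerical stability
    pi = np.exp(logits)
    return pi / pi.sum(axis=1, keepdims=True)     # normalise to a stationary policy

# A tiny random MDP, just to show the shapes involved.
rng = np.random.default_rng(0)
n_states, n_actions = 4, 2
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.normal(size=(n_states, n_actions))
pi = boltzmann_policy(R, P)
print(pi)   # each row sums to 1: the action distribution in that state
```

Under this behavioural model, the IRL problem is to recover (something about) R from samples of pi; misspecification then means that the observed agent's true relationship between R and pi differs from the one the learner assumes.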
