Offline reinforcement learning (RL) is a data-driven approach to learning policies from previously collected data, without further interaction with the environment. However, recent studies have shown that imitation learning, which requires high-quality demonstration data, often outperforms offline RL even when offline RL has access to abundant data. To address this gap, prior research has proposed various methods for estimating more accurate value functions from offline data. Researchers from the University of California, Berkeley and Google DeepMind have now made two surprising observations about offline RL, offering practical advice for practitioners in specific domains and directions for future algorithm development.
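
To make "estimating a value function from offline data" concrete, here is a minimal, illustrative sketch of tabular fitted Q-iteration on a fixed dataset of transitions. The toy dataset and all variable names are assumptions made for illustration; this is not the method from the paper discussed here.

```python
import numpy as np

# Hypothetical toy dataset of (state, action, reward, next_state) transitions
# collected ahead of time by some behavior policy (names are illustrative).
rng = np.random.default_rng(0)
n_states, n_actions, n_transitions = 5, 2, 1000
states = rng.integers(n_states, size=n_transitions)
actions = rng.integers(n_actions, size=n_transitions)
rewards = rng.normal(size=n_transitions)
next_states = rng.integers(n_states, size=n_transitions)

gamma = 0.9
Q = np.zeros((n_states, n_actions))

# Tabular fitted Q-iteration: repeatedly regress Q(s, a) toward the
# Bellman target r + gamma * max_a' Q(s', a'), using only the fixed dataset.
for _ in range(100):
    targets = rewards + gamma * Q[next_states].max(axis=1)
    q_sum = np.zeros_like(Q)
    counts = np.zeros_like(Q)
    np.add.at(q_sum, (states, actions), targets)
    np.add.at(counts, (states, actions), 1.0)
    # Average the targets per (s, a); keep the old value where no data exists.
    Q = np.where(counts > 0, q_sum / np.maximum(counts, 1.0), Q)

# Extract a greedy policy from the learned value function.
policy = Q.argmax(axis=1)
print(policy)
```

In practice, offline RL methods add regularization on top of such backups, since naive value estimates can be badly wrong for actions that are poorly covered by the dataset.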
