Reinforcement learning (RL) is a machine learning technique that uses offline datasets to train and evaluate RL and bandit policies, reducing the need for risky and unethical real-world tests. Researchers are now focusing on evaluating the risk-return tradeoff during online policy deployment, rather than just the accuracy of OPE methods.
