This article discusses the integration of reinforcement learning (RL) algorithms into recommendation systems to improve the accuracy and adaptability of personalized recommendations. The proposed QAUC model combines Q-learning with the area under the curve (AUC) evaluation method so that it accounts for historical user behavior and adapts to changes in user interests. The experiment uses the Deep Deterministic Policy Gradient (DDPG) RL algorithm to optimize a news recommendation strategy and evaluates the model's performance using the QAUC curve. The paper highlights the importance of RL in personalized recommendation and suggests new directions for recommendation algorithm development.
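
To make the two ingredients named above concrete, the following is a minimal sketch of a tabular Q-learning update and a pairwise AUC computation. It is not the proposed model or the DDPG experiment: the function names `q_update` and `auc`, the state and action labels, and the values of `alpha` and `gamma` are all illustrative assumptions, and how the paper actually combines the two quantities is not specified here.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step (hypothetical helper, not the paper's code).

    Applies Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)).
    """
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    return q[(state, action)]


def auc(scores, labels):
    """AUC as the fraction of (clicked, non-clicked) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Illustrative usage with made-up news categories as actions.
q_table = defaultdict(float)          # unseen (state, action) pairs start at 0.0
actions = ["sports", "politics"]      # assumed action set
updated = q_update(q_table, "s0", "sports", 1.0, "s1", actions)

# Assumed click labels (1 = clicked) and model scores for four articles.
ranking_quality = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

In this sketch the reward could be a click signal, so the Q-values track how user interest shifts over time, while AUC summarizes how well the resulting scores rank clicked items above non-clicked ones.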
