is a set of rules that the agent follows to decide which action to take. The agent updates its policy based on the rewards it receives.
The agent also uses a value function to measure how good a given state is. The value function is a measure of how much reward the agent can expect to receive in the future if it takes a certain action in a given state. The agent updates its value function based on the rewards it receives.
The agent uses the policy and value function to decide which action to take in a given state. Over time, the agent learns to take actions that maximize the total reward.