waste of computation
policy evaluation is a fixed-policy version of value iteration (same backup, but using the action(s) the policy picks instead of the max over all actions)
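The relationship can be written as a pair of backups. This is a sketch in standard notation, assuming a finite MDP with rewards $\mathcal{R}_s^a$, transition probabilities $\mathcal{P}_{ss'}^a$, discount $\gamma$ and policy $\pi(a \mid s)$; these symbols are not defined in the notes. Policy evaluation averages over the policy's actions, and value iteration replaces that average with a max:

```latex
% iterative policy evaluation (Bellman expectation backup, fixed policy \pi)
v_{k+1}(s) = \sum_{a} \pi(a \mid s) \Big( \mathcal{R}_s^a + \gamma \sum_{s'} \mathcal{P}_{ss'}^a \, v_k(s') \Big)

% value iteration (Bellman optimality backup, every action considered)
v_{k+1}(s) = \max_{a} \Big( \mathcal{R}_s^a + \gamma \sum_{s'} \mathcal{P}_{ss'}^a \, v_k(s') \Big)
```

For a deterministic policy the sum over $a$ collapses to the single action $\pi(s)$, so each state backs up only one action.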
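full MDP (control) problem, two ways to solve it:
-> value iteration: evaluation and improvement done in one step, via the Bellman optimality equation (consider every action in each state)
-> policy evaluation + policy improvement, i.e. policy iteration (evaluation backs up only the one action the policy picks in each state)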
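The two routes can be compared on a toy problem. This is a minimal sketch, assuming a hypothetical 2-state, 2-action MDP whose P and R are known, and a deterministic policy stored as a list of actions. All names (`q_value`, `value_iteration`, `policy_evaluation`, `greedy`, `policy_iteration`) are illustrative, not from the notes. Both routes should reach the same optimal policy and values:

```python
# Toy MDP (hypothetical): 2 states, 2 actions (0 = "stay", 1 = "move").
# P[s][a] is a list of (probability, next_state); R[s][a] is the expected reward.
P = [
    [[(1.0, 0)], [(1.0, 1)]],  # transitions out of state 0
    [[(1.0, 1)], [(1.0, 0)]],  # transitions out of state 1
]
R = [
    [0.0, 1.0],
    [2.0, 0.0],
]
GAMMA = 0.5


def q_value(V, s, a, gamma=GAMMA):
    """One-step lookahead: R(s, a) + gamma * E[V(s')]."""
    return R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])


def max_diff(u, v):
    """Largest absolute change between two value vectors."""
    return max(abs(x - y) for x, y in zip(u, v))


def value_iteration(tol=1e-10):
    """Route 1: each sweep is one Bellman optimality backup (max over every action)."""
    V = [0.0] * len(P)
    while True:
        new_V = [max(q_value(V, s, a) for a in range(len(P[s]))) for s in range(len(P))]
        if max_diff(new_V, V) < tol:
            return new_V
        V = new_V


def policy_evaluation(policy, tol=1e-10):
    """Bellman expectation backup: only the action policy[s] is used in state s."""
    V = [0.0] * len(P)
    while True:
        new_V = [q_value(V, s, policy[s]) for s in range(len(P))]
        if max_diff(new_V, V) < tol:
            return new_V
        V = new_V


def greedy(V):
    """Policy improvement: pick the action with the best one-step lookahead."""
    return [max(range(len(P[s])), key=lambda a: q_value(V, s, a)) for s in range(len(P))]


def policy_iteration():
    """Route 2: evaluate the current policy, improve greedily, repeat until stable."""
    policy = [0] * len(P)
    while True:
        V = policy_evaluation(policy)
        new_policy = greedy(V)
        if new_policy == policy:
            return policy, V
        policy = new_policy


V_vi = value_iteration()
pi_pi, V_pi = policy_iteration()
print(greedy(V_vi), [round(v, 6) for v in V_vi])  # [1, 0] [3.0, 4.0]
print(pi_pi, [round(v, 6) for v in V_pi])         # [1, 0] [3.0, 4.0]
```

Policy iteration runs an inner evaluation loop to convergence before every improvement step, while value iteration folds a single max backup into each sweep. Ties in `greedy` are broken by the lowest action index.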
we aren't given the MDP (model-free setting)
(meaning that the transition matrix and reward function are NOT given; the agent only sees sampled experience)
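Dynamic-programming backups need the transition probabilities and rewards explicitly, so without the MDP the values have to be estimated from samples instead. A minimal sketch of one such method, TD(0) prediction, is shown below; it is swapped in for illustration, since the notes do not name a method. The environment, `env_step`, `td0_evaluation`, the random start states and all parameter values are hypothetical assumptions. The learner only calls `env_step(s, a)` and never reads the tables inside it:

```python
import random

# Hypothetical black-box environment: the learner may call env_step(s, a)
# but never reads these tables (no direct access to P or R).
_NEXT = [[0, 1], [1, 0]]
_REWARD = [[0.0, 1.0], [2.0, 0.0]]


def env_step(s, a):
    """Return a sampled (reward, next_state); deterministic here only to keep the example checkable."""
    return _REWARD[s][a], _NEXT[s][a]


def td0_evaluation(policy, gamma=0.5, alpha=0.1, episodes=2000, horizon=20, seed=0):
    """TD(0) prediction of V^pi using only sampled transitions."""
    rng = random.Random(seed)
    V = [0.0, 0.0]
    for _ in range(episodes):
        s = rng.randrange(len(V))  # random start state so every state gets visited
        for _ in range(horizon):
            r, s2 = env_step(s, policy[s])
            # move V(s) toward the one-sample target r + gamma * V(s')
            V[s] += alpha * (r + gamma * V[s2] - V[s])
            s = s2
    return V


V = td0_evaluation(policy=[1, 0])
print([round(v, 3) for v in V])  # [3.0, 4.0]
```

The update uses the same one-step target as the fixed-policy backup, but replaces the expectation over $\mathcal{P}$ with whichever next state the environment actually returns.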