Title
Approximate Modified Policy Iteration
Abstract
Modified policy iteration (MPI) is a dynamic programming (DP) algorithm that contains the two celebrated policy and value iteration methods as special cases. Despite its generality, MPI has not been thoroughly studied, especially in its approximate form, which is used when the state and/or action spaces are large or infinite. In this paper, we propose three implementations of approximate MPI (AMPI) that are extensions of well-known approximate DP algorithms: fitted-value iteration, fitted-Q iteration, and classification-based policy iteration. We provide error propagation analyses that unify those for approximate policy and value iteration. For the last, classification-based implementation, we develop a finite-sample analysis showing that MPI's main parameter makes it possible to control the balance between the estimation error of the classifier and the overall value function approximation.
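To make the relationship between MPI and its two special cases concrete, here is a minimal sketch of exact (tabular) MPI, not the approximate variants the paper proposes. It assumes the standard formulation: a greedy step with respect to the current value estimate, followed by m applications of the Bellman operator of the greedy policy. The function names, the nested-list encoding of the model, and the two-state toy MDP are all hypothetical and chosen only for illustration. Setting m = 1 gives value iteration; letting m grow large approaches policy iteration.

```python
def greedy_policy(P, R, v, gamma):
    """Return, for each state, an action that is greedy with respect to v."""
    n_states, n_actions = len(R), len(R[0])
    policy = []
    for s in range(n_states):
        q = [R[s][a] + gamma * sum(P[s][a][t] * v[t] for t in range(n_states))
             for a in range(n_actions)]
        policy.append(max(range(n_actions), key=q.__getitem__))
    return policy


def apply_policy_operator(P, R, policy, v, gamma):
    """Apply the Bellman operator of a fixed policy to v once."""
    n_states = len(R)
    return [R[s][policy[s]]
            + gamma * sum(P[s][policy[s]][t] * v[t] for t in range(n_states))
            for s in range(n_states)]


def modified_policy_iteration(P, R, gamma, m, n_iterations):
    """Exact MPI: greedy step, then m partial-evaluation steps, repeated."""
    v = [0.0] * len(R)
    policy = greedy_policy(P, R, v, gamma)
    for _ in range(n_iterations):
        policy = greedy_policy(P, R, v, gamma)
        for _ in range(m):
            v = apply_policy_operator(P, R, policy, v, gamma)
    return policy, v


# Hypothetical toy MDP: P[s][a][t] is a transition probability, R[s][a] a reward.
P = [[[1.0, 0.0], [0.0, 1.0]],   # state 0: action 0 stays, action 1 moves to state 1
     [[0.0, 1.0], [1.0, 0.0]]]   # state 1: action 0 stays, action 1 moves to state 0
R = [[0.0, 1.0],
     [2.0, 0.0]]

policy_vi, v_vi = modified_policy_iteration(P, R, gamma=0.9, m=1, n_iterations=300)
policy_mpi, v_mpi = modified_policy_iteration(P, R, gamma=0.9, m=10, n_iterations=30)
```

In this toy problem both settings of m recover the same policy and values; the parameter only changes how much evaluation work is done between greedy steps, which is the trade-off the abstract refers to.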
Year: 2012
Venue: ICML
DocType: Journal
Volume: abs/1205.3054
Citations: 6
PageRank: 0.47
References: 10
Authors: 4
Name                   Order   Citations   PageRank
Bruno Scherrer         1       6           0.47
Victor Gabillon        2       116         9.51
Mohammad Ghavamzadeh   3       814         67.73
Matthieu Geist         4       385         44.31