Approximate Modified Policy Iteration - Citegraph

Paper Info

Title
Approximate Modified Policy Iteration

Abstract
Modified policy iteration (MPI) is a dynamic programming (DP) algorithm that contains the two celebrated policy and value iteration methods. Despite its generality, MPI has not been thoroughly studied, especially its approximate form, which is used when the state and/or action spaces are large or infinite. In this paper, we propose three implementations of approximate MPI (AMPI) that are extensions of well-known approximate DP algorithms: fitted-value iteration, fitted-Q iteration, and classification-based policy iteration. We provide error propagation analyses that unify those for approximate policy and value iteration. For the last, classification-based implementation, we develop a finite-sample analysis showing that MPI's main parameter allows one to control the balance between the estimation error of the classifier and the overall value function approximation.
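
To illustrate how MPI contains both value and policy iteration, here is a minimal sketch of exact, tabular MPI. It is not one of the paper's three approximate implementations. The names `P`, `R`, `gamma`, `m`, and the toy two-state MDP are assumptions made for this example only. Each iteration takes a policy that is greedy with respect to the current value estimate, then applies that policy's Bellman operator `m` times. Setting `m = 1` gives value iteration, and letting `m` grow large approaches policy iteration.

```python
def greedy(P, R, v, gamma):
    """Return a policy that is greedy with respect to the value estimate v."""
    policy = []
    for s in range(len(P)):
        q = [R[s][a] + gamma * sum(p * v[t] for t, p in enumerate(P[s][a]))
             for a in range(len(P[s]))]
        policy.append(max(range(len(q)), key=q.__getitem__))
    return policy


def apply_policy_operator(P, R, v, policy, gamma):
    """Apply the Bellman operator of a fixed policy to v once."""
    return [R[s][policy[s]]
            + gamma * sum(p * v[t] for t, p in enumerate(P[s][policy[s]]))
            for s in range(len(P))]


def modified_policy_iteration(P, R, gamma, m, iterations):
    """Exact tabular MPI: greedy step, then m applications of the policy operator."""
    v = [0.0] * len(P)
    policy = greedy(P, R, v, gamma)
    for _ in range(iterations):
        policy = greedy(P, R, v, gamma)          # greedy step
        for _ in range(m):                       # partial policy evaluation
            v = apply_policy_operator(P, R, v, policy, gamma)
    return v, policy


# Hypothetical toy MDP: P[s][a] is a distribution over next states,
# R[s][a] is the immediate reward for taking action a in state s.
P = [
    [[1.0, 0.0], [0.0, 1.0]],   # state 0: action 0 stays, action 1 moves to state 1
    [[0.0, 1.0], [1.0, 0.0]],   # state 1: action 0 stays, action 1 moves to state 0
]
R = [
    [0.0, 1.0],
    [2.0, 0.0],
]

v_vi, pi_vi = modified_policy_iteration(P, R, gamma=0.9, m=1, iterations=500)
v_mpi, pi_mpi = modified_policy_iteration(P, R, gamma=0.9, m=5, iterations=200)
```

In this toy problem both settings of `m` should reach the same policy and value function; what changes is how much evaluation work is done per greedy step.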

Year	Venue	DocType	Volume	Citations	PageRank	References	Authors
2012	ICML	Journal	abs/1205.3054	6	0.47	10	4

Authors (4 rows)

Name	Order	Citations	PageRank
Bruno Scherrer	1	6	0.47
Victor Gabillon	2	116	9.51
Mohammad Ghavamzadeh	3	814	67.73
Matthieu Geist	4	385	44.31
