Policy gradient stochastic approximation algorithms for adaptive control of constrained time varying Markov decision processes

Title: Policy gradient stochastic approximation algorithms for adaptive control of constrained time varying Markov decision processes
Publication Type: Conference Paper
Year of Publication: 2003
Authors: Abad, F. J. V., and V. Krishnamurthy
Conference Name: Decision and Control, 2003. Proceedings. 42nd IEEE Conference on
Pagination: 2823-2828, Vol. 3
Date Published: December
Keywords: adaptive control, approximation theory, augmented Lagrangian methods, average cost finite state Markov decision process, constrained time varying Markov decision processes, constraint handling, decision theory, gradient estimation schemes, gradient methods, gradient projection primal methods, Markov processes, policy gradient stochastic approximation, time-varying systems, weak derivatives
Abstract

We present constrained stochastic approximation algorithms for computing a locally optimal policy of a constrained average-cost finite-state Markov decision process. The stochastic approximation algorithms require the gradient of the cost function with respect to the parameter that characterizes the randomized policy. This gradient is computed by novel simulation-based gradient estimation schemes involving weak derivatives. The proposed algorithms are simulation-based and do not require explicit knowledge of the underlying parameters, such as the transition probabilities. We present three classes of algorithms, based on primal-dual methods, augmented Lagrangian (multiplier) methods, and gradient projection primal methods. Unlike neuro-dynamic programming methods such as Q-learning, the proposed algorithms can handle constraints and time-varying parameters.
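To make the primal-dual idea concrete, here is a minimal Python sketch; it is not the paper's algorithm. It assumes a hypothetical 2-state, 2-action MDP whose transition probabilities, costs, and bound `BETA` are invented for illustration, and a sigmoid-parameterized randomized policy. Instead of the simulation-based weak-derivative estimators described in the abstract, it computes the average costs exactly from the known model and differentiates them by central finite differences. All function names (`stationary`, `average`, `gradient`, `primal_dual`) are illustrative.

```python
import math

# Hypothetical 2-state, 2-action MDP (all numbers are illustrative assumptions).
# P[u][s][t]: probability of moving from state s to state t under action u.
P = [
    [[0.9, 0.1], [0.6, 0.4]],  # action 0
    [[0.2, 0.8], [0.3, 0.7]],  # action 1
]
COST = [[1.0, 4.0], [3.0, 0.5]]        # c(s, u): cost to minimize
CONSTRAINT = [[0.0, 1.0], [0.0, 2.0]]  # d(s, u): constrained cost
BETA = 0.3                              # assumed bound on the average of d


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def stationary(theta):
    """Stationary distribution of the 2-state chain induced by the policy."""
    p = [sigmoid(t) for t in theta]  # p[s] = probability of action 1 in state s
    a = (1 - p[0]) * P[0][0][1] + p[0] * P[1][0][1]  # Pr(state 0 -> state 1)
    b = (1 - p[1]) * P[0][1][0] + p[1] * P[1][1][0]  # Pr(state 1 -> state 0)
    return [b / (a + b), a / (a + b)]


def average(theta, table):
    """Long-run average of table[s][u] under the randomized policy."""
    pi = stationary(theta)
    p = [sigmoid(t) for t in theta]
    return sum(
        pi[s] * ((1 - p[s]) * table[s][0] + p[s] * table[s][1])
        for s in range(2)
    )


def gradient(theta, table, h=1e-5):
    """Central finite differences; stands in for a simulation-based estimator."""
    grad = []
    for i in range(len(theta)):
        up = list(theta)
        up[i] += h
        dn = list(theta)
        dn[i] -= h
        grad.append((average(up, table) - average(dn, table)) / (2 * h))
    return grad


def primal_dual(theta, lam=0.0, step=0.05, iters=2000):
    """Gradient descent in theta and projected ascent in the multiplier lam."""
    for _ in range(iters):
        gc = gradient(theta, COST)
        gd = gradient(theta, CONSTRAINT)
        # Primal step: descend the Lagrangian C + lam * (D - BETA) in theta.
        theta = [t - step * (c + lam * d) for t, c, d in zip(theta, gc, gd)]
        # Dual step: ascend in lam, projected onto lam >= 0.
        lam = max(0.0, lam + step * (average(theta, CONSTRAINT) - BETA))
    return theta, lam


if __name__ == "__main__":
    theta, lam = primal_dual([0.0, 0.0])
    print("policy Pr(action 1 | s):", [round(sigmoid(t), 3) for t in theta])
    print("average cost:", round(average(theta, COST), 3))
    print("average constrained cost:", round(average(theta, CONSTRAINT), 3))
    print("multiplier:", round(lam, 3))
```

A constant step size is used here for simplicity. In a simulation-based setting the exact `average` and `gradient` calls would be replaced by noisy estimates obtained from sample paths, which is where gradient estimation schemes such as those named in the abstract come in.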

URL: http://dx.doi.org/10.1109/CDC.2003.1273053
DOI: 10.1109/CDC.2003.1273053
