Self learning control of constrained Markov chains - a gradient approach

Title: Self learning control of constrained Markov chains - a gradient approach
Publication Type: Conference Paper
Year of Publication: 2002
Authors: Abad, F. V., V. Krishnamurthy, K. Martin, and I. Baltcheva
Conference Name: Decision and Control, 2002, Proceedings of the 41st IEEE Conference on
Pagination: 1940 - 1945 vol.2
Date Published: Dec.
Keywords: approximation theory, constrained average cost finite state Markov decision process, constrained Markov chains, decision theory, gradient approach, gradient estimation schemes, gradient methods, learning systems, locally optimal policy, Markov processes, self learning control, self-adjusting systems, stochastic approximation algorithms, time varying parameters, weak derivatives
Abstract

We present stochastic approximation algorithms for computing a locally optimal policy of a constrained average-cost finite-state Markov decision process. The algorithms require the gradient of the cost function with respect to the parameter that characterizes the randomized policy; this gradient is computed by simulation-based estimation schemes involving weak derivatives. Like neuro-dynamic programming algorithms (e.g., Q-learning or temporal difference methods), the proposed algorithms are simulation-based and do not require explicit knowledge of underlying parameters such as transition probabilities. Unlike neuro-dynamic programming methods, however, they can handle constraints and time-varying parameters. The multiplier-based constrained stochastic gradient algorithm proposed is also of independent interest in stochastic approximation.
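
To give a feel for what a multiplier-based constrained stochastic gradient iteration looks like, the following is a minimal sketch of a generic primal-dual (Lagrange multiplier) stochastic approximation on a toy scalar problem. It is not the algorithm from the paper: there is no Markov decision process here, and the weak-derivative gradient estimator is replaced by an exact gradient plus zero-mean Gaussian noise, standing in for a simulation-based estimate. The cost `(theta - 2)^2`, the constraint `theta - 1 <= 0`, the step-size schedule, and all function names are assumptions chosen for illustration.

```python
import random


def noisy_cost_grad(theta, rng, sigma=1.0):
    """Noisy gradient of the assumed toy cost C(theta) = (theta - 2)^2."""
    return 2.0 * (theta - 2.0) + rng.gauss(0.0, sigma)


def noisy_constraint(theta, rng, sigma=1.0):
    """Noisy value of the assumed toy constraint B(theta) = theta - 1 <= 0."""
    return (theta - 1.0) + rng.gauss(0.0, sigma)


# Gradient of B(theta) = theta - 1 with respect to theta (known exactly here).
CONSTRAINT_GRAD = 1.0


def primal_dual_sa(n_iters=20000, seed=0):
    """Primal-dual stochastic approximation with a decreasing step size.

    theta descends the noisy gradient of the Lagrangian C + lam * B,
    while the multiplier lam ascends on the noisy constraint value and
    is projected back onto lam >= 0.
    """
    rng = random.Random(seed)
    theta, lam = 0.0, 0.0
    for n in range(1, n_iters + 1):
        step = 1.0 / (n + 10) ** 0.7  # assumed schedule, not from the paper
        # Both noisy quantities are evaluated at the current theta.
        grad_lagrangian = noisy_cost_grad(theta, rng) + lam * CONSTRAINT_GRAD
        constraint_value = noisy_constraint(theta, rng)
        theta -= step * grad_lagrangian
        lam = max(0.0, lam + step * constraint_value)
    return theta, lam


if __name__ == "__main__":
    theta, lam = primal_dual_sa()
    print(f"theta = {theta:.3f}, multiplier = {lam:.3f}")
```

For this toy problem the constrained minimizer is `theta = 1` with multiplier `2`, so the iterates should settle near those values. A decreasing step size is used here to average out the noise; when the underlying parameters drift over time, a small constant step size is the usual choice in stochastic approximation, since it lets the iterate keep tracking.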

URL: http://dx.doi.org/10.1109/CDC.2002.1184811
DOI: 10.1109/CDC.2002.1184811
