Q-learning algorithms for constrained Markov decision processes with randomized monotone policies: Application to MIMO transmission control

Title: Q-learning algorithms for constrained Markov decision processes with randomized monotone policies: Application to MIMO transmission control
Publication Type: Journal Article
Year of Publication: 2007
Authors: Djonin, D. V., and V. Krishnamurthy
Journal: IEEE Transactions on Signal Processing
Volume: 55
Pagination: 2170–2181
ISSN: 1053-587X
Abstract

This paper presents novel Q-learning-based stochastic control algorithms for rate and power control in V-BLAST transmission systems. The algorithms exploit the supermodularity and monotonic structure results derived in the companion paper. The rate and power control problem is posed as a stochastic optimization problem with the goal of minimizing the average transmission power subject to a constraint on the average delay, which can be interpreted as the quality-of-service requirement of a given application. The standard Q-learning algorithm is modified to handle the constraint, so that it can adaptively learn the structured optimal policy under unknown channel/traffic statistics. We discuss the convergence of the proposed algorithms and explore their properties in simulations. To address the issue of unknown transmission costs in an unknown time-varying environment, we propose a variant of the Q-learning algorithm in which power costs are estimated in an online fashion, and we show that this algorithm converges to the optimal solution as long as the power cost estimates are asymptotically unbiased.
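The constrained Q-learning idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm or its V-BLAST model: the toy MDP (buffer states as a delay proxy, transmit rates as actions, the power values, and the delay bound) is entirely hypothetical, and the average-delay constraint is handled with a Lagrange multiplier updated on a slower timescale, a standard device for constrained MDPs.

```python
import random

# Toy constrained MDP (hypothetical, NOT the paper's V-BLAST model):
# states = buffer occupancy levels, actions = transmit rates.
# Immediate cost = power(action) + lam * delay(state); the multiplier
# lam is adapted on a slower timescale toward the delay constraint.

N_STATES, N_ACTIONS = 4, 3
POWER = [0.0, 1.0, 2.5]        # assumed power cost per rate
DELAY_BOUND = 1.5              # assumed average-delay constraint
GAMMA = 0.95                   # discount factor

def step(s, a, rng):
    """Toy dynamics: higher rate drains the buffer; random arrivals."""
    arrivals = rng.randint(0, 1)
    return min(max(s - a + arrivals, 0), N_STATES - 1)

def constrained_q_learning(iterations=20000, seed=0):
    rng = random.Random(seed)
    Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    lam = 0.0                  # Lagrange multiplier for the delay constraint
    s = 0
    for t in range(1, iterations + 1):
        eps = max(0.05, t ** -0.3)             # decaying exploration
        if rng.random() < eps:
            a = rng.randrange(N_ACTIONS)
        else:
            a = min(range(N_ACTIONS), key=lambda x: Q[s][x])
        s_next = step(s, a, rng)
        delay = s                              # buffer level proxies delay
        cost = POWER[a] + lam * delay          # Lagrangian immediate cost
        alpha = t ** -0.6                      # Q-update step size
        target = cost + GAMMA * min(Q[s_next])
        Q[s][a] += alpha * (target - Q[s][a])
        # Slower dual update: raise lam when delay exceeds the bound.
        lam = max(0.0, lam + (1.0 / t) * (delay - DELAY_BOUND))
        s = s_next
    return Q, lam
```

The two-timescale structure (fast Q-update, slow multiplier update) is what lets a single learning loop trade average power against the delay constraint; the paper additionally exploits the monotone structure of the optimal policy, which this sketch omits.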

URL: http://dx.doi.org/10.1109/TSP.2007.893228
DOI: 10.1109/TSP.2007.893228
