
Multichain Markov Decision Processes with a Sample Path Constraint: A Decomposition Approach

Keith W. Ross and Ravi Varadarajan
Mathematics of Operations Research
Vol. 16, No. 1 (Feb., 1991), pp. 195-207
Published by: INFORMS
Stable URL: http://www.jstor.org/stable/3689856
Page Count: 13

Abstract

We consider finite-state, finite-action Markov decision processes which accumulate both a reward and a cost at each decision epoch. We study the problem of finding a policy that maximizes the expected long-run average reward subject to the constraint that the long-run average cost be no greater than a given value with probability one. We establish that if there exists a policy that meets the constraint, then there exists an ε-optimal stationary policy. Furthermore, an algorithm is outlined to locate the ε-optimal stationary policy. The proof of the result hinges on a decomposition of the state space into maximal recurrent classes and a set of transient states.
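To illustrate the decomposition the abstract mentions, the following is a minimal sketch (an illustration only, not the authors' algorithm) that splits the state space of a finite Markov chain with transition matrix `P` into maximal recurrent classes and transient states: a communicating class is recurrent exactly when it is closed, i.e. no positive-probability transition leaves it.

```python
def recurrent_classes(P):
    """Decompose the states of a finite Markov chain into maximal
    recurrent classes and transient states.

    P is an n x n transition matrix (list of lists); a communicating
    class is recurrent iff it is closed under the transition dynamics.
    """
    n = len(P)
    # reach[i][j]: j is reachable from i in zero or more steps
    reach = [[i == j or P[i][j] > 0 for j in range(n)] for i in range(n)]
    for k in range(n):           # transitive closure (Floyd-Warshall style)
        for i in range(n):
            for j in range(n):
                reach[i][j] = reach[i][j] or (reach[i][k] and reach[k][j])
    # communicating classes: mutual reachability
    seen, classes = set(), []
    for i in range(n):
        if i in seen:
            continue
        cls = {j for j in range(n) if reach[i][j] and reach[j][i]}
        seen |= cls
        classes.append(cls)
    # a class is recurrent iff no transition exits it
    recurrent = [c for c in classes
                 if all(P[i][j] == 0
                        for i in c for j in range(n) if j not in c)]
    transient = sorted(set(range(n)).difference(*recurrent or [set()]))
    return recurrent, transient
```

For example, in a five-state chain where states 0 and 1 feed into an absorbing state 2 and a two-state cycle {3, 4}, the decomposition yields recurrent classes {2} and {3, 4} with transient states {0, 1}.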
