# Markovian Sequential Replacement Processes

Howard M. Taylor, III
The Annals of Mathematical Statistics
Vol. 36, No. 6 (Dec., 1965), pp. 1677-1694
Stable URL: http://www.jstor.org/stable/2239109
Page Count: 18

## Abstract

A sequential control process is a dynamic system which is observed periodically and classified into one of a number of possible states. After each observation one of a number of possible decisions is made. These decisions are the "control"; they determine the chance laws of the system. A replacement process is a control process with an additional special action, called replacement, which instantaneously returns the system to some initial state.

Let $\mathbf{X}$ denote the state space of the system, assumed to be a Borel subset of finite-dimensional Euclidean space. The case where $\mathbf{X}$ is finite has been treated by Derman [8], and thus $\mathbf{X}$ is considered infinite here. Let $\mathcal{B}$ be the σ-algebra of Borel sets in $\mathbf{X}$. Let $\{X_t; t = 0, 1, 2, \cdots\}$ be the sequence of states and $\{\Delta_t; t = 0, 1, 2, \cdots\}$ be the sequence of decisions. In a replacement problem it is assumed that there is a distinguished state $x_0 \in \mathbf{X}$ with $X_0 = x_0$ with probability one. For any time $t$ let $S_t$ be the history of states and decisions up to and including time $t$.

Let $\mathbf{A}$ be the set of possible actions, excluding replacement, where:

A1°. It is assumed that the action space $\mathbf{A}$ is a finite set with $n(\mathbf{A})$ elements.

Since $\mathbf{A}$ is finite, assume $\mathbf{A} = \{1, 2, \cdots, n(\mathbf{A})\}$. Let $k_0 \notin \mathbf{A}$ denote the replacement action. The action $k_0$ instantaneously returns the system to state $x_0$, and it may be followed by some action $k \in \mathbf{A}$ which "acts on" the state $x_0$. The pair $(k_0, k)$ itself constitutes a possible action. A decision at time $t$ is either a choice of an element $k \in \mathbf{A}$ or a choice of a pair $(k_0, k)$ with $k \in \mathbf{A}$. Let $\mathbf{A}_0$ be the total action space, where $\mathbf{A}_0 = \mathbf{A} \cup \{(k_0, k); k \in \mathbf{A}\}$. There are $2n(\mathbf{A})$ elements in $\mathbf{A}_0$. Let $\Xi = \{\xi; \xi = \langle\xi_1, \cdots, \xi_{2n(\mathbf{A})}\rangle, \xi_j \geq 0, \sum\xi_j = 1\}$ be the simplex of all probability distributions on $\mathbf{A}_0$.
A sequential control rule is a function $D(s_{t-1}, x) = \langle D_1(s_{t-1}, x), \cdots, D_{2n(\mathbf{A})}(s_{t-1}, x)\rangle$ of histories $s_{t-1}$ and present states $x$ with values in $\Xi$. The interpretation is: at a history $S_{t-1} = s_{t-1}$ and a present state $X_t = x$, decision $j \in \mathbf{A}_0$ is taken with probability $D_j(s_{t-1}, x)$. In order that the integrals later to be written have meaning, it is necessary to restrict attention to control rules $D(s_{t-1}, x)$ which are Baire functions of their arguments. Let $\mathcal{R}$ be the space of all such control rules.

A sequential control process is not specified until a "law of motion" is given.

A2°. It is assumed that for every $x \in \mathbf{X}$ and $k \in \mathbf{A}$ there exists a probability measure $Q(\cdot; x, k)$ on $\mathcal{B}$ such that, for some version, $\Pr\{X_{t+1} \in B \mid S_{t-1}, X_t = x, \Delta_t = k\} = Q(B; x, k)$ for every $B \in \mathcal{B}$ and history $S_{t-1}$. For every $B \in \mathcal{B}$ and $k \in \mathbf{A}$, $Q(B; \cdot, k)$ is assumed to be a Baire function on $\mathbf{X}$. It is assumed that $Q(\cdot; x, k)$ is absolutely continuous with respect to some σ-finite measure $\mu$ on $\mathcal{B}$ and possesses a density $q(\cdot; x, k)$, also assumed to be a Baire function in $x$.

Since $X_0 = x_0$ a.s., once a rule $R \in \mathcal{R}$ is specified, the sequences $\{X_t; t = 0, 1, 2, \cdots\}$ and $\{(X_t, \Delta_t); t = 0, 1, 2, \cdots\}$ are stochastic processes. Assumption A2° imposes a structure similar to that of a Markov process, in that the law of motion does not depend on the past history but only on the present state. In a manner similar to Derman [9], the process $\{(X_t, \Delta_t); t = 0, 1, 2, \cdots\}$ will be called a Markovian sequential replacement process. It is not true that $\{X_t; t = 0, 1, \cdots\}$, or even $\{(X_t, \Delta_t); t = 0, 1, \cdots\}$, will always be a Markov process; whether it is or not will depend on the rule $R$.
Two assumptions particular to the development in this paper, and ensuring the ergodicity of the process, are:

A3°. For every $x \in \mathbf{X}$ and $k \in \mathbf{A}$ it is assumed that $\lim_{x' \rightarrow x} \int |q(y; x, k) - q(y; x', k)| \mu(dy) = 0$.

A4°. For every compact set $G \subset \mathbf{X}$ it is assumed that $\sup_{x \in G}\int_G q(y; x, k) \mu(dy) < 1$ for all $k \in \mathbf{A}$.

The last assumption, A4°, is stronger than needed, as may be seen in the examples in Section 4. However, it is easily verified and seems natural in many applications of the theory.

Let $w(x, k)$ be the immediate cost whenever the system is in state $x \in \mathbf{X}$ and decision $k \in \mathbf{A}$ is made. It often occurs that the cost in an actual situation is a random variable whose distribution is determined by knowledge of the state and decision. In such a case, with some loss in generality, attention is restricted to $w(x, k)$ representing the expected one-stage cost under the appropriate distribution. Let $K(x)$ be the cost of replacing a system in state $x$. If $w_0(\cdot, \cdot)$ is the cost function defined on $\mathbf{X} \times \mathbf{A}_0$, then the relationship is $w_0(x, k) = w(x, k)$ for $k \neq k_0$ and $w_0(x, (k_0, k)) = K(x) + w(x_0, k)$ for $k \in \mathbf{A}$.

A5°. Assume that $K(\cdot)$ is bounded and continuous with $0 \leq K(x) \leq M$ for all $x \in \mathbf{X}$. For every $k \in \mathbf{A}$ assume that $w(\cdot, k)$ is a non-negative continuous function on $\mathbf{X}$ with $\lim \inf_{x \rightarrow \infty} w(x, k) \gg 0$. (For the limiting operation here, a neighborhood of $\infty$ is the complement of a compact set.)

The notation $a \gg 0$ means that $a$ is much greater than zero, but not necessarily infinite. One needs $\lim \inf_{x \rightarrow \infty} w(x, k)$ large enough so that Lemmas 3.2, 3.3 and 3.4 will hold. Intuitively, one needs the cost of continuing to be sufficiently large for some states so as to ensure that the expected time to a replacement action is finite.
It should be noted that $\sup_{x \in \mathbf{X}} \min_{a \in \mathbf{A}_0} w_0(x, a) \leq M_0$, where $M_0 = M + \min_{k \in \mathbf{A}} w(x_0, k)$. Let $P_t(B, a \mid x, R) = \Pr\{X_t \in B, \Delta_t = a \mid X_0 = x, R\}$ for $B \in \mathcal{B}$, $x \in \mathbf{X}$ and $a \in \mathbf{A}_0$. Let the appropriate density be labeled $p_t(\cdot, \cdot \mid x, R)$, where $p_t(y, a \mid x, R)\mu(dy) = \Pr\{X_t \in dy, \Delta_t = a \mid X_0 = x, R\}$.

Two common measures of effectiveness of a Markovian sequential decision process are the expected total discounted future cost and the average cost per unit time. The first, abbreviated to "discounted cost", assumes a discount factor $\alpha \in (0, 1)$, with the interpretation that a unit of value $n$ periods hence has a present value of $\alpha^n$. For a starting state of $X_0 = x_0$ the objective is to choose a rule $R$ so as to minimize $\psi(x_0, \alpha, R) = \sum^{\infty}_{t = 0} \alpha^t \int_{\mathbf{X}} \sum_{a \in \mathbf{A}_0} w_0(x, a)p_t(x, a \mid x_0, R)\mu(dx)$. The second criterion, abbreviated to "average cost", examines the function $\varphi(x_0, R) = \lim \inf_{T \rightarrow \infty} T^{-1} \sum^{T - 1}_{t = 0} \int_{\mathbf{X}} \sum_{a \in \mathbf{A}_0} w_0(x, a)p_t(x, a \mid x_0, R)\mu(dx)$.

Section 2 presents the solution of the problem under the discounted cost measure. Building upon the work of Blackwell [4] and Karlin [12], Derman [9] has shown that an optimal non-randomized stationary rule exists for the case where $\mathbf{X}$ is denumerable. Blackwell [5] recently has given a complete discussion of the general case. The rule is characterized by a functional equation of the dynamic programming type. Iterative methods for solving such functional equations are now almost commonplace.

Section 3 uses the known results in the discounted cost model: (a) to show the existence of a non-randomized stationary solution in the average cost case, (b) to show the existence of a functional equation characterizing the solution in the average cost case, and (c) to show that the average cost solution is the limit, in some sense, of the discounted cost solutions as the discount factor approaches unity.
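
For orientation, a functional equation of the dynamic programming type for the discounted cost criterion would plausibly take the form below. This is a sketch assembled from the definitions of the one-stage cost, the replacement cost $K$, the transition density $q$ and the discount factor $\alpha$ given in this abstract; it is not quoted from the paper, and the symbol $V_\alpha(x)$ for the minimal discounted cost starting from state $x$ is introduced here only for illustration.

$$V_\alpha(x) = \min\Big\{ \min_{k \in \mathbf{A}} \Big[ w(x, k) + \alpha \int_{\mathbf{X}} V_\alpha(y)\, q(y; x, k)\, \mu(dy) \Big],\; K(x) + \min_{k \in \mathbf{A}} \Big[ w(x_0, k) + \alpha \int_{\mathbf{X}} V_\alpha(y)\, q(y; x_0, k)\, \mu(dy) \Big] \Big\}$$

The first branch is the cost of continuing with an ordinary action; the second is the cost of a replacement pair $(k_0, k)$, which pays $K(x)$ and then behaves as if the system were in $x_0$. Under this sketch, because $K \geq 0$, the second branch reduces to $K(x) + V_\alpha(x_0)$.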
Section 4 presents some applications of the theory. The aim is to show how the work of several authors fits into this general theory of control of replacement processes. For example, while supporting one claim in a quality control paper by Girshick and Rubin [10], the theory also provides a counterexample for another of their claims.
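
As a numerical illustration of the discounted cost criterion, the following Python sketch runs value iteration on a small replacement problem. It rests on assumptions that are not in the paper: the state space is truncated to a finite grid (the paper treats infinite state spaces), the integral against the density is replaced by a matrix product, and the toy deterioration model, the cost $w(x) = x$ and the constant replacement cost are invented for the example. The names `value_iteration` and `toy_model` are hypothetical.

```python
import numpy as np

def value_iteration(Q, w, K, x0=0, alpha=0.9, tol=1e-10, max_iter=10_000):
    """Discounted-cost value iteration with a replacement action.

    Q: (n_actions, n_states, n_states) transition matrices for ordinary actions
    w: (n_actions, n_states) one-stage costs for ordinary actions
    K: (n_states,) replacement costs; replacement returns the system to x0
    """
    V = np.zeros(Q.shape[1])
    for _ in range(max_iter):
        # Cost of each ordinary action k in each state, then the best of them.
        keep = (w + alpha * (Q @ V)).min(axis=0)
        # Replacing pays K(x) and then continues as if in state x0.
        V_new = np.minimum(keep, K + keep[x0])
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    keep = (w + alpha * (Q @ V)).min(axis=0)
    replace_now = K + keep[x0] < keep  # True where replacement is strictly better
    return V, replace_now

def toy_model(n=20, p_up=0.7):
    """Assumed toy model: one action, wear increases by one step w.p. p_up."""
    Q = np.zeros((1, n, n))
    for i in range(n):
        Q[0, i, i] += 1.0 - p_up
        Q[0, i, min(i + 1, n - 1)] += p_up  # the worst state is absorbing
    w = np.arange(n, dtype=float)[None, :]  # running cost grows with wear
    K = np.full(n, 5.0)                     # constant replacement cost
    return Q, w, K

Q, w, K = toy_model()
V, replace_now = value_iteration(Q, w, K)
```

In this toy model the array `replace_now` marks the states in which replacing is strictly cheaper than continuing; the new state `x0` is never among them, since replacing there only adds the cost `K`.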
