Mutual Information, Metric Entropy and Cumulative Relative Entropy Risk
David Haussler and Manfred Opper
The Annals of Statistics
Vol. 25, No. 6 (Dec., 1997), pp. 2451-2492
Published by: Institute of Mathematical Statistics
Stable URL: http://www.jstor.org/stable/2959041
Page Count: 42
Abstract
Assume $\{P_\theta: \theta \in \Theta\}$ is a set of probability distributions with a common dominating measure on a complete separable metric space $Y$. A state $\theta^\ast \in \Theta$ is chosen by Nature. A statistician obtains $n$ independent observations $Y_1,\ldots,Y_n$ from $Y$ distributed according to $P_{\theta^\ast}$. For each time $t$ between 1 and $n$, based on the observations $Y_1,\ldots,Y_{t-1}$, the statistician produces an estimated distribution $\hat{P}_t$ for $P_{\theta^\ast}$ and suffers a loss $L(P_{\theta^\ast}, \hat{P}_t)$. The cumulative risk for the statistician is the average total loss up to time $n$. Of special interest in information theory, data compression, mathematical finance, computational learning theory and statistical mechanics is the special case when the loss $L(P_{\theta^\ast}, \hat{P}_t)$ is the relative entropy between the true distribution $P_{\theta^\ast}$ and the estimated distribution $\hat{P}_t$. Here the cumulative Bayes risk from time 1 to $n$ is the mutual information between the random parameter $\Theta^\ast$ and the observations $Y_1,\ldots,Y_n$. New bounds on this mutual information are given in terms of the Laplace transform of the Hellinger distance between pairs of distributions indexed by parameters in $\Theta$. From these, bounds on the cumulative minimax risk are given in terms of the metric entropy of $\Theta$ with respect to the Hellinger distance. The assumptions required for these bounds are very general and do not depend on the choice of the dominating measure. They apply to both finite- and infinite-dimensional $\Theta$. They apply in some cases where $Y$ is infinite dimensional, in some cases where $Y$ is not compact, in some cases where the distributions are not smooth and in some parametric cases where asymptotic normality of the posterior distribution fails.
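The identity between cumulative Bayes risk and mutual information can be sketched as follows. The notation here is assumed for illustration and does not appear in the abstract: $\mu$ is a prior on $\Theta$ from which $\Theta^\ast$ is drawn, $P_\theta^n$ is the $n$-fold product of $P_\theta$, $M_n = \int_\Theta P_\theta^n \, d\mu(\theta)$ is the resulting mixture distribution of $Y_1,\ldots,Y_n$, $D(\cdot \,\|\, \cdot)$ is relative entropy, and $\hat{P}_t$ is taken to be the Bayes predictive distribution $M(Y_t \in \cdot \mid Y_1,\ldots,Y_{t-1})$, the standard Bayes-optimal choice under relative entropy loss. Under these assumptions, the chain rule for relative entropy gives

$$
\sum_{t=1}^{n} E\left[ D\left(P_{\Theta^\ast} \,\|\, \hat{P}_t\right) \right]
= \int_\Theta D\left(P_\theta^n \,\|\, M_n\right) d\mu(\theta)
= I(\Theta^\ast; Y_1,\ldots,Y_n),
$$

where the expectation is over $\Theta^\ast \sim \mu$ and over $Y_1,\ldots,Y_{t-1}$ drawn independently from $P_{\Theta^\ast}$.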
The Annals of Statistics © 1997 Institute of Mathematical Statistics