E‐Article

Entropy Maximization and the Spatial Distribution of Species

Bart Haegeman1 and Rampal S. Etienne2,*

1. INRIA (Institut National de Recherche en Informatique et en Automatique) Sophia Antipolis‐Mediterranée, Research Team MERE (Modélisation et Ressources en Eau), Unité Mixte de Recherche Systems Analysis and Biometrics, 2 place Pierre Viala, 34060 Montpellier, France;

2. Community and Conservation Ecology Group, Centre for Ecological and Evolutionary Studies, University of Groningen, P.O. Box 14, 9750 AA Haren, The Netherlands

Abstract:

Entropy maximization (EM, also known as MaxEnt) is a general inference procedure that originated in statistical mechanics. It has been applied recently to predict ecological patterns, such as species abundance distributions and species‐area relationships. It is well known in physics that the EM result strongly depends on how elementary configurations are described. Here we argue that the same issue is also of crucial importance for EM applications in ecology. To illustrate this, we focus on the EM prediction of species‐level spatial abundance distributions. We show that the EM outcome depends on (1) the choice of configuration set, (2) the way constraints are imposed, and (3) the scale on which the EM procedure is applied. By varying these choices in the EM model, we obtain a large range of EM predictions. Interestingly, they correspond to spatial abundance distributions that have been derived previously from mechanistic models. We argue that the appropriate choice of the EM model assumptions is nontrivial and can be determined only by comparison with empirical data.

Submitted November 27, 2008; Accepted October 3, 2009; Electronically published February 18, 2010

Keywords: spatial abundance distribution, scale transformation, prior distribution, random‐placement model, broken stick model, HEAP model.

Introduction

 

Entropy maximization (EM) is an inference technique that originated in statistical mechanics (Jaynes 1957, 2003). The philosophy behind EM inference is to provide the probability distribution (which we denote P(x)) over system configurations (which we denote x) that corresponds best to the available information. Because a probability distribution with higher entropy encodes less information, the probability distribution that corresponds best to the available information, formulated in terms of constraints, can be found by maximizing the entropy subject to these constraints.

Recently, several studies have explored EM inference in ecological problems (Shipley et al. 2006; Banavar and Maritan 2007; Pueyo et al. 2007; Dewar and Porté 2008; Haegeman and Loreau 2008; Harte et al. 2008). Most attention has been paid to EM inference of species abundance distributions (Banavar and Maritan 2007; Pueyo et al. 2007; Dewar and Porté 2008), but Harte et al. (2008) provide an exception: they apply EM to simultaneously predict, with a minimal number of assumptions (constraints), several macroecological patterns, such as species abundance distributions and species‐level spatial abundance distributions, that together give species‐area relationships. In this article, we zoom in on EM inference of spatial abundance distributions.

We treat the spatial abundance distribution of a given species in a simple, spatially implicit manner. We divide a spatial region into M cells and then describe the arrangement of N individuals over these cells. This description is spatially implicit because we do not take into account the correlations that might exist between neighboring cells. As a consequence, all abundance distributions that we consider are invariant under random permutations of the cells. Note that this is also the approach taken in different mechanistic models (Coleman 1981; Harte et al. 2005, 2008; Conlisk et al. 2007).

However, this description of spatial abundance distribution by itself does not suffice to apply the EM algorithm. The complete specification of an EM problem requires a number of additional assumptions. These assumptions might appear incidental at first sight, but we show here that they have a major effect on the EM prediction: it turns out that there is not a single EM prediction for the spatial abundance distribution but a myriad of distributions that are obtained under various assumptions, none of which seems to stand out as the most plausible. Interestingly, most of these distributions were obtained as the outcome of process‐based models, including the discrete broken stick model (MacArthur 1960), the random‐placement model (Coleman 1981), and the single‐division model (Conlisk et al. 2007).

To introduce the general framework of the EM approach, we distinguish the formulation of the EM problem from its solution. Whereas the solution of an EM problem can be found with a purely technical recipe, the formulation of the EM problem requires a number of assumptions that can have an important effect on the solution. We first present different EM assumptions for the prediction of the abundance distribution over spatial cells. The corresponding EM problems are solved systematically in the subsequent section.

Formulating the EM Problem for Spatial Distributions

 

Formulating the EM problem consists of three steps (Haegeman and Loreau 2009): specifying (1) system configurations, (2) the prior distribution over the system configurations, and (3) the constraints on the system configurations. We will discuss these three steps in order.

Specifying the System Configurations

An EM problem formulation starts with specification of the system configurations. For the case of the spatial abundance distribution, there are two obvious and simple ways to do so. The first is to specify, for every individual, the cell to which it belongs. Such a system configuration is denoted by a vector m: its nth component, mn, gives the cell to which the nth individual belongs. The number of components of m gives the number of individuals,

$$N( \mathbf{m}) =\dim \mathbf{m}. \qquad (1)$$

The second way to specify the system configuration is to specify for every cell the number of individuals it contains. Such a system configuration is denoted by a vector n: its mth component, nm, gives the number of individuals belonging to cell m. The number of components of n equals the number of cells M. The number of individuals in configuration n is

$$N( \mathbf{n}) =\sum ^{M}_{m=1}n_{m}. \qquad (2)$$

We consider both configurations m and n, and we will show that the seemingly innocent choice between them can lead to completely different EM predictions. We will call them labeled and unlabeled configurations, respectively, because configurations m presuppose that individuals are labeled (we know for each individual to which cell it belongs), whereas configurations n do not require labels for individuals (we merely count the number of individuals in each cell; their identities are lost).

We illustrate the difference between labeled and unlabeled system configurations for a region with $M=2$ cells and $N=3$ individuals. An example of a labeled configuration is the vector $\mathbf{m}=( 1,\: 1,\: 2) $ , meaning that the first individual belongs to cell 1, the second individual belongs to cell 1, and the third individual belongs to cell 2. There are eight labeled configurations with $M=2$ and $N=3$ : (1, 1, 1), (1, 1, 2), (1, 2, 1), (2, 1, 1), (1, 2, 2), (2, 1, 2), (2, 2, 1), and (2, 2, 2).

An example of an unlabeled configuration is the vector $\mathbf{n}=( 2,\: 1) $ , meaning that the first cell contains two individuals, and the second cell contains one individual. There are four unlabeled configurations with $M=2$ and $N=3$ : (3, 0), (2, 1), (1, 2), and (0, 3).

Clearly, there are more labeled than unlabeled configurations. In fact, every labeled configuration corresponds to exactly one unlabeled configuration, but a given unlabeled configuration can correspond to several labeled configurations. In this example, the link between the two types of configurations is

$$( 1,\: 1,\: 1) \leftrightarrow ( 3,\: 0) ,\qquad ( 1,\: 1,\: 2) ,\: ( 1,\: 2,\: 1) ,\: ( 2,\: 1,\: 1) \leftrightarrow ( 2,\: 1) ,$$
$$( 1,\: 2,\: 2) ,\: ( 2,\: 1,\: 2) ,\: ( 2,\: 2,\: 1) \leftrightarrow ( 1,\: 2) ,\qquad ( 2,\: 2,\: 2) \leftrightarrow ( 0,\: 3) .$$

Note that not all unlabeled configurations correspond to the same number of labeled configurations: vector $\mathbf{n}=( 3,\: 0) $ has one labeled configuration, whereas vector $\mathbf{n}=( 2,\: 1) $ has three. In general, the number ℳ(n) of labeled configurations that correspond to a given unlabeled configuration n is given by a multinomial coefficient,

$$\mathstrut{\cal M} ( \mathbf{n}) =\frac{N( \mathbf{n}) !}{n_{1}!\: n_{2}!\cdots n_{M}!}. \qquad (3)$$
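The labeled–unlabeled correspondence is easy to verify by brute force. The following Python sketch (our own illustration; all function names are ours) enumerates every labeled configuration for $M=2$ and $N=3$ and checks that each unlabeled configuration n occurs exactly ℳ(n) times:

```python
from itertools import product
from math import factorial

def multiplicity(n):
    """Multinomial coefficient M(n): number of labeled configurations
    that map to the unlabeled configuration n."""
    result = factorial(sum(n))
    for n_m in n:
        result //= factorial(n_m)
    return result

def to_unlabeled(m, M):
    """Count individuals per cell: labeled vector m -> unlabeled vector n."""
    return tuple(m.count(cell) for cell in range(1, M + 1))

M, N = 2, 3
labeled = list(product(range(1, M + 1), repeat=N))   # all M^N labeled vectors
counts = {}
for m in labeled:
    n = to_unlabeled(m, M)
    counts[n] = counts.get(n, 0) + 1

# Each unlabeled configuration appears exactly M(n) times among the labeled ones.
assert len(labeled) == 8 and len(counts) == 4
assert counts[(3, 0)] == 1 and counts[(2, 1)] == 3
assert all(counts[n] == multiplicity(n) for n in counts)
```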

Specifying the Prior Distribution over the System Configurations

Next, one must specify a prior distribution on the set of system configurations. This prior distribution corresponds to the probabilities one would attribute to the configurations if no constraints were imposed on the system. Typically, an uninformative prior is chosen, giving equal probability to all configurations. We denote the prior distribution P0(x), where x represents the configuration (either labeled, $\mathbf{x}=\mathbf{m}$ , or unlabeled, $\mathbf{x}=\mathbf{n}$ ). For an uninformative prior, the distribution P0(x) is simply a constant.

We could, however, choose any distribution for the prior. A particular choice for the labeled configurations m could be

$$P_{0}( \mathbf{m}) \propto \frac{1}{\mathstrut{\cal M} ( \mathbf{n}( \mathbf{m}) ) }, \qquad (4)$$

where n(m) is the unlabeled configuration that corresponds to the labeled configuration m. This prior, defined on system configurations m, makes all unlabeled configurations n equally probable. We thus observe that specifying the system configuration and specifying the prior are in some sense interchangeable.

Specifying the Constraints on the System Configurations

Finally, we have to specify the constraints we want to take into account in the EM problem. We consider two types of constraints: hard and soft constraints. A hard constraint restricts the set of system configurations to a particular subset, thus ruling out configurations that fall outside this subset. In other words, all configurations not satisfying the constraint have zero probability. For spatial distributions, one could consider only configurations with a specified number of individuals N. This can be formulated as

$$N( \mathbf{x}) =N, \qquad (5)$$

with the function N(m) for labeled configurations given in equation (1) and the function N(n) for unlabeled configurations given in equation (2).

A soft constraint does not restrict the system configurations but acts on statistics of the system configurations. For spatial distributions, one could impose that the mean number of individuals in the EM solution equals a specified value N. This can be formulated as

$$\sum _{\mathbf{x}}P( \mathbf{x}) N( \mathbf{x}) =N, \qquad (6)$$

where P(x) is the EM probability distribution we are trying to solve for and N(x) is given by equation (1) for labeled configurations m and equation (2) for unlabeled configurations n. Thus, a soft constraint does not completely rule out some configurations but effectively assigns differential nonzero probabilities to them.

General Recipe for Solving the EM Problem

 

Once the set of system configurations, the prior distribution, and the hard and/or soft constraints have been specified, the EM problem can readily be solved. The solution consists of finding the probability distribution P(x) that maximizes the relative entropy $H( P\vert P_{0}) $ subject to the constraints. The relative entropy with respect to the prior distribution P0(x) is given by

$$H( P\vert P_{0}) =-\sum _{\mathbf{x}}P( \mathbf{x}) \ln \frac{P( \mathbf{x}) }{P_{0}( \mathbf{x}) }. \qquad (7)$$

For an uninformative prior (i.e., P0(x) independent of x), maximizing relative entropy $H( P\vert P_{0}) $ is equivalent to maximizing Shannon entropy H(P),

$$H( P) =-\sum _{\mathbf{x}}P( \mathbf{x}) \ln P( \mathbf{x}) . \qquad (8)$$

Solution methods for this maximization problem are well known. If all constraints are of the hard type, maximizing (relative) entropy is particularly simple. Configurations that satisfy all constraints have a probability proportional to the prior distribution P0(x); configurations that do not satisfy all constraints have zero probability. Hence, the EM solution reads

$$P( \mathbf{x}) =\frac{P_{0}( \mathbf{x}) }{Z} \qquad (9)$$

if x satisfies all constraints, where Z is a normalization constant given by

$$Z=\sum _{\mathbf{x}}P_{0}( \mathbf{x}) ,$$

where the sum is over all vectors x that satisfy the hard constraint. This guarantees that

$$\sum _{\mathbf{x}}P( \mathbf{x}) =1.$$

If there are soft constraints, one can use the technique of Lagrange multipliers. For the soft constraint (6), the EM solution can be written in terms of a corresponding Lagrange multiplier α,

$$P( \mathbf{x}) =\frac{1}{Z( \alpha ) }P_{0}( \mathbf{x}) e^{-\alpha N( \mathbf{x}) }$$

if x satisfies all hard constraints, with the normalization constant given by

$$Z( \alpha ) =\sum _{\mathbf{x}}P_{0}( \mathbf{x}) e^{-\alpha N( \mathbf{x}) },$$

where the sum runs over vectors x that satisfy all hard constraints. The Lagrange multiplier α must be determined by imposing the soft constraint (6). The latter constraint can be rewritten as

$$-\frac{\mathrm{d}}{\mathrm{d}\alpha }\ln Z( \alpha ) =N,$$

which often allows one to solve explicitly for α.
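The Lagrange-multiplier recipe can be made concrete for a finite configuration set. The sketch below is our own illustration (not from the appendixes); it assumes the target mean lies below the prior mean, so that the multiplier α is nonnegative, and finds α by bisection on the rewritten constraint:

```python
from math import exp

def maxent_soft(configs, prior, stat, target, lo=0.0, hi=50.0):
    """Soft-constraint EM on a finite set: find P(x) proportional to
    prior(x) * exp(-alpha * stat(x)) whose mean of stat equals target.
    Bisection works because the mean decreases monotonically in alpha;
    the bracket [lo, hi] must contain the root (alpha >= 0 assumed)."""
    def mean_stat(alpha):
        w = [prior(x) * exp(-alpha * stat(x)) for x in configs]
        Z = sum(w)
        return sum(wi * stat(x) for wi, x in zip(w, configs)) / Z
    for _ in range(200):
        mid = (lo + hi) / 2
        if mean_stat(mid) > target:
            lo = mid          # mean still too large: increase alpha
        else:
            hi = mid
    alpha = (lo + hi) / 2
    w = [prior(x) * exp(-alpha * stat(x)) for x in configs]
    Z = sum(w)
    return [wi / Z for wi in w]

# Toy example: abundances 0..20, uniform prior, mean constrained to 4.
configs = list(range(21))
P = maxent_soft(configs, lambda x: 1.0, lambda x: x, 4.0)
assert abs(sum(P) - 1) < 1e-9
assert abs(sum(p * x for p, x in zip(P, configs)) - 4.0) < 1e-6
```

The resulting P decays geometrically in x, the discrete analog of the exponential form $P_{0}( \mathbf{x}) e^{-\alpha N( \mathbf{x}) }$.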

It should be noted that the same EM problem can be written in different ways: different combinations of system configurations, prior distribution, and constraints can yield equivalent EM problems. A first example concerns exchangeability of constraints and system configurations: instead of imposing a hard constraint on the set of system configurations, one could equivalently start out with a smaller set of system configurations. A second, less trivial, example concerns the exchangeability of the prior distribution P0 and the specification of the system configurations, to which we already alluded in the previous section: the EM problem formulated in terms of labeled configurations m with an uninformative prior $P_{0}( \mathbf{m}) \propto 1$ is equivalent to the EM problem formulated in terms of unlabeled configurations n with an informative prior $P_{0}( \mathbf{n}) \propto \mathstrut{\cal M} ( \mathbf{n}) $ . Thus, an alternative definition of the set of system configurations can be mimicked by introducing an appropriate prior distribution.

The latter fact is particularly important for the next section. It implies that the EM problem with labeled configurations and an uninformative prior is not equivalent to the EM problem with unlabeled configurations and an uninformative prior. The ratio between the two EM solutions is given by the multiplicity factor (eq. [3]). To see that this factor drastically modifies the EM distribution, consider the example of $N=6$ individuals distributed over $M=3$ cells. The multiplicity factors for the most and least even distributions, $\mathbf{n}_{1}=( 2,\: 2,\: 2) $ and $\mathbf{n}_{2}=( 6,\: 0,\: 0) $ , respectively, are $\mathstrut{\cal M} ( \mathbf{n}_{1}) =90$ and $\mathstrut{\cal M} ( \mathbf{n}_{2}) =1$ . In other words, whereas the EM problem in terms of unlabeled configurations assigns the same prior probability to n1 and n2, the EM problem in terms of labeled configurations assumes that the even configuration n1 is a priori 90 times as probable as the clustered configuration n2. Therefore, these two EM problems will lead to very different predictions (the EM prediction for labeled configurations will give relatively more weight to evenly distributed configurations than will the EM prediction for unlabeled configurations). Obviously, these differences become even more pronounced for larger M and N.
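The multiplicity factors quoted above can be checked directly (a short Python verification of our own):

```python
from math import factorial

def multiplicity(n):
    """Multinomial coefficient: number of labeled configurations
    per unlabeled configuration n (eq. [3])."""
    result = factorial(sum(n))
    for n_m in n:
        result //= factorial(n_m)
    return result

assert multiplicity((2, 2, 2)) == 90   # most even distribution
assert multiplicity((6, 0, 0)) == 1    # most clustered distribution
```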

EM Solutions for Spatial Distributions

 

In this section, we study all four EM problems for spatial distributions that result from different combinations of (1) working with either labeled or unlabeled configurations and (2) using either hard or soft constraints to impose a total number of individuals. In all cases we assume an uninformative prior on the system configuration. Here we summarize the results and discuss similarities and differences between different EM solutions; we refer to the appendixes for the formal derivations.

Labeled Configurations with a Hard Constraint

With an uninformative prior on labeled configurations, all vectors m have equal probability. The EM problem for labeled configurations with a hard constraint is solved in appendix A. The resulting probability distribution for unlabeled configurations n is (see eq. [A3])

$$P^{( \mathrm{lab}\,,\,\mathrm{hard}\,) }_{M,\, N}( \mathbf{n}) =\mathstrut{\cal M} ( \mathbf{n}) \left( \frac{1}{M}\right) ^{N}=\frac{N!}{n_{1}!\cdots n_{M}!}\left( \frac{1}{M}\right) ^{N}, \qquad (10)$$

where P(lab, hard) denotes the probability distribution that results from applying the EM procedure for labeled configurations with a hard constraint.

Equation (10) is a joint distribution for the abundances of all cells, which we call a “multicell abundance distribution.” For this EM problem, the multicell abundance distribution is multinomial: all individuals are placed independently in one of the M cells, and every cell has the same probability $1/M$ that a given individual is assigned to that cell. This is the spatial abundance distribution for the random‐placement (RP) model (Coleman 1981).

From equation (10) we can compute the marginal distribution for the abundance of any one cell, which we call the “one‐cell abundance distribution.” It is given by

$$P^{( \mathrm{lab}\,,\,\mathrm{hard}\,) }_{M,\, N}( n_{1}) =\binom{N}{n_{1}}\left( \frac{1}{M}\right) ^{n_{1}}\left( 1-\frac{1}{M}\right) ^{N-n_{1}} \qquad (11)$$

for $n_{1}\leq N$ , which is a binomial distribution.
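As a numerical sanity check (our own), the binomial one-cell distribution of the random-placement solution sums to 1 and has mean $N/M$:

```python
from math import comb

def one_cell_lab_hard(n1, M, N):
    """Binomial one-cell abundance distribution: each of N individuals
    falls in cell 1 independently with probability 1/M."""
    return comb(N, n1) * (1 / M) ** n1 * (1 - 1 / M) ** (N - n1)

M, N = 4, 10
dist = [one_cell_lab_hard(n1, M, N) for n1 in range(N + 1)]
assert abs(sum(dist) - 1) < 1e-12
mean = sum(n1 * p for n1, p in enumerate(dist))
assert abs(mean - N / M) < 1e-12
```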

Labeled Configurations with a Soft Constraint

The EM problem for labeled configurations with a soft constraint is solved in appendix A. The resulting probability distribution for unlabeled configurations n is (see eq. [A6])

$$P^{( \mathrm{lab}\,,\,\mathrm{soft}\,) }_{M,\, N}( \mathbf{n}) =\frac{\mathstrut{\cal M} ( \mathbf{n}) }{N+1}\left( \frac{N}{M( N+1) }\right) ^{N( \mathbf{n}) }, \qquad (12)$$

where P(lab, soft) denotes the probability distribution that results from applying the EM procedure for labeled configurations with a soft constraint.

The one‐cell abundance distribution is the marginal of the multicell abundance distribution (12) and is given by

$$P^{( \mathrm{lab}\,,\,\mathrm{soft}\,) }_{M,\, N}( n_{1}) =\frac{M}{N+M}\left( \frac{N}{N+M}\right) ^{n_{1}}, \qquad (13)$$

which is a geometric distribution with mean $N/M$ . The distribution for the total number of individuals N(n) is given by

$$P^{( \mathrm{lab}\,,\,\mathrm{soft}\,) }_{M,\, N}( N( \mathbf{n}) =K) =\frac{1}{N+1}\left( \frac{N}{N+1}\right) ^{K}, \qquad (14)$$

which is a geometric distribution with mean N (note that the soft constraint requires that the mean equals N). The link with the hard‐constraint solution (10) can be made by conditioning on the total number of individuals,

$$P^{( \mathrm{lab}\,,\,\mathrm{soft}\,) }_{M,\, N}( \mathbf{n}\,\vert \, N( \mathbf{n}) =K) =P^{( \mathrm{lab}\,,\,\mathrm{hard}\,) }_{M,\, K}( \mathbf{n}) . \qquad (15)$$
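The conditioning relation can be verified numerically (our own check; the geometric distributions are taken on support 0, 1, 2, …): mixing the random-placement binomial over a geometric total with mean N reproduces the geometric one-cell distribution with mean $N/M$:

```python
from math import comb

def one_cell_lab_soft(n1, M, N):
    """Geometric one-cell distribution with mean N/M."""
    return (M / (N + M)) * (N / (N + M)) ** n1

M, N, n1 = 4, 10, 3
q = N / (N + 1)            # geometric distribution for the total, mean N
total = sum(
    (1 - q) * q ** K * comb(K, n1) * (1 / M) ** n1 * (1 - 1 / M) ** (K - n1)
    for K in range(n1, 2000)   # truncate the series; the tail is negligible
)
assert abs(total - one_cell_lab_soft(n1, M, N)) < 1e-10
```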

Unlabeled Configurations with a Hard Constraint

With an uninformative prior on unlabeled configurations, all vectors n have equal probability. The EM problem for unlabeled configurations with a hard constraint is solved in appendix B. The resulting probability distribution for unlabeled configurations n is (see eq. [B2])

$$P^{( \mathrm{unl}\,,\,\mathrm{hard}\,) }_{M,\, N}( \mathbf{n}) =\binom{N+M-1}{M-1}^{-1}, \qquad (16)$$

where P(unl, hard) denotes the probability distribution that results from applying the EM procedure for unlabeled configurations with a hard constraint.

The multicell abundance distribution (16) gives, by construction, equal weight to all unlabeled configurations. Specifying an unlabeled configuration for M cells and N individuals is equivalent to splitting a community of N individuals into M parts. The idea that all such splits are equally probable is reminiscent of the discrete broken‐stick (DBS) model (MacArthur 1960; Etienne and Olff 2005) for the distribution of species’ abundances. Distribution (16) can be interpreted as the spatial counterpart of the DBS species abundance distribution, with one main difference: whereas in the species abundance distribution each species has at least one individual, in the spatial abundance distribution cells may be empty. From distribution (16) we can compute the one‐cell abundance distribution (see eq. [B3]),

$$P^{( \mathrm{unl}\,,\,\mathrm{hard}\,) }_{M,\, N}( n_{1}) =\frac{\binom{N-n_{1}+M-2}{M-2}}{\binom{N+M-1}{M-1}} \qquad (17)$$

for $n_{1}\leq N$ .
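The one-cell marginal of this uniform distribution can be checked by enumerating all unlabeled configurations (a brute-force Python sketch of our own, for $M=3$ and $N=6$):

```python
from math import comb

def unlabeled_configs(M, N):
    """Generate all ways to place N unlabeled individuals in M cells."""
    if M == 1:
        yield (N,)
        return
    for n1 in range(N + 1):
        for rest in unlabeled_configs(M - 1, N - n1):
            yield (n1,) + rest

def one_cell_unl_hard(n1, M, N):
    """One-cell marginal of the uniform distribution over unlabeled
    configurations (a negative-hypergeometric-type expression)."""
    return comb(N - n1 + M - 2, M - 2) / comb(N + M - 1, M - 1)

M, N = 3, 6
configs = list(unlabeled_configs(M, N))
assert len(configs) == comb(N + M - 1, M - 1)   # 28 configurations
for n1 in range(N + 1):
    empirical = sum(1 for n in configs if n[0] == n1) / len(configs)
    assert abs(empirical - one_cell_unl_hard(n1, M, N)) < 1e-12
```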

Unlabeled Configurations with a Soft Constraint

The EM problem for unlabeled configurations with a soft constraint is solved in appendix B. The resulting probability distribution for unlabeled configurations n is (see eq. [B5])

$$P^{( \mathrm{unl}\,,\,\mathrm{soft}\,) }_{M,\, N}( \mathbf{n}) =\left( \frac{M}{N+M}\right) ^{M}\left( \frac{N}{N+M}\right) ^{N( \mathbf{n}) }, \qquad (18)$$

where P(unl, soft) denotes the probability distribution that results from applying the EM procedure for unlabeled configurations with a soft constraint.

The multicell abundance distribution (18) has a simple structure: it is the product of independent one‐cell abundance distributions, each of which is given by

$$P^{( \mathrm{unl}\,,\,\mathrm{soft}\,) }_{M,\, N}( n_{1}) =\frac{M}{N+M}\left( \frac{N}{N+M}\right) ^{n_{1}}. \qquad (19)$$

This is a geometric distribution with mean $N/M$ . The distribution for the total number of individuals N(n) is given by

$$P^{( \mathrm{unl}\,,\,\mathrm{soft}\,) }_{M,\, N}( N( \mathbf{n}) =K) =\binom{K+M-1}{M-1}\left( \frac{M}{N+M}\right) ^{M}\left( \frac{N}{N+M}\right) ^{K}, \qquad (20)$$

which is a negative binomial distribution. The link with the hard‐constraint solution (16) can be made by conditioning on the total number of individuals,

$$P^{( \mathrm{unl}\,,\,\mathrm{soft}\,) }_{M,\, N}( \mathbf{n}\,\vert \, N( \mathbf{n}) =K) =P^{( \mathrm{unl}\,,\,\mathrm{hard}\,) }_{M,\, K}( \mathbf{n}) . \qquad (21)$$

Link between Hard‐ and Soft‐Constraint Solutions

For both labeled and unlabeled configurations, the EM problems with hard and soft constraints on the number of individuals yield related results. The relationship is given in equations (15) and (21): the soft‐constraint solution conditioned on the total number of individuals $N( \mathbf{n}) =K$ equals the hard‐constraint solution for the total number of individuals K. This conditioning property is generally valid for EM solutions.

The difference between hard‐ and soft‐constraint EM solutions resides in their distribution for the total number of individuals N(n). For the hard constraint, the distribution for N(n) is concentrated at the constraint N. For the soft constraint, we know that the distribution for N(n) has its mean at the constraint N, by construction. If the variation around the mean is small, that is, if the soft‐constraint distribution for N(n) is sharply peaked at N, then the EM solutions for hard and soft constraints are practically equivalent.

The EM solutions for labeled and unlabeled configurations behave quite differently in this respect. For labeled configurations, the distribution $P^{( \mathrm{lab}\,,\,\mathrm{soft}\,) }_{M,\, N}( \mathbf{n}) $ is geometric (see eq. [14]), with a large variation around the mean N. This can be verified by computing the coefficient of variation, $\sqrt{1+1/ N}$ , which is greater than 1 for all N and M. The EM distributions P(lab, hard) and P(lab, soft) are therefore quite different.

For unlabeled configurations, the distribution $P^{( \mathrm{unl}\,,\,\mathrm{soft}\,) }_{M,\, N}( N( \mathbf{n}) ) $ is a negative binomial (see eq. [20]) and sharply peaked at N. Indeed, the coefficient of variation equals

$$\sqrt{\frac{1}{M}+\frac{1}{N}}.$$

In most cases of interest, M and N are large (say, $M> 10$ and $N> 10$ ), and the coefficient of variation is much less than 1. Hence, the EM distributions P(unl, hard) and P(unl, soft) can be considered equivalent.
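These two coefficients of variation are quick to check numerically (our own sketch; the formulas follow from the variances of the geometric and negative binomial distributions):

```python
from math import sqrt

def cv_lab_soft(N):
    """CV of the geometric total-abundance distribution
    (mean N, variance N * (N + 1)): always exceeds 1."""
    return sqrt(N * (N + 1)) / N

def cv_unl_soft(M, N):
    """CV of the negative binomial total-abundance distribution:
    the sum of M i.i.d. geometrics with mean N/M."""
    return sqrt(1 / M + 1 / N)

assert cv_lab_soft(100) > 1
assert cv_unl_soft(16, 100) < 0.3   # sharply peaked for M, N > 10
```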

These conclusions are illustrated in figure 1, which compares the one‐cell abundance distributions for the four EM distributions we have analyzed: $P^{( \mathrm{lab}\,,\,\mathrm{hard}\,) }$ , $P^{( \mathrm{lab}\,,\,\mathrm{soft}\,) }$ , $P^{( \mathrm{unl}\,,\,\mathrm{hard}\,) }$ , and $P^{( \mathrm{unl}\,,\,\mathrm{soft}\,) }$ . Note that the one‐cell abundance distributions $P^{( \mathrm{lab}\,,\,\mathrm{soft}\,) }$ and $P^{( \mathrm{unl}\,,\,\mathrm{soft}\,) }$ are mathematically identical; see equations (13) and (19). Formula (17) for P(unl, hard) is not identical to these, but its curve almost always coincides with the two soft‐constraint solutions. However, distribution P(lab, hard) has a completely different one‐cell abundance distribution; see equation (11).

Figure 1: Comparison between different entropy maximization (EM) predictions for the one‐cell abundance distribution: the EM solution (eq. [11]) for labeled configurations with hard constraint (dash‐dotted line), the EM solution (eq. [13]) for labeled configurations with soft constraint (dotted line), the EM solution (eq. [17]) for unlabeled configurations with hard constraint (dashed line), and the EM solution (eq. [19]) for unlabeled configurations with soft constraint (dotted line; one‐cell abundance distributions P(lab, soft) and P(unl, soft) coincide). Also plotted is the EM solution computed by Harte et al. (2008; solid line). The panels show these distributions for different values of M and N. A, $M=4$ and $N=10$ ; B, $M=16$ and $N=10$ ; C, $M=4$ and $N=100$ ; D, $M=16$ and $N=100$ . The solutions for unlabeled configurations are very close for all values of M and N and are visually indistinguishable for $M=16$ . The solution for labeled configurations with hard constraint is very different.


Scale Dependence of EM Solutions

 

We have shown that several spatial distributions can be obtained from the EM algorithm with different assumptions in the formulation of the EM problem. In this section, we consider the scale on which the EM algorithm is applied. Indeed, the scale of the EM problem, measured by the number of cells M, requires close scrutiny. We investigate whether the outcome of an EM computation depends on the scale on which the problem was formulated. This is particularly important when combining EM distributions on different scales, for example, to compute species‐area relationships (Harte et al. 2008). Distributions on different scales should be combined only if they are consistent. We demonstrate that this condition is not necessarily satisfied by EM distributions.

For a region of fixed size, the number of cells into which the region is partitioned determines the scale of the problem. The larger the number of cells, the finer the scale. We consider two different scales, M1 and M2, and we assume that M1 is the finer scale and is related to the coarser scale M2 by an integer factor $\ell =M_{1}/ M_{2}$ . In other words, a cell on scale M2 consists of ℓ cells on scale M1.

We introduce a scale transformation from the fine scale M1 to the coarse scale M2. For any configuration on scale M1, there is a corresponding configuration on scale M2. However, there are several configurations on scale M1 that correspond to a given configuration on scale M2. To find the probability of a configuration on scale M2, we sum the probabilities of all configurations on scale M1 compatible with the configuration on scale M2.

To illustrate the scale transformation, consider unlabeled configurations with $N=2$ individuals. The fine scale has $M_{1}=4$ cells; the coarse scale has $M_{2}=2$ cells. The scale transformation regroups the first two cells on scale $M_{1}=4$ in the first cell on scale $M_{2}=2$ , and the last two cells on scale $M_{1}=4$ in the second cell on scale $M_{2}=2$ . This leads to the following correspondence:

$$( 2,\: 0,\: 0,\: 0) ,\: ( 1,\: 1,\: 0,\: 0) ,\: ( 0,\: 2,\: 0,\: 0) \rightarrow ( 2,\: 0) ,$$
$$( 1,\: 0,\: 1,\: 0) ,\: ( 1,\: 0,\: 0,\: 1) ,\: ( 0,\: 1,\: 1,\: 0) ,\: ( 0,\: 1,\: 0,\: 1) \rightarrow ( 1,\: 1) ,$$
$$( 0,\: 0,\: 2,\: 0) ,\: ( 0,\: 0,\: 1,\: 1) ,\: ( 0,\: 0,\: 0,\: 2) \rightarrow ( 0,\: 2) .$$
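This correspondence, and the resulting loss of uniformity, can be reproduced with a short enumeration (our own sketch): starting from the 10 equiprobable fine-scale configurations, the induced coarse-scale probabilities are 3/10, 4/10, and 3/10 rather than the uniform 1/3 each.

```python
def unlabeled_configs(M, N):
    """Generate all ways to place N unlabeled individuals in M cells."""
    if M == 1:
        yield (N,)
        return
    for n1 in range(N + 1):
        for rest in unlabeled_configs(M - 1, N - n1):
            yield (n1,) + rest

def coarsen(n_fine, ell):
    """Regroup every ell consecutive fine cells into one coarse cell."""
    return tuple(sum(n_fine[i:i + ell]) for i in range(0, len(n_fine), ell))

fine = list(unlabeled_configs(4, 2))      # 10 configurations on M1 = 4
coarse_counts = {}
for n in fine:
    c = coarsen(n, 2)
    coarse_counts[c] = coarse_counts.get(c, 0) + 1

# (2,0) and (0,2) each come from 3 fine configurations, (1,1) from 4:
assert coarse_counts == {(2, 0): 3, (1, 1): 4, (0, 2): 3}
```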

Scale consistency can then be defined as follows. Apply the EM algorithm on fine scale M1, and use the scale transformation from M1 to M2 to obtain a spatial distribution on coarse scale M2. If the latter distribution corresponds to the distribution obtained by applying the EM algorithm directly on scale M2, then the EM distributions are called “scale consistent.”

In appendix C, we show that for EM problems stated in terms of labeled configurations, the resulting spatial abundance distributions are scale consistent. However, for EM problems stated in terms of unlabeled configurations, the distributions are not scale consistent. As a consequence, a new set of EM distributions can be obtained by, first, applying the EM procedure for unlabeled configurations on scale M1 and, second, computing averages of the EM solution to obtain a consistent probability distribution for configurations on a coarser scale M2. We again distinguish hard and soft constraints for the number of individuals.

Averaged Solution for Unlabeled Configurations with Hard Constraint

The EM problem with averaging and hard constraint is solved in appendix C. The resulting distribution for unlabeled configurations n is (see eq. [C2])

$$P^{( \mathrm{avg}\,,\,\mathrm{hard}\,) }_{M,\, N,\,\ell }( \mathbf{n}) =\frac{\prod ^{M}_{m=1}\binom{n_{m}+\ell -1}{\ell -1}}{\binom{N+\ell M-1}{\ell M-1}}, \qquad (22)$$

where P(avg, hard) denotes the probability distribution that results (on scale M) from applying the EM procedure (on scale ℓM) with averaging (scale factor ℓ) and hard constraint.

The multicell abundance distribution (22) has been used previously to model spatial abundance distributions (Conlisk et al. 2007). It arises from the so‐called single‐division (SD) model based on certain colonization rules of individuals into cells. On a more abstract level, it is related to the Pólya‐Eggenberger urn scheme (Johnson et al. 1997). Distribution (22) has the marginal one‐cell abundance distribution

$$P^{( \mathrm{avg}\,,\,\mathrm{hard}\,) }_{M,\, N,\,\ell }( n_{1}) =\frac{\binom{n_{1}+\ell -1}{\ell -1}\binom{N-n_{1}+\ell ( M-1) -1}{\ell ( M-1) -1}}{\binom{N+\ell M-1}{\ell M-1}}, \qquad (23)$$

which is a negative hypergeometric distribution.
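A small numerical check (our own; the distribution is eq. [22]) confirms that $\ell =1$ recovers the uniform unlabeled solution, that the distribution is normalized, and that larger ℓ favors even configurations:

```python
from math import comb

def p_avg_hard(n, ell):
    """Averaged hard-constraint solution: apply the uniform EM solution
    on the fine scale ell * M, then regroup into M coarse cells."""
    M, N = len(n), sum(n)
    num = 1
    for n_m in n:
        num *= comb(n_m + ell - 1, ell - 1)
    return num / comb(N + ell * M - 1, ell * M - 1)

# ell = 1 recovers the uniform distribution over unlabeled configurations:
assert p_avg_hard((2, 2, 2), 1) == p_avg_hard((6, 0, 0), 1)

# Normalization for M = 2, N = 6, ell = 3 (sum over all configurations):
assert abs(sum(p_avg_hard((k, 6 - k), 3) for k in range(7)) - 1) < 1e-12

# Larger ell weights even configurations more heavily:
assert p_avg_hard((2, 2, 2), 10) > p_avg_hard((6, 0, 0), 10)
```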

Note that EM distributions P(lab, hard), P(lab, soft), P(unl, hard), and P(unl, soft) are uniquely determined by the number of individuals N and the number of cells M. In contrast, the distribution P(avg, hard) has one additional parameter, namely, the factor ℓ of the scale transformation.

Averaged Solution for Unlabeled Configurations with Soft Constraint

The EM problem with averaging and soft constraint is solved in appendix C. The resulting distribution for unlabeled configurations n is (see eq. [C4])

$$P^{( \mathrm{avg}\,,\,\mathrm{soft}\,) }_{M,\, N,\,\ell }( \mathbf{n}) =\prod ^{M}_{m=1}\binom{n_{m}+\ell -1}{\ell -1}\left( \frac{\ell M}{N+\ell M}\right) ^{\ell }\left( \frac{N}{N+\ell M}\right) ^{n_{m}}, \qquad (24)$$

where P(avg, soft) denotes the probability distribution that results (on scale M) from applying the EM procedure (on scale ℓM) with averaging (scale factor ℓ) and soft constraint.

The multicell abundance distribution (24) has a simple structure: cell abundances are independent, and all have the same one‐cell abundance distribution,

$$P^{( \mathrm{avg}\,,\,\mathrm{soft}\,) }_{M,\, N,\,\ell }( n_{1}) =\binom{n_{1}+\ell -1}{\ell -1}\left( \frac{\ell M}{N+\ell M}\right) ^{\ell }\left( \frac{N}{N+\ell M}\right) ^{n_{1}}, \qquad (25)$$

which is a negative binomial distribution. The link with the hard‐constraint solution (22) can be made by conditioning on the total number of individuals,

$$P^{( \mathrm{avg}\,,\,\mathrm{soft}\,) }_{M,\, N,\,\ell }( \mathbf{n}\,\vert \, N( \mathbf{n}) =K) =P^{( \mathrm{avg}\,,\,\mathrm{hard}\,) }_{M,\, K,\,\ell }( \mathbf{n}) . \qquad (26)$$

Hence, the SD model, given by equation (22), can be interpreted as a product of negative binomial distributions conditioned on the total number of individuals (Conlisk et al. 2007).

Link between Averaged Solutions with Hard and Soft Constraints

Using an argument analogous to that for unlabeled configurations, one can show that the averaged EM solutions with hard and soft constraints are practically equivalent. First, we note that distributions (22) and (24) have the same distribution, conditional on the number of individuals; see equation (26). Second, the soft‐constraint distribution for the number of individuals is sharply peaked at the constraint N. Indeed, the coefficient of variation,

$$\sqrt{\frac{1}{\ell M}+\frac{1}{N}},$$

is much less than 1 if ℓM and N are large, a condition that is satisfied in most cases of interest.

Link between Solution for Labeled Configurations and Averaged Solution for Unlabeled Configurations

The averaged EM solutions are constructed from the solution of the EM problems with unlabeled configurations. One can verify that EM distributions P(unl, hard) and P(unl, soft) are recovered from P(avg, hard) and P(avg, soft) by setting $\ell =1$ . Here we establish a link between the averaged EM solutions and the solution of the EM problems with labeled configurations.

To do so, we consider the limit $\ell \rightarrow \infty $ . It is shown in appendix C that (see eq. [C7])

$$\lim _{\ell \rightarrow \infty }P^{( \mathrm{avg}\,,\,\mathrm{hard}\,) }_{M,\, N,\,\ell }( \mathbf{n}) =P^{( \mathrm{lab}\,,\,\mathrm{hard}\,) }_{M,\, N}( \mathbf{n}) .$$

However, we also find that (see eq. [C8])

$$\lim _{\ell \rightarrow \infty }P^{( \mathrm{avg}\,,\,\mathrm{soft}\,) }_{M,\, N,\,\ell }( \mathbf{n}) \neq P^{( \mathrm{lab}\,,\,\mathrm{soft}\,) }_{M,\, N}( \mathbf{n}) .$$

The reason for this asymmetry is that the two distributions, conditional on the number of individuals, are the same (see eq. [C10]), but their distributions for the number of individuals differ. For the first, $\mathrm{\lim}_{\ell \rightarrow \infty }P^{( \mathrm{avg}\,,\,\mathrm{soft}\,) }_{M,\, N,\,\ell }$ , the number of individuals has a Poisson distribution (eq. [C9]); for the second, $P^{( \mathrm{lab}\,,\,\mathrm{soft}\,) }_{M,\, N}( \mathbf{n}) $ , the number of individuals has a geometric distribution (eq. [14]).

We conclude that the family of averaged EM distributions ($P^{( \mathrm{avg}\,,\,\mathrm{hard}\,) }_{M,\, N,\,\ell }( \mathbf{n}) $ and $P^{( \mathrm{avg}\,,\,\mathrm{soft}\,) }_{M,\, N,\,\ell }( \mathbf{n}) $ ), parameterized by the scale factor ℓ, comprises many of the other EM solutions. For $\ell =1$ , we recover the EM solutions for unlabeled configurations with both hard and soft constraints. For $\ell \rightarrow \infty $ , we recover the EM solution for labeled configurations with hard constraint but not that with soft constraint. For intermediate values of ℓ, we find interpolating spatial abundance distributions; see figure 2.
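The $\ell \rightarrow \infty $ limit can be checked numerically (our own sketch, with our own helper names): for large ℓ, the averaged hard-constraint distribution approaches the multinomial random-placement distribution.

```python
from math import comb, factorial

def p_avg_hard(n, ell):
    """Averaged hard-constraint solution on M cells, scale factor ell."""
    M, N = len(n), sum(n)
    num = 1
    for n_m in n:
        num *= comb(n_m + ell - 1, ell - 1)
    return num / comb(N + ell * M - 1, ell * M - 1)

def p_lab_hard(n):
    """Multinomial random-placement distribution (eq. [10])."""
    M, N = len(n), sum(n)
    num = factorial(N)
    for n_m in n:
        num //= factorial(n_m)
    return num / M ** N

n = (3, 2, 1)
# As ell grows, the averaged solution approaches the labeled solution:
assert abs(p_avg_hard(n, 10000) - p_lab_hard(n)) < 1e-3
```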

Figure 2: One‐cell abundance distributions for different entropy maximization solutions with hard constraint. We consider a species with $N=300$ individuals and a spatial domain with $M=10$ cells. The distribution P(avg, hard) is plotted for different scale factors: $\ell =1$ , 2, 4, 10, and 40 and $\ell \rightarrow \infty $ . Note that P(avg, hard) with $\ell =1$ corresponds to P(unl, hard) and that P(avg, hard) for $\ell \rightarrow \infty $ corresponds to P(lab, hard).


Discussion

 

Entropy maximization (EM) is a mathematical framework that can be used to infer a probability distribution on the set of system configurations, given partial information about the system configuration. Several EM applications in ecology can be imagined and have been studied recently. Here we studied the EM problem for spatial abundance distributions. More precisely, we considered a region divided into a number of cells and derived probability distributions for the arrangement of individuals over the cells without taking into account the spatial location of cells.

We showed that an EM problem formulation requires several assumptions or choices and that the outcome of the EM algorithm depends strongly on these choices. There is not a unique EM prediction for the spatial abundance distribution. On the contrary, we obtained a variety of EM predictions, depending on what might look like details in the EM problem: do we formulate the EM problem in terms of labeled or unlabeled configurations; what prior distribution do we assume; do we impose the number of individuals as a hard or a soft constraint; and on what scale is the EM problem formulated?

The fact that EM allows for a wide range of spatial abundance distributions should not come as a surprise. The EM procedure is an inference technique that depends crucially on the information used in the inference. This information is contained not only in the constraints but also in the way we define system configurations and in the prior distribution over the configurations. Our study indicates that these implicit assumptions should be made explicit in any application of the EM procedure, because they can radically change the predicted probability distributions.

An analogous situation exists in physics. Consider a system of N noninteracting particles, each occupying one of M energy levels. This physical system is comparable to our ecological example of distributing N individuals over M spatial cells. Labeled (distinguishable) particles give rise to the classical Maxwell‐Boltzmann (MB) distribution, whereas unlabeled (indistinguishable) particles give rise to the quantum mechanical Bose‐Einstein (BE) distribution. A third distribution exists, the Fermi‐Dirac (FD) distribution, which is also quantum mechanical but has the additional constraint that no more than one particle can be in any one state. In our ecological example, this would mean that one cell cannot contain more than one individual. It is well known that a coarse‐grained description of both BE and FD distributions (i.e., obtained by grouping together many quantum mechanical energy levels) tends toward the MB distribution. Similarly, our EM solution for unlabeled individuals at scale M1 averaged at scale M2 becomes the EM solution for labeled individuals when $M_{1}\rightarrow \infty $ (fig. 1).

The equivalence of the EM solutions with hard and soft constraints is a property that is generally satisfied in statistical mechanics (except in phase transitions). The lack of equivalence between P(lab, hard) and P(lab, soft) seems to be pathological and related to a similar problem in statistical mechanics for classical systems. To fix this problem, one must introduce an appropriate prior distribution, the so‐called Boltzmann counting. In appendix D, we present an alternative computation for the EM problem in terms of labeled configurations, using as a prior distribution the analog of Boltzmann counting. This yields distributions, P(lab, alt, hard) and P(lab, alt, soft), that are practically equivalent. If we accept the replacement of the pathological distribution P(lab, soft) with P(lab, alt, soft), then all EM distributions derived in this article are part of the family of averaged EM distributions P(avg, hard) and P(avg, soft).

Harte et al. (2008)’s EM application for spatial abundance distributions is different from ours. Their EM problem is written directly in terms of one‐cell abundance distributions: their system configuration is simply the abundance in a single cell. They implicitly assume a prior that assigns equal probability to each abundance. Their constraints are (1) that the number of individuals in a cell is smaller than the total number of individuals N in the entire region (a hard constraint because it rules out any configuration with abundance greater than N) and (2) that the mean number of individuals in a cell equals $N/M$ (a soft constraint). The solution for the one‐cell abundance distribution is different from but close to our solutions P(lab, soft), P(unl, hard), and P(unl, soft); see figure 1. One may wonder what multicell abundance distribution underlies the one‐cell abundance distribution of Harte et al. (2008). In appendix E, we solve the EM problem for the multicell abundance distribution under their constraints 1 and 2, and we show that the corresponding one‐cell abundance distribution is identical to that of Harte et al. (2008) but not scale consistent. However, this does not mean that Harte et al. (2008)’s one‐cell abundance cannot be embedded in a scale‐consistent multicell abundance distribution. Marginal probability distributions do not, in general, completely determine the joint probability distributions, so it is possible that a scale‐consistent multicell abundance distribution exists that yields the same one‐cell abundance distribution. The unlabeled configurations that we have studied are all invariant under permutations of the cells, thanks to the fact that spatial location is not taken into account. A multicell abundance distribution that is permutation invariant and scale consistent and has Harte et al. (2008)’s one‐cell abundance distribution does not seem to exist. One must incorporate space to find such a distribution. 
How such a scale‐consistent multicell abundance distribution should result from a properly formulated spatial EM problem remains an open problem.

Note that the discussion of multicell versus one‐cell abundance distribution has an analogy in neutral theory’s predictions for species abundance distributions (Chave et al. 2006), where sampling formulas have been derived that are multispecies abundance distributions (Etienne 2005, 2007), in contrast to one‐species abundance distributions (Volkov et al. 2003). Sampling formulas are required for a detailed comparison between theory and observation, because there may be several sampling formulas that are compatible with a single one‐species abundance distribution (Chave et al. 2006). Here we have also seen that the same one‐cell abundance distribution can correspond to several multicell abundance distributions: compare the joint distributions (18) and (12), which give rise to the identical marginals (19) and (13). Given the central role of data comparison in the EM modeling approach, multicell abundance distributions are therefore required for more powerful EM applications.

It is remarkable that the simple EM applications we have considered yield spatial abundance distributions that have been obtained previously by studying more detailed and often dynamical mechanistic models. We encountered the RP model as the solution of the EM problem with labeled configurations and hard constraint. The DBS model was found as the solution of the EM problem with unlabeled configurations and soft constraint, while scale transforming the latter distribution yields the SD model. The fact that these distributions can be obtained from simple EM applications might indicate a certain robustness. For example, one can expect that a (weakly) perturbed mechanistic model would lead to the same EM distribution.

In fact, even more previously studied spatial abundance distributions can be written as EM solutions. For example, the model based on the hypothesis of equal allocation probabilities (HEAP; Harte et al. 2005) is related to the unlabeled‐configurations solution P(unl, hard). We have shown that this distribution is not scale consistent, which naturally led us to the averaged distribution P(avg, hard). Similarly, the HEAP model can be interpreted as an iterative averaging approach, applying a scale factor $\ell =2$ in every iteration step. We remark that this iterative approach is not equivalent to the one‐step scale transformation on which our distribution P(avg, hard) is based.

This suggests that almost any reasonable spatial abundance distribution can be written as the solution of an EM problem. The choice of assumptions in the EM problem formulation is indeed wide. One could consider other ways to define system configurations (different from labeled and unlabeled configurations), one could work with informative prior distributions, or other consistency requirements could be imposed. We believe that the present understanding of the problem of allocating individuals to spatial cells does not allow us to decide which EM problem formulation is most appropriate. This should caution ecologists that applying the EM method does not automatically yield useful results; the accuracy of EM predictions can be determined only by comparison with empirical data.

This illustrates both a strength and a weakness of the EM procedure. Entropy maximization applications are based on a minimal number of assumptions (e.g., labeled or unlabeled individuals, scale consistency), yielding an efficient formalism to generate predictions that can be compared with empirical data. However, the EM problem formulation does not directly establish a link with an underlying mechanistic model. In fact, different process‐based models will typically yield similar ecological patterns. Although translating ecological processes into an EM problem formulation (i.e., a set of configurations, a prior distribution, and constraints) can be a nontrivial problem, the EM procedure might develop into a valuable tool to extract from a detailed mechanistic model a minimal set of assumptions that determine the model predictions.

We considered only a subproblem of the spatial distribution of a species in an ecological community. First, we considered only one species at a time, neglecting the effects of other species and their spatial distribution. Second, we considered only the abundance distribution over cells, without taking into account the spatial location of these cells. Stronger correlations can be expected with abundance distributions for nearby cells than with those for distant cells (see Maddux 2004 and Ostling et al. 2004 for a consistency problem related to the spatial structure). In turn, this might influence the predicted species abundance distribution. Whether the EM approach can be usefully applied for several species at once and/or for spatially structured communities is an interesting set of topics for future research.

Acknowledgments

 

We thank J. Harte, M. Loreau, A. Ostling, T. Zillio, and an anonymous reviewer for fruitful discussions and comments. Financial support for R.S.E. was provided by the Netherlands Organisation for Scientific Research (NWO).

Appendix A EM for Labeled Configurations

 

We apply the EM algorithm under the assumption that labeled configurations m are a priori equally probable. We maximize entropy subject to a constraint on the total number of individuals. There are two ways to impose this constraint.

Hard Constraint on N

The first possibility is to restrict the set of configurations to vectors m with the exact number of individuals N,

$N( \mathbf{m}) =N$. (A1)

There are no further constraints to impose, so the EM computation is trivial. Configurations m with $N( \mathbf{m}) \neq N$ have zero probability; configurations m with $N( \mathbf{m}) =N$ all have equal probability. As there are $M^{N}$ such configurations, the EM distribution is

$P^{( \mathrm{lab}\,,\,\mathrm{hard}\,) }_{M,\, N}( \mathbf{m}) =\frac{1}{M^{N}}$ if $N( \mathbf{m}) =N$, (A2)

where P(lab, hard) denotes the probability distribution that results from applying the EM procedure for labeled configurations with “hard” constraint (A1). The probability distribution for labeled configurations m can be transformed into a probability distribution for unlabeled configurations n by using the multiplicity factor (eq. [3]):

$P^{( \mathrm{lab}\,,\,\mathrm{hard}\,) }_{M,\, N}( \mathbf{n}) =\frac{N!}{n_{1}!\, n_{2}!\cdots n_{M}!}\,\frac{1}{M^{N}}$ if $N( \mathbf{n}) =N$, (A3)

which is a multinomial distribution.
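This multinomial solution is exactly the random‐placement model: N labeled individuals dropped independently and uniformly into M cells, so any single cell's abundance is binomial with parameters N and $1/M$ . A minimal numerical sketch (the function names and the example values $M=10$ , $N=300$ are ours, chosen to match figure 2) compares a simulated one‐cell abundance against this exact marginal:

```python
import math
import random

def one_cell_marginal(M, N, n1):
    """Exact one-cell marginal of the multinomial P(lab, hard):
    the abundance of any single cell is binomial(N, 1/M)."""
    return math.comb(N, n1) * (1 / M) ** n1 * (1 - 1 / M) ** (N - n1)

def random_placement(M, N, rng):
    """Drop N labeled individuals independently and uniformly into M cells."""
    n = [0] * M
    for _ in range(N):
        n[rng.randrange(M)] += 1
    return n

M, N = 10, 300
rng = random.Random(0)
trials = 20000
# empirical frequency of observing n1 = N/M = 30 individuals in cell 1
hits = sum(random_placement(M, N, rng)[0] == 30 for _ in range(trials))
print(hits / trials, one_cell_marginal(M, N, 30))  # both near 0.077
```

The simulated frequency agrees with the binomial marginal up to Monte Carlo noise, which is simply the random‐placement model restated.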

Soft Constraint on N

The second possibility is to take all configurations m into account, thus including vectors m for which $N( \mathbf{m}) \neq N$ . We require that the mean number of individuals equals N,

$\sum_{\mathbf{m}}N( \mathbf{m})\, P( \mathbf{m}) =N$. (A4)

We use the technique of Lagrange multipliers to solve the EM problem. We denote the Lagrange multiplier for constraint (A4) α. The EM solution reads

$P^{( \mathrm{lab}\,,\,\mathrm{soft}\,) }_{M,\, N}( \mathbf{m}) =\frac{1}{Z}e^{-\alpha N( \mathbf{m}) }$,

where P(lab, soft) denotes the probability distribution that results from applying the EM procedure for labeled configurations with “soft” constraint (A4). The normalization constant Z can be calculated as follows:

$Z=\sum_{\mathbf{m}}e^{-\alpha N( \mathbf{m}) }=\sum_{K=0}^{\infty }M^{K}e^{-\alpha K}=\frac{1}{1-Me^{-\alpha }}$.

Imposing constraint (A4) yields

$\frac{Me^{-\alpha }}{1-Me^{-\alpha }}=N$,

and we can solve for the Lagrange multiplier α,

$e^{-\alpha }=\frac{N}{M( N+1) }$.

As a result,

$P^{( \mathrm{lab}\,,\,\mathrm{soft}\,) }_{M,\, N}( \mathbf{m}) =\frac{1}{N+1}\left[ \frac{N}{M( N+1) }\right] ^{N( \mathbf{m}) }$,

which gives the distribution for unlabeled configurations,

$P^{( \mathrm{lab}\,,\,\mathrm{soft}\,) }_{M,\, N}( \mathbf{n}) =\frac{N( \mathbf{n}) !}{n_{1}!\cdots n_{M}!}\,\frac{1}{N+1}\left[ \frac{N}{M( N+1) }\right] ^{N( \mathbf{n}) }$.
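A quick numerical sanity check on this derivation (a sketch under the standard Lagrange‐multiplier solution, in which the total abundance K is geometrically distributed with parameter $Me^{-\alpha }$ because there are $M^{K}$ labeled configurations with K individuals; the variable names and example values are ours):

```python
# Soft constraint for labeled configurations: the total abundance K has
# P(K) proportional to (M e^{-alpha})^K.  The constraint <K> = N then
# fixes M e^{-alpha} = N / (N + 1); the mean of the resulting geometric
# distribution should come out at exactly N.
N, M = 300, 10
x = N / (N + 1)                               # x = M * exp(-alpha)
P = [(1 - x) * x**K for K in range(20000)]    # normalized geometric in K
mean_K = sum(K * pK for K, pK in enumerate(P))
print(mean_K)  # ≈ 300
```

The truncation at $K=20000$ is numerically irrelevant here, since the geometric tail beyond that point is vanishingly small.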

Appendix B EM for Unlabeled Configurations

 

We apply the EM algorithm under the assumption that unlabeled configurations n are a priori equally probable. We maximize entropy subject to the constraint on the total number of individuals. There are two ways to deal with this constraint.

Hard Constraint on N

The hard constraint restricts the set of configurations to vectors n with the exact number of individuals N,

$N( \mathbf{n}) =N$. (B1)

There are no further constraints to impose, so the EM computation is trivial. Configurations n with $N( \mathbf{n}) \neq N$ have zero probability; configurations n with $N( \mathbf{n}) =N$ all have equal probability. It is a standard result in combinatorics that there are

$\binom{N+M-1}{M-1}$

configurations of the latter type. Hence,

$P^{( \mathrm{unl}\,,\,\mathrm{hard}\,) }_{M,\, N}( \mathbf{n}) =\binom{N+M-1}{M-1}^{-1}$ if $N( \mathbf{n}) =N$,

where P(unl, hard) denotes the probability distribution that results from applying the EM procedure for unlabeled configurations with hard constraint (B1). From equation (16) we compute the one‐cell abundance distribution,

$P^{( \mathrm{unl}\,,\,\mathrm{hard}\,) }_{M,\, N}( n_{1}) =\binom{N-n_{1}+M-2}{M-2}\binom{N+M-1}{M-1}^{-1}$ for $n_{1}\leq N$,

because there are $\binom{N-n_{1}+M-2}{M-2}$ configurations that have n1 individuals in one particular cell and $N-n_{1}$ individuals in the remaining $M-1$ cells.
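These stars‐and‐bars counts are easy to verify by brute force for a small example (a sketch; the helper names and the values $M=4$ , $N=6$ are ours):

```python
import math
from itertools import product

def n_unlabeled(M, N):
    """Stars-and-bars count of vectors (n_1, ..., n_M) with sum N."""
    return math.comb(N + M - 1, M - 1)

def one_cell_unl_hard(M, N, n1):
    """One-cell marginal of the uniform distribution over unlabeled
    configurations: fraction of configurations with n1 in a fixed cell."""
    return math.comb(N - n1 + M - 2, M - 2) / math.comb(N + M - 1, M - 1)

# brute-force check for a small example
M, N = 4, 6
configs = [c for c in product(range(N + 1), repeat=M) if sum(c) == N]
assert len(configs) == n_unlabeled(M, N)
for n1 in range(N + 1):
    brute = sum(1 for c in configs if c[0] == n1) / len(configs)
    assert abs(brute - one_cell_unl_hard(M, N, n1)) < 1e-12
print("stars-and-bars counts and one-cell marginal verified")
```
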

Soft Constraint on N

The soft constraint takes all configurations n into account and requires that the mean number of individuals equals N,

$\sum_{\mathbf{n}}N( \mathbf{n})\, P( \mathbf{n}) =N$. (B4)

We use the technique of Lagrange multipliers to solve the EM problem. We denote the Lagrange multiplier for constraint (B4) α. The EM solution reads

$P^{( \mathrm{unl}\,,\,\mathrm{soft}\,) }_{M,\, N}( \mathbf{n}) =\frac{1}{Z}e^{-\alpha N( \mathbf{n}) }$,

where P(unl, soft) denotes the probability distribution that results from applying the EM procedure for unlabeled configurations with soft constraint (B4). The normalization constant Z can be calculated as follows:

$Z=\sum_{\mathbf{n}}e^{-\alpha N( \mathbf{n}) }=\left( \sum_{k=0}^{\infty }e^{-\alpha k}\right) ^{M}=\frac{1}{( 1-e^{-\alpha }) ^{M}}$.

Imposing constraint (B4) yields

$\frac{Me^{-\alpha }}{1-e^{-\alpha }}=N$,

and we can solve for the Lagrange multiplier α,

$e^{-\alpha }=\frac{N}{N+M}$.

As a result,

$P^{( \mathrm{unl}\,,\,\mathrm{soft}\,) }_{M,\, N}( \mathbf{n}) =\left( \frac{M}{N+M}\right) ^{M}\left( \frac{N}{N+M}\right) ^{N( \mathbf{n}) }$.
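Numerically, this solution factorizes over cells into geometric one‐cell distributions with parameter $e^{-\alpha }=N/( N+M) $ , each with mean $N/M$ , so the M cells together satisfy the soft constraint (a sketch; variable names and example values are ours):

```python
# One-cell abundance under the unlabeled soft-constraint solution:
# P(n1) = (1 - x) * x**n1 with x = exp(-alpha) = N / (N + M).
# Its mean should equal N / M, so the M cells together average N.
N, M = 300, 10
x = N / (N + M)
p = [(1 - x) * x**k for k in range(10000)]
mean_cell = sum(k * pk for k, pk in enumerate(p))
print(M * mean_cell)  # ≈ N = 300
```
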

Appendix C Scale‐Transformed EM Distributions

 

In this appendix, we investigate how the EM solutions for labeled and unlabeled configurations change under scale transformation. This transformation maps an abundance distribution on M1 cells to a distribution on a coarser scale with M2 cells, $M_{2}< M_{1}$ . The scales M1 and M2 are related by an integer scale factor $\ell =M_{1}/M_{2}$ .

EM for Labeled Configurations

Consider the distribution $P^{( \mathrm{lab}\,,\,\mathrm{hard}\,) }_{M_{1},\, N}$ on the fine scale M1. It corresponds to randomly allocating N individuals to M1 cells, each cell having probability $1/M_{1}$ of receiving an individual. To scale transform this distribution, we have to take ℓ cells together, so that individuals are now randomly allocated to $M_{1}/\ell =M_{2}$ regrouped cells, each regrouped cell having probability $\ell /M_{1}=1/M_{2}$ . This is the distribution $P^{( \mathrm{lab}\,,\,\mathrm{hard}\,) }_{M_{2},\, N}$ on the coarse scale M2.

This result can be used to scale transform the distribution $P^{( \mathrm{lab}\,,\,\mathrm{soft}\,) }_{M_{1},\, N}$ . The latter distribution can be written as a combination of $P^{( \mathrm{lab}\,,\,\mathrm{hard}\,) }_{M_{1},\, K}$ for different K. As we have shown above, each of these components is scale transformed to $P^{( \mathrm{lab}\,,\,\mathrm{hard}\,) }_{M_{2},\, K}$ . Moreover, the coefficients of this combination, given by equation (14), do not depend on the scale M1 or M2. Hence, the scale transformation of $P^{( \mathrm{lab}\,,\,\mathrm{soft}\,) }_{M_{1},\, N}$ is a combination of $P^{( \mathrm{lab}\,,\,\mathrm{hard}\,) }_{M_{2},\, K}$ for different K, with coefficients also given by equation (14). This leads to the distribution $P^{( \mathrm{lab}\,,\,\mathrm{soft}\,) }_{M_{2},\, N}$ on scale M2.

EM for Unlabeled Configurations

Consider first the distribution $P^{( \mathrm{unl}\,,\,\mathrm{hard}\,) }_{M_{1},\, N}$ on the fine scale M1. It attributes the same probability to all unlabeled configurations. Hence, the scale‐transformed (on the coarse scale M2) probability for a configuration n is proportional to the number of configurations on scale M1 that are compatible with n. For a single regrouped cell containing n individuals, this number is given by

$\binom{n+\ell -1}{\ell -1}$, (C1)

because there are $\binom{n+\ell -1}{\ell -1}$ configurations $( n_{1},\,\ldots ,\, n_{\ell }) $ that satisfy the condition $\sum_{i=1}^{\ell }n_{i}=n$. The probability distribution on scale M2 follows directly from the multiplicity factor (C1),

$P^{( \mathrm{avg}\,,\,\mathrm{hard}\,) }_{M_{2},\, N,\,\ell }( \mathbf{n}) =\binom{N+M_{1}-1}{M_{1}-1}^{-1}\prod_{i=1}^{M_{2}}\binom{n_{i}+\ell -1}{\ell -1}$ if $N( \mathbf{n}) =N$, (C2)

where P(avg, hard) denotes the probability distribution that results from (1) applying the EM procedure for unlabeled configurations with hard constraint (B1) on a fine scale and (2) scale transforming the EM solution to a coarser scale.
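The multiplicity argument can be checked by brute force: group $M_{1}=4$ fine cells pairwise ($\ell =2$ ) into $M_{2}=2$ coarse cells and count how many equiprobable fine‐scale configurations map to each coarse configuration (a sketch; the small example values are ours):

```python
import math
from itertools import product

# Brute-force check of the coarse-graining multiplicity: with ell = 2,
# a coarse cell holding n individuals should be compatible with
# binom(n + ell - 1, ell - 1) = n + 1 fine-scale splittings.
M1, M2, ell, N = 4, 2, 2, 6
fine = [c for c in product(range(N + 1), repeat=M1) if sum(c) == N]
coarse_counts = {}
for c in fine:
    key = (c[0] + c[1], c[2] + c[3])   # regroup the 4 cells pairwise
    coarse_counts[key] = coarse_counts.get(key, 0) + 1

for (a, b), cnt in coarse_counts.items():
    pred = math.comb(a + ell - 1, ell - 1) * math.comb(b + ell - 1, ell - 1)
    assert cnt == pred
print("coarse-graining multiplicities verified")
```
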

Next, we compute the scale transformation of the distribution $P^{( \mathrm{unl}\,,\,\mathrm{soft}\,) }_{M_{1},\, N}$ . This multicell abundance distribution equals the product of M1 independent one‐cell (on scale M1) abundance distributions (19). The scale transformation consists of regrouping ℓ cells on scale M1 into one cell on scale M2. Hence, the scale‐transformed multicell abundance distribution equals the product of M2 independent one‐cell (on scale M2) abundance distributions. Each factor in this product is given by the ℓ‐fold convolution of the abundance distribution (eq. [19]), leading to a negative binomial distribution,

$P^{( \mathrm{avg}\,,\,\mathrm{soft}\,) }_{M_{2},\, N,\,\ell }( n_{1}) =\binom{n_{1}+\ell -1}{\ell -1}\left( \frac{M_{1}}{N+M_{1}}\right) ^{\ell }\left( \frac{N}{N+M_{1}}\right) ^{n_{1}}$.

For the multicell abundance distribution, we obtain

$P^{( \mathrm{avg}\,,\,\mathrm{soft}\,) }_{M_{2},\, N,\,\ell }( \mathbf{n}) =\left( \frac{M_{1}}{N+M_{1}}\right) ^{M_{1}}\left( \frac{N}{N+M_{1}}\right) ^{N( \mathbf{n}) }\prod_{i=1}^{M_{2}}\binom{n_{i}+\ell -1}{\ell -1}$. (C4)
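The negative binomial arises here as an ℓ‐fold convolution of geometric one‐cell distributions; this identity is easy to confirm numerically (a sketch, assuming the geometric form $P( n) =( 1-x) x^{n}$ with $x=N/( N+M_{1}) $ for eq. [19]; names and values are ours):

```python
import math

def geometric(x, kmax):
    """Geometric one-cell distribution P(n) = (1 - x) x**n, truncated."""
    return [(1 - x) * x**k for k in range(kmax + 1)]

def convolve(p, q):
    r = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            r[i + j] += pi * qj
    return r

N, M1, ell = 300, 10, 4
x = N / (N + M1)               # exp(-alpha) on the fine scale
g = geometric(x, 200)
conv = g
for _ in range(ell - 1):       # ell-fold convolution of the geometric
    conv = convolve(conv, g)

def negbin(n):
    """Negative binomial with parameters ell and x."""
    return math.comb(n + ell - 1, ell - 1) * (1 - x) ** ell * x**n

assert all(abs(conv[n] - negbin(n)) < 1e-12 for n in range(50))
print("ell-fold convolution of geometrics = negative binomial")
```
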

Our analysis here is concerned with scaling up from a fine scale M1 to a coarse scale M2, corresponding to integer scale factors $\ell =2,\: 3,\:\ldots $ . The opposite is equally possible: scaling down from a coarse scale M1 to a fine scale M2, corresponding to scale factors $\ell =1/2,\: 1/3,\:\ldots $ . This case requires a generalization of the previous formulas for noninteger ℓ. Equation (C2) for the hard constraint becomes

$P^{( \mathrm{avg}\,,\,\mathrm{hard}\,) }_{M_{2},\, N,\,\ell }( \mathbf{n}) =\frac{\Gamma ( \ell M_{2})\, N!}{\Gamma ( N+\ell M_{2}) }\prod_{i=1}^{M_{2}}\frac{\Gamma ( n_{i}+\ell ) }{\Gamma ( \ell )\, n_{i}!}$ if $N( \mathbf{n}) =N$.

Equation (C4) for the soft constraint becomes

$P^{( \mathrm{avg}\,,\,\mathrm{soft}\,) }_{M_{2},\, N,\,\ell }( \mathbf{n}) =\left( \frac{\ell M_{2}}{N+\ell M_{2}}\right) ^{\ell M_{2}}\left( \frac{N}{N+\ell M_{2}}\right) ^{N( \mathbf{n}) }\prod_{i=1}^{M_{2}}\frac{\Gamma ( n_{i}+\ell ) }{\Gamma ( \ell )\, n_{i}!}$.

Link between Averaged and Labeled Configurations Solution

We compute the limit $\ell \rightarrow \infty $ for the averaged distributions $P^{( \mathrm{avg}\,,\,\mathrm{hard}\,) }_{M,\, N,\,\ell }$ and $P^{( \mathrm{avg}\,,\,\mathrm{soft}\,) }_{M,\, N,\,\ell }$ . With the hard constraint,

$\lim_{\ell \rightarrow \infty }P^{( \mathrm{avg}\,,\,\mathrm{hard}\,) }_{M,\, N,\,\ell }( \mathbf{n}) =\frac{N!}{n_{1}!\cdots n_{M}!}\,\frac{1}{M^{N}}=P^{( \mathrm{lab}\,,\,\mathrm{hard}\,) }_{M,\, N}( \mathbf{n}) $ if $N( \mathbf{n}) =N$.

With the soft constraint,

$\lim_{\ell \rightarrow \infty }P^{( \mathrm{avg}\,,\,\mathrm{soft}\,) }_{M,\, N,\,\ell }( \mathbf{n}) =\prod_{i=1}^{M}e^{-N/M}\frac{( N/M) ^{n_{i}}}{n_{i}!}$. (C8)

This can be interpreted as the combination of a Poisson distribution for the number of individuals,

$P( K) =e^{-N}\frac{N^{K}}{K!}$,

and the distribution for the vector n conditional on the number of individuals,

$P( \mathbf{n}\,\vert \, K) =\frac{K!}{n_{1}!\cdots n_{M}!}\,\frac{1}{M^{K}}$ if $N( \mathbf{n}) =K$.
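The soft‐constraint limit can be illustrated numerically: assuming the one‐cell factor of $P^{( \mathrm{avg}\,,\,\mathrm{soft}\,) }$ is the negative binomial described above, for large ℓ it approaches the Poisson factor with mean $N/M$ (a sketch; names and values are ours):

```python
import math

N, M = 300, 10

def negbin_cell(n, ell):
    """One-cell factor of P(avg, soft): negative binomial with
    parameters ell and x = N / (N + ell * M), where M1 = ell * M."""
    x = N / (N + ell * M)
    return math.exp(math.lgamma(n + ell) - math.lgamma(ell) - math.lgamma(n + 1)
                    + ell * math.log(1 - x) + n * math.log(x))

def poisson_cell(n):
    """One-cell factor of the limit distribution (C8): Poisson(N / M)."""
    lam = N / M
    return math.exp(-lam + n * math.log(lam) - math.lgamma(n + 1))

# worst-case pointwise difference for a large scale factor ell
err = max(abs(negbin_cell(n, 10000) - poisson_cell(n)) for n in range(100))
print(err)  # small: the negative binomial approaches the Poisson
```
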

Appendix D Alternative EM for Labeled Configurations

 

We reconsider the EM problem for labeled configurations. Instead of assuming that all vectors m are a priori equally probable, we assume that the prior probability of a vector m is proportional to $1/N( \mathbf{m}) !$ . This implies that vectors with the same number of individuals are a priori equally probable but that vectors m with a large number of individuals are a priori less probable than vectors m with a smaller number of individuals.

The EM procedure with the latter prior distribution and hard constraint (A1) is identical to the computation leading to $P^{( \mathrm{lab}\,,\,\mathrm{hard}\,) }_{M,\, N}$ (eqq. [A2], [A3]). We consider here the EM problem with soft constraint (A4). Using the Lagrange multiplier α,

$P^{( \mathrm{lab}\,,\,\mathrm{alt}\,,\,\mathrm{soft}\,) }_{M,\, N}( \mathbf{m}) =\frac{1}{Z}\frac{e^{-\alpha N( \mathbf{m}) }}{N( \mathbf{m}) !}$,

where P(lab, alt, soft) denotes the probability distribution that results from applying the EM procedure for labeled configurations with alternative prior distribution $1/N( \mathbf{m}) !$ and soft constraint (A4). The normalization constant Z can be calculated as follows:

$Z=\sum_{\mathbf{m}}\frac{e^{-\alpha N( \mathbf{m}) }}{N( \mathbf{m}) !}=\sum_{K=0}^{\infty }\frac{M^{K}e^{-\alpha K}}{K!}=e^{Me^{-\alpha }}$.

Imposing constraint (A4) yields

$Me^{-\alpha }=N$,

and we can solve for the Lagrange multiplier α,

$e^{-\alpha }=\frac{N}{M}$.

As a result,

$P^{( \mathrm{lab}\,,\,\mathrm{alt}\,,\,\mathrm{soft}\,) }_{M,\, N}( \mathbf{m}) =e^{-N}\frac{( N/M) ^{N( \mathbf{m}) }}{N( \mathbf{m}) !}$,

which gives the multicell abundance distribution

$P^{( \mathrm{lab}\,,\,\mathrm{alt}\,,\,\mathrm{soft}\,) }_{M,\, N}( \mathbf{n}) =\prod_{i=1}^{M}e^{-N/M}\frac{( N/M) ^{n_{i}}}{n_{i}!}$.

This is exactly distribution (C8): a Poisson distribution for the number of individuals and distribution $P^{( \mathrm{lab}\,,\,\mathrm{hard}\,) }_{M,\, N}$ conditional on the number of individuals. As the Poisson distribution has coefficient of variation $\mathrm{CV}\,=1/N^{1/2}$ , hard and soft constraints are equivalent if N is large.
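The decomposition into a Poisson total and a conditional multinomial can be confirmed numerically: both factorizations assign the same probability to any configuration (a sketch using log probabilities to avoid overflow; the function names and the test configuration are ours):

```python
import math

N, M = 300, 10

def log_product_poisson(n):
    """Product over cells of independent Poisson(N/M) probabilities."""
    lam = N / M
    return sum(-lam + ni * math.log(lam) - math.lgamma(ni + 1) for ni in n)

def log_poisson_times_multinomial(n):
    """Poisson(N) for the total K, times the multinomial for n given K."""
    K = sum(n)
    log_poisson = -N + K * math.log(N) - math.lgamma(K + 1)
    log_multinom = (math.lgamma(K + 1)
                    - sum(math.lgamma(ni + 1) for ni in n)
                    - K * math.log(M))
    return log_poisson + log_multinom

# an arbitrary configuration with N(n) = 300 individuals in M = 10 cells
n = (31, 28, 30, 33, 27, 30, 29, 32, 30, 30)
a = log_product_poisson(n)
b = log_poisson_times_multinomial(n)
print(a, b)  # identical up to floating-point rounding
```
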

Appendix E Harte et al. (2008)’s EM Problem

 

We construct an EM problem for the multicell abundance distribution that generalizes Harte et al. (2008)’s EM problem for the one‐cell abundance distribution. The EM problem is formulated in terms of unlabeled configurations n and imposes both a hard and a soft constraint on the number of individuals N. The hard constraint restricts the set of configurations: a configuration n with one or more of its components $n_{m}> N$ has zero probability. The soft constraint states that averaged over the remaining configurations, the mean number of individuals equals N; see equation (B4).

This EM problem can be solved with the technique of Lagrange multipliers. With the Lagrange multiplier for the soft constraint (B4) denoted α, the EM solution reads

$P^{( \mathrm{unl}\,,\,\mathrm{hard}/\mathrm{soft}\,) }_{M,\, N}( \mathbf{n}) =\frac{1}{Z}e^{-\alpha N( \mathbf{n}) }$ if all $n_{m}\leq N$, (E1)

where P(unl, hard/soft) denotes the probability distribution that results from applying the EM procedure for unlabeled configurations with both hard and soft constraints. The normalization constant Z can be calculated as follows:

$Z=\left( \sum_{k=0}^{N}e^{-\alpha k}\right) ^{M}$.

Imposing constraint (B4) yields

$M\,\frac{\sum_{k=0}^{N}k\, e^{-\alpha k}}{\sum_{k=0}^{N}e^{-\alpha k}}=N$.

This equation can be solved numerically for the Lagrange multiplier. It is the same equation that Harte et al. (2008) solve to obtain their Lagrange multiplier (see their eq. [B‐5]). As a result,

$P^{( \mathrm{unl}\,,\,\mathrm{hard}/\mathrm{soft}\,) }_{M,\, N}( \mathbf{n}) =\prod_{m=1}^{M}\frac{e^{-\alpha n_{m}}}{\sum_{k=0}^{N}e^{-\alpha k}}$ if all $n_{m}\leq N$.

This is a product of one‐cell abundance distributions, each of which is given by

$P^{( \mathrm{unl}\,,\,\mathrm{hard}/\mathrm{soft}\,) }_{M,\, N}( n_{1}) =\frac{e^{-\alpha n_{1}}}{\sum_{k=0}^{N}e^{-\alpha k}}$ for $n_{1}\leq N$.
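The numerical solution for the Lagrange multiplier is a one‐dimensional root‐finding problem: the mean of the truncated distribution $P( n_{1}) \propto e^{-\alpha n_{1}}$ , $n_{1}\leq N$ , increases monotonically in $x=e^{-\alpha }$ , so bisection suffices (a sketch; the function names and the figure values $N=300$ , $M=10$ are ours):

```python
import math

def cell_mean(x, N):
    """Mean of the truncated distribution P(n1) proportional to x**n1,
    with n1 = 0, ..., N and x = exp(-alpha)."""
    Z = sum(x**k for k in range(N + 1))
    return sum(k * x**k for k in range(N + 1)) / Z

def solve_multiplier(N, M, tol=1e-12):
    """Bisection on x: cell_mean increases from 0 (x=0) to N/2 (x=1),
    so for M > 2 the constraint cell_mean = N/M has a root in (0, 1)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cell_mean(mid, N) < N / M:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

N, M = 300, 10
x = solve_multiplier(N, M)
alpha = -math.log(x)
print(alpha, cell_mean(x, N))  # cell mean ≈ N/M = 30
```

Because the truncation at N cuts off only a tiny tail for these parameter values, the solution lies close to the untruncated geometric value $x=N/( N+M) $ .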

This is exactly the one‐cell abundance distribution obtained by Harte et al. (2008; see their eq. [9]). Therefore, our EM problem embeds the one‐cell abundance distribution of Harte et al. (2008) in a multicell abundance distribution.

The EM distribution (E1) is not scale consistent. To see this, consider first the distribution (E1) on the fine scale M1. Configurations n with all components $n_{m}\leq N$ have a nonzero probability. We scale transform this distribution to the coarse scale M2 (scale factor ℓ). The resulting distribution assigns a nonzero probability to configurations n with all components $n_{m}\leq \ell N$ . Next, consider the distribution (E1) obtained by applying EM directly on the coarse scale M2. This distribution assigns nonzero probability only to configurations n with all components $n_{m}\leq N$ . Hence, the scale‐transformed EM distribution and the direct EM distribution are different, and so scale consistency is not satisfied.

Literature Cited

 
Associate Editor: Axel G. Rossberg
Editor: Mark A. McPeek

Notes