## Abstract

*Q*_{ST} is a standardized measure of the genetic differentiation of a quantitative trait among populations. The distribution of *Q*_{ST}'s for neutral traits can be predicted from the *F*_{ST} for neutral marker loci. To test for the neutral differentiation of a quantitative trait among populations, it is necessary to ask whether the *Q*_{ST} of that trait is in the tail of the probability distribution of neutral traits. This neutral distribution can be estimated using the Lewontin–Krakauer distribution and the *F*_{ST} from a relatively small number of marker loci. We develop a simulation method to test whether the *Q*_{ST} of a given trait is consistent with the null hypothesis of selective neutrality over space. The method is most powerful with small mean *F*_{ST}, strong selection, and a large number (>10) of measured populations. The power and type I error rate of the new method are far superior to the traditional method of comparing *Q*_{ST} and *F*_{ST}.

In 1993, Spitze (1993) and Prout and Barker (1993) introduced *Q*_{ST}, a quantitative genetic analog of Wright's *F*_{ST}. Just as *F*_{ST} gives a standardized measure of the genetic differentiation among populations for a genetic locus, *Q*_{ST} measures the amount of genetic variance among populations relative to the total genetic variance. In the years since, *Q*_{ST} has been frequently used to test for the effects of spatially divergent (or less commonly, spatially uniform) selection (see reviews in Lynch *et al.* 1999; Merilä and Crnokrak 2001; McKay and Latta 2002; Howe *et al.* 2003; Leinonen *et al.* 2008; Whitlock 2008). In principle, the average *Q*_{ST} of a neutral additive quantitative trait is expected to be equal to the mean value of *F*_{ST} for neutral genetic loci. *F*_{ST} can be readily measured on commonly available genetic markers, and *Q*_{ST} can be measured as well with an appropriate breeding design in a common-garden setting. As a result, *Q*_{ST} promises to be an index of the effect of selection on the quantitative trait. If *Q*_{ST} is higher than *F*_{ST}, then this is taken as evidence of spatially divergent selection on the trait. If *Q*_{ST} is much smaller than *F*_{ST}, then this has been taken as evidence of spatially uniform stabilizing selection, which makes the trait diverge less than expected by chance.

The comparison with *F*_{ST} is essential to rule out genetic drift as an alternative mechanism for phenotypic divergence among populations. Because finite populations may diverge genetically in the absence of selection, divergence must be greater than expected by drift alone if we are to conclusively demonstrate that divergent selection has played a role in genetic differentiation among populations. Therefore it has become common practice to use *F*_{ST} of putatively neutral markers as a control for the effects of genetic drift and to compare observed *Q*_{ST} values for traits to these neutral *F*_{ST} values.

These comparisons follow two separate methods, to address related but distinct questions. First, many studies of quantitative genetic differentiation measure the *Q*_{ST} of many traits and the *F*_{ST} of many loci, followed by a comparison of the mean *Q*_{ST} to the mean *F*_{ST}. Such a comparison may judge whether the conditions are suitable in that species for local adaptation, that is, whether selective differences between populations are large enough relative to gene flow to allow adaptive differentiation (Whitlock 2008). We do not consider this sort of comparison in this article.

The other type of comparison asks whether the *Q*_{ST} of a single trait is greater than expected by drift, as measured by *F*_{ST}. This type of comparison is most common, but it is statistically difficult. Unfortunately, as emphasized in a recent review by Whitlock (2008), there is great variation in the expected *F*_{ST} among neutral loci and among the *Q*_{ST} of different neutral traits (see Figure 1). The majority of this variation results from evolutionary differences between loci and not sampling error in the observations. Rogers and Harpending (1983) imply that the distribution of *Q*_{ST} of a single neutral trait should be approximately equivalent to that for *F*_{ST} of a single neutral locus, and this has been confirmed by simulation for traits determined by additive loci compared to biallelic marker loci (Whitlock 2008). The two distributions are similar, but there is great heterogeneity among traits or loci. As a result, to show that selection is acting on a trait, it is necessary to show that the value of *Q*_{ST} has a low probability of being observed given the distribution of neutral *Q*_{ST}.

Comparing *Q*_{ST} to the distribution inferred from *F*_{ST} is difficult for two reasons. First, typical data sets rarely include enough loci to directly infer the distribution of *F*_{ST} without extra inferential steps. In our approach, we use the distribution of *Q*_{ST} predicted from the mean *F*_{ST} and the χ^{2} distribution by Lewontin and Krakauer (1973) to bridge this gap. Whitlock (2008) has shown that this distribution is appropriate for nearly all realistic situations for traits determined by additive genetic effects. Second, *Q*_{ST} for a trait is rarely measured with high precision, so the position of a given estimated *Q*_{ST} value in the distribution cannot be known without error.

To test the null hypothesis that the spatial distribution of a particular trait is not affected by selection, we wish to compare the observed *Q̂*_{ST} of that trait (marked with a hat to indicate it is an estimate) to the distribution of *Q*_{ST} expected for neutral traits. Unfortunately, calculating the distribution of *Q*_{ST} for neutral traits is not straightforward, because the estimate of *Q*_{ST} for a particular trait is variable for several reasons. The estimate of *Q*_{ST} is subject to measurement error, caused by the finite samples of families and individuals in the quantitative genetic experiment. These cause error in the estimate of the additive genetic variance within populations (*V*_{A,within}) and the genetic variance among populations (*V*_{G,among}), which translate into error of the estimate of *Q*_{ST}. In addition, there is another source of variation in *Q*_{ST} among neutral traits, caused by the idiosyncrasies of the evolutionary process in each local population in the study. The true value of *Q*_{ST} for the set of populations being studied can vary tremendously around its expectation, even for neutral traits, because by chance a finite set of populations may drift in a similar direction (Whitlock 2008). As a result, measurements of *Q*_{ST} can vary because of both statistical and evolutionary variation.

Fortunately, these two sources of variation are fairly well understood individually. The sampling error for the estimates of the variance components can be estimated from standard approaches, and this variation can be well approximated using information from the mean squares of the analysis of the breeding experiment (O'Hara and Merilä 2005). The variation in neutral *Q*_{ST} that results from heterogeneity of evolutionary history can be approximated by the Lewontin–Krakauer distribution (Lewontin and Krakauer 1973), if information is available on the mean *Q*_{ST} of neutral traits (Whitlock 2008). This approximation does not depend on the demographic details of the populations in question (Whitlock 2008), but the effects of deviations from assumptions of additive gene effect have not yet been tested. The mean of the distribution of values of *Q*_{ST} for neutral traits is usually not known, but fortunately the mean of the distribution of *F*_{ST} of neutral loci is expected to be approximately equal to the mean *Q*_{ST} of neutral traits (Spitze 1993), and this does not depend on demographic details (Whitlock 1999). Therefore the mean *F*_{ST} measured from a series of genetic markers thought to be selectively neutral can be combined with the Lewontin–Krakauer distribution to predict the distribution of true neutral *Q*_{ST} across the range of possible evolutionary trajectories.

Given that the mean value of *Q*_{ST} of neutral traits is expected to equal the mean *F*_{ST} of neutral markers under certain assumptions (discussed later), we will use *Q̂*_{ST} − *F̂*_{ST} as a test statistic and compare the observed quantity to the zero value proposed by the null hypothesis. We will use a traditional hypothesis testing approach, which means that we need to specify the sampling distribution of *Q̂*_{ST} − *F̂*_{ST} under the assumption of neutrality. Traditionally, the sampling distribution of *Q̂*_{ST} is inferred from the data on the trait itself, for example, using bootstrapping to infer the sampling distribution. This is appropriate when calculating a confidence interval for *Q*_{ST} but is a biased measure of the sampling variance of neutral *Q*_{ST}. The variance of the sampling distribution of *Q̂*_{ST} varies with its expected value; traits with larger true *Q*_{ST} have more variable sampling distributions than traits with smaller true *Q*_{ST}. This association between *Q*_{ST} and its sampling error is quite strong, as shown in Figure 2. As a result, if the sampling properties of neutral *Q̂*_{ST} are inferred from a trait with high *Q*_{ST}, the estimate of the variance of the null distribution will be too high, and the hypothesis test comparing *Q̂*_{ST} to *F*_{ST} will be conservative. On the other hand, if a low *Q*_{ST} is used to estimate the variance of the null distribution, the estimated error will be too small, and the test will reject true null hypotheses too often.

We address this problem by using *F*_{ST} from putatively neutral marker loci in combination with estimates of the additive genetic variance within populations to predict the sampling variance that would be expected for the *Q*_{ST} of a neutral trait. We show that the power and type I error rate of this test are greatly superior to traditional methods.

## METHODS

#### Testing neutrality:

To generate the null distribution of *Q̂*_{ST} − *F̂*_{ST}, we use a parametric simulation approach. To calculate a *Q̂*_{ST} − *F̂*_{ST} value from data, we need estimates of three quantities: *F̂*_{ST}, *V*_{A,within}, and *V*_{G,among}. To calculate the null distribution, we simulate random sampling for each of these quantities under the assumption that the null hypothesis that *Q*_{ST} equals *F*_{ST} is true. We calculate *Q̂*_{ST} − *F̂*_{ST} from the simulated values, and after repeating this 1000 times, we generate the sampling distribution of *Q̂*_{ST} − *F̂*_{ST} assuming the null hypothesis.

*F̂*_{ST} is calculated from marker loci; we use the Weir and Cockerham (1984) method in our test calculations. To simulate the sampling error in estimates of *F*_{ST}, for each replicate simulation we randomly sample with replacement from the marker loci until the number of loci in the simulated data set equals the number of loci in the real data set. Mean *F*_{ST} is calculated from these sampled loci using the method of Weir and Cockerham (1984), and the observed value of their θ is used as the simulated *F̂*_{ST} value.
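The resampling of loci described above can be sketched as follows. This is a minimal illustration, not the program in File S1: it assumes that each locus has already been reduced to a pair (among-population variance component, total variance), so that the multilocus estimate is the ratio of sums across loci, as in Weir and Cockerham's θ. The function names and the example values are hypothetical.

```python
import random

def mean_fst(loci):
    """Multilocus F_ST as a ratio of sums over per-locus (numerator, denominator) pairs."""
    numerator = sum(n for n, _ in loci)
    denominator = sum(d for _, d in loci)
    return numerator / denominator

def bootstrap_fst(loci, n_reps=1000, rng=None):
    """Resample loci with replacement (same number of loci) and recompute mean F_ST."""
    rng = rng or random.Random()
    return [mean_fst(rng.choices(loci, k=len(loci))) for _ in range(n_reps)]

# Hypothetical per-locus components: (among-population component, total variance).
example_loci = [(0.012, 0.25), (0.030, 0.40), (0.004, 0.18), (0.021, 0.33), (0.009, 0.29)]
fst_hat = mean_fst(example_loci)
fst_sims = bootstrap_fst(example_loci, n_reps=1000, rng=random.Random(1))
```

Because each simulated value is a weighted average of the per-locus ratios, it always lies between the smallest and largest single-locus estimate in the data set.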

*V*_{A,within} is calculated from a quantitative genetic breeding design. There are several suitable experimental designs for such estimates. In this article we assume that the additive genetic variance is estimated by a half-sib design, but the approach could easily be modified for other designs. *V*_{A,within} can be estimated from four times the variance among sires; and to estimate the variance among sires we need the mean squares of sires (MS_{sires}) and the mean squares of dams (MS_{dams}). To simulate estimates of *V*_{A,within}, we use an approach analogous to a parametric bootstrap (O'Hara and Merilä 2005). As tested by O'Hara and Merilä (2005), d.f. × MS_{sires}/\overline{MS}_{sires} and d.f. × MS_{dams}/\overline{MS}_{dams} should be χ^{2} distributed, where d.f. represents the degrees of freedom associated with a particular level and the overbar indicates the true value of the mean square. Therefore by multiplying the estimated MS/d.f. times a random number from a χ^{2} distribution with the corresponding d.f. for each of sires and dams we can simulate the sampling distribution of these quantities and therefore of *V*_{A,within}. This procedure is implemented exactly as the parametric bootstrap in O'Hara and Merilä (2005), except that, to avoid a strong source of bias, we do not constrain variance component estimates to be positive.
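As a hedged sketch of this step, the code below draws simulated mean squares as MS/d.f. times a χ^{2} variate and converts them to *V*_{A,within}. It assumes a balanced nested design, so that the sire variance component is (MS_{sires} − MS_{dams}) divided by the number of offspring per sire; the function names and the numerical inputs are illustrative and are not taken from the article.

```python
import random

def chi2_draw(df, rng):
    """One draw from a chi-square distribution (a gamma with shape df/2 and scale 2)."""
    return rng.gammavariate(df / 2.0, 2.0)

def simulate_va_within(ms_sires, df_sires, ms_dams, df_dams, offspring_per_sire, rng):
    """Simulate one estimate of V_A,within from observed half-sib mean squares."""
    sim_ms_sires = ms_sires / df_sires * chi2_draw(df_sires, rng)
    sim_ms_dams = ms_dams / df_dams * chi2_draw(df_dams, rng)
    # Balanced-design assumption: sire variance = (MS_sires - MS_dams) / offspring per sire.
    var_sires = (sim_ms_sires - sim_ms_dams) / offspring_per_sire
    # Deliberately not truncated at zero, following the text.
    return 4.0 * var_sires

# Hypothetical inputs: 20 populations x 5 sires x 5 dams x 5 offspring.
rng = random.Random(2)
va_sims = [simulate_va_within(3.5, 80, 1.0, 400, 25, rng) for _ in range(5000)]
va_mean = sum(va_sims) / len(va_sims)
```

With these inputs the point estimate is 4 × (3.5 − 1.0)/25 = 0.4, and the simulated values scatter around it because the expected value of a χ^{2} variate equals its degrees of freedom.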

*V*_{G,among} is calculated from the variance among populations in the mean value of the trait when the organisms are grown in a common environment. The novel aspect of our design comes from how the sampling of *V*_{G,among} is simulated. As mentioned in the introduction, the sampling variance for *V*_{G,among} is correlated with the true value of *V*_{G,among}, and therefore if the null hypothesis is true but *V*_{G,among} incorrectly appears high by sampling error, its sampling distribution will also be estimated poorly. If we were only estimating the value of *Q*_{ST} itself, this would pose no real problems, but because we are trying to compare *Q*_{ST} to the neutral expectation, it can be a real source of bias in the calculations. Our solution is to simulate the sampling distribution of *V*_{G,among} assuming that the null hypothesis is true. We therefore calculate the value of *V*_{G,among} that would be expected given the observed *F̂*_{ST} and *V*_{A,within}. Given that *Q*_{ST} is defined as *Q*_{ST} = *V*_{G,among}/(*V*_{G,among} + 2*V*_{A,within}) and that for neutral traits and neutral loci the average values of *Q*_{ST} and *F*_{ST} are approximately equal, we can find the expected value of *V*_{G,among} under neutrality to be

*V̂*_{G,among} = 2*F̂*_{ST}*V*_{A,within}/(1 − *F̂*_{ST}).

To simulate the sampling distribution around this expectation, we again assume that the distribution of trait means among populations follows a normal distribution and multiply *V̂*_{G,among}/(num_{pops} − 1) times a random number drawn from a χ^{2} distribution with degrees of freedom equal to the number of populations (num_{pops}) minus one. This sampling procedure is the same as assumed by the Lewontin–Krakauer distribution shown to work well to approximate the distribution of *Q*_{ST} under a variety of demographic circumstances (Whitlock 2008). Simulating the sampling error in this way is identical to the approach taken by O'Hara and Merilä (2005) in their parametric bootstrapping, except for using the expected value of *V*_{G,among} calculated from *F*_{ST} instead of the observed *V*_{G,among}.
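The neutral expectation for *V*_{G,among} follows from setting *Q*_{ST} = *V*_{G,among}/(*V*_{G,among} + 2*V*_{A,within}) equal to *F*_{ST} and solving for *V*_{G,among}, which gives 2*F*_{ST}*V*_{A,within}/(1 − *F*_{ST}). The sketch below shows that calculation and the χ^{2} sampling around it; the function names and input values are hypothetical and serve only as an illustration.

```python
import random

def qst(vg_among, va_within):
    """Q_ST from the among-population and within-population additive variances."""
    return vg_among / (vg_among + 2.0 * va_within)

def expected_vg_among(fst, va_within):
    """V_G,among implied by Q_ST = F_ST for a neutral additive trait."""
    return 2.0 * fst * va_within / (1.0 - fst)

def simulate_vg_among(fst, va_within, num_pops, rng):
    """One draw of V_G,among under neutrality: expectation / d.f. times a chi-square variate."""
    df = num_pops - 1
    chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-square with df degrees of freedom
    return expected_vg_among(fst, va_within) / df * chi2

# Hypothetical inputs: mean F_ST = 0.1, V_A,within = 0.4, 20 populations.
vg_null = expected_vg_among(0.1, 0.4)
rng = random.Random(3)
vg_sims = [simulate_vg_among(0.1, 0.4, 20, rng) for _ in range(5000)]
vg_mean = sum(vg_sims) / len(vg_sims)
```

Plugging the expected value back into the definition, `qst(vg_null, 0.4)`, recovers the *F*_{ST} that was supplied, which is a quick check that the algebra is consistent.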

For a given hypothesis test using a specific data set, we generate 1000 simulated estimates of *Q̂*_{ST} − *F̂*_{ST}. For each simulation, *F̂*_{ST}, *V*_{A,within}, and *V*_{G,among} are randomly drawn as specified above, and *Q̂*_{ST} − *F̂*_{ST} is calculated from these simulated values. The distribution of these 1000 simulated values is the null distribution of the hypothesis test. Therefore by finding the quantile of the observed value of *Q̂*_{ST} − *F̂*_{ST} in the simulated distribution, we may determine the *P*-value of the hypothesis test of neutrality.
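Putting the three sampling steps together, the whole test might look like the sketch below. It is an interpretation of the description above rather than a transcription of the R program in File S1. In particular, it assumes that the neutral expectation of *V*_{G,among} is computed once from the observed estimates, that loci are supplied as (numerator, denominator) pairs, that the half-sib design is balanced, and that a one-tailed *P*-value is read off as the proportion of simulated values at least as large as the observed one. All names and numbers are hypothetical.

```python
import random

def chi2_draw(df, rng):
    """Chi-square draw via a gamma variate with shape df/2 and scale 2."""
    return rng.gammavariate(df / 2.0, 2.0)

def ratio_fst(loci):
    """Multilocus F_ST as a ratio of sums of per-locus components."""
    return sum(n for n, _ in loci) / sum(d for _, d in loci)

def null_distribution(loci, ms_sires, df_sires, ms_dams, df_dams,
                      offspring_per_sire, num_pops, n_reps=1000, seed=None):
    """Simulate Q_ST - F_ST under the null hypothesis that Q_ST equals F_ST."""
    rng = random.Random(seed)
    fst_obs = ratio_fst(loci)
    va_obs = 4.0 * (ms_sires - ms_dams) / offspring_per_sire
    vg_null = 2.0 * fst_obs * va_obs / (1.0 - fst_obs)  # expectation under neutrality
    df_pops = num_pops - 1
    sims = []
    for _ in range(n_reps):
        fst = ratio_fst(rng.choices(loci, k=len(loci)))           # resample loci
        ms_s = ms_sires / df_sires * chi2_draw(df_sires, rng)     # simulated MS_sires
        ms_d = ms_dams / df_dams * chi2_draw(df_dams, rng)        # simulated MS_dams
        va = 4.0 * (ms_s - ms_d) / offspring_per_sire             # simulated V_A,within
        vg = vg_null / df_pops * chi2_draw(df_pops, rng)          # simulated V_G,among
        sims.append(vg / (vg + 2.0 * va) - fst)
    return sims

def upper_tail_p(observed, sims):
    """Proportion of simulated values at least as large as the observed statistic."""
    return sum(s >= observed for s in sims) / len(sims)

# Hypothetical data set: 5 loci, 20 populations, 5 sires x 5 dams x 5 offspring each.
loci = [(0.012, 0.25), (0.030, 0.40), (0.004, 0.18), (0.021, 0.33), (0.009, 0.29)]
sims = null_distribution(loci, 3.5, 80, 1.0, 400, 25, 20, n_reps=1000, seed=4)
p_high = upper_tail_p(0.10, sims)  # P-value for an observed Q_ST - F_ST of 0.10
```

A lower-tail *P*-value for unusually small *Q*_{ST} can be obtained the same way by counting simulated values at or below the observed statistic.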

Supporting information, File S1 includes an R program to implement this procedure.

#### Simulations:

We tested the method using simulations conducted with the population genetics simulation software *Nemo* (Guillaume and Rougemont 2006) updated to include quantitative traits. Neutral marker loci were simulated with 100 biallelic loci, with mutation rates of 10^{−5} in either direction. One hundred loci potentially affected the quantitative traits. Mutation was based on an infinite allele model, where the allelic effect of an allele was, if mutated, changed by a factor randomly selected from a Gaussian distribution with genomic mutational variance equal to 0.001. Mutation rates for the quantitative trait loci were set at 10^{−5}. Each of 20 local populations had an effective population size of 500 diploid individuals, and the migration rate among populations varied from *m* = 0.05 to *m* = 0.001 to produce different *F*_{ST} values, ranging from approximately *F*_{ST} = 0.01 to *F*_{ST} = 0.3. Measurements were taken on the populations after 50,000 generations (or 25,000 generations for the neutral cases), allowing the populations to reach an approximate equilibrium before sampling. The *Q̂*_{ST} of 10,000 traits was simulated for the neutral traits and 100 for each set of parameters with selection.

In addition to the island model calculations that make up the bulk of the simulation tests, we also simulated a one-dimensional, circular stepping-stone model with 60 local populations. Simulations with *F*_{ST} = 0.04 were performed, corresponding to a migration rate of 0.12. Migration occurred only between adjacent (left and right) populations in the stepping-stone model, and at most, every third population was sampled for *F*_{ST} and the *Q*_{ST} calculations, as suggested by Beaumont and Nichols (1996) and Whitlock (2008). For the heterogeneous selection cases, the populations were alternately assigned to habitats in groups of five.

In some simulations, the quantitative trait was selectively neutral, to allow tests of the type I error rates of the method. In other simulations, the quantitative trait was subjected to either uniform stabilizing selection (for which all local populations had the same optimum with Gaussian selection with *V*_{S} = 5) or heterogeneous selection (for which the selective optimum for half of the local populations was different from the optimum in the other half of the populations). The strength of selection for the heterogeneous environment case was calculated such that a perfectly adapted individual in one environment would have a 5 or 50% reduction in fitness in the other selective environment in the island or stepping-stone model, respectively. The parameters of the selection functions were *V*_{S} = 5, and the difference between the habitat optimum phenotypes was 0.716 in the island model, and 2.63 in the stepping-stone model. There was no environmental effect added to the genotypic values of the quantitative trait loci (*V*_{E} = 0).

For each simulation, *Q̂*_{ST} was calculated from a simulated half-sib breeding design. In the default configuration, samples were taken from 20 populations, and for each population five sires were mated to five dams each. These numbers were varied to better understand the power of the approach. Five offspring from each dam were measured, and from the results *Q̂*_{ST} was calculated from the population and sire effects using an analysis of variance.
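To make the last step concrete, the sketch below computes *Q̂*_{ST} from the mean squares of a balanced nested analysis of variance (offspring within dams within sires within populations). The expected-mean-square relations used here are the standard ones for a balanced nested design and are an assumption of this illustration; the article does not spell out its exact ANOVA formulas, and the input values are invented.

```python
def qst_from_mean_squares(ms_pops, ms_sires, ms_dams,
                          sires_per_pop, dams_per_sire, offspring_per_dam):
    """Estimate Q_ST from balanced nested ANOVA mean squares (half-sib design)."""
    per_sire = dams_per_sire * offspring_per_dam   # offspring measured per sire
    per_pop = sires_per_pop * per_sire             # offspring measured per population
    var_pops = (ms_pops - ms_sires) / per_pop      # among-population component, V_G,among
    var_sires = (ms_sires - ms_dams) / per_sire    # among-sire component
    va_within = 4.0 * var_sires                    # V_A,within = 4 x sire variance
    return var_pops / (var_pops + 2.0 * va_within)

# Default configuration from the text: 5 sires x 5 dams x 5 offspring per population.
qst_hat = qst_from_mean_squares(14.5, 3.5, 1.0, 5, 5, 5)
```

With these hypothetical mean squares the among-population component is 0.088 and *V*_{A,within} is 0.4, so the estimate is 0.088/(0.088 + 0.8), a little under 0.1.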

For all parameter combinations, we tested the null hypothesis of neutrality using the new method and with the best method previously available, the parametric bootstrap approach from O'Hara and Merilä (2005). We refer to this latter approach as the “traditional approach” throughout.

#### Simulation results:

The simulations show that the new method has a more accurate type I error rate and more power than the traditional method. There is sufficient power to detect high *Q*_{ST} when the *Q*_{ST} of a trait is severalfold greater than the mean *F*_{ST} and when large numbers of populations (10 or more) are included in the analysis. However, large numbers of marker loci are not necessary. On the other hand, it is difficult to reliably detect the signal of homogeneous selection; the power to discriminate significantly small *Q*_{ST} values is low, even when the mean *F*_{ST} value is much higher than expected for most intraspecific comparisons.

First, examine the cases where the null hypothesis is true; that is, when the trait is evolving without the influence of selection. The traditional method has an overall type I error rate that is a bit high (Table 1), but it is seen to be particularly poor when the type I errors are divided into the two tails. The type I error rate for the traditional method with low *Q*_{ST} values is 7.0–7.8% (in contrast to the expected 2.5%), whereas the type I error rate is far too low for high values of *Q*_{ST} compared to mean *F*_{ST} (0.41–0.44%). In all cases, the one-tailed error rates are different from the stated 2.5% with extremely small *P*-values (the largest being *P* = 4 × 10^{−59}). In contrast, the new method has a much better type I error rate. The total error rate for the new method is always within the 95% confidence interval of the expected value of 5%, and the errors are more evenly divided into the two tails.
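For orientation, the width of such a confidence interval can be approximated with a normal approximation to the binomial, as in the short sketch below. The function name is hypothetical, and the choice of 10,000 tests is an assumption based on the number of neutral traits simulated; the number of tests behind each cell of Table 1 may differ.

```python
from math import sqrt
from statistics import NormalDist

def binomial_interval(p_expected, n_tests, level=0.95):
    """Normal-approximation interval for an observed error rate around p_expected."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * sqrt(p_expected * (1.0 - p_expected) / n_tests)
    return p_expected - half_width, p_expected + half_width

# Assumed number of neutral tests; see the caveat in the lead-in.
low, high = binomial_interval(0.05, 10_000)
```

Under that assumption an observed total error rate between roughly 4.6% and 5.4% would be compatible with a nominal rate of 5%.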

With heterogeneous selection in the island model, the mean *Q*_{ST} ranged from 0.026 to 0.564, depending on the amount of migration among populations (see Table 2). The power of the method depends in part on the relative value of the typical *Q*_{ST} value in comparison to the mean *F*_{ST}. When *Q*_{ST} is expected to be much greater than the mean *F*_{ST}, the method has substantial power (Figure 3). Importantly, the new method has much higher power to detect heterogeneous selection than the traditional method (Figure 3). With small sample sizes and low true differences between *Q*_{ST} and *F*_{ST}, neither method is able to detect the effects of selection, and with extremely large samples both methods have high power. But for intermediate (and realistic) sample sizes with moderate *Q*_{ST} values, the new method has substantially more power to detect heterogeneous selection than the traditional method. We also ran simulations of stronger selection (where an individual perfectly adapted to the other environment would have a 10% fitness reduction), where *Q*_{ST} is higher. In these cases the power was very high for both methods, except for the cases when there were only two populations in the study. There again, the new method greatly outperformed the traditional method (results not shown).

In contrast, under only rare circumstances was there much power to detect that the *Q*_{ST} value of a trait was significantly smaller than expected under neutral differentiation (Figure 4). Even when the mean neutral *F*_{ST} is relatively high, the left tail of the distribution of neutral *Q*_{ST} is still relatively dense for small values, making it difficult to separate a low *Q*_{ST} from neutral expectations.

The preceding calculations are based on moderately large sample sizes for the quantitative genetic measurements but not very many (10) marker loci for the calculation of *F*_{ST}. Increasing the number of marker loci increases power, but not dramatically (Figure 5a). On the other hand, using more families per population to estimate *Q*_{ST} better has a beneficial effect (Figure 5b). However, the power of the analysis is critically dependent on the number of populations surveyed (Figure 3). The variance of the expected distribution decreases in proportion to the number of demes measured (Whitlock 2008), and the reliability of estimates increases strongly with number of demes (Goudet and Büchi 2006). Reliable inference about the neutrality of quantitative traits requires sampling of large numbers of populations. The estimation of both *Q*_{ST} and *F*_{ST} depends critically on the estimate of the variance among populations, and the precision of the estimate of this variance depends on the number of populations sampled. In studies with small numbers of populations, the estimates were also quite biased for both methods (results not shown), explaining the apparently higher power for the smallest sample sizes.

Results under the stepping-stone model are quite similar. The mean *Q*_{ST} for the stepping-stone simulations was 0.638 with selection and 0.0488 for the neutral case. The power of the analysis is largely dependent on the number of populations sampled (Figure 6) and varies in an equivalent way with the number of families and neutral loci sampled (results not shown).

## DISCUSSION

The *Q*_{ST} of neutral traits is potentially extremely variable from trait to trait, especially when the number of populations in the system (or in the study) is small. This distribution is approximately predictable with knowledge of the mean *F*_{ST} of neutral marker loci for the same populations (Whitlock 2008). A simple function of *Q*_{ST} [equal to (num_{pops} − 1)*Q*_{ST}/*F̄*_{ST}, where *F̄*_{ST} is the mean *F*_{ST}] approximately follows a χ^{2} distribution with num_{pops} − 1 degrees of freedom; this derives from the Lewontin–Krakauer distribution. Given that for traits determined by additively acting alleles the mean *Q*_{ST} is approximately equal to the mean *F*_{ST}, the sampling distribution of neutral *Q*_{ST} can be predicted.
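The dependence on the number of populations can be made explicit with the moments of the χ^{2} distribution. Writing *n* for num_{pops} and using the facts that a χ^{2} variate with *n* − 1 degrees of freedom has mean *n* − 1 and variance 2(*n* − 1), the approximation above implies the following (a worked consequence under the Lewontin–Krakauer assumption, not an additional result of the article):

```latex
\[
(n - 1)\,\frac{Q_{ST}}{\bar{F}_{ST}} \sim \chi^2_{n-1}
\quad\Longrightarrow\quad
\mathrm{E}[Q_{ST}] = \bar{F}_{ST},
\qquad
\mathrm{Var}[Q_{ST}] = \frac{2\,\bar{F}_{ST}^{\,2}}{n - 1}.
\]
```

For example, with a mean *F*_{ST} of 0.05 the standard deviation of neutral *Q*_{ST} is about 0.016 for 20 populations but about 0.071 for two populations, which is larger than the mean itself.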

Most studies of *Q*_{ST} explicitly compare *Q̂*_{ST} of a trait to *F*_{ST}, as a test of whether spatially heterogeneous or homogeneous selection affects the distribution of the trait. These studies use the observed properties of *Q̂*_{ST} to predict its sampling distribution. However, when testing the null hypothesis of neutrality, we need to infer the sampling properties of *Q̂*_{ST} for neutral traits, not of traits with high or low expected *Q*_{ST}'s. The difference matters because the width of the sampling distribution of *Q̂*_{ST} depends on its mean value (Figure 2).

We have developed a new method to test for selective neutrality using the difference between *Q̂*_{ST} and mean *F*_{ST}. We account for the expected distribution of *Q*_{ST} under neutrality using a distribution inferred from the mean *F*_{ST}. Compared to the traditional method, the new approach works extremely well. The traditional method, which infers the distribution of *Q̂*_{ST} from the observed *Q̂*_{ST}, has very poor false positive rates (type I error). High *Q*_{ST} rejects the null hypothesis far too rarely, and low *Q*_{ST} rejects the null hypothesis too often (Table 1). This is because the error variance is overestimated for high *Q*_{ST} and underestimated for low *Q*_{ST} (Figure 2). The type I error rate for our new method is close to the stated values, and it is symmetric in the upper and lower tails as is desirable.

The new method is also more powerful than the traditional method for detecting spatially heterogeneous selection. Both the new and traditional methods work well when *Q*_{ST} is much greater than *F*_{ST} and with data from many populations, and both fail with too few data (*e.g.*, when the number of populations is two). However, in intermediate cases with moderate *Q*_{ST} and moderately large sample sizes, the new method has much more power than the traditional approach. With homogeneous selection, the traditional method appears to have more power, but this is largely due to its inflated type I error rate. Positive results are not reliable for homogeneous selection and small numbers of populations.

Unfortunately, in some biologically interesting circumstances, there are a limited number of populations that exist in nature, and in these circumstances it is simply not possible to reliably show that even a large *Q̂*_{ST} is different from the neutral expectation. This is especially true when the mean *F*_{ST} of neutral markers is also high. For example, some applications of the *Q*_{ST} approach have been made comparing a pair of subspecies. In these cases, the mean *F*_{ST} is typically high (or the two populations would not have been given subspecific status) and the total number of such populations in nature is just two. In this case, there is little hope of finding significant evidence of selective differentiation via the *Q*_{ST} approach. For example, when there are only two populations, the 97.5 percentile of the distribution of *F*_{ST} or *Q*_{ST} is approximately five times the mean of the distribution, according to the Lewontin–Krakauer distribution. Even with no error in estimating *Q*_{ST}, a trait would have to have a *Q*_{ST} value five times as large as the mean *F*_{ST} to be significantly in the tail of the distribution, for the two-population case. *Q*_{ST} is never estimated with such small error, so in practice the *Q̂*_{ST} of the trait would have to be much larger than five times the mean *F*_{ST} to find statistical evidence of selection.
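The factor of five can be checked directly. With two populations the Lewontin–Krakauer approximation has a single degree of freedom, so the ratio of *Q*_{ST} to the mean *F*_{ST} behaves like the square of a standard normal variate. The sketch below uses that identity to obtain the 97.5 percentile with only the Python standard library; the function name is illustrative.

```python
from statistics import NormalDist

def chi2_1df_quantile(p):
    """Quantile of a chi-square with 1 d.f., using the identity chi2_1 = Z**2."""
    z = NormalDist().inv_cdf((1.0 + p) / 2.0)
    return z * z

# Ratio of Q_ST to mean F_ST at the 97.5th percentile when num_pops = 2.
ratio_975 = chi2_1df_quantile(0.975)
```

The result is close to 5.02, in line with the "approximately five times the mean" stated in the text.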

There is little power in typical data sets to test for spatially uniform stabilizing selection using *Q̂*_{ST} − *F*_{ST} comparisons. It has been suggested that small values of *Q*_{ST} relative to *F*_{ST} may indicate strong stabilizing selection with the same optimum in all populations, because such selection would oppose genetic drift and maintain approximately the same mean in each local population. However, the distribution of neutral *Q*_{ST} includes a dense left-hand tail in most intraspecific comparisons, because, with a small mean *F*_{ST} and a few populations sampled, a large number of loci with small *F*_{ST} (or neutral traits with small *Q*_{ST}) are expected just by chance. Only with very strong selection and levels of *F*_{ST} that verge on interspecific values (*F*_{ST} = 0.2) have we found even moderate power to detect spatially uniform selection (Figure 4).

There are a few other caveats that need to be kept in mind when applying this method, in common with all interpretations of *Q*_{ST}. It is crucial that *F*_{ST} and *Q*_{ST} are both estimated without bias, and there are many sources of bias that affect most measures (Whitlock 2008). In particular, it is important that *Q*_{ST} is estimated from a breeding design and not just from phenotypic data. Furthermore, it is essential that the study organisms are grown in a common garden to avoid conflating phenotypic plasticity with local adaptation.

Importantly, the simulations conducted here all assumed that traits are determined by alleles that interact additively, both between and within loci. Dominance variance can under some circumstances cause mean *Q*_{ST} to be greater than mean *F*_{ST}, even for neutral traits. There is controversy over whether the effects of dominance will typically lead to increased values of *Q*_{ST} (Lopez-Fanjul *et al*. 2003, 2007; Goudet and Büchi 2006; Goudet and Martin 2007), but importantly the distribution of *Q*_{ST} among neutral traits has not been investigated for traits affected by dominance or epistasis. Our ability to use the distribution predicted from the *F*_{ST} of marker loci depends on the distribution being similar for *Q*_{ST}, and this has not been investigated for traits with dominance. This method, and indeed any comparison of *Q*_{ST} and *F*_{ST}, requires stringent assumptions about the additive basis of the quantitative trait.

The method also relies on the assumption that we are able to identify neutral markers to use for *F*_{ST} to generate the null distribution. With a large number of marker loci, the chances may be high that at least some of the loci are affected by spatially heterogeneous selection. If such loci can be identified by a procedure such as *fdist2* (Beaumont and Nichols 1996), then removing them from the analysis is probably best, although this may make the test less conservative. Alternatively, all marker loci could be left in the analysis, on the assumption that the loci affecting quantitative traits may sometimes differentiate by pleiotropic effects or by linkage to other selected loci. Keeping the full spectrum of marker loci potentially would control for these extraneous effects.

Finally, there are some specific issues with the new simulation method that limit its breadth of application. The method given here uses the Lewontin–Krakauer distribution to infer the distribution of neutral *Q*_{ST} from mean *F*_{ST}. According to simulation results this should work fine for typical values of mean *F*_{ST} (less than ∼0.2). However, the Lewontin–Krakauer distribution is based on a χ^{2} distribution, and its right tail extends to positive infinity and is not constrained to be less than one. As a result, for large values of mean *F*_{ST} the probability of the right tail of this Lewontin–Krakauer distribution becomes an inaccurate representation of the true tail probability.

To use *Q*_{ST} to test for selection, we have to compare an individual trait's *Q̂*_{ST} to the distribution of possible values of *Q*_{ST} under neutrality. By doing so, we have developed a method that has much better type I error rates and higher power for detecting spatially heterogeneous selection than traditional approaches.

## Acknowledgments

We thank Bob O'Hara for providing the R code for the parametric bootstrap, and Sally Otto, Jérôme Goudet, and an anonymous reviewer for extremely helpful comments on a previous version of this article. Jérôme Goudet pointed out that *F*_{ST} values estimated from multiallelic loci have a different distribution, which helped us to clarify the use of the Lewontin–Krakauer distribution for *Q*_{ST}. This research was supported by a Discovery Grant from the Natural Sciences and Engineering Research Council (Canada) (to M.C.W.) and a Swiss National Science Foundation grant PA00A3-115383 (to F.G.).

## Footnotes

Supporting information is available online at http://www.genetics.org/cgi/content/full/genetics.108.099812/DC1.

Communicating editor: J. Wakeley

- Received December 15, 2008.
- Accepted August 13, 2009.

- Copyright © 2009 by the Genetics Society of America