Methods for QTL analysis with progeny replicated in multiple environments
This research was supported by grants from the Natural Sciences and Engineering Research Council of Canada.
The genetic component of variation in continuous traits is caused by allelic differences at discrete genetic loci. The discovery and characterization of these QTL is an important area of biological science. This paper explores methods that may help to improve the quality of inferences about QTL.
With many current methods of QTL analysis, hypotheses about the effects of a locus on a quantitative trait are tested at discrete positions along a map of genetic markers. The presence of a QTL is inferred when a significant amount of phenotypic variance can be attributed to estimated QTL parameters. This type of procedure was first described by Lander and Botstein (1989) as 'interval mapping'; we now use the term 'simple interval mapping' (SIM) to distinguish this from other procedures being developed. Several variations on SIM have been described (eg. Knapp et al., 1990; Haley and Knott, 1992; Martinez and Curnow, 1992). All SIM procedures search for a single 'target QTL' at positions throughout a mapped genome.
When multiple QTL segregate, the sampling error associated with inferences about a target QTL may be inflated by the effects of other QTL. Furthermore, linked QTL can cause biased estimates of QTL position. These problems have been recognized for some time (eg. Knapp and Bridges, 1990), and have stimulated efforts to develop methods of QTL analysis that can simultaneously account for multiple QTL. If it is known which marker intervals contain QTL, then it is relatively straightforward to apply a model that contains parameters for multiple QTL. It is also possible to simultaneously estimate the most likely positions of QTL within two or more intervals (Knapp, 1991; Haley and Knott 1992). However, if the intervals that contain QTL are unknown, then searching for the most appropriate model can pose serious problems. For example, with 100 marker intervals, there are 4950 possible two-interval models and 161,700 possible three-interval models. Not only does this present computational challenges, but when a large number of models are tested, statistical error control becomes a difficult and important problem.
Recently, Zeng (1993; 1994) described a method called 'composite interval mapping' (CIM) which permits searching for one QTL at a time while simultaneously accounting for the effects of other segregating QTL. This is accomplished by fitting parameters for a target QTL in one interval while simultaneously fitting partial regression coefficients for 'background' markers outside of that interval. In principle, the genetic variance caused by QTL other than the target is absorbed by the regression coefficients of the background markers. Zeng's methods for CIM are available in the software package 'QTL Cartographer'. Other authors (Rodolphe and Lefort, 1993; Jansen and Stam, 1994) have described similar methods based on this principle.
Another facet of QTL analysis concerns the fact that QTL effects may depend on environment. Methods for analysis of QTL x environment (QTLxE) interactions are not well developed. Beavis and Keim (in press) discuss QTLxE interaction, and review experiments where QTL were analyzed in multiple environments. Most of these experiments were performed in a small number of environments where each environment was analyzed separately. Separate analysis by environment circumvents the problem of dealing with QTLxE interaction and avoids complications due to environmental heterogeneity. However, the results of separate analyses are difficult to interpret, and they do not take advantage of the built-in replication provided by multiple environments. Beavis and Keim (in press) suggested methods for combined analysis based on mixed models (random environments and fixed genotypes) with parameters estimated by best linear unbiased prediction. These models are tested at individual marker loci, and have not been extended to SIM or CIM.
The factors that led to the development of CIM may also be important for analyzing data from multiple environments. With multiple environments, one may wish to test whether a QTL exhibits QTLxE interaction. However, the presence of other QTL can interfere with this test. Non-target QTLxE could cause heterogeneity in the amount of genetic variance that is present in the residuals. Furthermore, if non-target QTL are linked to the target QTL, then they can cause biased inferences about QTLxE for the target locus. If these non-target QTL are accounted for, then the added replication that is built into the multiple environments might be highly effective in reducing the sampling error for inferences about a target QTL.
Jiang and Zeng (in press) have described methods to perform CIM with multiple correlated traits. These methods could also be used to analyze data from multiple environments when tests about specific QTLxE effects are of interest. Software tools for performing this analysis are not yet available, and it is not clear whether these methods will be manageable with a large set of environments.
So far, few QTL experiments have been performed in a large set of environments. However, a growing number of mapping populations in plant species consist of 'immortal' homozygous lines, and there is growing interest in marker-assisted plant breeding. We anticipate that this, together with an awareness of the importance of quantitative trait stability, will encourage the accumulation of many phenotypic observations in these mapping populations. This will present new challenges for those involved in the analysis and interpretation of these results.
An example of these challenges can be seen in the North American Barley Genome Mapping Project (NABGMP), an affiliation of barley researchers collaborating on the development of genetic maps in barley. As part of the NABGMP, Hayes et al. (1993a) reported a study where 150 doubled-haploid (DH) lines were tested in five environments. That experiment was later extended to include 11 further environments in a second growing season (Hayes et al., 1993b). Tinker et al. (in preparation) will report on a similar study in two-row barley, where the replicated mapping population has been observed in 30 environments. The initial challenge in the statistical analysis of these studies was to look at the entire set of data to identify QTL and QTLxE without undue statistical error. Hayes et al. (1993a) used non-linear models to perform an analysis similar to SIM with the addition of a test for QTLxE. Data from the extended data-set (Hayes, 1993b) have been analyzed separately by environment. In a preliminary analysis, Tinker et al. (1993) searched for markers associated with QTL using stepwise multiple linear regression a method that could not account for QTLxE.
Here, we describe a procedure for the analysis of QTL in progeny that are replicated across multiple environments. This procedure is similar in principle to the methods of CIM described by Zeng (1994), but simplifications are made in order to provide efficient algorithms that are applicable to large bodies of data. Because of this, we refer to this procedure as simplified composite interval mapping (sCIM). This procedure is tested using simulated sets of data, and computer software for the application of sCIM is made available in a companion note. We hope that these procedures and this software will provide useful tools for researchers concerned with the analysis of QTL in multiple environments.
In QTL analysis, the most important inference may be QTL presence: inferring the presence of a QTL when there is no QTL is a type I error; inferring the absence of a QTL when a QTL is actually present is a type II error. Type I and II error rates are the frequencies with which these errors would occur if many more experiments were performed in the same manner. The power of a statistical test is one minus the type II error rate. In QTL analysis, power represents the probability of inferring the presence of a QTL when a QTL really exists. Error rates can be expressed for a single statistical test (comparisonwise rates) or in relation to all tests that are made in an experiment (experimentwise error rates).
When we infer that a QTL is present, we use the data to estimate its position (usually in cM, relative to a set of markers) and to estimate the effects of it alleles. The quality of these estimates can be described the terms precision and accuracy. If many experiments could be conducted in the same manner, the precision of an estimate would be measured by its variability among experiments, and the accuracy of an estimate would be measured by how much its average value deviated from the true value of the parameter. The amount of deviation is called BIAS. An estimate that is perfectly accurate is said to be unbiased.
In an attempt to quantify error rates, we use the data to compute test statistics. Many test statistics can be used for QTL analysis, including t, F, LR, LOD, and Wald. It is not the test statistic that is important, rather, it is the way that the test statistic is used to draw inferences. Consider an experiment where no QTL are present: it is said to represent the null hypothesis. To infer that a QTL is present (the alternate hypothesis) we must reject the null hypothesis. Most test statistics are computed such that their absolute value is low under the null hypothesis and high under the alternate hypothesis. Thus, we infer a QTL in an experiment that gives a high test statistic. However, random error occasionally produces a high test statistic in an experiment where the null hypothesis is true. One of the most important steps in QTL analysis is to decide on a threshold value for the test statistic. If the threshold is not exceeded, the null hypothesis (no QTL) is accepted. If the threshold is exceeded, the alternate hypothesis (QTL presence) is made.
A threshold is usually chosen to give a specific type I error rate (e.g. P=0.05). In many applications, a threshold is chosen by making assumptions about the distribution of the test under the null hypothesis. The weakness of this approach is that the true distribution may deviate considerably from the assumed distribution. Alternate methods are based on resampling the data in order to deduce the nature of the true distribution. A resampling method discussed in this paper is called permutation. Permutation involves scrambling the order of the data randomly so that the effects of the parameters are lost. This produces a set of data that represents the null hypothesis. The distribution of the test statistic under the null hypothesis is derived by computing the test statistic in many random permutations of the original data. One can then choose a test statistic that is larger than (e.g.) 95% of this distribution.
The use of genetic simulation plays an important role in this paper. Simulation is a means of producing data sets for which the underlying parameters are known. With the use of computers, a simulation can be repeated many times. These two components (known parameters, and repetition) make it possible to test methods of statistical analysis, and to determine properties that might otherwise be impossible to measure. Specifically, we can measure the type I error rate and power achieved with a particular test statistic and threshold, and we can measure the precision and accuracy of estimates that are made. All of this is accomplished by repeatedly simulating an experiment, analyzing the resulting data sets with the statistical methods in question, and summarizing the average results in relation to the true parameters. The disadvantage of genetic simulation is that it produces conclusions that are valid only for the imaginary parameters that are used. Thus, it is important qualify the scope of the conclusions that are drawn. Sometimes it is useful to test a range of parameters that represent different extremes of what might conceivably be encountered in nature. Further discussion of genetic simulation and a menu-driven computer program for performing genetic simulation are provided by Tinker and Mather (1993).
The Proposed Procedure for Composite Interval Mapping
First, a set of m background markers are chosen at regular intervals throughout the genome. The number of background markers that are chosen can have important effects on the power and precision of CIM (see Zeng, 1994 for a discussion of this). Probably a 20 to 40 cM spacing of background markers will provide an adequate compromise, provided m is substantially less than k. Partial regression coefficients for each background marker are estimated within each environment by using least-squares multiple linear regression to solve for B in
where Y is a vector of quantitative trait phenotypes for k progeny in a given environment, M is a matrix of genotypes for m background markers in k progeny, and E is a vector of residuals. The genotypes in M are coded as integers, based on the parental origin of alleles. The type of coding determines the allelic effect that is to be estimated. For example, if the homozygotes of alleles from parent A and B are coded as 1 and 0 (respectively), then the effect of substituting two alleles from parent A will be estimated.
Tests for the presence of a target QTL are performed at regular cM positions within each marker interval. The markers that define these intervals are not limited to the background markers. As these tests proceed along the length of the mapped genome, they will periodically cross a background marker. Each time this happens, a new vector of adjusted phenotypes Y' is formed within each environment. This adjustment is made as
where B' is the same as B except that coefficients for the new flanking background markers are replaced by 0. The correction for the mean is intended to eliminate the need to fit environmental main effects in later models. In other words, Y' is adjusted for environmental main effect, and for all background markers except for those that flank the position that is being tested. A position that is telomeric to a background marker has only one flanking background marker. When a model is tested at the position of a background marker, two tests are made: the first test is based on flanking background markers for the previous interval, and the second is based on flanking background markers for the next interval. This procedure is depicted in the following diagram, where integers represent the positions of background markers, "M's" represent the positions of other markers, and dots represent the positions of other tests. The integers below the diagram indicate the flanking background markers that would be used for selected tests.
M....1....M.....2...........3.....M.....M...4 | | | | | | | | First test: 1 1 1,2 1,2 2,3 2,3 3,4 3,4 Second test: 1,2 2,3 3,4 4
If there are multiple environments, then the additive effect of the target QTL is estimated separately within each environment by solving for b' in
where X' is a vector of genotypes for k progeny at the target locus. If there is a marker at the target locus, then these genotypes are coded in the same manner as M. Otherwise, the genotypes in X' are estimated from the nearest flanking markers based on a model of complete crossover interference (see Haley and Knott, 1992). When data are missing for the nearest flanking marker, genotypes can be estimated from the nearest markers for which data are available (Martinez and Curnow, 1994). The scalar b' is estimated by least-squares regression, with E' giving the vector of residuals.
An estimate of the additive main effect of the target QTL is then made. This is given by b'' in the expression
where Y'' is the concatenation of the Y' vectors for j environments, and X'' is the vector X' repeated for each environment, such that X'' has the same length as Y''. The scalar b'' is estimated by least-squares regression, with E'' giving the vector of residuals.
The same procedure can be followed for SIM, except that no background markers are used, so Y' is adjusted only for the mean in each environment:
The Test Statistics:
At each target position, we wish to test the reduction in variance caused by fitting additive QTL effects. Various test statistics could be used for this, but we have chosen the simple test statistic described by Haley and Knott (1992):
where n is the number of observations (n = j X k) and where RSSf and RSSr are the residual sums-of-squares in full and reduced models (respectively). First, we calculate a test statistic for the reduction in variance due to fitting a QTL main effect:
Then, if there are multiple environments, we calculate a test statistic for the reduction in variance due to fitting separate effects within each environment:
where SSY'' and SSE'' are the sums of squares of Y'' and E'' in , and the term SSE' is the cumulative sum-of-squares for E' in  across all environments. For SIM, TSm and TSe are approximately equal to the F statistic multiplied by the number of additional degrees of freedom in the full model: one for TSm and j for TSe. Because of the way Y' and Y'' are 'pre-adjusted', it is not clear how many degrees of freedom are represented for sCIM. One of the objectives of this paper is to determine whether this uncertainty can be accounted for by the use of permutations to establish thresholds.
These tests require the assumption that environments are fixed. We chose to make this simplifying assumption with the understanding that it limits the scope of the inferences to the array of environments that have been tested. With a fixed-environment model we can obtain estimates that help to explain the phenotypes that were observed in the mapping experiments. This is useful to test for the presence of loci that are capable of affecting the trait, even if we can not make more general inferences about their average allelic effects in a different set of environments.
Defining Thresholds by Permutation:
Under ideal circumstances, TS would be tested against a chi- square distribution (the expected distribution for one test under the null hypothesis). However, there are several reasons why we cannot assume this type of distribution here. Firstly, the distribution of the maximum TS is unique to the distribution of markers in the genome (a problem discussed by Lander and Botstein, 1989). This prevents us from being able to control the experimentwise type I error rate without making simplifying assumptions. Secondly, TSm and TSe for sCIM have distributions with unknown degrees of freedom due to the adjustment made in . Thirdly, the data may not comply with the basic assumptions for testing a linear model. For example, the amount of residual variance may be heterogeneous across environments. In an attempt to compensate for all of these factors, we have used permutation tests.
Permutation tests for QTL analysis were described by Churchill and Doerge (1994). The data being analyzed is repeatedly permuted to destroy associations between markers and quantitative trait measurements. Since each permutation is analogous to a new experiment in which the null hypothesis is true, each permuted data set is analyzed to determine the maximum value of the test statistic for that experiment. A threshold test statistic is chosen to be larger than (e.g.) 95% of the maximums in all permutations. Methods to apply permutation tests have been implemented in the program QTL Cartographer; however, thresholds for CIM may not be estimated properly by permutation because this also destroys the association between a QTL and background markers. The application of permutation tests to find thresholds for testing QTLxE has not been described.
Software for Implementing this Procedure:
These procedures have been incorporated into a software package called MQTL, presented in a companion note (Tinker and Mather, 1995). A genetic simulation package called MSIM has also been developed. These two packages were used to perform the experiments described below. Both are available by ftp from the site gnome.agrenv.mcgill.ca.
Experiment 1: Testing Simplifications to Composite Interval Mapping
Two thousand simulations were performed, each representing a random population of 200 DH progeny tested in a single environment. A genetic map was defined with 32 markers distributed uniformly at 20-cM intervals on four 140-cM chromosomes. Parameters of 5 QTL were defined as shown in Table 1. Marker and QTL genotypes were generated by using pseudo- random numbers to simulate crossovers between the parental chromosomes in the F1. Quantitative trait phenotypes were generated by adding random error terms to the values for the quantitative trait genotypes. These error terms were selected randomly from a normal distribution with a variance equal to one quarter of the expected genetic variance (ie., a heritability of 80%).
Each simulated population was analyzed by three procedures: SIM, CIM, and sCIM. The SIM procedure was implemented using the procedure described above, but without background markers (see ). Prior to this, it was verified that this procedure gave results that were almost identical to the SIM procedures described by Lander and Botstein (1989). The CIM procedure corresponds to Zeng's (1994) model I, while sCIM was based on the procedure described in the previous section of this paper. Zeng's CIM was implemented using QTL Cartographer, while both SIM and sCIM were implemented using MQTL. All 32 markers were used as background markers for CIM and sCIM.
Significance thresholds for each method of analysis were estimated from 1000 permutations of the first simulated population. These thresholds were estimated by finding a test statistic value that was higher then the maximum experimentwise value for 95% of the permuted sets of data. These thresholds were applied to the analyses of the remaining simulations. The realized level of type I error control achieved by using these thresholds was measured by examining the frequency of QTL detection in the remaining simulations for areas of the genome with no QTL: chromosome 4 and the four outermost marker intervals of chromosome 1.
Experiment 2: Testing the Proposed Procedure using Simulated Data from Multiple Environments
Four thousand simulations were performed for a trait with the parameters shown in Table 2. These simulations were similar to those performed in Experiment 1, but quantitative trait phenotypes for 200 DH progeny were generated differently for each of eight environments. QTL were defined in the same positions as in Experiment 1, but with unique effects in each environment; thus, QTLxE interactions were present. Furthermore, each environment had a different amount of environmental variance. Environmental variance ranged from 18% to 100% of the expected genetic variance, giving the heritabilities (h2) shown (Table 2).
Each set of simulated data was analyzed by procedures described in this paper to produce four types of test-statistic scans: TSm for SIM main effects, TSe for SIM QTLxE, TSm for sCIM main effects, and TSe for sCIM QTLxE. The first simulated population was subjected to 5000 permutations. These permutations were applied such that the same random order of genotypes was produced in each environment. Each permutation was analyzed to produce all four types of scan, and the largest test statistic for each scan was recorded for each permutation. Permuted thresholds were chosen to control experimentwise error rates at 0.05 (see methods for Experiment 1). Realized thresholds were estimated by recording the maximum value of the test statistic on chromosome 4 for each of the remaining simulations. Chromosome 4, with one fourth of the genome and no QTL, represented the null hypothesis. Thus, a realized threshold was the test statistic value that was exceeded on chromosome 4 in approximately 1.25% of all simulated sets of data.
Experiment 3: Exploring the Use of Permutation to Choose Thresholds
Simulations were performed to represent traits of varying complexity (Table 3). All simulations were performed using a model of DH lines with seven chromosomes. For each trait, there were at least 2 chromosomes with no QTL. Simulations were repeated 5000 times for each trait. Data from the first simulation of each trait was permuted 5000 times in order to estimate a threshold to control the experimentwise type I error rate at 5%. Then the remaining replications were carried out and analyzed. The maximum test statistics from two chromosomes with no QTL were used to estimate realized thresholds (see methods for Experiment 2).
Experiment 4: Testing Methods Using a Reanalysis of Steptoe/Morex Data
To test these methods in comparison with previously published results, we examined data from the six-row doubled- haploid barley population produced by the North American Barley Genome Mapping Project (NABGMP). The marker data and linkage map for this population are described in Kleinhofs et al. (1993). The quantitative trait analysis is described in Hayes et al. (1993a). An updated QTL analysis with additional environments is found in Hayes et al. (1993b). We used the data from this updated analysis, which are publicly available from the gopher site greengenes.cit.cornell.edu.
We started with a skeletal map of 123 markers (the same map used by Hayes et al., 1993a) to give a relatively uniform marker spacing throughout the genome. We then selected a subset of 45 background markers. In selecting these markers, we tried to find markers spaced at approximately 25 cM intervals, but we also favoured markers for which complete data were available. Using this map, and the data for the 123 markers, we analyzed the phenotypic trait 'yield' (weight of harvested grain per unit area), for which data were available from 16 environments. Following this analysis, the data were permuted and reanalysed 1000 times to deri
Results and Discussion
Results of analyses for the first set of simulated data, along with the permuted thresholds, are shown in Figure 1. In this example, SIM detected QTL A, B, C, and E, while CIM and sCIM detected QTL A, C, D, and E. None of the three methods detected a spurious QTL on the fourth chromosome.
The scans of the test statistic for both CIM and sCIM are steeper than those for SIM (Figure 1). This gives the appearance that CIM and sCIM are more accurate than SIM. In the scan representing CIM (Figure 1b), the test statistic for any given position should be independent of QTL that lie outside of an interval bounded by the closest background markers (Zeng 1993). This means that CIM should be capable of separating the effects of QTL that are separated by at least two background markers. This property may not apply to sCIM (Figure 1c). Nevertheless, the scan for sCIM has a similar shape to the scan for CIM, suggesting that these properties may be approximated by sCIM.
Results from the remaining simulations are summarized by chromosome region in Table 4 and Figure 2. Region F (ie. chromosome 4, with no QTL) represents 1/4 of the genome, so a detection frequency of 0.0125 in this region corresponds to an experimentwise type I error rate of 0.05. The detection frequencies in region F were between 0.011 and 0.013, so it appears that the permuted thresholds gave satisfactory type I error control for all three methods. The frequency of simulations with significant test statistics in the four outermost intervals of chromosome 1 was less than 0.005. These intervals represent about one tenth of the genome, so it appears that the level of error control was also adequate for regions that were linked to a QTL. This would not apply to the two intervals immediately adjacent to the interval with QTL A, because the test statistic in those intervals is partially influenced by QTL A. It would not apply to the test statistic for SIM anywhere on chromosome 1 because SIM does not include an adjustment for background marker genotypes.
Since regions A through E each contain one QTL, the detection frequencies in those regions indicate the approximate power for detecting the respective QTL. In these simulations, it appears that SIM gave superior power for detecting all but QTL D (Table 4). For QTL B, this seems to be an artifact caused by the algorithm that was used to score the peaks. This algorithm scored a peak as the highest test statistic in a given region. For SIM, the highest test statistic within region B was often at one end of the region, adjacent to region C (see Figure 2a), and was presumably part of a larger peak in region C. Had each scan been examined visually (or analyzed by a more sophisticated peak detection algorithm), it may have been possible to distinguish the peaks representing QTL B from the high points within region B that were really part of the peaks in region C. Both CIM and sCIM detected a QTL in region B less frequently than did SIM, but the location of the peaks was usually unambiguous. This is shown by the example scans (Figure 1b and 1c) and by the peak frequencies (Figure 2b, 2c).
In region D, only 18% of simulations gave significant test statistics with SIM (Table 4), and many of those were significant only because of the influence of the larger QTL in the adjoining region. However, CIM and sCIM were able to detect a QTL in region D in 63 and 67% of all simulations, respectively (Table 4). Again, the locations of these peaks were unambiguous, as shown by Figures 2b and 2c, and exemplified in Figures 1b and 1c. This clearly demonstrates that CIM or sCIM can increase the power of detecting QTL that are linked in repulsion.
For estimating QTL position, both CIM and sCIM were superior to SIM. This can be seen by looking at the averages and the standard deviations of the peak positions in each region. This part of Table 4, when compared to Table 1, shows that both CIM and sCIM gave higher accuracy and precision than SIM. There were slight differences between CIM and sCIM, but these are probably not meaningful.
For estimating QTL effect, the three methods differed in accuracy. SIM overestimated QTL that were linked in coupling and underestimated QTL that were linked in repulsion. This is expected because SIM is not able to resolve the combined effects of linked QTL. Surprisingly, CIM tended to underestimate the effect of all QTL, while sCIM seemed to give unbiased estimates. This may not always be the case. In simulations where a QTL is located at the position of a marker (not shown) we have found that sCIM will give estimates with a small upward bias, while the downward bias of estimates made by CIM is less pronounced.
In this experiment, we have deliberately chosen a scenario where genetic variance is caused by several QTL, and where some of those QTL were linked. All advantages of sCIM or CIM relative to SIM will disappear if there is only one QTL segregating, and the advantages are less likely to be shown with a simpler genetic model. Many other factors may influence the efficiency of CIM or sCIM, but perhaps the most important consideration is the density and distribution of background markers. We present some general guidelines about this in the last section of this paper.
Experiment 2: Testing the Proposed Procedure Using Simulated Data from Multiple Environments
The analysis for the first simulation is shown in Figure 3. The thresholds depicted there are 'realized' thresholds estimated from the remaining simulations. For SIM, permuted thresholds were approximately equal to realized thresholds. For sCIM, the permuted thresholds were either too high (for the main effect) or too low (for the interaction). The use of permutation to establish thresholds was investigated further in Experiment 3. The remaining results for this experiment are based on realized thresholds.
In Figure 3, the inferences that would be drawn about the number and positions of QTL seem to be approximately the same for both SIM and sCIM. However, as in Figure 1, sCIM gives steeper, narrower peaks. The QTL on chromosome 1 (QTL A) was detected as a main effect, but not as an interaction. Two QTL (B and C) were detected on chromosome 2: both show a separate and significant main effect and interaction by sCIM, but only one peak is evident for the main effect detected by SIM. Two QTL (D and E) are detected on chromosome 3, but only based on the test for QTLxE. No QTL were detected on chromosome 4. All of this is consistent with the underlying simulation parameters (Table 2) except that neither method detected a main effect for QTL D.
Table 5 shows average results from analyses of the remaining simulations, and Figure 4 shows the frequencies of peaks by position that contributed to these averages. The frequency of detection in region F shows that all tests gave chromosomewise type-I error rates of approximately 0.0125. This level is equivalent to an experimentwise error rate of 0.05. Recall, however, that this error rate was imposed by the use of realized thresholds. The frequency of detection in the other QTL regions gives an indication of the power for detecting each QTL. Both SIM and sCIM showed a high power for detecting the main effect of QTL A. This power was higher than in Experiment 1 (Table 4), even though the average heritability was lower in this experiment and the average main effect was the same. Presumably, the power is higher here because of the replication that is built in to the multiple environments. In the regions with linked QTL, both methods also showed a high power for detecting the QTL. Sometimes the power for SIM was slightly higher, but this is probably due to the artifact that was described in Experiment 1. The frequency of detection for non-existent effects (QTLxE at A and main effect at E) is low, as it should be, except that SIM detected spurious main effects at a rate of 21% in region E. As shown by Figure 4a, the spurious detection in this region is probably an artifact of the automatic detection algorithm (as discussed earlier).
The average and standard deviation of the positions of peaks indicate the accuracy and precision for estimating QTL position in each region. In all cases, sCIM was more accurate than SIM, and in most cases, sCIM was more precise than SIM. Figure 4 shows how the positions of the peaks were distributed for each type of test. In region B, many of the peaks for SIM were on the border with region C, indicating that the test statistic was influenced by QTL C. This was not that case for sCIM, where the test statistic in region B appears to be influenced only by QTL B.
Table 5 also shows average and standard deviation for the regression coefficient that estimates QTL main effect. In comparing these averages to the values of the underlying parameters (Table 2), it is apparent that SIM sometimes gave biased estimates for the effects of linked QTL, while sCIM has apparently estimated all QTL main effects without bias. The standard deviations for these estimates were similar for both types of interval mapping, indicating that both methods gave similar precision for estimating QTL effect.
Experiment 3: Exploring the Use of Permutation to Choose Thresholds
In Experiment 2, a problem became apparent: the thresholds derived from permutations did not agree with the realized thresholds for sCIM. In a real situation, there would be no information about realized thresholds because the underlying model would be unknown. Experiment 3 was designed to examine factors that could influence the test statistic, and to build experience that might be applied toward making approximations of appropriate thresholds. An additional objective was to determine whether permuted thresholds for SIM were robust in the presence of factors such as heterogeneity of variance. The traits that were simulated (Table 3) cover an array of potential scenarios.
Permuted vs. realized thresholds are shown in Table 6. The first eight traits were simulated for a single environment. For these traits, the permuted thresholds were approximately the same as the realized thresholds. Thus, there seems to be no substantial problem with the use of permuted thresholds for SIM or sCIM with a single environment. The thresholds for sCIM were highly dependent on the number of background markers (e.g. Table 6, trait no. 1 vs. 2) and on population size (e.g. Table 6, trait no. 4 vs. 5). Since we do not know how to predict thresholds for sCIM, this highlights the importance of using permutation to derive appropriate thresholds.
The remaining traits (trait no. 9 to 20) were simulated in multiple environments. Here too, the permuted thresholds for SIM were approximately equal to the realized thresholds: both for main effect and for QTLxE. These thresholds were dependent on factors such as heritability (e.g. Table 6, trait no. 11 vs. 12) heterogeneity of variance (e.g. Table 6, trait no. 12 vs. 13) and amount of QTLxE interaction (e.g. Table 6, trait no. 13 vs. 15). Without the use of permutation, there is no analytical method to predict thresholds that can account for these unknown factors.
With sCIM, permutation gave thresholds that were comparable to realized thresholds for two traits: those where the null hypothesis was true ( Table 6, trait no. 9 and 10). For the remaining traits, the permuted thresholds were substantially different from the realized thresholds (Table 6). In most cases, the permuted threshold for testing main effects was too high and the permuted threshold for testing QTLxE was too low ( Table 6, traits no. 11, 12, 13, 17, 19 and 20). In two cases the opposite was true (traits no. 14 and 15), and in two cases the permuted thresholds were too high for both main effects and QTLxE (traits no. 16 and 18). Since the underlying model is only known for simulated traits, there seems to be no way of knowing whether the permuted thresholds are too high or too low. Thus, permutation is not an appropriate way to estimate thresholds for sCIM when data from multiple environments are analyzed. Although sCIM with multiple environments could lead to more powerful, accurate and precise QTL mapping (as shown by experiment 2), there may be no way to apply it with a known level of error control.
Rather than abandon sCIM as a tool for QTL analysis in multiple environments, we think that it could be applied judiciously in conjunction with SIM. For example, one would first apply SIM using permuted thresholds with confidence that the type I error rate is approximately correct. Then one would perform sCIM and examine the locations of the peaks as further guidance about the locations of QTL. This recommendation is based on the results of Experiment 2, where sCIM gave estimates of QTL position that were more precise and sometimes more accurate than SIM. One could also use sCIM to estimate the size of QTL main effects: based on the results in Experiment 2, sCIM would give estimates of QTL main effect that are at least as precise, and sometimes more accurate, than the estimates from SIM. The use of sCIM to make these inferences would not depend on a threshold for the test statistic, because the presence of a QTL would be inferred primarily from SIM.
One could go a step further and use permutation to make an educated guess about the appropriate threshold for testing sCIM. Speculation about the type of model that is present would be made based on the SIM results, then Table 6 (or additional simulated traits) would be used a guideline to approximate the appropriate threshold. This approach should be used with caution. If it causes inferences about additional QTL, then conclusion based on these inferences should be qualified.
Experiment 4: Testing Methods Using a Reanalysis of Steptoe/Morex Data
The NABGMP collected data on 150 DH progeny in 16 environments (1991 and 1992) from the Steptoe/Morex barley cross. This provided a wealth of information, but it also presented new statistical challenges. Should such data be treated by combined analysis, or should the data from each environment be analyzed separately? One might expect that a combined analysis would be more useful. Not only does this make use of added replication, but it can help to sort out questions about the identity of QTL that are detected at slightly different locations in each environment. Initially, the data from the first five environments was analyzed by combined analysis, using methods based on SIM (Hayes et al., 1993a). An alternative was to perform separate QTL analyses within each environment. This approach was used by Hayes et al. (1993b) to provide summary-type analyses for each of the 16 environments. This approach was effective in providing information about specific environments, but it produced a large quantity of results from which it is difficult to extract overall conclusions.
We first used SIM to analyze the Steptoe/Morex data from the five 1991 environments. The results (not shown) were similar to those reported by Hayes et al. (1993a) in that peaks were present at approximately the same locations in the genome. However, only one QTL main effect was inferred with our SIM analysis (the large effect on chromosome 3), whereas six were inferred by Hayes et al. (1993a). Presumably, this is because the permuted threshold used in our analysis was more conservative. Similarly, peaks for QTLxE were in similar locations, but a smaller portion were significant when tested against the permuted threshold.
We applied both SIM and sCIM to the analysis of data from the complete set of 16 environments (Figure 5). The permuted thresholds for SIM should be appropriate to give an experimentwise error rate of 5%. Thus, only one QTL main effect is statistically significant in the combined SIM analysis (Figure 5a), while there are five chromosome regions that exhibit significant QTLxE interaction. Due to the uncertainty about error control with sCIM, the results for sCIM are shown without thresholds (Figure 5b). Many of the peaks in the sCIM analysis correspond to peaks in the SIM analysis. Furthermore, some sCIM peaks (one pair on chromosome 2 and the pair on chromosome 6) now suggest the presence of two linked QTL that interact with environment.
For comparison, a summary of the results of separate analyses in the same data are shown (Figure 5c). These analyses were performed by SIM, and tested against a threshold of 12.5 (equivalent to LOD 2.75). This was the average permuted threshold from the separate analyses. Collectively, these analyses gave evidence for a greater number of QTL than did the combined analyses. Two explanations should be considered: firstly, the 16 separate analyses are analogous to performing 16 different experiments. Since the error rate is controlled separately within each experiment, one expects a higher cumulative rate of type I error than with the single combined analysis. However, this alone may not account for the number of additional effects seen in the separate analyses. A second consideration is that several of the QTL detected by separate analysis seem to be unique to a single environment (Figure 5c). For a QTL that acts in only one environment, one would expect that the analysis in that environment would be more powerful than a combined analysis. Depending on the motivation for analysis, either approach may be more useful, but the combined analysis is more useful if one is interested in finding QTL that have effects in most environments.
The results of the sCIM analysis of the Steptoe/Morex yield data illustrate of one of the advantages that sCIM may have over CIM. The sCIM approach gives clear evidence that there are two QTL which interact with environment on chromosome 2. This is supported by the fact that QTL in the separate analyses tend to be clustered at the positions of both of these peaks. This would probably be missed if one relied only on the SIM analysis. A similar phenomenon may be present on chromosome 6, although this is less certain.
The next step in this analysis may be to look for groups of environments that have similar effects at the loci exhibiting large QTLxE interactions. Perhaps if the data are grouped into sets of environments that behave similarly, analyses could be performed that are more precise and well-defined. Further testing of the lines in specifically controlled environments may help to characterize the biological significance of certain interactions. Alternatively, one could analyze subsets of environments that represent reasonable targets for a marker- assisted breeding program. It is possible that certain QTLxE interactions may revert to main effects when the scope of environments is restricted this way.
General Discussion and Conclusions
There are many potential variations on sCIM. For example, after an initial analysis by SIM, background markers could be placed strategically in regions where there is evidence for QTL. However, there is a danger that this type of exploratory analysis will become a search for the best possible model, rather than the objective type of analysis that sCIM is intended to be. There is probably no way to control error rate in the 'best model' approach, and users are cautioned against using sCIM (or CIM) in this way. Simulation experiments 1 and 2 were each performed with a set of uniformly spaced background markers, and this set was not changed or iterated to 'improve' the results. All conclusions that we have presented are based on this type of scenario. Our conception of how this type of analysis should be used is that a single set of uniformly spaced background markers should be chosen, and the same set used throughout the analysis of all traits. Any further improvement to QTL estimates that are made in this way should be based on other techniques, or other methods of error control.
The optimum density for background markers is an issue that requires further investigation. Further experimentation with real and simulated data (not shown) shows that the use of a high density of background markers does not produce useful results. The peaks that represent real QTL become overwhelmed by spurious peaks, and the power to detect real QTL is lost. In the opposite extreme, use of very few background markers produces results that are similar to SIM, and the power to distinguish the presence of two linked QTL disappears. Our experience indicates that a density of background markers between 20 and 40 cM is adequate for analyzing a genome the size of barley (approximately 1200 cM). It is important to realize that the estimation of partial regression coefficients for the background markers uses up a large proportion of the information that is present in the mapping population. Thus, for a population of 150, it might be reasonable to use a total of 40 background markers, whereas for a population of 300, one might use considerably more. The distribution of the mapped markers should also be considered when choosing background markers. If the available markers are clustered, then fewer markers should be chosen as background markers. Use of background markers that are tightly linked to each other can produce the same type of meaningless results that are obtained by using a high total number of background markers.
A primary motivation behind this study was to develop simplified methods of CIM that could be applied objectively to large sets of data from many environments. The MQTL software for implementing sCIM is described in a companion note. We anticipate that other new methods for QTL analysis will be developed, and that these will facilitate rich new insights about biological phenomena. We hope that the procedures presented here, and the accompanying software, will be useful as exploratory tools in large data sets, and that they will be compared to other methods of QTL analysis.
Churchill, G.A., and R.W. Doerge. 1994. Empirical threshold values for quantitative trait mapping. Tech. Rpt. Biometrics Unit, Cornell Univ, Ithica NY.
Haley, C. S., and S. A. Knott. 1992. A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity 69: 315-324.
Hayes, P.M., B.H. Liu, S.J. Knapp, F. Chen, B. Jones, T. Blake, J. Franckowiak, D. Rasmusson, M. Sorrells, S.E. Ullrich, D. Wesenberg, and A. Kleinhofs. 1993a. Quantitative trait locus effects and environmental interaction in a sample of North American barley germ plasm. Theor. Appl. Genet. 87: 392-401.
Jansen, R.C., and P. Stam. 1994. High resolution of quantitative traits into multiple loci via interval mapping. Genetics 136: 1447-1455.
Jiang, C., and Zeng, Z-B. 199_. Multiple trait analysis of genetic mapping for quantitative trait loci. Genetics (in press).
Kleinhofs, A., A. Kilian, M.A. Saghai Maroof, R.M. Biyashev, P. Hayes, F.Q. Chen, N. Lapitan, A. Fenwick, T.K. Blake, V. Kanazin, E. Ananiev, L. Dahleen, D. Kudrna, J. Bollinger, S.J. Knapp, B. Liu, M. Sorrells, M. Heun, J.D. Franckowiak, D. Hoffmann, R. Skadsen, and B.J. Steffenson. 1993. A molecular, isozyme and cytological map of barley (Hordeum vulgare) genome. Theor. Appl. Genet. 86: 705-712.
Knapp, S.J., and W.C. Bridges. 1990. Using molecular markers to estimate quantitative trait locus parameters: power and genetic variances for unreplicated and replicated progeny. Genetics 126: 769-777.
Lander, E. S., and D. Botstein. 1989. Mapping mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121: 185-199.
Lincoln S., Daly M., Lander E. 1992. Mapping genes controlling quantitative traits with MAPMAKER/QTL. Whitehead Institute Technical Report. 2nd edition.
Rodolphe, F., and M. Lefort. 1993. A multi-marker model for detecting chromosomal segments displaying QTL activity. Genetics 134: 1277-1288.
Tinker, N.A., and D.E. Mather. 1993. GREGOR: Software for genetic simulation. J. Hered. 84: 237.
Tinker N.A. and D.E. Mather. 1995. MQTL: software for simplified composite interval mapping of QTL in multiple environments. JQTL 1: (2).
Zeng, Z-B. 1993. Theoretical basis for separation of multiple linked gene effects in mapping quantitative trait loci. Proc. Natl. Acad. Sci. USA 90: 10972-10976.
Zeng, Z-B. 1994. Precision mapping of quantitative trait loci. Genetics 136: 1457-1468.
Figure 1. Scans of a test statistic for (a) simple interval mapping, (b) composite interval mapping, and (c) simplified composite interval mapping. Each scan represents models that were tested at 1 cM intervals in one simulated population with a hypothetical genome (see Table 1). The positions of A through E represent the true positions of simulated QTL. Short vertical lines represent the positions of 42 background markers. Dotted lines represent an experimentwise P=0.05 threshold that was estimated by analyzing 1000 permutations of the same simulated data.
Figure 2. Frequency of peaks in 6 QTL regions for scans of a test statistic based on (a) simple interval mapping, (b) composite interval mapping, and (c) simplified composite interval mapping. The black part of the histogram represents the portion of peaks that were significant (p=0.05); the grey part of the histogram represents the portion of peaks that were non- significant. Frequencies are based on results of analyses in 2000 simulations (Table 1). Region A and F each constitute an entire chromosome. Regions B through E each constitute half of a chromosome (delineated by a vertical dotted line). Each region contains one QTL (indicated by the position of the circled letter) except for region F.
Figure 3. Scans of a test statistic for (a) simple interval mapping and (b) simplified composite interval mapping. The scan above the horizontal axis represents the test statistic for QTL main effect, and the scan below this axis represents the test statistic for QTLxE interaction. Models were tested at 1 cM intervals in one simulated population with a hypothetical genome (see Table 2). The positions of A through E represent the true positions of the simulated QTL. Short vertical lines represent the positions of 42 background markers. Dotted lines represent the experimentwise P=0.05 threshold that was realized by analyzing 5000 simulations.
Figure 4. Frequency of peaks in 6 QTL regions for scans of a test statistic based on (a) simple interval mapping and (b) simplified composite interval mapping. Histograms above the horizontal axes represent peaks of the test statistic for QTL main effect, while histograms below this axis represent peaks for the test statistic for QTLxE interaction. The black part of the histogram represents the portion of peaks that were significant (p=0.05); the grey part of the histogram represents the portion of peaks that were non- significant. Frequencies are based on results of analyses in 5000 simulations (Table 2). Region A and F each constitute an entire chromosome. Regions B through E each constitute half of a chromosome (delineated by a vertical dotted line). Each region contains one QTL (indicated by the position of the circled letter) except for region F.
Figure 5. Analysis of grain yield in a population of 150 doubled- haploid lines from the barley cross Steptoe/Morex tested in 16 environments. Scans of a test statistic are plotted for (a) simple interval mapping, (b) simplified composite interval mapping, and (c) simple interval mapping with separate analysis in each environment. The scans above the horizontal axis represents test statistics for QTL main effect, and the scans below this axis represents test statistics for QTLxE interaction. Short vertical ticks on the horizontal axis represent the positions of background markers used for simplified composite interval mapping. Models were tested at 5-cM intervals on eight linkage groups (the last two linkage groups are known to represent barley chromosome 7). Thresholds (dotted lines) were derived by analyzing 1000 random permutations of the data. In (c), each superimposed scan represents results from a different environment.
Table 1: Simulated* trait used to test procedures for interval mapping with a single environment. ============================================================ QTL identification A B C D E F ----------------------------------------- Chromosome 1 2 2 3 3 4 cM Position 70 30 110 30 110 NA Effect** 1 1 2 -1 2 0 ============================================================