Zhiwei Li a, Xiaogang Liu a, Xiaojie Xu a, Jiacheng Liu a, Zhiqin Sang a,c, Kanchao Yu a,d, Yuxin Yang a, Wenshuang Dai a, Xin Jin a, Yunbi Xu a,b,*
a Institute of Crop Science, Chinese Academy of Agricultural Sciences, Beijing 100081, China
b International Maize and Wheat Improvement Center (CIMMYT), El Batan 56130, Texcoco, Mexico
c Xinjiang Academy of Agricultural Reclamation, Shihezi 832000, Xinjiang, China
d Northeast Agricultural University, Harbin 150030, Heilongjiang, China
A B S T R A C T
Archaeological and genetic evidence reveals that maize (Zea mays ssp. mays L.) was domesticated approximately 10,000 B.P. in the Balsas River basin of southwest Mexico, under high temperature and short day length [1-3]. Following the largely unconscious process of domestication, maize was subjected to targeted improvement, and became adapted to tropical environments mainly through development of landraces, synthetics, and open-pollinated cultivars. More recently, maize has been selected for performance in temperate environments and adapted to modern agro-ecosystems through the development and commercialization of hybrid cultivars. Domestication and post-domestication selection have contributed to large differences between tropical and temperate maize groups, the former showing much richer genetic diversity and more rapid LD (linkage disequilibrium) decay than the latter [4].
Several studies [5-7] have suggested that tropical maize with favorable alleles for resistance to biotic and abiotic stresses should be used to increase genetic diversity in breeding programs. Temperate germplasm with high yield potential can be used as a source of several tolerance-related traits, including high-density tolerance and functional stay-green, that can be exploited to improve tropical maize. However, gene introgression between temperate and tropical maize groups has been hindered, mainly by their differences in response to day length and photoperiod. Flowering time and photoperiod sensitivity are complex, showing a wide range of phenotypic variation influenced by many minor genes, which determine the adaptation of plants to their ecological environments and influence the exchange of germplasm resources across regions at different latitudes [8,9]. Photoperiod sensitivity is usually considered to be a prerequisite for crop survival and reproduction under various environments [10], and is indirectly evaluated by comparing QTL (quantitative trait loci) for flowering time identified in multiple regions at different latitudes [11,12]. Studies of days to heading (DTH) [13] and to flowering (DTF) [14] in response to photoperiod and temperature in rice showed that QTL for DTH and DTF did not coincide with QTL for photo-thermo sensitivity, indicating that photoperiod sensitivity and flowering time were independent. Maize is a wind-pollinated outcrossing species that grows in a range of environments from tropical southwestern Mexico to the Andean highlands and has been widely introduced into tropical and temperate regions [15]. Natural and artificial selection, especially continuous selection over the past century, of key genes affecting flowering time has led to the photoperiod sensitivity of tropical maize and reduced the photoperiod sensitivity of temperate maize [16], such that the time required for maize landraces to mature ranges from 2 to 11 months [17].
Some phenotypic traits of maize have been found to be highly associated with photoperiod sensitivity, in particular number of leaves, plant height, and flowering time, including days to pollen shed, silking, and anthesis [16,18,19]. These traits can thus be used as efficient indicators of photoperiod sensitivity.
Selective signals and associated genomic regions and candidate genes can be identified and studied using high-density molecular markers such as single nucleotide polymorphisms (SNPs). Although theoretically there are up to four different alleles per SNP locus in a population, SNP arrays can detect only two of them [20,21]. Haplotypes derived from a set of SNPs within a single LD block can be used as markers with more alleles [22]. Haplotype construction compresses multiple SNPs into a haplotype locus and optimizes the design of genomic selection (GS) and genomewide association studies (GWAS) [23]. Haplotype analysis provides two further advantages. First, an association between a trait and a specific allele that depends on cis interactions with other loci may not be recognized by SNP-based analysis until the functional haplotypic unit is used in GWAS. Second, differences in haplotype diversity and frequency across populations may be valuable for identifying variants that are the most likely determinants of phenotypic traits [24,25].
Natural and artificial selection of favorable mutations associated with adaptation leads to reduced polymorphism, increased LD, and increased allele frequency [26-28]. When selection for a beneficial mutation occurs, the genetic variation in the neighboring region becomes homogeneous, leaving selective signals in the genome and ultimately shaping the phenotype. With the rapid development of chip technology and high-throughput sequencing technology, selective signals can be identified at the whole-genome level. Methods for identifying selective signatures can be divided into three categories according to the source of genomic information and the algorithm employed. (i) Potential selection signatures are identified based on population differentiation, using statistics including FST [29] and di [30], by comparison of allele frequencies among different subgroups. (ii) Potential selection signatures are identified based on LD, using statistics including EHH (extended haplotype homozygosity) [31], iHS (integrated extended haplotype homozygosity score) [32], and XP-EHH (cross-population extended haplotype homozygosity) [33], which describe the homozygosity of the extended haplotype shaped by the "hitchhiking effect". (iii) Potential selection signatures are identified based on allele frequency patterns, using Tajima's D statistic [34] and methods related to Tajima's D, by evaluating the difference between θπ and θw that indicates a signature of low-frequency mutation. In addition, emerging approaches commonly employ XP-CLR (cross-population composite likelihood ratio), which is used to reveal historical selection by comparing allele frequency spectra at linked loci [35], and the combination of the θπ ratio (θπ,domesticated/θπ,improved) and FST, which involves an empirical procedure employing sliding windows with both significantly low or high θπ ratios and significantly high FST values to identify potential regions affected by long-term and intensive selection over the whole genome [36-38].
However, small sample sizes weaken the power to detect association of selective signatures with domesticated and improved traits. Recently, large samples have been used to detect selective signals, with respectively 115, 278, and 302 lines included in selective signature analyses of cucumber [39], maize [40], and soybean [41].
Identifying genetic determinants of flowering time and photoperiod sensitivity is usually considered to be a prerequisite for successful exchange of germplasm resources across regions adapted to different latitudes. It is thus desirable to discover selective signatures regulating flowering time and photoperiod sensitivity that distinguish temperate from tropical maize. In this study, we sought to identify genomewide changes between temperate and tropical maize groups by selective signature analysis and GWAS.
The diverse maize germplasm panel used in the study (Table S1) consisted of 410 maize inbred lines. Of these, 238 were temperate inbred lines selected from breeding programs and germplasm collections to represent maize germplasm improved for and adapted to temperate environments, and 172 were tropical inbred lines selected from CIMMYT to represent maize germplasm with high genetic diversity but less improved and more adapted to the original environment where maize was domesticated.
The panel had been genotyped with the Maize 55K SNP Affymetrix Axiom Genotyping Array [42]. From this SNP dataset, 39,350 SNPs were selected as high-quality SNPs with minor-allele frequency (MAF) greater than 0.05 and maximum missing rate less than 20%. They were evenly distributed across the maize genome, with chromosomes carrying 2850 (chromosome 10) to 6050 (chromosome 1) SNPs.
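The SNP quality filter described above can be sketched as follows. This is a minimal illustration only: it assumes genotypes coded as 0/1/2 copies of the alternate allele with `None` for a missing call, and the function names (`maf_and_missing`, `passes_qc`) are hypothetical, not part of any tool used in the study.

```python
def maf_and_missing(genotypes):
    """Return (minor-allele frequency, missing rate) for one SNP.

    genotypes: one call per inbred line, coded 0/1/2 (alternate-allele
    count); None marks a missing call. This coding is an assumption.
    """
    called = [g for g in genotypes if g is not None]
    missing_rate = 1 - len(called) / len(genotypes)
    if not called:
        return 0.0, missing_rate
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq), missing_rate


def passes_qc(genotypes, min_maf=0.05, max_missing=0.20):
    """Keep a SNP only if MAF > 0.05 and missing rate < 20%."""
    maf, missing_rate = maf_and_missing(genotypes)
    return maf > min_maf and missing_rate < max_missing


# Toy SNPs (made-up data, 10 or 20 lines each)
snp_common = [0, 0, 2, 2, None, 0, 2, 0, 0, 2]      # common allele, 10% missing
snp_rare = [0] * 19 + [2]                           # MAF exactly 0.05
snp_sparse = [None, None, None, 0, 2, 0, 2, 0, 2, 0]  # 30% missing

kept = [name for name, snp in [("common", snp_common),
                               ("rare", snp_rare),
                               ("sparse", snp_sparse)] if passes_qc(snp)]
print(kept)
```

Only the first toy SNP passes both thresholds; the second fails because its MAF is not strictly greater than 0.05, and the third because too many calls are missing.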
The diverse maize panel was planted in 2014 in three locations, Shunyi, Beijing (40°13′N; summer), Xinxiang, Henan (35°18′N; summer), and Sanya, Hainan (18°09′N; winter), with different environments that affect maize flowering, using a randomized block design with two replications. Each two-row plot was 3 m long with 60 cm between rows and 25 cm between plants. The number of days from planting to the time at which more than 50% of the plants in a plot displayed tasseling, silking, and anthesis was measured as days to tassel (DTT), days to silk (DTS), and days to anthesis (DTA), respectively. The interval between DTA and DTS for the plot was calculated as anthesis-silk interval (ASI). Photoperiod sensitivity was measured as the relative difference index (RD) [43,44], calculated for each flowering trait phenotyped in pairs of environments (Shunyi vs. Sanya; Xinxiang vs. Sanya) as follows:
where L_i is the mean performance of a trait under long-day conditions (Shunyi or Xinxiang) and S_i is the mean performance of a trait under short-day conditions (Sanya, Hainan). The RDs for DTT, DTS, and DTA are abbreviated as RD_DTT, RD_DTS, and RD_DTA, respectively.
Using TASSEL software [45], a SNP matrix of the 410 maize accessions was assembled and used to construct a cladogram from genome-wide genetic distances between inbred lines. The online toolkit iTOL (https://itol.embl.de/) [46] was then used to draw the neighbor-joining tree from these genetic distances. Principal component analysis (PCA) was conducted using TASSEL.
Squared correlation coefficients (r²) were calculated with PLINK software [47]. The parameters were set to "--ld-window-r2 0 --ld-window 999999" with MAF > 0.05. The mean r² was computed for pairs of SNPs within 20-kb intervals across the genome.
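The binning step can be illustrated with a short sketch. It assumes the pairwise LD output has already been reduced to (distance in bp, r²) tuples; the function names and the toy values are hypothetical, and the half-decay rule shown here (first bin whose mean r² falls to half of the largest binned mean) is one plausible reading of how an LD decay distance is taken from such a curve, not necessarily the procedure used in the study.

```python
from collections import defaultdict


def mean_r2_by_bin(pairs, bin_kb=20):
    """Average r2 over SNP pairs grouped into distance bins of bin_kb.

    pairs: iterable of (distance_bp, r2).
    Returns a list of (bin_start_kb, mean_r2) sorted by distance.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for distance_bp, r2 in pairs:
        bin_start = int(distance_bp // (bin_kb * 1000)) * bin_kb
        sums[bin_start] += r2
        counts[bin_start] += 1
    return [(b, sums[b] / counts[b]) for b in sorted(sums)]


def half_decay_distance(binned):
    """Start (kb) of the first bin whose mean r2 is <= half the peak mean."""
    peak = max(mean for _, mean in binned)
    for bin_start, mean in binned:
        if mean <= peak / 2:
            return bin_start
    return None  # LD never decays to half within the observed range


# Toy pairwise LD values (made-up data)
toy_pairs = [(5_000, 0.8), (15_000, 0.6), (25_000, 0.5),
             (45_000, 0.3), (70_000, 0.2)]
binned = mean_r2_by_bin(toy_pairs)
print(binned)
print(half_decay_distance(binned))
```

Running the same two functions separately on the temperate and tropical pair lists would give one decay distance per group for comparison.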
The population-differentiation statistic FST was used to measure genetic distance or population differentiation using genetic polymorphism data [29]. Nucleotide diversity (θπ) was employed to measure the degree of diversity within a population [48]. Tajima's D statistic was computed to distinguish random from non-random evolution of DNA sequences [34]. To identify selective signals across the whole genome between temperate and tropical maize groups, FST, θπ, and Tajima's D were calculated as indicators of selective signatures by a sliding-window method (10 SNPs per sliding window in steps of 2 SNPs). FST was calculated for high-quality SNPs as follows [49]:
$$F_{ST} = \frac{\pi_{\text{Between}} - \pi_{\text{Within}}}{\pi_{\text{Between}}}$$
where π_Between represents genetic differences across populations and π_Within represents genetic differences within populations.
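As a small worked illustration of this definition, the sketch below computes FST from mean pairwise differences for two toy groups of sequences. It assumes one allele string per inbred line and takes π_Within as the unweighted average of the two within-group values; that weighting, the function names, and the data are assumptions for illustration and do not reproduce the PopGenome implementation.

```python
from itertools import combinations, product


def n_differences(seq_a, seq_b):
    """Number of sites at which two equal-length sequences differ."""
    return sum(1 for x, y in zip(seq_a, seq_b) if x != y)


def mean_pairwise(pairs):
    """Mean number of differences over an iterable of sequence pairs."""
    pairs = list(pairs)
    return sum(n_differences(a, b) for a, b in pairs) / len(pairs)


def fst(pop1, pop2):
    """F_ST = (pi_between - pi_within) / pi_between for two groups."""
    pi_within = (mean_pairwise(combinations(pop1, 2))
                 + mean_pairwise(combinations(pop2, 2))) / 2
    pi_between = mean_pairwise(product(pop1, pop2))
    if pi_between == 0:
        return 0.0  # no differences at all: no differentiation
    return (pi_between - pi_within) / pi_between


# Toy window of four SNPs (made-up alleles)
temperate = ["AAGT", "AAGT", "AAGC"]
tropical = ["CTGA", "CTGA", "CTAA"]
print(round(fst(temperate, tropical), 3))  # 0.8
```

Here the mean between-group difference is 30/9 and the mean within-group difference is 2/3, so most of the variation lies between groups and FST is high.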
θπ was calculated according to Nei and Li [48], using
$$\theta_\pi = \frac{n}{n-1}\sum_{ij} x_i x_j \pi_{ij}$$
where x_i and x_j are the frequencies of the ith and jth sequences, respectively, π_ij is the number of nucleotide differences at each site between the ith and jth sequences, and n is the number of sequences over all samples.
Tajima's D was calculated as follows [34]:
$$D = \frac{\pi - S/a_1}{\sqrt{V(\pi - S/a_1)}}, \qquad a_1 = \sum_{i=1}^{n-1}\frac{1}{i}$$
where n denotes the number of samples, S denotes the number of segregating sites, V denotes the variance of π − S/a_1, and π is the mean number of pairwise differences between sequences.
The θπ ratio (θπ,tropical/θπ,temperate) was calculated and used to evaluate the direction of selection. θπ, FST, and Tajima's D statistic were calculated using the PopGenome package in R [50].
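To make the Tajima's D statistic concrete, the following sketch computes it for a small set of aligned sequences using the standard variance constants from Tajima's original derivation. It is an independent illustration in Python, not the PopGenome code used in the study, and the function name and toy sequences are hypothetical.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for a list of equal-length allele strings."""
    n = len(seqs)
    n_sites = len(seqs[0])
    # S: number of segregating (polymorphic) sites
    s = sum(1 for k in range(n_sites) if len({seq[k] for seq in seqs}) > 1)
    if s == 0:
        return 0.0  # D is undefined without polymorphism; 0.0 is a convention
    # pi: mean number of pairwise differences
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)
    # Standard constants for the variance of (pi - S/a1)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Excess of rare variants (each variant carried by one line): negative D
rare_variants = ["TAAA", "ATAA", "AATA", "AAAT"]
# Two balanced haplotype classes at intermediate frequency: positive D
balanced = ["TTTT", "TTTT", "AAAA", "AAAA"]
print(tajimas_d(rare_variants) < 0, tajimas_d(balanced) > 0)  # True True
```

A window with many low-frequency variants, as expected after a recent sweep, gives a negative value, which is the pattern used later to validate the sweep regions in the temperate group.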
To identify putative selective signatures associated with the temperate maize group, selective signature analysis using temperate and tropical maize groups was performed over the whole genome. The allele frequencies of SNP loci were used to identify selective-sweep regions that were probably shaped by continuous selection. The top 5% of θπ ratio values were adopted to identify putative selective regions. Likewise, the top 5% of FST values were adopted to confirm highly differentiated regions. The intersections of windows based on FST and θπ ratio were assigned as potential selective regions.
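The window scan and the top-5% intersection rule can be sketched as below. The sketch assumes per-window FST and θπ ratio values have already been computed; the function names, the rounding used to turn 5% into a window count, and the toy values are illustrative assumptions.

```python
def sliding_windows(n_snps, size=10, step=2):
    """(start, end) SNP index ranges, end exclusive, for a window scan."""
    return [(start, start + size)
            for start in range(0, n_snps - size + 1, step)]


def top_fraction_threshold(values, fraction=0.05):
    """Smallest value that still belongs to the top `fraction` of values."""
    ranked = sorted(values, reverse=True)
    k = max(1, round(len(ranked) * fraction))
    return ranked[k - 1]


def candidate_windows(fst_values, pi_ratio_values, fraction=0.05):
    """Indices of windows in the top fraction for both statistics."""
    fst_cut = top_fraction_threshold(fst_values, fraction)
    ratio_cut = top_fraction_threshold(pi_ratio_values, fraction)
    return [i for i, (f, r) in enumerate(zip(fst_values, pi_ratio_values))
            if f >= fst_cut and r >= ratio_cut]


# 16 SNPs give four 10-SNP windows when stepping by 2 SNPs
print(sliding_windows(16))

# Toy statistics for 40 windows (made-up values)
toy_fst = [i / 40 for i in range(40)]
toy_ratio = [1.0] * 40
toy_ratio[39] = 5.0   # high in both statistics
toy_ratio[10] = 4.0   # high theta-pi ratio only
print(candidate_windows(toy_fst, toy_ratio))  # [39]
```

Requiring both criteria removes windows such as index 10 in the toy data, which show reduced diversity in one group without strong differentiation between groups.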
To identify biological functions of candidate genes associated with selection, annotation and enrichment analysis were performed by submitting gene information to Gene Ontology (GO) using the agriGO online platform (http://bioinfo.cau.edu.cn/agriGO/) [51]. A hypergeometric distribution adjusted by false discovery rate (FDR) was employed to check the relationship of different genes in identical GO terms and return a P-value for each term. GO terms with P-value < 0.05 were taken as those in which candidate genes were significantly enriched.
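The enrichment test itself can be illustrated with a short sketch: an upper-tail hypergeometric probability for one GO term, followed by a Benjamini-Hochberg adjustment across terms. The agriGO platform performs these steps internally; the code below is only an assumed, simplified equivalent with hypothetical function names and made-up numbers, and Benjamini-Hochberg is assumed as the FDR procedure.

```python
from math import comb


def hypergeom_upper_tail(k, background, annotated, candidates):
    """P(X >= k) when drawing `candidates` genes from `background` genes,
    of which `annotated` carry the GO term."""
    total = comb(background, candidates)
    upper = min(annotated, candidates)
    hits = sum(comb(annotated, i) * comb(background - annotated, candidates - i)
               for i in range(k, upper + 1))
    return hits / total


def benjamini_hochberg(pvalues):
    """FDR-adjusted P-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy term: 10 background genes, 3 annotated, 2 candidates, at least 1 hit
p_term = hypergeom_upper_tail(1, background=10, annotated=3, candidates=2)
print(round(p_term, 4))  # 0.5333

# Toy P-values for four terms
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
print([round(p, 4) for p in adjusted])  # [0.04, 0.0533, 0.0533, 0.5]
```

Terms would then be reported as enriched when the adjusted value falls below the 0.05 cutoff stated above.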
GWAS was performed with TASSEL for seven traits associated with flowering time and photoperiod sensitivity: DTT, DTS, DTA, ASI, RD_DTT, RD_DTS, and RD_DTA. SNP loci were selected for identification of candidate genes based on a threshold of P ≤ 0.0001. To control false positives, a mixed linear model (MLM, PCA + K) was applied in GWAS to the temperate and tropical maize groups together. Population structure was determined by PCA, and the kinship matrix (K) was treated as a random effect in the MLM.
Haplotype loci were identified from the high-quality set of 39,350 SNPs using PLINK, following Haploview's interpretation of haplotype blocks as recommended by Gabriel et al. [52]. The parameters were set to "--allow-extra-chr --blocks no-pheno-req --blocks-max-kb 1000 --geno 0.2 --blocks-strong-lowci 0.8 --out --vcf", which means that two sites are considered to be in a block of LD if the bottom of the confidence interval of r² is greater than 0.8. The R package GHap was employed to call haplotypes and identify haplotype alleles across all inbred lines. A haplotype genotype matrix, generated by the ghap.haplotyping program, was transformed to PLINK format with the ghap.hap2tped program and converted to VCF format for haplotype-based GWAS. Phenotypic variation explained was estimated for both SNPs and haplotypes using marker R², which takes genetic background and population structure into account [53,54]. For each trait related to flowering time or photoperiod sensitivity, multiple regression analysis was carried out with the trait as the dependent variable and the significantly associated haplotype loci as predictors.
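The idea of compressing the SNPs of one LD block into a single multi-allelic haplotype locus can be sketched as follows. This is not the GHap algorithm: it simply assumes fully homozygous inbred lines with one allele per SNP, so that the haplotype allele of a line is the string of its SNP alleles inside the block. The line names, sequences, and function names are hypothetical.

```python
from collections import Counter


def haplotype_alleles(snp_calls, block):
    """Haplotype allele of each line for one block.

    snp_calls: dict mapping line name to a string of SNP alleles.
    block: (start, end) SNP indices, end exclusive.
    """
    start, end = block
    return {line: alleles[start:end] for line, alleles in snp_calls.items()}


def allele_frequencies(hap_calls):
    """Frequency of each haplotype allele across lines."""
    counts = Counter(hap_calls.values())
    total = sum(counts.values())
    return {hap: count / total for hap, count in counts.items()}


# Toy SNP calls for four inbred lines (made-up data)
toy_calls = {
    "line1": "ACGATT",
    "line2": "ACGATT",
    "line3": "ACGTTA",
    "line4": "TCGTAA",
}
block_calls = haplotype_alleles(toy_calls, block=(2, 6))
print(block_calls)
print(allele_frequencies(block_calls))
```

In this toy block the four SNPs yield three haplotype alleles, which shows how a haplotype locus can carry more than the two alleles available at a single SNP.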
Table 1 shows phenotypic means for maize flowering time (DTT, DTS, DTA, and ASI) and photoperiod sensitivity (relative difference index traits RD_DTT, RD_DTS, and RD_DTA). Tropical inbreds had relatively long flowering times. With decreasing latitude, the number of days to flowering was shortened for both temperate and tropical inbred lines. Compared with temperate maize, tropical maize was relatively sensitive to photoperiod, as revealed by both comparisons (Shunyi vs. Sanya and Xinxiang vs. Sanya), and temperate maize showed a greater difference in longer days and lower temperature during the crop season (Fig. S1, Fig. S2). The broad-sense heritability estimates of flowering time and photoperiod sensitivity were moderate to high except for ASI (Table 1). Both flowering time and photoperiod sensitivity showed wide genetic variation for GWAS and associated analyses.
The inbred lines could be clearly divided into two major groups, temperate and tropical, although a few lines were mismatched in the neighbor-joining tree (Fig. S3). PCA visualization using the first three eigenvectors showed a similar classification result (Fig. S4). The levels of LD were attenuated to half of their maximum values at 40 kb for the tropical and 120 kb for the temperate group (Fig. S5). The mean distance of LD decay in the temperate group was thus three times that in the tropical group, supporting the view that tropical maize has undergone more extensive recombination and retains more genetic diversity.
FST, θπ ratio, and Tajima's D across the whole maize genome are shown in a Circos plot (Fig. S6). A total of 106 putative selective-sweep regions were identified (Fig. 1), each containing on average four genes in a region of mean size 208.7 kb, covering in total 1.07% of the maize genome. These regions were validated by a significantly lower level of Tajima's D (P-value < 2.2 × 10⁻¹⁶) in the temperate maize group.
Table 1-Phenotypic means and heritability for flowering time and photoperiod sensitivity.
The identified selective-sweep regions contained 423 protein-coding genes, linked with only 53 gene models. Taking a maize genome annotated with GO as background reference, 37 significant GO terms were detected in the candidate genes of selective-sweep regions, of which 30 were categorized as biological process, 6 as molecular function, and 1 as cellular component (Fig. 2). GO enrichment analysis indicated that candidate genes from selective-sweep regions in temperate maize were significantly enriched in biosynthesis and regulation of biological process pathways. Specifically, three genes identified for flowering in previous studies [55,56] were overrepresented in multiple GO terms. Both GRMZM2G124421 and GRMZM2G129034 displayed overrepresentation in GO:0003700 (transcription factor activity), GO:0005634 (nucleus), and GO:0006355 (regulation of transcription), while GRMZM2G180555 showed a strong overrepresentation in GO:0005634 (nucleus) and GO:0003677 (DNA binding). The identification of candidate genes within selective signatures and the enrichment analysis of these genes show that candidate genes associated with biological regulation and biosynthesis pathways, especially for regulation of flowering time, experienced more selection in the process of differentiation between temperate and tropical maize groups.
Fig. 1 - Genome-wide selective sweep analysis of the temperate maize group. Distributions of the θπ ratio (θπ,tropical/θπ,temperate) and FST values were calculated in 10-SNP sliding windows with 2-SNP steps. The horizontal and vertical lines represent threshold lines of the top 5% of the FST and θπ ratio values, respectively. Points (red) located in the top right sector represent selective signatures of the temperate maize group. Blue and red bins in the histograms of FST (right) and θπ ratio (top) represent levels respectively higher and lower than the threshold line.
Previous studies have shown that the processes of maize improvement and adaptation are not strictly sequential but overlap [57]. Several genes associated with flowering time have been reported in recent studies [55,56]. Comparison with the physical loci of previously identified flowering-time genes revealed that 25 known flowering genes overlapped the selective-sweep regions identified in this study (Fig. 3, Table S2).
A 160-kb genomic region around each significant SNP from the 63 moderate GWAS signals was analyzed for DTT, DTS, DTA, and ASI (Fig. S7). A total of 16 genes involved in flowering time were found to overlap with previously reported genes identified in GWAS (Table S3). Comparison of each GWAS-identified locus with selective-sweep regions, as described for soybean [41], cotton [58], and sesame [59], revealed 35 candidate genes that may regulate flowering time, of which 27 were associated with ASI and eight with DTT, and two (GRMZM2G015384 and GRMZM2G031447) overlapped known flowering-time genes (Fig. 3a, Table 2). The allele frequency distribution and nucleotide diversity at the significant GWAS-identified loci gave similar results, indicating that the loci identified by both selective signatures and GWAS signals have undergone selection.
A previous study [60] suggested that several traits (DTS, DTT, DTA, ASI, plant height, and ear height) reliably reflect the photoperiod-sensitive characteristics of different maize groups, so that these traits can be employed as reliable indicators of photoperiod sensitivity, measured by their relative difference under different day lengths. Diverse ecological types and various traits in maize germplasm have different levels of photoperiod sensitivity, indicating its complexity. In the present study, RD_DTT, RD_DTS, and RD_DTA measured for temperate maize under two contrasting day-length conditions, in Shunyi and Sanya, were significantly lower than those for tropical maize (Welch's t-test with P-values of 7.051 × 10⁻¹³, 2.828 × 10⁻¹³, and 1.296 × 10⁻¹¹, respectively). However, no significant difference was found between temperate and tropical maize groups under the other pair of contrasting conditions, Xinxiang, Henan and Sanya. Under the Shunyi vs. Sanya contrast, 6, 5, and 7 candidate loci were identified by GWAS to be associated with RD_DTT, RD_DTS, and RD_DTA, respectively (Fig. S8a). In the Xinxiang vs. Sanya contrast, however, only 1, 2, and 3 SNPs were identified for RD_DTT, RD_DTS, and RD_DTA, respectively (Fig. S8b). No significant SNPs were shared between the two sets of GWAS. When a 160-kb window around each significant SNP was used to identify candidate genes, two candidate genes were identified by both GWAS and selective-sweep regions (Table 2). These results can be used to functionally characterize candidate genes associated with flowering time and photoperiod sensitivity for improvement of tropical maize.
Fig. 2 - GO enrichment analysis of candidate genes of selected regions in the temperate group. Blue, green, and red bars indicate biological process, molecular function, and cellular component, respectively.
From 39,350 SNPs, 4166 haplotype loci were constructed, each containing 2 to 75 SNPs, with means of 3.5 SNPs and 2.8 alleles per haplotype locus. Haplotype-based GWAS was performed for four flowering traits and three photoperiod sensitivity traits using the data collected in three locations. Using a Q+K model, we consistently identified 16 different haplotype loci surpassing the significance threshold (-lg P ≥ 4). Of 159 candidate genes within the significant haplotype loci, only eight overlapped known genes associated with flowering time (Table S3).
Multiple regression revealed that significant haplotype loci explained 18.6%, 17.5%, and 17.7% of phenotypic variation at the 0.01 significance level for DTT, DTS, and DTA, respectively (Table 3). For photoperiod sensitivity, however, significant haplotype loci explained 14.8%, 11.2%, and 15.5% of the phenotypic variation for RD_DTT, RD_DTS, and RD_DTA, respectively (Table 4).
Table 3-Multiple regression models for DTT, DTS, and DTA determined by haplotype loci significantly associated with flowering time.
Table 4-Multiple regression models for RD_DTT, RD_DTS, and RD_DTA determined by haplotype loci associated with photoperiod sensitivity.
No significant haplotype loci were shared between flowering time and photoperiod sensitivity, indicating that haplotype loci associated with the two trait types were independent of one another. For seven haplotype loci significantly associated with at least two traits of flowering time and photoperiod sensitivity, the great majority of haplotype alleles showed the same effect direction(Table 5).
For the four haplotype loci that were identified using multi-location data for multiple traits, HapL499 on chromosome 1, significantly associated with flowering time, contained one gene (Fig. 4a), while HapL978 on chromosome 2, significantly associated with photoperiod sensitivity, contained five genes (Fig. 4b), none of which has been reported previously except for GRMZM2G166337, annotated as a transcription factor. No candidate genes were found within HapL4054 and HapL4055 on chromosome 1, which were significantly associated with photoperiod sensitivity. The inbred lines carrying GATT at HapL499 had relatively short DTT, DTS, and DTA compared to the other three haplotype alleles (Table 5). HapL4054 and HapL4055 were significantly associated with RD_DTT, RD_DTS, and RD_DTA. CA and CG at HapL4054 and TA and TG at HapL4055 showed opposite effects on photoperiod sensitivity. Inbred lines carrying CA and TA were less sensitive to photoperiod than those carrying other haplotype alleles at the same haplotype locus (Fig. 5). Thus, haplotype alleles GATT, GTTGT, CA, and TA, which had negative effects on flowering time and photoperiod sensitivity, can be considered favorable haplotypes for shortening flowering time and reducing photoperiod sensitivity. The temperate group showed a higher frequency of the haplotype alleles GATT, GTTGT, CA, and TA than the tropical group.
Table 5-Significant haplotype loci associated with flowering time and photoperiod sensitivity.
Also, as shown by the GWAS results for DTT, haplotype-based GWAS increased the proportion of phenotypic variation explained in comparison with SNP-based GWAS (Fig. 6). Similar results were found for the other traits (Fig. S9). Haplotype-based GWAS provides some favorable haplotype alleles for introduction into tropical maize and might identify specific haplotypes associated with phenotypic traits by detecting associations between traits and alleles that depend on cis interactions with other loci.
Fig. 4 - Identification of candidate genes as shown by the peaks on chromosomes in GWAS for DTA in HapL499 on chromosome 1 (a) and RD_DTT in HapL978 on chromosome 2 (b). Local Manhattan plot (top), candidate gene structures and names located in the haplotype locus (middle), and LD heatmap (bottom).
Fig. 5 - Boxplots for RD_DTT, RD_DTS, and RD_DTA based on haplotypes for HapL4054 (left) and HapL4055 (right). Box edges represent the 25% quantile (top) and 75% quantile (bottom), while the black bold line represents the median value. Statistical significance was determined by Welch's t-test.
Fig. 6 - Comparison of power (phenotypic variation explained) in GWAS based on single SNP markers (right) and haplotype loci (left).
Maize is one of the most diverse plants at both genetic and morphological levels [61-63]. Previous studies have revealed that the genetic diversity of the breeding pool of elite temperate maize germplasm has declined over the past century [64]. However, the decline of genetic diversity can be mitigated by use of untapped sources, including landraces and wild relatives [65]. Tropical and subtropical maize germplasm resources with a wide range of genetic variation in biotic and abiotic stress tolerance are valuable for maize improvement, particularly for temperate maize. However, the introduction and cultivation of tropical maize has been hindered by its lack of adaptability to temperature and photoperiod [5]. In this study, we identified candidate genes and favorable haplotypes using selective signature analysis and GWAS, which can be applied in molecular breeding to exploit germplasm resources from different regions.
Teosinte, a presumed ancestor of maize, probably evolved ecotypes with photoperiod sensitivity to coordinate its reproductive stages with water availability and short-day environmental conditions [66,67]. Maize reduced its photoperiod sensitivity to adapt to long-day conditions when dispersing from its original area to temperate latitudes [68]. Thus, photoperiod sensitivity is an important factor for widening the planting area of diverse maize varieties. A study [13] of rice response to temperature and photoperiod showed that four genome regions associated significantly with DTH had much lower logarithm of odds (LOD) scores for photo-thermo sensitivity (PTS) than a threshold line, while a PTS QTL region similarly showed a much lower LOD score for DTH than the threshold line. This result indicates that the PTS QTL is independent of the DTH QTL. Mining candidate genes associated with photoperiod sensitivity will help to understand the genetic changes during domestication and improvement and contribute to reducing the barriers to use of tropical germplasm [5,69,70]. In the present study, SNP- or haplotype-based GWAS identified 11, 9, and 15 significant loci for RD_DTT, RD_DTS, and RD_DTA, respectively. Candidate genes within significant signal regions could be deployed to overcome the barrier of photoperiod sensitivity and improve temperate maize using tropical germplasm resources.
XP-EHH, a haplotype-based algorithm, was developed [33] to identify alleles that are recently fixed or at high frequency in selective-sweep regions via cross-population comparison. The XP-EHH procedure is insensitive to background selection and thus constitutes a less confusing method for systematic detection of positive selection in a genome [71]. In comparison with FST, however, the XP-EHH algorithm requires a denser set of SNP markers. Both XP-CLR and FST identify signatures of domestication and improvement by comparing allele frequency between two or more populations. However, the XP-CLR algorithm was developed [72] to reveal "ancient" selective signatures from several thousand years ago. The combined use of FST and θπ ratio has been shown [36,73] to be an effective method for identifying selective signals, especially when functional regions responsive to specific environmental conditions are being sought. In the present study, FST and θπ ratio were employed to investigate genome variation between temperate and tropical maize groups. By comparing the 106 selective-sweep regions identified for temperate maize in this study with previously reported features (466 involved in maize domestication and 573 involved in maize improvement) [74], we found that 15 selective-sweep features overlapped domestication features and eight overlapped improvement features (Table S4), supporting the reliability of our study.
The power of association analysis for detecting positive associations between SNPs and traits can be shown by explained phenotypic variation within a population [16]. Phenotypic variance of a specific trait is determined by phenotypic effect and frequency differences among different allelic variants in the test population. SNPs afford a limited number of alleles per locus, in turn limiting the polymorphism information content at each locus [75-77], whereas haplotype data contain multiple haplotypes (2-9) at each haplotype locus and a higher level of allele diversity. Accordingly, in this study the use of haplotypes increased the amount of phenotypic variation explained, as indicated by GWAS for DTT, supporting the advantages of haplotype-based association analysis. SNP-based GWAS identified 87 significant SNPs above threshold for seven traits, many more than the 16 significant haplotype loci identified by haplotype-based GWAS at the same threshold. This result is due largely to the fact that haplotype data have fewer loci, requiring a lower threshold value. In addition, haplotype length was probably underestimated for the temperate maize group but overestimated for the tropical maize group when haplotype loci were identified using the entire sample. In selecting an improved threshold of linkage disequilibrium (r² ≥ 0.8), we expected that bias, if any, would be greatly reduced. A previous report [4] recommended the combination of SNP- and haplotype-based GWAS as a better option for identifying associated signals than either alone.
In the process of domestication and improvement, long-term natural and artificial selection in maize has resulted in high genetic variation associated with yield, plant type, flowering time, and photoperiod sensitivity, and has also left differential footprints on the genome across germplasm groups. Selective signature analysis focuses on detecting signatures of positive natural and artificial selection and mining candidate genes [78]. Although several studies [37,72,79] have searched for selective signatures using diverse algorithms, few candidate genes associated with specific traits have been identified accurately. Given that the selective signatures identified have spanned several or even dozens of kilobases, it may not be possible to identify all candidate genes within selective signatures. GWAS provides a powerful tool to reconnect phenotypic traits back to their underlying genetic factors [75]. Although GWAS is a sensitive means of identifying candidate genes that have experienced selection, selective signature analysis is preferred and can be used to overcome this limitation. In the present study, both selective signature analysis and GWAS were used to identify candidate genes, of which 35 were associated with flowering time and two with photoperiod sensitivity.
Generally, the process of domestication and improvement reduces effective population size and levels of genetic diversity [80,81]. However, we found that the genetic diversity of the temperate maize group (π = 5.0 × 10⁻⁵) was slightly higher than that of the tropical maize group (π = 4.7 × 10⁻⁵), a finding inconsistent with our PCA result. In the PCA plot, only the top two or three principal components could be selected for visualization, and the cumulative proportion of variance explained was less than 15%, possibly resulting in biased estimates of genetic diversity. Additionally, the temperate maize group included diverse materials from multiple breeding programs, whereas the tropical maize group was selected largely from CIMMYT. LD decay, usually described by the decay distance, which reflects the extent of domestication and the intensity of selection, differed between the two groups. When the r² threshold was set to 0.1, the decay distance in temperate germplasm increased to 380 kb, compared with 80 kb in tropical germplasm, indicating much faster LD decay in tropical maize. Thus, the extent of selection in temperate maize was much greater than that in tropical maize, a finding consistent with the fact that temperate maize has undergone stronger artificial selection in modern breeding. Also, LD decayed to r² = 0.1 within a physical distance of 160 kb, a greater distance than that found in an earlier study [82]. The difference between the two LD decay distances may be ascribed to the difference in genome coverage of markers. LD decay varies from group (population) to group (population), especially under different levels of selection intensity. For example, the extents of LD decay in maize landraces, maize cultivars from a broad range of sources, and elite maize inbreds are 1 kb [83], 1.5 kb [84], and 100 kb [82], respectively. Commonly, LD decay is expected to be faster in outcrossing than in self-pollinated species. Accordingly, LD decay distances in two self-pollinated species, Arabidopsis thaliana and rice (Oryza sativa L.), were 50 kb [85] and 100 kb [86], respectively.
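The notion of a decay distance at a given r² threshold can be made concrete with a minimal sketch. It assumes pairwise r² values have already been computed, bins marker pairs by physical distance, and reports the first bin whose mean r² falls below the threshold. The function name, the 20-kb bin size, and the toy decay curve r² = 1/(1 + d/c) are assumptions for illustration and do not reproduce the estimates reported here.

```python
def ld_decay_distance(pairs, threshold=0.1, bin_size_kb=20):
    """Estimate the distance at which mean r^2 first drops below `threshold`.

    pairs: iterable of (distance_kb, r2) for marker pairs.
    Returns the upper edge (kb) of the first distance bin whose mean r^2
    is below the threshold, or None if no bin falls below it.
    """
    sums, counts = {}, {}
    for dist, r2 in pairs:
        b = int(dist // bin_size_kb)
        sums[b] = sums.get(b, 0.0) + r2
        counts[b] = counts.get(b, 0) + 1
    for b in sorted(sums):
        if sums[b] / counts[b] < threshold:
            return (b + 1) * bin_size_kb
    return None


def toy_pairs(c):
    """Synthetic (distance_kb, r2) pairs from a toy curve r2 = 1 / (1 + d / c)."""
    return [(d, 1.0 / (1.0 + d / c)) for d in range(10, 500, 20)]


# Two hypothetical groups: a larger c means slower decay.
slow_group = ld_decay_distance(toy_pairs(25))
fast_group = ld_decay_distance(toy_pairs(5))
print(slow_group, fast_group)
```

With real data the result depends on marker density and bin size, which is one reason decay distances from studies with different genome coverage are not directly comparable.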
Concerns about the reduction of genetic diversity in commercial hybrids and the depletion of genetic diversity in gene banks make it imperative to introduce exotic germplasm to broaden germplasm resources [87]. Compared with conventional backcrossing and phenotypic selection, rapid-cycle genome-wide marker-assisted selection strategies, such as GS and marker-assisted recurrent selection (MARS), have been shown to be faster and more effective in incorporating favorable alleles and haplotypes [88]. Furthermore, the development of sequencing technology has greatly reduced the cost and time required for genotyping, making it possible to implement MARS and GS on a large scale. Favorable haplotypes for flowering time and photoperiod sensitivity identified in the present study might provide a basis for introducing tropical maize to temperate regions.
An understanding of the genetic determinants of flowering time and photoperiod sensitivity is usually considered to be a prerequisite for successful exchange of germplasm resources across regions adapted to different latitudes. We identified selective signatures, candidate genes, and favorable haplotypes influencing flowering time and photoperiod sensitivity using selective signature analysis and GWAS in large populations. These genes and haplotypes, with different allele distributions across germplasm groups or populations, provide a resource for understanding genetic differences in the genome and for improving both temperate and tropical maize. Also, the rapid LD decay in the tropical group suggests that it might contain more rare alleles and have undergone more recombination. GO enrichment analysis showed that these genes were mainly enriched in GO categories associated with biological regulation and biosynthesis pathways, indicating that genetic changes between temperate and tropical maize groups have occurred mainly in genomic regions influencing biological regulation and biosynthesis.
Declaration of competing interest
Authors declare that there are no conflicts of interest.
Acknowledgments
This research was supported by the National Key Research and Development Program of China (2016YFD0101803), the Agricultural Science and Technology Innovation Program (ASTIP) of CAAS, and Fundamental Research Funds for Central Non-Profit of Institute of Crop Science, CAAS (1610092016124). Y. Xu was also supported by the Bill and Melinda Gates Foundation and the CGIAR Research Program MAIZE.
Appendix A. Supplementary data
Supplementary data for this article can be found online at https://doi.org/10.1016/j.cj.2019.09.012.