Xiao Zhan, Zhiyon Rn, Bown Luo, Haixu Zhon, Pn Ma, Honkai Zhan,Honmi Hu, Yikai Wan, Haiyin Zhan, Dan Liu, Lin Wu, Zhi Ni, Yonhui Zhu,Wnzhu H, Suzhi Zhan,f, Shunzon Su, Yaou Shn,f, Shibin Gao,f,*
a Maize Research Institute, Sichuan Agricultural University, Chengdu 611130, Sichuan, China
b Key Laboratory of Biology and Genetic Improvement of Maize in Southwest Region, Ministry of Agriculture, Chengdu 611130, Sichuan, China
c Guangxi Qingqing Agricultural Technology CO., LTD., Nanning 530000, Guangxi, China
d Biotechnology and Nuclear Technology Research Institute, Sichuan Academy of Agricultural Sciences, Chengdu 610066, Sichuan, China
e Crop Research Institute, Sichuan Academy of Agricultural Sciences, Chengdu 610066, Sichuan, China
f State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Chengdu 611130, Sichuan, China
g College of Agronomy, Sichuan Agricultural University, Chengdu 611130, Sichuan, China
Keywords:Maize Yield traits Genome-wide association study (GWAS)Quantitative trait locus (QTL)Coexpression networks
ABSTRACT The study of yield traits can reveal the genetic architecture of grain yield for improving maize production.In this study,an association panel comprising 362 inbred lines and a recombinant inbred line population derived from X178×9782 were used to identify candidate genes for nine yield traits.High-priority overlap (HPO) genes, which are genes prioritized in a genome-wide association study (GWAS), were investigated using coexpression networks. The GWAS identified 51 environmentally stable SNPs in two environments and 36 pleiotropic SNPs, including three SNPs with both attributes. Seven hotspots containing 41 trait-associated SNPs were identified on six chromosomes by permutation. Pyramiding of superior alleles showed a highly positive effect on all traits, and the phenotypic values of ear diameter and ear weight consistently corresponded with the number of superior alleles in tropical and temperate germplasm. A total of 61 HPO genes were detected after trait-associated SNPs were combined with the coexpression networks. Linkage mapping identified 16 environmentally stable and 16 pleiotropic QTL.Seven SNPs that were located in QTL intervals were assigned as consensus SNPs for the yield traits.Among the candidate genes predicted by our study, some genes were confirmed to function in seed development.The gene Zm00001d016656 encoding a serine/threonine protein kinase was associated with five different traits across multiple environments.Some genes were uniquely expressed in specific tissues and at certain stages of seed development.These findings will provide genetic information and resources for molecular breeding of maize grain yield.
Maize(Zea mays L.)is a worldwide staple crop.The use of fertilizer has drastically increased grain yield over the past century.However, because excessive application of fertilizer can lead to environmental pollution, the genetic improvement of grain yield traits is a preferred way to increase maize production. Grain yield is a complex quantitative trait with low heritability controlled by multiple quantitative trait loci(QTL)[1,2].Since a large proportion of phenotypic variation is thus due to interaction between genotype and environment, measuring grain yield in multiple environments is necessary for identifying reliable gene loci[3].Grain yield can be partitioned into components:cob diameter(CD),cob weight(CW), ear weight (EW), ear length (EL), ear diameter (ED), 100-kernel weight(HKW),kernel number per row(KN),and row number (RN). These yield traits are more highly heritable than grain yield [4,5]. Dissecting the genetic mechanisms that underlie these yield components is accordingly a common strategy in grain yield research.
QTL linkage mapping is a conventional method for identifying genes underlying the quantitative traits. Many QTL for yield traits have been identified in bi-parent population.Some of these QTL are consistent across diverse environments. Using multiple F2:3populations in different environments, many QTL with high environmental stability were detected for kernel traits and other yield components [3,6]. With the development of sequencing technology, high-density SNPs are now used to construct genetic maps for increasing mapping precision. Using a high-density genetic map, QTL for RN and EL were confined to an interval of several centi-Morgans(cM) [7,8]. In a set of 10 RIL populations were used to dissect ear genetic architecture, 17–34 minor- and moderateeffect loci were found,accounting for 55%–82%of phenotypic variation [5]. QTL, such as KRN4 [9], qKRN8 [10], and qGW4.05 [11],qhkw5-3 [12], have been fine-mapped. Two candidate genes from fine-mapped QTL have been validated: krn1 is a major QTL that explained 22% of RN variation, with the candidate gene encoding an AP2-domain protein that regulates spikelet pair meristem development [13]. KNR6 is a serine/threonine protein kinaseencoding gene that determines pistillate floret number and EL.An absence of transposable elements in its regulatory region can significantly enhance grain yield [14].
GWAS is an efficient method for identifying candidate genes that underlie the yield traits. Because there is more variation in an association panel than in a bi-parental population,more candidate genes can be identified in a single experiment.A GWAS study using a panel of 292 inbred lines revealed 20 SNPs associated with ear traits[15].By combining GWAS and QTL mapping,many studies have concentrated on yield-related traits to extract more reliable QTL. Combining linkage mapping in a B73 × Mo17 population and a GWAS in an association panel identified consensus SNPs of yield traits[16],kernel test weight[17],and kernel size[18]. The candidate genes associated with kernel size were validated in Arabidopsis [18]. Another study used QTL linkage and association mapping jointly to characterize the genetic architecture of maize ear and grain morphological traits, identifying 26 environmentally stable QTL and six environmentally stable SNPs across multiple environments [19]. However, some limitations,such as population structure and linkage disequilibrium (LD), can reduce accuracy and precision, leading to false-positive identifications[20]. LD is the non-random association of alleles from different loci[21].The LD decay distance reflects the power of the GWAS and can identify a broad region that includes many candidate genes surrounding an identified locus.This complicates the confirmation of candidate gene in GWAS. To mitigate this issue, a study introduced high-priority overlap(HPO)genes,which are the prioritized candidate genes predicted using a computational approach integrating GWAS-identified loci and a coexpession network [22].This method was used in GWAS for the accumulation of 17 different elements in maize seed [22].
The object of this study was to identify genes that regulate yield-related traits, using an association panel of 362 inbred lines widely applied in Southwest China [23], and a RIL population derived from two variant inbred lines, namely X178 and 9782[24].The approach included the coexpression networks for increasing the reliability of the candidate genes identified by GWAS.Public transcription atlases related to seed development for understanding the expression patterns of the candidate genes.
An association panel was constructed by 362 inbred lines from the current Southwest China breeding program. The genotyping,population structure, and LD decay distance have been described in our previous report[23].The RIL population comprised 262 lines derived from two maize inbred lines, namely X178 and 9782. The genotyping information was also introduced in our previous study[24]. For the association panel, 239 inbred lines were planted at Chongzhou farm of Sichuan Agricultural University, Sichuan (CZ,plains region)in 2015,and 222 inbred lines were grown at Hongya,Sichuan(HY,mountainous region)in 2016.For the RIL population,all inbred lines were cultivated at Wenjiang, Sichuan (WJ, plain region) in 2013, Fenjiang farm of Sichuan Agricultural University,Ya’an, Sichuan (FJ, mountainous region) in 2014, Bifengxia farm of Sichuan Agricultural University,Ya’an,Sichuan(BFX,mountainous region) in 2014, and CZ in 2015. The field experiments were designed using a randomized complete block design of single row plots in two replications. The plots were 3 m in length with 0.75 m spacing between rows. Plants were managed using standard cultivation practices.
The ears of each inbred line were harvested simultaneously following full maturity. Five consistent-growth ears in each replication were selected for phenotypic data collection after thorough drying. Yield traits including EL, ED, CD, CW, EW, HKW, KN, RN,and grain yield per plant (Y) were measured. HKW was recorded as the mean weight of three replicates of 100 randomly selected kernels from each inbred line.
For a single environment,the mean value from two replications of each inbred line was calculated as phenotypic data for descriptive statistics.The data analysis module of Excel was used to determine the standard deviation (SD), coefficient of variance (CV),kurtosis, skewness, minimum, and maximum. Pearson’s correlation (r) for each pair of traits was calculated using the ‘‘Hmisc”[25] and ‘‘corrplot” [26] packages in R software [27]. The phenotypic variation of yield traits was evaluated by ANOVA. Genotype(G), environment (E), the interaction between genotype and environment (G × E), and replication were fitted with a general linear model as below:
where yijkis the phenotypic value of the ith inbred line in the jth environment,and the kth replication,μ is the overall mean of a trait,Giis the genetic effect of the ith inbred line,Ejis the environmental effect of the jth environment, Gi× Ejis the interaction between genotype and environment for the ith inbred line and jth environment, R(E)jkis the kth replication within the jth environment, and εijkis the residual error.Broad-sense heritability(h2B)was calculated as follows:
where σ2Gis the genetic variance,σ2GEis the variance of G × E,σ2∈is the variance of residual error, nEis the number of environments,and nRis the number of replications. Best linear unbiased predictions (BLUPs) were calculated with a mixed linear model (lme4)package in R [28].
The genotype data of the association panel included 56,110 SNPs, which were filtered by missing, minor allele frequency(MAF),and physical position[23].After filtering,43,735 SNPs were extracted for GWAS.A multidimensional scaling(MDS)matrix and a kinship matrix were imported as covariates. The physical position of each SNP was converted into B73 Ref_v4. Six multi-locus GWAS algorithms were applied using the mrMLM.GUI package(Version 4.0) in R software, including mrMLM [29], FASTmrMLM[30], FASTmrEMMA [31], pLARmEB [32], pKWmEB [33], and ISIS EM-BLASSO [34]. Because mrMLM is a multi-locus method,LOD ≥3.0 was set as a threshold for significance [29]. The phenotypic data of the nine traits were subjected to GWAS analysis.Reliable and significant SNPs detected using at least two methods were collected for subsequent study. Significant SNPs detected in more than one environment were considered to be environmentally stable,and pleiotropic SNPs associated with at least two traits were also extracted. Candidate genes were annotated according to B73 Ref_v4 in the UniProt database (https://www.uniprot.org/) to predict gene function. Superior and inferior alleles were determined according to the mean value of the corresponding traits of each significant SNP.Alleles showing higher mean values were assigned as superior alleles, and those with lower mean values as inferior.
To exploit SNP hotspots of GWAS,the significant SNPs detected using multiple methods from all nine traits were collected for 1000 genome-wide permutations [35]. To discover hotspots across the whole genome, the genome was scanned in 1-Mb window size and 100-kb step size. For each permutation, significant SNPs were randomly assigned to windows, SNPs in each window were counted,and the maximum value among the windows in each permutation was recorded.After permutations,the 95th percentile of the maximum values was set as the threshold for hotspots. Windows with counts greater than the threshold were identified as SNP hotspot windows.Overlapping hotspot windows were further collapsed into SNP hotspots.
Reliable SNPs detected using at least two methods were imported into Camoco software [22]. Three RNA-seq datasets derived from 503 maize seedlings [36], 76 different tissues/timepoints of B73 [37], and 46 genotypically diverse maize root samples [22] named respectively ZmPAN, ZmSAM, and ZmRoot were imported as network data.Significant SNPs were mapped to genes using three window sizes:50,100,and 500 kb,flanking gene limits were set to one,two,and five genes,and gene-specific density and locality were both calculated for each trait in three networks.Highconfidence discoveries were defined as HPO genes at a false discovery rate ≤30%in at least two SNP-to-gene mapping parameter settings among ‘‘Total”, ‘‘Density”, ‘‘Locality”, and ‘‘Both”.
When she is hungry, she is upset in the tank, getting down and up, go and back, hidden and out, looked pitiful and needed help. Sometimes she stretches her head out of the salt water, the body is stiff and looked like an SA missile ready to launch.
To construct a linkage map, 693 markers, including 146 highquality SSRs and 547 SNPs were genotyped in the RIL population composed of 173 individuals. QTL Icimapping V4.2 [38] was used for linkage map construction and QTL mapping. The grouping,ordering, and rippling parameters were fixed to the default settings. Markers were manually removed if the genetic distance between adjacent markers was larger than 30 cM.After the linkage map was constructed, the mean values of replications and BLUP values for each trait were imported as phenotypic data. The ICIM-ADD mapping method [39] was used for QTL mapping. The missing phenotype was set as ‘‘Deletion”, and the LOD threshold was set to 2.5 for QTL detection. As in the GWAS, QTL detected in more than one environment were considered to be environmentally stable, and pleiotropic QTL affecting at least two traits were extracted.
The transcriptome atlas of early seed development of the B73 inbred line from 0 to 144 h after pollination (HAP) [40], the dynamic transcriptome of the embryo from 10 to 38 days after pollination (DAP), and endosperm from 6 to 38 DAP [41] were integrated for the expression pattern analysis. Candidate genes were removed if they were not expressed in all three atlases.Transcript per million mapped reads (FPKM) was normalized by the maximum FPKM value of all time points for each gene. The normalized FPKM was imported into MeV (V4.9) [42] for heatmap plotting.Hierarchical clustering was performed using the HCL method[43].
The characterization of nine yield traits indicated a large variation in two populations for each trait, and the distribution of each trait was largely consistent with the normal distribution observed in violin plots (Figs. S1–S2). The coefficient of variance (CV)between the RIL population and the association panel was calculated to estimate the level of variation. In the RIL population, EW in WJ showed the highest CV(42.58%),whereas ED in BFX showed the lowest CV (8.43%). In the GWAS panel, Y in HY showed the highest CV (46.49%), and ED in CZ showed the lowest (10.97%).Overall, complex traits such as EW and Y consistently displayed higher CVs in both populations.For the same traits,in the two populations,the CVs for the association panel were slightly higher than those for the RIL population, but there were still large differences in multiple environments. Broad-sense heritability (h2B) ranged from 67.54% to 87.25% for the association panel and from 71.51%to 88.01% for the RIL population. RN showed the highest, and Y showed the lowest h2Bin both populations. Comparing the same traits between the two populations, most traits (EL, EW, HKW,KN, RN, and Y) showed larger h2Bin the RIL population, but the h2Bof CD, CW, and ED in the RIL population was lower than that in the association panel(Fig.1a;Table S1).The proportions of phenotypic variation were shown in Fig. 1b, c. Genotypic variation of each trait in the association panel could explain the most, while some of the traits in the RIL population explained less than 40%phenotypic variation. The relatively high heritability suggested that most variation in these traits was genotypic, indicating that the traits were suitable for GWAS and QTL mapping.
To understand the relationship between the yield traits, Pearson’s correlations of BLUP values were performed in the two populations (Table S2; Fig. 2a, b). Most traits were significantly correlated at P <0.05. Y and EW were positively correlated with all traits in both populations.However,HKW was negatively correlated with KN and RN in the RIL population but not in the association panel; EL was negatively correlated with ED but positively correlation in the RIL population. Closely correlated traits such as EL and KN,ED,and RN,showed a positive correlation.These results indicated that the yield architecture was complex, and yield traits were interacted with each other.
Fig. 2. Pearson correlations between BLUP values of nine yield traits in the association panel (a) and the RIL population (b). The gradient bar represents the correlation coefficient.CD,cob diameter; CW,cob weight;ED, ear diameter; EL,ear length;EW,ear weight;HKW,100-kernel weight;KN, kernel number per row;RN, row number;Y,grain yield per plant.
Fig.1. Heritability and proportion of phenotypic variation explained by genotype(G), environment(E), replicates(R), and interaction between them in nine yield traits. (a)Heritability of nine yield-related traits between the association panel and the RIL population. (b) Proportions of phenotypic variation explained in the association panel.(c)Proportions of phenotypic variation explained in the RIL population.CD, cob diameter; CW, cob weight;ED, ear diameter; EL,ear length;EW, ear weight;HKW,100-kernel weight; KN, kernel number per row; RN, row number; Y, grain yield per plant.
At a LOD threshold of 3.0, 966 significant SNPs were detected among two environments from CZ, HY, and BLUP values. To remove the false-positive SNPs, 384 SNPs identified by at least two methods were considered to be reliable SNPs (Table S3;Fig. 3). Because significant SNPs were detected by multiple methods, and the proportion of phenotypic variation explained by the SNPs differed, the proportion of phenotypic variation was determined using the mean r2across the multiple methods. As shown in Table S4, the proportion of phenotypic variation explained by single significant SNPs for traits ranged from 0.95% to 10.03%,and that explained by all significant SNPs for traits ranged from 21.34% to 87.23%. The significant SNPs detected in multiple environments or by several methods showed increased reliability.
Among the significant SNPs detected, 51 were identified in at least two environments or BLUP (Table S3), of which 50 environmentally stable SNPs could be identified from BLUP. Five SNPs were found in all environments, associated with CD, CW, ED, and EL(Table 1),and 17 significant SNPs were detected by all the methods(Table S3).Many pleiotropic SNPs were associated with multiple traits (Table S3). PZE-105110447, which was associated with CD, ED, EW, HKW, and RN, might play an essential role in yield architecture; PUT-163a-60344741-2523 was associated with EW,KN, and Y; and PZE-106076029 was associated with EW, KN, and Y. These pleiotropic SNPs affected multiple yield traits, suggesting that the underlying genes might be associated with seed and ear development. Three SNPs were identified among the nine yield traits by combining the environmentally stable and pleiotropic SNPs (Table 2).
To understand the allele pyramiding effect of significant SNPs identified in this study, the phenotypes of lines grouped by superior and inferior alleles were compared in Fig. 4. Superior and inferior alleles were determined by the mean value in the inbred lines. Inbred lines enriched with more superior alleles showed larger phenotypic values than those with more inferior alleles across all traits (Figs. S3–S4). These results showed that there was high accuracy of significant SNPs detected in our study, and suggested that pyramiding superior alleles could potentially increase yield. The inbred lines with the most superior alleles are listed in Table S5. They may be useful for introgressing superior alleles into other elite lines to improve grain yield in maize breeding.
Table 1 Significant SNPs detected in all environments.
Table 2 Environmentally stable and pleiotropic SNPs.
Fig.3. Distribution of genome-wide significant SNPs,QTL intervals,and related genomic features.The tracks from inner to outer layer are gene density from B73 Ref_V4,QTL frequency from the present in 1-Mb windows,SNP frequency in 1-Mb windows,SNP and QTL distribution of CD,SNP and QTL distribution of CW,SNP and QTL distribution of ED,SNP and QTL distribution of EL,SNP and QTL distribution of EW,SNP and QTL distribution of HKW,SNP and QTL distribution of KN,SNP and QTL distribution of RN,and SNP and QTL distribution of Y,distribution of markers of linkage map,SNP position of association panel,chromosome length.The scatter plot and highlights in the same color represent SNP and QTL of the same trait, respectively. CD, cob diameter; CW, cob weight; ED, ear diameter; EL, ear length; EW, ear weight; HKW, 100-kernel weight; KN,kernel number per row; RN, row number; Y, grain yield per plant.
Fig. 4. Pyramiding effect of superior alleles (cyan) and inferior alleles (red) identified in BLUP values of nine yield traits.
Fig. 5. Comparison of superior allele pyramiding and phenotypic performance of nine yield traits in temperate and tropical germplasm. (a) Superior allele pyramiding in temperate vs.tropical germplasm.(b)Phenotypic performance in temperate vs.tropical germplasm.CD,cob diameter;CW,cob weight;ED,ear diameter;EL,ear length;EW,ear weight; HKW, 100-kernel weight; KN, kernel number per row; RN, row number; Y, grain yield per plant.
Because the GWAS for each trait was independent,and the significant SNPs identified by GWAS were dispersed across the total genome.The SNP hotspots were investigated by genome-wide permutation to understand the enrichment of significant SNPs for different traits. The cutoff of the SNP number was set to 4 according to the permutation results.Seven SNP hotspots containing 41 SNPs were identified on chromosomes 1, 2, 3, 4, 5, and 9 after overlapping hotspots were combined (Table 3). Because we separated the significant SNPs according to environments and traits, most of the SNPs in the hotspots were environmentally stable and pleiotropic. For example, there were nine SNPs enriched in hotspot region Chr. 5: 170600001-172500000, including PZE-105110447,which associated with three different traits.
According to our previous study, the LD decay distance of this population is relatively large(~200 kb)[23],which implied a broad confidence interval surrounding the significant SNPs. A coexpression network method was applied to prioritize the causal genes detected by GWAS for each trait. The significant SNPs identified in our study were combined with three networks of ZmPAN,ZmSAM, and ZmRoot in the Camoco tool [22] for casual gene analysis. At a threshold of FDR 0.3 in at least two SNP-to-gene mapping parameters, HPO genes were characterized by density,locality, total, and both datasets. A total of 61 HPO genes were detected (Tables S6–S7). They numbered from 1 to 13 across different traits in the total set, from 0 to 10 for the density, and from 2 to 4 for the locality dataset. Only three HPO genes were discovered in both the density and locality sets. In contrast to the candidate genes derived from the environmentally stable and pleiotropic SNPs, some of the genes were not adjacent to the associated SNP position.
Combining the results of QTL mapping and GWAS analysis could increase the accuracy of gene identification. In our study, a RIL population derived from two parents showing large differences was used for the QTL mapping of yield components. The linkage map was constructed using 693 SSR and SNP markers,and the total length was 3072 cM (Fig. S5). Phenotypic data for nine traits,including CD,CW,ED,EL,EW,HKW,KN,RN,and Y,were collected from BFX,CZ,FJ,and WJ.BLUP values were calculated for QTL mapping.Of 109 QTL mapped(Table S8),23 were associated with HKW,and only one with EW. The proportion of phenotypic variation explained by the markers (PVE%) ranged from 3.49 to 22.62.qHKW1-2 explained the largest phenotypic variation, while qHKW3-2 explained the least amount of phenotypic variation for a single QTL. The total phenotypic variation identified by QTL for each trait among the different environments and BLUP is described in Table S8,in which QTL associated with ED explained the largest proportion (60.03%) of phenotypic variation. Sixteen environmentally stable QTL and sixteen pleiotropic QTL were also identified(Table S9). One QTL mapped in the interval from PZE-101049608 to SYN450 consistently affected CD in WJ and BLUP, and also ED in BFX and BLUP.
To identify reliable causal genes, the QTL and significant SNPs were combined for further analysis.Seven consensus SNPs located in the QTL intervals for the same traits were identified (Table 4).PZE-105127228, which was located in the interval of qCD5-4,was associated with CD. The qED1-2 contained the SNP PZE-101005766 and both were associated with ED. Two RNassociated SNPs, PZE-105110447 and PZE-101004923, colocalized in the intervals of qRN5 and qRN1-1, respectively. Two HKW associated QTL, qHKW1-3 and qHKW4-1, also colocalized with the HKW-associated SNPs PZE-101068891 and PZE-104103469 in each interval.Finally,the Y-associated QTL qY3-1 overlapped with the Yassociated SNP PZE-103091976.
For GWAS, considering the relative position and LD decay distance around the SNPs, the candidate genes were determined as the closest or colocalized gene with an annotation within the 250-kb flanking the SNPs. Only 51 environmentally stable SNPs(Table S3), 36 pleiotropic SNPs (Table S3), seven consensus SNPs(Table 4), and 61 HPO genes were collected for analysis(Table S7).A total of 127 annotated genes were predicted as candidate genes for yield traits. Some genes were closely related to genes identified in previous studies. For example, a gene similar to Zm00001d016656 encoding a serine/threonine protein kinase associated with CD, ED, EW, HKW, and RN, has been shown to affect RN and increase grain yield [14]. Another gene,Zm00001d034428, associated with ED and RN in CZ and BLUP,has been validated to regulate seed development [44]. For candidate genes from the coexpression networks, 61 HPO genes were predicted by Camoco, and some of these were associated with growth and development (Table S7). Zm00001d014507 encoding auxin response factor 19(arftf19),is a transcription factor involved in root development [45]. Another gene, Zm00001d027412 encoding a Dicer-like 101 (dcl101) protein. Similar genes might regulate meristem determination in the inflorescence [46].Zm00001d016656 and Zm00001d027412 were also found in consensus SNPs. Zm00001d017207, which encodes AP2-EREBPtranscription factor 206(ereb206),has been reported to participate in the lysine biosynthesis pathway of maize seed development[47].
Table 3Information of SNP hotspots and associated traits.
?
Because many of the predicted candidate genes play a role in seed development, characterizing their spatial and temporal gene expression pattern was essential for identifying the mechanism of gene regulation. We integrated candidate genes derived from environmentally stable SNPs (Table S3), pleiotropic SNPs(Table S3), consensus SNPs (Table 4), and HPO genes (Table S7).The expression patterns of 110 candidate genes were investigated at 63 time points after removal of genes that were not expressed in all the time points (Fig. 6a). Among the candidate genes, 87 were expressed in all three transcription atlases, eight genes were expressed specifically in the nucellus, two genes were specifically expressed in the endosperm, and two genes expressed specifically in embryo (Fig. 6b; Table S10). According to 18 modules clustered by nucellus transcriptome, 14 in Stage I (0–16 HAP), 10 in Stage II(20–44 HAP),16 in Stage III(56–96 HAP),and 12 in Stage IV(102–144 HAP). The other 51 genes displayed intricate expression patterns in multiple stages, indicating that they are involved in some common functional processes.For example,the calcium dependent protein kinases(CDPKs)function in plant growth and development[48], and the candidate gene Zm00001d051502 (cdpk7) was expressed in multiple stages in the nucellus, embryo, and endosperm. Furthermore, the pentatricopeptide repeat (PPR) genes have been shown to be essential for kernel development [44,49].Correspondingly, the candidate gene Zm00001d034428 (ppr78) is expressed widely in multiple stages in nucellus,embryo,and endosperm.Some genes were expressed in a specific transcription atlas,indicating that they are expressed in that specific stage and tissue.For example, a sugar transporter has been shown to mediate sucrose from endosperm to early embryo in soybean [50], and a similar gene,Zm00001d023677(sweet13a),is expressed specifically in the early stage (Stage I-B) in the nucellus. Moreover,Zm00001d048050(gln3),which is specifically expressed in multiple stages (module 17), encodes glutamine synthetase, is essential for nitrogen assimilation and recycling and influences grain production [51,52]. The expression pattern analysis using the transcription atlas provides information on gene function and regulation that can be used for further study.
In our study, we combined linkage mapping, GWAS, and coexpression networks to identify QTL in a bi-parent population and association panel across multiple environments. Combing linkage mapping and GWAS for nine yield traits revealed respectively 109 and 384 QTL, many of them were pleiotropic. Many studies to date have investigated the genetic architecture of maize yield,and numerous QTL have been detected.QTL identified in this study colocalized with others previously reported in different populations. qRN2-2, located at 24.43–25.94 Mb of bin 2.03, was also identified in three studies of the genetic architecture of RN[7,53,54]. qEL2-1 was found in three environments in 14.11–20.88 Mb of bin 2.01 in our study. This QTL colocalized in an ear architecture study of multiple populations [5] and QTL mapping of yield traits [53]. qRN5, which is located at 171.19–172.21 Mb of bin 5.04 colocalized in two studies of ear-related traits [5,19].Three QTL for RN that were detected in bin 1.05 and 4.08 overlapped with QTL were detected previously in multiple populations[5],and a combination of GWAS and linkage mapping[16].Ten QTL for HKW all overlapped with those identified in a study on the genetic architecture of kernel size and weight in maize and rice[55].The significant SNPs identified by GWAS were compared with those from a recent study that used the same association panel and genotypic data [16]. Four SNPs were consistently detected for the nine traits: PZE-106115028, located at 165.3 Mb on chromosome 6, was associated with CW; PZE-102105585 and PZE-102107480,which were both associated with EL, colocalized on chromosome 2; PZE-102114516, which was associated with RN and located on chromosome 2, was detected in both studies. These colocalized or overlapping QTL and SNPs that were detected in multiple studies were valuable for further validations.
Fig.6. Expression pattern of candidate genes in nucellus,embryo,and endosperm.(a)Expression heatmap of candidate genes.For each gene,the FPKM values are normalized by the maximum value of all FPKM values. (b) Venn diagram showing the number of expressed genes in different tissues during the seed development.
Linkage mapping is a classical method for dissecting the mechanisms that underlie quantitative traits.Fine mapping based on the main-effect QTL derived from primary mapping results remains a common strategy. Due to the limited combinations and linkage markers,intervals in the primary mapping are usually large.Moreover, because of the narrow genetic background of the parents,linkage mapping usually explains only a small proportion of genetic variation for complex quantitative traits. Along with high-density SNPs and large populations, GWAS can efficiently solve the issues of low diversity and detection rates, but a large number of false-positive results can confuse the truly associated sites and reduce detection power. Furthermore, LD decay distance complicates the selection of candidate genes, as the causal genes sometimes are not the closest genes closest to the associated sites.Because the parents of the RIL population were included in the association panel, and significant SNPs in the intervals of major QTL can improve the reliability of gene detection, joint analysis of GWAS and linkage mapping is a complementary and efficient strategy for accurately mining candidate genes, even though there is more variation in the association panel than the RIL population.In our study, we identified seven consensus SNPs associated with CD, ED, HKW, RN, and Y, and some similar genes have been validated by previous studies.To date,many studies have used a combination of multiple populations or detection methods to examine the genetic architecture of yield traits[17–19],but few used coexpression networks to identify candidate genes. In this study, by using public RNA-seq data from three different tissues,we applied Camoco software to investigate HPO genes (Table S7). Two HPO genes detected by linkage mapping and GWAS were identified as consensus genes.Some new HPO genes associated with yield traits,such as arftf19, dcl101, and ereb6, were not detected by linkage mapping and GWAS. Thus, joint analysis by GWAS and linkage mapping is an excellent strategy for discovering novel genes that underlie target traits.
Because the genetic architecture of quantitative traits is usually regulated by many small-effect loci that also interact with the environment [5,56,57], the effect of a single locus is likely to play a limited role in improving the target trait. Many studies have attempted to pyramid multiple related genes or linked markers to create new materials by backcrossing and selection, especially to produce quality-protein and high-nutrition maize. Chandran et al.[58]pyramided opaque2 and crtRB1 to combine higher lysine,tryptophan, and β-carotene contents. Another study [59] used the marker-assisted pyramiding of opaque2 and the novel gene opaque16 to enhance lysine and tryptophan contents. Ribaut et al. [60] applied marker-assisted selection to improve drought adaption using a backcrossing approach and proposed that pyramiding favorable alleles from different sources would be valuable if only a small number of target loci are used. These studies prompted us to investigate whether pyramiding the superior alleles identified in our study would increase grain yield. For all the yield traits in our study, pyramiding the superior alleles of stable loci resulted in significant improvements. Because in a single study, the explanation of phenotypic variation and the number of QTL is necessarily limited, some inbred lines that were enriched in superior alleles did not show higher phenotypic values for yield traits.Compared with the pyramiding of superior alleles in temperate and tropical groups,a large difference in pyramiding status was found in yield traits. Some of the traits showed a consistent association between superior alleles and phenotype, demonstrating that pyramiding superior alleles could affect the phenotypic difference between the tropical and temperate germplasm.These results illustrate the high accuracy of identifying candidate genes using these methods in our study and provide information for MAS.There are two prospects for MAS of candidate loci derived from GWAS results: conventionally, main-effect loci are selected, and markers are developed to pyramid superior alleles by backcrossing to create new materials. More importantly, superior alleles can be combined as haplotypes to predict the yield of inbred lines.
In our study,based on the large-scale population constructed by inbred lines,many alleles were identified by integrating GWAS and QTL mapping,providing genetic resources for investing the genetic architecture of yield traits.Although the high productivity of maize depends on the grain yield of hybrids,superior alleles identified in inbred lines can be transferred by MAS or gene editing into the parents of hybrids for increasing the hybrid grain yield.The KNR6 haplotype identified from QTL mapping increases the EL and KN of Zheng 58 and Chang 7–2 relative to the original inbred lines, and also increases the grain yield of Zheng 58/Chang 7–2 hybrids[14]. ZmCLE7 and ZmFCP1 have been identified as CLE peptide signals in the conserved CLAVATA-WUSCHEL pathway maintaining inflorescence meristems in Arabidopsis and maize [61]. A more recent study used CRISPR-Cas9 genome editing to engineer the promoter regions of ZmCLE7 and ZmFCP1. The hybrids of B73(ZmCLE7-promoter editing) × W22 and B73 (ZmFCP1-promoter editing) × A619 showed significant increase in yield traits, such as RN,ED, EW,and grain yield per ear, relative to original hybrids,similar to the inbred lines[62].These cases indicate that the yieldrelated genetic studies of inbred lines contribute to increasing maize production. Environmentally stable loci will play a critical role in MAS for yield-related traits. Our study sheds light on the genetic architecture of grain yield and presents a new strategy for dissecting critical traits for maize breeding.
An association panel and a RIL population were used to identify candidate genes for nine yield traits in multiple environments.Pyramiding of superior alleles detected in the study showed highly positive effects on all traits. Differences in superior-alleles pyramiding consistently corresponded with the phenotypic values of ED and EW in tropical and temperate germplasm. Combining the use of coexpression networks revealed many HPO genes from GWAS. Zm00001d016656, a gene encoding a serine/threonine protein kinase was associated with five different traits across multiple environments.Some genes were uniquely expressed in specific tissues and stages. These findings provide genetic resources for molecular breeding and shed light on the genetic architecture of maize grain yield.
CRediT authorship contribution statement
Xiao Zhang:Writing–original draft,Writing-review&editing,Funding acquisition.Zhiyong Ren:Investigation, Methodology.Bowen Luo:Funding acquisition, Investigation.Haixu Zhong:Investigation.Peng Ma:Investigation.Hongkai Zhang:Investigation.Hongmei Hu:Investigation.Yikai Wang:Investigation.Haiying Zhang:Investigation.Dan Liu:Project administration.Ling Wu:Project administration.Zhi Nie:Methodology.Yonghui Zhu:Methodology.Wenzhu He:Supervision, Methodology.Suzhi Zhang:Supervision, Methodology.Shunzong Su:Methodology.Yaou Shen:Supervision, Methodology.Shibin Gao:Writing -review & editing, Supervision, Funding acquisition.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This research was funded and supported by China Agriculture Research System of MOF and MARA,Sichuan Science and Technology Support Project (2021YFYZ0020, 2021YFYZ0027,2021YFFZ0017), National Natural Science Foundation of China(31971955), and Sichuan Science and Technology Program(2019YJ0418, 2020YJ0138). The authors would like to thank Prof.James C. Nelson for English editing.
Appendix A. Supplementary data
Supplementary data for this article can be found online at https://doi.org/10.1016/j.cj.2021.07.008.