Tianwang Wen, Bosheng Dai, Tao Wang, Xinxin Liu, Chunyuan You*, Zhongxu Lin*
a National Key Laboratory of Crop Genetic Improvement, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, Hubei, China
b Huanggang Academy of Agricultural Sciences, Huanggang 438000, Hubei, China
c Cotton Research Institute, Shihezi Academy of Agriculture Science, Shihezi 832003, Xinjiang, China
Keywords: Brown-fiber cotton; Plant architecture traits; GWAS; QTL; Genetic variation
Abstract
Plant architecture traits influence crop yield. An understanding of the genetic basis of cotton plant architecture traits is beneficial for identifying favorable alleles and functional genes and for breeding elite cultivars. We collected 121 cotton accessions, including 100 brown-fiber and 21 white-fiber accessions, genotyped them by whole-genome resequencing, and phenotyped them in multiple environments. This genome-wide association study (GWAS) identified 11 quantitative trait loci (QTL) for two plant architecture traits: plant height and fruit spur branch number. Negative-effect alleles were enriched in the elite cultivars. Based on these QTL, gene annotation information, and published QTL, candidate genes and natural genetic variations were identified in four QTL. Ghir_D02G017510 and Ghir_D02G017600 were identified as candidate genes for qD02-FSBN-1, and a premature start codon gain variant was found in Ghir_D02G017510. Ghir_A12G026570, the candidate gene of qA12-FSBN-2, belongs to the pectin lyase-like superfamily, and a significantly associated SNP in this gene, A12_105366045 (T/C), causes an amino acid change. The QTL, candidate genes, and associated natural variations identified in this study are expected to lay a foundation for studying functional genes and developing breeding programs for desirable architecture in brown-fiber cotton.
Improving yield and quality by modifying plant architecture is an important goal in crop breeding [1]. In wheat and rice breeding, the deployment of dwarfing genes led to the success of the Green Revolution, and the major genes responsible have been cloned [2,3]. With the development of molecular genetics, QTL (quantitative trait loci), major-effect genes, and the molecular mechanisms underlying diverse plant morphologies have been identified [4-8]. Numerous classical genes conditioning plant architecture (e.g., FRUITFULL, WUSCHEL, and CAULIFLOWER) have been characterized at the molecular and regulatory-network levels [9], among them CLAVATA, which influences shoot meristem development [10-12].
Cotton (Gossypium spp.), an important economic crop, is the largest source of textile fiber [13]. Plant architecture traits strongly influence fiber yield and quality [14]. The aerial architecture of the cotton plant is determined by numerous traits, and genetic mapping has been used to map QTL associated with these traits [15-18]. Wang et al. [15] studied cotton plant architecture traits including fruit branch length, plant height, and branch angle, and Song and Zhang [16] constructed a population derived from an interspecific cross between G. hirsutum and G. barbadense to study plant architecture traits.
With the proliferation of genotyping markers and the collection of germplasm resources, association mapping has become another strategy for dissecting the genetic basis of plant architecture traits [19-21]. Simple-sequence repeat (SSR) markers are abundant in the cotton genome [22] and sufficiently numerous for linkage mapping, but not for association mapping in diverse panels, whose many historical recombination events demand much higher marker density. The publication of a reference genome [22] has allowed the use of single-nucleotide polymorphisms (SNPs) for genotyping populations. Huang et al. [20] identified numerous QTL influencing plant architecture traits in 503 Chinese upland cotton accessions genotyped with the CottonSNP63K array. More recently, Su et al. [21] performed GWAS in upland cotton using specific-locus amplified fragment sequencing (SLAF-seq) to map QTL underlying agronomic traits and identified a candidate gene influencing plant height, Gh_D03G0922. As the upland cotton genome has been assembled and the cost of whole-genome resequencing (WGR) has decreased, genotyping association-mapping panels by WGR has become feasible. WGR produces massive numbers of sequencing reads, yielding millions of SNPs and high mapping resolution for dissecting the genetic basis of traits [23,24].
We investigated two plant architecture traits, plant height (PH) and fruit spur branch number (FSBN), in brown- and white-fiber accessions genotyped by WGR. We aimed to (1) use two association methods to map QTL for cotton architecture traits with low heritability; (2) illustrate breeding strategies for architecture traits in elite brown-fiber cotton cultivars; and (3) predict candidate genes underlying the QTL and characterize genetic variation in these genes. This study provides a genetic basis for studying functional genes and developing breeding programs to achieve a desirable architecture in brown-fiber cotton.
A panel of 121 cotton accessions, including 100 brown- and 21 white-fiber accessions (Table S1), which had previously been studied for fiber traits [24], was used. Field experiments were performed in multiple environments: E1 (Huanggang, Hubei province, China; 30.44°N, 114.87°E) in 2015; E2 and E3 (Shihezi, Xinjiang Uygur Autonomous Region, China; 44.27°N, 85.94°E) in 2015 and 2016, respectively; and Ezhou, Hubei province, China (30.23°N, 114.52°E) in 2016, a trial that was destroyed by flooding and waterlogging. Field trials followed a randomized complete block design in each environment. In E1, each accession was planted with two replicates in plots 5 m long and 1 m wide, with 10 plants per row and 0.45 m between plants; in E2 and E3, plots were 5 m long and (0.40 + 0.50 + 0.46) m wide, with 0.1 m between plants and 100 plants in each double row. PH and FSBN were recorded during the boll-opening stage. Descriptive statistics and analysis of variance (ANOVA) of the phenotypic data were computed with functions and scripts in R 3.4.3.
To genotype the 121 accessions, the panel was resequenced at approximately sixfold depth. The sequence data had been deposited in the NCBI database in the previous study [24]. Clean reads obtained by filtering the raw sequencing data were aligned against the upland cotton reference genome [25], and SNP calling was performed as described previously [24]. A total of 2,620,639 SNPs were retained after SNP quality control in TASSEL 5.0 [26] with the parameters -homozygous -filterAlign -filterAlignMinFreq 0.05. SNP statistics were calculated with PLINK 1.07 [27], and polymorphism information content (PIC) values were calculated in R following Bolaric et al. [28].
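As an illustration of the PIC calculation, the following Python sketch computes PIC per SNP from genotypes coded as 0/1/2 copies of the alternative allele. The coding scheme, the function names, and the choice of formula are assumptions: Botstein's PIC for a biallelic locus, 1 - (p^2 + q^2) - 2p^2q^2, is shown alongside the simpler 2f(1 - f) that some authors use for two-allele markers, because the exact form applied following Bolaric et al. [28] is not restated here.

```python
def allele_freq(calls):
    """Alternative-allele frequency of one SNP.

    calls: genotypes coded 0/1/2 (copies of the alternative allele);
    None marks a missing call and is ignored.
    """
    observed = [c for c in calls if c is not None]
    return sum(observed) / (2 * len(observed))


def pic_botstein(p):
    """Botstein-style PIC for a biallelic locus: 1 - (p^2 + q^2) - 2 p^2 q^2."""
    q = 1.0 - p
    return 1.0 - (p ** 2 + q ** 2) - 2.0 * p ** 2 * q ** 2


def pic_two_allele(p):
    """Simplified PIC = 2 f (1 - f) for a two-allele marker."""
    return 2.0 * p * (1.0 - p)


# Hypothetical genotype calls for three SNPs across four accessions.
snps = [[0, 0, 2, 2], [2, 2, 0, 0], [1, 2, None, 0]]
freqs = [allele_freq(s) for s in snps]
mean_pic = sum(pic_botstein(p) for p in freqs) / len(freqs)
```

Averaging per-SNP values over a chromosome gives the kind of per-chromosome summary reported in Table 1. Note that the Botstein form peaks at 0.375 for a biallelic SNP with p = 0.5, whereas 2f(1 - f) peaks at 0.5, so the two definitions are not directly comparable.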
A set of 50,000 high-quality SNPs randomly selected from the genome-wide SNPs was used to estimate linkage disequilibrium (LD) decay in the panel with PLINK (parameters: --ld-window-r2 0 --ld-window 99999 --ld-window-kb 1000). Kinship among the accessions was calculated with SPAGeDi 1.4b [29], and principal component analysis and a phylogenetic tree were constructed with TASSEL. To evaluate population structure using the same 50,000 SNPs, the number of subgroups (K) was tested from 1 to 7, with five independent runs per K, in STRUCTURE 2.3.4 [30]. Delta K values were obtained from STRUCTURE HARVESTER [31], and the optimal K was selected as the one with the highest delta K [32]. Finally, CLUMPP [33] was used to integrate the five runs.
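The delta K criterion (the Evanno method) used to pick the optimal K can be sketched as follows. This is an illustrative re-implementation, not the STRUCTURE HARVESTER code; it assumes a dictionary of LnP(D) values from replicate STRUCTURE runs at consecutive K and computes delta K = |mean L(K+1) - 2 mean L(K) + mean L(K-1)| / sd L(K). The log-likelihood values in the example are hypothetical.

```python
from statistics import mean, stdev


def delta_k(lnp):
    """Delta K for each interior K.

    lnp: {K: [LnP(D) of each replicate run]} for consecutive K values.
    Delta K is undefined for the smallest and largest K tested.
    """
    ks = sorted(lnp)
    m = {k: mean(lnp[k]) for k in ks}
    out = {}
    for k in ks[1:-1]:
        second_diff = abs(m[k + 1] - 2 * m[k] + m[k - 1])
        sd = stdev(lnp[k])  # spread among replicate runs at this K
        out[k] = second_diff / sd if sd > 0 else float("inf")
    return out


# Hypothetical LnP(D) values from five runs per K.
runs = {
    1: [-1000.0, -1001.0, -999.0, -1000.5, -999.5],
    2: [-800.0, -802.0, -798.0, -801.0, -799.0],
    3: [-790.0, -792.0, -788.0, -791.0, -789.0],
    4: [-785.0, -787.0, -783.0, -786.0, -784.0],
    5: [-782.0, -784.0, -780.0, -783.0, -781.0],
}
dk = delta_k(runs)
best_k = max(dk, key=dk.get)  # K with the highest delta K
```

Dividing the second-order change in likelihood by the replicate standard deviation penalizes K values whose independent runs disagree, which is why five repeats per K are useful.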
Genome-wide association analysis between the high-quality SNPs and traits was performed in TASSEL with a mixed linear model (MLM), using the optimal population-structure matrix (Q) and the kinship matrix (K) to correct for stratification [34]. GEMMA [35] was also used to perform association analysis with a linear mixed model (LMM), with the relatedness matrix calculated by GEMMA used to correct for stratification. The genome-wide significance threshold was set at -log10(1/N), where N is the number of SNP markers. Finally, the results of the two association methods were combined, taking into account LD decay in the population. LD blocks of associated QTL were drawn with the R package LDheatmap.
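The significance threshold and the grouping of significant SNPs into QTL can be sketched in Python. The threshold follows the text directly; the merging rule, which joins significant SNPs on the same chromosome when they lie within a given distance (for example, the 145-kb LD decay distance reported for this panel), is an assumption about how "taking into account LD decay" could be implemented, and the SNP positions in the example are hypothetical.

```python
import math


def genome_wide_threshold(n_markers):
    """-log10(1/N): SNPs with -log10(P) above this value are called significant."""
    return -math.log10(1.0 / n_markers)


def merge_into_qtl(hits, max_gap):
    """Group significant SNPs into QTL intervals.

    hits: (chromosome, position_bp) pairs for SNPs passing the threshold.
    max_gap: neighbouring significant SNPs on the same chromosome that are
             at most this many bp apart are placed in the same QTL.
    Returns a list of (chromosome, start_bp, end_bp, n_snps).
    """
    qtl = []
    for chrom, pos in sorted(hits):
        if qtl and qtl[-1][0] == chrom and pos - qtl[-1][2] <= max_gap:
            c, start, _, n = qtl[-1]
            qtl[-1] = (c, start, pos, n + 1)
        else:
            qtl.append((chrom, pos, pos, 1))
    return qtl


# About 6.42, assuming all 2,620,639 SNPs were tested.
threshold = genome_wide_threshold(2_620_639)
hits = [("D02", 60_400_000), ("D02", 60_467_480),
        ("A12", 105_366_045), ("D02", 61_000_000)]
intervals = merge_into_qtl(hits, max_gap=145_000)
```

Here the two D02 SNPs 67 kb apart collapse into one interval, whereas the SNP about 0.5 Mb away starts a new one.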
QTL were named with "q", followed by the chromosome name, an abbreviation for the trait, and the serial number of the QTL for that trait on that chromosome [15]; for example, qD02-FSBN-1 is the first FSBN QTL on chromosome D02.
Effects of the genome-wide SNPs were annotated and predicted with snpEff [36]. Published QTL markers were downloaded from CottonQTLdb (http://www2.cottonqtldb.org:8081/traits). Electronic PCR (e-PCR) [37] was performed in three steps: (1) building a famap, ./famap -tN -b genome.famap ./cotton.fa; (2) creating a genome hash, ./fahash -b genome.hash -w 12 -f 3 ./genome.famap; and (3) aligning the marker sequences against the TM-1 reference genome, ./re-PCR -S genome.hash -d 50-5000 -n 1 -g 1 -m 3 -o Primer_location.txt ./Primer_sequence.txt.
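To screen snpEff output for the variant classes discussed in the results (missense, start- and stop-codon changes, splice-region variants), the ANN field of an annotated VCF can be parsed as in the following sketch. It assumes snpEff's standard pipe-delimited ANN layout (allele, effect, impact, gene name, gene ID, and further fields), with multiple effects joined by "&"; the selected effect set and the toy VCF records are hypothetical.

```python
EFFECTS_OF_INTEREST = {
    "missense_variant",
    "start_lost",
    "stop_lost",
    "splice_region_variant",
    "5_prime_UTR_premature_start_codon_gain_variant",
}


def parse_ann(info):
    """Return (allele, [effects], impact, gene_id) for each ANN entry in an INFO string."""
    for field in info.split(";"):
        if field.startswith("ANN="):
            entries = []
            for ann in field[len("ANN="):].split(","):
                parts = ann.split("|")
                entries.append((parts[0], parts[1].split("&"), parts[2], parts[4]))
            return entries
    return []


def flag_variants(vcf_lines):
    """Yield (chrom, pos, gene_id, effect) for records carrying an effect of interest."""
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip VCF header lines
        cols = line.rstrip("\n").split("\t")
        chrom, pos, info = cols[0], int(cols[1]), cols[7]
        for _allele, effects, _impact, gene_id in parse_ann(info):
            for effect in effects:
                if effect in EFFECTS_OF_INTEREST:
                    yield chrom, pos, gene_id, effect


# Hypothetical annotated records (GENE_1 and GENE_2 are placeholders).
records = [
    "##fileformat=VCFv4.2",
    "D02\t1000\t.\tA\tG\t50\tPASS\tDP=12;ANN=G|missense_variant|MODERATE|GENE_1|GENE_1"
    "|transcript|GENE_1.1|protein_coding|2/5|c.100A>G|p.Lys34Glu||||",
    "D02\t5000\t.\tC\tT\t50\tPASS\tANN=T|intergenic_region|MODIFIER|GENE_1-GENE_2"
    "|GENE_1-GENE_2|intergenic_region|GENE_1-GENE_2|||n.5000C>T||||||",
]
flagged = list(flag_variants(records))
```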
Box plots of PH and FSBN showed differences between E1 and E2/E3 (Fig. 1-a, b). The correlation coefficient between PH and FSBN was 0.37 (P < 0.01). Both environment and genotype significantly influenced the plant architecture traits (P < 0.001), and genotype-by-environment interaction significantly influenced PH (P < 0.01) (Table S2). High phenotypic variation was observed in the 121-accession panel, and skewness statistics showed that neither trait was normally distributed. The heritability of both traits was relatively low, with PH showing the higher heritability (H2 = 0.68) (Table S3).
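Broad-sense heritability on an entry-mean basis is commonly estimated from ANOVA mean squares as H2 = s2G / (s2G + s2GE/e + s2err/(e r)), with e environments and r replicates. The sketch below assumes balanced data and ignores blocks within environments, so it is a simplified illustration rather than the exact model fitted in R for this panel; the simulated data are hypothetical.

```python
import random
from statistics import mean


def mean_squares(y):
    """Two-way ANOVA with replication for balanced data.

    y[i][j][k]: trait value of genotype i, environment j, replicate k.
    Blocks within environments are ignored in this simplified sketch.
    Returns (MS_genotype, MS_GxE, MS_error).
    """
    g, e, r = len(y), len(y[0]), len(y[0][0])
    cell = [[mean(y[i][j]) for j in range(e)] for i in range(g)]
    m_g = [mean(row) for row in cell]
    m_e = [mean(cell[i][j] for i in range(g)) for j in range(e)]
    grand = mean(m_g)
    ss_g = e * r * sum((m - grand) ** 2 for m in m_g)
    ss_ge = r * sum((cell[i][j] - m_g[i] - m_e[j] + grand) ** 2
                    for i in range(g) for j in range(e))
    ss_err = sum((v - cell[i][j]) ** 2
                 for i in range(g) for j in range(e) for v in y[i][j])
    return (ss_g / (g - 1),
            ss_ge / ((g - 1) * (e - 1)),
            ss_err / (g * e * (r - 1)))


def broad_sense_h2(ms_g, ms_ge, ms_err, e, r):
    """Entry-mean H2; variance components may come out negative in small samples."""
    var_g = (ms_g - ms_ge) / (e * r)
    var_ge = (ms_ge - ms_err) / r
    return var_g / (var_g + var_ge / e + ms_err / (e * r))


# Simulated panel: 50 genotypes x 3 environments x 2 replicates.
rng = random.Random(1)
y = []
for _ in range(50):
    effect = rng.gauss(0, 5)  # genotypic effect
    y.append([[100 + effect + rng.gauss(0, 1) for _ in range(2)] for _ in range(3)])
h2 = broad_sense_h2(*mean_squares(y), e=3, r=2)
```

In this form H2 simplifies algebraically to (MS_genotype - MS_GxE) / MS_genotype, so a strong genotype-by-environment interaction, such as the one observed for PH, lowers heritability even when genotypic differences are large.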
In total, 2,620,639 SNPs distributed across the 26 chromosomes were identified from the whole-genome resequencing data of the 121-accession panel. The average marker density was 1.17 SNPs per kb, and the average PIC was 0.33 (Table 1).
Table 1 - Summary of SNPs and PIC values on the 26 chromosomes.
Population diversity, structure, and kinship can affect the results of association mapping. The delta K analysis indicated that the 121-accession panel was divided into two subgroups (Fig. 2-a), revealing population structure (Fig. S1). Principal component analysis (PCA) and the phylogenetic tree also split the panel into two groups, with the 21 white-fiber accessions grouped together (Fig. 2-b, c). The distribution of kinship values showed that 90% of the pairwise relationship coefficients were lower than 0.1 (Fig. 2-d). To evaluate association resolution, LD decay in the panel was calculated from SNP pairs; LD decayed to r2 = 0.2 at 145 kb and to r2 = 0.1 at 900 kb (Fig. 2-e).
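LD decay distances of this kind can be approximated by binning SNP pairs by physical distance and finding the first distance bin whose mean r2 falls below a threshold. This binning approach is an assumption (LD decay is also often summarized by fitting a decay curve), and the simulated input is hypothetical; in practice the pairs would come from the PLINK r2 output, taking the distance between the two SNP positions of each pair.

```python
import math


def ld_decay_distance(pairs, threshold=0.2, bin_size=10_000):
    """Distance at which binned mean r2 first drops below `threshold`.

    pairs: iterable of (distance_bp, r2) for SNP pairs.
    Returns the upper edge of the first bin whose mean r2 < threshold,
    or None if r2 never decays below it.
    """
    bins = {}
    for dist, r2 in pairs:
        bins.setdefault(int(dist // bin_size), []).append(r2)
    for b in sorted(bins):
        values = bins[b]
        if sum(values) / len(values) < threshold:
            return (b + 1) * bin_size
    return None


# Simulated pairs whose r2 decays exponentially with distance.
pairs = [(d, 0.8 * math.exp(-d / 100_000)) for d in range(0, 1_000_000, 1_000)]
d_r2_02 = ld_decay_distance(pairs, threshold=0.2)
d_r2_01 = ld_decay_distance(pairs, threshold=0.1)
```

With 10-kb bins the estimate is resolved only to the bin width; smaller bins improve resolution but give noisier means when few SNP pairs fall in each bin.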
Fig. 1 - Phenotypes of PH and FSBN in three environments. (a) Boxplots of PH in three environments. (b) Boxplots of FSBN in three environments. PH, plant height; FSBN, fruit spur branch number.
Fig. 2 - Population structure, diversity, and linkage disequilibrium (LD) in the 121 accessions. (a) Delta K values of population structure plotted for K = 2 to 7. (b) Principal component analysis of the 121 accessions. (c) Phylogenetic tree of the 121-accession panel. (d) Distribution of pairwise kinship values between accessions. (e) Genome-wide LD decay in the 121 accessions.
Five QTL for PH and six QTL for FSBN were identified by the two association models (Table 2; Figs. S2 and S3). qD02-FSBN-1 was associated with FSBN by both LMM and MLM. The GWAS QTL were compared with previously reported QTL, whose markers were anchored to the reference genome [25] by e-PCR. Several QTL fell within previously reported QTL regions, and the qA03-PH-1 region overlapped a region identified by GWAS in 503 Chinese upland cotton accessions [20].
Eighteen elite brown-fiber accessions were selected to examine the distribution of favorable alleles at the 11 associated QTL. At the significantly associated marker of each QTL, alleles were classified as positive or negative according to their association with higher or lower trait values. Most negative-effect QTL alleles were found in 17 of the elite cultivars. In contrast, Zhongmian 81 harbored eight positive and only three negative alleles. Interestingly, 16 elite accessions carried the positive allele of qA01-PH-1. These results suggest that cultivars with moderate PH and FSBN have tended to be selected in brown-fiber cotton breeding (Table S4).
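The classification of alleles as positive or negative and the per-cultivar allele counts can be sketched as follows. The sketch assumes homozygous allele calls at the lead SNP of each QTL and defines the positive allele as the one with the higher mean trait value across the panel; the QTL names, allele calls, and trait values are hypothetical.

```python
def mean_of(values):
    return sum(values) / len(values)


def classify_alleles(calls, values):
    """Return (positive_allele, negative_allele) for one lead SNP.

    calls: homozygous allele per accession (None for missing);
    values: trait value per accession, in the same order.
    The allele with the higher mean trait value is called positive.
    """
    groups = {}
    for allele, value in zip(calls, values):
        if allele is not None:
            groups.setdefault(allele, []).append(value)
    if len(groups) != 2:
        raise ValueError("expected exactly two alleles at a biallelic SNP")
    positive, negative = sorted(groups, key=lambda a: mean_of(groups[a]), reverse=True)
    return positive, negative


def count_alleles(cultivar_calls, allele_classes):
    """Count positive and negative alleles carried by one cultivar.

    cultivar_calls: {qtl_name: allele}; allele_classes: {qtl_name: (pos, neg)}.
    """
    pos = sum(1 for q, a in cultivar_calls.items() if a == allele_classes[q][0])
    neg = sum(1 for q, a in cultivar_calls.items() if a == allele_classes[q][1])
    return pos, neg


# Hypothetical panel data for two QTL and one cultivar.
classes = {
    "qtl_1": classify_alleles(["C", "C", "T", "T", None], [10, 12, 5, 6, 8]),
    "qtl_2": classify_alleles(["A", "G", "G", "A"], [3, 9, 8, 4]),
}
cultivar = {"qtl_1": "C", "qtl_2": "A"}
pos_n, neg_n = count_alleles(cultivar, classes)
```

Because "positive" here means associated with a higher trait value, a negative allele can still be the favorable one when a lower PH or FSBN is the breeding target.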
Table 2 - Genome-wide significant associations of SNPs with two plant architecture traits.
Candidate genes were predicted based on the significantly associated SNPs, the LD decay distance (145 kb at r2 = 0.2) (Fig. 2-e), and gene functional annotation. Likely candidate genes were identified for four QTL (Table S5), whereas no likely candidate genes were identified for the other seven QTL. Table S5 also describes the sequence variation in the candidate genes.
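Selecting genes near a lead SNP can be sketched as an interval query: any annotated gene overlapping the lead SNP position plus or minus the LD decay distance is retained for annotation review. The gene coordinates and identifiers below are hypothetical placeholders, not annotations from the reference genome.

```python
def genes_near(lead_snps, genes, window=145_000):
    """List genes overlapping +/- window around each lead SNP.

    lead_snps: list of (chrom, pos);
    genes: list of (chrom, start, end, gene_id) with 1-based inclusive coordinates.
    Returns {(chrom, pos): [gene_id, ...]}.
    """
    out = {}
    for chrom, pos in lead_snps:
        lo, hi = pos - window, pos + window
        out[(chrom, pos)] = [gid for c, s, e, gid in genes
                             if c == chrom and s <= hi and e >= lo]
    return out


# Hypothetical annotation: only gene_a lies within 145 kb of the lead SNP.
genes = [
    ("A12", 105_300_000, 105_302_000, "gene_a"),
    ("A12", 105_600_000, 105_601_000, "gene_b"),
    ("D02", 105_360_000, 105_361_000, "gene_c"),
]
candidates = genes_near([("A12", 105_366_045)], genes)
```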
The significantly associated SNPs around qD02-FSBN-1 were analyzed in the candidate region. An LD block was defined between 60.30 and 60.55 Mb on chromosome D02 (Fig. 3). This region contains 12 genes, and the SNP D02_60467480 within it was significantly associated with FSBN (P < 0.05) (Fig. 4-a). The SNP D02_60493591 in the candidate gene Ghir_D02G017510 is a premature start codon gain variant (Table S5). In another candidate gene, Ghir_D02G017600, two missense variants were found. Ghir_D02G017600 belongs to the CLAVATA gene family, members of which regulate meristem development. Of the 24 genes in the genome annotated as CLAVATA family genes (Fig. S4), only Ghir_D02G017600 harbored missense variants.
Ghir_A12G026570, the candidate gene for qA12-FSBN-2, is a member of the pectin lyase-like superfamily (Table S5). The missense SNP A12_105366045 (C/T) was significantly associated with FSBN (Table 2). Accessions carrying A12_105366045-CC (n = 13) showed higher FSBN than those carrying A12_105366045-TT (n = 107) (P < 0.001) (Fig. 4-b).
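The statistical test behind this genotype comparison is not restated here. As one assumption-light option, a two-sided permutation test on the difference in group means can be sketched as below; this is a swapped-in illustration, not necessarily the test used for Fig. 4-b, and the FSBN values are hypothetical.

```python
import random


def permutation_pvalue(group_a, group_b, n_perm=2000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns (count + 1) / (n_perm + 1), where count is the number of random
    relabellings whose absolute mean difference is at least the observed one.
    """
    def mean_of(xs):
        return sum(xs) / len(xs)

    rng = random.Random(seed)
    observed = abs(mean_of(group_a) - mean_of(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign genotype labels
        if abs(mean_of(pooled[:n_a]) - mean_of(pooled[n_a:])) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)


# Hypothetical FSBN values for 13 CC and 107 TT accessions.
cc = [15 + (i % 3) for i in range(13)]
tt = [10 + (i % 3) for i in range(107)]
p = permutation_pvalue(cc, tt)
```

The permutation approach makes no normality assumption, which may matter here because the traits were not normally distributed; with 2,000 permutations the smallest attainable P value is about 0.0005.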
In Ghir_A01G015560 (the qA01-PH-1 candidate gene), A01_100712277 (G/A) causes loss of a stop codon, and other variants are present in the untranslated regions (UTRs) (Table S5). In Ghir_A03G005160 (the qA03-PH-1 candidate gene), A03_8815955 (C/T) is a missense and splice-region variant (Table S5).
In the era of cotton functional genomics [38], GWAS is a preferred tool for dissecting the genetic basis of cotton traits [20,23,39,40], and several software packages and association models can be applied [41]. Because an inadequate association model can produce false positive (Type I error) or false negative (Type II error) associations [42,43], the choice of association method is important. We first applied the MLM in TASSEL, but the results were not ideal: in the Q-Q plot, the observed values fell below the expected line at the low end of the distribution, indicating deflated test statistics and suggesting that this model was not suitable for all traits in all environments. We therefore also used the LMM in GEMMA. Many more loci were detected with the LMM, and qD02-FSBN-1 was detected by both MLM and LMM. Among these QTL, candidate genes were identified for four. These results suggest that an appropriate association model can detect more loci, but that association results need to be confirmed with supporting information (e.g., biological experiments and published studies) while accounting for Type I and Type II errors.
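Q-Q plot behavior can also be summarized numerically with the genomic inflation factor lambda, the median association chi-square statistic divided by its expected value under the null (about 0.455 for one degree of freedom); lambda below 1 corresponds to points falling below the expected line. Lambda is not reported in the text, so the sketch below is an assumed diagnostic, and the P values are simulated.

```python
import math
from statistics import NormalDist, median

STD_NORMAL = NormalDist()
CHI2_1DF_MEDIAN = STD_NORMAL.inv_cdf(0.75) ** 2  # about 0.455


def lambda_gc(pvalues):
    """Genomic inflation factor: median 1-df chi-square statistic / null median.

    P values must lie in (0, 1]; each is converted to chi-square = z^2 with
    z the standard-normal quantile of p/2.
    """
    chi2 = [STD_NORMAL.inv_cdf(p / 2) ** 2 for p in pvalues]
    return median(chi2) / CHI2_1DF_MEDIAN


def qq_points(pvalues):
    """(expected, observed) -log10 P pairs for a Q-Q plot, most significant first."""
    n = len(pvalues)
    return [(-math.log10(i / (n + 1)), -math.log10(p))
            for i, p in enumerate(sorted(pvalues), start=1)]


# Simulated null P values spread evenly over (0, 1).
null_p = [(i + 0.5) / 9999 for i in range(9999)]
lam_null = lambda_gc(null_p)                          # close to 1
lam_deflated = lambda_gc([p ** 0.5 for p in null_p])  # below 1
```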
Broad-sense heritability, defined as the ratio of genetic to phenotypic variance, influences the efficiency of association mapping [44,45]. In this study, the broad-sense heritabilities of the plant architecture traits (Table S3) were lower than those previously reported for fiber traits [24]. The same pattern was observed in the 503 Chinese upland cotton accessions [20]. One explanation may be that fibers develop in an enclosed space, whereas the branches, stems, and bolls that make up plant architecture are exposed to the external environment and affected by human management. Plant architecture is the result of dynamic interactions between endogenous growth processes and the exogenous environment [46].
Fig. 3 - Linkage disequilibrium (LD) block and genetic variants of candidate genes in qD02-FSBN-1.
Fig. 4 - Boxplots of FSBN for SNP genotypes. (a) FSBN by genotype at SNP D02_60467480. (b) FSBN by genotype at SNP A12_105366045. FSBN, fruit spur branch number.
An ideal plant architecture is important for yield, biotic resistance such as pest resistance, and efficient use of field space [47,48]. Cotton has an indeterminate growth habit [49], and plant architecture can affect canopy structure, plant density, and yield [50]. In cotton, a superior plant architecture includes a compact plant type, reduced plant height, and short internodes and branches [16]. Given the low heritability of cotton architecture traits, breeding an ideal plant architecture is difficult, and cultivation practices remain important for modifying plant architecture (Table S2); for example, excess apical buds and lateral branches must be removed, although this requires additional labor [51].
Breeding a cotton plant architecture adapted to cultivation practices is also important. In the history of plant architecture breeding, several major genes (e.g., sd1, TB1, and MOC1) influencing plant height and branching have played crucial roles [52-54]. A few genes influencing plant architecture have been exploited in cotton. The nulliplex-branch (gb_nb1) and okra (L-D1) genes, which influence plant and canopy architecture, have been characterized [55,56]. Recently, Gh_D03G0922, which influences plant height, was identified by genome-wide association mapping [21]. Although most studies of the genetic basis of plant architecture traits in cotton remain at the QTL level, via linkage and association mapping [16,19], these QTL can already be used in marker-assisted breeding to improve plant architecture traits.
In this study, 11 QTL were identified by association mapping (Table 2), and five candidate genes and their putative causal variants were characterized (Table S5). Functional experiments are needed to determine the roles of these candidate genes and to validate the functional alleles listed in Table S5. In Arabidopsis, CLAVATA3 regulates shoot meristem activity [10], and precise editing of CLAVATA genes in Brassica induced multilocular silique development [57]. We also identified genetic variants in two candidate genes at the qD02-FSBN-1 locus, annotated as a CLAVATA gene and a protein kinase superfamily gene. The functions of these genes and variants are being validated with the CRISPR-Cas9 system in our laboratory. Ghir_A03G005160, a GRAS gene, may affect plant height. Given that GRAS proteins have been shown to play important roles in meristem maintenance and development [58], it will be worthwhile to validate the A03_8815955 (C/T) allele in Ghir_A03G005160 (Table S5).
Conflict of interest
The authors declare no conflict of interest.
Acknowledgments
This work was supported by the Fundamental Research Funds for the Central Universities (2662015PY097) and the Breeding of New Early Maturing and High-quality Coloured Cotton Varieties (2016HZ09). We appreciate help from members of the cotton molecular genetics and breeding group at the National Key Laboratory of Crop Genetic Improvement, Wuhan, Hubei, China.
Appendix A. Supplementary data
Supplementary data for this article can be found online at https://doi.org/10.1016/j.cj.2018.12.004.