Ziqi Sun, Feiyan Qi, Hua Liu, Li Qin, Jing Xu, Lei Shi, Zhongxin Zhang, Lijuan Miao, Bingyan Huang, Wenzhao Dong, Xiao Wang, Mengdi Tian, Jingjing Feng, Ruifang Zhao, Xinyou Zhang, Zheng Zheng
Henan Academy of Crop Molecular Breeding, Henan Academy of Agricultural Science/Key Laboratory of Oil Crops in Huang-Huai-Hai Plains, Ministry of Agriculture/Henan Provincial Key Laboratory for Oil Crops Improvement, Zhengzhou 450002, Henan, China
Keywords: QTL mapping; Peanut; Oil content; Fatty acid; Whole-genome resequencing
ABSTRACT Oil and protein content and fatty acid composition are quality traits in peanut. Elucidating the genetic mechanisms underlying these traits may help researchers to obtain improved cultivars by molecular breeding. Whole-genome resequencing of a recombinant inbred population of 318 lines was performed to construct a high-density linkage map and identify QTL for peanut quality. The map, containing 4561 bin markers, covered 2032 cM with a mean distance of 0.45 cM between adjacent markers. A total of 110 QTL for oil and protein content and fatty acid composition were mapped on 18 of the 20 peanut chromosomes. The QTL qA05.1 was detected in four environments and showed a major phenotypic effect on the contents of oil, protein, and six fatty acids. The genomic region spanned by qA05.1, corresponding to a physical interval of approximately 1.5 Mb, contains two SNPs polymorphic between the parents that could cause missense mutations. The two SNP sites were employed as KASP markers and validated using lines with extremely high and low oil contents. These sites may be useful in the marker-assisted breeding of peanut cultivars with high oil contents.
Peanut (Arachis hypogaea L., 2n = 4x = 40) is a major industrial crop worldwide [1]. In China and India, peanut seeds are used primarily as a source of vegetable oil, whereas in western countries, they are also consumed as food products [2-4]. Peanut seeds contain oil (40%-56%), protein (20%-30%), and carbohydrates (10%-20%) [4]. Peanut oil is composed of several fatty acids, including palmitic (C16:0), stearic (C18:0), oleic (C18:1), linoleic (C18:2), arachidic (C20:0), behenic (C22:0), and arachidonic acid (C20:1). The ratio of saturated (palmitic, stearic, arachidic, and behenic) to unsaturated (oleic, linoleic, and arachidonic) fatty acids is approximately 1:4 [5]. Peanut proteins contain essential amino acids that are easily absorbed by the human body [6].
Protein and oil content are key quality traits in peanut. Fatty acid composition, particularly the proportion of oleic acid, is especially valued, as oleic acid can increase the shelf life of peanut products and is beneficial for human health [7]. Elucidating the genetic mechanisms underlying the expression of quality traits may help researchers to develop improved cultivars by marker-assisted breeding.
Several quantitative trait loci (QTL) have been reported [1-4,8] to be linked with oil content and fatty acid composition in peanut. In two recombinant inbred line (RIL) populations genotyped with simple sequence repeat (SSR) markers, 78 main-effect and 10 epistatic QTL were detected for oil content and oil quality traits [2]. Another SSR-based QTL mapping study identified 12 QTL for eight quality-related traits with phenotypic variation explained (PVE) ranging from 1.7% to 20.2% [3]. In two F2 populations genotyped with diversity array technology (DArT) markers, two major QTL on chromosomes A02 and A10 were associated with oil content, and 20 QTL on chromosomes A05, A07-A10, B01, B04, and B09 were associated with fatty acid composition [4]. Finally, Liu et al. [1], using SNP markers from double-digest restriction-site-associated DNA (ddRAD) sequencing, mapped the major and stable QTL qOCA08.1 to an approximately 0.8-Mb genomic region containing two annotated genes predicted to affect oil biosynthesis.
With advances in next-generation sequencing (NGS) technology and the availability of reference genomes for diploid progenitors and cultivated peanut [9-12], high-resolution mapping has been successfully performed for complex traits in peanut, such as yield [13,14] and disease resistance [15-17]. Various NGS methods have been employed to generate a large number of peanut SNPs, including restriction-site-associated DNA sequencing (RAD-seq) [17-18], single-nucleotide polymorphism (SNP) arrays [19-20], specific-locus amplified fragment sequencing (SLAF-seq) [13,21], DArT [4,22], and whole-genome resequencing (WGRS) [15].
The objectives of this study were (1) to construct a high-density linkage map in peanut by whole-genome resequencing; (2) to conduct QTL mapping for the contents of oil, protein, and seven fatty acids; (3) to develop KASP (Kompetitive allele specific PCR) [23] markers for QTL with stable and major genetic effects; and (4) to identify candidate genes involved in the fat biosynthesis pathway.
A population of 329 RILs was derived from a cross between Yuhua 15 (female parent) and W1202 (male parent) by the corresponding author’s laboratory in 2012. Yuhua 15 is a peanut cultivar with high (54.0%) oil content released by the Institute of Industrial Crops, Henan Academy of Agricultural Science, in 2001. W1202 is a breeding line with relatively low oil content (52.6%) maintained in the author’s laboratory. F10 RILs were obtained by single-seed descent (SSD), with generations advanced in the Chinese provinces of Hainan and Henan to reduce generation time. Single plants from parental lines and RILs were used for DNA isolation and sequencing and propagated by self-pollination for phenotyping.
The RIL population and the two parental lines were grown in Zhengzhou (Henan province) in 2018 and at three locations (Zhengzhou and Shangqiu, Henan province; Weifang, Shandong province) in 2019. Twenty seeds of each line were sown in a 3.0 m × 0.5 m plot in a randomized complete block design with two replicates. After harvesting, approximately 10 dried plump seeds for each genotype were used to estimate the contents of oil, protein, and fatty acids by near-infrared reflectance spectroscopy (NIRS) with a DA7200 instrument (Perten Instruments (Beijing) Co., Ltd, China). A NIRS calibration model was developed using the values determined by gas chromatography (GC) and the spectral data returned by NIRS; the coefficient of determination (R²) was above 0.90. The fatty acids assayed were palmitic (C16:0), stearic (C18:0), oleic (C18:1), linoleic (C18:2), arachidic (C20:0), behenic (C22:0), and arachidonic (C20:1) acids.
Genomic DNA was extracted from fresh leaves using the Plant Genomic DNA Kit (Tiangen Biotech (Beijing) Co., Ltd, China). DNA quality, concentration, and integrity were assessed with a NanoDrop-2000 spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA), a Qubit Fluorometer (Thermo Fisher Scientific, Waltham, MA, USA), and agarose gel electrophoresis. DNA passing quality control was randomly sheared by sonication, and fragments of approximately 300 bp were recovered by electrophoresis. DNA fragments with adapters were sequenced on the Illumina HiSeq X Ten platform (Illumina, Inc., San Diego, CA, USA) in PE151 mode.
Clean reads were obtained after filtering out adapters and low-quality reads (>50% of bases with a quality score ≤20 and ≥1% of missing bases) with SOAPnuke [25]. Trimmed reads were aligned to the peanut reference genome (Arachis hypogaea var. Tifrunner version 1) [20] using the aln command implemented in bwa-0.7.10 software [26]. Uniquely mapped reads were used to identify SNPs with the GATK3.3.0 pipeline [27]. In the raw SNP dataset, low-quality SNPs between the two parents were excluded based on missing values, heterozygosity, sequencing depth < 10, and genotype quality (GQ) < 20. Loci that were homozygous in both parents and polymorphic between them were used to genotype the RIL population.
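The parental SNP filter described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `ParentCall` record and the function names are hypothetical, and only the depth and GQ cut-offs are taken from the text.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ParentCall:
    """Genotype call for one parent at one SNP (hypothetical record layout)."""
    alleles: Optional[Tuple[str, str]]  # e.g. ("A", "A"); None if the call is missing
    depth: int                          # sequencing depth at the site
    gq: int                             # genotype quality


MIN_DEPTH = 10  # sites with depth < 10 are excluded (from the text)
MIN_GQ = 20     # sites with GQ < 20 are excluded (from the text)


def is_reliable_homozygote(call: ParentCall) -> bool:
    """True if the call is present, homozygous, and passes depth and GQ cut-offs."""
    if call.alleles is None:
        return False
    first, second = call.alleles
    return first == second and call.depth >= MIN_DEPTH and call.gq >= MIN_GQ


def is_informative(p1: ParentCall, p2: ParentCall) -> bool:
    """True if both parents are reliable homozygotes carrying different alleles."""
    return (
        is_reliable_homozygote(p1)
        and is_reliable_homozygote(p2)
        and p1.alleles != p2.alleles
    )
```

Only sites for which `is_informative` returns true would be carried forward to genotype the RILs in this sketch.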
A sliding-window approach for genotype calling and recombination breakpoint determination [28] was applied to convert SNPs into bin markers. These were used to construct a linkage map and perform QTL mapping using QTL IciMapping. First, the BIN module was used to remove redundant markers by missing rate, so that the bin markers with the fewest missing values were retained in each bin. Second, filtered markers were used to construct the linkage map with the MAP module. For grouping, the group number was set as 20, the peanut chromosome number. For ordering, REC and 2-OptMAP of the k-optimality method were chosen, as they give a higher proportion of correct orders in less time [29]. Inclusive Composite Interval Mapping (ICIM) [30] was then used for QTL mapping, and the parameters were set as follows: mapping step was 0.1 cM and the LOD threshold was 3.0 calculated following Sun et al. [31].
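The idea behind sliding-window genotype calling can be sketched in Python as follows. This is an illustration under stated assumptions rather than the procedure of reference [28]: the window size of 15 SNPs, the 0.7 majority threshold, and the function names are all hypothetical choices made for the example.

```python
from typing import List, Optional


def window_genotypes(
    calls: List[Optional[str]], window: int = 15, min_frac: float = 0.7
) -> List[Optional[str]]:
    """Call a genotype at each SNP from the calls in its surrounding window.

    calls: per-SNP allele origin in one RIL, "A" (parent 1), "B" (parent 2),
           or None when the site has no read coverage.
    Returns "A", "B", "H" (mixed window), or None (no data in the window).
    """
    half = window // 2
    result: List[Optional[str]] = []
    for i in range(len(calls)):
        observed = [c for c in calls[max(0, i - half): i + half + 1] if c is not None]
        if not observed:
            result.append(None)
            continue
        frac_a = observed.count("A") / len(observed)
        if frac_a >= min_frac:
            result.append("A")
        elif frac_a <= 1 - min_frac:
            result.append("B")
        else:
            result.append("H")
    return result


def breakpoints(genotypes: List[Optional[str]]) -> List[int]:
    """Indices at which the called genotype changes (candidate breakpoints)."""
    found = []
    previous = None
    for i, genotype in enumerate(genotypes):
        if genotype is None:
            continue
        if previous is not None and genotype != previous:
            found.append(i)
        previous = genotype
    return found
```

Because each call pools neighboring SNPs, an isolated miscalled or uncovered site inside a long parental block does not change the genotype of that block, which is why this kind of approach suits low-depth data.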
Nine traits (the contents of oil, protein, and palmitic, stearic, arachidic, behenic, oleic, linoleic, and arachidonic acids) were measured in four environments (Zhengzhou, 2018 and 2019; Shangqiu and Weifang, 2019) for the male parental line P1 (W1202), the female parental line P2 (Yuhua 15), and 329 RILs. ANOVA revealed significant genotypic effects for all the traits (Table 1). P1 showed higher contents of oleic, behenic, and arachidonic acids, whereas P2 showed higher contents of oil, protein, and palmitic, stearic, linoleic, and arachidic acids (Table 1). For all traits and environments, wide phenotypic variation and transgressive segregation were observed in the RIL population (Table 1; Fig. S1). The CV ranged from 4.2% for oil content to 20.7% for arachidonic acid content, and the broad-sense heritability ranged from 0.74 for linoleic acid content to 0.91 for behenic acid content (Table 1).
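For readers unfamiliar with the two summary statistics, the Python sketch below shows one common way to compute them. The text does not state which heritability estimator was used, so the entry-mean formula here, H² = Vg / (Vg + Vge/e + Verr/(e·r)), is an assumption, as are the function names; e = 4 environments and r = 2 replicates would match the trial design described earlier.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV (%) = sample standard deviation / mean * 100."""
    return stdev(values) / mean(values) * 100.0


def broad_sense_heritability(var_g, var_ge, var_err, n_env, n_rep):
    """Entry-mean broad-sense heritability (assumed estimator).

    var_g:   genotypic variance component
    var_ge:  genotype-by-environment variance component
    var_err: residual variance component
    """
    return var_g / (var_g + var_ge / n_env + var_err / (n_env * n_rep))
```

With this estimator, heritability rises as more environments and replicates average out the genotype-by-environment and residual terms.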
Oil content was negatively correlated with protein, arachidonic acid, and oleic acid contents (-0.79 to -0.39) and positively correlated with arachidic, behenic, stearic, palmitic, and linoleic acid contents (0.12 to 0.87) (Table 2). Negative correlations were observed between protein content and behenic, arachidic, and palmitic acid contents (-0.83 to -0.33). Among fatty acids, a strong negative correlation was observed between oleic and linoleic acid (-0.91), as well as between stearic and arachidonic acid (-0.87), whereas a strong positive correlation was observed between stearic and arachidic acid (0.86).
Table 1 Basic statistics and genetic heritability of nine quality traits.
Table 2 Pairwise correlation among the contents of oil, protein, and six different fatty acids.
Whole-genome resequencing of the two parental lines and 329 RILs generated approximately 700 Gb of clean data (9.1 billion reads). For each sample, the proportions of mapped reads and of mapped reads with unique positions were >96% and >73%, respectively. The effective sequencing depths were 34.42× for P1 and 34.58× for P2 and ranged from 1.20× to 1.40× for the RILs (Table S1). The coverage was 99.1% for P1 and 98.5% for P2 and ranged from 52.0% to 64.0% for the RILs (Table S1). Following alignment and the application of the GATK protocol, 741,564 SNPs were obtained. Further filtering revealed 213,868 SNPs homozygous and polymorphic between the two parents, which were used to identify bin markers.
As the RIL population was sequenced at low depth, the SNP dataset was converted into bin markers using a sliding-window approach [28]. In total, 7595 bin markers were detected, and eleven lines with >10% heterozygosity were removed from further analysis (Table S1). After redundant markers were filtered out, the remaining 4565 bin markers of 318 lines were used to construct a linkage map. Four bin markers remained unlinked and the remaining 4561 were assigned to 20 linkage groups (LGs) (Fig. S2, Table 3). As the total map length was 2032 cM, the mean map distance between markers was 0.45 cM (Table 3). The number of bin markers per LG ranged from 173 (LG11) to 323 (LG13), the LG length varied from 77.5 cM (LG20) to 170.2 cM (LG06), and the mean marker interval ranged between 0.37 cM (LG15) and 0.59 cM (LG06) (Table 3). The maximum marker interval was 13.4 cM on LG06, and >90% of marker intervals were <1 cM (Table 3).
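The counts reported above are mutually consistent, as the short Python check below shows. All numbers are taken from the text; the variable names are chosen only for this illustration.

```python
sequenced_rils = 329                 # RILs that were resequenced
high_heterozygosity_lines = 11       # lines removed for >10% heterozygosity
bins_after_filtering = 4565          # bin markers kept after redundancy filtering
unlinked_bins = 4                    # bin markers not assigned to any linkage group
map_length_cm = 2032.0               # total map length in cM

retained_lines = sequenced_rils - high_heterozygosity_lines
mapped_bins = bins_after_filtering - unlinked_bins
mean_spacing_cm = map_length_cm / mapped_bins

print(retained_lines)                # 318
print(mapped_bins)                   # 4561
print(round(mean_spacing_cm, 2))     # 0.45
```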
Using the LOD threshold of 3.3, 110 QTL were identified for the nine quality traits. QTL were distributed on all LGs except LG15 and LG19 (Tables 4, S3 and S4). Twelve QTL were mapped on LG05 (Table 4), among which qA05.1 covered a region of 0.5 cM and was associated with all traits except linoleic acid content (Table 4). QTL qA05.1 showed a negative additive effect on five traits (oil, palmitic, stearic, arachidic, and behenic acid contents), which was observed in all four environments. This QTL also showed positive additive effects on protein, oleic acid, and arachidonic acid contents in two or three environments. For the same trait, qA05.1 showed consistent effects in different environments. The qA05.1 region, flanked by markers bin1572 and bin1573, exhibited a considerable effect on the traits, being associated with PVE values of 10.4%-27.0% and LOD scores of 10.4 to 28.4 for oil (Fig. 1), stearic acid, arachidic acid, and behenic acid contents. Another major QTL on LG05, qA05.10, covered a region of 1.5 cM and showed pleiotropic effects on the contents of all fatty acids except arachidic and behenic acids (Table 4). The LOD score associated with qA05.10 varied from 5.1 to 39.9, whereas the PVE values were 0.3%-15.2%.
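To make the three reported quantities (additive effect, LOD, and PVE) concrete, the Python sketch below computes them for a single marker by simple regression of the trait on genotype. This is deliberately simpler than ICIM, which also conditions on background markers, so it should be read as an illustration only. The sign convention (a negative additive effect means that the P2 allele increases the trait) is an assumption chosen to match how additive effects are interpreted in this paper.

```python
import math
from statistics import mean


def single_marker_scan(genotypes, phenotypes):
    """Single-marker regression for homozygous RIL genotypes.

    genotypes:  "P1" or "P2" for each line
    phenotypes: trait value for each line
    Returns (additive_effect, lod, pve_percent). Assumes both genotype
    classes are present and show some residual variation.
    """
    p1_values = [y for g, y in zip(genotypes, phenotypes) if g == "P1"]
    p2_values = [y for g, y in zip(genotypes, phenotypes) if g == "P2"]
    all_values = p1_values + p2_values
    n = len(all_values)
    grand_mean = mean(all_values)
    mean_p1, mean_p2 = mean(p1_values), mean(p2_values)

    # Residual sum of squares without (rss0) and with (rss1) the marker effect.
    rss0 = sum((y - grand_mean) ** 2 for y in all_values)
    rss1 = sum((y - mean_p1) ** 2 for y in p1_values) + sum(
        (y - mean_p2) ** 2 for y in p2_values
    )

    additive = (mean_p1 - mean_p2) / 2.0
    lod = (n / 2.0) * math.log10(rss0 / rss1)
    pve = 100.0 * (1.0 - rss1 / rss0)
    return additive, lod, pve
```

A scan over all bin markers would call this function once per marker and compare each LOD score with the chosen threshold.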
Fig. 1. LOD curves of oil and protein content across the whole genome under four environments.
Several QTL were mapped on LG08, LG12, and LG14 (Table S3). On LG08, a region of 2.6 cM, covered by QTL qA08.4 and qA08.5, was associated with oil, protein, and behenic acid content, with LOD scores of approximately 5.7-14.7 and PVE values of approximately 3.9%-12.6%. Associations with oil and protein content were consistent across all four environments tested, whereas the association with behenic acid was confirmed in three environments (Table S3). A large genomic region containing several QTL with minor phenotypic effects was identified on LG12 (Table S3). In particular, QTL for oil, protein, and behenic acid content that were consistently found in four environments were detected in regions spanning 18.3, 7.4, and 17.6 cM, respectively. On LG14, QTL from qA14.5 to qA14.10, which were contained in the interval between 40.3 and 43.4 cM, were detected in four environments for oil, stearic acid, and arachidic acid content and in three environments for behenic acid content (Table S3).
Environment abbreviations: 2018ZZ, Zhengzhou in 2018; 2019ZZ, Zhengzhou in 2019; 2019SQ, Shangqiu in 2019; 2019WF, Weifang in 2019.
Table 3 Statistics of the linkage groups (LGs) constructed with bin markers for the RIL population.
Table 4 QTL identified on LG05 for oil, protein, and fatty acid content in four environments.
Among the other 69 QTL mapped on the remaining LGs, some showed pleiotropy for several traits and consistent effects in more than one environment (Table S4). In particular, a region of 3.4 cM covered by QTL qA06.3 and qA06.4 was associated with protein content in three environments, with LOD ranging from 15.4 to 18.8 and PVE of approximately 10.8%-15.8%. QTL qA06.4 was associated with behenic acid in all four environments, with LOD scores of 13.9-24.8 and PVE of 11.0%-17.6%. QTL qA06.6 showed a major effect on arachidic acid content, with a PVE of 10.0% and a LOD score of 11.7.
For oil content, 27 QTL were mapped on 12 LGs (Tables 4, S3 and S4). Among these QTL, those mapped on LGs 05, 08, 12, and 14 were detected in at least two environments. QTL qA05.1, detected in all four environments, displayed LOD scores of 13.6-26.9, PVE of 9.6%-22.7%, and additive effects of -0.88 to -0.65 (Table 4). QTL qA08.4 was detected in three environments and the neighboring QTL qA08.5 was detected in the remaining environment (Table S3). QTL qA12.7 and qA14.10 were identified in two environments (Table S3).
Candidate genes in the intervals of qA05.1, qA05.9, and qA05.10 on LG05; qA06.3 and qA06.4 on LG06; qA08.4 and qA08.5 on LG08; qA12.1 to qA12.7 on LG12; and qA14.5 to qA14.9 on LG14 were predicted using the annotation database of the reference genome and screened for SNPs polymorphic between the parents, revealing 84 SNPs in 71 genes (Table S5). Among these SNPs, 17 resulted in missense mutations (Table 5), whereas the remaining 67 were located in introns or resulted in silent mutations. Six SNPs covered by qA05.1 could cause missense mutations in six genes, and 11 other polymorphic SNPs in qA06.4, qA08.4, qA12.1, qA12.2, qA12.4, and qA12.5 could affect genes encoding various proteins (Table 5).
Table 5 Nucleotide types of SNPs located in candidate genes.
KASP markers were designed for the 17 SNPs associated with missense mutations (Table S6) and validated using the two parents and 44 lines of the RIL population displaying contrasting oil content. Two SNPs at sites Arahy.05:6599714 and Arahy.05:6709559 were strongly associated with oil content (Fig. 2). The mean oil content was 55.4% in RILs carrying G and 50.6% in RILs carrying A at Arahy.05:6709559 (Fig. 3). Compared with the reference genome, the bases of the high oil-content parent Yuhua 15 at the sites Arahy.05:6599714 and Arahy.05:6709559 changed from C to A and from A to G, respectively, and the encoded amino acids changed from proline (P) to threonine (T) and from tyrosine (Y) to cysteine (C) (Table 5). The two SNPs were located in the genes Arahy.T0P5W2 and Arahy.YR3A5K, encoding a Scarecrow-like transcription factor PAT1-like and a galactosyl transferase GMA12/MNN10 family protein, respectively (Table S5).
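The two amino acid changes are what the standard genetic code predicts for these base substitutions, as the Python sketch below illustrates. The affected codons and the coding strand are not given in the text, so the codons CCT and TAT, the substitution positions, and the helper names are hypothetical; the sketch also assumes that the reported base change is read on the coding strand.

```python
# Subset of the standard genetic code covering the four amino acids involved.
CODON_TABLE = {
    "CCT": "P", "CCC": "P", "CCA": "P", "CCG": "P",  # proline
    "ACT": "T", "ACC": "T", "ACA": "T", "ACG": "T",  # threonine
    "TAT": "Y", "TAC": "Y",                          # tyrosine
    "TGT": "C", "TGC": "C",                          # cysteine
}


def substitute(codon, position, new_base):
    """Return the codon with the base at a 0-based position replaced."""
    return codon[:position] + new_base + codon[position + 1:]


def classify(ref_codon, alt_codon):
    """Return (effect, reference amino acid, alternative amino acid)."""
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    effect = "missense" if ref_aa != alt_aa else "silent"
    return effect, ref_aa, alt_aa


# C to A at the first codon position turns a proline codon into a threonine codon.
pro_to_thr = classify("CCT", substitute("CCT", 0, "A"))
# A to G at the second codon position turns a tyrosine codon into a cysteine codon.
tyr_to_cys = classify("TAT", substitute("TAT", 1, "G"))
```

Any proline codon (CCN) becomes a threonine codon (ACN) under a first-position C-to-A change, and either tyrosine codon (TAT or TAC) becomes a cysteine codon under a second-position A-to-G change, so the conclusion does not depend on the hypothetical codons chosen here.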
Fig. 2. KASP validations for the two SNPs on LG05.
Fig. 3. Phenotypic difference between the two genotypes at the SNP site Arahy.05:6709559 for 44 lines of the RIL population displaying contrasting oil content.
Both marker number and marker density of the genetic map were greater than those reported for other recent peanut linkage maps [1,14-15,19-20]. This finding might be attributable to the large size of the population used in this study and/or the whole-genome resequencing strategy adopted. The marker order of the genetic map was consistent overall with the physical order, except for two translocations between LG3 and LG13 and between LG6 and LG16, caused by assembly errors in the reference genome as described in PeanutBase (https://www.peanutbase.org/peanut_genome_v1_v2).
Among the 110 QTL detected for nine traits, 36 pleiotropic QTL were associated with two or more traits (Tables 4, S3 and S4). For pleiotropic QTL, the sign of the additive effect for the positively correlated traits was consistent with that of the correlation. For qA05.1, the additive effect was negative for oil and its four positively correlated traits (palmitic, stearic, arachidic, and behenic acids) and positive for three traits (protein and oleic and arachidonic acids) negatively correlated with oil (Tables 2, 4). The favorable alleles originated from the female parent Yuhua 15 for the traits with negative effects and from the male parent W1202 for the traits with positive effects.
For oil content, the interval of 0.5 cM spanned by qA05.1, corresponding to a 6.3-7.8 Mb physical region of chromosome A05, may be the same as that reported by Pandey et al. [2], flanked by the markers GM1878 and GM1890, which were mapped to the 6.4-10.9 Mb region of A05 [1]. QTL qA08.4 and qA08.5 spanned 2.6 cM, corresponding to a physical interval of 37.0-38.2 Mb, close to the hotspot region (39.9-43.8 Mb) on chromosome A08 for genes controlling oil content, as reported by Liu et al. [1].
A total of 2559 genes are involved in the metabolism of fatty acids and lipid storage and are unevenly distributed on the 20 peanut chromosomes [12]. The SNP Arahy.05:6709559 was located in an exon of the gene Arahy.YR3A5K, encoding a galactosyl transferase GMA12/MNN10 family protein. This gene is involved in transferring glycosyl groups and in xyloglucan metabolic processes in Arabidopsis thaliana [32]. In the high oil-content peanut cultivar, its mean expression level (1.29) was lower than that (4.74) in the low oil-content cultivar (unpublished transcriptome data). The SNP Arahy.05:6599714 is located in the gene Arahy.T0P5W2, which encodes the Scarecrow-like transcription factor PAT1-like; this protein is involved in phytochrome A signal transduction in Arabidopsis [33]. This gene may not be involved in the fatty acid biosynthetic pathway, as its expression level did not differ between the high and low oil-content cultivars.
The QTL identified for peanut quality traits in this study can be used in breeding special-purpose peanut cultivars. For peanut oil production, increasing seed oil content is the major objective of peanut breeding. The KASP markers developed from the two SNP sites covered by qA05.1 and validated as being linked with oil content could increase the efficiency of high-oil breeding, but because oil content is a quantitative trait, further markers for oil content should be developed.
CRediT authorship contribution statement
Ziqi Sun performed field experiments and phenotypic analysis and wrote the manuscript. Feiyan Qi performed laboratory experiments and genotype analysis. Hua Liu and Li Qin developed the RIL population. Jing Xu and Zhongxin Zhang provided help in field experiments. Lei Shi, Lijuan Miao, Xiao Wang, Mengdi Tian, Jingjing Feng, and Ruifang Zhao provided help in laboratory and field experiments. Bingyan Huang and Wenzhao Dong provided help in designing the experiments. Xinyou Zhang and Zheng Zheng conceived and designed the experiments, facilitated the project, and assisted in manuscript preparation. All authors read and approved the final manuscript.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
The authors thank Prof. Stefano Pavan (University of Bari Aldo Moro, Italy) for fruitful discussions of QTL analysis and for editing the English text of the manuscript. They also thank the AJE Company for editing the English text of the manuscript. This work was supported by the National Basic Research Program of China, Special Project for National Supercomputing Zhengzhou Center Innovation Ecosystem Construction (201400210600), Outstanding Young Scientists of Henan Academy of Agricultural Sciences (2020YQ08), Fund for Distinguished Young Scholars from Henan Academy of Agricultural Sciences (2019JQ02), China Agriculture Research System (CARS-13), Henan Provincial Agriculture Research System, China (S2012-5), and Henan Provincial Young Talents Supporting Project (2020HYTP044). The funding agencies played no role in the design of the study; in the collection, analysis, and interpretation of data; or in writing the manuscript.
Appendix A.Supplementary data
Supplementary data for this article can be found online at https://doi.org/10.1016/j.cj.2021.04.008.