Shuxiang Yan,Huaixin Li,Hongbo Chao,Jianjie He,Yiran Ding,Weiguo Zhao,Kai Zhang,Yiyi Xiong,Kang Chen,Libin Zhang,Maoteng Li,*
a Department of Biotechnology,College of Life Science and Technology,Huazhong University of Science and Technology,Wuhan 430074,Hubei,China
b Key Laboratory of Molecular Biophysics of the Ministry of Education,Wuhan 430074,Hubei,China
c School of Agricultural Sciences,Zhengzhou University,Zhengzhou 450001,Henan,China
Keywords:Brassica napus Oil content QTL mapping Resequencing Transcriptome
ABSTRACT Rapeseed(Brassica napus)supplies about half of the vegetable oil in China.Increasing oil production and searching for genes that control oil content in the crop are research goals.In our previous studies,four major QTL for oil content located on A08,A09,C03 and C06 in the KenC-8×N53-2(KN DH)mapping population were detected.The parental lines were resequenced to identify structural variations and candidate genes affecting oil content in these four major QTL regions.Insertion-deletion (InDel) markers were developed and used to narrow the regions.Differentially expressed genes located in the regions were investigated.GO and KEGG analysis showed that several genes were associated with lipid metabolism.Several transcription factors with higher expression in N53-2 than in KenC-8 were identified.These results shed light on the genetic control of oil content and may be helpful for the development of highoil-content cultivars.
Brassica napus (AACC,2n=38) or rapeseed,ranking as the second largest oilseed crop after soybean in oil production [1],is an allotetraploid species derived from B.rapa (AA,2n=20) and B.oleracea (CC,2n=18) approximately 7500 years ago [2] and a source of edible oil and bioenergy [3].Increasing its oil content by breeding is a primary research goal.In modern breeding,DNA markers are extensively used to molecular breeding [4].With a mature second-generation sequencing technology[5]and the completion of genome sequences of B.napus [2,6],many single nucleotide polymorphisms (SNPs) and InDels have been identified and used as markers in genome-wide association studies (GWAS)[7]and high-density genetic linkage map construction for mapping quantitative trait loci (QTL) [8–10].
QTL mapping is a powerful method to study genetic mechanisms underlying quantitative traits such as oil content [9–12].QTL for oil content have been found on all 19 chromosomes,in numbers ranging from 3 to 27 per study[13–18].In these studies,some segregating populations used for QTL analysis were derived from parents showing little difference in oil content.For example,the oil contents of Darmor-bzh and Yudal were 46%and 48%,Rapid and NSL96/25 were 41% and 47%,and all of them were measured by near infrared reflectance spectroscopy (NIRS) [14].But 164 QTL were identified in the KN DH population derived from two parents with oil contents of 40% and 50%,including a few QTL with more than 20% phenotypic variation explained (PVE) [9].
Few genes associated with oil content in B.napus have been cloned.Low expression of UPL3 increased the expression of LEC2,indirectly increasing seed oil content during seed maturation[19].ORF188,cloned using comparative genomics and transcriptome analysis,increased oil content by 8% [20].But the genetic mechanism of controlling oil content in B.napus remains undetermined.
Multi-omics joint analysis is an effective strategy for identifying candidate genes [21].Combining transcriptome and QTL analysis will increase the detection efficiency of candidate genes progress[22,23].The nitrogen-use efficiency of sorghum was studied by QTL and transcriptome analysis,and some candidate genes involved in nitrogen stress were identified [24].Of more than 100 genes associated with resistance to root-knot nematode identified in one major QTL interval,transcriptome screening reduced the number of potential candidate genes to three [25].Combining QTL with transcriptomics,45 candidate genes controlling flowering time were refined from thousands of genes located in QTL intervals[26].In a study[21]of whitefly resistance in cabbage,several genes involved in the ABA signaling pathway were confirmed by combined analysis of a major QTL,metabolome,and transcriptome.
In our previous study,a high-density genetic linkage map containing 3207 markers with a total length of 3073 cM and a mean genetic interval of 0.96 cM was constructed using the Brassica 60 K array in the KN DH population,and 164 QTL for oil content,with PVE values ranging from 1.92% to 22.24%,were identified.Seven major QTL were located on chromosomes A08,A09,C03,C05,C06,and C09.The PVE of the four QTL cqOC-A8-4,cqOC-A9-9,cqOC-C3-6 and cqOC-C6-6 on A08,A09,C03,and C06,respectively,exceeded 10% in more than four trials [9].These four QTL are the most significant major QTL for oil content,and at the same time most likely to explain differences of oil content in the KN DH population.
The aim of the present study was to identify candidate genes controlling oil content in these four major QTL intervals.The experimental approach was to develop InDel markers by resequencing the parents N53-2 and KenC8,use them to narrow the intervals,and identify genes differentially expressed between the parents.
The KN DH population consisted of 348 doubled-haploid (DH)lines derived from F1microspore culture from the cross of KenC-8 and N53-2[27].KenC-8,the male parent,is a spring type B.napus with seed oil content of about 40%.N53-2,the female parent,is a winter type with seed oil content exceeding 50%.Of the 348 DH lines,300 were selected for further research.They were planted in Wuhan(WH),Hubei province,for eight years(2011–2018),Dali(DL),Shanxi province,for eight years (2008–2015),Sunan,Gansu province (GS),for two years (2010–2011),and Huanggang (HG),Hubei province,for one year (2010).Wuhan,Dali,Gansu,and Huanggang were treated as macroenvironments,and one year and one location was treated as a microenvironment.The trial named ‘‘18WH” was planted in Wuhan in 2018 and harvested in 2019.Among the four macroenvironments,DL was a winter environment,WH and HG were semi-winter environments,and GS was a spring environment.
Open-pollinated seeds from lateral branches were collected,and their oil content value was measured by nuclear magnetic resonance(NMR)[28]or NIRS[29].Each sample was measured three times and the average value was calculated.The oil content data of 2008-2013DL,2010-2011GS,2010HG,and 2011-2013WH were obtained from Chao et al.[9].The oil content data of 2014-2015DL,and 2014-2018WH were collected recently.
QTL were detected with Windows QTL Cartographer 2.5 software[30],at a threshold LOD score set to 2.5 using 1000 permutation tests.Additive effects of QTL were estimated by composite interval mapping [9].QTL thus identified were designated identified QTL.The identified QTL 18WH-OC-1 represents seeds sown in Wuhan in 2018,OC stands for oil content,and 1 represents the serial number of the QTL in the 18WH microenvironment.Identified QTL on chromosomes A08,A09,C03,and C06 were integrated by the meta-analysis model of BioMercator 4.2 software[31],and named consensus QTL.In our previous study [9],four major QTL for oil content:cqOC-A8-4,cqOC-A9-9,cqOC-C3-6,and cqOC-C6-6,were detected in 12 environments.To distinguish these QTL from those detected in the present study,we added ‘0’between A (or C) and the number following them (representing the chromosome).For example,cqOC-A8-4 was changed to cqOCA08-4.Thus,cqOC-A08-4 indicates the fourth consensus QTL for oil content on chromosome A08 and is referred to here as an original QTL.The most significant QTL with high LOD and PVE values in each of the four linkage groups was selected for further analysis.
In the previous study [9],whole-genome resequencing data of the parental lines KenC-8 and N53-2 were used for construction of the mapping population.In the present study,SNPs and InDels were detected by the UnifiedGenotype module of GATK 3.3 [32]and filtered with the VariantFiltration module,and SNP and InDel calling was based on the Bayes rule and genotype likelihood function calculation.Annotation of SNP was performed with ANNOVAR software [33].Using the resequencing of the two parents,new InDel markers were designed,evenly distributed in the QTL hotspot regions (QTL clustering regions).Primers amplifying InDel markers were designed with Oligo 7.0 software (http://www.oligo.net/),the target fragments were amplified by GeneAmp PCR System 9700 (Applied Biosystems,Foster City,CA,USA),and the lengths of amplification products ranged from 100 to 200 bp.Primers were selected by polyacrylamide gel electrophoresis and used for genotyping in the KN DH population.
The genotypes of all polymorphic markers were recalculated with Joinmap 4.0 software [34],and the consensus QTL were integrated with BioMercator 4.2 software once again.After genotyping of the KN DH population,an uppercase N (for ‘‘new”;e.g.,cqNOCA08-4) was added to distinguish the updated from the original QTL.The updated QTL shared the same markers or physical intervals with the original QTL.
KenC-8 and N53-2 were planted in Wuhan (WH) in October of 2017.Their respective initial flowering times were on March 13 and 19,2018 and their full flowering times were on March 24 and 26.To ensure that seeds collected for transcriptomic analysis were at similar developmental stages,flowers blooming on the same day at full flowering time were tagged as 1 DAF in full blooming stage,and seeds of 30 DAF were collected.Sequencing data were deposited in SRA database in NCBI (Accession number:SRA:PRJNA661261).
Total RNA was extracted with TRIzol reagent and its quality was tested by agarose electrophoresis and Nanodrop 2000 (Thermo-Fisher Scientific,Waltham,MA,USA).The first cDNA strand was synthesized by reverse transcription,and then sequenced with Illumina HiSeqTM(Illumina,San Diego,CA,USA).Clean reads were defined by Q20 (Phred value greater than 20) and Q30 (Phred value greater than 30).Samtools and Picard tools program were used to exclude repetitive sequences[32].FPKM(expected number of Fragments Per Kilobase of transcript sequence per Million base pairs sequenced) [35] was introduced to correct the gene expression level,and the FPKM threshold value was set to 1.The negative binomial distribution module of DESeq[36]was used to detect differentially expressed genes (DEGs),with a gene with at least twofold expression difference between the parents and an adjusted P-value ≤0.05 being assigned as differentially expressed.
Total RNA was extracted from 100-mg seed samples of the two parents at 30 DAF with a RNAprep Pure Plant Plus Kit (TIANGEN,Beijing,China) according to the manufacturer’s protocol,and mRNA expression levels of the DEGs in these seeds were measured by qPCR.One microgram of total RNA was used to synthesize cDNA by SuperScript III Reverse Transcriptase(Invitrogen,Waltham,MA,USA).Primers for 20 randomly selected DEGs were designed with Oligo 7.0 software,and all primer information is presented in Table S1.Actin-7 [37] was used as a reference gene.Each reaction was performed with three biological replicates,and fluorescence signal was collected at 72 °C with Step One Plus (Applied Biosystems,Foster City,CA,USA).The qPCR results were described by the 2-ΔΔCTmethod [38].
Three steps were used to search for candidate genes.First,the genetic intervals of the four major QTL were converted into physical intervals,and the marker sequences were screened by blastn against the reference genome ‘Darmor-bzh’ [9].If a marker had no physical location information,the sequences of the markers nearest to the QTL were searched.Second,genes with variations in the gene region or the flanking 1-kb regions were selected.Third,potential candidate genes were further screened by transcriptome analysis,and these genes with both differential expression(by transcriptome analysis)and sequence variation(identified by genome resequencing)were named differentially expressed and variant genes (DEVGs).
The DEGs and DEVGs in the four updated major QTL regions were subjected to Gene Ontology (GO,http://www.geneontology.org/) enrichment analysis using GOseq [39].Metabolic pathways associated with DEGs were identified in Kyoto Encyclopedia of Genes and Genomes (KEGG,https://www.genome.jp/kegg/) with KOBAS (2.0) software [40].GO and KEGG terms were selected at the adjusted P-value ≤0.05.
The epistatic effects of QTL were estimated using QTL IciMapping 2.2 software [41] with default parameters.Homologs of potential genes were retrieved from the Arabidopsis thaliana database (https://www.arabidopsis.org/).The STRING database(https://string-db.org/) was used to construct an interaction network for these genes,and Cytoscape 3.8.1[42]was used to display the interaction network.
A total of 208 QTL were identified,44 more than reported by Chao et al.[9].Of the 44,41 were located on chromosomes A08,A09,C03,C05,C06,and C09.Seven hot-spot QTL regions (Fig.S1)were observed in these linkage groups,in agreement with the results of Chao et al.[9] Of the QTL,58% were located on linkage groups A08,A09,C03,and C06,which harbored four QTL hotspot regions with higher LOD and PVE values than those on C05 and C09 (Figs.1A,S1).Four significant consensus QTL,cqOC-A08-4,cqOC-A09-9,cqOC-C03-5 and cqOC-C06-3,were detected in different trials in the QTL hot-spot regions and selected for further analysis (Table S2).The four major QTL regions contained 1713 genes.
A total of 631,629 SNPs and 237,726 InDels were polymorphic between KenC-8 and N53-2 (Fig.2A).Of these,7794 SNPs and 3882 InDels were present in the four major QTL regions but unevenly distributed across them(Fig.2B–E).The major QTL region on cqOC-A08-4 harbored a higher density of SNPs (1633 SNPs/Mb)and InDels (1079 InDels/Mb) than the other three major QTL regions,while the lowest density was found in cqOC-C06-3 (333 SNPs/Mb and 133 InDels/Mb).The mean SNP and InDel densities in the four major QTL regions were respectively 542 SNPs/Mb and 270 InDels/Mb,lower than those of the whole genome.The mean SNP density of cqOC-A08-4 and cqOC-A09-9 in the A subgenome was higher than that of cqOC-C03-5 and cqOC-C06-3 in the C subgenome,and a similar relationship was found for InDel density (Figs.2F,S2).
A total of 90 pairs of InDel primers were developed based on InDel variations,including 31,23,20 and 16 pairs for cqOC-A08-4,cqOC-A09-9,cqOC-C03-5,and cqOC-C06-3,respectively.Of these respectively 13,9,5,and 5 primers were obtained (Table S3),and all these markers were used for genotyping in the KN DH population(Fig.1B).Finally,a new linkage map containing 12,7,5,and 4 new InDel markers on A08,A09,C03,and C06,respectively,was constructed and named the updated map.The genetic map was highly colinear with the physical maps of the Darmor-bzh and Zhongshuang 11 genomes (Fig.S3).These results indicated that the new linkage map of A08,A09,C03,and C06 was of high quality and that either the genome of Darmor-bzh or Zhongshuang 11 could be used as a reference genome.
The QTL were rescanned using the same oil content data and the updated map (Fig.S4),and the four major QTL were remapped and named cqNOC-A08-4,cqNOC-A09-6,cqNOC-C03-5,and cqNOC-C06-4 (Table S4).The interval for cqNOC-A08-4(0.87 cM) was much narrower than that for cqOC-A08-4(2.08 cM).Because the cqNOC-A08-5 region showed high overlap with the cqNOC-A08-4 region,they were treated as one QTL.The main difference between the cqNOC-C03-5 and cqOC-C03-5 regions,both containing nine markers,was that the original marker B025K04 (M8) was replaced by a new marker C03-52935245(M39),leading to a narrower confidence interval for cqNOC-C03-5(2.33 cM) than for cqOC-C03-5 (2.52 cM).The cqOC-C06-3 and cqNOC-C06-4 regions shared two molecular markers,and the confidence interval of cqNOC-C06-4 was narrowed from 0.47 of cqOCC06-3 to 0.31 cM.Like cqNOC-A08-4 and cqNOC-A08-5,the two QTL cqNOC-C06-4 and cqNOC-C06-5 were also treated as one QTL sharing a common physical interval.However,cqNOC-A09-6 contained all three markers linked to cqOC-A09-9 and another five new markers,so that both its genetic and physical intervals were slightly larger than those of the original QTL (Fig.1D).
Fig.1.Four major QTL were updated by addition of new markers.The light blue dashed line parallel to the x-axis indicates the LOD score threshold of 2.5,the transverse lines of varying length represent QTL and their position.The legend describes 19 environments.An original(A)or updated(C)QTL hot-spot region is shown on chromosome C06.The other three QTL hot-spot regions are shown in Fig.S4.(B)Polymorphic InDel markers were used for genotyping in the KN DH population.M,P1 and P2,respectively,refer to 100-bp DNA marker,KenC-8,and N53-2.(D)Diagram showing the differences between the four original and updated QTL.Markers shown in the same color were shared by original and updated QTL,and the genetic distance was consistent.
In the four updated major QTL intervals,1396 annotated genes were obtained,fewer than the 1713 in the original intervals.Because gene sequence variation is the basis of gene function difference,we first characterized the sequence differences of these genes between the two parents.Among the 1396 genes,690 genes with single nucleotide or InDel variations were identified and 226 genes with effective variations(nonsynonymous single-nucleotide variations)were identified.Because an InDel is more likely to cause gene mutation than a SNP in a gene coding region[43],we considered all genes with InDels as effective variations,and 404 genes with InDel variations were detected in coding regions,of which 174 harbored SNP variations at the same time.Thus,there were 456 genes with effective variations in gene regions.In addition,208 genes harbored variation in the 1 kb flanking regions of gene,making a total of 664 variant genes (genes with sequence differences between the two parents) were obtained.Among these 664,there were 207,80,81,and 296 genes in cqNOC-A08-4/5,cqNOC-A09-6,cqNOC-C03-5,and cqNOC-C06-4/5,respectively.The variant gene density of cqNOC-A08-4/5 (128 genes/Mb) and cqNOC-A09-6 (139 genes/Mb) was higher than those of cqNOCC03-5 (73 genes/Mb) and cqNOC-C06-4/5 (44 genes/Mb).
There were 48,280 DEGs differentiating KenC-8 and N53-2 on 19 linkage groups,and 605 DEGs were observed in the four major QTL regions.Among these 605,542 were up-regulated in N53-2 and 63 genes were up-regulated in KenC-8.Of 20 genes from the four major QTL regions randomly selected for qPCR analysis,most showed expression consistent with that of the transcriptomic experiment (Fig.S5).
Compared with KenC8,the number of up-regulated genes in N53-2 was almost equal to that of down-regulated genes in most regions of the 19 linkage groups(Fig.2A),but they were asymmetrically distributed in cqNOC-A09-6 and cqNOC-C06-4/5.For example,the number of up-regulated genes (437) was much larger than that of down-regulated genes (8) in cqNOC-C06-4/5 (Fig.2B),suggesting that these DEGs might be associated with oil content.In the four major QTL regions,635 genes were co-expressed in the two parents,318 were expressed specifically in N53-2,and 61 genes were expressed specifically in KenC-8 (Fig.3A).There were 68,67,25,and 445 DEGs in the confidence intervals of cqNOC-A08-4/5,cqNOC-A09-6,cqNOC-C03-5,and cqNOC-C06-4/5,respectively (Fig.3B).Most up-regulated genes from N53-2 were in the confidence interval of cqNOC-C06-4/5;however,the upregulated genes from KenC-8 were distributed mainly in cqNOCA08-4/5.QTL cqNOC-A09-6 (98.85 genes/Mb) and cqNOC-C06-4/5(1.19 genes/Mb) had the highest and lowest density of DEGs,respectively.The mean density of DEGs of cqNOC-A08-4/5 and cqNOC-A09-6 (79.17 DEGs/Mb) was higher than that of cqNOCC03-5 and cqNOC-C06-4/5 (44.34 DEGs/Mb).
Fig.2.Distribution of SNPs and InDels.(A) Distribution trend of SNPs and InDels on the whole genome:sections 1 and 2 represent the numbers of SNP and InDel markers,respectively;sections 3 and 4 represent the numbers of down-regulated and up-regulated genes,respectively.(B) Highly magnified regions of the four major QTL;only cqNOC-C06-4,5 is shown here.Comparison of the distribution of SNPs(C)and InDels(D)between the four major QTL regions and their whole chromosomes.(E)Numbers of SNPs and InDels in four major QTL regions.(F)Density of SNPs and InDels in the four major QTL regions.A represents the mean density of SNPs and InDels in cqOC-A08-4 and cqOC-A09-9.M indicates the mean density of SNPs and InDels in the whole genome.C refers to the mean density of SNPs and InDels in cqOC-C03-5 and cqOC-C06-3.SNPs and InDels were detected by GATK 3.3,and SNP calling was determined by Bayes rule and genotype likelihood.
A total of 297 DEVGs were detected by the integration of genome resequencing and transcriptomics in the four updated major QTL regions.Among these,respectively 247 and 50 genes were up-regulated in N53-2 and KenC-8 (Table S5).Of the 297 genes,204 were assigned to cellular component,molecular function,and biological process categories.In the cellular component,‘‘intracellular part” and ‘‘organelle” were the most significant GO terms,and more than half (34 of 54) of genes were enriched for these two GO terms.Most of the 54 genes were assigned to intracellular components and located in organelles such as ribosome,mitochondrion,and Golgi apparatus.In molecular function,the top four significant GO terms were ‘‘ligase activity”,‘‘structural constituent of ribosome”,‘‘carbohydrate derivative binding” and‘‘ion binding”.These results indicated that the associated genes have the function of ligation with ions and carbohydrates,suggesting that they function in the process of assembling small molecules into macromolecules.In biological process,the top five significant GO terms were‘‘cellular biosynthetic process”,‘‘organic acid metabolic process”,‘‘lipid metabolic process”,‘‘cellular nitrogen compound metabolic process”,and ‘‘organonitrogen compound metabolic process”,of which approximately 96% genes were up-regulated in N53-2 (Fig.3C).The genes enriched for the five GO terms were associated with substance metabolism,such as Synthesis of organonitrogen compounds and lipids.Besides the five subcategories,there were some other important GO terms.For example,‘‘signaling” genes might be involved in signal transduction.
Fig.3.Analysis of genome resequencing and transcriptome in four major QTL regions.(A)Venn diagram showing commonly and specifically expressed genes in N53-2 and KenC-8(B)Number of DEGs in four major QTL regions.(C)The significant GO terms in three categories.(D)DEVGs associated with the GO term‘‘lipid metabolic process”.The FPKM threshold value was set to 1,and the significance thresholds of both DEGs and GO terms were set to the adjusted P-value ≤0.05.
Among the 297 DEVGs,64 DEVGs participated in 53 KEGG pathways.The most significant pathways (P-value <0.05) were ‘‘glycine,serine and threonine metabolism”,‘‘cutin,suberine and wax biosynthesis” and ‘‘sphingolipid metabolism”.However,there were only seven genes in these.Among the 53 KEGG pathways,nine (‘‘cutin,suberine and wax biosynthesis”,‘‘sphingolipid metabolism”,‘‘fatty acid elongation”,‘‘glycerolipid metabolism”,‘‘glycerophospholipid metabolism”,‘‘linoleic acid metabolism”,‘‘biosynthesis of unsaturated fatty acids”,‘‘a(chǎn)lpha-linolenic acid metabolism”,‘‘fatty acid biosynthesis”,and ‘‘fatty acid metabolism”) belonged to lipid metabolism,accounting for 17%of the total pathways.Other pathways were also important in biological processes,such as ‘‘citrate cycle (TCA cycle)”,‘‘carbon metabolism”,and ‘‘starch and sucrose metabolism”,and the genes involved might function in substance exchange and transformation.Genes of ‘‘MAPK signaling” might regulate the efficiency of product synthesis.‘‘Flavonoid biosynthesis” genes are beneficial in plant disease prevention.
The GO term‘‘lipid metabolic process”contained 10 genes from the four major QTL regions (Fig.3D),and all of them were upregulated in N53-2,possibly contributing to oil content.Eight genes (BnaA08g09710D,BnaA09g48250D,BnaC06g26180D,BnaC06g26670D,BnaC06g28830D,BnaC06g31770D,BnaC06g 32770D,and BnaC06g34390D) were involved in nine lipid metabolism pathways.‘‘Fatty acid elongation”,‘‘biosynthesis of unsaturated fatty acids”,and ‘‘fatty acid metabolism” pathways shared the same gene,BnaC06g28830D.Similarly,BnaC06g26670D participated in ‘‘linoleic acid metabolism” and ‘‘a(chǎn)lpha-linolenic acid metabolism” pathways.The GO term ‘‘fatty acid biosynthetic process” was derived from the GO term ‘‘lipid metabolic process”,which contained only three genes,BnaA09g48250D,BnaC06g26670D and BnaC06g28830D,in the four QTL regions(Fig.S6),suggesting that they were closely associated with lipid metabolism.In addition to these 10 genes,there were other genes directly involved in the lipid metabolism.For example,the gene BnaC06g30250D from the ‘‘phospholipid transport” of biological process was also involved in the ‘‘lipid transporter activity” of molecular function,which might be associated with lipid transport.Further study showed that the genes directly associated with lipid metabolism were all from biological process and molecular function process,and 73 of 204 genes were shared by the two categories.There were some genes that might be indirectly involved in lipid metabolism,such as BnaA08g11450D,BnaA09g48390D,and BnaC06g27700D,which were shared by the ‘‘cellular biosynthetic process” of biological process and ‘‘carbohydrate derivative binding” of molecular function,potentially functioning in providing the carbon skeleton for the synthesis of fatty acids.BnaA08g10700D and BnaC06g27300D were both involved in the‘‘citrate cycle (TCA cycle)” and ‘‘carbon metabolism” pathways,and BnaA08g10130D was involved in the ‘‘starch and sucrose metabolism” pathway,which might provide precursor substances for fatty acids.BnaC06g33910D participated in the ‘‘MAPK signaling” pathway,which could affect the synthesis of very long chain fatty acids (VLCFAs) [44].BnaC06g28360D from the ‘‘pyruvate metabolism”pathway might affect lipid metabolism by regulating the metabolism of pyruvate and its derivatives.KEGG analysis also verified the role of the above genes in the process of fatty acid metabolism(Fig.4).The gene CAC2(BnaA09g48250D)was involved in the initial steps of fatty acid formation and could affect the synthesis of all fatty acids.KCR1(BnaC06g28830D)was involved in the elongation of VLCFAs and in the formation of triacylglycerol.The two genes SPHK2 (BnaA08g09710D) and SBH2 (BnaC06g32770D)were associated with sphingolipid metabolism pathways.The oxidation–reduction reaction of linoleic acid and α-linolenic acid could be triggered by the gene LOX6 (BnaC06g26670D).The gene CLO4 (BnaC06g31770D) has the function of converting oleic acid into epoxystearic acid,which belonged to the biosynthesis process for cutin,suberine,and wax.
Transcription factors (TFs) function in the regulation of gene function and metabolism [45].A total of 39 TFs were found in the four major QTL regions,among which 15 were DEVGs at the stage of 30 DAF.Among the 15 TFs,more than 60% were located in cqNOC-C06-4/5,and 13 TFs were expressed in seed,including 11 up-regulated and 2 down-regulated genes in N53-2(Table S6).Among the 13 TFs,BnaC06g27170D was associated not only with seed development but also with flower development.BnaC06g31830D and BnaC06g33640D function in regulating the jasmonic-acid-mediated signaling pathway to regulate defense response.BnaC06g30320D participates in the abscisic acid pathway and cellular response to hypoxia.BnaC03g63240D is present in cytoplasm and involved in signal transduction.BnaC06g30110D is located in the nucleus and has the function of regulating gene expression.These TFs were expressed in seed and some of them may be involved in lipid metabolism.
Analysis of epistatic QTL for oil content revealed two pairs of interacting loci among the four major QTL and two regions with no QTL detected on the A10 linkage group,and one each of their interacting loci were located in cqNOC-C06-4/5 (Fig.5A).The two epistatic loci explained respectively 10% and 15% of phenotypic variation.These results suggested that some genes in cqNOC-C06-4/5 might play a key role in their interactions.
To directly study the interactions between these DEVGs and other genes and further search for potential candidate genes involved in lipid metabolism,an interaction network was constructed from the DEVGs.There were 267 nodes and 852 edges in the network,and 86(including 11 TFs)of the 297 DEVGs showed interacting genes.The interacting genes were assigned mainly to six categories:‘‘signaling”,‘‘transcription factor related”,‘‘organic substance”,‘‘lipid metabolic”,‘‘carboxylic acid metabolic”,and‘‘carbohydrate derivative” (Fig.5B),which represented relatively basic metabolic pathways in biological processes.Among the 86 genes,some were associated with ‘‘lipid metabolic” genes.For example,CAC2,a subunit of heteromeric ACCase and encoded by BnaA09g48250D,had a direct relationship with more than half of the‘‘lipid metabolic”genes,suggesting that this gene might be closely associated with lipid metabolism.FAB1C,a member of the FAB1 gene family,affects the function of FAB1A and FAB1B,and the two genes promote the synthesis of phosphatidylinositol 3,5-bisphosphate.KCR1 interacted with FAB1,PAS2,EMB3147,CER10,and AT5G59770.Of these genes,PAS2,together with CER10,functioned in the synthesis of VLCFAs.Indirect interactions between the 86 genes and genes of ‘‘lipid metabolic” were also observed.
LIG6,NRPB2,CAK4,emb1507,AT1G71710 and AT1G70650 (all from the 86 genes) could indirectly interact with ATSAC1 (from ‘‘lipid metabolic”) via MAC3A (from ‘‘carbohydrate derivative”) to FRA3(from the 86 genes).There were 11 TFs in the 86 genes,and their interaction network contained 11 nodes and 24 edges.Four TFs(AGL12,RRTF1,PAN,and bZIP19) showed direct interactions with‘‘transcription factor related” genes.CHB3 was directly associated with CDC5 and NRPD1B from‘‘carboxylic acid metabolic”.The other six TFs found indirect interactions with genes from the other four categories by another interaction genes,indicating that the regulation by TFs of target genes was complex.
The densities of SNPs and InDels in the four major QTL regions were higher in the A than in the C subgenome,and this trend held for the full genome.This finding suggested that the C genome is more highly conserved than the A genome over evolutionary history,possibly as a consequence of multiple introgressions from B.rapa into the A genome of B.napus [46].In contrast,A subgenome is associated with plant defense,and some important agronomic traits,such as lipid accumulation and flowering [2,46].These results indicate a high intensity of selection in the A genome in evolutionary and breeding history.
Fig.4.The lipid metabolism pathway based on the DEVGs in the four major QTL regions.The red labeled genes are DEVGs.Overlapping arrows indicate that more than two steps are omitted.
Fig.5.Epistatic interaction for oil content and interaction network of the DEVGs in the four major QTL regions.(A) Epistatic interaction of oil content on 19 chromosomes.Different chromosomes are shown in different colors.Dashed lines indicate an epistatic interaction between the two loci,and the value along the line is the LOD score.The number on the circle identifies the locus of epistatic interaction.(B) Interaction network shows the role of DEVGs in biological processes.The two middle circles with pink ellipses and red triangles are DEVGs from the four major QTL regions,with ‘‘Cluster a” representing 11 TFs and ‘‘Cluster b” representing 75 DEVGs without TFs.
The genetic or physical intervals of the updated QTL cqNOCA08-4/5,cqNOC-C03-5,and cqNOC-C06-4/5 were smaller than those of the original QTL.However,both the genetic and physical intervals of cqNOC-A09-6 were slightly longer than those of cqOC-A09-9,possibly as a result of the very high marker density of our linkage map,and the marker number was near saturation in some local regions of the genome,it was necessary to construct advanced segregation lines to further locate the genetic locus.A total of 23 polymorphic markers were in the updated QTL physical intervals,while only 12 markers were detected in the updated QTL genetic intervals,and 11 markers were detected in the neighbor QTL regions.Considering that cqNOC-A08-5 and cqNOC-A08-4,or cqNOC-C06-4 and cqNOC-C06-5 were the same QTL,we speculated that most QTL in hot-spot regions share common physical intervals and represent only one or a few QTL,and that these four QTL represent most of the QTL in hot-spot regions.
These four major QTL physical intervals were in accord with those identified in other studies;the intervals in other studies overlapped for cqNOC-A08-4 [47],cqNOC-C03-5 [48],and cqNOCC06-4 [49],and very nearly with cqNOC-A09-6 [47,49],showing high repeatability and confidence in different populations.Still,it was difficult to obtain accurate candidate genes by QTL mapping alone.Multi-omics analysis could improve the efficiency of candidate gene identification [21],and is frequently used to find candidate genes.An integrated omics analysis was performed to search for the oil content candidate genes in B.napus by the combination of transcriptome-wide association studies and GWAS [50],GWAS and transcriptome analysis [51],proteomic analysis and genomic analysis [52],and linkage mapping and GWAS analysis [49].The combination of transcriptome and QTL analysis was also applied to different traits to discover candidate genes controlling phenotypes such as pod number[53]and seed aging[54].In the present study,the integrated analysis of QTL,resequencing of parental lines,and transcriptomics were performed to identify candidate genes in four major QTL for oil content.In contrast to the above methods,we also performed resequencing of the two parents and used it to search for oil-content candidate genes.Highquality reads were obtained by the resequencing of the two parental lines,and high-quality bioinformatics of these reads required an available reference genome [55].The present linkage maps were highly colinear with the physical maps of the Darmor-bzh and Zhongshuang 11 genomes(Fig.S3).With the combined resequencing and transcriptome analysis,the number of genes in the four major QTL regions was reduced from 1713 to 297 by exclusion of genes without variations or differential expression.More than 80% of them were up-regulated in N53-2.GO and KEGG analysis suggested that these DEVGs were associated with primary metabolism in biological processes,such as the ‘‘pyruvate metabolism”pathway,as also reported by Chao et al.[9].
Both B.napus and A.thaliana belong to Brassicaceae,and they have a close relationship[56].Among 297 DEVGs,some candidate genes for oil content were identified.The genes BnaA09g48250D,BnaC06g26670D,and BnaC06g28830D were homologous to AT5G35360,AT1G67560,and AT1G67730,respectively,and they were involved in metabolic pathways of ‘‘lipase”,‘‘fatty acid elongation”,and ‘‘plastidial fatty acid synthesis”,respectively.CAC2,which was reported to be involved in seed oil biosynthesis in A.thaliana [57],encodes a subunit of heteromeric ACCase that took part in the initial process of fatty acid biosynthesis and converts acetyl-CoA to malonyl-CoA [58].AT1G67560,which encodes LOX6(a member of the lipoxygenase family),is expressed mainly in seedling tissues and embryogenesis and functions in the synthesis of fatty acid derivatives such as jasmonates[59].We speculate that LOX6 participates in lipid transport from other tissues to seed embryos during fatty acid metabolism.In A.thaliana,AT1G67730(KCR1),one of two beta-ketoacyl reductases(KCR)genes,is located in the endoplasmic reticulum,chloroplast and mitochondria.KCR1 participates in the elongation of VLCFAs,including the formation of triacylglycerol and sphingolipids[60],and was also identified in B.napus by a mixed linear model based on SNP [61] and by GWAS analysis [62].The KCR1 gene was also identified on C06 in the present study,in contrast to the finding of Qu et al.[62].The relative expression levels of KCR in high-and low-erucic acid varieties differed during seed development [63],a finding consistent with ours.Some genes in the four major QTL regions may be indirectly associated with lipid metabolism,such as BnaC06g28360D,which functions in pyruvate metabolism.Pyruvate is a glycolysis product and a substrate for the synthesis of fatty acids and other macromolecules [64].Another gene,BnaC06g33910D,was homologous to AT1G72770 (HAB1),involved in the abscisic acid (ABA) signal transduction pathway [65].
TFs also function in lipid metabolism [45].Some TFs regulate fatty acid biosynthesis in A.thaliana,including LEC1 [66],LEC2[67],and WRI1 [68].In the present study,the distribution of TFs was uneven in the four major QTL regions,and more than 60% of TFs with DEVGs were located in cqNOC-C06-4/5.In our previous study[69],epistatic interactions occurred more frequently in some intervals of C06.Coincidentally,most of the epistatic loci were in or near cqNOC-C06-4/5,suggesting that cqNOC-C06-4/5 might be a hot region of TFs and that these TFs might indirectly control some traits including oil content in B.napus.For example,CHB3 could interact with ATSAC1 by interacting with intermediary genes from the‘‘carboxylic acid metabolic”and‘‘carbohydrate derivative”categories.The interactions centered on these DEVGs were mainly primary carbohydrate metabolism,such as ‘‘lipid metabolic”,‘‘signalling” and ‘‘carbohydrate derivative”.
In this study,we integrated genome resequencing and transcriptome,reducing the number of genes in the four major QTL regions from 1713 to 297.Among the 297 DEVGs,some were associated with lipid metabolism such as BnaA09g48250D,BnaC06g26670D,and BnaC06g28830D.TFs,such as BnaA08g11190D,BnaC06g30310D,and BnaC06g33640D may function in lipid metabolism.
Four major QTL for oil content represented the regions on 19 linkage groups most strongly associated with oil content variation in the KN DH population.New InDel markers were developed and added to these QTL regions,and their intervals (except for cqOCA09-9) were narrowed.Candidate genes for oil content were further selected by integration of resequencing and transcriptome analysis.Finally,297 DEVGs were identified.Some were directly associated with lipid metabolism,such as BnaA09g48250D (CAC2),BnaC06g26670D (LOX6),and BnaC06g28830D (KCR1).TFs such as BnaA08g11190D (CHB3),BnaC06g30310D (PAN),and BnaC06g33640D (JAZ6) were expressed in seeds and upregulated in N53-2,and might play a key role in lipid metabolism.However,there are still many genes in these regions.Advanced segregation lines or more effective technologies should be used to further narrow these confidence intervals,and field experiments must also be performed to verify the role of candidate genes in lipid synthesis.The present results provide a foundation for further research on the cloning and functional analysis of oil-content genes,and may be helpful for developing new cultivars with high oil content.
CRediT authorship contribution statement
Shuxiang Yan:Writing– original draft,Data curation,Investigation,Formal analysis.Huaixin Li:Investigation.Hongbo Chao:Data curation.Jianjie He:Data curation.Yiran Ding:Data curation.Weiguo Zhao:Investigation.Kai Zhang:Investigation.Yiyi Xiong:Investigation.Kang Chen:Investigation.Libin Zhang:Investigation.Maoteng Li:Conceptualization,Resources,Supervision,Writing– review &editing.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This study was supported by the National Natural Science Foundation of China (31871656 and 32072098).
Appendix A.Supplementary data
Supplementary data for this article can be found online at https://doi.org/10.1016/j.cj.2022.01.002.