Xinpeng Qi, Bingjun Jiang, Tingting Wu, Shi Sun, Caijie Wang, Wenwen Song, Cunxiang Wu, Wensheng Hou, Qijian Song, Hon-Ming Lam*, Tianfu Han*
a MARA Key Laboratory of Soybean Biology (Beijing), Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
b Centre for Soybean Research of the State Key Laboratory of Agrobiotechnology and School of Life Sciences, The Chinese University of Hong Kong, Shatin, Hong Kong, China
c Soybean Genomics and Improvement Lab, USDA-ARS, Beltsville, MD 20705, USA
Keywords: Soybean; Widely planted cultivars; Genomic re-sequencing; Breeding strategy
ABSTRACT Soybean cultivars, especially the widely planted ones, have been dramatically improved in agronomic performance and are well adapted to local planting environments after a long history of domestication and breeding. Uncovering the unique genomic features of popular cultivars will help to understand how soybean genomes have been modified through breeding. We re-sequenced 134 soybean cultivars that were released and most widely planted over the last century in China. Phylogenetic analyses established that these cultivars comprise two geographically distinct sub-populations: Northeast China (NE) versus the Huang-Huai-Hai River Valley and South China (HS). A total of 309 selective regions were identified as being impacted by geographical origins. Compared to the NE sub-population, the HS sub-population exhibited higher genetic diversity and more rapid decay of linkage disequilibrium. To study the association between phenotypic differences and geographical origins, we recorded the vegetative period under different growing conditions for two years, and found that clustering based on the phenotypic data was closely correlated with cultivar geographical origin. By iteratively calculating accumulated genetic diversity, we established a platform panel of cultivars and propose a novel breeding strategy named “Potalaization” for selecting and utilizing the platform cultivars that represent the greatest genetic diversity and the highest available agronomic performance as the “plateau” for accumulating elite loci and traits, breeding novel widely adapted cultivars, and upgrading breeding technology. In addition to providing new genomic information for the soybean research community, the “Potalaization” strategy that we devised will also be practical for integrating the conventional and molecular breeding programs of crops in the post-genomic era.
Soybean [Glycine max (L.) Merr.], which was domesticated in Asia ~5000 years ago [1], has become a crop of global economic significance, and is listed as the top crop on the American continent in terms of planting area (USDA Foreign Agricultural Service, http://www.fas.usda.gov/). In 2019/2020, world soybean production was 336.47 Mt, and Brazil, the United States, and Argentina were the three largest soybean producers, together making up 80.7% of global soybean production (USDA Foreign Agricultural Service, http://www.fas.usda.gov/). In China, which is currently the 4th largest producer, soybean is grown over a large latitudinal range and in diverse ecological environments and cropping systems that extend from the frigid regions of Northern China to the tropical regions of Southern China [2].
As the center of origin, China maintains the highest genetic diversity of soybean germplasm [3]. Modern soybean breeding began in China in 1913 [2], which was approximately 30 years earlier than in North America [4]. The first officially released soybean cultivar, ‘Huangbaozhu’, was reported in 1923 [5], and >2700 soybean cultivars have been released since then. As a result of the century-long intensive breeding efforts, approximately 5% of the released cultivars have been widely planted by farmers. These soybean cultivars are well adapted to the diverse environments and have stable agronomic performance traits, such as plant architecture [6], seed quality [7], and photo-thermal sensitivity [8].
Since the release of the soybean genome sequence in 2010 [9], a number of studies on soybean genomics, domestication, and evolution have been reported [10–14]. These studies have established that the gene pool of wild soybean is more diverse than that of domesticated soybean. The selection footprint in the soybean genome provides new insights into domestication (from annual wild to cultivated soybean) and subsequent improvements (from landraces to modern soybean cultivars). Furthermore, genome-wide association studies (GWASs) have been undertaken to identify genomic regions associated with domestication-related and agronomically important traits such as plant height, flowering time, and maturity date [12,15–17]. Although such studies have helped us to understand the genetic basis of the increases in soybean yield [18], their practical application is limited.
In this study, we re-sequenced 134 of the most widely planted soybean cultivars released in the last 100 years in China. In order to develop a breeding platform that is highly informative and is ready to be integrated into modern breeding programs, we developed a breeding strategy called “Potalaization”, inspired by the Potala Palace situated at an altitude of ~3700 m above sea level in the Lhasa River Valley on the Tibetan Plateau. Briefly, “Potalaization” refers to the selection of a small collection of elite cultivars that offer the highest available agronomic performance while retaining the broadest genetic diversity. Being complementary to “pyramiding”, a strategy in which multiple desired genes are assembled into a single genotype, Potalaization identifies platform cultivars from available elite germplasm resources that are ideal for breeding applications such as generating populations for recurrent selection.
A total of 134 of the most widely planted soybean cultivars were selected for genome re-sequencing, including 80 cultivars from Northeast China, 38 cultivars from the Huang-Huai-Hai River Valley, and 16 cultivars from South China. These widely planted cultivars were officially released from 1923 to 2009, with 94 out of 134 having documented pedigrees (Table S1). They are well adapted to local environments and represent different ecotypes (North-spring sowing, Huang-Huai-Hai-spring sowing, Huang-Huai-Hai-summer sowing, South-spring sowing, and South-summer sowing) in China [8,19,20]. Two wild soybean accessions collected from Northeast China (Heilongjiang, China), one wild accession from the Huang-Huai-Hai River Valley (Hebei, China), and one wild accession from South China (Guangxi, China) were also included as references (Table S1).
Soybean seeds were obtained from breeders or from the National Crop Genebank of China,Institute of Crop Sciences,the Chinese Academy of Agricultural Sciences (CAAS),Beijing,China.The seeds were planted in pots at the CAAS campus.A typical individual plant for each cultivar was selected and propagated;DNA was extracted from each cultivar.
Genome sequencing libraries were constructed according to the manufacturer’s (Illumina) instructions. We generated >1.26 Tb of data (13.98 billion paired-end reads of 90 bp each) on an Illumina HiSeq2000 sequencer. Raw reads were quality-trimmed using TRIMMOMATIC [21] (ILLUMINACLIP:adaptor.seq:2:30:10 TRAILING:3 SLIDINGWINDOW:4:10 MINLEN:20). Reads from the re-sequencing project of 302 soybean accessions [12] were processed using the same parameters. Clean reads were then mapped to the soybean reference genome [9] (version Wm82.a2.v1) using BWA-mem [22] with the default parameters.
SNPs and InDels were called using the Genome Analysis Toolkit (GATK) HaplotypeCaller [23] (-stand_call_conf set to 30.0, -stand_emit_conf set to 10.0, and -glm set to BOTH). The variants were then recalibrated by the Gaussian mixture model, and outliers were discarded. Variants were further filtered using BCFtools (version 1.2; QUAL ≥ 50.0, DP ≥ 5.0, QD ≥ 5.0, MQ ≥ 30, MAF ≥ 0.03, Coverage ≥ 90%). InDels longer than 6 bp were discarded. After recalibration using the GATK VariantRecalibrator, the variants were further filtered by MAC ≥ 20. SNP and InDel loci distributed over the 20 soybean chromosomes were displayed using the visualization tool Circos [24].
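The hard-filter thresholds listed above can be read as a single predicate over per-variant annotations. The following is a minimal Python sketch of that logic, not the BCFtools command used in the study. The `Variant` record, its field names, and `passes_hard_filters` are hypothetical, and "Coverage" is assumed here to mean the fraction of samples with a called genotype.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical per-variant summary; field names are illustrative only."""
    ref: str          # reference allele
    alt: str          # alternate allele
    qual: float       # QUAL
    dp: float         # DP
    qd: float         # QD
    mq: float         # MQ
    maf: float        # minor allele frequency
    call_rate: float  # assumed meaning of "Coverage": fraction of samples called


MAX_INDEL_LEN = 6  # InDels longer than 6 bp are discarded


def passes_hard_filters(v: Variant) -> bool:
    """Apply the thresholds quoted in the text to one variant."""
    indel_len = abs(len(v.ref) - len(v.alt))  # 0 for a SNP
    return (
        v.qual >= 50.0
        and v.dp >= 5.0
        and v.qd >= 5.0
        and v.mq >= 30.0
        and v.maf >= 0.03
        and v.call_rate >= 0.90
        and indel_len <= MAX_INDEL_LEN
    )
```

Expressing the filter as one function makes it easy to audit which threshold rejects a given site; the MAC ≥ 20 step applied after recalibration is not modeled here.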
Bi-allelic SNPs identified from the 134 widely planted soybean cultivars and from the re-sequencing project of 302 soybean accessions were further filtered by minor allele frequency ≥ 0.05 and missing data frequency ≤ 0.2. Four-fold degenerate SNP sites [25] were extracted (MAC ≥ 100 and AN ≥ 800) and pruned by PLINK [26] (version 1.90b3b) using the default parameters. A maximum-likelihood phylogenetic tree was constructed using PHYLIP [27] (version 3.695, 1000 permutations) based on these SNPs. Tightly linked SNPs were filtered using the R function snpgdsLDpruning (LD threshold equal to 0.2) in the SNPRelate package [28] (version 1.2.0). Principal component analysis (PCA) was also performed using the SNPRelate package (version 1.2.0).
Population structure analysis was performed using STRUCTURE [29] (version 2.3.4; burn-in period of 500,000; 1,000,000 repeats; K values of 2, 3, 4, and 5). After pruning, the SNPs were used to identify linkage disequilibrium patterns using Haploview [30] and to calculate genetic diversity by VCFtools [31].
Copy number variations (CNVs) were identified by CNV-seq [32] based on pooled reads from cultivars planted in the same geographic region. Genes affected by CNVs were annotated based on the soybean reference genome [9].
Whole genome scanning of selective regions was performed by XP-CLR [33] (parameters: gWin = 0.005; snpWin = 100; gridSize = 100). The genetic map required by XP-CLR was a pseudomap constructed with reference to the re-sequencing project of 302 soybean accessions [12]. In brief, the genetic position of each SNP was assigned linearly along the genetic map from SoyBase [34] under the assumption that recombination between mapped markers is uniform [12].
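Assigning genetic positions "linearly" under uniform recombination between mapped markers amounts to linear interpolation between the two flanking markers. The sketch below illustrates this under stated assumptions: the function name `interpolate_cm` and its arguments are hypothetical, marker positions are assumed to be strictly increasing on one chromosome, and SNPs outside the mapped interval are clamped to the nearest marker (a choice the text does not specify).

```python
from bisect import bisect_right


def interpolate_cm(pos_bp, marker_bp, marker_cm):
    """Genetic position (cM) of a SNP from its physical position (bp).

    marker_bp: strictly increasing physical positions of mapped markers.
    marker_cm: genetic positions of the same markers.
    """
    if len(marker_bp) != len(marker_cm) or len(marker_bp) < 2:
        raise ValueError("need at least two markers with matching coordinates")
    # Assumption: SNPs beyond the outermost markers take that marker's position.
    if pos_bp <= marker_bp[0]:
        return marker_cm[0]
    if pos_bp >= marker_bp[-1]:
        return marker_cm[-1]
    i = bisect_right(marker_bp, pos_bp)  # index of the first marker right of the SNP
    x0, x1 = marker_bp[i - 1], marker_bp[i]
    y0, y1 = marker_cm[i - 1], marker_cm[i]
    # Uniform recombination between flanking markers -> straight-line interpolation.
    return y0 + (y1 - y0) * (pos_bp - x0) / (x1 - x0)
```

For example, with markers at 100 kb (0 cM) and 300 kb (2 cM), a SNP at 200 kb is placed at 1 cM.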
Selective regions were then identified with reference to a prior study [35]. In brief, the soybean genome was divided into 10 kb segments, and the average cross-population composite likelihood ratio (XP-CLR) score was calculated for each segment. Segments with XP-CLR scores in the top 20% were then selected. Selected segments that were separated by no more than one low XP-CLR score segment were merged to form candidate selective regions. The candidate selective regions whose segmental XP-CLR scores ranked in the top 10% were then identified. Selective regions of <30 kb in length were discarded because of the short history of modern soybean breeding.
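The segment-merging procedure described above can be sketched in a few functions. This is an illustrative reading rather than the pipeline used in the study: all function names are hypothetical, the example works on one chromosome at a time (a genome-wide cutoff would be computed over all chromosomes), empty segments are given a score of 0, and each candidate region is assumed to be ranked by its highest segment mean because the text does not state the exact ranking statistic.

```python
import math

SEGMENT_BP = 10_000  # 10 kb segments


def segment_means(points, n_segments):
    """Mean XP-CLR score per segment from (position_bp, score) pairs."""
    sums = [0.0] * n_segments
    counts = [0] * n_segments
    for pos, score in points:
        idx = pos // SEGMENT_BP
        sums[idx] += score
        counts[idx] += 1
    # Assumption: segments without any score are treated as 0.
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


def top_fraction_cutoff(values, fraction):
    """Smallest value that still belongs to the top `fraction` of `values`."""
    ranked = sorted(values, reverse=True)
    k = max(1, math.ceil(len(ranked) * fraction))
    return ranked[k - 1]


def merge_segments(seg_scores, cutoff, max_gap=1):
    """Merge high-scoring segments separated by at most `max_gap` low segments.

    Returns inclusive (first_index, last_index) pairs.
    """
    regions = []
    for i, score in enumerate(seg_scores):
        if score < cutoff:
            continue
        if regions and i - regions[-1][1] - 1 <= max_gap:
            regions[-1][1] = i  # extend the current region across the gap
        else:
            regions.append([i, i])
    return [tuple(r) for r in regions]


def selective_regions(seg_scores, seg_fraction=0.20, region_fraction=0.10,
                      min_len_bp=30_000):
    """Return (start_bp, end_bp) of regions passing all three steps."""
    cutoff = top_fraction_cutoff(seg_scores, seg_fraction)
    candidates = merge_segments(seg_scores, cutoff)
    if not candidates:
        return []
    # Assumption: a region is ranked by its highest segment mean.
    peak = {r: max(seg_scores[r[0]:r[1] + 1]) for r in candidates}
    region_cutoff = top_fraction_cutoff(list(peak.values()), region_fraction)
    return [
        (a * SEGMENT_BP, (b + 1) * SEGMENT_BP)
        for a, b in candidates
        if peak[(a, b)] >= region_cutoff and (b - a + 1) * SEGMENT_BP >= min_len_bp
    ]
```

Keeping the three steps (segment cutoff, gap-tolerant merging, region ranking plus length filter) as separate functions makes each threshold easy to vary independently.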
The windowed Weir and Cockerham fixation index (Fst) was calculated by VCFtools [31] with a window size of 30 kb and a step size of 30 kb. A nucleotide fixation locus was defined as a site where one sub-population maintains a monotypic genotype while the other exhibits polymorphic genotypes [36]. Genes in selective regions were categorized by WEGO [37] according to their gene ontology (GO) annotation.
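The definition of a nucleotide fixation locus translates directly into a set comparison. The helper below is a minimal sketch with a hypothetical name and input format: each argument holds the alleles called in one sub-population at a single site, with missing calls encoded as `None` (an assumption about the data representation).

```python
def is_fixation_locus(alleles_a, alleles_b):
    """True if exactly one sub-population is monotypic and the other polymorphic."""
    seen_a = {x for x in alleles_a if x is not None}
    seen_b = {x for x in alleles_b if x is not None}
    if not seen_a or not seen_b:
        return False  # no usable calls in at least one sub-population
    # One side carries a single allele, the other carries more than one.
    return (len(seen_a) == 1) != (len(seen_b) == 1)
```

Note that a site at which the two sub-populations are fixed for different alleles does not satisfy this definition, because neither sub-population is polymorphic.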
Pedigree analysis was conducted using IBDseq [38] (LOD cutoff of 3.0) on the 94 widely planted cultivars that have documented pedigrees.
GO term enrichment analysis was performed on the AgriGO web server [39].KEGG pathway module enrichment analysis was performed with the R package clusterProfiler [40].
The detailed phenotyping method was described in our previous study, where we evaluated the changing trend of photo-thermal sensitivity over the modern breeding history of soybean based on 63 widely planted cultivars [8]. In brief, the experiment was conducted using a completely randomized design. Plants were sown in the spring and summer of 2010 and 2011 in Beijing, China. The number of days from emergence (VE) to first flower (R1), as defined by the soybean development stages [41], was recorded as the trait of vegetative period [34].
A total of 13.98 billion raw reads (90 bp in length) were generated using the Illumina HiSeq2000 platform, resulting in an average of ~8× depth of clean reads for each of the 134 widely planted cultivars. We also sequenced four wild soybean accessions (Glycine soja Siebold and Zucc.) that were used as the out-group for subsequent phylogenetic analyses. For each of the re-sequenced accessions, the clean reads were mapped to the soybean reference genome [9], resulting in an average mapping coverage of 93.81% (Table S1). After quality filtering, variants including 4,163,977 high-quality single-nucleotide polymorphisms (SNPs) and 478,985 small insertions/deletions (InDels) were identified (Tables 1, S2; Fig. S1).
We examined the phylogenetic relationships within the 134 widely planted cultivars combined with the 302 previously re-sequenced soybean accessions [12] based on the SNPs identified in the two datasets. The three major sub-populations of wild soybeans, landraces, and improved cultivars identified in this combined analysis were consistent with the sub-groups reported for the 302 re-sequenced soybean accessions (Fig. S2). The 134 cultivars sequenced in our study were distributed in the landrace and improved cultivar sub-groups (Fig. S2), indicating the broad genomic diversity of widely planted cultivars.
PCA based on 6666 four-fold degenerate SNPs showed that no obvious clusters emerged from combinations of any two of the top three PCs (Fig. 1). Individually, the plots based on the PC1 value (upper left in the diagonal) resulted in major peaks that were consistent with the tree branches in the phylogenetic analysis (Fig. S2). The plots based on PC2 (middle in the diagonal) overlapped with the peaks between the widely planted cultivars and the 302 previously re-sequenced soybean accessions, while the plots based on PC3 (lower right in the diagonal) differentiated the widely planted cultivars from the 302 re-sequenced accessions, providing evidence of genomic uniqueness in the collection of widely planted cultivars.
The 134 widely planted soybean cultivars were grown in a latitudinal range from 50°N in Heilongjiang to about 21°N in Guangxi. A phylogenetic analysis based solely on this collection showed two major geographically structured clusters that correspond to the Northeast China (NE) sub-population and the Huang-Huai-Hai River Valley and South China (HS) sub-population (Fig. 2A). This result is consistent with the phylogenetic tree and the PCA clusters based on the combined dataset of the 134 widely planted cultivars and the 302 re-sequenced soybean accessions. We checked the origins of 539 ancestors of the 134 widely planted cultivars, and the majority of these ancestors originated from the same geographical regions as their respective descendant cultivars (179 out of 191 ancestors were recorded as “1/2” contributors, Table S3). Therefore, geographically structured sub-population clustering was the major characteristic of the widely planted soybean cultivar collection.
The NE sub-population comprises 75 cultivars, all planted in Northeast China except for CA086 (‘Qihuang 10’), a cultivar from Shandong province in the Huang-Huai-Hai River Valley. The HS sub-population comprises 53 cultivars from the Huang-Huai-Hai River Valley and South China and six cultivars from HS-adjacent NE areas (CA054, CA064, CA071, CA074, and CA077 from Liaoning, and CA079 from Beijing). Analysis of the admixture of the cultivars in the two sub-populations using STRUCTURE [29] also showed the geographically based clustering of the cultivars (Fig. 2B). Increasing the K value resulted in the same major sub-groups (Fig. S3). Not surprisingly, admixture was observed for cultivars that originated from the HS-adjacent NE areas. PCA based on 5889 SNPs identified from the widely planted cultivars showed that the NE sub-population is tightly clustered together, while the HS sub-population is scattered along both the horizontal axis and the vertical axis (Fig. 2D).
Nucleotide diversity calculated in a 10 kb window along each chromosome is substantially higher in the HS sub-population than in the NE sub-population (Table 1).Linkage disequilibrium decayed more rapidly in the HS sub-population than in the NE sub-population (Fig.S4).Collectively,the HS sub-population has a broader genetic background than does the NE sub-population.
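Windowed nucleotide diversity of this kind can be approximated from per-site allele counts. The sketch below is not the VCFtools implementation used in the study; it is a common estimator for bi-allelic sites, with hypothetical function names, 0-based positions, and the assumption that every base in a window was callable (so the window length is used as the denominator).

```python
def site_pi(alt_count, n_chrom):
    """Unbiased per-site diversity for a bi-allelic SNP: 2p(1-p) * n/(n-1)."""
    if n_chrom < 2:
        return 0.0
    p = alt_count / n_chrom
    return 2.0 * p * (1.0 - p) * n_chrom / (n_chrom - 1)


def windowed_pi(snps, window_bp=10_000):
    """Map window start (bp) -> nucleotide diversity per base pair.

    snps: iterable of (position_bp, alt_allele_count, n_called_chromosomes).
    Assumption: positions are 0-based and all bases in a window were callable.
    """
    totals = {}
    for pos, alt_count, n_chrom in snps:
        start = (pos // window_bp) * window_bp
        totals[start] = totals.get(start, 0.0) + site_pi(alt_count, n_chrom)
    return {start: total / window_bp for start, total in totals.items()}
```

Running the same function separately on the genotypes of each sub-population would give per-window values that can be compared along each chromosome.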
To further explore structural variations, we identified 3705 CNVs with an average length of 6.49 kb in cultivars that share the same geographical origins (Table S4). The CNVs were distributed unevenly among the chromosomes and in different chromosomal regions (Figs. S5, S6). Chromosome 18 had the most CNVs (621), while Chromosome 11 had the fewest (36). CNV gains were mostly observed in Chromosomes 14, 15, and 18 in cultivars from the HS sub-population, and in Chromosome 6 in cultivars from the NE sub-population. Gene ontology (GO) analyses of genes affected by CNVs showed enrichment of phosphorylation-related and binding-related functions (Table S5).
Fig. 1. Principal component analyses of the 134 widely planted soybean cultivars and the 302 soybean accessions from a previous re-sequencing project [12]. PC1/2/3, the first/second/third principal components.
To capture the chromosomal regions that have undergone selection between the NE and HS sub-populations, we calculated the cross-population composite likelihood ratios (XP-CLR) [33] along the soybean genome. A total of 167 genomic regions (those with XP-CLR scores in the top 10%) were identified as selective regions that differed between the HS and NE sub-populations (HS vs. NE), and 186 selective regions were identified in the reciprocal comparison (NE vs. HS; Figs. 3, S7). These selective regions comprise 2.51% and 2.64% of the total genome length and harbor 1600 and 1251 genes, respectively (Fig. S8). The mean genomic sequence lengths of the selective regions were 142.99 kb and 134.73 kb, with the largest regions being 900 kb and 580 kb in the HS vs. NE and the NE vs. HS comparisons, respectively. A total of 21 selective regions identified from the HS vs. NE comparison overlapped with 23 selective regions identified from the NE vs. HS comparison (Tables S6, S7). These overlapping selective regions are the regions of the genome that were commonly impacted during breeding, while the unique selective regions from each of the comparisons represent selections that occurred specifically in a single sub-population. A total of 309 non-redundant genomic regions were identified as selective regions that were impacted by geographical origins. These non-redundant genomic regions are 42,790 kb in length (4.5% of the genome sequence), and harbor 2434 annotated genes.
Table 1 Statistics of the genomic variations identified in the 134 widely planted soybean cultivars and the four wild soybean accessions.
The selective regions identified by the XP-CLR analyses had markedly higher Fst values compared to the whole genome (Fig. S9). When we examined the genetic variations between the two sub-populations by scanning for nucleotide fixation, we found that the selective regions had a higher proportion of fixed SNPs than did the whole genome (Fig. S10). In other words, although the selective regions contain a relatively larger number of polymorphisms, the diversity of these polymorphisms in such regions is actually reduced.
Fig. 2. Analyses of population structures and geographical distributions of the widely planted soybean cultivars. (A) Maximum-likelihood phylogenetic tree. (B) Bayesian clustering based on the genomic composition of population structure analyzed by STRUCTURE. (C) Geographical distribution of the widely planted cultivars. Total production during 1995–2011 is indicated by scaled colors according to different Chinese provinces. The blue circles indicate the number of cultivars from the corresponding province. (D) Principal component analysis of widely planted cultivars. PC1/2, the first/second principal component.
The period of soybean plant development from VE to R1, described as the trait of “vegetative period” by SoyBase [34], was recorded under short-day (SD) and long-day (LD) conditions in two different seasons for our widely planted soybean collection. All four records showed differences between the groups of cultivars based on the two sub-populations (Tables S8, S9). K-means clustering analysis with k equal to 2 showed that the vegetative period recorded under spring-planting SD conditions (SP-SD) and summer-planting LD conditions (SU-LD) successfully separated the widely planted cultivars into two groups. This grouping was consistent with the origins of the cultivars in the NE and HS sub-populations, with Spearman’s rho correlation coefficients of 0.83 and 0.89, respectively (Fig. 4A, B). In contrast, the K-means clustering of vegetative period recorded under spring-planting LD conditions (SP-LD) and summer-planting SD conditions (SU-SD) was less consistent with geographical origins, with Spearman’s rho correlation coefficients of 0.68 and 0.51, respectively (Fig. 4C, D). Even though larger genotype-environment interactions and environmental effects were reported for soybean vegetative period [8], the K-means clustering results provide evidence for the genotypic contribution to this trait.
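The comparison between phenotype-based clusters and sub-population membership can be illustrated with a small, dependency-free example. This is a sketch under stated assumptions rather than the analysis code of the study: the function names are hypothetical, clustering is done on a single trait value per cultivar, the two centers are initialized at the minimum and maximum, and Spearman's rho is computed between the cluster label and a 0/1 sub-population code using average ranks for ties.

```python
def kmeans_1d_two_clusters(values, n_iter=100):
    """Two-cluster K-means on one-dimensional data; returns 0/1 labels."""
    centers = [min(values), max(values)]  # assumption: simple deterministic start
    labels = [0] * len(values)
    for _ in range(n_iter):
        labels = [0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
                  for v in values]
        updated = []
        for k in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == k]
            updated.append(sum(members) / len(members) if members else centers[k])
        if updated == centers:
            break  # converged
        centers = updated
    return labels


def average_ranks(xs):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

With made-up vegetative periods such as 25, 27, and 26 days for three NE cultivars and 45, 48, and 50 days for three HS cultivars, the two clusters coincide with the sub-populations and rho equals 1; partial disagreement lowers rho toward 0.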
To uncover related genes, we examined all four vegetative period QTL [42] documented at SoyBase [34]. These four vegetative period QTL are located in 28 identified selective regions. A total of 121 annotated genes were found in these genomic regions (Tables S6, S7), including genes annotated as having binding and cellular process functions (Fig. S11). We also examined other soybean plant development traits, and eight genes related to flowering time [43–46] are located in selective regions (Tables S6, S7).
Of the 134 widely planted cultivars, 94 cultivars have at least one ancestor that is also a widely planted cultivar (Table S1). CA056 (‘Huangbaozhu’), CA025 (‘Mancangjin’), and CA135 (‘Zihua 4’) are the top three most frequently used ancestral cultivars in Northeast China, while CA085 (‘Juxuan 23’), CA084 (‘Xinhuangdou’), and CA083 (‘Yidupingdinghuang’) are the top three most frequently used in the Huang-Huai-Hai River Valley and South China (Fig. S12). The pedigrees of 25 cultivars could be traced back at least 10 generations (Table S1), and four cultivars (CA020, CA048, CA045, and CA076) have 10 or more ancestors that were also re-sequenced in this study. IBDseq [38] was used to identify the identity-by-descent (IBD) segments in these four cultivars based on their pedigrees. In brief, 110, 134, 115, and 112 IBD segments were identified that cover 26.72%, 45.78%, 26.10%, and 36.05% of the whole genome, respectively (Table S10). A majority of the detected IBD segments are shared by cultivars released in adjacent generations, but only 18 IBD segments were identified from the earliest recorded ancestors to the current widely planted cultivars (Figs. S13, S14).
Fig. 4. K-means clustering of the soybean trait of vegetative period. (A) Spring-planted short-day (SP-SD) conditions. (B) Summer-planted long-day (SU-LD) conditions. (C) Spring-planted long-day (SP-LD) conditions. (D) Summer-planted short-day (SU-SD) conditions. Clusters differentiated by K-means are indicated by red and black. The NE and HS sub-populations are indicated by triangles and circles, respectively.
Fig. 5. The soybean breeding strategy “Potalaization”. (A) Accumulated genomic diversity of widely planted soybean cultivars. (B) “Potalaization” strategy for selecting, exploiting, and utilizing elite soybean germplasm accessions.
We sought to identify the most diverse representative cultivars from the 134 widely planted cultivars using an iterative calculation of accumulated genetic diversity based on SNP data. Specifically, the two cultivars with the highest percentage of segregated SNPs among the widely planted cultivar collection were selected as the initial pair in the platform panel; then each of the remaining 132 cultivars was assessed, and the cultivar contributing the largest proportion of segregated SNPs was added to the platform panel. This iterative selection continued until the platform panel captured 95% of the total segregated SNPs in the whole collection. Eventually, a total of 14 of the widely planted cultivars were defined as the platform panel (Fig. 5A). Eight of the platform cultivars are from the HS sub-population and six are from the NE sub-population. In addition to their geographical diversity, these platform cultivars were released in different years (Table S11). Our platform panel complements the current Chinese soybean mini-core collection consisting of 299 soybean accessions that were selected based on morphological traits and traditional molecular markers [47].
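The iterative selection described above is a greedy procedure, and a toy version may help make the steps concrete. The sketch below is an interpretation, not the code used in the study: the function names are hypothetical, each cultivar is represented by a string of homozygous allele codes with no missing data, and a SNP is counted as "segregated" in a panel when the panel members carry more than one allele at that site.

```python
from itertools import combinations


def segregating(genotypes, panel):
    """Indices of SNPs at which the panel carries more than one allele."""
    n_snps = len(next(iter(genotypes.values())))
    return {j for j in range(n_snps)
            if len({genotypes[name][j] for name in panel}) > 1}


def build_platform_panel(genotypes, target=0.95):
    """Greedily add cultivars until the panel captures `target` of all
    SNPs that segregate in the whole collection.

    genotypes: dict mapping cultivar name -> sequence of allele codes.
    """
    names = sorted(genotypes)
    total = len(segregating(genotypes, names))
    # Step 1: the pair of cultivars with the most segregating SNPs between them.
    panel = list(max(combinations(names, 2),
                     key=lambda pair: len(segregating(genotypes, pair))))
    # Step 2: repeatedly add the cultivar that raises the count the most.
    while len(segregating(genotypes, panel)) < target * total:
        remaining = [name for name in names if name not in panel]
        if not remaining:
            break
        best = max(remaining,
                   key=lambda name: len(segregating(genotypes, panel + [name])))
        panel.append(best)
    return panel
```

The exhaustive pair search and repeated set construction are fine for a toy example but would be slow for millions of SNPs; a practical implementation would likely use bit arrays or a numerical library.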
To document this protocol, we conceived a breeding strategy that we call “Potalaization” for selecting, managing, and exploiting elite crop germplasm in the post-genomic era (Fig. 5B). The whole germplasm collection and the released cultivars are analogous to the high altitude of the Tibetan Plateau. Metaphorically, we recapitulated the impressive breadth of the “Red Hill” in a demonstration case study by iteratively identifying the most genetically diverse cultivars from amongst the elite cultivars. The cultivars identified in this way will be ideal as starting materials for gene pyramiding while retaining the highest genetic diversity for future breeding efforts.
Although there have been large-scale genomic re-sequencing studies reported for soybean,none has focused primarily on identifying the genetic diversity present within the genomes of the most popular cultivars.Here we collected 134 of the most widely planted soybeans from the last hundred years of Chinese soybean breeding and conducted a re-sequencing study.Cultivars in this collection have been widely adopted in soybean production regions and are extensively used in current soybean breeding programs [20],indicating that they have excellent agronomic performance that is appreciated by producers while retaining unique genomic information as demonstrated by our phylogenetic analyses.These widely planted cultivars were divided into two geographically distinct sub-populations,NE and HS.We identified selective regions in each sub-population,and found that the cultivars clustered into two consistent groups based on the vegetative period when grown under SP-SD and SU-LD conditions.
The re-sequencing of other soybean accessions reported to date, including wild soybean relatives, landraces, and modern cultivars, has helped deepen our understanding of the process of soybean domestication and its subsequent improvements due to continued soybean breeding efforts [10–12,17]. Our analyses of the widely planted cultivars confirmed that geographical origin is a major factor in differentiating elite Chinese soybean cultivars. Specifically, we found that the genetic diversity (calculated Pi value) of the HS sub-population was comparable to that previously reported for the landraces (1.31 × 10⁻³ versus 1.40 × 10⁻³), whereas the Pi value for the NE sub-population was comparable to that of the improved cultivars (1.04 × 10⁻³ versus 1.05 × 10⁻³). The overall diversity was lower than that in the 302 re-sequenced soybean accessions [12]. Linkage disequilibrium decayed more quickly among the 302 soybean accessions than among the widely planted soybean cultivars (Fig. S4).
We have devised the new breeding strategy of “Potalaization”, which takes advantage of traditional breeding, because the widely planted cultivars we re-sequenced represent the highest levels of achievement of Chinese soybean breeding over the last 100 years in terms of overall agronomic performance, especially yield and plant morphology [6,48]. Using this approach, we defined a platform panel of cultivars that could be used as parental materials or receptors for advanced cultivar breeding so that their overall superiority will remain and only the desired traits will be targeted. A recent study in wheat [49] describes a strategy for evaluating breeding value based on the accumulation of beneficial haplotypes in centromere-spanning linkage disequilibrium blocks. In the “Potalaization” strategy, we used genomic diversity based on the proportion of segregated SNPs in the entire genome. Besides the differences in statistical indices that are due to the different genome features of these two crops, the wheat strategy can be applied to individual cultivars, while the “Potalaization” strategy is optimized for a collection of elite cultivars. In the case of soybean, the National Crop Genebank of China contains the largest number of soybean accessions in the world, representing extensive genetic diversity [4]. The “Potalaization” strategy we propose here would be helpful in germplasm manipulation, which will increase the efficiency of advanced soybean breeding.
CRediT authorship contribution statement
Tianfu Han and Hon-Ming Lam: designed and managed the entire research project. Tingting Wu, Shi Sun, Caijie Wang, Cunxiang Wu, and Wensheng Hou: collected and prepared soybean cultivars. Tingting Wu, Caijie Wang, and Wenwen Song: recorded the phenotypic data. Tianfu Han, Hon-Ming Lam, and Qijian Song: supervised the data analyses. Bingjun Jiang and Xinpeng Qi: performed the bioinformatics analyses. All authors contributed to manuscript writing.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
The authors would like to thank Dr. Lijuan Qiu and Zhangxiong Liu (Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, China) for providing some of the soybean germplasm used in this study. The authors would also like to thank Jee-Yan Chu for copy-editing the manuscript.
This work was supported by the National Key Research and Development Program of China (2017YFD0101400), China Agriculture Research System (CARS-04), and the Agricultural Science and Technology Innovation Program of CAAS. This work was also supported by a grant from the Hong Kong Research Grants Council Area of Excellence Scheme (AoE/M-403/16) awarded to Hon-Ming Lam.
Accession numbers
The datasets supporting the conclusions given in this article are available in the National Center for Biotechnology Information (NCBI) repository. The short re-sequencing reads were deposited in the Short Read Archive (SRA) under accession number SRP062560.
Appendix A.Supplementary data
Supplementary data for this article can be found online at https://doi.org/10.1016/j.cj.2021.01.001.