Filtering for SNPs with high selective constraint augments mid-parent heterosis predictions in wheat (Triticum aestivum L.)

2023-01-30 04:47:54 Abhishek Gogna, Jie Zhang, Yong Jiang, Albert Schulthess, Yusheng Zhao, Jochen Reif

The Crop Journal, 2023, Issue 1

Abhishek Gogna, Jie Zhang, Yong Jiang, Albert W. Schulthess, Yusheng Zhao, Jochen C. Reif

Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), 06466 Stadt Seeland, Germany

Keywords: Hybrid wheat; Genomic evolutionary rate profiling; Deleterious SNP; Heterotic quantitative trait loci

ABSTRACT To extend the contemporary understanding of grain yield heterosis in wheat, the current study investigated the contribution of deleterious alleles in shaping mid-parent heterosis (MPH). These alleles occur at low frequency in the genome and are often missed by automated genotyping platforms like SNP arrays. The deleterious alleles herein were detected using a quantitative measurement of evolutionary conservation based on the phylogeny of wheat, and investigations were made to: (1) assess the benefit of including deleterious alleles in MPH prediction models and (2) understand the genetic underpinnings of deleterious SNPs for grain yield MPH using contrasting crosses, viz. elite × elite (Exp. 1) and elite × plant genetic resources (PGR; Exp. 2). In our study, we found a lower allele frequency of moderately deleterious alleles in elites compared to PGRs. This highlights the role of purifying selection in the development of elite wheat cultivars. Deleterious alleles were shown to be informative for MPH prediction models: modelling their additive-by-additive effects in Exp. 1 and dominance as well as associated digenic epistatic effects in Exp. 2 significantly boosts prediction accuracies of MPH. Furthermore, the heterotic quantitative trait loci underlying MPH were investigated and their properties were contrasted in the two crosses. In conclusion, it was proposed that incomplete dominance of deleterious alleles contributes to grain yield heterosis in elite crosses (Exp. 1).

1. Introduction

Most common genotyping platforms, including SNP arrays, accommodate and exploit common genotypic variance and are especially suitable for inexpensive, automated genotyping of large population panels. However, they are notoriously limited in their capacity to capture deleterious rare variants (or alleles) [1–3]. Missing informative SNP variants, especially rare variants, limits studies on quantitative genetic parameters of the trait under consideration: several studies have pointed to a disproportionate and hence biologically relevant contribution of rare genomic variants in shaping the phenotype [4–6], necessitating a re-evaluation of ways to derive genotypic data for dissecting the genetic underpinnings of complex quantitative traits. While rare variants putatively contributing to complex traits can be accurately captured by modern sequencing technologies like whole genome sequencing, not all variants discovered by such advanced sequencing technologies are biologically relevant. Therefore, detection of informative biological priors is crucial to: (1) screen and filter the wealth of marker data generated, and (2) draw meaningful estimates of quantitative genetic parameters, especially for complex traits like grain yield heterosis.

Deleterious alleles are SNP variants with a negative impact on the phenotype. They arise from mutations during the evolution of a crop species and, depending on their impact on the fitness of a population, may be strongly, moderately, or slightly deleterious [7,8]. The prevalence of a deleterious allele in a population depends heavily on its putative impact and the corresponding selective constraint at the respective SNP or region of the genome. Explicitly, whereas deleterious alleles at regions with high selective constraint, viz. strongly deleterious alleles, are highly likely to be purged with time, those in regions of low selective constraint, viz. slightly and moderately deleterious alleles, may persist in the crop genome at low population frequencies and contribute to phenotypic variation [9]. Exploitation of selective constraint has therefore been proposed for identifying SNP variants with deleterious alleles using comparative genomics approaches like SIFT [10], PolyPhen-2 [11], GERP++ [12] and BAD_mutations [13].

Heterosis has been successfully exploited in hybrid breeding of allogamous crops, and its potential in autogamous crops like wheat (Triticum aestivum L.) has been the subject of substantial investments by breeding companies over the past decade [14]. The utility of hybrid wheat lies in its potential to surmount the productivity plateau, wherein the yield has been stagnant for ~37% of wheat growing areas cultivating elite lines globally [15]. The higher grain yield stability of wheat hybrids, coupled with a heterotic advantage of ~10% [16], implies not only economic benefits but also long-term sustainability of associated agro-ecosystems given the forthcoming climate projections [17]. Further, merits like improved grain protein content, enhanced fertilizer response, better root penetrance, increased rate of grain filling, higher stress tolerance, and reduced disease susceptibility [18] endorse hybrid wheat cultivation and highlight the importance of heterosis therein. Since phenotypic superiority of hybrids for important traits like yield and stress tolerance [19] implies better adaptation capacity and agronomic performance, knowledge-based use of heterosis is crucial for improving the efficiency of hybrid breeding in wheat.

At the genetic level, heterosis may arise due to genetic complementation of deleterious alleles (dominance hypothesis) or allelic interactions (epistatic hypothesis) at multiple loci. Although it is debatable which hypothesis best explains heterosis, in selfing species at least, epistatic interactions necessarily contribute to the phenomenon [20,21]; accordingly, a favorable genetic makeup resulting from beneficial alleles, or combinations thereof, is desirable for maximal exploitation of heterosis. Previous investigations into grain yield heterosis of wheat hybrids have been successful in elucidating the heterotic genetic architecture [20,22], but were limited in the genetic resolution of the resulting genomic variants owing to the use of SNP arrays. The limitation of the genotyping platform implies that putatively informative rare variants, such as deleterious alleles, were missed and their value in deriving estimates of heterotic parameters remains uninvestigated.

Naturally, investigations into grain yield heterosis are likely to benefit from novel information obtained using biological priors like deleterious alleles, since the underlying trait, viz. grain yield, is heavily selected for in elite breeding and is therefore influenced by slightly or moderately deleterious alleles.

Indeed, incomplete dominance of deleterious alleles, captured a priori using whole genome sequencing data, has been shown to contribute to heterosis in maize [23], substantiating previous studies in the crop that proposed the utility of such an exercise for predicting the breeding values of inbred lines [24]. In selfing species like barley, compelling studies have demonstrated a desirable shift in mean phenotypic values resulting from sequential cycles of crossing, selfing and selection, and a gradual decrease in the population frequency of deleterious alleles therein [25]. Moreover, studies contrasting the genetic architecture of grain yield heterosis in elite versus wide wheat crosses have reported that the negative dominance and dominance-by-dominance epistatic effects responsible for reduced mid-parent heterosis in wide crosses are purged in hybrids resulting from elite crosses [22]. The last study directly links selective constraint to the existence of rare deleterious variants in elite wheat lines, but little research has been done so far to characterize these variants and assess their impact on grain yield heterosis in wheat.

Here we sought to better understand the role of deleterious alleles in grain yield variation and heterosis in wheat using contrasting wheat crosses, namely elite and wide crosses. Assuming that previously detected deleterious variants might have an impact on grain yield heterosis, their novel information content was evaluated and studies were conducted to: (1) tag deleterious alleles in the wheat genome and investigate their distribution patterns in genetically contrasting wheat populations, (2) investigate the advantage of deleterious SNPs (dSNPs) for predicting grain yield heterosis, and (3) study the effect of including dSNPs in elucidating the genetic architecture of MPH in wheat.

2. Materials and methods

2.1. Plant material and phenotypic data

This study is based on published phenotypic data [14] of two experimental series, Exp. 1 and Exp. 2. The parental lines used in Exp. 1 consisted of a subset of 135 diverse elite winter wheat lines, which were clustered into male (15) and female (120) groups based on pollen capability, plant height and flowering time, and their 1604 (out of a possible 1800) hybrids (Data S1). These lines were carefully selected from a larger set of 68 male and 275 female lines provided by four breeding companies, to represent the spectrum of genetic diversity in wheat breeding material that exists in Europe [9]. Exp. 2 consisted of a subset of the 667 hybrids generated from crosses following an incomplete factorial mating design between 45 elite female winter wheat lines and 361 diverse male accessions (plant genetic resources or PGRs) of the German Federal ex situ Genebank for Agricultural and Horticultural Crops (Data S2). Further details about Exp. 1 and Exp. 2 can be found in the description of experimental series one and five in the materials and methods section of [14]. For the generation of hybrids in both experimental series, the female lines were treated with a chemical hybridizing agent and their sterility was verified by bagging one to three female parents. Lastly, in field trials of both series, grain yield was adjusted to 14% moisture and expressed in Mg ha⁻¹.

2.2. Phenotypic data analysis

The best linear unbiased estimations (BLUEs) of grain yield used to calculate mid-parent heterosis (MPH) were derived from data screened for outliers using the method “Bonferroni–Holm with re-scaled median absolute deviation standardized residuals” [26], by fitting a linear mixed model including the effects of genotypes, trials, replications nested within trials, and blocks nested within trials and replications, as detailed elsewhere [14,22]. Subsequently, the MPH and better parent heterosis (BPH) for the hybrids were calculated as

$$MPH_i = F1_i - MPV_i \quad (1)$$

$$BPH_i = F1_i - BP_i \quad (2)$$

where $MPH_i$, $MPV_i$, $BPH_i$ and $BP_i$ are the mid-parent heterosis, mid-parent value, better parent heterosis and better parent value for the ith single-cross hybrid with a hybrid performance of $F1_i$.
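The heterosis definitions in this section (hybrid performance minus the mid-parent or better-parent value) can be illustrated with a minimal Python sketch. The function name `heterosis`, the example yields, and the reading of "better parent" as the higher-yielding parent are assumptions made here for illustration; the original analysis is not reproduced.

```python
def heterosis(f1, parent_a, parent_b):
    """Return (MPH, BPH) for one single-cross hybrid.

    All inputs are grain-yield BLUEs in the same unit (e.g. Mg/ha).
    Assumption: the better parent is the higher-yielding parent.
    """
    mid_parent_value = (parent_a + parent_b) / 2.0
    better_parent_value = max(parent_a, parent_b)
    mph = f1 - mid_parent_value
    bph = f1 - better_parent_value
    return mph, bph


# Hypothetical BLUEs for one hybrid and its two parents.
mph, bph = heterosis(f1=9.5, parent_a=9.0, parent_b=8.0)
print(f"MPH = {mph:.2f} Mg/ha, BPH = {bph:.2f} Mg/ha")
```

Relative heterosis in percent, as plotted for the hybrids later in the article, would follow by dividing each value by the corresponding parental reference and multiplying by 100.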

2.3. Genotypic data

The genotypic data for the parents of Exp. 1 and Exp. 2 were produced using different but established protocols, viz. whole genome sequencing (WGS) for Exp. 1 and genotyping by sequencing (GBS) for Exp. 2 (see the library preparation and sequencing section for GBS and WGS data in [27]). Statistics of GBS read mapping may be obtained by cross-referencing individuals of the present study (original names available on request) with the “Material” column in supplementary table three of [27], whilst the read mapping statistics for WGS are provided in Data S3. It must be noted that not all parental genotypes were successfully genotyped, resulting in 116 females/15 males for Exp. 1 and 25 females/177 males for Exp. 2 being used for the respective genetic analyses.

The two datasets, viz. GBS and WGS, were found to contain 722,212 common markers based on overlapping physical positions according to the reference genome. Both the GBS and WGS files, filtered for the common subset, were processed separately with vcftools (v0.1.13) [28] with (1) the --max-missing parameter set to 0.5 and (2) the --mac (minor allele count) parameter set to 1. The missing markers in the two datasets were imputed using Beagle v4.1 [29] with parameters ibd=“True”, window=10,000 and overlap=1000. The resulting data were pruned for markers in high linkage disequilibrium using plink v1.9 [30] to obtain a manageable number of highly informative markers for downstream analysis. The genotype data were converted to additive coding {0, 1, 2} based on minor allele frequency with plink v1.9. The genotype data for hybrids (1557 for Exp. 1 and 306 for Exp. 2) were derived from the respective parents in R (version 4.0.2) [31].
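Deriving hybrid genotypes from their parents amounts to combining one allele from each parent at every marker. A minimal Python sketch is given below (the study itself used R). The function name `hybrid_genotype` and the assumption that both parents are fully homozygous, i.e. coded only 0 or 2, are illustrative and not taken from the original pipeline.

```python
def hybrid_genotype(female, male):
    """Derive the expected F1 genotype from two inbred parents.

    Parents are lists of allele counts coded 0 or 2 (homozygous);
    the returned hybrid is coded 0, 1 or 2 at each marker.
    """
    if len(female) != len(male):
        raise ValueError("parents must be scored at the same markers")
    hybrid = []
    for f, m in zip(female, male):
        if f not in (0, 2) or m not in (0, 2):
            # Assumption of this sketch: no heterozygous parental calls.
            raise ValueError("this sketch assumes fully homozygous parents")
        # Each parent transmits one copy of its allele.
        hybrid.append((f + m) // 2)
    return hybrid


print(hybrid_genotype([0, 2, 2, 0], [0, 0, 2, 2]))
```

Markers at which the parents differ become heterozygous (1) in the hybrid; heterozygous parental calls would need an expected-dosage treatment that is not shown here.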

2.4. Detection and functional annotation of GERP SNPs

The potential SNP sites with evolutionary selective constraint (henceforth GERP-SNPs) in the wheat genome were identified using GERP++ [12] based on a multiple sequence alignment (MSA) of 9 species (7 monocots: Triticum aestivum, Aegilops tauschii, Triticum dicoccoides, Triticum turgidum, Brachypodium distachyon, Hordeum vulgare, and Oryza sativa, and 2 dicots: Arabidopsis thaliana and Vitis vinifera). The pairwise alignments of wheat with the other species were obtained from https://plants.ensembl.org (wheat genome-v48) and the MSA was produced using MULTIZ [32]. The GERP-SNPs were classified as neutral, slightly deleterious and moderately deleterious SNPs (dSNPs) according to their rejected substitution (RS) score of ≤ 0, ≤ 2 and ≤ 4, respectively. The RS score, calculated for each column in the MSA, is a quantitative measure of “rejected substitutions” and is defined as the difference between the neutral (expected) (Ni) and observed evolutionary rates (ki.Oi), where i denotes the ith column in the multiple sequence alignment (MSA).
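The RS-score classes described in this section can be expressed as a small lookup. The Python sketch below is only an illustration: the function name `classify_rs` is hypothetical, and the label returned for scores above 4 is an assumption, since the text does not say how such sites were handled.

```python
def classify_rs(rs_score):
    """Assign a GERP-SNP to a class based on its rejected substitution score."""
    if rs_score <= 0:
        return "neutral"
    if rs_score <= 2:
        return "slightly deleterious"
    if rs_score <= 4:
        return "moderately deleterious"
    # Assumption: scores above 4 are outside the classes named in the text.
    return "unclassified"


for score in (-1.3, 0.8, 2.0, 3.1):
    print(score, classify_rs(score))
```

The interval boundaries are closed on the right, matching the intervals (0, 2] and (2, 4] used for slightly and moderately deleterious SNPs.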

Further, for each SNP position the ancestral and derived alleles were defined by comparing the reference and alternative alleles in Triticum aestivum to (1) Triticum dicoccoides and Triticum turgidum for the A and B subgenomes and (2) Aegilops tauschii for the D subgenome [33]. The functional annotation of GERP-SNPs was done using SnpEff [34].

2.5. Genetic diversity studies

A common subset of the imputed marker data of the two experimental series was used to assess the population structure among the parental genotypes using principal coordinate analysis (PCoA) and cluster analysis based on pairwise Rogers’ distances [35]. A neighbor-joining tree was constructed using the R package “ape” [36]. Signals of selection were assessed by estimating the degree of variability within each of the broad parental groups, viz. elites and PGRs (theta-pi [37]), and the extent of genetic differentiation between pairs of parental subgroups belonging to Exp. 1 and Exp. 2 (Fst [38]) using vcftools [28]. Originally, pi is an estimate of pairwise nucleotide diversity at SNP variants in a population and has been proposed as an estimator of theta-pi [39]. The pi statistic was derived by setting the parameters --window-pi to 299,999 (~0.3 Mb) and --window-pi-step to 29,999 in vcftools using diploid coding (0|0 for homozygous reference, 0|1 or 1|0 for heterozygous and 1|1 for homozygous alternate genotype calls). Theta-pi ratios were thereafter defined as the ratio of pi values for overlapping genome intervals in the respective groups. The Fst statistic was calculated segment-wise with a window size of 299,999 (~0.3 Mb) and a window step size of 29,999. Thereafter, a weighted estimate for the pair of parental groups under consideration was derived and used to populate a distance matrix for the parental groups. The patterns in pairwise Fst values were visualized using ggtree [40]. Lastly, linkage disequilibrium was analyzed with PopLDdecay [41] using squared correlations (r²) as a measure of pairwise linkage disequilibrium between markers [42].
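The population-structure analysis rests on pairwise Rogers' distances. As a rough illustration, the Python sketch below computes Rogers' distance between two genotypes from 0/1/2 allele counts. The function names are hypothetical, and the sketch assumes biallelic markers without missing calls; it is not the code used in the study.

```python
import math


def allele_frequencies(genotype):
    """Convert 0/1/2 allele counts into per-locus frequency pairs (p, 1 - p)."""
    return [(g / 2.0, 1.0 - g / 2.0) for g in genotype]


def rogers_distance(genotype_a, genotype_b):
    """Rogers' distance: mean over loci of sqrt(0.5 * sum_j (p_j - q_j)^2)."""
    if len(genotype_a) != len(genotype_b) or not genotype_a:
        raise ValueError("genotypes must be non-empty and of equal length")
    freqs_a = allele_frequencies(genotype_a)
    freqs_b = allele_frequencies(genotype_b)
    total = 0.0
    for locus_a, locus_b in zip(freqs_a, freqs_b):
        squared = sum((p - q) ** 2 for p, q in zip(locus_a, locus_b))
        total += math.sqrt(0.5 * squared)
    return total / len(genotype_a)


# Two hypothetical inbred lines differing at one of four markers.
line_a = [0, 2, 2, 0]
line_b = [0, 0, 2, 0]
print(rogers_distance(line_a, line_b))
```

For fully homozygous lines the per-locus term is either 0 or 1, so the distance reduces to the proportion of markers at which the two lines differ.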

2.6. Genomic prediction with deleterious SNPs

An extended genomic best linear unbiased prediction model (EGBLUP) [13,39] was used for genomic predictions. The model was derived as an extension of the G-BLUP model [44] and builds upon the latter by including digenic epistatic or interaction effects. The model is described as follows.

In order to assess the gain in prediction accuracy from the inclusion of dSNPs, two sets of markers, viz. (1) a random set and (2) a random set + dSNPs with RS score > 2, each containing equal numbers of markers, were modelled differently for each population. For set 1 the model described in eq. (3) was used as,

whereas for set 2 an extension of the model described in eq. (3) was used as,

wherein separate design/kinship matrices were derived using random markers and dSNPs for the effects, with the marker set indicated by the respective subscripts [(r) implying random and (dSNP) implying dSNPs]. Genomic prediction for MPH was done for the hybrids in the two experimental series, following a five-fold cross-validation scheme for 100 rounds. Each model was trained by fitting effects in various combinations, viz. d, aa, d_aa, d_aa_ad and d_aa_ad_dd, using the training set, which in each round comprised 80% of the individuals randomly sampled from the set of hybrids. Prediction ability was recorded as the correlation between the observed and predicted values of MPH for the genotypes in the test set. Prediction accuracy was defined as the prediction ability divided by the square root of the genomic heritability. The genomic heritability was calculated for each experimental series by fitting all the marker data into eq. (3), as detailed in the next section. The model was implemented with the BGLR package [45] and genotypes were split into test and training sets using cvTools [46].
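The evaluation scheme described here (five-fold cross-validation, prediction ability as a correlation, and prediction accuracy as that correlation divided by the square root of genomic heritability) can be sketched in plain Python. The helper names, the seed handling and the fold construction are assumptions for illustration; the study itself used the R packages BGLR and cvTools, and model fitting is not shown.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def prediction_accuracy(observed, predicted, genomic_h2):
    """Prediction ability divided by the square root of genomic heritability."""
    return pearson(observed, predicted) / math.sqrt(genomic_h2)


def five_fold_splits(n_hybrids, seed):
    """Yield (train, test) index lists; each test fold holds about 20% of hybrids."""
    indices = list(range(n_hybrids))
    random.Random(seed).shuffle(indices)
    folds = [indices[k::5] for k in range(5)]
    for k, test in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test


# One hypothetical round: each fold trains on 80% and tests on 20% of hybrids.
for train, test in five_fold_splits(n_hybrids=100, seed=1):
    print(len(train), len(test))
```

Repeating the split with 100 different seeds would mirror the 100 rounds described in the text; the fitted model would supply `predicted` for each test fold.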

2.7. Deriving genetic variance components for mid-parent heterosis

The genetic variance components associated with each of the effects contributing to heterosis were estimated by fitting the model described in eq. (3) and genomic heritability (H2) was calculated as,

A distinction must be made from here onwards: whereas the default (F∞) marker coding was used to derive kinship matrices and to calculate the variance components for estimating genomic heritability, the marker coding was changed [47] when deriving variance components to elucidate the genetic architecture of MPH. The alteration of marker coding was needed because of the nonorthogonal parametrization that follows from the use of F∞ coding [22]. Further, the correlation was also calculated between kinship matrices derived using the modified marker coding. The variance components for the effects in eq. (3), both for deriving genomic heritability and for inferring the genetic architecture of grain yield MPH, were estimated using reproducing kernel Hilbert space (RKHS) regression as implemented in the R package BGLR [45].
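For orientation, one plausible form of the EGBLUP model for MPH and of the genomic heritability is sketched below. It is an assumption built only from the effects named in the text (d, aa, ad and dd); the exact notation of eq. (3) and of the heritability formula is not reproduced in this copy, and the symbols $\mathbf{g}_k$, $\mathbf{K}_k$, $\sigma^2_k$ and $\sigma^2_e$ are introduced here for illustration.

```latex
% Assumed form, not the authors' verbatim equations.
\begin{align}
\mathbf{y} &= \mathbf{1}\mu + \mathbf{g}_{d} + \mathbf{g}_{aa} + \mathbf{g}_{ad} + \mathbf{g}_{dd} + \mathbf{e},
\qquad \mathbf{g}_{k} \sim N(\mathbf{0}, \mathbf{K}_{k}\sigma^{2}_{k}),
\quad \mathbf{e} \sim N(\mathbf{0}, \mathbf{I}\sigma^{2}_{e}) \\
H^{2} &= \frac{\sum_{k}\sigma^{2}_{k}}{\sum_{k}\sigma^{2}_{k} + \sigma^{2}_{e}},
\qquad k \in \{d, aa, ad, dd\}
\end{align}
```

Under this reading, $\mathbf{y}$ holds the MPH values of the hybrids, each $\mathbf{K}_k$ is the kinship matrix for one type of effect, and heritability is the share of the total variance captured by the genetic components.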

2.8. Detection of dominance and epistasis associated effects

An association study was performed using the following linear mixed model:

2.9. Genome-wide association mapping of heterotic QTL

Following the notation proposed in [20], the F∞ metric [49] was used to denote genotypes, viz. 0, 1 and 2 for homozygous with zero copies of the reference allele, heterozygous, and homozygous with both copies of the reference allele. Subsequently, the heterotic effect of individual loci could be decomposed into dominance and digenic epistatic components as:

Here, the total set of QTL for the original trait is denoted as Q, and each QTL in Q is coded as 0, 1 or 2, indicating the number of copies of the chosen allele at a given linked SNP locus. Rst (s, t = 0 or 2) denotes the subset of loci where the female parent has genotype s and the male parent has genotype t. For i, j ∈ Q and i ≠ j, the symbols di, aa, ad and dd with their respective subscripts denote the dominance effect at the i-th QTL and the additive-by-additive, additive-by-dominance and dominance-by-dominance epistatic effects between the pair of QTL, respectively. The significant markers from the previous step were incorporated into heterotic QTL (H-QTL) according to eq. (8).

Since the MPH values for a given marker locus vary between hybrids, the correlation was derived between the heterotic effects of a given marker across all hybrids and the MPH values (from the phenotypic data). A permutation test to assess the significance of the correlation coefficient was also performed using cor.test [31].

2.10. Calculation of degree of dominance for grain yield

The degree of dominance (dod) for grain yield in the hybrids of either population was calculated by fitting a linear model with additive and dominance effects as follows:

where Y is the BLUEs of hybrid yield across environments; X and Z are marker matrices coded Xij ∈ {-1, 0, 1} and Zij ∈ {0, 1, 0}; and α and d are the additive and dominance effects of the ith marker, respectively. The degree of dominance was then defined for the ith marker as the ratio of the dominance and additive effects for that marker obtained from the model. Since this method can produce very large values of the degree of dominance for a marker with a small additive effect, values with |dod| > 2 were truncated for further analysis.

3. Results

3.1. Selective constraint on otherwise conserved sites can be used to identify deleterious alleles and quantify their deleteriousness

Evolutionarily, sites under high selection pressure are conserved, and harmful mutations arising at these sites are often maintained at low population frequency or purged [50]. Accordingly, the stronger the selection pressure at a site, the more deleterious the arising mutations would be, and therefore alternative or derived alleles at the site are putatively deleterious. Amongst the 722,122 common SNPs, 295,122 were informative in the multiple sequence alignment (Fig. 1A), i.e., they had ≥ 3 non-wheat genome alignments, and were termed GERP-SNPs. Amongst the latter, 111,642 SNPs were assigned a rejected substitution (RS) score > 0 and represent the candidate pool of deleterious SNPs (dSNPs). The RS score is a quantitative measure of deleteriousness and can be used as a proxy to account for reduced substitutions at sites with selective constraint [12]. A positive score of 3 therefore means 3 fewer substitutions at the site compared to a site with a neutral rate of substitutions. Deleterious SNPs were further grouped into slightly dSNPs (RS score ∈ (0, 2]) and moderately dSNPs (RS score ∈ (2, 4]).

3.2. Moderately deleterious SNPs are enriched with variants having negative impact on protein metabolism

Fig. 1. Overview of GERP-SNPs. (A) Euler plot showing different subgroups based on the rejected substitution (RS) score of deleterious SNPs. The interval RS (0, 2] has been split into two to show the decreasing trend in the number of SNPs with high positive rejected substitution score, viz. 3067 (red) and 2543 (blue). (B) Functional annotation and clustering of slightly (RS (0, 2]) and moderately deleterious (RS (2, 4]) SNPs into respective impact groups.

Slightly dSNPs were mostly in the “MODIFIER” impact group, implying that these were non-coding variants or variants affecting non-coding genes with little to no evidence of impact (Tables S1, S2). The moderately dSNPs, however, had substantial numbers (~37%) in the “MODERATE or HIGH” impact group (Fig. 1B), implying that these variants either probably influence protein effectiveness or have a disruptive impact leading to “protein truncation, loss of function or triggering nonsense mediated decay” [51]. Derived alleles at high or moderate impact SNPs are deleterious given the important role of the respective SNPs in protein metabolism. That moderately dSNPs were enriched in high or moderate impact SNPs suggests that SNPs in conserved regions involved in protein metabolism are captured at high positive values of the RS score.

3.3. The genetic peculiarities of the two populations reveal intrapopulation genetic differentiation and directional inter-group selection

The marker dataset used to assess population structure and genetic diversity consisted of GERP-SNPs in the two populations. The SNPs of Exp. 1 and 2 were independently imputed and merged thereafter to account for the distinct parental populations, i.e. plant genetic resources (PGRs) and elites. The filtering resulted in 221,888 SNPs for Exp. 1 and 54,567 SNPs for Exp. 2. The low number of SNPs recovered in Exp. 2 was due to the high rate of missing values in genotyping-by-sequencing datasets [1].

At the population level, the parental genotypes of Exp. 1 and Exp. 2 showed differences in linkage disequilibrium decay patterns, with almost four times faster decay in Exp. 2 (r² < 0.1 at 2.33 Mb) compared to Exp. 1 (r² < 0.1 at 9.80 Mb) (Fig. 2A). The faster decay in Exp. 2 suggests smaller haplotype blocks in the population and points towards a more ancestral origin compared to Exp. 1. Given the abundance of PGRs in Exp. 2, clustering and potential signals of selection amongst the parental genotypes were investigated.

The assessment of genetic differentiation between the four parental groups via pairwise estimates of Fst revealed profound genetic differentiation (Fig. 2B). Males of Exp. 2, representing PGRs, clustered separately (pairwise Fst > 0.12) from the other groups, which comprise elite lines and thus cluster together (pairwise Fst < 0.05). Similar patterns of clustering between parental genotypes were also evident in the principal coordinate analysis (PCoA) plot derived using common markers for the parental genotypes of both populations (Fig. 2C). The first two principal coordinates separated the genotypes into the expected groups.

Subsequently, theta-pi ratios were derived for matching segments of the genome in the elite and PGR groups to quantify and compare genetic diversity within each group. They revealed markedly higher genetic diversity in the PGR group than in the elite group, as substantiated by a mean theta-pi ratio of 15.11 for the top 1% of genomic segments with at least 5 variants [52] (Table S3). The diminishing genetic diversity from the PGR to the elite group indicates selection resulting from breeding activities across Europe.

Given the directional selection, it may be expected that deleterious alleles at sites with high selective constraint were slowly purged in elite lines, represented herein by the elite group. This would then have produced differential patterns in the distribution of deleterious alleles in the two groups, viz. elites and PGRs. Nonetheless, such patterns are trait specific, and the potential effects of an altered proportion of deleterious alleles in the elites, especially for important biotic and abiotic stress related traits, need further investigation.

3.4. Patterns of intergroup selection are captured by deleterious SNPs

The derived allele frequency of GERP-SNPs was negatively correlated with the RS score, determined in bins of 0.01, of the respective SNP position (Fig. 3A) in both groups, i.e., PGRs and elites, with higher frequencies in genotypes of the PGR group than in genotypes of the elite group for moderately deleterious SNPs. Interestingly, an inversion of trends was noted at an RS score of 2, underscoring the validity of the moderately dSNPs detected by GERP++. Little to no variation in derived allele frequency for GERP-SNPs with RS < 0 reflects their neutral nature with respect to selection pressure. The trend implies that the higher the RS score of a deleterious SNP, the less frequent the derived allele at that SNP is in the elites compared with PGRs.
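The binning behind this comparison can be sketched as follows: SNPs are grouped by RS score into fixed-width bins and the derived allele frequency (DAF) is averaged within each bin. This Python sketch is illustrative only; the function name, the input layout of `(rs_score, daf)` pairs and the example values are assumptions.

```python
import math
from collections import defaultdict


def mean_daf_by_rs_bin(snps, bin_width=0.01):
    """Average derived allele frequency within RS-score bins.

    snps: iterable of (rs_score, derived_allele_frequency) pairs.
    Returns {lower bin edge: mean DAF}, ordered by bin.
    Note: scores lying exactly on a bin edge may fall into the lower
    neighbouring bin because of floating-point division.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for rs_score, daf in snps:
        bin_index = math.floor(rs_score / bin_width)
        sums[bin_index][0] += daf
        sums[bin_index][1] += 1
    return {
        round(index * bin_width, 10): total / count
        for index, (total, count) in sorted(sums.items())
    }


# Hypothetical SNPs, binned coarsely (width 0.5) to keep the output short.
example = [(0.25, 0.4), (0.4, 0.2), (2.6, 0.1)]
print(mean_daf_by_rs_bin(example, bin_width=0.5))
```

Running the function separately on the elite and PGR groups and plotting the two resulting curves against the bin edges would give a comparison of the kind shown in Fig. 3A.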

The differences in the distributions of dSNPs in the two groups were further examined by assessing the distribution of deleterious alleles in frequency bins of 0.05 (Fig. 3B). The differences between the two populations were larger at both the head and the tail of the distribution, indicating that the differences in allele frequency distributions that arise as a result of selection are captured by dSNPs. Interestingly, the decrease in allele frequency from the center of the distribution for PGRs towards elites could result from selection or linkage drag, whereas the slight increase observed at the tail of the distribution for elites is likely due to linkage drag alone.

Fig. 2. The parental genotypes of Exp. 1 and Exp. 2 were analyzed for linkage disequilibrium and genetic diversity. (A) Decay of linkage disequilibrium (r²) with physical distance. (B) Clustering based on pairwise estimates of Fst. The red boxes cluster the two groups, viz. elites and PGRs. (C) Population structure based on principal coordinate analysis (PCoA) using classical multidimensional scaling based on pairwise estimates of Rogers’ distance(s). The same color scheme was used in all figures.

Fig. 3. Patterns of distributions for deleterious alleles in two broad groups, viz. elites and plant genetic resources (PGRs). (A) Variation of derived allele frequency of GERP-SNPs with rejected substitution (RS) score (in bins of 0.01). RS score > 0 represents a deleterious allele. (B) Derived allele frequency of deleterious SNPs in bins of 0.05.

3.5. Realized heterosis is reduced in crosses with parents of wide genetic divergence

The populations showed significantly different values of average mid-parent heterosis (MPH) (Fig. 4A), with lower average MPH (P-value < 0.0001) in Exp. 2 than in Exp. 1 (Table S4). The values of better parent heterosis show an even stronger disparity between the two populations, with average better parent heterosis being significantly different (P-value < 0.0001) and negative in Exp. 2 (Fig. 4B). Looking at these distributions in light of the correlations between mid-parent value and MPH (Fig. 4C), it is further apparent that the lower average MPH in Exp. 2 is nonintuitive, since the trends in correlation are stronger (more negative) in Exp. 2, arguing for the expectation of higher heterosis in the population. Furthermore, the higher mean genetic distance (~7%; Fig. S1) between parental genotypes of Exp. 2 [GD(Rogers): Exp. 2 = 0.353 (se = 0.022) vs GD(Rogers): Exp. 1 = 0.33 (se = 0.028)] strengthens the presumption about reduced heterosis in Exp. 2. The opposite distribution of MPH could possibly be explained by differential genetic underpinnings responsible for heterosis in the two populations.

3.6. dSNPs are informative and slightly increase prediction accuracy

Fig. 4. Analysis of hybrids of Exp. 1 (blue) and Exp. 2 (green). Distributions of (A) relative mid-parent heterosis (MPH%), (B) relative better parent heterosis (BPH%), and (C) correlation between mid-parent value (MPV) and mid-parent heterosis (MPH).

Fig. 5. Prediction accuracy for (A) Exp. 1 and (B) Exp. 2 for models with (1) random markers (M1) and (2) random + dSNPs (M2). The mean prediction accuracies for each model are given at the bottom of the figure. Effects or combinations thereof are represented as E1 (d only), E2 (aa only), E3 (aa and d), E4 (aa, d, and ad), E5 (aa, d, ad and dd). Individual effects are denoted therein as dominance (d), additive-by-additive (aa), additive-by-dominance (ad), dominance-by-additive (da) and dominance-by-dominance (dd) effects.

Comparing prediction accuracies in the two experimental series using two sets with equal numbers of (1) random markers or (2) random markers and dSNPs with RS score > 2, the prediction accuracy was higher for the second set (up to ~1.5% in Exp. 1 and ~2.4% in Exp. 2) in both populations. Specifically, the addition of dSNPs to a set of random markers augmented the prediction accuracies for models with (1) additive-by-additive effects in Exp. 1 (Fig. 5A) and (2) all except additive-by-additive effects in Exp. 2 (Fig. 5B). The increase in prediction accuracies is in line with previous results wherein (1) additive-by-additive epistatic effects in crosses with parents of moderate genetic divergence and (2) dominance and associated digenic effects in crosses with parents of wide genetic divergence were reported to be superior for predicting hybrid performance [22]. The different treatment of SNPs in the two models, despite ignoring epistasis, provides better prediction accuracies and is an indication that dSNPs should be modelled separately.

3.7. Deleterious SNPs contribute to grain yield heterosis

Quantitative genetic theory describing linear mixed models augmented with epistatic interactions to capture heterotic QTL (H-QTL) in wheat was used to assess whether the GERP-SNPs, particularly dSNPs, contributed differently to heterosis in the two experimental series. The initial number of markers was pruned, given the different patterns of linkage disequilibrium decay in the two populations, to obtain a conservative but highly informative set of 28,447 SNPs in Exp. 2 and 28,142 SNPs in Exp. 1 (pruned at r² of 0.95 and 0.5, respectively). Different pruning thresholds were necessary because haplotype blocks are larger in Exp. 1 than in Exp. 2, as expected given the trends in LD decay (Fig. 2A).

A higher number of H-QTL and their components, i.e. dominance and digenic epistatic effects, was detected in Exp. 1 (Fig. 6A–D) than in Exp. 2 (Table S5). Amongst the 520 detected H-QTL, 229 were dSNPs with RS score > 0 and, interestingly, most of the significant interactions (~70%) for any given digenic effect were contributed by a few significant H-QTL on chromosome 3B (Fig. 6E). Further, amongst the 6 H-QTL with RS score > 2, four had a negative mean effect considering their dominance and epistatic interaction effects with other SNPs.

In Exp. 2, only one significant H-QTL was detected with the conventional threshold for marker effects; however, relaxing this limit to a false discovery rate (FDR) of 0.1 led to the detection of 75 unique H-QTL, 36 of which were dSNPs with RS score > 0.
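The effect of relaxing a conservative threshold to an FDR of 0.1 can be illustrated with a short sketch. The text does not state which FDR procedure was applied, so the Benjamini-Hochberg step-up rule is assumed here. The Bonferroni cutoff stands in for a conservative threshold only as an example, and the P-values are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Indices of tests passing a Bonferroni-corrected cutoff (illustrative conservative threshold)."""
    m = len(p_values)
    return [i for i, p in enumerate(p_values) if p <= alpha / m]


def benjamini_hochberg(p_values, fdr=0.1):
    """Indices of tests declared significant by the Benjamini-Hochberg step-up rule."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    # Find the largest rank k such that p_(k) <= (k / m) * fdr.
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


# Hypothetical marker-effect P-values, not taken from the study.
p_values = [0.0005, 0.012, 0.03, 0.2, 0.6]

conservative_hits = bonferroni(p_values)
fdr_hits = benjamini_hochberg(p_values, fdr=0.1)
print(conservative_hits, fdr_hits)
```

The FDR-based rule admits more markers than the conservative cutoff, at the cost of tolerating a controlled proportion of false positives among them.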

The variance component decomposition for MPH with modified marker coding (Table S7) showed similar trends in both populations: additive-by-additive effects contributed predominantly to the phenomenon.

3.8. dSNPs contributing to heterosis have an incomplete degree of dominance

The absolute degree of dominance (dod), evaluated using significant H-QTL for grain yield in Exp. 1, displayed a positive correlation (0.12, P-value = 0.06) with RS score, implying that more deleterious dSNPs are likely to be recessive for the derived allele in hybrids of Exp. 1 (Fig. 7). Whereas the correlation between (-)dod and RS score was negative (-0.15), that between (+)dod and RS score was slightly positive (0.05) (Fig. S2). Specifically, given the coding for dominance and additive effects in Exp. 1, the minor allele will be maintained at low (+d/+a or +d/-a) to intermediate (-d/-a or -d/+a) population frequency, and this trend is strongly captured by dSNPs since a dSNP with a high RS score is also expected to have a low derived allele frequency (Fig. 3A) in the parental populations. In Exp. 2, too few data points were available with conventional thresholds to assess the correlation. Even with the relaxed threshold of FDR 0.1, only one H-QTL had an RS score > 2, which made an analysis of the degree of dominance in Exp. 2 futile.
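As an illustration of the quantities discussed above, the sketch below computes a degree of dominance per locus and correlates its absolute value with a score. It assumes the common definition dod = d/a (dominance effect divided by additive effect), which may differ in detail from the definition used in this study. All effect estimates and RS scores are hypothetical.

```python
from math import sqrt


def degree_of_dominance(a, d):
    """Degree of dominance under the assumed definition dod = d / a."""
    if a == 0:
        raise ValueError("additive effect must be non-zero")
    return d / a


def classify_dominance(dod):
    """Textbook classes based on |dod|."""
    ratio = abs(dod)
    if ratio == 0:
        return "additive"
    if ratio < 1:
        return "incomplete dominance"
    if ratio == 1:
        return "complete dominance"
    return "overdominance"


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical additive effects, dominance effects and RS scores for five loci.
additive = [2.0, -1.5, 1.0, 2.5, -2.0]
dominance = [-1.0, 0.6, 0.8, -0.5, 1.8]
rs_score = [0.4, 1.1, 2.3, 0.2, 2.9]

dod = [degree_of_dominance(a, d) for a, d in zip(additive, dominance)]
classes = [classify_dominance(v) for v in dod]
r_abs = pearson([abs(v) for v in dod], rs_score)
print(classes, round(r_abs, 2))
```

With real data, the sign of dod would additionally depend on how the additive and dominance effects are coded, which is why the positive and negative classes are treated separately in the text.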

Fig. 6. Circos plots for significant effects and digenic interactions between the heterotic QTL located on different chromosomes for Exp. 1. (A) The inner circle shows pairs of interacting loci for the aa effect. The penultimate sectors show significant H-QTL and the outermost sectors show significant dominance effects. (B–D) Circos plots showing significant digenic interactions for ad, da and dd effects. (E) Stacked pie plot showing the proportions of different epistatic interactions originating from significant H-QTL on different chromosomes. From inside out: additive-by-additive (aa), additive-by-dominance (ad), dominance-by-additive (da) and dominance-by-dominance (dd) effects.

Fig. 7. Degree of dominance derived for grain yield in Exp. 1. (A) Distribution of degree of dominance (dod) for significant H-QTL. (B) Trend plot showing the correlation between the absolute degree of dominance and RS score. The regression line was obtained by fitting a linear model. The grey area around the line shows the confidence intervals for the fitted model.

4. Discussion

The limitation of SNP arrays in capturing rare variants [2,3] translates to a loss of valuable information and a potential bias in estimating quantitative genetic parameters for the trait of interest. Rare variants contributing to the phenotypic variation of complex traits can be identified by: (1) performing genome-wide association studies (GWAS) for rare variants and selecting significant variants [53], (2) exploiting biological information on protein annotations to identify functional variants [54], or (3) utilizing comparative sequence information to identify putative deleterious variants [13]. The last approach is especially relevant for studying complex traits such as heterosis because it is independent of the effects of small population size and exploits selective forces to mark rare variants in the genome.

4.1. Selective constraint across the genome allows phenotype-independent identification of putative deleterious SNPs

Genomic evolutionary rate profiling (GERP++) captures genetic variants in both coding and non-coding regions of the genome, whereas the alternative approaches that also use comparative sequence information, namely SIFT [10], PolyPhen-2 [11], and BAD_mutations [13], are restricted in their search to coding regions of the genome. Non-coding regions are not only under selective constraint [55] but are also important for phenotypic variation, so their inclusion is likely to augment the analyses of complex traits [56].

While the low allele frequency of moderately deleterious alleles in the elite group compared to plant genetic resources (PGRs) (Fig. 3A) indicates the robustness of GERP++ in capturing SNPs harboring rare variants in the current study, it is also possible that GERP++ missed certain deleterious SNPs (dSNPs). Since the identification of deleterious variants inherently depends on calculating the neutral rate for each alignment position of the reference genome, estimates of deleteriousness for a site are affected by alignment quality, the genomic region under consideration [12], and functional turnover due to population-specific selection [7]. Additionally, a site may be excluded from the analysis if the number of alignments covering it falls below the threshold (> 3 in the current study), even if the site is putatively deleterious. It is also possible that the aligned species lack whole regions of the genome, i.e., are not informative owing to differences in genome size with the target species, viz. wheat, which restricts the power for detecting dSNPs at such sites. Adding more species to the alignment does not necessarily augment the detection of slightly deleterious dSNPs, but that of moderately deleterious dSNPs might improve marginally [7]. Interestingly, the tagging of slightly deleterious SNPs can benefit from combining the results of dSNP detection from multiple approaches [25], but owing to computational costs, this was not implemented in the current study.

Offsets due to lineage-specific functional turnover were addressed by detecting selective constraint across the wheat genome and not between individual representative lineages/breeding populations. This method has the obvious disadvantage of overlooking some population-specific dSNPs, but this kind of resolution will be difficult to achieve given the existence of numerous breeding populations in wheat [57]. While the detection of dSNPs may be compromised for the above reasons, those detected with a high positive RS score were putatively deleterious with a low false discovery rate [7], as shown by the enrichment of high and moderate impact SNPs (Fig. 1B).

4.2. Sequencing platform used for genotyping affects the resolution for detection of dSNPs

Genotyping by sequencing (GBS) has been reported to capture rare alleles responsible for phenotypic variation in genebank accessions and is thus suitable for genomic analyses [1]. However, the limited sequencing depth typically applied in GBS platforms brings its own problems, chiefly a high proportion of missing values for a large number of SNP variants. Among the 295,122 GERP-SNPs, for instance, only 16.8% had < 50% missing values with a minor allele count of 1 in the GBS data generated for the 177 male parents in Exp. 2. Whole-genome sequencing, on the other hand, produced 221,888 SNPs (~75.15%) with the same threshold(s) for the parental genotypes of Exp. 1. The high number of recovered markers advocates the utility of the latter sequencing technology, especially for capturing rare alleles, since it ultimately translates to reduced reliance on imputed data.
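The filtering criteria mentioned above can be sketched as follows. The example assumes that "minor allele count of 1" means a minimum minor allele count of at least 1, and that genotypes are coded as alternate-allele counts (0/1/2) with `None` for missing calls. The SNP names and genotypes are hypothetical.

```python
def call_stats(genotypes):
    """Return (missing fraction, minor allele count) for one SNP.

    Genotypes are alternate-allele counts (0, 1 or 2); None marks a missing call.
    """
    called = [g for g in genotypes if g is not None]
    missing_fraction = 1 - len(called) / len(genotypes)
    alt_count = sum(called)
    ref_count = 2 * len(called) - alt_count
    return missing_fraction, min(alt_count, ref_count)


def filter_snps(snps, max_missing=0.5, min_mac=1):
    """Keep SNPs with a missing fraction below max_missing and a minor allele count of at least min_mac."""
    kept = []
    for name, genotypes in snps.items():
        missing_fraction, mac = call_stats(genotypes)
        if missing_fraction < max_missing and mac >= min_mac:
            kept.append(name)
    return kept


# Hypothetical genotype calls for four SNPs in four parental lines.
snps = {
    "snp1": [0, 2, None, 0],           # 25% missing, polymorphic: kept
    "snp2": [None, None, None, 2],     # 75% missing: removed
    "snp3": [0, 0, 0, 0],              # monomorphic: removed
    "snp4": [2, None, None, 0],        # exactly 50% missing: removed
}

kept = filter_snps(snps)
recovery_rate = len(kept) / len(snps)
print(kept, recovery_rate)
```

The recovery rate computed at the end corresponds to the kind of percentage reported in the text for the GBS and whole-genome sequencing data.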

Further, given the low population frequency of deleterious alleles, the GBS platform is advantageous for accessing SNPs that contribute to genetic diversity in PGRs and are thus likely to harbor deleterious alleles; however, the subsequent imputation of those SNPs implies that the "rarity" of alleles is lost owing to the high proportion of missing values. This possibly explains why only a low number of heterotic QTL (H-QTL) was detected in Exp. 2 (Table S5), in which parental genotypes were sequenced by GBS. Whole-genome sequencing, on the other hand, not only accesses relevant SNPs, but the low missing count per SNP also means that the detected rare variants comply with biological expectation. Additionally, the large size of the hybrid population in Exp. 1 contributed to mining significant H-QTL. Accordingly, a large number of H-QTL were uncovered in Exp. 1 and, consequently, a favorable variance decomposition was detected for mid-parent heterosis (MPH) in the population.

4.3. Differential weighting of moderately deleterious SNPs with additional kernels boosts prediction accuracy

Significantly higher (P-values < 0.05) prediction accuracies were observed when dSNPs were incorporated into an otherwise random marker dataset (model 2) for additive-by-additive effects in Exp. 1 and for dominance and dominance-associated epistatic effects in Exp. 2 (Fig. 5). The models for different combinations of effects and population (Exp. 1 or Exp. 2) were tested for significant differences using a pairwise t-test, since the split of hybrid genotypes into training and test sets was the same for a given population. Comparable studies in dairy cattle, albeit considering only additive effects for the calculation of breeding values, have also reported a favorable impact on prediction accuracy from additionally including causal "rare and low frequency variants" in the prediction model [58]. Similarly, a boost in prediction accuracy from modelling genetic complementation at dSNPs has been reported in maize [23]. The improvement observed in prediction accuracies indicates that dSNPs capture novel information, which may be utilized in future prediction models for MPH in wheat.
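Because both models are evaluated on identical training and test splits, their accuracies form matched pairs. The sketch below computes a paired t statistic for such data, under the assumption that the "pairwise t-test" mentioned above is a paired test across cross-validation runs. The accuracy values are hypothetical and the P-value lookup from the t distribution is left out to keep the example dependency-free.

```python
from math import sqrt


def paired_t(x, y):
    """Paired t statistic and degrees of freedom for matched samples x and y.

    The statistic is positive when y is larger than x on average.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length of at least 2")
    diffs = [b - a for a, b in zip(x, y)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    variance = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    if variance == 0:
        raise ValueError("differences are constant; t statistic is undefined")
    return mean_diff / sqrt(variance / n), n - 1


# Hypothetical prediction accuracies from five identical cross-validation splits.
acc_random_only = [0.50, 0.52, 0.48, 0.51, 0.49]   # model 1: random markers
acc_with_dsnps = [0.52, 0.53, 0.50, 0.53, 0.50]    # model 2: random markers plus dSNPs

t_stat, df = paired_t(acc_random_only, acc_with_dsnps)
print(round(t_stat, 2), df)
```

Pairing removes the split-to-split variation that both models share, so even small but consistent gains in accuracy can be detected.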

4.4. Moderately deleterious SNPs contribute to grain yield heterosis in Exp. 1

Investigations into the variance decomposition for MPH revealed a predominance of the additive-by-additive effect in Exp. 1 (Table S7), in accordance with previous results in the same elite hybrid population investigated with a SNP array [20,22]. However, the additive-by-additive (aa) effect was also found to be predominant in Exp. 2 (Table S7). Interestingly, the relative contribution of effects to MPH in Exp. 2 changed substantially compared to previous findings in wide crosses [22]. The aberrant contribution of aa effects to heterosis in Exp. 2 could be attributed to the correlation between kinship matrices for Exp. 2 (Table S9; values for Exp. 1 are given for comparison in Table S8). Given that only a few H-QTL were detected using the marker dataset for Exp. 2, it is possible that the variance decomposition using the same marker dataset fails to capture the biologically significant shares of marker effects. There is scope for additional investigations into the proportion of phenotypic variation explained by coding and non-coding variants, but such investigations are presently not feasible since the design matrices of the two would be correlated. Furthermore, understanding the genetic architecture of complex traits using variance decomposition is a debated area, with a few studies outright rejecting the validity of the procedure [59]. In conclusion, the results of variance decomposition for Exp. 2 shed light on the limitations of the existing methods and encourage further research in this area.
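The correlation between kinship matrices referred to above can be quantified as in the following sketch. It assumes a simple linear kernel K = ZZ'/m for additive relationships and its element-wise (Hadamard) square for additive-by-additive relationships, a common convention that is not necessarily the construction used in this study. The centered marker matrix is hypothetical.

```python
from math import sqrt


def linear_kernel(Z):
    """Additive kernel K = Z Z' / m from a centered marker matrix Z (lines x markers)."""
    n, m = len(Z), len(Z[0])
    return [[sum(Z[i][k] * Z[j][k] for k in range(m)) / m for j in range(n)]
            for i in range(n)]


def hadamard(K1, K2):
    """Element-wise product of two equally sized matrices."""
    return [[a * b for a, b in zip(row1, row2)] for row1, row2 in zip(K1, K2)]


def upper_triangle(K):
    """Off-diagonal upper-triangle elements of a symmetric matrix."""
    n = len(K)
    return [K[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def kernel_correlation(K1, K2):
    """Correlation between the off-diagonal elements of two kernels."""
    return pearson(upper_triangle(K1), upper_triangle(K2))


# Hypothetical centered marker matrix: 4 lines (rows) x 3 markers (columns).
Z = [
    [1.0, 1.0, 1.0],
    [1.0, -1.0, 0.0],
    [-1.0, 1.0, 0.0],
    [-1.0, -1.0, -1.0],
]

K_a = linear_kernel(Z)       # additive kernel
K_aa = hadamard(K_a, K_a)    # additive-by-additive kernel (assumed construction)
r = kernel_correlation(K_a, K_aa)
print(round(r, 2))
```

A correlation of large magnitude between two kernels means that the corresponding variance components are hard to separate, which is one way to read the caution expressed in the text about the decomposition for Exp. 2.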

Moderately deleterious alleles were found to have a lower derived allele frequency in elites than in the PGR group (Fig. 3A). Since Exp. 1 involved crosses between elites, an extension of this result was observed in the form of a mildly positive correlation between the degree of dominance for grain yield and RS score (Fig. 7). Accordingly, the moderately deleterious SNPs are also likely to have a low frequency of deleterious alleles in the hybrids of Exp. 1. These results are based on the assumption that parents with a low frequency of deleterious alleles are crossed, as is expected for the parental genotypes of Exp. 1. The correlation also highlights the incompletely dominant inheritance of dSNPs in Exp. 1. However, even though the correlation observed between degree of dominance and RS score was significant, the number of data points responsible for it was low (only 6 SNPs with RS score > 2). Therefore, this correlation, despite meeting theoretical and practical expectations, needs further confirmation.

In conclusion, (1) the derived alleles at dSNPs under high selection pressure are purged by selection and are robustly detected by leveraging comparative sequence information; (2) a phenotype-independent detection of dSNPs opens avenues for their targeted elimination from breeding populations and allows informative markers to be recovered from a fraction that would otherwise be pruned; (3) dSNPs interact chiefly via epistasis in hybrids and, as a result, influence heterosis; and (4) incomplete dominance of dSNPs suggests their low to intermediate population frequency in hybrids from elite crosses.

CRediT authorship contribution statement

Abhishek Gogna: Formal analysis, Data Curation, Writing - Original Draft, Visualization. Jie Zhang: Validation, Supervision. Yong Jiang: Methodology, Software. Albert W. Schulthess: Investigation, Supervision. Yusheng Zhao: Conceptualization, Investigation. Jochen C. Reif: Conceptualization, Resources, Writing - Review & Editing, Project administration, Funding acquisition.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by the German Federal Ministry of Food and Agriculture (FKZ2818408B18), the Federal Ministry of Education and Research of Germany (FKZ031B0184A,B), and the China Scholarship Council (201906350045).

Appendix A. Supplementary data

Supplementary data for this article can be found online at https://doi.org/10.1016/j.cj.2022.06.009.
