Yang Song ,Cong Jiang ,Kun-Hua Li ,Jing Li ,Hong Qiu ,Megan Price ,Zhen-Xin Fan,3,Jing Li,3,*
1 Key Laboratory of Bio-resources and Eco-environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu,Sichuan 610064, China
2 Institute of Animal Genetics and Breeding, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu,Sichuan 611130, China
3 Sichuan Key Laboratory of Conservation Biology on Endangered Wildlife, College of Life Sciences, Sichuan University, Chengdu,Sichuan 610064, China
ABSTRACT The genus Macaca serves as an ideal research model for speciation and introgressive gene flow due to its short period of diversification (about five million years ago) and rapid radiation of constituent species.To understand evolutionary gene flow in macaques,we sequenced four whole genomes (two M.arctoides and two M.thibetana) and combined them with publicly available macaque genome data for genome-wide analyses.We analyzed 14 individuals from nine Macaca species covering all Asian macaque species groups and detected extensive gene flow signals,with the strongest signals between the fascicularis and silenus species groups.Notably,we detected bidirectional gene flow between M.fascicularis and M. nemestrina. The estimated proportion of the genome inherited via gene flow between the two species was 6.19%.However,the introgression signals found among studied island species,such as Sulawesi macaques and M.fuscata,and other species were largely attributed to the genomic similarity of closely related species or ancestral introgression. Furthermore,gene flow signals varied in individuals of the same species (M.arctoides,M.fascicularis,M.mulatta,M.nemestrina and M.thibetana),suggesting very recent gene flow after the populations split.Pairwise sequentially Markovian coalescence (PSMC) analysis showed all macaques experienced a bottleneck five million years ago,after which different species exhibited different fluctuations in demographic history trajectories,implying they have experienced complicated environmental variation and climate change.These results should help improve our understanding of the complicated evolutionary history of macaques,particularly introgressive gene flow.
Keywords: Macaca; Whole genome; Introgression;Gene flow; Demographic history
Natural hybridization between closely related species plays an important role in evolution and is now recognized as common in animals,including primates (Arnold & Meyer,2006;Baiz et al.,2020;Cortés-Ortiz et al.,2007;Fan et al.,2018;Gante et al.,2016;Kuhlwilm et al.,2019;Martin & Jiggins,2017;Rogers et al.,2019;Zhang et al.,2017).Hybridization is an important source of genetic variation during the evolutionary process and can introduce adaptive variants to recipient populations and even drive new species formation (Figueiro et al.,2017;Martin & Jiggins,2017;Zinner et al.,2011).The footprints left by hybridization in genomes allow the detection of ancient hybridization and gene flow,with sequencing technology enabling research on whole-genome data.The ever-growing database of whole genomes from multiple species and populations can now be used to address important evolutionary questions,such as introgressive hybridization,incomplete lineage sorting (ILS),and speciationwith-gene-flow (árnason et al.,2018).
At present,the genusMacacais comprised of 23 extant species.Macaques are the most widely distributed non-human primates and occupy a wide range of habitats (Ito et al.,2018;Roos et al.,2019).Except forM.sylvanus,which is distributed in Africa (Algeria and Morocco),all other species are found in Asia (see IUCN Red List).Historically,macaques dispersed into Eurasia during the late Miocene,roughly coinciding with the sea level drop associated with the Messinian Salinity Crisis (Abbott et al.,2013;Roos et al.,2019).The oldest putativeMacacafossils in Europe and Asia date to 5.9–5.3 million years ago (Ma) and 4.9–3.6 Ma,respectively (Roos et al.,2019).After reaching Asia,Macacaexperienced rapid speciation and radiation due to climate and environmental change during the late Pliocene and Pleistocene (Abbott et al.,2013;Roos et al.,2019;Woodruff,2010).Macacacan be classified into four or seven species groups (Delson,1980;Fooden,1976;Groves,2001;Roos et al.,2014,2019).The seven species group classification includes three new species groups,i.e.,arctoides,mulatta,andsulawesi(Roos et al.,2014),by separatingM.arctoides,M.mulatta+M.fuscataand all Sulawesi macaque species (respectively) from three(sinica,fascicularis,andsilenus) of the four species groups(Delson,1980).In this study,we used the four species group classification (sylvanus,silenus,fascicularis,andsinica)proposed by Delson (1980).Our samples covered all Asian groups of macaques,includingsilenus(M.nemestrina,M.nigra,andM.tonkeana),fascicularis(M.fascicularis,M.fuscata,andM.mulatta),andsinica(M.arctoides,M.assamensis,andM.thibetana).
Hybridization has been previously reported inMacaca,e.g.,betweenM.fuscataandM.cyclopis(Hamada et al.,2012),M.mulattaandM.fascicularis(Hamada et al.,2016;Ito et al.,2020;Rovie-Ryan et al.,2021),M.thibetanaand ChineseM.mulatta(Fan et al.,2014),M.arctoidesandfascicularisspecies group (Fan et al.,2018;Jiang et al.,2016;Li et al.,2009;Tosi et al.,2000,2003b),and between Sulawesi macaques (Ciani et al.,1989).Active tectonics and changing climate and sea levels during the Pliocene and Pleistocene and the periodic formation of land bridges in Southeast Asia significantly influencedMacacaevolution and allowed possible hybridization or introgression among species/species groups(Meijaard,2003).In addition,macaques have an almost identical chromosome karyotype,a factor that may facilitate hybridization among species and the production of viable hybrid offspring (Hamada et al.,2012;Yang & Shi,1994).In recent years,there has been a considerable increase in whole-genome-based studies on hybridization in macaques.For example,Fan et al.(2014) reported on hybridization betweenM.thibetanaandM.mulattabased on whole-genome sequencing analysis ofM.thibetana.Fan et al.(2018)identified an introgression event in the evolutionary history ofM.arctoidesby genome-wide analysis.Osada et al.(2021)reported on sex-biased admixture between thefascicularisandsinicaspecies groups based on whole-genome analysis of 17 macaques.Vanderpool et al. (2020) performed phylogenomic analysis of primates,including three species of macaques,and found introgression betweenM.nemestrinaandM.fascicularis.Ito et al.(2020) suggested potential gene flow betweenM.fascicularisandM.mulattafollowing secondary contact after a period of isolation,and that the migration rate fromM.mulattatoM.fascicularismay be slightly higher than in the opposite direction.
However,most field observations and molecular-based analyses ofMacacahybridization have focused on the most recently diverged species groups offascicularisandsinica,with gene flow between the more basalsilenusgroup and other groups less studied.Moreover,genus-wide hybridization or gene flow studies ofMacacaare scarce,particularly with whole-genome data.Here,we included three species from thesilenusgroup and presented high-coverage whole-genome data of four macaques (twoM.arctoidesand twoM.thibetana).Combined with other publicly available data,our study included 14 individuals from nineMacacaspecies,covering all species groups of Asian macaques.We estimated local evolutionary histories,reconstructed a multispecies coalescent phylogenetic tree,and explored ancient or ongoing introgressive gene flow in macaques based on whole-genome DNA sequences.Furthermore,we also analyzed the genetic diversity and demographic history of macaques.
All samples were collected following the Regulations for the Implementation of the Protection of Terrestrial Wildlife of China (State Council Decree [1992] No.13).Throughout the procedures,all animals received humane care to avoid suffering and to ensure animal welfare.All animal protocols involved in this study were reviewed and approved by the Ethics Committee of the College of Life Science,Sichuan University,China (Permit No.:20 200 327 009).
The genomes of twoM.arctoidesand twoM.thibetanawere sequenced using Illumina HiSeq 2 500 technology with a 150 bp paired-end strategy.DNA was isolated from blood using a commercially available DNeasy Blood and Tissue Kit (Qiagen,Germany).Sequencing libraries were prepared with an insert size of 300 bp.The twoM.arctoidesindividuals were from the Yunnan Province and Guangxi Zhuang Autonomous Region of China,respectively.The twoM.thibetanamonkeys were from the Anhui Province and Guangxi Zhuang Autonomous Region of China,respectively (Table 1).In addition,we downloaded 11 genome sequence data from the NCBI SRA database.In total,we used whole-genome data from 14 individualsbelonging toMacacaand one individual ofPapio anubisin our study.The 15 whole-genome sequences covered all species groups of Asian macaques and included 10 species:i.e.,M.arctoides(stump-tailed macaque),M.thibetana(Tibetan macaque),M. assamensis(Assamese macaque),M.nemestrina(southern pig-tailed macaque),M.nigra(blackcrested macaque),M.tonkeana(Tonkean macaque),M.fascicularis(crab-eating macaque),M. mulatta(rhesus macaque),M.fuscata(Japanese macaque),andP.anubis(olive baboon).Detailed information on the 15 genomes and sequencing data is given in Table 1 and the “Detailed information on samples” section in the Supplementary Materials.
Table 1 Information on samples and genomic data
Quality control of reads was performed using fastp (v.0.20.0)(Chen et al.,2018) and Trim Galore (http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/,v.0.6.1).Firstly,lowquality bases were trimmed at both ends of the reads using fastp with parameters:fastp --detect_adapter_for_pe -5 -3.Adapter sequences were then cut using Trim Galore with parameters:trim_galore -q 20 --phred33 --stringency 3 -e 0.1 --length 50 --max_n 0 --trim-n -j 4 -paired.Filtered reads were mapped to theM.mulattagenome Mmul_8.0.1 (NCBI Assembly ID GCA_000772875.3) with Bowtie2 (v.2.3.5.1)(Langmead & Salzberg,2012).As a new Mmul_10 reference genome is now available,genome-wide analyses based on the new reference may be slightly different from the present study.From the mapped reads,SNVs were called using the Genome Analysis Toolkit (GATK,v.4.1.2.0) (DePristo et al.,2011) following the GATK germline short variant discovery workflow (https://software.broadinstitute.org/gatk/bestpractices/workflow?id=11 145).Repetitive sequences of the Mmul_8.0.1 assembly were annotated using RepeatMasker(v.open-4.0.9) (Tarailo-Graovac & Chen,2009).The SNV sites were filtered out when one or more of the following criteria were met:overlap with repetitive sequences,DP<3.0,QD<2.0,SOR>3.0,FS>60.0,MQ<40.0,MQRankSum<–12.5,and ReadPosRankSum<–8.0.Statistics of SNVs were calculated using bcftools (v.1.9) (Li,2011) (Table 2).
Table 2 SNV information for each analyzed macaque
The SNVs of all samples were merged using bcftools (v.1.9)with the parameter -0.The SNVs from every non-overlapping 100 kb genome fragment were concatenated separately.Heterozygous SNVs were represented according to the degenerate base naming rules of the International Union of Pure and Applied Chemistry (IUPAC). We called the concatenated sequences “SNV-fragment”.In total,26 633 SNV-fragments were obtained for phylogenetic analysis.For each SNV-fragment,a phylogenetic tree was computed using IQ-TREE (v.1.6.10) (Hoang et al.,2018;Kalyaanamoorthy et al.,2017;Nguyen et al.,2015) with parameters:-st DNA -bb 1 000-m MFP+ASC.Branches of the SNV-fragment trees with support values lower than 30% were contracted using Newick utilities (v.1.6) (Junier & Zdobnov,2010).Based on all SNV-fragment trees,ASTRAL (v.5.6.3) (Zhang et al.,2018) and MP-EST (v.2.0) (Liu et al.,2010) were used to estimate the species tree with default parameters.The species tree was rooted withP.anubis.These two programs generated the same phylogenetic topology.We tested multiple genome fragment sizes (30,50,70,and 100 kb).All genome fragment sizes led to consistent results.
We randomly selected one of the first 1 000 SNVs and selected one every 1 000 SNVs with this SNV as the starting point.We then concatenated these SNVs to form an alignment to estimate divergence time.The final alignment length was 28 498 bp.To reduce the consumption of computing resources and time,the first 20 000 sites of the alignment were used to estimate divergence time with SNAPP (Bryant et al.,2012) in BEAST (v2.6.0) (Bouckaert et al.,2014) under the following parameters:(1) four age constraints,which were derived from our unpublished genome assembly-based divergence time estimations ofMacacaby MCMCTree (Yang,2007) (Supplementary Table S1);(2) chain length of 1 000 000 MCMC iterations;and (3) species tree generated from ASTRAL as a guide tree.An ascertainment-bias correction was applied to the divergence time estimations because of no constant site in the SNV data (Bryant et al.,2012;Lewis,2001).We performed divergence time estimation three times to confirm similar results (see Stange et al.(2018) for detailed methods of divergence time estimation). The topology generated from SNAPP was used for subsequent analyses.
TheDstatistic (Green et al.,2010) utilizes the number of biallelic sites that meet the ABBA or BABA site pattern in a four-taxon phylogeny,and requires a phylogenetic topology following (((P1,P2),P3),O) with P1 to P3 being ingroups and O being the outgroup.We applied theDstatistic to all asymmetric four-taxon phylogenies that could be extracted from the species tree withP.anubisas an outgroup using Dsuite (v.0.2 r17) (Malinsky et al.,2021).This resulted in 364 gene flow analyses.Four-taxon phylogenies with correctedP<0.001 were retained. Thefdstatistic quantifies the proportion of the genome affected by introgression and utilizes four-taxon phylogeny equaling theDstatistic (Martin et al.,2015).Thefdstatistic of the retained four-taxon phylogenies was calculated using Dsuite.Gene flow direction can be estimated by a derivation of theDstatistic,i.e.,DFOILanalysis(Pease & Hahn,2015).DFOILanalysis requires a symmetric five-taxon phylogeny with a specific topology.Therefore,not all combinations of five taxa could be analyzed.All symmetric five-taxon phylogenies (withP.anubisas the outgroup) that fitDFOILanalysis were extracted andDFOILanalyses were finished using ExDFOIL(Lambert et al.,2019).
We found that all species pairs between thefascicularisandsilenusspecies groups exhibited very strong gene flow signals.Therefore,further analyses were undertaken to determine if these signals reflected hybridization between the examined extant species after all dichotomous divergence events.We calculated thefdstatistic for all species pairs based on non-overlapping 30 kb windows.The script for calculatingfdbased on sliding windows was deposited in GitHub (https://github.com/sy-snake/Sliding_window_D_and_fd).We selected pairs from theM.fascicularis-silenusgroup species (M.nemestrina,M.nigra,andM.tonkeana) andM.nemestrina-fascicularisgroup species (M.fascicularis,M.fuscata,andM.mulatta) as target pairs.We then selected the top 5% of windows with the strongestfdvalue of target species pairs for subsequent analyses.Of these windows,we discarded any that met the following criteria:(1) appeared in more than 60% of all species pairs;(2) window coverage was larger than two times or less than one third of the sample’s average coverage;and (3) window heterozygosity was larger than two times the sample’s whole-genome heterozygosity.The overlapping ratios of the remaining windows of target pairs from theM.fascicularis-silenusgroup species andM.nemestrina-fascicularisgroup species were calculated to determine whether these strong hybridization signals were due to genomic similarity of closely related species or hybridization between checked species after all dichotomous divergence events.If we assume that the genome fragments exchanged during hybridization are random,then these windows should not be shared at a high level among species pairs.If this ratio is high,we can assume that strong hybridization signals are largely due to the genomic similarity of closely related species.
We also checked genes (including upstream 5 kb) located in the top 5% of windows of species pairM.fascicularis-M.nemestrinausing bedtools with parameters:bedtools intersect-wo -f 0.5 -F 0.5 -e.Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO)-based enrichment analyses of these genes were performed using g:Profiler (Reimand et al.,2007).We setM.mulattaas the background organism,statistical domain scope to “all known genes”,significance threshold to “g:SCS thread”,and remaining parameters to default.
PhyloNetworks (Solís-Lemus et al.,2017) uses maximum pseudolikelihood methods to estimate phylogenetic networks.Here,we used the “snaq!” program in PhyloNetworks and applied different maximum numbers of hybrid nodes (hmax:0-6) using the 26 633 SNV-fragment trees as input for 100 iterations.Each network generated from PhyloNetworks has a network score,with lower scores indicating better networks.Similarly,PhyloNet (Wen et al.,2018) was specifically developed to reconstruct reticulate phylogenies from a set of gene trees.We used the “InferNetwork_MPL” method with 1 000 runs and six as the maximum number of reticulations,yielding five optimal networks to analyze the 26 633 SNVfragment trees.We ran the program multiple times with a different maximum number of reticulations (0–6).When the maximum number of reticulations was set to five,the number of reticulations in the phylogenetic networks no longer increased.We set the maximum number of reticulations to six to ensure that the number of reticulations did not increase.Consensus networks of the 26 633 SNV-fragment trees were generated using phangorn (v.2.5.5) (Schliep,2011) with different proportions.
The demographic histories of the 14Macacaindividuals were inferred from genome sequences using the pairwise sequentially Markovian coalescence model (PSMC) (Li & Durbin,2011). Only autosomes of reference assembly Mmul_8.0.1 were used in the demographic history analyses.Input files for PSMC were generated using bcftools mpileup and vcfutils with the minimum and maximum depth of coverage thresholds set to one third and two times the sample’s average depth,respectively.PSMC was run for 25 iterations,with aN0-scaled maximum coalescent time of 15,initial θ/ρ ratio of 5,and time intervals of 64 parameterized as“1×6+58×1”.Bootstrapping was performed for 100 runs.PSMC plots were scaled with a mutation rate of μ=0.58×10-8per bp per generation (Wang et al.,2020) and a generation time of g=10.According to published reports,the reproductive age ofM.mulattaranges from about four years old to late teens,resulting in a median parent birth age of about 10 years(Kubisch et al.,2012;Xue et al.,2016).Therefore,we used 10 as the generation time.
Genetic diversity was assessed using heterozygosity of each individual.To exclude potential effects of different sequencing coverage of each macaque,the BAM files were downsampled to~15× using GATK.Heterozygosity was estimated based on the downsampled BAM files.SNVs of autosomes were called using bcftools with the parameter “bcftools mpileup -C50 -Ou |bcftools call -m”.Heterozygosity of individuals was calculated as the number of heterozygous sites divided by the total number of callable sites across the whole genome.In addition,we performed a calculation of heterozygosity from genomic non-overlapping 1 Mb windows.
Sequencing of genomic DNA from the twoM.arctoidesand twoM.thibetanausing Illumina technology yielded 93.7–152.0 Gb of data.Combined with the 11 whole-genome sequences downloaded from the NCBI SRA database,a total of 15 whole-genome sequences were used for subsequent analyses.Reference genome mapping of the 15 wholegenome sequences against theM.mulattagenome assembly Mmul_8.0.1 yielded genome coverages of 17.8–43.0×.Table 1 and Supplementary Table S2 detail the sequencing and mapping data and provide accession numbers of the included species.RepeatMasker (Tarailo-Graovac & Chen,2009) identified 51.95% of repetitive sequences in theM.mulatta(reference) genome.Of these,12.75% and 18.94%were short and long interspersed elements (SINE and LINEs),respectively (Supplementary Table S3). The repetitive sequence regions were excluded in SNV calling.After a series of filters,we identified 4.06–8.73 million SNVs in the 14 macaque genomes.Detailed information on the SNVs is shown in Table 2.Concatenating SNVs from every nonoverlapping 100 kb genome fragment yielded 26 633 alignments,totaling 28 497 813 bp in length for each sample.These alignments were used for subsequent evolutionary analyses.
Model testing suggested different models for different SNVfragments. The 26 633 SNV-fragment trees were then constructed based on SNV-fragment-specified models.The distributions of SNV-fragment trees across genomes at the individual level and species group level are shown in Figure 1 and Supplementary Figure S1.The local evolutionary histories of the macaques showed many conflicts (Figure 1),indicating different genomic areas had different evolutionary signals across the macaque genome,which implies complex evolutionary histories of macaques.Various programs were used to reconstruct the macaque species tree based on the 26 633 SNV-fragment trees,which generated similar results except for the position of Mfas_1 (M.fascicularisspecimen from China) (Figure 2A,B).Our general topology conforms to previous nuclear marker analyses of the phylogenetic relationships withinMacaca(Fan et al.,2018;Jiang et al.,2016;Li et al.,2009).Here,the macaques clustered into three branches,corresponding to thefascicularisspecies group (M.mulatta,M.fuscata,andM.fascicularis),sinicaspecies group(M.thibetana,M.assamensis,andM.arctoides),andsilenusspecies group (M.nemestrina,M.nigra,andM.tonkeana)(Figure 2C).With respect to the position of Mfas_1,it was either grouped by itself at the basal node ofM.mulattaandM.fuscataby ASTRAL and MP-EST (Figure 2A) or grouped with Mfas_Mau (M.fascicularisspecimen from Mauritius) by SNAPP (Figure 2B).The estimated quartet scores for these two clusters were 0.420 8 (ASTRAL and MP-EST) and 0.404 7(SNAPP),which are too close to describe the position of Mfas_1 accurately.A similar phylogenetic inconsistency was also observed following CONSENSE (Felsenstein,1989)analysis of the SNV-fragment trees.Although a majority-rule consensus tree (Supplementary Figure S2) confirmed the topology of the ASTRAL and MP-EST species tree(Figure 2A),the separate grouping of the twoM.fascicularisspecimens occurred 9 959 times,accounting for 37.39% of the 26 633 trees (Supplementary Table S4).
Figure 1 Local evolutionary history of macaques at individual level
Figure 2 MSC (multispecies coalescent) tree
Divergence time was estimated based on 20 000 SNV sites(Figure 2B).The estimated divergence time of baboons and macaques was~10.38 (95% CI:9.30–11.51) Ma,congruent with previous reports (Abbott et al.,2013;Roos et al.,2019).Thesilenusandfascicularis/sinicaspecies groups diverged~1.95 (95% CI:1.74–2.17) Ma,and thefascicularisandsinicaspecies groups diverged~1.56 (95% CI:1.39–1.73) Ma.
Consensus network analysis of the SNV-fragment trees yielded cuboid structures of connecting alternative branches,thus indicating reticulate phylogenetic relationships among macaques (Figure 3).Cuboid structures mainly occurred in thefascicularisandsinicaspecies groups at a threshold of 11%,thus indicating phylogenetic conflicts within/between the two species groups and potential interspecific gene flow.There was no cuboid structure in the net center,suggesting that the evolutionary relationship of ancestral macaques is relatively clear.At lower thresholds,the phylogenetic signal became more complex,and a cuboid structure appeared in the center of the net,thus implying additional phylogenetic conflict(Supplementary Figure S3).
Figure 3 Consensus network of 26 633 SNV-fragment trees with 11% threshold
Figure 4 Gene flow signals in macaques
BothDandDFOILanalyses identified substantial gene flow signals among macaques (Figure 4;Supplementary Tables S5,S6).The gene flow signals detected by theDstatistic(correctedP<0.001) at the individual and species level are shown in Figure 4A,B,respectively.Most species pairs showed significant gene flow signals,with strong signals detected in all species pairs between thefascicularisspecies group (M.fascicularis,M.fuscata,M.mulatta) andsilenusspecies group (M.nemestrina,M.nigra,M.tonkeana).DFOILanalyses identified the direction of gene flow signals between the 11 individual pairs (Supplementary Table S6).Ten of these signals were significant according to bothDandDFOILstatistical analyses.We mapped the direction of these gene flow signals in the species tree (Figure 4C).Unidirectional gene flow signals were found fromM.mulattatoM.assamensisandM.fascicularis,fromM.nemestrinatoM.fuscataandM.mulatta,and fromM.arctoidestoM.mulattaandM.fascicularis.A bidirectional gene flow signal was detected between MauritianM.fascicularisand BorneanM.nemestrina.
Different introgression signals were observed in individuals from different populations of the same species.For instance,the AnhuiM.thibetana(Mthi_HT1) had an introgression signal withM.fuscata,but the GuangxiM.thibetana(Mthi_R25) did not have this signal.Different individuals ofM.fascicularisandM.nemestrinaalso showed different introgression signals(Figure 4A,C).Mfas_1 (a captive breed specimen collected from China,M.fascicularis) showed introgression signals with the GuangxiM. thibetana(Mthi_R25) andM. fuscataspecimens,while the MauritianM.fascicularisspecimen(Mfas_Mau) did not show these signals (Figure 4A).Mnem_2(specimen from Borneo,M.nemestrina) showed a significant introgression signal withM.nigraandM.tonkeana,but Mnem_1 (specimen ofM.nemestrina) did not (Figure 4A).
The fraction of the genomes shared through gene flow was estimated usingfd(Martin et al.,2015) (Figure 5).TheM.fascicularisandM.nemestrinaspecies pair generated the highestfdvalue (0.061 9).In addition,thefdvalues of species pairs between thefascicularisandsilenusspecies groups(0.023 9–0.061 9) were higher than that of the other species pairs.TheM.fascicularisandM.mulattaspecies pair also generated a relatively highfdvalue (0.024 0).
Figure 5 Proportion of genome shared through gene flow estimated by fd statistic at (A) individual level and (B) species level
In addition to analyses based on site patterns,gene flow was investigated by topology-based analyses using PhyloNetworks (Solís-Lemus et al.,2017) and PhyloNet (Wen et al.,2018).Consistent with theDandDFOILresults,gene flow signals were identified between thefascicularisandsilenusspecies groups (Figure 6;Supplementary Figures S4,S5).Different maximum numbers of reticulations (0–6) were set when estimating networks using PhyloNetworks.We generated seven phylogenetic networks,each with a network score (lower score indicates better network).Networks with the maximum number of reticulations set to six had the lowest score (Figure 6B).However,networks with five or six reticulations (Supplementary Figure S6) could not be re-rooted byP.anubisdue to hybrid edges on theP.anubisbranch.Therefore,we only presented phylogenetic networks with four reticulations,which had the lowest score next to networks with five and six reticulations.PhyloNetworks identified gene flow signals between the following pairs:fascicularisandsilenusspecies group,M.fuscataand ChineseM.mulatta,two individuals ofM.fascicularis,and unsampled species and AnhuiM. thibetana(Figure 6A). PhyloNetworks also determined the proportion of genes inherited via gene flow between these pairs (Figure 6A).For the ancestral lineage of thefascicularisspecies group,39.5% of genes were from an ancestor of thesilenusspecies group.Furthermore,33.2% ofM.fuscatagenes were from ChineseM.mulattaand 23.2% of AnhuiM.thibetanagenes were from an unsampled species.In addition,PhyloNet identified several gene flow signals between thefascicularisandsinica(M. arctoides,M.assamensis,M.thibetana) species groups (Supplementary Figures S4,S5).
Given that strong introgression signals appeared in all species pairs between thefascicularisandsilenusspecies groups,we compared the top 5% of windows showing the strongest introgression signals of several species pairs to determine if these signals reflected hybridization between the examined extant species after all dichotomous divergence events.These windows showed similar genomic distributions among the species pairs (Figure 7;Supplementary Figure S7).An example distribution of these windows across chromosome 18 is shown in Figure 7.In total,the proportion of shared top 5% of windows was 60.14%–67.93% for theM.fascicularissilenusgroup species pairs and 45.84%–47.33% for theM.nemestrina-fascicularisgroup species pairs.The windows specific to each species pair only accounted for 8.70%–21.43% (Table 3).
Figure 6 Rooted network of macaques
Figure 7 Distribution of top 5% of windows with strongest introgression signals according to fd across chromosome 18
There were 893 genes located in the top 5% of autosomal windows of theM.fascicularis-M.nemestrinaspecies pair.Enrichment analysis (Supplementary Tables S7) showed that there were no enriched terms related to fertilization or reproduction.This implies that introgression may not have contributed to the formation of reproductive isolation.
Genetic diversity was assessed using individual heterozygosity.Genome-wide heterozygosity varied considerably among macaques,ranging from 0.000 363 to 0.002 971 (Table 2;Supplementary Figure 8A).Average heterozygosity was lowest forM.thibetana(0.000 551) and highest forM.nemestrina(0.002 829).
Table 3 Statistics on top 5% of windows with strongest introgression signals based on fd for species pairs between fascicularis and silenus species groups
The historical effective population sizes (Ne) of macaques were modeled from the distribution of heterozygous sites across the genome using PSMC analysis (Li & Durbin,2011)(Figure 8B–D;Supplementary Figure S8).The ancestral effective population sizes for all macaques were almost the same and all experienced a bottleneck~5 Ma.After that,theNeof most macaques increased gradually,although that ofM.nigraremained relatively stable.TheM.thibetanapopulationwas largest~0.80 Ma,after which it decreased.TheM.arctoidespopulation was largest ~2 Ma and declined thereafter. The variation trend of theM. assamensispopulation was similar to that ofM.thibetana,but more gradual.TheM.mulattapopulation increased after 5 Ma until 0.13 Ma,but there were two slight declines during the increasing period.TheM.fascicularispopulation reached its largest size~1.30 Ma,and then declined.TheM.fuscatapopulation remained small after 5 Ma,with small fluctuations.TheM.nemestrinapopulation increased from 5 Ma until 0.20 Ma.TheM.tonkeanapopulation remained small from 5 Ma to 1.70 Ma,and then increased to its peak at~0.13 Ma.
Figure 8 Genome-wide heterozygosity and demographic history
Our genome-wide-based multispecies coalescent phylogenetic tree shows a topology consistent with previous research based on nuclear markers and genome-wide data(Deinard & Smith,2001;Fan et al.,2018;Jiang et al.,2016;Li et al.,2009).The reconstructed species tree (Figure 2C)supported the classification of both the four and seven species groups,the differences simply being different hierarchical classifications of the macaques.The twoM.fascicularisspecimens had inconsistent positions in the phylogenetic trees when we used different software (Figure 2A,B).The quartet scores supporting the two phylogenetic topologies were very close (0.420 8 and 0.404 7).CONSENSE analysis separately grouped the twoM.fascicularisspecimens in 37.39% of the 26 633 trees.These results imply that distinct genetic backgrounds exist between the twoM.fascicularis.Mfas_1 was a captive bred individual from China not within its natural range (Ito et al.,2020;Osada et al.,2021;Trask et al.,2013)and may have been introduced from nearby countries such as Vietnam and Thailand.As reported previously,unidirectional introgression hybridization from ChineseM. mulattato IndochineseM.fascicularishas played an important role in the formation of the IndochineseM.fascicularisgenome (Stevison& Kohn,2009;Tosi & Coke,2007;Yan et al.,2011).The Mfas_1 genome may carry fragments originally fromM.mulattaand the mosaic genome may result in a closer phylogenetic relationship between Mfas_1 andM.mulattathan its conspecific Mfas_Mau (Figure 2A).
It should be noted that our samples included not only wildcaught macaques but also captive-bred and unknown-origin samples (e.g.,Mfas_1 and Mnem_1).Therefore,certain results regarding these two samples may not necessarily be conclusive.Further studies that include only wild-caughtMacacaindividuals are necessary to confirm our results.
Phylogenetic inconsistencies among different markers are signals of complex evolutionary history and should not be ignored when analyzing genome fragments longer than 25 kb(Kumar et al.,2017;Nakhleh,2013).There are many possible reasons for phylogenetic conflict,such as mutation,drift,selection,ILS,and hybridization (Nakhleh,2013). Our genome-wide analyses showed that the local evolutionary histories of various genomic regions in the macaques were significantly different (Figure 1).Coalescent-based analyses of whole-genome sequences and network analyses also indicated that the macaque genomes were characterized by contradictory genealogies (Figures 2,3,6).Consensus network and gene flow analyses also suggested a reticulate evolutionary history of macaques (Figures 3–6).Phylogenetic networks provide a better model than trees for capturing reticulate evolutionary history (Nakhleh,2013).Macaque evolution appears to be a process of gradual divergence with complex gene flow.Therefore,the divergence of macaques is best understood as a phylogenetic network.This may be a reason why the evolution of macaques has been reconstructed differently in published studies (Fan et al.,2017;Jiang et al.,2016;Li et al.,2009;Li & Zhang,2005).
Reticulate evolutionary relationships have also been found in other mammals (árnason et al.,2018;Figueiro et al.,2017;Kumar et al.,2017),as well as in birds (Zhang et al.,2017),fish (Gante et al.,2016),amphibians (Pabijan et al.,2017),reptiles (Barley et al.,2019),and insects (Edelman et al.,2019). Therefore,speciation with genetic exchange is common in animals,and should be viewed as a selective process that maintains species divergence even under gene flow (Kumar et al.,2017;Lexer & Widmer,2008;Nosil,2008).
Although we used a variety of software that consider ILS in gene flow analysis,ILS is an important factor that can lead to incongruences between gene trees or between gene and species trees and may not be completely ruled out.If several speciation events have occurred in one lineage within a relatively short time,different alleles may eventually fix in different descendant lineages,leading to incongruences as mentioned above.ILS is an important factor in evolutionary history but was unexamined in this study.Future studies including ILS analysis are necessary to improve our understanding of the evolutionary history ofMacaca.
Various macaque hybridization studies based on different molecular markers (e.g.,autosome,mtDNA,Y chromosome,and genome-wide) and morphological characteristics have been published (Evans et al.,2017;Fan et al.,2014,2018;Hamada et al.,2016;Ito et al.,2020;Jiang et al.,2016;Matsudaira et al.,2018;Osada et al.,2021;Rovie-Ryan et al.,2021;Tosi et al.,2000,2003a,2003b;Vanderpool et al.,2020;Yan et al.,2011),which suggest complicated gene flow patterns in macaques.However,most previous studies have focused on intragroup patterns or on thefascicularisandsinicaspecies groups,e.g.,betweenM.mulattaandM.fascicularis(Hamada et al.,2016),and proto-M.fascicularis aureaandsinicaspecies group (Matsudaira et al.,2018).However,there is limited information about gene flow between thesilenusgroup and other species within the genusMacaca.
Here,we analyzed moreMacacaspecies,including three species from thesilenusgroup,and found strong introgression signals in all species pairs between thefascicularisandsilenusgroups.In particular,Danalysis confirmed the strongest introgression signals betweenM.nemestrinaandM.fascicularis(Figure 4B).These two species have a wide geographical distribution overlap (Ang et al.,2020;Eudey et al.,2020;Roos et al.,2014),and natural hybridization events have been reported by field observation (Bernsteil,1966).A recent study on the reference genomes ofM.nemestrina,M.fascicularis,andM.mulattaalso indicated strong gene flow signals among them (Vanderpool et al.,2020),consistent with our results.However,the previous study included only three macaque species,whereas our genome-wide analyses were based on multiple species and population samples fromsilenusandfascicularisspecies groups,which indicated thatM. nemestrina-M. fascicularisshowed the strongest introgression signal among all species pairs (Figure 4B).The signal betweenM.nemestrinaandM.fascicularisis most likely caused by ancient hybridization events between the two species after their dichotomous divergence.We further estimated that the proportion of genome inherited via gene flow between the two species was 6.19% (fdstatistic;Figure 5A).
We also noted that species pairs between thefascicularisandsilenusspecies groups,namelyM. fascicularis-M.nigra/M.tonkeanaandM.fuscata/M.mulatta-M.nemestrina,showed slightly weaker introgression signals thanM.nemestrina-M.fascicularis.Other species pairs from these two groups,namelyM.nigra-M.fuscata/M.mulattaandM.tonkeana-M. fuscata/M. mulatta,showed the weakest introgression signals (Figure 4B).These findings indicate weaker introgression signals between species pairs that have more distant phylogenetic relationships withM.nemestrinaorM.fascicularis.Furthermore,considering the non-overlapping distribution of species in other species pairs,excludingM.nemestrina-M.fascicularis,we suspect that introgression signals between thefascicularisandsilenusspecies groups may be largely due to the similarity of genomes of closely related species,especially for insular species such asM.tonkeana,M.nigra,andM.fuscata,for instance,introgression signals between the two Sulawesi macaques (M.tonkeanaandM.nigra) and thefascicularisspecies group.The two Sulawesi macaques are restricted to Sulawesi island and have no current distribution overlap with other macaques.However,previous studies have suggested that BorneanM.nemestrinaexpanded eastwards and reached Sulawesi,most likely by natural rafting,and diverged into numerous distinct Sulawesi species about 1.9–2.0 Ma (Meijaard,2003;Ziegler et al.,2007). Sulawesi macaques thus share many genomic similarities with BorneanM.nemestrina.Owing to the strong introgression betweenM.nemestrinaandM.fascicularis,the introgression signals detected between thefascicularisgroup and two Sulawesi macaques are most likely attributed to the genomic similarity of the closely related species.Similar to the Sulawesi macaques,introgression signals betweenM.fuscataand thesilenusspecies group are probably due to the genomic similarity ofM.fuscatato the ChineseM.mulatta,which both belong to thefascicularisspecies group (Chu et al.,2007).In addition,approximately 60.14%–67.93% of the top 5% of windows for theM.fascicularis-silenusgroup species pairs,and 45.84%–47.33% for theM.nemestrinafascicularisgroup species pairs were shared (Table 3).These high ratios of shared windows confirmed that the introgression signals detected between Sulawesi macaques and thefascicularisgroup species,and betweenM.fuscataand thesilenusgroup species are most likely due to genomic similarity,rather than hybridization after all dichotomous divergence events.
However,genomic similarity does not seem to be the cause of the introgression signal between MauritianM.fascicularis(Mfas_Mau) and BorneanM.nemestrina(Mnem_2),which showed a bidirectional introgression signal,as detected byDFOIL(Figure 4C).Mauritius is over 6 600 km from Borneo and naturally occurring introgression between these species seems impossible,implying that the introgression signal between them is most likely from an ancient hybridization event.Macaca fasciculariswas introduced to Mauritius by the Portuguese in the 16th century (Tosi & Coke,2007),with possible origin from Sumatra,Java,or Peninsular Malaysia(Tosi & Coke,2007).Therefore,the gene flow signal between MauritianM.fascicularisand BorneanM.nemestrinamirrors the gene flow between BorneanM.nemestrinaandM.fascicularisfrom Sumatra,Java,or Peninsular Malaysia.Sumatra,Java,Peninsular Malaysia,and Borneo were connected during most of the Pleistocene due to lower sea levels (Louys & Turner,2012).The connection of these islands provided avenues for hybridization.As the bidirectional signal was only detected in one of the twoM.nemestrina,there must have been a hybridization event after divergence of the two populations,i.e.,after 0.33 Ma according to our divergence time estimation (Figure 2B).
It should be noted that there may be another explanation for the strong hybridization signals between thefascicularisandsilenusspecies groups.For example,introgression may have occurred between ancestral lineages of these two species groups,and these signals were eventually retained in different descendant lineages.This explanation is consistent with the PhyloNetworks findings (Figure 6).The observed hybridization pattern may result in a similar distribution of hybridization signals across genomes among species pairs,consistent with Figure 7.It is also possible that ancestral hybridization and hybridization after divergence both occurred,which would make it difficult to determine all hybridization events correctly(Vanderpool et al.,2020).Studies with more species,larger sample sizes,and modeling analyses will allow detection of a more complex evolutionary history and clarification of any separate hybridization events.
Our study also detected introgression signals congruous with previous studies.Hybridization betweenM.fascicularisandM.mulattahas been extensively reported and there is a postulated hybrid zone (Hamada et al.,2016;Ito et al.,2020;Yan et al.,2011).Here,Danalysis also detected a significant signal between these species.Similar to previously reported gene flow betweenM.thibetanaand ChineseM.mulatta(Fan et al.,2014),our results also suggested a significant gene flow signal between these two lineages.Additionally,we detected a significant introgression signal betweenM.thibetanaand IndianM.mulatta.Given the non-overlapping geographical distributions of these two lineages,the introgression signal is probably a result of genomic similarity between the Chinese and IndianM.mulattaor ancestral introgression.
Previous research has suggested that ~30% of the VietnameseM.fascicularisgenome is of ChineseM.mulattaorigin,and~8.84% of theM.thibetanagenome contains putative introgression regions when compared to ChineseM.mulatta(Fan et al.,2014;Yan et al.,2011).According to our estimations,the proportion of introgression reached 5.93% forM.fascicularisand ChineseM.mulatta,and 3.53% forM.thibetanaandM.mulatta(Figure 5A).Asfdis a conservative estimator,the proportion of genome shared through gene flow may be underestimated (Martin et al.,2015).Although the amount of introgression varied between thefdstatistic and other estimators,we still obtained the approximate proportions of introgression (Martin et al.,2015),which allow us to compare the degree of hybridization between different species.
In our study,individuals from five species originated from different populations,which allows for the assessment of introgression after the divergence of different populations.Surprisingly,introgression signals varied in the different populations of all five species (Figure 4A,C). This phenomenon has also been reported inM.mulatta,M.fascicularis,andM.arctoides(Fan et al.,2018;Osada et al.,2021;Rovie-Ryan et al.,2021).These results suggest that gene flow occurred after the split of populations,and thus may be very recent.Although morphological characteristics,genital structure,and sexual behavior differ significantly among macaque species,the chromosome karyotypes of macaques are very similar,and hybrid offspring ofMacacaare fertile(Bunlungsup et al.,2017;Ciani et al.,1989;Evans et al.,2017;Hamada et al.,2012;Yang & Shi,1994).The results of our research and previous studies suggest that gene flow among macaques is an ongoing process,and there is incomplete reproductive isolation.Therefore,hybridization occurs between different sympatric macaques.
The heterozygosity ofM.thibetanawas the lowest among macaques,with a similar level as the endangered snub-nosed monkeys (0.015%–0.068%) (Zhou et al.,2016).Previous study also reported a lower genome-wide heterozygosity ofM.thibetana(0.089 8%) than ofM.fascicularis(0.300 4%–0.317 9%) andM.mulatta(0.261 2%–0.261 7%) (Fan et al.,2018).The heterozygosity of the other macaques is higher than that of humans (0.08%–0.12%) and great apes (0.065%–0.178%)(Prado-Martinez et al.,2013).The high heterozygosity of macaques is consistent with their widespread distribution(Roos et al.,2019).
We reconstructed the demographic histories of nine species ofMacacausing PSMC.Fluctuation in demographic history is usually considered to be from adaptations to changes in the environment and climate. The varied and complicated fluctuation patterns in the different macaque species suggested that they experienced different environmental variation and climate change.According to our study,all macaques went through a bottleneck~5 Ma.This bottleneck may indicate the ancestral population of all Asian macaques when they first arrived in Eurasia (Roos et al.,2019).The bottleneck may be due to the long-term cooling,drying,and even formation of the ephemeral Northern Hemisphere glaciations between 6.0 and 5.5 Ma during the late Miocene(Holbourn et al.,2018).Subsequent increases in population sizes may have been in response to the warm and wet early Pliocene climates (Sniderman et al.,2016). A similar bottleneck has also been identified in other studies (Fan et al.,2014,2018;Liu et al.,2018).After the bottleneck,macaques underwent differentiation and radiation. Inconsistent demographic historical dynamics were produced when adapting to different environments and climate.Overall,these findings strongly indicate that the population sizes of macaques varied differently and considerably during the Pliocene and Pleistocene,with successive fluctuations likely resulting in different genetic diversity between these lineages.
Our genome-wide analyses covered all species groups of Asian macaques.We found that complex introgressive gene flow played an important role in the evolutionary history ofMacaca,and the evolution of macaques should be more accurately understood as phylogenetic networks rather than trees.Introgression signals between thefascicularisandsilenusspecies groups were the strongest of all signals detected.In particular,bidirectional gene flow occurred between MauritianM.fascicularisand BorneanM.nemestrina.However,introgression signals between the island species(e.g.,Sulawesi macaques andM.fuscata) and other species may be attributed to genomic similarities between closely related species or ancestral introgression. Different introgression signals in different populations of the same species suggest that gene flow in macaques is likely to be an ongoing process. In conclusion,our study provides comprehensive insight into gene flow in macaques from a genome-wide perspective.However,the contribution of highlevel introgression to adaptive history in macaque species remains unclear and is worthy of further study with more extensive population sampling.
Supplementary data to this article can be found online.
The authors declare that they have no competing interests.
J.L.and Z.X.F.designed the research.J.L.and Y.S.collected the samples.Y.S.and C.J.performed the experiments.Y.S.,J.L.,K.H.L.,and H.Q.contributed to data analysis.Y.S.,Z.X.F.,and J.L.wrote the manuscript.M.P.revised the manuscript and improved the language.All authors read and approved the final version of the manuscript.
We thank Song Wang at Nanning Zoo and Zhan-Long He at the Chinese Academy of Medical Sciences for their support in sample collection.