Developed by the Centre for Genomic Pathogen Surveillance. Background & objectives: Several phylogenetic classification systems have been devised to trace the viral lineages of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). By mid-January 2020, the virus was spreading widely within Hubei province and by early March SARS-CoV-2 was declared a pandemic8. Are pangolins the intermediate host of the 2019 novel coronavirus (SARS-CoV-2)? Trends Microbiol. The authors declare no competing interests. Phylogenies of subregions of NRR1 depict an appreciable degree of spatial structuring of the bat sarbecovirus population across different regions (Fig. Accurate estimation of ages for deeper nodes would require adequate accommodation of time-dependent rate variation. We thank all authors who have kindly deposited and shared genome data on GISAID. Split diversity in constrained conservation prioritization using integer linear programming. It compares the new genome against the large, diverse population of sequenced strains using a Since experts have suggested that pangolins may be the reservoir species for COVID-19, the scaly anteater has been catapulted into headlines, news reports, and conversations, and some are calling COVID-19 "the revenge of the . To gauge the length of time this lineage has circulated in bats, we estimate the time to the most recent common ancestor (TMRCA) of SARS-CoV-2 and RaTG13.
Center for Infectious Disease Dynamics, Department of Biology, Pennsylvania State University, University Park, PA, USA; Department of Microbiology, Immunology and Transplantation, KU Leuven, Rega Institute, Leuven, Belgium; Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, China; State Key Laboratory of Emerging Infectious Diseases, School of Public Health, The University of Hong Kong, Hong Kong SAR, China; Department of Biology, University of Texas Arlington, Arlington, TX, USA; Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, UK; MRC-University of Glasgow Centre for Virus Research, Glasgow, UK. Furthermore, the other key feature thought to be instrumental in the ability of SARS-CoV-2 to infect humans, a polybasic cleavage site insertion in the S protein, has not yet been seen in another close bat relative of the SARS-CoV-2 virus. To employ phylogenetic dating methods, recombinant regions of a 68-genome sarbecovirus alignment were removed with three independent methods. matics program called Pangolin was developed. performed recombination analysis for non-recombining alignment 3, calibration of rate of evolution and phylogenetic reconstruction and dating. A distinct name is needed for the new coronavirus. Posterior means (horizontal bars) of patristic distances between SARS-CoV-2 and its closest bat and pangolin sequences, for the spike protein's variable loop region and CTD region excluding the variable loop. We aimed to analyze 3 naso-oropharyngeal swab samples collected between August and December 2021 to describe the amino acid changes present in the sequence reads that may have a role in the emergence of new .
Means and 95% HPD intervals are 0.080 [0.058–0.101] and 0.530 [0.304–0.780] for the patristic distances between SARS-CoV-2 and RaTG13 (green) and 0.143 [0.109–0.180] and 0.154 [0.093–0.231] for the patristic distances between SARS-CoV-2 and Pangolin 2019 (orange). The virus then. Preprint at https://doi.org/10.1101/2020.04.20.052019 (2020). These shy, quirky but cute mammals are one of the most heavily trafficked yet least understood animals in the world. The estimated divergence times for the pangolin virus most closely related to the SARS-CoV-2/RaTG13 lineage range from 1851 (1730–1958) to 1877 (1746–1986), indicating that these pangolin lineages were acquired from bat viruses divergent to those that gave rise to SARS-CoV-2. Regions A–C were further examined for mosaic signals by 3SEQ, and all showed signs of mosaicism. If the latter still identified non-negligible recombination signal, we removed additional genomes that were identified as major contributors to the remaining signal. =0.00075 and one with a mean of 0.00024 and s.d. It performs k-mer based detection; map/align and variant calling; consensus sequence generation; and lineage/clade analysis using Pangolin and NextClade. The DRAGEN COVID Lineage App is available on BaseSpace Sequence Hub. This is notable because the variable-loop region contains the six key contact residues in the RBD that give SARS-CoV-2 its ACE2-binding specificity27,37. A second breakpoint-conservative approach was conservative with respect to breakpoint identification, but this means that it is accepting of false-negative outcomes in breakpoint inference, resulting in less certainty that a putative NRR truly contains no breakpoints.
Two exceptions can be seen in the relatively close relationship of Hong Kong viruses to those from Zhejiang Province (with two of the latter, CoVZC45 and CoVZXC21, identified as recombinants) and a recombinant virus from Sichuan for which part of the genome (region B of SC2018 in Fig. The divergence time estimates for SARS-CoV-2 and SARS-CoV from their respective most closely related bat lineages are reasonably consistent among the three approaches we use to eliminate the effects of recombination in the alignment. from the European Research Council under the European Union's Horizon 2020 research and innovation programme (grant agreement no. Now, the two researchers used genomic sequencing to compare the genome of the new coronavirus in humans with that in animals and found a 99% match with pangolins. Maciej F. Boni, Philippe Lemey, Andrew Rambaut or David L. Robertson. All authors contributed to analyses and interpretations. Pangolin relies on a novel algorithm called pangoLEARN. We demonstrate that the sarbecoviruses circulating in horseshoe bats have complex recombination histories as reported by others15,20,21,22,23,24,25,26. The variable-loop region in SARS-CoV-2 shows closer identity to the 2019 pangolin coronavirus sequence than to the RaTG13 bat virus, supported by phylogenetic inference (Fig. Graham, R. L. & Baric, R. S. Recombination, reservoirs, and the modular spike: mechanisms of coronavirus cross-species transmission. The rate of genome generation is unprecedented, yet there is currently no coherent nor accepted scheme for naming the expanding . Duchene, S., Holmes, E. C. & Ho, S. Y. W. Analyses of evolutionary dynamics in viruses are hindered by a time-dependent bias in rate estimates. ISSN 2058-5276 (online).
In addition, sequences NC_014470 (Bulgaria 2008), CoVZXC21, CoVZC45 and DQ412042 (Hubei-Yichang) needed to be removed to maintain a clean non-recombinant signal in A. Using these breakpoints, the longest putative non-recombining segment (nt 1,885–21,753) is 9.9 kb long, and we call this region NRR2. Evolutionary origins of the SARS-CoV-2 sarbecovirus lineage responsible for the COVID-19 pandemic, https://doi.org/10.1038/s41564-020-0771-4. Region C showed no PI signals within it. Biazzo et al. (2021) obtained the genome sequences of 10 SARS-CoV-2 strains through nanopore sequencing of nasopharyngeal swabs in Malta and analyzed the assembled genomes with the pangolin software; the strains were assigned to the B.1 lineage, indicating that SARS-CoV-2 was widely spread in Europe. Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding. This genotype pattern led to the creation of a new Pangolin lineage named B.1.640.2, a phylogenetic sister group to the old B.1.640 lineage, which was renamed B.1.640.1. Uncertainty measures are shown in Extended Data Fig. As illustrated by the dashed arrows, these two posteriors motivate our specification of prior distributions with standard deviations inflated 10-fold (light color). By 2009, however, rapid genomic analysis had become a routine component of outbreak response. Boxplots show interquartile ranges, white lines are medians and box whiskers show the full range of posterior distribution.
11,12,13,22,28), a signal that suggests recombination, the divergence patterns in the S protein do not show evidence of recombination between the lineage leading to SARS-CoV-2 and known sarbecoviruses. It allows a user to assign the most likely lineage (Pango lineage) to SARS-CoV-2 query sequences. Related articles: Phylogenetic supertree reveals detailed evolution of SARS-CoV-2; Origin and cross-species transmission of bat coronaviruses in China; Emerging SARS-CoV-2 variants follow a historical pattern recorded in outgroups infecting non-human hosts; Inferring the ecological niche of bat viruses closely related to SARS-CoV-2 using phylogeographic analyses of Rhinolophus species; Genomic recombination events may reveal the evolution of coronavirus and the origin of SARS-CoV-2; A Bayesian approach to infer recombination patterns in coronaviruses; Metagenomic identification of a new sarbecovirus from horseshoe bats in Europe; A comparative recombination analysis of human coronaviruses and implications for the SARS-CoV-2 pandemic; Pandemic-scale phylogenomics reveals the SARS-CoV-2 recombination landscape. Links: https://github.com/plemey/SARSCoV2origins, https://doi.org/10.1101/2020.04.20.052019, https://doi.org/10.1101/2020.02.10.942748, https://doi.org/10.1101/2020.05.28.122366, http://virological.org/t/ncov-2019-codon-usage-and-reservoir-not-snakes-v2/339, http://virological.org/t/ncovs-relationship-to-bat-coronaviruses-recombination-signals-no-snakes-no-evidence-the-2019-ncov-lineage-is-recombinant/331. Because these subclades had different phylogenetic relationships in region D (Supplementary Fig. & Andersen, K. G. The evolution of Ebola virus: insights from the 2013–2016 epidemic. TMRCA estimates for SARS-CoV-2 and SARS-CoV from their respective most closely related bat lineages are reasonably consistent for the different data sets and different rate priors in our analyses.
Indeed, the rates reported by these studies are in line with the short-term SARS rates that we estimate (Fig. Our most conservative approach attempted to ensure that putative NRRs had no mosaic or phylogenetic incongruence signals. Visual exploration using TempEst39 indicates that there is no evidence for temporal signal in these datasets (Extended Data Fig. These rate priors are subsequently used in the Bayesian inference of posterior rates for NRR1, NRR2, and NRA3 as indicated by the solid arrows. DRAGEN COVID Lineage App: this app aligns reads to a SARS-CoV-2 reference genome and reports coverage of targeted regions. b, Similarity plot between SARS-CoV-2 and several selected sequences including RaTG13 (black), SARS-CoV (pink) and two pangolin sequences (orange). Zhou et al.2 concluded from the genetic proximity of SARS-CoV-2 to RaTG13 that a bat origin for the current COVID-19 outbreak is probable. Novel Coronavirus (2019-nCoV) Situation Report 1, 21 January 2020 (World Health Organization, 2020). A deep dive into the genetics of the novel coronavirus shows it seems to have spent some time infecting both bats and pangolins before it jumped into humans, researchers said. While it is possible that pangolins, or another hitherto undiscovered species, may have acted as an intermediate host facilitating transmission to humans, current evidence is consistent with the virus having evolved in bats, resulting in bat sarbecoviruses that can replicate in the upper respiratory tract of both humans and pangolins25,32. EPI_ISL_410538, EPI_ISL_410539, EPI_ISL_410540, EPI_ISL_410541 and EPI_ISL_410542) for the use of sequence data via the GISAID platform. Region B is 5,525 nt long. Schierup, M. H. & Hein, J. Recombination and the molecular clock. Nat Microbiol 5, 1408–1417 (2020).
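The temporal-signal check attributed to TempEst above rests on a root-to-tip regression: each tip's genetic distance from the root is regressed on its sampling date, and a clear positive slope is read as evidence of clock-like signal. The following is a minimal sketch of that idea in plain Python, not TempEst's own implementation; the function name and the example dates and divergences are hypothetical.

```python
def root_to_tip_regression(dates, divergences):
    """Ordinary least-squares fit of root-to-tip divergence on sampling date.

    Returns (slope, intercept, r_squared). The slope is a crude rate estimate
    in substitutions per site per year; an r_squared near zero suggests little
    or no temporal signal. Inputs here are illustrative assumptions.
    """
    n = len(dates)
    mean_t = sum(dates) / n
    mean_d = sum(divergences) / n
    sxx = sum((t - mean_t) ** 2 for t in dates)
    syy = sum((d - mean_d) ** 2 for d in divergences)
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, divergences))
    slope = sxy / sxx
    intercept = mean_d - slope * mean_t
    r_squared = (sxy ** 2) / (sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r_squared


# Hypothetical tips sampled over one year, diverging at 0.001 subs/site/year.
example_dates = [2003.0, 2003.5, 2004.0]
example_divergences = [0.0010, 0.0015, 0.0020]
slope, intercept, r_squared = root_to_tip_regression(example_dates, example_divergences)
```

With real data the points rarely fall on a line, and a flat or noisy fit is what "no evidence for temporal signal" refers to.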
The key to successful surveillance is knowing which viruses to look for and prioritizing those that can readily infect humans47. 4), that region and shorter BFRs were not included in combined putative non-recombinant regions. Among the 68 sequences in the aligned sarbecovirus sequence set, 67 show evidence of mosaicism (all Dunn–Sidak-corrected P < 4 × 10^-4 and 3SEQ14), indicating involvement in homologous recombination either directly with identifiable parentals or in their deeper shared evolutionary history, that is, due to shared ancestral recombination events. Emergence of SARS-CoV-2 through recombination and strong purifying selection. Its genome is closest to that of severe acute respiratory syndrome-related coronaviruses from horseshoe bats, and its receptor-binding domain is closest to that of pangolin viruses. We used an uncorrelated relaxed clock model with log-normal distribution for all datasets, except for the low-diversity SARS data for which we specified a strict molecular clock model. Smuggled pangolins were carrying viruses closely related to the one sweeping the world, say scientists. The existing diversity and dynamic process of recombination amongst lineages in the bat reservoir demonstrate how difficult it will be to identify viruses with potential to cause major human outbreaks before they emerge. Pangolin was developed to implement the dynamic nomenclature of SARS-CoV-2 lineages, known as the Pango nomenclature. In the present study, we analyzed the diversity of SARS-CoV-2 viral genomes in India to understand the evolutionary patterns of viruses in the country through their pangolin lineage and GISAID clade. This new approach classifies the newly sequenced genome against all the diverse lineages present instead of against a select set of representative sequences.
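Studies like the ones mentioned above typically summarise how many genomes fall into each Pango lineage. A minimal sketch of that tally follows, assuming the lineage assignments are available as CSV text with `taxon` and `lineage` columns (the column names are an assumption about the report layout, and the sample rows are invented for illustration; the lineage names are the ones mentioned in this text).

```python
import csv
import io
from collections import Counter


def tally_lineages(report_text):
    """Count how many query sequences were assigned to each lineage.

    `report_text` is CSV text with at least a `lineage` column; the
    column names are assumed, not taken from the source text.
    """
    reader = csv.DictReader(io.StringIO(report_text))
    return Counter(row["lineage"] for row in reader)


# Hypothetical report: three made-up samples.
example_report = (
    "taxon,lineage\n"
    "sample_1,B.1\n"
    "sample_2,B.1\n"
    "sample_3,B.1.640.2\n"
)
lineage_counts = tally_lineages(example_report)
```

Reading from a file instead only requires passing the file's contents to `tally_lineages`.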
Individual sequences such as RpShaanxi2011, Guangxi GX2013 and two sequences from Zhejiang Province (CoVZXC21/CoVZC45), as previously shown22,25, have strong phylogenetic recombination signals because they fall on different evolutionary lineages (with bootstrap support >80%) depending on what region of the genome is being examined. When viewing the last 7 kb of the genome, a clade of viruses from northern China appears to cluster with sequences from southern Chinese provinces but, when inspecting trees from different parts of ORF1ab, the N. China clade is phylogenetically separated from the S. China clade. SARS-CoV-2 is an appropriate name for the new coronavirus. Collectively our analyses point to bats being the primary reservoir for the SARS-CoV-2 lineage. Next, we (1) collected all breakpoints into a single set, (2) complemented this set to generate a set of non-breakpoints, (3) grouped non-breakpoints into contiguous BFRs and (4) sorted these regions by length. 95% credible interval bars are shown for all internal node ages. EPI_ISL_410721) and Beijing Institute of Microbiology and Epidemiology (W.-C. Cao, T.T.-Y.L., N. Jia, Y.-W. Zhang, J.-F. Jiang and B.-G. Jiang, nos. These means are based on the mean rates estimated for MERS-CoV and HCoV-OC43, respectively, while the standard deviations are set ten times higher than empirical values to allow greater prior uncertainty and avoid strong bias (Extended Data Fig. Given what was known about the origins of SARS, as well as identification of SARS-like viruses circulating in bats that had binding sites adapted to human receptors29,30,31, appropriate measures should have been in place for immediate control of outbreaks of novel coronaviruses.
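The four-step construction of breakpoint-free regions (BFRs) described earlier in this passage can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: breakpoints are treated as single 1-based alignment positions, and the function name and example coordinates are hypothetical.

```python
def breakpoint_free_regions(breakpoints, alignment_length):
    """Return BFRs as (start, end) tuples, longest first.

    Assumes each breakpoint is a single 1-based alignment position.
    """
    # (1) Collect all breakpoints into a single set.
    breakpoint_set = set(breakpoints)

    # (2) Complement the set to obtain the non-breakpoint positions.
    non_breakpoints = [
        pos for pos in range(1, alignment_length + 1) if pos not in breakpoint_set
    ]

    # (3) Group consecutive non-breakpoint positions into contiguous regions.
    regions = []
    start = previous = None
    for pos in non_breakpoints:
        if start is None:
            start = previous = pos
        elif pos == previous + 1:
            previous = pos
        else:
            regions.append((start, previous))
            start = previous = pos
    if start is not None:
        regions.append((start, previous))

    # (4) Sort the regions by length, longest first.
    regions.sort(key=lambda region: region[1] - region[0] + 1, reverse=True)
    return regions


# Hypothetical 20-column alignment with breakpoints at columns 5, 6 and 12.
example_regions = breakpoint_free_regions([5, 6, 12], 20)
# example_regions == [(13, 20), (7, 11), (1, 4)]
```

The longest regions at the head of the list are the natural candidates for the putative non-recombinant regions discussed in the text.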
Using the most conservative approach (NRR1), the divergence time estimate for SARS-CoV-2 and RaTG13 is 1969 (95% HPD: 1930–2000), while that between SARS-CoV and its most closely related bat sequence is 1962 (95% HPD: 1932–1988); see Fig. July 26, 2021. Aside from RaTG13, Pangolin-CoV is the most closely related CoV to SARS-CoV-2. Several of the recombinant sequences in these trees show that recombination events do occur across geographically divergent clades. "This is an extremely interesting . We considered (1) the possibility that BFRs could be combined into larger non-recombinant regions and (2) the possibility of further recombination within each BFR. Sequences are colour-coded by province according to the map. This leaves the insertion of polybasic. & Li, X. Cross-species transmission of the newly identified coronavirus 2019-nCoV. However, formal testing using marginal likelihood estimation41 does provide some evidence of a temporal signal, albeit with limited log Bayes factor support of 3 (NRR1), 10 (NRR2) and 3 (NRA3); see Supplementary Table 1. the development of viral diversity. The new paper finds that the genetic sequences of several strains of coronavirus found in pangolins were between 88.5 percent and 92.4 percent similar to those of the novel coronavirus. 2 Lack of root-to-tip temporal signal in SARS-CoV-2. performed S recombination analysis. Subsequently a bat sarbecovirus (RaTG13, sampled from a Rhinolophus affinis horseshoe bat in 2013 in Yunnan Province) was reported that clusters with SARS-CoV-2 in almost all genomic regions with approximately 96% genome sequence identity2.
Of the nine breakpoints defining these ten BFRs, four showed phylogenetic incongruence (PI) signals with bootstrap support >80%, adopting previously published criteria on using a combination of mosaic and PI signals to show evidence of past recombination events19. The relatively fast evolutionary rate means that it is most appropriate to estimate shallow nodes in the sarbecovirus evolutionary history. Unfortunately, a response that would achieve containment was not possible. Decimal years are shown on the x axis for the 1.2 years of SARS sampling in c. d, Mean evolutionary rate estimates plotted against sampling time range for the same three datasets (represented by the same colour as the data points in their respective RtT divergence plots), as well as for the comparable NRA3 using the two different priors for the rate in the Bayesian inference (red points). A third approach attempted to minimize the number of regions removed while also minimizing signals of mosaicism and homoplasy. One study suggests that over a century ago, one lineage of coronavirus circulating in bats gave rise to SARS-CoV-2, RaTG13 and a Pangolin coronavirus known as Pangolin-2019, Live Science . If stopping an outbreak in its early stages is not possible, as was the case for the COVID-19 epidemic in Hubei, identification of origins and point sources is nevertheless important for containment purposes in other provinces and prevention of future outbreaks. Consistent with this, we estimate a concomitantly decreasing non-synonymous-to-synonymous substitution rate ratio over longer evolutionary timescales: 1.41 (1.20, 1.68), 0.35 (0.30, 0.41) and 0.133 (0.129, 0.136) for SARS, MERS-CoV and HCoV-OC43, respectively. a, Breakpoints identified by 3SEQ illustrated by percentage of sequences (out of 68) that support a particular breakpoint position.