Frontline

How a virus evolves

- BY R. RAMACHANDRAN

The mutations of SARS-CoV-2, according to a new study, have led to the emergence of a dominant virus type, Type A2a, that is distinctly different from the original virus, Type O, which emerged from Wuhan, and that is spreading with much higher frequencies than the original version.

VIRUSES READILY MUTATE. THERE IS NOTHING surprising about this because it is their nature to do so. This happens due to the imperfect copying mechanism at work as viruses replicate in the cells of infected hosts.

The complete set of genetic information needed to sustain an organism, such as a virus, is its genome, which, in the case of viruses, can be made up of either DNA or RNA. DNA and RNA can be thought of as strings of (genetic) letters, and a genome can be imagined as a long stretch of these letters, with different parts of it encoding the different proteins required for the organism’s existence. Mutations are just random errors that occur while these letters are copied during viral multiplication, and such errors accumulate with every replication cycle, which can take hours or even less. RNA viruses mutate faster than DNA viruses because their replication mechanism is intrinsically more error-prone. Likewise, single-stranded viruses mutate more often than double-stranded ones.
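As a rough illustration of how copying errors pile up over successive replication cycles, here is a minimal Python sketch. The per-letter error rate, the random ancestor genome and the function names are illustrative assumptions, not measured values or methods from any study discussed here.

```python
import random

RNA_LETTERS = "ACGU"


def copy_with_errors(genome, error_rate, rng):
    """Copy a genome letter by letter, occasionally substituting a wrong letter.

    error_rate is a hypothetical per-letter chance of a copying mistake.
    """
    copied = []
    for letter in genome:
        if rng.random() < error_rate:
            # A copying error: pick any letter other than the correct one.
            copied.append(rng.choice([b for b in RNA_LETTERS if b != letter]))
        else:
            copied.append(letter)
    return "".join(copied)


def count_differences(original, copy):
    """Number of positions at which two equal-length genomes differ."""
    return sum(a != b for a, b in zip(original, copy))


if __name__ == "__main__":
    rng = random.Random(2020)  # fixed seed so that a run is repeatable
    # Hypothetical ancestor of about 30,000 random letters.
    ancestor = "".join(rng.choice(RNA_LETTERS) for _ in range(30_000))
    genome = ancestor
    for cycle in range(1, 6):
        genome = copy_with_errors(genome, error_rate=1e-4, rng=rng)
        print(cycle, count_differences(ancestor, genome))
```

Each copy inherits the errors of the copy before it, so the number of differences from the ancestor tends to grow with the number of cycles. The sketch leaves out selection entirely, which is what decides whether any of these changes persists.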

Viruses cannot exist in isolation; they need a host to replicate and survive. Mutations generate a diverse virus population within a single infected host. This ability of viruses to mutate is what drives their evolutionary change. Most mutations may be inconsequential. But mutations that adversely affect one virus function or another, and so impede its sustenance, get removed by natural selection. If, during an outbreak, a mutated virus with a greater (or lesser) degree of infectivity or virulence were to appear in a population, it does not immediately follow that the mutation will persist and continue to spread with high frequency unless it gives the virus a selective advantage, as instances from the current COVID-19 pandemic considered below illustrate.

The causative virus of COVID-19, the coronavirus SARS-CoV-2, is an RNA virus (with about 30,000 nucleotides, the basic building blocks of DNA and RNA, coding for 29 proteins) and is also single-stranded. Frequent mutations in the virus are, therefore, only to be expected, and researchers have indeed observed many mutations in SARS-CoV-2 genomes from samples of COVID-19 patients in different parts of the world since the outbreak began in Wuhan in central China in December 2019. Most of these mutations have been substitutions of a single nucleotide, known as single nucleotide variants (SNVs), at different genetic sites in the genome. At the viral protein level, these SNVs translate into replacements of single amino acids in different proteins.

Most genome-based analyses of the dynamics of evolution so far were largely focussed on the early phase of the pandemic, up to early March at best. A Chinese study of 103 genomes that were available in a public database in January found that SARS-CoV-2 had evolved into two major types. A more recent study, published on April 8 and based on 160 genomes that were available until March 3, identified three major types. Given the limited sample sizes in these studies, and the fact that they did not span a sufficiently long period, a clear evolutionary picture did not emerge until now. We know that the geographical spread of the virus was extremely rapid in March, which would have greatly increased mutation probabilities. Nidhan Biswas and Partha Majumder of the National Institute of Biomedical Genomics (NIBG) at Kalyani in West Bengal recently completed a more comprehensive and systematic analysis using a much larger public database of genomes. The analysis maps the geographical origins of the genomes and examines the emergence of virus groupings, their mutual relationships based on the observed mutations in an evolutionary tree (called a phylogenetic tree) and the frequencies, both spatial and temporal, of their spread. This work is due to appear shortly in The Indian Journal of Medical Research.

The two researchers have found that mutations of the virus have led to the emergence of a type that is distinctly and significantly different from the original virus that emerged from Wuhan and that, by March end, this mutated version had already substantially replaced the ancestral version in virtually all geographical regions of the world. It has now begun to spread with much higher frequencies than the original version and the other mutated types that emerged during the course of the pandemic, and seems to be establishing itself as the major virus type being transmitted in most countries as infections continue to grow across the world.

Biswas and Majumder analysed 3,636 full genome sequences of SARS-CoV-2 obtained from virus isolates from patients in 55 countries, available from the public database www.gisaid.org and covering the period from December 2019 to March end. According to the authors, the entire set of mutations observed so far can be classified into 11 virus types, each of which can be characterised by one or a few defining mutations. Of these, Type A2a is emerging as the dominant virus type almost everywhere, sweeping away by selection the original Type O isolated from Wuhan that held sway in the early phase of the pandemic (Fig. 1). This also implies that the other 10 types are derived from Type O.

Fig. 2 suggests that Type A2a began to emerge around the ninth or tenth week since the outbreak started and currently accounts for over half of all genomic sequences across the world. The unique mutation that is seen in Type A2a is obviously endowing the virus with a selective functional advantage over the other types, Type O in particular. The authors have argued that the increasing frequency of this evolved type in different parts of the world is an indication of positive selection pressure at work enabling the virus to establish itself in the human population across the world.

Before we discuss what this selective advantage is, and its enabling mutation, it is instructive to look at what is currently known about the evolutionary history of the SARS-CoV-2 virus itself and at its early evolution as revealed by data from the earlier phases of the pandemic reported in scientific literature and the media.

BINDING TO ACE2 RECEPTOR

Structural and biochemical analyses have now clearly established that the SARS-CoV-2 virus infects humans by gaining entry into human cells through binding to the receptor ACE2, which is expressed in many types of human cells. The part of the virus that performs this critical function is the Spike (S) protein, which forms the protrusions on the virus envelope that give the virus the prefix “corona”. The S protein has two sub-regions, S1 and S2. While S1 contains the receptor binding domain (RBD) and enables the virus to attach itself to the target human cell, S2 is responsible for the later stage of action: fusion of the viral membrane with the human cell and release of viral RNA into the cell, which, in turn, forces the cell machinery to make copies of the virus and disseminate them.

For this to happen efficiently, the two conjoined sub-regions need to be split at the S1/S2 boundary so that, after S1 facilitates virus-cell binding, S2 can initiate fusion and efficient viral replication within the cell. The emergence of an appropriate cleavage site at the S1/S2 boundary through evolution allows this new virus to exploit human cell enzymes such as furin and TMPRSS2 to perform this cleavage. This results in rapid proliferation and spread of the virus to different organs, particularly the lungs, causing the defining atypical pneumonia in COVID-19-positive individuals.

In a March 17 publication in Nature Medicine, a team of scientists from the U.S., the United Kingdom and Australia, led by Kristian Andersen of Scripps Research Institute, presented a reasonably convincing argument about the origin of the virus and its early evolution from the genome sequence data available then. According to their analysis, while SARS-CoV-2 has high affinity to the ACE2 receptor in humans, ferrets, cats and other species, a comparison of its RBD with that of SARS-CoV-1 (and other related beta coronaviruses) shows that, of the six amino acids in the RBD that are known to be critical for binding to ACE2, five have changed to other amino acids in SARS-CoV-2. As a result, they said, though its affinity to ACE2 is high, it is not predicted to be ideal or optimal.

On the basis of this, they argued that the evolution of the critical S1/S2 cleavage site, which enables enhanced binding to the cell and virus-cell fusion, is a result of mutation and natural selection. This may have occurred either in humans, through multiple chains of silent human-to-human transmission sometime before the virus was poised for the outbreak in December 2019, or in some intermediary animal host (the virus having originated in bats) with a human-like ACE2 receptor before it made the jump to humans.

This cleavage site is unique to SARS-CoV-2 and is not present in the other beta coronaviruses of the same lineage, including SARS-CoV-1 (which caused the major SARS outbreak in 2002-03), and this, it was felt, could be key to its high infectivity and rapid transmission. This, they said, was similar to the emergence of a cleavage site in the hemagglutinin (HA) protein of the highly pathogenic strain of avian influenza virus following repeated passage among chickens. The specific features of the RBD and the S1/S2 cleavage site, including the amino acid structure at the cleavage site, were shared by all SARS-CoV-2 genomes available until then, which pointed to a common ancestor virus, the paper said.

A March 25 report in The Washington Post quoted Peter Thielen of Johns Hopkins University, a molecular geneticist involved in SARS-CoV-2 research, as saying: “There are only about 10 genetic differences between the strains that have infected people in the United States and the original virus that spread in Wuhan…. That’s a relatively small number of mutations for having passed through a large number of people. At this point, the mutation rate of the virus would suggest that the vaccine developed for SARS-CoV-2 would be a single vaccine, rather than a new vaccine every year like the flu vaccine.”

This view that the virus had not mutated to any significant extent, and was relatively stable up to that point in time, was reiterated by Stanley Perlman of the University of Iowa and Benjamin Neuman of Texas A&M University in the Post article. “If it’s still around in a year,” Neuman had said, “by that point we might have some diversity.”

Yong Jia and associates from Taiwan and Australia carried out a phylogenetic analysis of 106 genome sequences available up to March 24 on the U.S. National Centre for Biotechnology Information (NCBI) database, from isolates from patients in 12 countries including China (34), the U.S. (54), India (2) and Nepal (1). This paper was posted on the preprint repository bioRxiv on April 11. Among its main conclusions was the observation, concurring with the view above, that the mutation rate and genetic diversity of the virus (from the data until then) were indeed low compared with SARS-CoV-1.

“Overall,” the paper said with regard to gene sequences relevant to viral protein synthesis, “the gene sequences from different samples are highly homologous, sharing greater than 99.1% identity…” Specifically, the work also noted that the genes encoding the Spike (S) protein on the virus envelope were more conserved than other protein-encoding genes. This notwithstanding, other researchers have noted that the RBD is the most variable part of the genome and that some sites of the S protein may be subject to positive selection. Jia’s group had observed a total of 12 mutations in the S protein, all of them single amino acid substitutions, but only one of them pertained to the RBD of the virus, which is relevant from the perspective of infectivity of the virus.
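To give a feel for what a figure such as “greater than 99.1% identity” means, the following Python sketch computes the percentage of matching positions between two sequences. It assumes the sequences are already aligned to the same length, which real genome comparisons achieve with dedicated alignment tools. The function names are hypothetical, and applying the paper’s gene-level figure to a sequence of about 30,000 letters is only an illustration.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of aligned positions at which two sequences carry the same letter."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def approx_differences(length, identity_percent):
    """Approximate number of differing positions implied by a given identity."""
    return round(length * (100.0 - identity_percent) / 100.0)


if __name__ == "__main__":
    # Toy sequences: 1,000 letters that differ at 9 positions.
    reference = "A" * 1000
    sample = "C" * 9 + "A" * 991
    print(percent_identity(reference, sample))
    # Illustrative only: a 30,000-letter sequence at 99.1% identity.
    print(approx_differences(30_000, 99.1))
```

On this arithmetic, even two sequences of 30,000 letters that are 99.1 per cent identical could differ at a few hundred positions, which is why a high identity figure and the presence of informative mutations are not in conflict.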

This mutation, the work found, was responsible for disrupting a hydrogen bond at the interface between the RBD and the receptor ACE2 in human cells. They argued that since the bond is important for the exceptionally strong binding of the RBD and ACE2, this mutation would lead to a weakened binding of the virus to human cells. Interestingly, this mutation was seen in one of the Indian isolates, obtained on January 27 from a case in Kerala whose origin was linked to Wuhan. From this, the authors inferred that mutations of significance were beginning to occur, notwithstanding the fact that this observation was based on data from one genome. More significantly, they found that all the genomes seemed to group into two clusters, indicating that the virus spread occurred from two sources. They, of course, added the caveat: “However, these results may be based on limited genomic data in the early stage of virus development. It is critical to study and monitor the mutation dynamics of SARS-CoV-2.”

UNIQUE MUTATION

In an earlier article (“Chasing the virus”, Frontline, April 24), we had discussed a work by Indian scientists that had found another unique mutation in one of the two early Indian genomes submitted to the public database, which the authors had conjectured could trigger a protective microRNA response. The mutation in the Indian genome discussed above is different from that one. In fact, both mutations had also been noticed by the scientists of the National Institute of Virology (NIV), Pune, who had sequenced the first two complete genomes from Indian samples, both of which could be linked to the Wuhan strain. They had also pointed out that while the mutation with apparently weakened binding was in the S1/RBD region, the mutation that had the potential of eliciting a microRNA response was in the S2 region. But, with multiple passages of the virus as infections increased, these mutations (which may even have been single random events) seem to have been discarded by selection, as neither mutation figures in any of the 11 main genome types in circulation at present, let alone the dominant one, A2a.

Similarly, a recent work by a group of Chinese scientists from Zhejiang University, led by Hangping Yao, that was posted on the medRxiv preprint server on April 14 had found certain mutations with higher virulence and pathogenicity in the early phase of the outbreak itself, but most of these too do not seem to have occurred with greater frequency in the subsequent spread of the disease.

The scientists had examined virus isolates taken between January 22 and February 4 from 11 patients admitted to the hospitals affiliated to the university, whose ages ranged from four months to 71 years. They noted that while data publicly available up to March 24 revealed several mutations, none had been directly linked to changes in viral pathogenicity. To look for such a link, they carried out functional characterisation of the 11 patient-derived isolates. They noticed considerable mutational diversity in general and recorded 33 mutations in all (of which 19 were novel when compared with 1,111 publicly available genome sequences), including six mutations in the S protein.

Importantly, they found significant variation in the viral loads and cytopathic effects (structural changes to the cells) among these isolates when Vero cells (cell lines derived from African green monkeys), in which the structure of the ACE2 receptor is believed to be similar to that in humans, were infected with the virus. The difference in viral load between the two extreme isolates was as high as 270-fold. For the isolate with the next highest viral load, the difference was only 19-fold. This was claimed as direct evidence of SARS-CoV-2 having acquired mutations that altered its pathogenicity substantially.

Their other important finding was that, when data from these 11 isolates were compared with 725 high-quality and high-coverage publicly available genomes, some of the mutations were found to be defining or founding mutations for major clades (genome clusters with a common ancestor) of the virus that are currently known to be in circulation, particularly in the U.S. and Europe.

Of the 725 genomes, 231 belonged to the European cluster and 208 belonged to the U.S. cluster. Epidemiologically, this is of significance as it implies that the origins of some of the currently circulating strains can be traced to China. Interestingly, the isolate that produced the 19-fold viral load belonged to the European cluster. The isolate that had the 270-fold viral load did not, however, seem to fit into any known cluster, which means that this strain got purged by negative selection.

POSITIVE SELECTION

Let us now return to the main burden of the article: the recent emergence through positive selection of a dominant strain of the virus in different regions of the world. As mentioned earlier, from among the 11 distinct genome types, the evolved Type A2a had emerged as the dominant one during the course of the pandemic, and had replaced the ancestral Type O that had dominated across the affected countries during the early phase.

Examining the evolutionary dynamics of the virus by analysing the publicly available data from 3,636 genomes up to April 6, Majumder and Biswas found that while there was considerable temporal variation in the frequencies with which different virus types were seen among the disease positives, spatial variation (across different geographical regions) was not very significant. They note that there is, however, significant micro-level spatial variation in the type frequencies, say across sub-regions of a country, which could be due to other epidemiological factors.

According to the authors, only five types (O, B, B1, A1a and A2a) have high frequencies in the genome collection, with 51 per cent (1,848 of 3,636) being Type A2a. Fig. 3 shows the remarkable temporal change in the frequencies of different types across geographical regions. From Fig. 4 (which includes Iceland and Congo as the number of genomes from there was proportionately larger relative to the number of cases), one can see that the type diversity initially increased in all affected countries (barring Italy) but by March end it had decreased, leaving A2a as the dominant one. It would also seem that in China, though there is diversity, Type O remained dominant, but this could be a data artifact because China had deposited just one new genome in March, which is of Type A2a.

U.S. PATTERN

The pattern in the U.S. is interesting. While the diversity had diminished by March, with O losing its dominance, the frequency of Type B1, which emerged strongly in February, remains significantly high even as A2a has become dominant. The biological reason for this coexistence of A2a with other types, like B1 in the U.S. and B in Spain (Fig. 4), is unclear, say the authors. It remains to be seen if competing types existing in the same region persist.

The number of Indian genomes included in the period analysed by this work is only the original two, which were discussed earlier in the article. However, considering that India suddenly submitted 33 additional genomes in April alone, the authors have separately looked at their type classification and frequencies. These 35 include complete genomic data of 21 isolates obtained from Indians returning from China, Iran and Italy as well as Italian tourists in India and their close contacts in India. Analysing these 35 genome sequences, Biswas and Majumder have found that they fall into four types: O (5) and the derivative types A2a (16), A3 (13) and B (1). Types A2a and A3 dominate and, according to the analysis, all the Type A3 isolates are from people with a travel history to Iran, while the A2a isolates are from people who had links to countries other than China and Iran.
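The type frequencies quoted in this article are simple proportions of a genome count. The short Python sketch below tallies them from the counts given in the text. The function and variable names are hypothetical, and because the article gives only the Type A2a count for the global collection, the remaining types are lumped together.

```python
def type_shares(counts):
    """Return each type's share of the total, as a percentage rounded to one decimal."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no genomes to tally")
    return {name: round(100.0 * n / total, 1) for name, n in counts.items()}


# Counts of the 35 Indian genomes by type, as given in the article.
indian_counts = {"O": 5, "A2a": 16, "A3": 13, "B": 1}

# Global collection: only the A2a count and the total are given in the article.
global_counts = {"A2a": 1848, "all other types": 3636 - 1848}

if __name__ == "__main__":
    print(sum(indian_counts.values()))
    print(type_shares(indian_counts))
    print(type_shares(global_counts))
```

With only 35 genomes, the Indian shares are coarse estimates: a handful of additional sequences of one type would shift them noticeably, which is one reason the authors treat this set separately from the global analysis.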

What is the significance of the mutations that define Type A2a, which, as the above finding shows, has acquired a foothold in India as well? According to Biswas and Majumder, the defining mutations are two: a primary SNV, which replaces the nucleotide adenine (A) with guanine (G) at one genetic site in the viral genome and translates into the replacement of the amino acid aspartic acid by glycine in the S protein; and a secondary one, which is an amino acid substitution (of proline by leucine) at another site.
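How a single A-to-G change can swap aspartic acid for glycine follows from the standard genetic code, in which three-letter codons specify amino acids. The Python sketch below uses only the part of the code table needed here. The starting codon GAU is an assumption made for illustration, since the article does not say which codon is affected, and the function names are hypothetical.

```python
# Partial standard genetic code: only the RNA codons for the two amino acids involved.
CODON_TO_AMINO_ACID = {
    "GAU": "aspartic acid",
    "GAC": "aspartic acid",
    "GGU": "glycine",
    "GGC": "glycine",
    "GGA": "glycine",
    "GGG": "glycine",
}


def substitute(codon, position, new_letter):
    """Return the codon with the letter at a 0-based position replaced."""
    if not 0 <= position < len(codon):
        raise IndexError("position outside the codon")
    return codon[:position] + new_letter + codon[position + 1:]


def amino_acid(codon):
    """Look up the amino acid for a codon covered by the partial table."""
    return CODON_TO_AMINO_ACID[codon]


if __name__ == "__main__":
    original = "GAU"  # hypothetical codon; the article does not name the one affected
    mutated = substitute(original, 1, "G")  # A -> G at the middle letter
    print(original, amino_acid(original), "->", mutated, amino_acid(mutated))
```

Both aspartic acid codons (GAU and GAC) become glycine codons when the middle letter changes from A to G, so the illustration does not depend on which of the two is chosen.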

Significantly, the S-protein mutation is at the S1/S2 boundary, near the site where the cleaving enzyme furin acts. Arguing that this region is known to be subject to strong positive selection pressure, the authors speculate that this mutation, either on its own or in conjunction with the second, may be giving the virus a selective advantage by making its entry into the cell much easier than before, leading to enhanced transmission and infectivity, and perhaps pathogenicity as well. It may be pointed out here that Yao’s group, in its work discussed above, had noted this S-protein mutation to be the founding mutation for the European cluster. They had also found that the one of the 11 cases that showed the 19-fold viral load belonged to this cluster. This ties up with Biswas and Majumder’s finding that A2a had spread widely in Europe in February itself (Fig. 4) and, going by Yao’s work, the defining mutation of A2a probably had its origin in China.

As Biswas and Majumder have emphasised, the changes in the virulence of Type A2a of SARS-CoV-2 need to be established with more detailed studies as its frequency increases across the world, and in India in particular, where genomic data need to be obtained from a greater number of isolates. This would be important for evolving appropriate pharmaceutical intervention strategies, including vaccine development, here and elsewhere.

A FLOW CELL used for sequencing the coronavirus at a lab in Seattle, April 15. Analysing a virus’ genetic code lets researchers track its mutations.
FIGURE 1: Temporal (monthly) change in frequencies of SARS-CoV-2 belonging to the five major types as the virus spread globally. (Within each type, the intensity of the colour of each circle is directly proportional to the number of sequences belonging to the type.)
FIGURE 2: Variation in frequencies of types based on weekly submissions of sequence data.
FIGURE 3: Radially displayed phylogenetic tree of 3,636 RNA sequences of SARS-CoV-2. The various types (O, A2, B, etc.) are colour coded.
FIGURE 4: Temporal (monthly) change in frequencies of five major types of SARS-CoV-2 in countries in which the prevalence of infection has been high.
A SCANNING electron micrograph provided by the National Institute of Allergy and Infectious Diseases shows a dying cell infected with the coronavirus, with viral particles in red.
