Abstract
In a previous study, we identified a 117-base severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) sequence in the human genome with 94.6% identity. The sequence was in chromosome 1p within an intronic region of the netrin G1 (NTNG1) gene. It matched a region of the SARS-CoV-2 Orf1b gene spanning non-structural protein 14 (NSP14), an exonuclease, and NSP15, an endoribonuclease. In the current study we compared the human genome with other viral genomes to determine some of the characteristics of human sequences found in the latter. Most of the viral genomes contained sequences matching the human genome, but the matches were short. Hepatitis A and St Louis encephalitis had human sequences that were longer than the 117-base SARS-CoV-2 sequence, but they were in non-coding regions of the human genome. The SARS-CoV-2 sequence was the only long sequence found in a human gene (NTNG1). The related coronavirus SARS-CoV had a 41-base human sequence on chromosome 3 that was not part of a human gene, and MERS-CoV had no human sequence. The 117-base SARS-CoV-2 human sequence is relatively close to the viral spike sequence, separated only by NSP16, a 904-base sequence. The mechanism for SARS-CoV-2 infection is the binding of the virus spike protein to the membrane-bound form of angiotensin-converting enzyme 2 (ACE2) and internalization of the complex by the host cell. We have no explanation for the NSP14 and NSP15 SARS-CoV-2 sequences we observed here or how they might relate to infectiousness. Further studies are warranted.
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a positive-sense single-stranded RNA virus (1). In January 2020, SARS-CoV-2 was identified as the cause of an outbreak of viral pneumonia in Wuhan, PR China. The disease, COVID-19, quickly spread worldwide. In the first three months after COVID-19 appeared, nearly 1 million people were infected and 50,000 died. The genome of SARS-CoV-2 is less than 30,000 bases long, whereas the human genome is over 3 billion bases long. Genes encoding 29 SARS-CoV-2 proteins have been identified; these proteins carry out a range of functions, from making copies of the virus to suppressing the body's immune responses.
SARS-CoV-2 is related to two other coronaviruses, Middle East respiratory syndrome coronavirus (MERS-CoV) and SARS-CoV. Both are much less infectious than SARS-CoV-2. MERS is a viral respiratory disease that was first reported in Saudi Arabia in September 2012 and has since spread to 27 countries. Humans infected with MERS-CoV develop severe acute respiratory illness, including fever, cough, and shortness of breath. From the emergence of MERS through January 2020, the World Health Organization (WHO) confirmed 2,519 cases and 866 deaths (about 1 in 3). Among all reported human cases, about 80% have occurred in Saudi Arabia. Only two people in the United States have tested positive for MERS-CoV, and both recovered. Both were healthcare providers who lived in Saudi Arabia, where they were likely infected before traveling to the United States, according to the US Centers for Disease Control and Prevention (CDC).
SARS-CoV can also cause a severe viral respiratory illness. SARS was first identified in Asia in February 2003, though cases were subsequently traced to November 2002. SARS rapidly spread to 26 countries before being contained after about four months. More than 8,000 people contracted SARS and 774 died. Since 2004, there have been no reported SARS cases. Research evidence suggests that SARS-CoV and MERS-CoV originated in bats, and it is likely that SARS-CoV-2 did as well. SARS-CoV spread from infected civets to people, while MERS-CoV spread from infected dromedary camels to people.
SARS-CoV strains have two Orf1 (open reading frame) genes, Orf1a and Orf1b. The 16 Orf1ab non-structural proteins (NSPs) are directly involved in viral replication. Five of the NSPs, NSP12 to NSP16, are on Orf1b (Figure 1). In a previous study, we identified a 117-base SARS-CoV-2 sequence in the human genome with 94.6% identity. The sequence was in chromosome 1p within an intronic region of the netrin G1 (NTNG1) gene. It matched a sequence in the SARS-CoV-2 Orf1b gene (2). In the current study we compared the human genome with other viral genomes to determine some of the characteristics of human sequences found in the latter.
We used the UCSC Genome Browser, an online genome browser hosted by the University of California, Santa Cruz (UCSC). The browser is an interactive website offering access to genome sequence data from a variety of vertebrate and invertebrate species and major model organisms, integrated with a large collection of aligned annotations. The Genome Browser Database, browsing tools, downloadable data files, and documentation are all accessible on the UCSC Genome Bioinformatics website (https://genome.ucsc.edu) (3).
To compare viral genomes to the human genome, we used BLAT, the BLAST-Like Alignment Tool of the UCSC Genome Browser (3). BLAT can align a user sequence of 25 bases or more to the genome. Because some level of mismatch is tolerated, cross-species alignments may be performed provided the species have not diverged too far from each other; this capability previously allowed comparison of the Mouse Mammary Tumor Virus genome to the human genome (4). BLAT calculates a percent identity score to indicate differences between sequences without a perfect match (i.e., without 100% identity). The differences include mismatches and gaps (5). A BLAT search returns a list of results sorted in decreasing order of score (5).
The results are presented in Table I. Most of the viral genomes contained sequences matching the human genome, but the matches were short. For example, three polio sequences were 34, 24, and 20 bases long (6). Hepatitis A and St Louis encephalitis had human sequences that were longer than the 117-base SARS-CoV-2 sequence, but they were in non-coding regions of the human genome. The SARS-CoV-2 sequence was the only long sequence found in a human gene (NTNG1). Human NTNG1 encodes a preproprotein that is processed into a secreted protein containing epidermal growth factor (EGF)-like domains. This protein acts to guide axon growth during neuronal development. Polymorphisms in this gene may be associated with schizophrenia (7). The related coronavirus SARS-CoV had a 41-base human sequence on chromosome 3 that was not part of a human gene, and MERS-CoV had no human sequence.
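As a rough illustration of the percent identity idea described for BLAT above, the following Python sketch counts matching columns in a pair of sequences that have already been aligned. It is a simplification under stated assumptions and not BLAT's actual scoring: the function names `percent_identity` and `rank_hits`, the use of `-` to mark a gap, and the toy sequences are all hypothetical, and BLAT derives its own score and identity values with a more involved calculation. Hits are ranked here by identity only as a stand-in for the BLAT score.

```python
def percent_identity(query: str, target: str) -> float:
    """Percent of aligned columns in which both sequences carry the same base.

    Assumes the two strings are already aligned and of equal length,
    with '-' marking a gap. Mismatches and gaps both lower the result.
    """
    if len(query) != len(target):
        raise ValueError("aligned sequences must have equal length")
    if not query:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for q, t in zip(query.upper(), target.upper())
        if q == t and q != "-"
    )
    return 100.0 * matches / len(query)


def rank_hits(hits):
    """Return (name, identity) pairs sorted by identity, best first.

    `hits` is a list of (name, aligned_query, aligned_target) tuples.
    """
    scored = [(name, percent_identity(q, t)) for name, q, t in hits]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy sequences invented for illustration; they are not from any genome.
toy_hits = [
    ("hit_a", "ACGTACGTAC", "ACGTTCGTAC"),  # one mismatch in 10 columns
    ("hit_b", "ACGTACGTAC", "ACG-ACGTAA"),  # one gap and one mismatch
    ("hit_c", "ACGTACGTAC", "ACGTACGTAC"),  # perfect match
]
ranked = rank_hits(toy_hits)
for name, identity in ranked:
    print(f"{name}: {identity:.1f}% identity")
```

With these toy inputs, the perfect match ranks first, followed by the hit with a single mismatch and then the hit with both a gap and a mismatch.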
Eight percent of the DNA in the human genome comes from human endogenous retroviruses (HERV), and some human diseases have been attributed to this DNA. HERV sequences have occasionally been adapted by the human body to serve a useful purpose, such as in the placenta, where they may safeguard fetal-maternal tolerance (8). However, MERS-CoV, SARS-CoV, and SARS-CoV-2 are not retroviruses. Short segments of non-retroviral genomes have been found within the human genome, but we are unaware of any other non-retroviral sequence of this length in the human genome.
The SARS-CoV-2 human sequence lies within the regions encoding non-structural protein 14 (NSP14), an exonuclease (9), and non-structural protein 15 (NSP15), an endoribonuclease (10). As NSP12 duplicates the coronavirus genome, it sometimes adds an incorrect base to the new copy. NSP14 cuts out these errors so that the correct base can be added instead. NSP15 cuts residual viral RNA segments to evade the infected cell's antiviral defenses.
The 117 base SARS-CoV-2 human sequence is quite close to the viral spike sequence, separated only by NSP16, a 904 base sequence (Figure 1). Human cells have antiviral proteins that identify viral RNA and shred it. NSP16 protein works with NSP10 to camouflage the viral genes and protect them. The mechanism for SARS-CoV-2 infection is the binding of the virus spike protein to the membrane-bound form of angiotensin-converting enzyme 2 (ACE2) and internalization of the complex by the host cell (11).
We have no explanation for the NSP14-NSP15 SARS-CoV-2 sequence we observed here or how it might relate to infectiousness. Further studies are warranted.
Footnotes
Authors' Contributions
Dr. Lehrer and Dr. Rheinstein contributed equally to the conception, data analysis, and writing of this article.
This article is freely accessible online.
Conflicts of Interest
There are no conflicts of interest.
- Received May 1, 2020.
- Revision received May 10, 2020.
- Accepted May 15, 2020.
- Copyright © 2020, International Institute of Anticancer Research (Dr. George J. Delinasios). All rights reserved.