West African Anopheles gambiae mosquitoes harbor a taxonomically diverse virome including new insect-specific flaviviruses, mononegaviruses, and totiviruses
Joseph R. Fauver a, Nathan D. Grubaugh a, Benjamin J. Krajacich a, James Weger-Lucarelli a, Steven M. Lakin a, Lawrence S. Fakoli III b, Fatorma K. Bolay b, Joseph W. Diclaro II c, Kounbobr Roch Dabiré d, Brian D. Foy a, Doug E. Brackney a 1, Gregory D. Ebel a, Mark D. Stenglein a
https://doi.org/10.1016/j.virol.2016.07.031
Under a Creative Commons license (open access)
Highlights
- Natural viruses of Anopheles species mosquitoes remain understudied relative to those of other mosquitoes.
- Anopheles species mosquitoes are vectors of serious human diseases, and it is important to understand the viruses that may alter their physiology and ability to transmit disease.
- We used metagenomic sequencing to identify viruses in adult Anopheles mosquitoes collected from Liberia, Senegal, and Burkina Faso.
- We characterized the sequences of several new viruses that apparently infect Anopheles gambiae mosquitoes, including flaviviruses and mononegaviruses.
Abstract
Anopheles gambiae mosquitoes are a major vector of malaria in sub-Saharan Africa. Viruses that naturally infect these mosquitoes may impact their physiology and ability to transmit pathogens. We therefore used metagenomic sequencing to search for viruses in adult Anopheles mosquitoes collected from Liberia, Senegal, and Burkina Faso. We identified a number of virus and virus-like sequences from mosquito midgut contents, including 14 coding-complete genome segments and 26 partial sequences. The coding-complete sequences define new viruses in the order Mononegavirales and in the families Flaviviridae and Totiviridae. The identification of a flavivirus infecting Anopheles mosquitoes broadens our understanding of the evolution and host range of this virus family. This study increases our understanding of virus diversity in general, begins to define the virome of a medically important vector in its natural setting, and lays groundwork for future studies examining the potential impact of these viruses on Anopheles biology and disease transmission.
Keywords
Anopheles gambiae; Arthropod viruses; Metagenomics; Surveillance; Bioinformatics; Flavivirus; Mononegavirus; Totivirus; Virus discovery; Pathogen discovery; Virome
1. Introduction
Mosquitoes (Diptera: Culicidae) are the most important vectors of human disease. Anopheles gambiae and other anopheline mosquitoes are the vectors of Plasmodium parasites in Africa, which cause nearly 200 million malaria cases annually on the continent (World Health Organization, 2015). These mosquitoes are also responsible for transmitting O’nyong-nyong virus, a human pathogenic alphavirus capable of causing large disease outbreaks (Lanciotti et al., 1998, Lutwama et al., 1999). In addition, Anopheles spp. mosquitoes are vectors of Wuchereria bancrofti roundworms, the causative agent of lymphatic filariasis, which affects over 100 million people in sub-Saharan Africa (World Health Organization, 2016). Current interventions for these diseases are inadequate, and future strategies need to deploy current and novel interventions to disrupt pathogen transmission.

Constituents of arthropod microbiomes are being increasingly scrutinized for their potential to alter the arthropod's ability to transmit co-infecting human pathogens (Weiss and Aksoy, 2011). The microbiome of Anopheles species, especially in the gut, has been shown to be quite diverse and can vary depending on the mosquito's environment (Buck et al., 2016, Grubaugh et al., 2015). Furthermore, certain bacteria can influence Plasmodium development in the mosquito (Boissière et al., 2012, Cirimotich et al., 2011, Dong et al., 2009, Gendrin et al., 2015, Hughes et al., 2014). During the last decade there has been increased attention to insect-specific viruses (ISVs) and their potential role in disrupting pathogen transmission (Blitvich and Firth, 2015, Bolling et al., 2015, Junglen and Drosten, 2013, Vasilakis and Tesh, 2015). The majority of ISVs have been described in mosquitoes, although they are known to occur in several arthropod orders, including Hemiptera (i.e. true bugs) (Li et al., 2015) and Parasitiformes (e.g. ticks) (Tokarz et al., 2014). 
ISVs belong to taxonomically diverse virus families including Bunyaviridae (Auguste et al., 2014, Chandler et al., 2014, Marklewitz et al., 2011, Marklewitz et al., 2013, Yamao et al., 2009), Flaviviridae (Blitvich and Firth, 2015, Cook et al., 2006, Hoshino et al., 2007, Lee et al., 2013, Misencik et al., 2016, Tyler et al., 2011), Reoviridae (Attoui et al., 2005, Auguste et al., 2015, Hermanns et al., 2014), Rhabdoviridae (Kuwata et al., 2011, Quan et al., 2010, Vasilakis et al., 2014a), and Togaviridae (Nasar et al., 2012). In addition, ISVs in the Birnaviridae (Huang et al., 2013, Marklewitz et al., 2012, Vancini et al., 2012), Nodaviridae (Schuster et al., 2014), Tymoviridae (Wang et al., 2012), and Parvoviridae (Ren et al., 2008) families have been characterized; the latter family includes an Anopheles-specific densovirus that is being examined as a paratransgenesis candidate (Ren et al., 2008, Suzuki et al., 2014). Recently discovered ISVs include those in the family Mesoniviridae (Kuwata et al., 2013, Nga et al., 2011, Vasilakis et al., 2014b, Warrilow et al., 2014, Zirkel et al., 2011, Zirkel et al., 2013) and a variety of positive sense ssRNA viruses including the negeviruses (Kallies et al., 2014, Nabeshima et al., 2014, Vasilakis et al., 2013). Recently, two new RNA viruses, a dicistrovirus and a cypovirus, were identified in Anopheles species mosquitoes (Carissimo et al., 2016). Mosquitoes are divided into two subfamilies, Culicinae and Anophelinae; however, a disproportionate number of mosquito ISVs have been identified from culicine mosquitoes, leaving anopheline mosquitoes relatively understudied.

We therefore used metagenomic sequencing to identify viruses infecting wild An. gambiae, An. funestus, and An. rufipes mosquitoes in West Africa. We sampled adult mosquitoes from villages in rural Burkina Faso, Liberia, and Senegal, sequenced RNA, and searched datasets for virus sequences. We identified sequences from multiple new viruses. 
For several of these, we generated coding-complete genome sequences, performed comparative and phylogenetic analyses, and determined their prevalence in our field-collected samples. Our findings indicate that anopheline mosquitoes naturally harbor multiple viruses including flaviviruses.
2. Materials and methods
Mosquito samples used in this study were collected from 2012 to 2015 on separate trips to Senegal, Liberia, and Burkina Faso (Fig. 1 and Table 1). Indoor resting bloodfed mosquitoes were collected in Senegal (Alout et al., 2014, Krajacich et al., 2015) and Liberia (Grubaugh et al., 2015) as previously described. Mosquitoes from Burkina Faso were colonized and reared at the Colorado State University (CSU) Arthropod-Borne and Infectious Disease Laboratory prior to being sampled.
Fig. 1. Mosquito collection sites in Liberia, Senegal, and Burkina Faso. In Liberia, mosquitoes were sampled from 6 villages within an area with an approximate radius of 16 km. The collection site in Burkina Faso for mosquito eggs that were used to found a laboratory colony is indicated.
Table 1. Summary of mosquito samples analyzed.
Sample set | Location | Date | Number of mosquitoes analyzed | Mosquito species (a) | Analysis method |
---|---|---|---|---|---|
1 | Senegal | 8/22/2012 | 41 (b) | An. funestus/An. gambiae/An. rufipes | NGS (MiSeq) |
2 | Burkina Faso | 1/5/2015 (c) | 17 | An. gambiae | NGS (NextSeq) |
3 | “ | 12/30/2015 (c) | 3 | “ | “ |
4 | Liberia (village A) | 6/11&13/2013 | 31 | An. gambiae | NGS (HiSeq) |
5 | “ | 6/15/2013 | 38 | “ | “ |
6 | “ | 6/19/2013 | 24 | “ | “ |
7 | “ | 6/21/2013 | 41 | “ | “ |
8 | “ | 6/23/2013 | 29 | “ | “ |
9 | “ | 6/25/2013 | 57 | “ | “ |
10 | Liberia (village B) | 6/10&14/2013 | 51 | “ | “ |
11 | “ | 6/22&26/2013 | 57 | “ | “ |
12 | Liberia (village C) | 3/30/2015 | 15 (d) | An. gambiae | PCR / Sanger sequencing |
13 | Liberia (village D) | 3/31/2015 | 15 | “ | “ |
14 | Liberia (village E) | 4/1/2015 | 15 | “ | “ |
15 | Liberia (village A) | 4/2/2015 | 15 | “ | “ |
16 | Liberia (village F) | 4/3/2015 | 15 | “ | “ |
Notes:
(a) As determined by field identification and molecular analysis (see Section 2).
(b) Mosquitoes in sample sets 1–11 were pooled (1 pool per sample set) for sequencing.
(c) These mosquitoes were sampled on these dates from a laboratory colony that was derived from An. gambiae larvae collected in Burkina Faso.
(d) Mosquitoes in sample sets 12–16 were analyzed individually (i.e. not pooled) using PCR and Sanger sequencing to validate NGS results and measure prevalence.
2.1. Sample preparation
2.1.1. Burkina Faso
A laboratory colony of An. gambiae s.s. was established by the Institut de Recherche en Sciences de la Santé from larvae collected in Burkina Faso in 2014, and eggs from this colony were subsequently shipped to Colorado State University (CSU). Mixed-sex, non-bloodfed mosquitoes from the colony at CSU were homogenized with a steel ball bearing in 1 ml of mosquito diluent (80% PBS, 20% FBS, supplemented with penicillin, streptomycin, gentamicin, and amphotericin B (Fauver et al., 2015)) for RNA extraction. 50 µl of cleared supernatant was used for RNA extraction with the Mag-Bind Viral DNA/RNA kit (Omega, Georgia, USA) on the KingFisher Flex Magnetic Particle Processor (Thermo Fisher Scientific, Massachusetts, USA) according to the manufacturer's protocol. Libraries were prepared using the Ovation RNA-Seq System V2 (NuGEN, California, USA) and Ovation Ultralow DR Multiplex System 1–96 (NuGEN) and sequenced on an Illumina NextSeq at the CSU NGS facility.
2.1.2. Senegal
Field-caught bloodfed mosquitoes were pooled by date and stored in RNAlater (Ambion) at −80 °C. Pools were thawed and the RNAlater was removed. 1 ml of PBS was added and mosquito pools were centrifuged for 5 min. 140 µl of supernatant was used for RNA extraction using the QIAamp Viral RNA Mini Kit (Qiagen, Hilden, Germany) according to the manufacturer's protocol with slight modification. Extracted RNA was subjected to DNase treatment (Thermo Fisher Scientific) and purification using Agencourt RNAClean XP beads (Beckman Coulter Genomics, Pasadena, CA). cDNA was amplified using the Ovation RNA-Seq System V2 and prepared for library construction using the Ovation Ultralow DR Multiplex System 1–96 as described (Grubaugh et al., 2016). Libraries were sequenced on a NextSeq instrument at the CSU NGS facility.
2.1.3. Liberia
Mosquitoes were collected and processed as previously described (Grubaugh et al., 2015). Briefly, bloodfed mosquitoes were knocked down with triethylamine and bloodmeals were expelled onto Whatman Flinders Technology Associates (FTA) clone saver cards (GE Healthcare Life Sciences, Little Chalfont, United Kingdom). Cards were stored at −20 °C and shipped to CSU for additional processing. Mosquito dried blood spots (M-DBS) were removed using a Harris 3 mm micro-puncher (GE Healthcare Life Sciences) and placed into RNA Rapid Extraction Solution (Ambion, Texas, USA) to elute nucleic acid from the cards. RNA was extracted using the Mag-Bind Viral DNA/RNA kit with the KingFisher Flex Magnetic Particle Processor according to the manufacturer's protocol. Libraries were prepared using the Ovation RNA-Seq System V2 and Nextera XT DNA Sample Preparation Kit (Illumina, San Diego, CA) and sequenced on an Illumina HiSeq (Beckman Coulter Genomics, Danvers, MA).
2.2. Sequence analysis
Sequencing datasets were processed with the goal of taxonomically assigning non-mosquito reads. First, low quality and Illumina adapter sequences were removed using the Trimmomatic tool version 0.32 with the following settings: ILLUMINACLIP:NexteraPE:1:30:10:4:true LEADING:20 TRAILING:20 SLIDINGWINDOW:4:25 MINLEN:60 (Bolger et al., 2014). Potential PCR duplicate sequences were collapsed using the CD-HIT-EST tool, version 4.6, with parameter -c 0.96 (Li and Godzik, 2006). Then, mosquito sequences were removed by aligning reads to databases of An. gambiae genomic sequences (Holt et al., 2002) using Bowtie2 version 2.2.5, with parameters --sensitive --score-min C,60,0 (Langmead and Salzberg, 2012). Remaining sequences were de novo assembled into contiguous sequences using the SPAdes genome assembler (Bankevich et al., 2012). Resulting contigs and non-assembling reads were then taxonomically assessed, first by using the gsnapl tool, version 2014-12-28, to align to the NCBI nt nucleotide database (Wu and Nacu, 2010). Sequences that did not produce a nucleotide-level alignment were then searched via translated-nucleotide to protein alignments against the NCBI nr protein sequence database using the Rapsearch2 tool, version 2.23, with parameters -a t, -l 20, and -e 1e-2 (Zhao et al., 2012). Draft virus genome sequences were validated by mapping individual reads to assemblies using Bowtie2 as above and in some cases using PCR and Sanger sequencing. Resulting alignments were imported into Geneious software version 9.0.4 and manually inspected (Kearse et al., 2012). Sequence datasets have been deposited in the NCBI Short Read Archive (SRA) with accession PRJNA327220.
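The quality-trimming step above can be illustrated with a simplified sketch of sliding-window trimming using the window size, quality threshold, and minimum length from the stated settings (SLIDINGWINDOW:4:25, MINLEN:60). This is not Trimmomatic's implementation, which handles window edges and paired reads in more detail; the function name and the cut rule are assumptions made for illustration.

```python
def sliding_window_trim(quals, window=4, min_avg=25, min_len=60):
    """Return how many 5' bases to keep, or 0 if the read should be dropped.

    Simplified rule (an assumption, not Trimmomatic's exact behavior):
    scan windows from the 5' end and cut at the start of the first window
    whose mean Phred quality falls below min_avg.
    """
    keep = len(quals)
    for start in range(max(len(quals) - window + 1, 0)):
        win = quals[start:start + window]
        if sum(win) / window < min_avg:
            keep = start
            break
    # Reads shorter than min_len after trimming are discarded entirely.
    return keep if keep >= min_len else 0


if __name__ == "__main__":
    # Hypothetical read: 70 high-quality bases followed by 10 poor ones.
    example = [30] * 70 + [10] * 10
    print(sliding_window_trim(example))
```

A read whose quality collapses before position 60 is dropped rather than shortened, which is why the minimum-length filter matters for the downstream assembly step.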
2.3. Analysis of predicted viral protein sequences
ORFs in viral genome assemblies were predicted using Geneious software. Homologs of predicted protein sequences were detected using the BLASTP tool (version 2.2.25+) to search the NCBI non-redundant protein database (nr) (Altschul et al., 1997). For sequences with no detectable similarity by BLASTP, the HHpred homology detection and structure prediction tool (version 2.0) was also used (Soding et al., 2005). The transmembrane prediction tool in Geneious was used to predict transmembrane domains. Virus and virus-like contigs longer than 500 nt were aligned to the NCBI nr protein sequence database using BLASTX to determine taxonomic classification, closest relative, and percent similarity to closest related sequences.
2.4. Phylogenetic analysis
Predicted viral protein sequences were used to query the NCBI nr protein database using the BLASTP tool (Altschul et al., 1990). Database sequences that aligned with an E-value less than 10^-3 and that were full length or nearly full-length were downloaded. These were collapsed using the CD-HIT tool, version 4.6, with parameter -c 0.9 (Li and Godzik, 2006). These representative sets of sequences were aligned using the MAFFT software, version 7.221, in L-INS-i mode (Katoh and Standley, 2013). Phylogenetically uninformative columns were removed from multiple alignments using the Gblocks tool, version 0.91b, with parameter -b5=n (Talavera and Castresana, 2007). These trimmed alignments were used to create phylogenies with MrBayes version 3.2.5 using the commands prset aamodelpr=mixed and mcmc ngen=1000000 (Ronquist and Huelsenbeck, 2003). Convergence was confirmed by inspecting the standard deviation of split frequencies. Phylogenies were visualized using FigTree version 1.4.2 (http://tree.bio.ed.ac.uk/software/figtree/).
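The homolog selection rule described above (E-value below 10^-3, full length or nearly full length) could be sketched as follows. The text does not define "nearly full-length", so the 90% length fraction used here is an assumption, as are the `Hit` record and the placeholder identifiers.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical record for one BLASTP database hit."""
    subject_id: str
    evalue: float
    subject_len: int  # length of the database protein
    query_len: int    # length of the query protein


def select_homologs(hits, max_evalue=1e-3, min_len_fraction=0.9):
    """Keep hits below the E-value cutoff that are at least nearly full length.

    min_len_fraction is an assumed stand-in for "nearly full-length".
    """
    return [
        h.subject_id
        for h in hits
        if h.evalue < max_evalue
        and h.subject_len >= min_len_fraction * h.query_len
    ]


if __name__ == "__main__":
    hits = [
        Hit("hit_A", 1e-50, 3300, 3341),  # strong, near full length: kept
        Hit("hit_B", 1e-20, 400, 3341),   # strong but a fragment: dropped
        Hit("hit_C", 0.5, 3400, 3341),    # full length but weak: dropped
    ]
    print(select_homologs(hits))
```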
2.5. Validation of mosquito species collected
We used a molecular strategy to corroborate field identification of the species of collected mosquitoes. We collected a representative set of mosquito cytochrome c oxidase subunit 1 (CO1) gene sequences (Supplemental Table 1) and used Bowtie2 to align sequences from our quality-filtered datasets to these sequences and tabulated the fractions of reads aligning to each mosquito species. This analysis corroborated field-based species identifications (Table 1).
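As a minimal sketch of the tabulation step, assuming each aligned read has already been reduced to the name of the species whose CO1 reference it matched (the input format is an assumption; the study used Bowtie2 alignments):

```python
from collections import Counter


def species_fractions(read_assignments):
    """Return the fraction of aligned reads assigned to each species.

    read_assignments: iterable of species names, one per aligned read.
    """
    counts = Counter(read_assignments)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {species: n / total for species, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical assignments for a small pooled sample.
    reads = ["An. gambiae"] * 3 + ["An. funestus"]
    print(species_fractions(reads))
```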
2.6. Prevalence of Anopheles viruses
The abundance of virus reads in sequencing datasets was calculated by using Bowtie2 to map all unique, host-filtered reads to all of the virus sequences identified in this study. The abundance was defined as the number of mapping reads per million unique reads in each dataset.

Field prevalence of Anopheles Flavivirus, Bolahun Anopheles virus, and Anopheles Totivirus was determined using independent sample sets collected in 2015 (Table 1). RNA isolated from individual Liberian M-DBS was tested for the presence of each virus by RT-PCR using the Qiagen OneStep RT-PCR kit according to the manufacturer's protocol with virus-specific primers (Supplemental Table 2). Fifteen M-DBS from each of 5 villages (75 total) were used to calculate field prevalence.
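The two quantities defined above reduce to simple ratios. The sketch below computes reads per million as defined in the text and a prevalence estimate for individually tested samples. The Wilson score interval is an addition for illustration and is not described in the study; all counts in the usage example are hypothetical.

```python
import math


def reads_per_million(mapped_reads, unique_reads):
    """Virus-mapping reads per million unique reads in a dataset."""
    if unique_reads <= 0:
        raise ValueError("unique_reads must be positive")
    return mapped_reads * 1_000_000 / unique_reads


def prevalence_with_wilson(n_positive, n_tested, z=1.96):
    """Return (prevalence, lower, upper) using a Wilson score interval.

    The interval is an illustrative addition, not part of the study's methods.
    """
    if n_tested <= 0 or not 0 <= n_positive <= n_tested:
        raise ValueError("invalid counts")
    p = n_positive / n_tested
    denom = 1 + z * z / n_tested
    center = (p + z * z / (2 * n_tested)) / denom
    half = z * math.sqrt(p * (1 - p) / n_tested
                         + z * z / (4 * n_tested ** 2)) / denom
    return p, max(0.0, center - half), min(1.0, center + half)


if __name__ == "__main__":
    # Hypothetical counts: 300 virus reads among 300,000 unique reads,
    # and 15 RT-PCR positives among the 75 individually tested samples.
    print(reads_per_million(300, 300_000))
    print(prevalence_with_wilson(15, 75))
```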
3. Results
We collected adult Anopheles mosquitoes from several West African locations and performed metagenomic sequencing of midgut RNA-derived libraries to search for viruses. In total, we analyzed 328 adult An. gambiae from Liberia and 41 mixed An. gambiae/An. funestus/An. rufipes from Senegal (~1/3 of each species). These mosquitoes were collected over the course of several years from multiple villages (Table 1). We also analyzed 20 mosquitoes from a laboratory colony established in 2014 from Burkina Faso An. gambiae s.s. larvae. These colony mosquitoes were sampled in 2 batches ~12 months apart. Mosquitoes were combined into pools of between 3 and 57 mosquitoes each, and pools were sequenced on Illumina instruments to a median depth of 4.8×10^7 150 nt read pairs per pool. After filtering low quality, duplicate, and mosquito-derived reads, a median of 3.0×10^5 reads remained (0.6%). Remaining reads were de novo assembled and taxonomically assigned by comparison to sequences in GenBank, first by nucleotide-to-nucleotide alignments, then by translated-nucleotide to protein alignments. We identified a number of putative viral sequences and determined coding-complete genome segment sequences where possible (Ladner et al., 2014). Sequencing depth was determined for each coding-complete genome (Supplemental Fig. 1). We also used sequencing data to confirm the species composition of the collected mosquito sample sets (Table 1).
3.1. Anopheles flavivirus
We identified flavivirus sequences in datasets from mosquitoes collected in Liberia and Senegal (Fig. 2 and Table 1). The genus Flavivirus (family Flaviviridae) includes viruses whose life cycle involves alternating replication in vertebrate and arthropod hosts and members whose life cycle is restricted to one host or the other (Van Regenmortel, 2000a). Typical flaviviruses have positive-sense ssRNA genomes that encode a single large polyprotein. We assembled two coding-complete flavivirus genomes from Liberia datasets, which shared 95% pairwise nucleotide identity. We designated these as Anopheles flavivirus (AnFV) – variants 1 and 2. We also assembled partial sequences from at least one additional flavivirus from the Senegal datasets, which we called Anopheles flavivirus-like sequences 1 and 2 (Table 1). Anopheles flavivirus-like sequence 1 shared ~79% pairwise nucleotide identity with the AnFV sequences. These and other genome sequences represent the consensus of sequences that shared ≥99.5% pairwise identity. Single nucleotide variants were evident in the various datasets (Supplemental Fig. 2). Given that these datasets derive from pools of mosquitoes, this variation could represent intra- or inter-host diversity, or both.
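Pairwise nucleotide identity, as quoted above, is computed from an alignment of the two sequences. The sketch below assumes the sequences are already aligned to equal length and ignores columns containing a gap; other tools may count gap columns differently, so this convention is an assumption rather than the method used in the study.

```python
def pairwise_identity(aln_a, aln_b, gap="-"):
    """Fraction of identical positions between two aligned sequences.

    Columns where either sequence has a gap are skipped (assumed convention).
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return matches / compared


if __name__ == "__main__":
    # Toy alignment with one mismatch in eight columns.
    print(pairwise_identity("ACGTACGT", "ACGTACGA"))
```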
Fig. 2. Anopheles flavivirus genome organization, dinucleotide usage, and phylogeny. (A) Genomic organization of Anopheles flavivirus. Predicted functional domains of the viral polyprotein and the fifo ORF are indicated; a triangle marks the predicted ribosomal frameshift “slippery” sequence at the beginning of the fifo ORF. The other AnFV and AnFV-like sequences identified in this study (KX148547-KX148549) have a similar organization. (B) AnFV clusters phylogenetically with classic ISFVs. A multiple sequence alignment of flavivirus NS5 protein sequences was used to create a Bayesian phylogeny. The phylogeny was rooted on the branch to Tamana bat virus (not shown). Posterior probabilities of select nodes are indicated. A fully-labeled version of this phylogeny including accession numbers and node posterior probabilities is available as Supplemental Fig. 3. (C) Phylogeny as in (B) but focused on the cISFV clade. (D) Dinucleotide usage in AnFV supports its categorization as a classic insect-specific flavivirus (ISFV). CpG and UpA dinucleotide frequencies for viruses in the genus Flavivirus in the NCBI RefSeq database are indicated. Points are color coded according to flavivirus categories as indicated and as in (Blitvich and Firth, 2015). FV: flavivirus. Anopheles flavivirus points are indicated.
The genome organization, gene content, phylogenetic placement, and dinucleotide usage support the classification of AnFV as a “classic” insect-specific flavivirus (cISFV) (Blitvich and Firth, 2015, Bolling et al., 2015). The AnFV genome contains an ORF of 10032 nt predicted to encode a polyprotein of 3341 amino acids (Fig. 2A). The polyprotein is predicted to be co- and post-translationally cleaved to produce the typical 3 structural and 7 non-structural flavivirus proteins, and we identified putative cleavage sites. The polyprotein shares 35–43% global pairwise amino acid identity with sequences from other cISFVs.

In addition to the polyprotein ORF, the genome contains an 840 nt ORF overlapping with the NS2 coding region (Fig. 2A). The reading frame of this ORF is −1 relative to the polyprotein ORF, and 5 nt downstream of the predicted NS2A cleavage site, a putative “slippery” sequence (GGAUUUU) was identified as a likely site of ribosomal frameshifting. These “fifo” ORFs (fairly interesting flavivirus ORF) are a characteristic of cISFV genomes (Blitvich and Firth, 2015, Firth et al., 2010). The predicted AnFV FIFO protein possesses no detectable sequence similarity with other cISFV FIFO proteins. However, like other cISFV FIFO proteins, the AnFV protein contains predicted transmembrane domains (Firth et al., 2010). Phylogenetic analysis corroborated the designation of AnFV as a cISFV (Fig. 2B, C and Supplemental Fig. 3). In Bayesian phylogenies based on alignments of NS5 protein sequences, the AnFV sequences occupy a well-supported branch within the cISFV clade. Analyses based on alignments of full polyprotein sequences produced phylogenies with essentially identical topologies.

Viral genomes often exhibit patterns of dinucleotide usage similar to that of their hosts. On this basis, cISFV genomes can be distinguished from those of flaviviruses whose lifecycles include replication in vertebrates (Blitvich and Firth, 2015). 
Based on dinucleotide usage, AnFV clusters with cISFVs (Fig. 2D).
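A common way to quantify dinucleotide usage is the observed/expected ratio, where the expected frequency is the product of the two mononucleotide frequencies. The sketch below computes that ratio for CpG or UpA. Whether Fig. 2D plots this normalized ratio or raw frequencies is not stated in the text, so the normalization here is an assumption.

```python
def dinucleotide_odds_ratio(seq, dinuc):
    """Observed/expected frequency of a dinucleotide in a sequence.

    Expected frequency is the product of the two base frequencies.
    DNA input is accepted; T is treated as U.
    """
    seq = seq.upper().replace("T", "U")
    dinuc = dinuc.upper().replace("T", "U")
    n = len(seq)
    if n < 2 or len(dinuc) != 2:
        raise ValueError("need a sequence of length >= 2 and a 2-letter dinucleotide")
    # Count overlapping occurrences across all n - 1 dinucleotide positions.
    hits = sum(1 for i in range(n - 1) if seq[i:i + 2] == dinuc)
    observed = hits / (n - 1)
    expected = (seq.count(dinuc[0]) / n) * (seq.count(dinuc[1]) / n)
    if expected == 0:
        raise ValueError("dinucleotide bases absent from sequence")
    return observed / expected


if __name__ == "__main__":
    # Toy sequence only; real use would pass a full genome sequence.
    toy = "AUGGCGCGAUUACGAUA"
    print(dinucleotide_odds_ratio(toy, "CG"), dinucleotide_odds_ratio(toy, "UA"))
```

Ratios well below 1 indicate depletion of that dinucleotide, which is the kind of signal used to separate virus groups by host association.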
3.2. Bolahun and Gambie viruses
We identified mononegavirus sequences in mosquitoes from Liberia and Senegal and from our Burkina Faso-derived colony. Mononegavirales is a large and diverse order of viruses that have single-stranded negative polarity RNA genomes and evolutionarily related RNA dependent RNA polymerase (RdRp) genes (Van Regenmortel, 2000b). We assembled apparently coding-complete mononegavirus genomes from each of these three datasets (Fig. 3). The genomes from Liberia and Burkina Faso were closely related (94% pairwise nt identity), and we designated these as Bolahun virus (BOAV) – variants 1 and 2. The sequence from Senegal, which we named Gambie virus (GAMV), shared ~60% pairwise nt identity with BOAV.
Fig. 3. Bolahun and Gambie viruses. (A) Genomic organization of Bolahun virus. Predicted coding sequences are indicated. Gambie virus has essentially identical genome organization. (B) Bolahun and Gambie viruses cluster with bornaviruses, nyamiviruses, and arthropod viruses. RdRp-based Bayesian phylogeny. Posterior probabilities of select nodes are indicated. Phylogenies showing all mononegavirus sequences are shown in Supplemental Figs. 4 and 5. (C) Viruses in this virus clade encode a predicted small zinc finger protein just upstream of the L ORF. Genome cartoons of Bolahun virus, Xincheng mosquito virus, and Shuangao fly virus 2. Open reading frames are labeled according to (Li et al., 2015). ORFs not annotated in KM817661 and KM817638 are outlined with dotted lines. Small ORFs predicted to encode Zn finger proteins upstream of L ORFs are colored purple. (D) Multiple alignment of proteins encoded by small ORFs highlighted in (C). Cysteine CXXC motifs characteristic of Zn finger domains are underlined.
The BOAV and GAMV genomes share a similar overall genome organization, with 6 non-overlapping ORFs (Fig. 3A). In mononegaviruses, the ORF nearest the 3′ end of the genome encodes the viral nucleoprotein. The BOAV and GAMV ORF1 is predicted to encode a protein of 446 amino acids (lengths given for BOAV, accession KX148552) with no transmembrane domains and an isoelectric point of 8.7. By BLASTP search, the only identifiable homologous sequence was that encoded by Xincheng mosquito virus ORF1 (Li et al., 2015), which shares 19% pairwise global amino acid identity. These are likely nucleoproteins for these viruses. ORF2 encodes a predicted small transmembrane domain-containing protein of 65 residues. ORF3 encodes a protein predicted to be 446 AA long with no detectable similarity to known proteins by BLASTP or HHpred analyses with an E-value cutoff of 0.1 (Soding et al., 2005). ORF4 encodes the predicted viral glycoprotein, a 638 AA protein with 3 transmembrane domains and sequence similarity to a variety of mononegavirus glycoproteins. As with the putative nucleoprotein, the most similar sequence is from Xincheng mosquito virus, with 38% global pairwise identity. As in all mononegaviruses, the last ORF is predicted to encode the large RdRp (L) protein. BOAV and GAMV proteins share between 36% (ORF2 protein) and 72% (glycoprotein) global amino acid identity.

In phylogenies based on alignments of L protein sequences, BOAV and GAMV form a well-supported clade with Xincheng mosquito virus and Shuangao fly virus, which were identified in samples from China from an Anopheles sinensis mosquito and a Psychoda alternata fly (Fig. 3B and Supplemental Figs. 4 and 5). 
These viruses form a sister clade with those of the Bornaviridae and the Nyamiviridae, two mononegavirus families (Kuhn et al., 2013, Kuhn et al., 2015, Mihindukulasuriya et al., 2009).

In the 3 BOAV and GAMV genomes, and in the genomes of Xincheng mosquito virus and Shuangao fly virus 2, there is a small ORF upstream of the L ORF (Fig. 3C). These ORFs range from 123 to 162 nt and encode predicted proteins of 40–53 AA with CXXC motifs characteristic of zinc ribbon type zinc finger domains (Krishna et al., 2003) (Fig. 3D). By HHpred analysis, similarity to various cellular zinc finger domains was detected. In addition to their phylogenetic placement, a defining characteristic of the viruses in this clade may be the presence of these small ORFs.
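The relationship between the ORF and protein lengths quoted above, and the kind of CXXC motif scan that flags candidate zinc finger proteins, can be sketched as follows. The peptide in the usage example is made up for illustration and is not one of the sequences from this study; the length conversion assumes the ORF length includes the stop codon, which is consistent with 123 nt giving 40 AA and 162 nt giving 53 AA.

```python
import re

# Lookahead so that overlapping C-x-x-C motifs are all reported.
CXXC = re.compile(r"(?=(C..C))")


def cxxc_positions(protein):
    """Return 0-based start positions of every CXXC motif in a protein."""
    return [m.start() for m in CXXC.finditer(protein.upper())]


def protein_length_from_orf(orf_nt):
    """Protein length in residues for an ORF whose length includes the stop codon."""
    if orf_nt < 3 or orf_nt % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    return orf_nt // 3 - 1


if __name__ == "__main__":
    hypothetical_peptide = "MACAACGGGCTTCK"  # invented sequence, for illustration
    print(cxxc_positions(hypothetical_peptide))
    print(protein_length_from_orf(123), protein_length_from_orf(162))
```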
3.3. Anopheles totivirus
We identified a totivirus-like sequence in An. gambiae mosquitoes in Liberia (Fig. 4A). Totiviruses were historically only known to infect plant and protist hosts. However, a growing number of reports have described totiviruses infecting or associated with arthropod hosts (Dantas et al., 2016, Isawa et al., 2011, Koyama et al., 2015, Martinez et al., 2016, Poulos et al., 2006, Wu et al., 2010, Zhai et al., 2010). Totiviruses have dsRNA genomes that typically have 2 overlapping ORFs that encode the viral capsid and RdRp proteins. We assembled the apparent coding-complete genome of this virus, which we designated Anopheles totivirus (AToV). There are two large ORFs in the AToV genome, but they are in the same reading frame and their translation is unlikely to involve ribosomal frameshifting (Fig. 4A). The first ORF encodes a predicted protein of 980 AA with similarity to sequences encoded in several dipteran genomes. The second ORF encodes the predicted viral RdRp of 1006 AA. In phylogenies based on alignments of totivirus RdRp sequences, AToV clusters with arthropod-infecting or arthropod-associated totiviruses (Fig. 4B).