A bioinformatics package for efficient virus discovery from deep small RNA sequences
Text by Yi Zheng and Zhangjun Fei
Viruses are a major threat to agricultural production and human health. Efficient and accurate detection of viruses in plants and animals is essential for developing effective strategies to manage the spread and impact of viral diseases. Upon viral infection, plants and animals trigger an antiviral defense response called RNA interference (RNAi), which produces large amounts of virus-derived small interfering RNAs (siRNAs). Deep sequencing of small RNAs (sRNAs), including these siRNAs, has proven to be a highly efficient approach for virus discovery. However, handling and analyzing the resulting large-scale sRNA datasets for accurate virus discovery has posed serious challenges. To address them, we developed VirusDetect, a bioinformatics pipeline that efficiently analyzes large-scale sRNA datasets to identify both known and novel viruses, as shown by our extensive evaluations using both plant and animal sRNA datasets. VirusDetect is easy to use, is available in both standalone and online versions, and produces user-friendly output.
In 2011, we initiated a project to build a pan-African sweet potato virome, which would help us understand the diversity, distribution and evolution of sweet potato viruses and their impact on sweet potato production in Africa. This information would further help to guide phytosanitary requirements, predict the risk of future epidemics, and suggest regional disease management strategies. We collected more than 1,000 geo-referenced field samples of sweet potato from 12 countries across sub-Saharan Africa and performed deep sRNA sequencing on them. In 2012, we initiated another project to investigate the global distribution of tomato viruses, collecting ~200 tomato samples from different countries around the world for deep sRNA sequencing.
With this huge amount of sRNA data, we needed a computational program that could automatically analyze the data for efficient and accurate virus discovery. Although several bioinformatics tools were available for virus discovery from high-throughput sequence data, they were mainly designed for RNA-Seq or genome sequencing data; none was specifically designed to detect viruses from large-scale sRNA sequence data. Then VirusDetect was born!
We have extensively evaluated the performance of VirusDetect for known and novel virus discovery, using both plant and animal sRNA datasets. VirusDetect is able to detect some viruses that the standard virus indexing procedure failed to discover. VirusDetect can identify novel viruses whose genomes show no homology to any known virus sequences, and in some cases VirusDetect can de novo assemble nearly complete genomes of novel viruses despite the short length (21-24 nt) of sRNAs.
VirusDetect can assemble the nearly complete genome of a novel plant virus from sRNA data of moderate depth. (A) Size distribution of total plant sRNAs and virus-derived siRNAs. (B) siRNA distribution across the viral genome in both positive (+) and negative (-) strands. (C) Alignments of viral contigs (blue lines) assembled from different depths of sRNAs to the viral genome (black line).
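Panel A contrasts the size distribution of total plant sRNAs with that of virus-derived siRNAs, which are typically 21-24 nt long. As a rough illustration of how such a length profile can be tallied, here is a minimal Python sketch. It assumes the reads have already been adapter-trimmed and loaded as plain strings; the function names and the 18-30 nt counting window are hypothetical choices made for this example, and this is not VirusDetect's own code.

```python
from collections import Counter


def length_distribution(reads, min_len=18, max_len=30):
    """Count reads by length, keeping only lengths in [min_len, max_len].

    The 18-30 nt window is a hypothetical choice for illustration.
    Returns a dict mapping read length -> number of reads, sorted by length.
    """
    counts = Counter(len(r) for r in reads if min_len <= len(r) <= max_len)
    return dict(sorted(counts.items()))


def fraction_in_sirna_range(reads, lo=21, hi=24):
    """Return the fraction of all reads whose length lies in [lo, hi] nt."""
    if not reads:
        return 0.0
    in_range = sum(1 for r in reads if lo <= len(r) <= hi)
    return in_range / len(reads)


if __name__ == "__main__":
    # Toy, made-up reads: four in the 21-24 nt range and one 15 nt fragment.
    toy_reads = ["A" * 21, "C" * 22, "G" * 22, "U" * 24, "A" * 15]
    print(length_distribution(toy_reads))      # {21: 1, 22: 2, 24: 1}
    print(fraction_in_sirna_range(toy_reads))  # 0.8
```

In a real dataset the reads would first be parsed from FASTQ or FASTA files, and the same tally could be run separately on reads that map to a viral genome to reproduce the kind of contrast shown in panel A.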
Introducing the authors
Yi Zheng is a postdoctoral researcher at the Boyce Thompson Institute. Zhangjun Fei is an Associate Professor at the Boyce Thompson Institute and an Adjunct Associate Professor in the Section of Plant Pathology and Plant-Microbe Biology, School of Integrative Plant Science, Cornell University.
About the research
VirusDetect: An automated pipeline for efficient virus discovery using deep sequencing of small RNAs
Yi Zheng, Shan Gao, Chellappan Padmanabhan, Rugang Li, Marco Galvez, Dina Gutierrez, Segundo Fuentes, Kai-Shu Ling, Jan Kreuze, Zhangjun Fei
Virology, Volume 500, January 2017, Pages 130–138