Creating ‘Phagonaute’, a web-based interface for homology searches

Read the full article on ScienceDirect.

Allowing phage synteny browsing and protein function prediction

Text by Marie-Agnès Petit

The recent renewed interest in phages, and in viruses in general, has led to the sequencing and tentative annotation of hundreds of new phage genomes. However, from this huge wave of new information the phage “dark matter” concept has also emerged, referring to the fact that the vast majority of phage genes resist annotation. Our earlier work, as well as the work of many others, had nevertheless suggested that part of the problem was due to homology search tools that are ill suited to phages. Because phage proteins are remarkably divergent, simple BLAST searches are often unproductive. This paper describes the setting up of the “Phagonaute” web interface, which allows navigation among complete phage genomes while taking this difficulty into account. Its purpose is to allow “module” comparisons across genomes, based on distant protein homologies. Within a window of 6-12 genes around a specific query gene, all homology relationships with related phages are displayed graphically, using a color code. Synteny conservation serves to strengthen any new function prediction suggested by distant homology. This tool is therefore designed to help experimentalists pick the right gene for the right experiment.
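The idea of looking at a gene's neighborhood across genomes can be illustrated with a minimal Python sketch. It is not the Phagonaute implementation: the function names, the `flank` parameter, the toy genomes and the family labels (F1, F2, F3) are all hypothetical, and homology families are assumed to have been computed beforehand by some distant-homology search.

```python
from collections import Counter

def neighborhood(genes, i, flank=3):
    """Return the gene at index i with up to `flank` genes on each side."""
    return genes[max(0, i - flank): i + flank + 1]

def cooccurrence(genomes, families, query_family, flank=3):
    """For each homology family, count the genomes in which one of its
    members lies within the window around a member of `query_family`."""
    counts = Counter()
    for genes in genomes.values():
        seen = set()  # count each family at most once per genome
        for i, gene in enumerate(genes):
            if families.get(gene) != query_family:
                continue
            for neighbor in neighborhood(genes, i, flank):
                fam = families.get(neighbor)
                if fam is not None and fam != query_family:
                    seen.add(fam)
        counts.update(seen)
    return counts

# Hypothetical toy data: gene order per genome, and gene -> family labels.
genomes = {
    "phage_A": ["a1", "a2", "a3", "a4", "a5"],
    "phage_B": ["b1", "b2", "b3", "b4"],
    "phage_C": ["c1", "c2", "c3"],
}
families = {"a2": "F1", "a3": "F2", "a5": "F3",
            "b3": "F1", "b4": "F2", "c1": "F1", "c3": "F3"}

print(cooccurrence(genomes, families, "F1", flank=1))
```

With `flank=3` the window holds up to 7 genes, within the 6-12 gene range mentioned above. A family that keeps turning up next to the query family in many genomes is the kind of conserved synteny that lends weight to a function predicted from distant homology.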

The motivation for building this site came from the fact that in the bacterial and eukaryotic worlds, such tools already exist and greatly help experimental research. One of the first websites serving this purpose was created by Ross Overbeek a long time ago and still serves today: it is the ‘show neighborhood’ function on the Integrated Microbial Genomes (IMG) site of the Joint Genome Institute. Genomicus is its counterpart (with a different design) for eukaryotes.

Before publishing this work, we used the site for our own research, mainly to interrogate genes with functions related to homologous recombination, and were surprised by the number of fruitful guesses it permitted (these are given in the paper as examples of use).

The main problem we had to overcome was the treatment of protein fusions, which could easily muddle the results. Suppose the query is a protein resulting from the fusion of an exonuclease with a recombinase: the search will indiscriminately display as ‘homologs’ many exonucleases and many recombinases, which are distinct proteins. We solved this by splitting genes into ‘domains’ (which are not functional domains, but domains of homology). This gives the output graphic an additional level of refinement: the mapping of the homology region.
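One simple way to picture such a split is to merge the regions of the query that its hits cover. The Python sketch below does this under stated assumptions; it is not the procedure used in the paper, and the function name, the `min_overlap` parameter and the hit coordinates are hypothetical.

```python
def homology_domains(hits, min_overlap=1):
    """Group hits into regions of the query protein.

    hits: list of (hit_id, start, end) in query coordinates, 1-based inclusive.
    A hit joins the current region when it overlaps it by at least
    `min_overlap` residues; otherwise it opens a new region.
    """
    domains = []
    for hit_id, start, end in sorted(hits, key=lambda h: (h[1], h[2])):
        if domains:
            current = domains[-1]
            overlap = min(end, current["end"]) - start + 1
            if overlap >= min_overlap:
                current["end"] = max(current["end"], end)
                current["members"].append(hit_id)
                continue
        domains.append({"start": start, "end": end, "members": [hit_id]})
    return domains

# Hypothetical hits on a fused exonuclease-recombinase query.
hits = [
    ("exo_1", 1, 200),
    ("exo_2", 10, 210),
    ("rec_1", 230, 450),
    ("rec_2", 240, 440),
]
for domain in homology_domains(hits):
    print(domain["start"], domain["end"], domain["members"])
```

Here the exonuclease-like and recombinase-like hits fall into two separate regions, so each can be reported against its own part of the query. This naive merging has a known weakness: a hit that spans the whole query, such as another fusion protein, would bridge the two regions into one, so a real implementation would need to treat such hits separately.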

We hope this new phage tool will lead to many exciting discoveries and will help, with time, to shrink the phage dark matter.

Figure legend

Using co-occurrence to detect new functions. Starting from the hkaK gene (in red), of unknown function and encoded by phage HK620 (which infects E. coli), Phagonaute displays a large list of distant homologs, some of which are annotated as SSB (see legend). In addition, the sak4 gene (also named hkaL, in green, to the right of hkaK in HK620) often co-occurs with this putative ssb.

Introducing the author

Marie-Agnès Petit

About the research

Phagonaute: A web-based interface for phage synteny browsing and protein function prediction
Virology, Volume 496, September 2016, Pages 42–50
Hadrien Delattre, Oussema Souiai, Khema Fagoonee, Raphaël Guerois, Marie-Agnès Petit