Home [Briefings in Bioinformatics] ViBE: a hierarchical BERT model to identify eukaryotic viruses using metagnome sequencing data
Post
Cancel

[Briefings in Bioinformatics] ViBE: a hierarchical BERT model to identify eukaryotic viruses using metagnome sequencing data

JournalDateIFAuthorshipdoi
Briefings in BioinformaticsJul, 202211.622First Authorhttps://doi.org/10.1093/bib/bbac204

ViBE: a hierarchical BERT model to identify eukaryotic viruses using metagnome sequencing data

Ho-Jin Gwak, Mina Rho

Abstract

Viruses are ubiquitous in humans and various environments and continually mutate themselves. Identifying viruses in an environment without cultivation is challenging; however, promoting the screening of novel viruses and expanding the knowledge of viral space is essential. Homology-based methods that identify viruses using known viral genomes rely on sequence alignments, making it difficult to capture remote homologs of the known viruses. To accurately capture viral signals from metagenomic samples, models are needed to understand the patterns encoded in the viral genomes. In this study, we developed a hierarchical BERT model named ViBE to detect eukaryotic viruses from metagenome sequencing data and classify them at the order level. We pre-trained ViBE using read-like sequences generated from the virus reference genomes and derived three fine-tuned models that classify paired-end reads to orders for eukaryotic deoxyribonucleic acid viruses and eukaryotic ribonucleic acid viruses. ViBE achieved higher recall than state-of-the-art alignment-based methods while maintaining comparable precision. ViBE outperformed state-of-the-art alignment-free methods for all test cases. The performance of ViBE was also verified using real sequencing datasets, including the vaginal virome.

This post is licensed under CC BY 4.0 by the author.