A Classification Approach for Genome Structural Variations Detection

Background: Accurately finding genome structural variations (SVs) is important for understanding phenotype diversity and complex diseases. Little research is available on using classification to find SVs from next-generation sequencing data. In addition, existing algorithms depend mainly on analyzing the alignment signatures of paired-end reads to predict the different types of variations. Here, the candidate SV regions and their features are computed using single reads only, and classification is used to predict the variation types of these regions.

Results: Our approach uses reads with multi-part alignments to define a set of possible SV regions. To annotate these regions, we extract novel features based on the reads at the breakpoints. We then build three random forest classifiers to identify regions with deletions, inversions, or tandem duplications.

Conclusions: This paper proposes a random forest-based classification approach, MPRClassify, which addresses the problem of finding SVs using single reads only. These single reads are used to define candidate regions and extract their features. Experimental results show that single reads are sufficient to find SVs without the need for paired-end read signatures. Our proposed approach outperforms existing approaches and serves as a basis for future studies on finding SVs using single reads.
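The abstract does not spell out how multi-part alignments are turned into candidate regions, so the following is only a minimal sketch of one way that step could look, not the MPRClassify implementation. The `Segment` record, the `candidate_region` and `collect_candidates` helpers, and the `min_support` threshold are all hypothetical names introduced for illustration. The sketch also assumes that a breakpoint pair is simply the reference end of a read's first aligned part and the reference start of its second part, ignoring strand, and that supporting reads agree on exact coordinates.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One aligned part of a single read (hypothetical record, not a real file format)."""
    chrom: str
    ref_start: int   # 0-based start on the reference
    ref_end: int     # exclusive end on the reference
    read_start: int  # offset of this part within the read
    strand: str      # "+" or "-" (carried along, unused in this sketch)


def candidate_region(parts):
    """Return (chrom, start, end) between the breakpoints of a two-part read, or None."""
    if len(parts) != 2:
        return None
    # Order the parts by their position within the read.
    first, second = sorted(parts, key=lambda s: s.read_start)
    if first.chrom != second.chrom:
        return None
    # Assumption: breakpoints are where the first part stops and the second resumes.
    bp1, bp2 = first.ref_end, second.ref_start
    start, end = min(bp1, bp2), max(bp1, bp2)
    if start == end:
        return None
    return (first.chrom, start, end)


def collect_candidates(reads, min_support=2):
    """Count reads per candidate region and keep regions with enough support."""
    support = Counter()
    for parts in reads:
        region = candidate_region(parts)
        if region is not None:
            support[region] += 1
    return sorted((region, n) for region, n in support.items() if n >= min_support)


# Toy input: two reads split around the same gap, one unsplit read,
# and one read whose parts map to different chromosomes.
reads = [
    [Segment("chr1", 100, 150, 0, "+"), Segment("chr1", 400, 450, 50, "+")],
    [Segment("chr1", 120, 150, 0, "+"), Segment("chr1", 400, 470, 30, "+")],
    [Segment("chr1", 900, 950, 0, "+")],
    [Segment("chr1", 100, 150, 0, "+"), Segment("chr2", 400, 450, 50, "+")],
]
candidates = collect_candidates(reads)
```

In this toy input only the region between positions 150 and 400 on `chr1` has two supporting reads, so it is the only candidate kept. In the approach described above, such regions would then be annotated with breakpoint-based features and passed to the classifiers; neither of those steps is shown here.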

Magazine \ Newspaper

Journal of Proteomics & Bioinformatics

Pages

211-218

More publications

Biomolecular databases and subnetwork identification approaches of interest to Big Data community: An Expert Review

Next-generation sequencing approaches and genome-wide studies have become essential for characterizing the mechanisms of human diseases.

2019

The Effect of Machine Learning Algorithms on Metagenomics Gene Prediction

The development of next-generation sequencing facilitates the study of metagenomics. Computational gene prediction aims to find the location of genes in a given DNA sequence. Gene prediction in…

by Achraf El Allali

2019

CNN-MGP: Convolutional Neural Networks for Metagenomics Gene Prediction

Accurate gene prediction in metagenomics fragments is a computationally challenging task due to the short-read length, incomplete, and fragmented nature of the data. Most gene-prediction programs…

2018