CSC 597/598: Masters' Projects
Research interest:
Bioinformatics:
My main research interest lies in facilitating the integration of experimental and computational research, in particular computational genomics and metagenomics. Genome projects and metagenomic projects have produced large sequence databases that remain unanalyzed. My research aims to develop algorithms and tools to analyze the large amount of data from sequencing projects. As computation becomes the more costly part of the sequence analysis pipeline, more efficient algorithms become crucial.
Main areas of interest:
- Metagenomics analysis
- Gene finding
- Variant calling
- Structural variations
Machine Learning:
Machine learning has become a major research field in computer science since artificial intelligence and pattern recognition theories emerged a few decades ago. My main interest in machine learning is to develop algorithms that solve computational problems using big data and machine learning techniques. I am currently working on projects that apply machine learning and deep learning algorithms in the domains of bioinformatics, neuroscience and neuroimaging.
Proposals:
Current/Previous Projects:
- Over the past few years, new gene-finding algorithms that extract genes or partial genes directly from NGS reads have surfaced to bypass the assembly problem. One of the most recent methods designed for predicting genes is Metagenomic Gene Caller (MGC). MGC reaches accuracies higher than all available gene finders for metagenomics. However, it is only available as prototype code and remains inaccessible to researchers. This project aims to implement the MGC algorithm as an efficient application that researchers can use to predict genes in large metagenomic studies as part of the new metagenomics pipeline. The tool should be efficient, scalable and easy to use for researchers who have little or no experience in information technology.
- DNA sequence compression has recently attracted a lot of interest in the scientific community due to the increase in available genomic data resulting from breakthroughs in high-throughput sequencing. The purpose of this project is to design and implement an algorithm capable of compressing several genomic file formats by utilizing the best compression technique for each type of data and by considering the tradeoffs specified by the end user. The algorithm will be made available as an easy-to-use tool to benefit biologists and researchers who may not be technologically savvy enough to install and run sophisticated tools.
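As a starting point for the compression project, here is a minimal sketch of one basic technique, packing each nucleotide into 2 bits. It is not the project's algorithm. The names `pack` and `unpack` are hypothetical, and the sketch assumes sequences contain only A, C, G and T; real genomic files also carry ambiguity codes such as N, headers and quality scores, which a full tool would have to handle separately.

```python
# Illustrative sketch only: 2-bit packing of a DNA string (assumes only A, C, G, T).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq: str) -> bytes:
    """Pack four bases into each byte, first base in the highest two bits."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        # Left-align a short final chunk so decoding is position-independent.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Decode packed bytes back to a string; `length` trims the padding."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 3])
    return "".join(bases[:length])


if __name__ == "__main__":
    seq = "ACGTAC"
    packed = pack(seq)
    print(len(seq), "bases ->", len(packed), "bytes")
    print(unpack(packed, len(seq)) == seq)
```

The original sequence length must be stored alongside the packed bytes, since the last byte may contain padding. This fixed 4:1 reduction over one-byte-per-base text is only a baseline; the per-format, tradeoff-aware design described above would go well beyond it.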