Showing posts with label Machine Learning. Show all posts

Tuesday, 29 August 2017

Microbial mediated drug metabolism

A recent publication from our lab:
A novel approach for the prediction of species-specific biotransformation of xenobiotic/drug molecules by the human gut microbiota

Abstract
The human gut microbiota is constituted of a diverse group of microbial species harbouring an enormous metabolic potential, which can alter the metabolism of orally administered drugs leading to individual/population-specific differences in drug responses. Considering the large heterogeneous pool of human gut bacteria and their metabolic enzymes, investigation of species-specific contribution to xenobiotic/drug metabolism by experimental studies is a challenging task. Therefore, we have developed a novel computational approach to predict the metabolic enzymes and gut bacterial species, which can potentially carry out the biotransformation of a xenobiotic/drug molecule. A substrate database was constructed for metabolic enzymes from 491 available human gut bacteria. The structural properties (fingerprints) from these substrates were extracted and used for the development of random forest models, which displayed average accuracies of up to 98.61% and 93.25% on cross-validation and blind set, respectively. After the prediction of EC subclass, the specific metabolic enzyme (EC) is identified using a molecular similarity search. The performance was further evaluated on an independent set of FDA-approved drugs and other clinically important molecules. To our knowledge, this is the only available approach implemented as ‘DrugBug’ tool for the prediction of xenobiotic/drug metabolism by metabolic enzymes of human gut microbiota.
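The molecular similarity search mentioned in the abstract can be illustrated with a small sketch. It assumes that fingerprints are represented as sets of "on" bit indices and that similarity is measured with the Tanimoto coefficient, a common choice for fingerprints; the abstract does not say which metric DrugBug actually uses, and the substrate names and fingerprints below are made up for illustration.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two fingerprints.

    Each fingerprint is a collection of 'on' bit indices (an assumption
    made for this sketch, not a detail taken from the paper).
    """
    a, b = set(a), set(b)
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints: define similarity as 0
    return len(a & b) / union


def most_similar(query, substrates):
    """Return (name, score) of the known substrate closest to the query."""
    scored = ((name, tanimoto(query, fp)) for name, fp in substrates.items())
    return max(scored, key=lambda pair: pair[1])


# Hypothetical substrate fingerprints, for illustration only.
substrates = {
    "substrate_A": {1, 4, 7, 9},
    "substrate_B": {2, 4, 8},
}
query = {1, 4, 7}
best = most_similar(query, substrates)
print(best)
```

In a real pipeline the fingerprints would come from a cheminformatics toolkit, and the best-matching substrate would point to the enzyme (EC number) that acts on it.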

Please explore and write back to me (ashoks773@gmail.com) in case of any problem. Comments are welcome.

Monday, 2 November 2015

Machine learning for metagenomics

This review covers most of the machine learning methods used so far for the analysis of metagenomic data. The full text is available at the link below.
http://arxiv.org/pdf/1510.06621v1.pdf

Tuesday, 14 April 2015

Woods: A fast and accurate functional annotator and classifier of genomic and metagenomic sequences

Time to analyse your data using Woods, "a functional annotator and classifier of genomic and metagenomic sequences".
A recent publication from our lab. Please explore and write back to me (ashok@iiserb.ac.in) in case of any problem. Comments are welcome.

Thursday, 19 March 2015

Composition based methods for taxonomic classification

Taxonomic classification of 16S rRNA or metagenomic sequencing reads is one of the most important steps in understanding the diversity of microbes within a microbial community. Methods for taxonomic classification fall into two major categories based on their algorithms: composition-based methods and similarity-based methods. Here, I discuss composition-based methods in detail. In my next post, I will most likely focus on similarity-based approaches.

1. TETRA: a web-service and a stand-alone program for the analysis and comparison of tetranucleotide usage patterns in DNA sequences.

TETRA provides a statistical analysis of tetranucleotide usage patterns in genomic fragments, either via a web-service or a stand-alone program.
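As a rough illustration of what comparing tetranucleotide usage patterns means, the sketch below builds a 256-dimensional frequency profile for each sequence and compares two profiles with the Pearson correlation. This is a simplification: it uses raw frequencies rather than the statistical normalisation a dedicated tool would apply, and the function names and toy sequences are my own.

```python
from collections import Counter
from itertools import product
from math import sqrt

# All 256 possible tetranucleotides, in a fixed order.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetra_profile(seq):
    """Relative frequency of each tetranucleotide in seq.

    Windows containing characters other than A, C, G, T are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[t] for t in TETRAMERS)
    if total == 0:
        return [0.0] * len(TETRAMERS)
    return [counts[t] / total for t in TETRAMERS]


def pearson(x, y):
    """Pearson correlation between two equal-length numeric vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # correlation is undefined for a constant vector
    return cov / (sx * sy)


# Toy fragments, for illustration only.
frag_a = "ACGTACGTACGGTACCGTAACGT"
frag_b = "TTTTGGGGCCCCAAAATTTTGGGG"
print(pearson(tetra_profile(frag_a), tetra_profile(frag_b)))
```

Fragments from the same genome tend to give more strongly correlated profiles than fragments from unrelated genomes, which is the signal composition-based methods exploit.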

2. PhyloPythia: Accurate phylogenetic classification of variable-length DNA fragments.
PhyloPythia is a composition-based classifier that combines higher-level generic clades from a set of 340 completed genomes with sample-derived population models. Extensive analyses on synthetic and real metagenome data sets showed that PhyloPythia allows the accurate classification of most sequence fragments across all considered taxonomic ranks, even for unknown organisms.
Available at:
http://cbcsrv.watson.ibm.com/phylopythia.html

3. PhyloPythiaS: The PhyloPythiaS Web Server for Taxonomic Assignment of Metagenome Sequences.
PhyloPythiaS is a fast and accurate sequence composition-based classifier that utilizes the hierarchical relationships between clades. PhyloPythiaS is freely available for non-commercial users and can be installed on a Linux-based machine.

4. TACOA: Taxonomic classification of environmental genomic fragments using a kernelized nearest neighbor approach
The classifier combines the idea of the k-nearest neighbor with strategies from kernel-based learning. It is an accurate multi-class taxonomic classifier for environmental genomic fragments. TACOA can predict with high reliability the taxonomic origin of genomic fragments as short as 800 bp.
http://www.biomedcentral.com/1471-2105/10/56#B20

5. RAIphy: Phylogenetic classification of metagenomics samples using iterative refinement of relative abundance index profiles
RAIphy is a composition-based semisupervised binning algorithm that uses a novel sequence similarity metric with iterative refinement of taxonomic models and functions effectively. RAIphy has been implemented as a simple, compact standalone desktop application, which is fast compared to similarity-search-based applications. While achieving competitive binning accuracies for the DNA sequencing read length range (100-1000 bp), the method also performs accurately for longer environmental contigs.

6. NBC: the Naive Bayes Classification tool webserver for taxonomic classification of metagenomic reads.
NBC is a webserver that implements the naïve Bayes classifier to assign each metagenomic read to its best taxonomic match. Results indicate that NBC can assign next-generation sequencing reads to their taxonomic classification and can find significant populations of genera that other classifiers may miss.
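The idea behind a naïve Bayes read classifier can be sketched in a few lines. The version below is a minimal illustration, not the NBC implementation: it assumes a uniform prior over taxa, add-one (Laplace) smoothing, and a very small k so the toy data stays readable. All names and sequences are hypothetical.

```python
from collections import Counter
from math import log


def kmers(seq, k):
    """All overlapping k-mers of seq, in order."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def train(genomes, k):
    """Count k-mers per taxon. genomes maps taxon name -> reference sequence."""
    model = {}
    for taxon, seq in genomes.items():
        counts = Counter(kmers(seq, k))
        model[taxon] = (counts, sum(counts.values()))
    return model


def classify(read, model, k):
    """Return the taxon with the highest log-likelihood for the read.

    Each k-mer is treated as independent given the taxon (the 'naive'
    assumption); add-one smoothing avoids log(0) for unseen k-mers.
    """
    vocab = 4 ** k  # number of possible DNA k-mers
    best_taxon, best_score = None, float("-inf")
    for taxon, (counts, total) in model.items():
        score = sum(log((counts[km] + 1) / (total + vocab))
                    for km in kmers(read, k))
        if score > best_score:
            best_taxon, best_score = taxon, score
    return best_taxon


# Toy reference "genomes", for illustration only.
genomes = {
    "taxon_AT": "ATATATATATATATAT",
    "taxon_GC": "GCGCGCGCGCGCGCGC",
}
model = train(genomes, k=3)
print(classify("ATATAT", model, k=3))
```

Real tools use much longer k-mers and full reference genomes, but the scoring principle is the same.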

7. Phymm and PhymmBL: Metagenomic Phylogenetic Classification with Interpolated Markov Models

Phymm is a classifier for metagenomic data that has been trained on 539 complete, curated genomes and can accurately classify reads as short as 100 bp, which its authors present as a substantial leap forward over previous composition-based classification methods. The authors also describe how combining Phymm with sequence alignment algorithms, as in PhymmBL, further improves accuracy.

8. GSTaxClassifier: a genomic signature based taxonomic classifier for metagenomic data analysis.
GSTaxClassifier takes nucleotide sequences as input and, using a modified Bayesian model, evaluates the genomic signatures between metagenomic query sequences and reference genome databases. Simulation studies on numerical data sets showed that GSTaxClassifier could serve as a useful program for metagenomics studies.

9. SPHINX: an algorithm for taxonomic binning of metagenomic sequences.
SPHINX is a hybrid binning approach that achieves high binning efficiency by utilizing the principles of both 'composition'- and 'alignment'-based binning algorithms.

10. TAC-ELM: Metagenomic taxonomic classification using extreme learning machines.
TAC-ELM is a sequence composition-based taxonomic classifier for metagenomic analysis that uses extreme learning machines. It uses the framework of extreme learning machines to quickly and accurately learn the weights for a neural network model. The input features consist of GC content and oligonucleotides.

11. AKE - the Accelerated k-mer Exploration web-tool for rapid taxonomic classification and visualization
Acceleration in AKE’s taxonomic assignments is achieved by a special machine learning architecture, which is well suited to model data collections that are intrinsically hierarchical. 

12. TAXSOM: Practical application of self-organizing maps to interrelate biodiversity and functional data in NGS-based metagenomics.
Biodiversity is usually targeted by classifying 16S ribosomal RNA genes, while metagenomic approaches target metabolic genes. However, both approaches remain isolated, as long as the taxonomic and functional information cannot be interrelated. Techniques like self-organizing maps (SOMs) have been applied to cluster metagenomes into taxon-specific bins in order to link biodiversity with functions.

13. Kraken: ultrafast metagenomic sequence classification using exact alignments

Kraken is an ultrafast and highly accurate program for assigning taxonomic labels to metagenomic DNA sequences. Using exact alignment of k-mers, Kraken achieves classification accuracy comparable to the fastest BLAST program. 
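To give a feel for exact k-mer matching, here is a deliberately simplified sketch. It indexes every k-mer of each reference sequence, then labels a read by majority vote over the k-mers that occur in exactly one taxon. Kraken itself resolves shared k-mers through the taxonomy tree rather than by a flat vote, so treat this only as an illustration of the lookup step; all names and sequences are hypothetical.

```python
from collections import Counter


def build_index(references, k):
    """Map each k-mer to the set of taxa whose reference contains it."""
    index = {}
    for taxon, seq in references.items():
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            index.setdefault(seq[i:i + k], set()).add(taxon)
    return index


def classify(read, index, k):
    """Label a read by majority vote over its taxon-specific k-mers.

    k-mers shared by several taxa are skipped here (a simplification).
    Returns None when no k-mer of the read hits the index.
    """
    read = read.upper()
    votes = Counter()
    for i in range(len(read) - k + 1):
        taxa = index.get(read[i:i + k])
        if taxa is not None and len(taxa) == 1:
            votes[next(iter(taxa))] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Toy references, for illustration only.
references = {
    "taxon_1": "ACGTACGTGGA",
    "taxon_2": "TTGACCATTGC",
}
index = build_index(references, k=5)
print(classify("CGTACGTG", index, k=5))
```

Because each lookup is a dictionary access rather than an alignment, classification time grows only with read length, which is where the speed of this family of methods comes from.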

14. RDP Classifier: Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy.

The Ribosomal Database Project (RDP) Classifier, a naïve Bayesian classifier, can rapidly and accurately classify bacterial 16S rRNA sequences into the new higher-order taxonomy proposed in Bergey's Taxonomic Outline of the Prokaryotes. For shorter rRNA segments, such as those that might be generated by pyrosequencing, the error rate varied greatly over the length of the 16S rRNA gene. The RDP Classifier is suitable both for the analysis of single rRNA sequences and for the analysis of libraries of thousands of sequences.
Available at: http://rdp.cme.msu.edu/

15. 16S Classifier: A Tool for Fast and Accurate Taxonomic Classification of 16S rRNA Hypervariable Regions in Metagenomic Datasets

16S Classifier was developed using a machine learning method, Random Forest, for fast and accurate taxonomic classification of short hypervariable regions of the 16S rRNA sequence. It displayed precision values of up to 0.91 on training datasets and up to 0.98 on the test dataset. On real metagenomic datasets, it showed up to 99.7% accuracy at the phylum level and up to 99.0% accuracy at the genus level.

Sunday, 8 February 2015

16S Classifier: A Tool for Fast and Accurate Taxonomic Classification of 16S rRNA Hypervariable Regions in Metagenomic Datasets

Time to analyze your 16S rRNA data using 16S Classifier
A recent publication from our lab. Please explore and write back to me (ashok@iiserb.ac.in) in case of any problem. Comments are welcome.
To the best of our knowledge, 16S Classifier is the only available tool which can carry out the efficient, sensitive and accurate taxonomic assignment of any of the 16S rRNA hypervariable regions which are commonly used in metagenomic projects. In the case of complete 16S rRNA also, it displayed exceptional (precision of 0.97) performance on the test dataset. Thus, the wide usage of this tool is anticipated in different metagenomic projects. 16S Classifier is available freely at 
http://metagenomics.iiserb.ac.in/16Sclassifier
http://metabiosys.iiserb.ac.in/16Sclassifier
Instructions for running the stand-alone version of 16S Classifier on a Linux PC:
1. Download the zip file for a particular hypervariable region or for complete 16S, freely available at
http://metagenomics.iiserb.ac.in/16Sclassifier/download.html
2. Extract the zip file, which contains a model file (*.Rdata), a script file (*.sh) and an executable file (16sclassifier.exe).

Other dependencies

1. Install R from the following link
http://cran.r-project.org/
2. Install the randomForest package: start R by typing R in a terminal, then run install.packages('randomForest')


Command line usage: ./16sclassifier.exe 'queryfile' 'modelname'

The query file should be in FASTA format, and the model name can be one of v2, v3, v4, v5, v6, v7, v8, v23, v34, v35, v45, v56, v67, v78 or Complete16S.
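If you run the tool from a script, it can help to check the arguments before launching it. The sketch below is only a convenience wrapper of my own, not part of the 16S Classifier distribution: it checks that the model name is one of those listed above, does a very light check that the query looks like FASTA, and builds the argument list for the command shown earlier. The file name reads.fasta is a placeholder.

```python
# Model names as listed in the usage notes above.
VALID_MODELS = {
    "v2", "v3", "v4", "v5", "v6", "v7", "v8",
    "v23", "v34", "v35", "v45", "v56", "v67", "v78",
    "Complete16S",
}


def looks_like_fasta(text):
    """Light check: the first non-empty line must start with '>'."""
    for line in text.splitlines():
        if line.strip():
            return line.startswith(">")
    return False  # empty input is not valid FASTA


def build_command(query_file, model_name, exe="./16sclassifier.exe"):
    """Return the argument list for the documented command line usage."""
    if model_name not in VALID_MODELS:
        raise ValueError("unknown model name: " + model_name)
    return [exe, query_file, model_name]


# 'reads.fasta' is a placeholder file name.
print(build_command("reads.fasta", "v3"))
```

The returned list can be passed to subprocess.run from the directory where the zip file was extracted.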