Environmental research has been revolutionized by sequencing technologies that generate huge amounts of data by directly sequencing DNA from the environment. Analysis of these data using computational biology provides valuable insights into novel microorganisms, their functions and metabolic pathways. This blog aims to enhance understanding of the different tools, algorithms and pipelines for the study of microbial diversity in different environments. Contact me: ashoks773@gmail.com
Tuesday, 29 August 2017
Monday, 2 November 2015
Machine learning for metagenomics
This review covers most of the machine learning methods used till now for the analysis of metagenomic data. You can read the full text from the link below.
http://arxiv.org/pdf/1510.06621v1.pdf
Tuesday, 14 April 2015
Woods: A fast and accurate functional annotator and classifier of genomic and metagenomic sequences
Time to analyse your data using Woods, "a functional annotator and classifier of genomic and metagenomic sequences"
A recent publication out from our lab. Please explore and write back to (ashok@iiserb.ac.in) in case of any problem. Comments are welcome.
Available at: http://www.ncbi.nlm.nih.gov/pubmed/25863333
Thursday, 19 March 2015
Composition based methods for taxonomic classification
Taxonomic classification of 16S rRNA or metagenomic sequencing reads is one of the most important steps in understanding the diversity of microbes within a microbial community. Based on their algorithms, methods for taxonomic classification are divided into two major categories: composition-based methods and similarity-based methods. Here, I discuss composition-based methods in detail. In my next post, I will most probably focus on similarity-based approaches.
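To make the idea of "composition" concrete: composition-based methods typically summarise a sequence by how often each short oligonucleotide (k-mer) occurs in it, and then compare or classify these profiles. The Python sketch below is only an illustration under that assumption. It is not the implementation of any tool listed in this post, and the function names kmer_profile and pearson, as well as the two toy fragments, are hypothetical.

```python
from collections import Counter
from itertools import product
import math


def kmer_profile(seq, k=4):
    """Return the relative frequency of every possible DNA k-mer in seq.

    The result has 4**k entries, ordered alphabetically (AAAA, AAAC, ...).
    Windows containing ambiguous bases such as N are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    all_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in all_kmers)
    if total == 0:
        return [0.0] * len(all_kmers)
    return [counts[m] / total for m in all_kmers]


def pearson(x, y):
    """Pearson correlation between two equally long profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


# Toy fragments (made up for this example, not real genomic data).
fragment_a = "ATGCGCGATATCGCGCGATATATCGCGCGCTAGCGCGATCG"
fragment_b = "ATATATTTAAATATTTAATTAAATATATTTTAAATTATATA"

profile_a = kmer_profile(fragment_a)
profile_b = kmer_profile(fragment_b)
print("correlation a vs a:", pearson(profile_a, profile_a))
print("correlation a vs b:", pearson(profile_a, profile_b))
```

Fragments from the same genome are expected to give more similar profiles than fragments from unrelated genomes, which is the signal the tools below exploit in different ways.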
1. TETRA: a web-service and a stand-alone program for the analysis and comparison of tetranucleotide usage patterns in DNA sequences.
TETRA provides a statistical analysis of tetranucleotide usage patterns in genomic fragments, either via a web-service or a stand-alone program.
Available at: http://www.megx.net/tetra
2. PhyloPythia: Accurate phylogenetic classification of variable-length DNA fragments.
PhyloPythia is a composition-based classifier that combines higher-level generic clades from a set of 340 completed genomes with sample-derived population models. Extensive analyses on synthetic and real metagenome data sets showed that PhyloPythia allows the accurate classification of most sequence fragments across all considered taxonomic ranks, even for unknown organisms.
Available at: http://cbcsrv.watson.ibm.com/phylopythia.html
3. PhyloPythiaS
PhyloPythiaS is a fast and accurate sequence composition-based classifier that utilizes the hierarchical relationships between clades. PhyloPythiaS is freely available for non-commercial users and can be installed on a Linux-based machine.
4. TACOA: Taxonomic classification of environmental genomic fragments using a kernelized nearest neighbor approach
The classifier combines the idea of the k-nearest neighbor with strategies from kernel-based learning. It is an accurate multi-class taxonomic classifier for environmental genomic fragments. TACOA can predict with high reliability the taxonomic origin of genomic fragments as short as 800 bp.
Available at: http://www.biomedcentral.com/1471-2105/10/56#B20
5. RAIphy: Phylogenetic classification of metagenomics samples using iterative refinement of relative abundance index profiles
RAIphy is a composition-based semisupervised binning algorithm that uses a novel sequence similarity metric with iterative refinement of taxonomic models and functions effectively. RAIphy has been implemented as a simple, compact standalone desktop application, which is fast compared to similarity-search-based applications. While achieving competitive binning accuracies for the DNA sequencing read length range (100-1000 bp), the method also performs accurately for longer environmental contigs.
Available at: http://bioinfo.unl.edu/raiphy.php
6. NBC: the Naive Bayes Classification tool webserver for taxonomic classification of metagenomic reads.
NBC is a webserver that implements the naïve Bayes classifier to assign all metagenomic reads to their best taxonomic match. Results indicate that NBC can assign next-generation sequencing reads to their taxonomic classification and can find significant populations of genera that other classifiers may miss.
Available at: http://nbc.ece.drexel.edu
7. Phymm and PhymmBL: Metagenomic Phylogenetic Classification with Interpolated Markov Models
Phymm is a classifier for metagenomic data that has been trained on 539 complete, curated genomes and can accurately classify reads as short as 100 bp, representing a substantial leap forward over previous composition-based classification methods. The authors also describe how combining Phymm with sequence alignment algorithms further improves accuracy.
Available at: http://www.cbcb.umd.edu/software/phymm/
8. GSTaxClassifier: a genomic signature based taxonomic classifier for metagenomic data analysis.
GSTaxClassifier takes nucleotide sequences as input and uses a modified Bayesian model to evaluate the genomic signatures between metagenomic query sequences and reference genome databases. Simulation studies on numerical data sets showed that GSTaxClassifier could serve as a useful program for metagenomics studies.
Available at: http://helix2.biotech.ufl.edu:26878/metagenomics/
9. SPHINX: an algorithm for taxonomic binning of metagenomic sequences.
SPHINX is a hybrid binning approach that achieves high binning efficiency by utilizing the principles of both 'composition'- and 'alignment'-based binning algorithms.
Available at: http://metagenomics.atc.tcs.com/SPHINX/
10. TAC-ELM: Metagenomic taxonomic classification using extreme learning machines.
TAC-ELM is a new sequence composition-based taxonomic classifier for metagenomic analysis. It uses the framework of extreme learning machines to quickly and accurately learn the weights for a neural network model. The input features consist of GC content and oligonucleotides.
Available at: http://www.cs.gmu.edu/~mlbio/TAC-ELM/
11. AKE - the Accelerated k-mer Exploration web-tool for rapid taxonomic classification and visualization
Acceleration in AKE’s taxonomic assignments is achieved by a special machine learning architecture, which is well suited to model data collections that are intrinsically hierarchical.
Available at: https://ani.cebitec.uni-bielefeld.de/ake/login.html
12. TAXSOM: Practical application of self-organizing maps to interrelate biodiversity and functional data in NGS-based metagenomics.
Biodiversity is usually targeted by classifying 16S ribosomal RNA genes, while metagenomic approaches target metabolic genes. However, both approaches remain isolated, as long as the taxonomic and functional information cannot be interrelated. Techniques like self-organizing maps (SOMs) have been applied to cluster metagenomes into taxon-specific bins in order to link biodiversity with functions.
Available at: http://soma.arb-silva.de/
13. Kraken: ultrafast metagenomic sequence classification using exact alignments
Kraken is an ultrafast and highly accurate program for assigning taxonomic labels to metagenomic DNA sequences. Using exact alignment of k-mers, Kraken achieves classification accuracy comparable to the fastest BLAST program.
Available at: http://ccb.jhu.edu/software/kraken/
14. RDP Classifier: Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy.
The Ribosomal Database Project (RDP) Classifier, a naïve Bayesian classifier, can rapidly and accurately classify bacterial 16S rRNA sequences into the new higher-order taxonomy proposed in Bergey's Taxonomic Outline of the Prokaryotes. For shorter rRNA segments, such as those that might be generated by pyrosequencing, the error rate varied greatly over the length of the 16S rRNA gene. The RDP Classifier is suitable both for the analysis of single rRNA sequences and for the analysis of libraries of thousands of sequences.
Available at: http://rdp.cme.msu.edu/
15. 16S Classifier: A Tool for Fast and Accurate Taxonomic Classification of 16S rRNA Hypervariable Regions in Metagenomic Datasets
16S Classifier was developed using a machine learning method, Random Forest, for fast and accurate taxonomic classification of short hypervariable regions of the 16S rRNA sequence. It displayed precision values of up to 0.91 on training datasets and up to 0.98 on the test dataset. On real metagenomic datasets, it showed up to 99.7% accuracy at the phylum level and up to 99.0% accuracy at the genus level.
Available at: http://metagenomics.iiserb.ac.in/16Sclassifier
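Several of the tools above (NBC and the RDP Classifier, for example) are described as naïve Bayes classifiers. As a rough illustration of that idea, here is a toy naïve Bayes classifier over k-mers in Python. It is a minimal sketch, not the code of either tool: the value k=3, the add-one smoothing, the uniform prior over taxa, the made-up reference sequences and the names kmers, train and classify are all assumptions chosen for the example.

```python
from collections import Counter
import math


def kmers(seq, k):
    """List the k-mers of seq that contain only A, C, G or T."""
    seq = seq.upper()
    windows = (seq[i:i + k] for i in range(len(seq) - k + 1))
    return [w for w in windows if set(w) <= set("ACGT")]


def train(references, k=3):
    """Count k-mers per taxon.

    references maps a taxon label to a list of training sequences.
    Returns (counts_per_taxon, k).
    """
    counts = {
        taxon: Counter(m for seq in seqs for m in kmers(seq, k))
        for taxon, seqs in references.items()
    }
    return counts, k


def classify(model, read):
    """Return the taxon with the highest log-likelihood for read, or None."""
    counts, k = model
    words = kmers(read, k)
    if not words:
        return None  # read is too short or fully ambiguous
    vocab = 4 ** k  # number of possible DNA k-mers, used for add-one smoothing
    best_taxon, best_score = None, -math.inf
    for taxon, table in counts.items():
        total = sum(table.values())
        # Naive assumption: k-mers are independent given the taxon,
        # so the log-probabilities simply add up (uniform prior over taxa).
        score = sum(math.log((table[w] + 1) / (total + vocab)) for w in words)
        if score > best_score:
            best_taxon, best_score = taxon, score
    return best_taxon


# Toy reference set (hypothetical labels and sequences).
references = {
    "taxonA": ["AAAAAAAAAAAAAACAAAAA"],
    "taxonB": ["GCGCGCGCGCGCGCGCGCGC"],
}
model = train(references)
print(classify(model, "AAAAAA"))
print(classify(model, "GCGCGC"))
```

Real classifiers of this kind are trained on large reference databases and usually use longer k-mers, so this toy version only shows the scoring principle.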
Sunday, 8 February 2015
16S Classifier: A Tool for Fast and Accurate Taxonomic Classification of 16S rRNA Hypervariable Regions in Metagenomic Datasets
Time to analyze your 16S rRNA data using 16S Classifier
A recent publication out from our lab. Please explore and write back to (ashok@iiserb.ac.in) in case of any problem. Comments are welcome.
To the best of our knowledge, 16S Classifier is the only available tool that can carry out efficient, sensitive and accurate taxonomic assignment of any of the 16S rRNA hypervariable regions commonly used in metagenomic projects. For the complete 16S rRNA as well, it displayed exceptional performance (precision of 0.97) on the test dataset. Thus, wide usage of this tool is anticipated in different metagenomic projects. 16S Classifier is freely available at
http://metagenomics.iiserb.ac.in/16Sclassifier
http://metabiosys.iiserb.ac.in/16Sclassifier
Instructions for running the stand-alone version of 16S Classifier on a Linux PC.
1. Download the zip file for a particular hypervariable region or for the complete 16S, freely available at http://metagenomics.iiserb.ac.in/16Sclassifier/download.html
2. Extract the zipped file, which contains a model file (*.Rdata), a script file (*.sh) and an executable file (16sclassifier.exe).
Other dependencies
1. Install R from the following link: http://cran.r-project.org/
2. Install the randomForest package: type R in the terminal to start R, then run install.packages('randomForest')
Command line usage: ./16sclassifier.exe 'queryfile' 'modelname'
The query file should be in FASTA format and the model name can be one of v2, v3, v4, v5, v6, v7, v8, v23, v34, v35, v45, v56, v67, v78 or Complete16S.
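If you want to call the tool from a script, the two arguments can be checked before the command is run. The Python sketch below only builds the command line. It is a hypothetical helper, not part of the 16S Classifier distribution: the names MODELS, looks_like_fasta and build_command are my own, and the FASTA check (the first non-empty line starts with '>') is a simplification.

```python
# Model names as listed in the instructions above.
MODELS = {
    "v2", "v3", "v4", "v5", "v6", "v7", "v8",
    "v23", "v34", "v35", "v45", "v56", "v67", "v78",
    "Complete16S",
}


def looks_like_fasta(path):
    """Return True if the first non-empty line of the file starts with '>'."""
    with open(path) as handle:
        for line in handle:
            if line.strip():
                return line.startswith(">")
    return False  # empty file


def build_command(query_file, model_name, exe="./16sclassifier.exe"):
    """Return the argument list: executable, query file, model name."""
    if model_name not in MODELS:
        raise ValueError(f"unknown model name: {model_name!r}")
    if not looks_like_fasta(query_file):
        raise ValueError(f"{query_file!r} does not look like a FASTA file")
    return [exe, str(query_file), model_name]
```

The returned list can then be passed to subprocess.run(cmd, check=True) from inside the extracted directory, so that the executable finds its model and script files.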