Thursday 19 March 2015

Composition-based methods for taxonomic classification

Taxonomic classification of 16S rRNA or metagenomic sequencing reads is one of the most important steps in understanding the diversity of microbes within a microbial community. Methods for taxonomic classification fall into two major categories based on their algorithms: composition-based methods and similarity-based methods. This post covers composition-based methods in detail; I plan to cover similarity-based approaches in my next post.

1. TETRA: a web-service and a stand-alone program for the analysis and comparison of tetranucleotide usage patterns in DNA sequences.

TETRA provides a statistical analysis of tetranucleotide usage patterns in genomic fragments, either via a web-service or a stand-alone program.
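
To make the idea of a tetranucleotide usage pattern concrete, here is a minimal sketch in Python. It is not TETRA's code: the function names are my own, and it compares raw tetranucleotide frequencies with a Pearson correlation, which is a simplification of the statistical analysis the tool performs.

```python
import math
from collections import Counter
from itertools import product

# All 256 possible tetranucleotides, in a fixed order.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]

def tetranucleotide_frequencies(seq):
    """Return the relative frequency of each tetranucleotide in seq.

    Windows containing anything other than A, C, G, T are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[t] for t in TETRAMERS)
    return [counts[t] / total if total else 0.0 for t in TETRAMERS]

def pearson(x, y):
    """Pearson correlation of two equally long numeric vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0

if __name__ == "__main__":
    # Made-up fragments, only for illustration.
    frag_a = "ATGCGCGATATATCGCGCGATATAGCGC"
    frag_b = "ATATATCGCGCGCGATATATAGCGCGAT"
    profile_a = tetranucleotide_frequencies(frag_a)
    profile_b = tetranucleotide_frequencies(frag_b)
    print(pearson(profile_a, profile_b))
```

Fragments from the same genome are expected to have similar usage patterns, so a higher correlation between two profiles suggests a closer relationship.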

2. PhyloPythia: Accurate phylogenetic classification of variable-length DNA fragments.
PhyloPythia is a composition-based classifier that combines higher-level generic clades from a set of 340 completed genomes with sample-derived population models. Extensive analyses on synthetic and real metagenome data sets showed that PhyloPythia allows the accurate classification of most sequence fragments across all considered taxonomic ranks, even for unknown organisms.
Available at:
http://cbcsrv.watson.ibm.com/phylopythia.html

3. PhyloPythiaS: The PhyloPythiaS Web Server for Taxonomic Assignment of Metagenome Sequences.
PhyloPythiaS is a fast and accurate sequence composition-based classifier that utilizes the hierarchical relationships between clades. PhyloPythiaS is freely available for non-commercial users and can be installed on a Linux-based machine.

4. TACOA: Taxonomic classification of environmental genomic fragments using a kernelized nearest neighbor approach
The classifier combines the idea of the k-nearest neighbor with strategies from kernel-based learning. It is an accurate multi-class taxonomic classifier for environmental genomic fragments. TACOA can predict with high reliability the taxonomic origin of genomic fragments as short as 800 bp.
http://www.biomedcentral.com/1471-2105/10/56#B20

5. RAIphy: Phylogenetic classification of metagenomics samples using iterative refinement of relative abundance index profiles
RAIphy is a composition-based, semi-supervised binning algorithm that uses a novel sequence similarity metric with iterative refinement of taxonomic models. RAIphy has been implemented as a simple, compact standalone desktop application, which is fast compared to similarity-search-based applications. While achieving competitive binning accuracies for the DNA sequencing read length range (100-1000 bp), the method also performs accurately for longer environmental contigs.

6. NBC: the Naive Bayes Classification tool webserver for taxonomic classification of metagenomic reads.
NBC is a webserver that implements the naïve Bayes classifier to classify all metagenomic reads to their best taxonomic match. Results indicate that NBC can assign next-generation sequencing reads to their taxonomic classification and can find significant populations of genera that other classifiers may miss.
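
As a rough illustration of how a naïve Bayes classifier can assign a read to a taxon from its k-mer content, here is a small Python sketch. It is not the NBC tool's implementation: the function names, the toy reference sequences, the choice of k = 3 and the add-one smoothing are all assumptions made for the example.

```python
import math
from collections import Counter

def kmers(seq, k):
    """Return the list of overlapping k-mers in seq."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def train(references, k=3):
    """Count k-mers per taxon. references maps taxon name -> sequence."""
    model = {}
    for taxon, seq in references.items():
        counts = Counter(kmers(seq, k))
        model[taxon] = (counts, sum(counts.values()))
    return model

def classify(read, model, k=3):
    """Return the taxon with the highest log-likelihood for the read.

    Each k-mer is treated as independent (the "naive" assumption) and
    add-one smoothing keeps unseen k-mers from giving log(0).
    """
    vocabulary = 4 ** k
    best_taxon, best_score = None, -math.inf
    for taxon, (counts, total) in model.items():
        score = sum(
            math.log((counts[word] + 1) / (total + vocabulary))
            for word in kmers(read, k)
        )
        if score > best_score:
            best_taxon, best_score = taxon, score
    return best_taxon

if __name__ == "__main__":
    # Hypothetical toy references, not real genomes.
    references = {
        "AT_rich": "ATATATATTTAAATATAT",
        "GC_rich": "GCGCGGCCGCGCGGGCCC",
    }
    model = train(references)
    print(classify("ATATTTAT", model))
```

Real tools train on complete genomes and use much longer k-mers; the toy sequences here only show the mechanics.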

7. Phymm and PhymmBL: Metagenomic Phylogenetic Classification with Interpolated Markov Models

Phymm is a classifier for metagenomic data that has been trained on 539 complete, curated genomes and can accurately classify reads as short as 100 bp, representing a substantial leap forward over previous composition-based classification methods. The authors also describe how combining Phymm with sequence alignment algorithms further improves accuracy.
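
To give a feel for Markov-model-based classification, here is a minimal Python sketch. Note the swap: it uses a plain fixed-order Markov chain, not the interpolated Markov models that Phymm uses, which combine information from several context lengths. The function names, toy sequences, order of 2 and add-one smoothing are assumptions for illustration only.

```python
import math
from collections import defaultdict

def train_markov(seq, order=2):
    """Count which base follows each context of `order` bases."""
    seq = seq.upper()
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(order, len(seq)):
        counts[seq[i - order:i]][seq[i]] += 1
    return counts

def log_likelihood(read, counts, order=2):
    """Log-probability of the read under one trained model.

    Add-one smoothing over the four bases handles unseen transitions.
    """
    read = read.upper()
    score = 0.0
    for i in range(order, len(read)):
        seen = counts.get(read[i - order:i], {})
        total = sum(seen.values())
        score += math.log((seen.get(read[i], 0) + 1) / (total + 4))
    return score

def classify_markov(read, models, order=2):
    """Return the taxon whose model gives the read the highest score."""
    return max(models, key=lambda t: log_likelihood(read, models[t], order))

if __name__ == "__main__":
    # Hypothetical toy training sequences, not real genomes.
    models = {
        "AT_rich": train_markov("ATATATTATAATATTA"),
        "GC_rich": train_markov("GCGCGGCGCCGCGGCC"),
    }
    print(classify_markov("TATATA", models))
```

One model is trained per reference genome, and a read is assigned to the genome whose model explains it best.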

8. GSTaxClassifier: a genomic signature based taxonomic classifier for metagenomic data analysis.
GSTaxClassifier takes nucleotide sequences as input and uses a modified Bayesian model to evaluate the genomic signatures between metagenomic query sequences and reference genome databases. Simulation studies on numerical data sets showed that GSTaxClassifier could serve as a useful program for metagenomics studies.

9. SPHINX: an algorithm for taxonomic binning of metagenomic sequences.
SPHINX is a hybrid binning approach that achieves high binning efficiency by utilizing the principles of both 'composition'- and 'alignment'-based binning algorithms.

10. TAC-ELM: Metagenomic taxonomic classification using extreme learning machines.
TAC-ELM is a sequence composition-based taxonomic classifier for metagenomic analysis built on extreme learning machines. It uses the framework of extreme learning machines to quickly and accurately learn the weights for a neural network model. The input features consist of GC content and oligonucleotides.

11. AKE - the Accelerated k-mer Exploration web-tool for rapid taxonomic classification and visualization
Acceleration in AKE’s taxonomic assignments is achieved by a special machine learning architecture, which is well suited to model data collections that are intrinsically hierarchical. 

12. TAXSOM: Practical application of self-organizing maps to interrelate biodiversity and functional data in NGS-based metagenomics.
Biodiversity is usually targeted by classifying 16S ribosomal RNA genes, while metagenomic approaches target metabolic genes. However, both approaches remain isolated, as long as the taxonomic and functional information cannot be interrelated. Techniques like self-organizing maps (SOMs) have been applied to cluster metagenomes into taxon-specific bins in order to link biodiversity with functions.

13. Kraken: ultrafast metagenomic sequence classification using exact alignments

Kraken is an ultrafast and highly accurate program for assigning taxonomic labels to metagenomic DNA sequences. Using exact alignment of k-mers, Kraken achieves classification accuracy comparable to the fastest BLAST program. 
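
The exact k-mer matching idea can be sketched in a few lines of Python. This is a toy illustration under my own assumptions, not Kraken's code: the taxonomy, genome strings and function names are made up, k is tiny, reverse complements are ignored, and ties are broken arbitrarily. Each k-mer is mapped to the lowest common ancestor (LCA) of the genomes that contain it, and a read is given the label whose path to the root collects the most k-mer hits.

```python
from collections import Counter

# Hypothetical toy taxonomy: child -> parent (the root has no parent).
PARENT = {
    "root": None,
    "Bacteria": "root",
    "E. coli": "Bacteria",
    "B. subtilis": "Bacteria",
}

def lineage(taxon):
    """Return the path from a taxon up to the root, taxon first."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = PARENT[taxon]
    return path

def lca(a, b):
    """Lowest common ancestor of two taxa in the toy taxonomy."""
    ancestors = set(lineage(a))
    for taxon in lineage(b):
        if taxon in ancestors:
            return taxon
    return "root"

def build_database(genomes, k):
    """Map every k-mer to the LCA of all genomes that contain it."""
    db = {}
    for taxon, seq in genomes.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            db[kmer] = lca(db[kmer], taxon) if kmer in db else taxon
    return db

def classify(read, db, k):
    """Label a read by its best-supported path; None if no k-mer hits."""
    windows = (read[i:i + k] for i in range(len(read) - k + 1))
    hits = Counter(db[w] for w in windows if w in db)
    if not hits:
        return None
    # Score a taxon by the hits on it and on all of its ancestors.
    return max(hits, key=lambda t: sum(hits[a] for a in lineage(t)))

if __name__ == "__main__":
    genomes = {"E. coli": "ACGTACGGATC", "B. subtilis": "TTGACGTACCA"}
    db = build_database(genomes, k=5)
    print(classify("GTACGGATC", db, k=5))
```

A read made only of k-mers shared by both toy genomes is labelled with their common ancestor, which shows why exact matching can still give a useful answer at a higher rank.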

14. RDP Classifier: Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy.

The Ribosomal Database Project (RDP) Classifier, a naïve Bayesian classifier, can rapidly and accurately classify bacterial 16S rRNA sequences into the new higher-order taxonomy proposed in Bergey's Taxonomic Outline of the Prokaryotes. For shorter rRNA segments, such as those that might be generated by pyrosequencing, the error rate varied greatly over the length of the 16S rRNA gene. The RDP Classifier is suitable both for the analysis of single rRNA sequences and for the analysis of libraries of thousands of sequences.
Available at: http://rdp.cme.msu.edu/

15. 16S Classifier: A Tool for Fast and Accurate Taxonomic Classification of 16S rRNA Hypervariable Regions in Metagenomic Datasets

16S Classifier was developed using a machine learning method, Random Forest, for fast and accurate taxonomic classification of short hypervariable regions of the 16S rRNA sequence. It displayed precision values of up to 0.91 on training datasets and up to 0.98 on the test dataset. On real metagenomic datasets, it showed up to 99.7% accuracy at the phylum level and up to 99.0% accuracy at the genus level.
