Understanding Computational Biology

Environmental research has been revolutionized by sequencing technologies that generate huge amounts of data by directly sequencing DNA from the environment. Analysis of these data using computational biology provides valuable insights into novel microorganisms, their functions and metabolic pathways. This blog aims to enhance understanding of different tools, algorithms and pipelines for the study of microbial diversity in different environments. Contact me: ashoks773@gmail.com

Thursday, 19 March 2015

Composition based methods for taxonomic classification

Taxonomic classification of 16S rRNA or metagenomic sequencing reads is one of the most important steps in understanding the diversity of microbes within a microbial community. Methods for taxonomic classification fall into two major categories based on their algorithms: composition-based methods and similarity-based methods. Here, I discuss composition-based methods in detail. In my next post, I will most probably focus on similarity-based approaches.

1. TETRA: a web-service and a stand-alone program for the analysis and comparison of tetranucleotide usage patterns in DNA sequences.

TETRA provides a statistical analysis of tetranucleotide usage patterns in genomic fragments, either via a web-service or a stand-alone program.

Available at: http://www.megx.net/tetra

http://www.biomedcentral.com/1471-2105/5/163
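
To make the idea of tetranucleotide usage patterns concrete, here is a minimal Python sketch that counts all 256 tetranucleotides in a fragment and correlates the profiles of two fragments. It is a simplification of my own and not TETRA's actual statistic: it compares raw relative frequencies, and the function names are hypothetical.

```python
from collections import Counter
from itertools import product


def tetranucleotide_frequencies(seq, k=4):
    """Return relative frequencies of all 4**k DNA k-mers in seq."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Skip windows containing ambiguous bases such as N.
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    total = sum(counts.values())
    all_kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in all_kmers}


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5 if vx and vy else 0.0


def compare_fragments(seq_a, seq_b):
    """Correlate the tetranucleotide profiles of two fragments (assumed helper)."""
    fa = tetranucleotide_frequencies(seq_a)
    fb = tetranucleotide_frequencies(seq_b)
    keys = sorted(fa)
    return pearson([fa[key] for key in keys], [fb[key] for key in keys])
```

Fragments from the same genome are expected to give a higher correlation than fragments from unrelated genomes, which is the intuition behind composition-based binning.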

2. PhyloPythia: Accurate phylogenetic classification of variable-length DNA fragments.

PhyloPythia is a composition-based classifier that combines higher-level generic clades from a set of 340 completed genomes with sample-derived population models. Extensive analyses on synthetic and real metagenome data sets showed that PhyloPythia allows the accurate classification of most sequence fragments across all considered taxonomic ranks, even for unknown organisms.
Available at: http://cbcsrv.watson.ibm.com/phylopythia.html

http://www.nature.com/nmeth/journal/v4/n1/full/nmeth976.html

3. PhyloPythiaS: The PhyloPythiaS Web Server for Taxonomic Assignment of Metagenome Sequences.

PhyloPythiaS is a fast and accurate sequence composition-based classifier that utilizes the hierarchical relationships between clades. PhyloPythiaS is freely available for non-commercial users and can be installed on a Linux-based machine.

Available at: http://phylopythias.cs.uni-duesseldorf.de/index.php?phase=wait

http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0038581

4. TACOA: Taxonomic classification of environmental genomic fragments using a kernelized nearest neighbor approach

The classifier combines the idea of the k-nearest neighbor with strategies from kernel-based learning. It is an accurate multi-class taxonomic classifier for environmental genomic fragments. TACOA can predict with high reliability the taxonomic origin of genomic fragments as short as 800 bp.

Available at: http://www.cebitec.uni-bielefeld.de/brf/tacoa/tacoa.html

http://www.biomedcentral.com/1471-2105/10/56#B20

5. RAIphy: Phylogenetic classification of metagenomics samples using iterative refinement of relative abundance index profiles

RAIphy is a composition-based semisupervised binning algorithm that uses a novel sequence similarity metric with iterative refinement of taxonomic models and functions effectively. RAIphy has been implemented as a simple, compact standalone desktop application, which is fast compared to similarity-search-based applications. While achieving competitive binning accuracies for the DNA sequencing read length range (100-1000 bp), the method also performs accurately for longer environmental contigs.

Available at: http://bioinfo.unl.edu/raiphy.php

http://www.biomedcentral.com/1471-2105/12/41

6. NBC: the Naive Bayes Classification tool webserver for taxonomic classification of metagenomic reads.

NBC is a webserver that implements the naïve Bayes classifier to assign each metagenomic read to its best taxonomic match. Results indicate that NBC can assign next-generation sequencing reads to their taxonomic classification and can find significant populations of genera that other classifiers may miss.

Available at: http://nbc.ece.drexel.edu

http://www.ncbi.nlm.nih.gov/pubmed/21062764
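
As a rough illustration of how a naïve Bayes classifier can assign a read to a taxon from k-mer counts, here is a small Python sketch. It is not the NBC webserver's code: the function names, the uniform prior over taxa, the add-one smoothing and the tiny k = 3 are all my own assumptions for the demo.

```python
import math
from collections import Counter


def kmers(seq, k):
    """Yield all overlapping k-mers of seq."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def train(genomes, k=3):
    """Build per-taxon k-mer counts from {taxon: reference_sequence}."""
    model = {}
    for taxon, seq in genomes.items():
        counts = Counter(kmers(seq, k))
        model[taxon] = (counts, sum(counts.values()))
    return model


def classify(read, model, k=3):
    """Return the taxon with the highest log-likelihood for the read."""
    vocab = 4 ** k
    best_taxon, best_score = None, -math.inf
    for taxon, (counts, total) in model.items():
        # Add-one smoothing so unseen k-mers do not zero out the score.
        score = sum(
            math.log((counts[km] + 1) / (total + vocab)) for km in kmers(read, k)
        )
        if score > best_score:
            best_taxon, best_score = taxon, score
    return best_taxon
```

For example, after `model = train({"at_rich": "ATATATATTTAAATATATAT", "gc_rich": "GCGCGCGGGCCCGCGCGCGC"})`, the call `classify("ATATTA", model)` picks the AT-rich reference because all of the read's 3-mers occur there.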

7. Phymm and PhymmBL: Metagenomic Phylogenetic Classification with Interpolated Markov Models

Phymm is a classifier for metagenomic data that has been trained on 539 complete, curated genomes and can accurately classify reads as short as 100 bp, representing a substantial leap forward over previous composition-based classification methods. The authors also describe how combining Phymm with sequence alignment algorithms further improves accuracy.

Available at: http://www.cbcb.umd.edu/software/phymm/

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2762791/

8. GSTaxClassifier: a genomic signature based taxonomic classifier for metagenomic data analysis.

GSTaxClassifier takes nucleotide sequences as input and uses a modified Bayesian model to evaluate the genomic signatures between metagenomic query sequences and reference genome databases. Simulation studies on numerical data sets showed that GSTaxClassifier could serve as a useful program for metagenomics studies.

Available at: http://helix2.biotech.ufl.edu:26878/metagenomics/

http://www.ncbi.nlm.nih.gov/pubmed/20011152

9. SPHINX: an algorithm for taxonomic binning of metagenomic sequences.

SPHINX is a hybrid binning approach that achieves high binning efficiency by utilizing the principles of both 'composition'- and 'alignment'-based binning algorithms.

Available at: http://metagenomics.atc.tcs.com/SPHINX/

http://www.ncbi.nlm.nih.gov/pubmed/21030462

10. TAC-ELM: Metagenomic taxonomic classification using extreme learning machines.

TAC-ELM is a new sequence composition-based taxonomic classifier for metagenomic analysis that is built on extreme learning machines. It uses the framework of extreme learning machines to quickly and accurately learn the weights for a neural network model. The input features consist of GC content and oligonucleotides.

Available at: http://www.cs.gmu.edu/~mlbio/TAC-ELM/

http://www.ncbi.nlm.nih.gov/pubmed/22849369

11. AKE - the Accelerated k-mer Exploration web-tool for rapid taxonomic classification and visualization

Acceleration in AKE’s taxonomic assignments is achieved by a special machine learning architecture, which is well suited to model data collections that are intrinsically hierarchical.

Available at: https://ani.cebitec.uni-bielefeld.de/ake/login.html

http://www.biomedcentral.com/1471-2105/15/384

12. TAXSOM: Practical application of self-organizing maps to interrelate biodiversity and functional data in NGS-based metagenomics.

Biodiversity is usually targeted by classifying 16S ribosomal RNA genes, while metagenomic approaches target metabolic genes. However, both approaches remain isolated, as long as the taxonomic and functional information cannot be interrelated. Techniques like self-organizing maps (SOMs) have been applied to cluster metagenomes into taxon-specific bins in order to link biodiversity with functions.

Available at: http://soma.arb-silva.de/

http://www.ncbi.nlm.nih.gov/pubmed/21160538

13. Kraken: ultrafast metagenomic sequence classification using exact alignments

Kraken is an ultrafast and highly accurate program for assigning taxonomic labels to metagenomic DNA sequences. Using exact alignment of k-mers, Kraken achieves classification accuracy comparable to the fastest BLAST program.

Available at: http://ccb.jhu.edu/software/kraken/

http://genomebiology.com/2014/15/3/R46
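
The exact k-mer matching idea can be sketched in a few lines of Python. This is only a toy under my own assumptions, not Kraken's implementation: the taxonomy, the function names and the tie handling are hypothetical, and real databases use much longer k-mers than the k = 4 that suits the toy sequences in the example.

```python
from collections import Counter

# Hypothetical toy taxonomy: child -> parent, with "root" at the top.
PARENT = {
    "root": None,
    "Bacteria": "root",
    "Escherichia": "Bacteria",
    "Salmonella": "Bacteria",
}


def lineage(taxon):
    """Return the path from taxon up to the root, taxon first."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = PARENT[taxon]
    return path


def lca(a, b):
    """Lowest common ancestor of two taxa in PARENT."""
    ancestors = set(lineage(a))
    for taxon in lineage(b):
        if taxon in ancestors:
            return taxon
    return "root"


def build_index(genomes, k):
    """Map every k-mer to the LCA of all taxa whose genome contains it."""
    index = {}
    for taxon, seq in genomes.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            index[kmer] = lca(index[kmer], taxon) if kmer in index else taxon
    return index


def classify(read, index, k):
    """Label a read with the hit taxon whose root-to-node path has most k-mer hits."""
    hits = Counter(
        index[read[i:i + k]]
        for i in range(len(read) - k + 1)
        if read[i:i + k] in index
    )
    if not hits:
        return "unclassified"
    # Score each hit taxon by the hits on itself plus all of its ancestors.
    return max(hits, key=lambda t: sum(hits[a] for a in lineage(t)))
```

With the two toy genomes `{"Escherichia": "AAAACCCCGGGG", "Salmonella": "AAAACCCCTTTT"}` and k = 4, a read made only of shared k-mers stays at the "Bacteria" level, while a read carrying genus-specific k-mers is pushed down to that genus.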

14. RDP Classifier: Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy.

The Ribosomal Database Project (RDP) Classifier, a naïve Bayesian classifier, can rapidly and accurately classify bacterial 16S rRNA sequences into the new higher-order taxonomy proposed in Bergey's Taxonomic Outline of the Prokaryotes. For shorter rRNA segments, such as those that might be generated by pyrosequencing, the error rate varied greatly over the length of the 16S rRNA gene. The RDP Classifier is suitable both for the analysis of single rRNA sequences and for the analysis of libraries of thousands of sequences.

Available at: http://rdp.cme.msu.edu/

http://www.ncbi.nlm.nih.gov/pmc/articles/pmid/17586664/

15. 16S Classifier: A Tool for Fast and Accurate Taxonomic Classification of 16S rRNA Hypervariable Regions in Metagenomic Datasets

16S Classifier was developed using a machine learning method, Random Forest, for fast and accurate taxonomic classification of short hypervariable regions of the 16S rRNA sequence. It displayed precision values of up to 0.91 on training datasets and up to 0.98 on the test dataset. On real metagenomic datasets, it showed up to 99.7% accuracy at the phylum level and up to 99.0% accuracy at the genus level.

Available at: http://metagenomics.iiserb.ac.in/16Sclassifier

http://metabiosys.iiserb.ac.in/16Sclassifier/application.php

http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0116106

Wednesday, 11 March 2015

Tools and methods for Flux Balance Analysis

Here I have discussed various tools and methods for Flux Balance Analysis (FBA) of metabolic networks.
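
For reference, FBA is usually stated as a linear program over the reaction fluxes. The notation below is my own choice and is not taken from any of the tools listed: S is the stoichiometric matrix (metabolites by reactions), v the flux vector, c the objective weights (for example, 1 on the biomass reaction), and v^min, v^max the flux bounds.

```latex
\begin{aligned}
\max_{v} \quad & c^{\mathsf{T}} v \\
\text{subject to} \quad & S\,v = 0, \\
& v^{\min} \le v \le v^{\max}
\end{aligned}
```

The equality S v = 0 is the steady-state assumption: every internal metabolite is produced as fast as it is consumed.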

1. OptFlux: an open-source software platform for in silico metabolic engineering by Rocha et al.

OptFlux is an open-source and modular software aimed at being the reference computational application in the field. It allows the use of stoichiometric metabolic models for (i) phenotype simulation of both wild-type and mutant organisms, using the methods of Flux Balance Analysis, Minimization of Metabolic Adjustment or Regulatory on/off Minimization of Metabolic flux changes, (ii) Metabolic Flux Analysis, computing the admissible flux space given a set of measured fluxes, and (iii) pathway analysis through the calculation of Elementary Flux Modes.

Available at: http://www.optflux.org/

2. MetaFluxNet: the management of metabolic reaction information and quantitative metabolic flux analysis by Lee et al.

MetaFluxNet is a program package for managing information on the metabolic reaction network and for quantitatively analyzing metabolic fluxes in an interactive and customized way. It allows users to interpret and examine metabolic behavior in response to genetic and/or environmental modifications. As a result, quantitative in silico simulations of metabolic pathways can be carried out to understand the metabolic status and to design the metabolic engineering strategies. The main features of the program include a well-developed model construction environment, user-friendly interface for metabolic flux analysis (MFA), comparative MFA of strains having different genotypes under various environmental conditions, and automated pathway layout creation.

Available at: http://metafluxnet.kaist.ac.kr/

3. BioOpt:

BioOpt is a software application running on Windows command prompt. The program focuses on the flux balance analysis, using linear programming as the mathematical support. Given a biological system model, which includes a set of metabolic reactions, the program is able to calculate all internal mass balance fluxes, reduced costs and shadow prices depending on the constraints and objective defined by the user. Running BioOpt with different parameters allows the user to obtain several kinds of outputs that can help in the analysis of the system.

Available at: http://129.16.106.142/tools.php?c=bioopt
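
To show what the mass balance and bound constraints in such a model look like, here is a small Python sketch that builds a stoichiometric matrix for a toy network and checks whether a candidate flux vector is feasible. It is not BioOpt code and it does not solve the linear program. The network, the names and the tolerance are all hypothetical.

```python
# Hypothetical toy network: uptake -> A, A -> B, B -> export.
REACTIONS = {
    "uptake": {"A": 1},
    "convert": {"A": -1, "B": 1},
    "export": {"B": -1},
}


def stoichiometric_matrix(reactions):
    """Return (metabolites, reaction_names, S) with S[i][j] for metabolite i, reaction j."""
    names = list(reactions)
    metabolites = sorted({m for stoich in reactions.values() for m in stoich})
    S = [[reactions[r].get(m, 0) for r in names] for m in metabolites]
    return metabolites, names, S


def mass_balance(S, fluxes):
    """Compute S . v, the net production rate of each metabolite."""
    return [sum(s * v for s, v in zip(row, fluxes)) for row in S]


def is_feasible(S, fluxes, bounds, tol=1e-9):
    """True if fluxes are at steady state and within their (lower, upper) bounds."""
    balanced = all(abs(x) <= tol for x in mass_balance(S, fluxes))
    bounded = all(lo - tol <= v <= hi + tol for v, (lo, hi) in zip(fluxes, bounds))
    return balanced and bounded
```

A linear programming solver searches among all flux vectors that pass this kind of feasibility check for the one that maximizes the chosen objective.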

4. SurreyFBA: A command line tool and graphics user interface for constraint based modelling of genome scale metabolic reaction networks.

SurreyFBA provides constraint-based simulations and network map visualization in free, stand-alone software. It is based on a command line interface to the GLPK solver distributed as binary and source code for the three major operating systems. SurreyFBA includes JyMet, a graphics user interface allowing spreadsheet based model presentation, visualization of numerical results on metabolic networks represented in the Petri net convention, as well as in charts and plots.

Available at: http://sysbio3.fhms.surrey.ac.uk/SurreyFBA.zip

5. FASIMU: FBA simulation software for metabolomics, fluxomics, and biotechnology

FASIMU is a command-line-oriented software package implementing the most frequently applied FBA algorithms. Moreover, it offers the first freely available implementation of (i) weighted flux minimization, (ii) fitness maximization for partially inhibited enzymes, and (iii) the concentration-based thermodynamic feasibility constraint. It allows heterogeneous computation series suited for network pruning, leak analysis, flux variability analysis (FVA), and systematic probing of metabolic objectives for network curation, controlled by an intuitive description file. The metabolic network can be supplied in SBML, CellNetAnalyzer, and plain text format. FASIMU uses the optimization capabilities of free (lp_solve and GLPK) and commercial solvers (CPLEX, LINDO). The results can be visualized in Cytoscape or BiNA using newly developed plugins.

Available at: http://www.bioinformatics.org/fasimu/downloads/

6. GEMSiRV: A software platform for GEnome-scale Metabolic model Simulation, Reconstruction and Visualization

GEMSiRV comes with downloadable, ready-to-use public-domain metabolic models, reference metabolite/reaction databases, and metabolic network maps, all of which can be input into GEMSiRV as the starting materials for network construction or simulation analyses. Furthermore, all of the GEMSiRV-generated metabolic models and analysis results, including projects in progress, can be easily exchanged in the research community. GEMSiRV is a powerful integrative resource that may facilitate the development of systems biology studies.

Available at: http://sb.nhri.org.tw/GEMSiRV/en/GEMSiRV

7. CellNetAnalyzer: Structural and Functional Analysis of Cellular Networks

CellNetAnalyzer (CNA) is a MATLAB toolbox providing a graphical user interface and various (partially unique) computational methods and algorithms for exploring structural and functional properties of metabolic, signaling, and regulatory networks.

Metabolic networks are formalized and analyzed by stoichiometric and constraint-based modeling techniques, including flux balance analysis (FBA), metabolic flux analysis, elementary-modes analysis, minimal cut set analysis, and many more. Several algorithms are provided for computational strain design / metabolic engineering.

Available at: http://www2.mpi-magdeburg.mpg.de/projects/cna/cna.html

8. SNA--a toolbox for the stoichiometric analysis of metabolic networks.

SNA is a Mathematica toolbox for stoichiometric network analysis. Among other things, it supports flux balance analysis and the enumeration of the elementary vectors of the flux and the conversion cone.

Available at: http://www.webcitation.org/mainframe.php

9. Quantitative prediction of cellular metabolism with constraint-based models: the COBRA Toolbox.

The COBRA Toolbox is a set of MATLAB scripts for constraint-based modeling that are run from within the MATLAB environment. These scripts depend on external libraries for reading and writing SBML-formatted models and for simulations. Additionally, some functions may require additional MATLAB Toolboxes that must be purchased from the MathWorks.

Available at: http://opencobra.sourceforge.net/openCOBRA/Install.html

10. FBA-SimVis: interactive visualization of constraint-based metabolic models.

FBA-SimVis is a VANTED Plug-in for the constraint-based analysis of metabolic models with special focus on the dynamic and visual exploration of metabolic flux data resulting from model analysis. The program provides a user-friendly environment for model reconstruction, constraint-based model analysis and dynamic visualisation of the simulation results. With the ability to quantitatively analyse metabolic fluxes in an interactive and visual manner, FBA-SimVis supports a comprehensive understanding of constraint-based metabolic flux models in both overview and detail.

Available at: http://fbasimvis.ipk-gatersleben.de/

11. MetaFlux: Construction and completion of flux balance models from pathway databases.

MetaFlux is a new tool, based on mixed integer linear programming (MILP), that uses a multiple gap-filling method to accelerate the development of FBA models. The method suggests corrections to the sets of reactions, biomass metabolites, nutrients and secretions. The method generates FBA models directly from Pathway/Genome Databases. Thus, FBA models developed in this framework are easily queried and visualized using the Pathway Tools software.

Available at: http://biocyc.org/download.shtml

12. CycSim—an online tool for exploring and experimenting with genome-scale metabolic models

CycSim is a web application dedicated to in silico experiments with genome-scale metabolic models coupled to the exploration of knowledge from BioCyc and KEGG. Specifically, CycSim supports the design of knockout experiments: simulation of growth phenotypes of single or multiple gene deletion mutants on specified media, comparison of these predictions with experimental phenotypes and direct visualization of both on metabolic maps. The web interface is designed for simplicity, putting constraint-based modelling techniques within easier reach of biologists. CycSim also functions as an online repository of genome-scale metabolic models.

Available at: http://www.genoscope.cns.fr/cycsim/org.nemostudio.web.gwt.App/App.html (Standalone version not available)

13. WEbcoli: an interactive and asynchronous web application for in silico design and analysis of genome-scale E.coli model.

WEbcoli is a WEb application for in silico designing, analyzing and engineering Escherichia coli metabolism. It is devised and implemented using advanced web technologies, thereby leading to enhanced usability and dynamic web accessibility. As a main feature, the WEbcoli system provides a user-friendly rich web interface, allowing users to virtually design and synthesize mutant strains derived from the genome-scale wild-type E.coli model and to customize pathways of interest through a graph editor. In addition, constraints-based flux analysis can be conducted for quantifying metabolic fluxes and characterizing the physiological and metabolic states under various genetic and/or environmental conditions.

Available at: http://webcoli.org (Standalone version not available)

14. RAST/Model SEED genome-scale metabolic reconstruction pipeline:

RAST and the Model SEED framework were developed as a means of automatically producing annotations and draft genome-scale metabolic models. They break down the model reconstruction process into eight steps: submitting a genome sequence to RAST, annotating the genome, curating the annotation, submitting the annotation to Model SEED, reconstructing the core model, generating the draft biomass reaction, auto-completing the model, and curating the model. Each of these eight steps is documented in detail.

Available at: http://seed-viewer.theseed.org/seedviewer.cgi?page=ModelView (Standalone version not available)

15. MicrobesFlux: a web platform for drafting metabolic models from the KEGG database:

MicrobesFlux is an installation-free and open-source platform that enables biologists without prior programming knowledge to develop metabolic models for annotated microorganisms in the KEGG database. The system helps users reconstruct metabolic networks of organisms based on experimental information. Through human-computer interaction, MicrobesFlux provides users with reasonable predictions of microbial metabolism via flux balance analysis. This prototype platform can be a springboard for advanced and broad-scope modeling of complex biological systems by integrating other “omics” data or 13C-metabolic flux analysis results.

Available at: http://tanglab.engineering.wustl.edu/static/MicrobesFlux.html (Standalone version not available)

16. FAME: the Flux Analysis and Modeling Environment

The Flux Analysis and Modeling Environment (FAME) is the first web-based modeling tool that combines the tasks of creating, editing, running, and analyzing/visualizing stoichiometric models into a single program. Analysis results can be automatically superimposed on familiar KEGG-like maps.

Available at: http://f-a-m-e.org/ajax/page1.php (Standalone version not available)