Understanding Computational Biology

Environmental research has been revolutionized by sequencing technologies that generate huge amounts of data by directly sequencing DNA from the environment. Analysis of these data using computational biology provides valuable insights into novel microorganisms, their functions and metabolic pathways. This blog aims to enhance understanding of the different tools, algorithms and pipelines for studying microbial diversity in different environments. Contact me: ashoks773@gmail.com

Thursday, 19 March 2015

Composition-based methods for taxonomic classification

Taxonomic classification of 16S rRNA or metagenomic sequencing reads is one of the most important steps in understanding the diversity of microbes within a microbial community. Methods for taxonomic classification fall into two major categories based on their algorithms: composition-based methods and similarity-based methods. Here I discuss composition-based methods in detail. I plan to cover similarity-based approaches in my next post.
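As a rough illustration of what "composition" means here, the sketch below counts overlapping tetranucleotides in a FASTA file. Counts like these are the kind of raw signal that composition-based classifiers build on. This is only a minimal sketch and not the algorithm of any particular tool listed below; the function name kmer_counts and the toy read are my own assumptions.

```shell
# Hypothetical helper: count overlapping k-mers (default k=4) in a FASTA file.
# Sequence lines of one record are upper-cased and joined, so k-mers that span
# a line break are counted; k-mers never span two records.
kmer_counts() {
    fasta="$1"
    k="${2:-4}"
    awk -v k="$k" '
        function count_record(   i) {
            for (i = 1; i + k - 1 <= length(seq); i++)
                count[substr(seq, i, k)]++
            seq = ""
        }
        /^>/ { count_record(); next }   # header line: finish the previous record
             { seq = seq toupper($0) }  # sequence line: append to the record
        END  {
            count_record()
            for (kmer in count)
                print kmer "\t" count[kmer]
        }
    ' "$fasta" | sort
}

# Toy input, made up for illustration only.
demo_fasta="$(mktemp)"
printf '>read1\nACGTAC\nGT\n' > "$demo_fasta"
kmer_counts "$demo_fasta" 4
```

For the toy read ACGTACGT this lists four distinct tetranucleotides, with ACGT seen twice. Real tools go further, for example by normalising the counts and comparing them with profiles of reference genomes.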

1. TETRA: a web-service and a stand-alone program for the analysis and comparison of tetranucleotide usage patterns in DNA sequences.

TETRA provides a statistical analysis of tetranucleotide usage patterns in genomic fragments, either via a web-service or a stand-alone program.

Available at: http://www.megx.net/tetra

http://www.biomedcentral.com/1471-2105/5/163

2. PhyloPythia: Accurate phylogenetic classification of variable-length DNA fragments.

PhyloPythia is a composition-based classifier that combines higher-level generic clades from a set of 340 completed genomes with sample-derived population models. Extensive analyses on synthetic and real metagenome data sets showed that PhyloPythia allows the accurate classification of most sequence fragments across all considered taxonomic ranks, even for unknown organisms.

Available at: http://cbcsrv.watson.ibm.com/phylopythia.html

http://www.nature.com/nmeth/journal/v4/n1/full/nmeth976.html

3. PhyloPythiaS: The PhyloPythiaS Web Server for Taxonomic Assignment of Metagenome Sequences.

PhyloPythiaS is a fast and accurate sequence composition-based classifier that utilizes the hierarchical relationships between clades. PhyloPythiaS is freely available for non-commercial users and can be installed on a Linux-based machine.

Available at: http://phylopythias.cs.uni-duesseldorf.de/index.php?phase=wait

http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0038581

4. TACOA: Taxonomic classification of environmental genomic fragments using a kernelized nearest neighbor approach

The classifier combines the idea of the k-nearest neighbor with strategies from kernel-based learning. It is an accurate multi-class taxonomic classifier for environmental genomic fragments. TACOA can predict with high reliability the taxonomic origin of genomic fragments as short as 800 bp.

Available at: http://www.cebitec.uni-bielefeld.de/brf/tacoa/tacoa.html

http://www.biomedcentral.com/1471-2105/10/56#B20

5. RAIphy: Phylogenetic classification of metagenomics samples using iterative refinement of relative abundance index profiles

RAIphy is a composition-based semisupervised binning algorithm that uses a novel sequence similarity metric with iterative refinement of taxonomic models and functions effectively. RAIphy has been implemented as a simple, compact standalone desktop application, which is fast compared to similarity-search-based applications. While achieving competitive binning accuracies for the DNA sequencing read length range (100-1000 bp), the method also performs accurately for longer environmental contigs.

Available at: http://bioinfo.unl.edu/raiphy.php

http://www.biomedcentral.com/1471-2105/12/41

6. NBC: the Naive Bayes Classification tool webserver for taxonomic classification of metagenomic reads.

NBC is a webserver that implements the naïve Bayes classifier to assign all metagenomic reads to their best taxonomic match. Results indicate that NBC can assign next-generation sequencing reads to their taxonomic classification and can find significant populations of genera that other classifiers may miss.

Available at: http://nbc.ece.drexel.edu

http://www.ncbi.nlm.nih.gov/pubmed/21062764

7. Phymm and PhymmBL: Metagenomic Phylogenetic Classification with Interpolated Markov Models

Phymm is a classifier for metagenomic data that has been trained on 539 complete, curated genomes and can accurately classify reads as short as 100 bp, a substantial leap forward over previous composition-based classification methods. The authors also describe how combining Phymm with sequence alignment algorithms (PhymmBL) further improves accuracy.

Available at: http://www.cbcb.umd.edu/software/phymm/

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2762791/

8. GSTaxClassifier: a genomic signature based taxonomic classifier for metagenomic data analysis.

GSTaxClassifier takes nucleotide sequences as input and, using a modified Bayesian model, evaluates the genomic signatures between metagenomic query sequences and reference genome databases. Simulation studies on numerical data sets showed that GSTaxClassifier could serve as a useful program for metagenomics studies.

Available at: http://helix2.biotech.ufl.edu:26878/metagenomics/

http://www.ncbi.nlm.nih.gov/pubmed/20011152

9. SPHINX: an algorithm for taxonomic binning of metagenomic sequences.

SPHINX is a hybrid binning approach that achieves high binning efficiency by utilizing the principles of both 'composition'- and 'alignment'-based binning algorithms.

Available at: http://metagenomics.atc.tcs.com/SPHINX/

http://www.ncbi.nlm.nih.gov/pubmed/21030462

10. TAC-ELM: Metagenomic taxonomic classification using extreme learning machines.

TAC-ELM is a sequence composition-based taxonomic classifier for metagenomic analysis that is built on extreme learning machines. It uses the framework of extreme learning machines to quickly and accurately learn the weights for a neural network model. The input features consist of GC content and oligonucleotides.

Available at: http://www.cs.gmu.edu/~mlbio/TAC-ELM/

http://www.ncbi.nlm.nih.gov/pubmed/22849369

11. AKE - the Accelerated k-mer Exploration web-tool for rapid taxonomic classification and visualization

Acceleration in AKE’s taxonomic assignments is achieved by a special machine learning architecture, which is well suited to model data collections that are intrinsically hierarchical.

Available at: https://ani.cebitec.uni-bielefeld.de/ake/login.html

http://www.biomedcentral.com/1471-2105/15/384

12. TAXSOM: Practical application of self-organizing maps to interrelate biodiversity and functional data in NGS-based metagenomics.

Biodiversity is usually targeted by classifying 16S ribosomal RNA genes, while metagenomic approaches target metabolic genes. However, both approaches remain isolated, as long as the taxonomic and functional information cannot be interrelated. Techniques like self-organizing maps (SOMs) have been applied to cluster metagenomes into taxon-specific bins in order to link biodiversity with functions.

Available at: http://soma.arb-silva.de/

http://www.ncbi.nlm.nih.gov/pubmed/21160538

13. Kraken: ultrafast metagenomic sequence classification using exact alignments

Kraken is an ultrafast and highly accurate program for assigning taxonomic labels to metagenomic DNA sequences. Using exact alignment of k-mers, Kraken achieves classification accuracy comparable to the fastest BLAST program.

Available at: http://ccb.jhu.edu/software/kraken/

http://genomebiology.com/2014/15/3/R46

14. RDP Classifier: Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy.

The Ribosomal Database Project (RDP) Classifier, a naïve Bayesian classifier, can rapidly and accurately classify bacterial 16S rRNA sequences into the new higher-order taxonomy proposed in Bergey's Taxonomic Outline of the Prokaryotes. For shorter rRNA segments, such as those that might be generated by pyrosequencing, the error rate varied greatly over the length of the 16S rRNA gene. The RDP Classifier is suitable both for the analysis of single rRNA sequences and for the analysis of libraries of thousands of sequences.

Available at: http://rdp.cme.msu.edu/

http://www.ncbi.nlm.nih.gov/pmc/articles/pmid/17586664/

15. 16S Classifier: A Tool for Fast and Accurate Taxonomic Classification of 16S rRNA Hypervariable Regions in Metagenomic Datasets

16S Classifier was developed using a machine learning method, Random Forest, for fast and accurate taxonomic classification of short hypervariable regions of the 16S rRNA sequence. It displayed precision values of up to 0.91 on the training datasets and up to 0.98 on the test dataset. On real metagenomic datasets, it showed up to 99.7% accuracy at the phylum level and up to 99.0% accuracy at the genus level.

Available at: http://metagenomics.iiserb.ac.in/16Sclassifier

http://metabiosys.iiserb.ac.in/16Sclassifier/application.php

http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0116106

Wednesday, 11 March 2015

Tools and methods for Flux Balance Analysis

Here I discuss various tools and methods for Flux Balance Analysis (FBA) of metabolic networks.

1. OptFlux: an open-source software platform for in silico metabolic engineering by Rocha et al.

OptFlux is an open-source and modular software aimed at being the reference computational application in the field. It allows the use of stoichiometric metabolic models for (i) phenotype simulation of both wild-type and mutant organisms, using the methods of Flux Balance Analysis, Minimization of Metabolic Adjustment or Regulatory on/off Minimization of Metabolic flux changes, (ii) Metabolic Flux Analysis, computing the admissible flux space given a set of measured fluxes, and (iii) pathway analysis through the calculation of Elementary Flux Modes.

Available at: http://www.optflux.org/

2. MetaFluxNet: the management of metabolic reaction information and quantitative metabolic flux analysis by Lee et al.

MetaFluxNet is a program package for managing information on the metabolic reaction network and for quantitatively analyzing metabolic fluxes in an interactive and customized way. It allows users to interpret and examine metabolic behavior in response to genetic and/or environmental modifications. As a result, quantitative in silico simulations of metabolic pathways can be carried out to understand the metabolic status and to design the metabolic engineering strategies. The main features of the program include a well-developed model construction environment, user-friendly interface for metabolic flux analysis (MFA), comparative MFA of strains having different genotypes under various environmental conditions, and automated pathway layout creation.

Available at: http://metafluxnet.kaist.ac.kr/

3. BioOpt:

BioOpt is a software application running on Windows command prompt. The program focuses on the flux balance analysis, using linear programming as the mathematical support. Given a biological system model, which includes a set of metabolic reactions, the program is able to calculate all internal mass balance fluxes, reduced costs and shadow prices depending on the constraints and objective defined by the user. Running BioOpt with different parameters allows the user to obtain several kinds of outputs that can help in the analysis of the system.

Available at: http://129.16.106.142/tools.php?c=bioopt

4. SurreyFBA: A command line tool and graphics user interface for constraint based modelling of genome scale metabolic reaction networks.

SurreyFBA provides constraint-based simulations and network map visualization in free, stand-alone software. It is based on a command line interface to the GLPK solver and is distributed as binary and source code for the three major operating systems. SurreyFBA includes JyMet, a graphical user interface allowing spreadsheet-based model presentation and visualization of numerical results on metabolic networks represented in the Petri net convention, as well as in charts and plots.

Available at: http://sysbio3.fhms.surrey.ac.uk/SurreyFBA.zip

5. FASIMU: FBA simulation software for metabolomics, fluxomics, and biotechnology

FASIMU is command-line-oriented software implementing the most frequently applied FBA algorithms. Moreover, it offers the first freely available implementation of (i) weighted flux minimization, (ii) fitness maximization for partially inhibited enzymes, and (iii) the concentration-based thermodynamic feasibility constraint. It allows heterogeneous computation series suited for network pruning, leak analysis, FVA, and systematic probing of metabolic objectives for network curation, controlled by an intuitive description file. The metabolic network can be supplied in SBML, CellNetAnalyzer, and plain text format. FASIMU uses the optimization capabilities of free (lp_solve and GLPK) and commercial solvers (CPLEX, LINDO). The results can be visualized in Cytoscape or BiNA using newly developed plugins.

Available at: http://www.bioinformatics.org/fasimu/downloads/

6. GEMSiRV: A software platform for GEnome-scale Metabolic model Simulation, Reconstruction and Visualization

GEMSiRV comes with downloadable, ready-to-use public-domain metabolic models, reference metabolite/reaction databases, and metabolic network maps, all of which can be input into GEMSiRV as the starting materials for network construction or simulation analyses. Furthermore, all of the GEMSiRV-generated metabolic models and analysis results, including projects in progress, can be easily exchanged in the research community. GEMSiRV is a powerful integrative resource that may facilitate the development of systems biology studies.

Available at: http://sb.nhri.org.tw/GEMSiRV/en/GEMSiRV

7. CellNetAnalyzer: Structural and Functional Analysis of Cellular Networks

CellNetAnalyzer (CNA) is a MATLAB toolbox providing a graphical user interface and various (partially unique) computational methods and algorithms for exploring structural and functional properties of metabolic, signaling, and regulatory networks.

Metabolic networks are formalized and analyzed by stoichiometric and constraint-based modeling techniques, including flux balance analysis (FBA), metabolic flux analysis, elementary-modes analysis, minimal cut set analysis, and many more. Several algorithms are provided for computational strain design / metabolic engineering.

Available at: http://www2.mpi-magdeburg.mpg.de/projects/cna/cna.html

8. SNA--a toolbox for the stoichiometric analysis of metabolic networks.

SNA is a Mathematica toolbox for stoichiometric network analysis. Among other things, it supports flux balance analysis and the enumeration of the elementary vectors of the flux and the conversion cone.

Available at: http://www.webcitation.org/mainframe.php

9. Quantitative prediction of cellular metabolism with constraint-based models: the COBRA Toolbox.

The COBRA Toolbox is a set of MATLAB scripts for constraint-based modeling that are run from within the MATLAB environment. These scripts depend on external libraries for reading and writing SBML-formatted models and for simulations. Additionally, some functions may require additional MATLAB Toolboxes that must be purchased from the MathWorks.

Available at: http://opencobra.sourceforge.net/openCOBRA/Install.html

10. FBA-SimVis: interactive visualization of constraint-based metabolic models.

FBA-SimVis is a VANTED Plug-in for the constraint-based analysis of metabolic models with special focus on the dynamic and visual exploration of metabolic flux data resulting from model analysis. The program provides a user-friendly environment for model reconstruction, constraint-based model analysis and dynamic visualisation of the simulation results. With the ability to quantitatively analyse metabolic fluxes in an interactive and visual manner, FBA-SimVis supports a comprehensive understanding of constraint-based metabolic flux models in both overview and detail.

Available at: http://fbasimvis.ipk-gatersleben.de/

11. MetaFlux: Construction and completion of flux balance models from pathway databases.

MetaFlux is a tool based on mixed integer linear programming (MILP) that offers a multiple gap-filling method to accelerate the development of FBA models. The method suggests corrections to the sets of reactions, biomass metabolites, nutrients and secretions, and it generates FBA models directly from Pathway/Genome Databases. FBA models developed in this framework are therefore easily queried and visualized using the Pathway Tools software.

Available at: http://biocyc.org/download.shtml

12. CycSim—an online tool for exploring and experimenting with genome-scale metabolic models

CycSim is a web application dedicated to in silico experiments with genome-scale metabolic models coupled to the exploration of knowledge from BioCyc and KEGG. Specifically, CycSim supports the design of knockout experiments: simulation of growth phenotypes of single or multiple gene deletions mutants on specified media, comparison of these predictions with experimental phenotypes and direct visualization of both on metabolic maps. The web interface is designed for simplicity, putting constraint-based modelling techniques within easier reach of biologists. CycSim also functions as an online repository of genome-scale metabolic models.

Available at: http://www.genoscope.cns.fr/cycsim/org.nemostudio.web.gwt.App/App.html (Standalone version not available)

13. WEbcoli: an interactive and asynchronous web application for in silico design and analysis of genome-scale E. coli model.

WEbcoli is a WEb application for in silico designing, analyzing and engineering Escherichia coli metabolism. It is devised and implemented using advanced web technologies, thereby leading to enhanced usability and dynamic web accessibility. As a main feature, the WEbcoli system provides a user-friendly rich web interface, allowing users to virtually design and synthesize mutant strains derived from the genome-scale wild-type E. coli model and to customize pathways of interest through a graph editor. In addition, constraints-based flux analysis can be conducted for quantifying metabolic fluxes and characterizing the physiological and metabolic states under various genetic and/or environmental conditions.

Available at: http://webcoli.org (Standalone version not available)

14. RAST/Model SEED genome-scale metabolic reconstruction pipeline:

RAST and the Model SEED framework were developed as a means of automatically producing annotations and draft genome-scale metabolic models. They break down the model reconstruction process into eight steps: submitting a genome sequence to RAST, annotating the genome, curating the annotation, submitting the annotation to Model SEED, reconstructing the core model, generating the draft biomass reaction, auto-completing the model, and curating the model. Each of these eight steps is documented in detail.

Available at: http://seed-viewer.theseed.org/seedviewer.cgi?page=ModelView (Standalone version not available)

15. MicrobesFlux: a web platform for drafting metabolic models from the KEGG database:

MicrobesFlux is an installation-free and open-source platform that enables biologists without prior programming knowledge to develop metabolic models for annotated microorganisms in the KEGG database. The system helps users reconstruct metabolic networks of organisms based on experimental information. Through human-computer interaction, MicrobesFlux provides users with reasonable predictions of microbial metabolism via flux balance analysis. This prototype platform can be a springboard for advanced and broad-scope modeling of complex biological systems by integrating other “omics” data or 13C-metabolic flux analysis results.

Available at: http://tanglab.engineering.wustl.edu/static/MicrobesFlux.html (Standalone version not available)

16. FAME, the Flux Analysis and Modeling Environment:

The Flux Analysis and Modeling Environment (FAME) is the first web-based modeling tool that combines the tasks of creating, editing, running, and analyzing/visualizing stoichiometric models into a single program. Analysis results can be automatically superimposed on familiar KEGG-like maps.

Available at: http://f-a-m-e.org/ajax/page1.php (Standalone version not available)

Friday, 27 February 2015

Databases, Software and Tools for metabolic pathway/network reconstructions

Here I describe the available tools and methods for metabolic pathway reconstruction.

1. The RAVEN Toolbox and Its Use for Generating a Genome-scale Metabolic Model for Penicillium chrysogenum:

RAVEN (Reconstruction, Analysis and Visualization of Metabolic Networks) Toolbox: a software suite that allows for semi-automated reconstruction of genome-scale models. It makes use of published models and/or the KEGG database, coupled with extensive gap-filling and quality control features. The software suite also contains methods for visualizing simulation results and omics data, as well as a range of methods for performing simulations and analyzing the results.

Available at: http://129.16.106.142/downloads.php

2. RAST/Model SEED genome-scale metabolic reconstruction pipeline:

RAST and the Model SEED framework were developed as a means of automatically producing annotations and draft genome-scale metabolic models. They break down the model reconstruction process into eight steps: submitting a genome sequence to RAST, annotating the genome, curating the annotation, submitting the annotation to Model SEED, reconstructing the core model, generating the draft biomass reaction, auto-completing the model, and curating the model. Each of these eight steps is documented in detail.

Available at: http://seed-viewer.theseed.org/seedviewer.cgi?page=ModelView (Standalone version not available)

3. MicrobesFlux: a web platform for drafting metabolic models from the KEGG database:

MicrobesFlux is an installation-free and open-source platform that enables biologists without prior programming knowledge to develop metabolic models for annotated microorganisms in the KEGG database. The system helps users reconstruct metabolic networks of organisms based on experimental information. Through human-computer interaction, MicrobesFlux provides users with reasonable predictions of microbial metabolism via flux balance analysis. This prototype platform can be a springboard for advanced and broad-scope modeling of complex biological systems by integrating other “omics” data or 13C-metabolic flux analysis results.

Available at: http://tanglab.engineering.wustl.edu/static/MicrobesFlux.html (Standalone version not available)

4. FAME, the Flux Analysis and Modeling Environment:

The Flux Analysis and Modeling Environment (FAME) is the first web-based modeling tool that combines the tasks of creating, editing, running, and analyzing/visualizing stoichiometric models into a single program. Analysis results can be automatically superimposed on familiar KEGG-like maps.

Available at: http://f-a-m-e.org/ajax/page1.php (Standalone version not available)

5. Pathway Tools version 13.0: integrated software for pathway/genome informatics and systems biology:

Pathway Tools is a production-quality software environment for creating a type of model-organism database called a Pathway/Genome Database (PGDB). A PGDB such as EcoCyc integrates the evolving understanding of the genes, proteins, metabolic network and regulatory network of an organism.

Available at: http://bioinformatics.ai.sri.com/ptools/

6. BioModels Database:

BioModels Database serves as a large repository of computational models of genomes and different biological processes. It hosts models described in peer-reviewed scientific literature and automatically generated models from pathway resources (Path2Models). Models collected from the literature are manually curated and semantically enriched with cross-references from external data resources. The database allows the scientific community to store, search and retrieve mathematical models of interest. In addition, features such as generation of sub-models, online simulation, conversion of models into different representational formats, and programmatic access via web services are also provided.

Available at: http://www.ebi.ac.uk/biomodels-main/

7. GEMSiRV: A software platform for GEnome-scale Metabolic model Simulation, Reconstruction and Visualization

GEMSiRV comes with downloadable, ready-to-use public-domain metabolic models, reference metabolite/reaction databases, and metabolic network maps, all of which can be input into GEMSiRV as the starting materials for network construction or simulation analyses. Furthermore, all of the GEMSiRV-generated metabolic models and analysis results, including projects in progress, can be easily exchanged in the research community. GEMSiRV is a powerful integrative resource that may facilitate the development of systems biology studies.

Available at: http://sb.nhri.org.tw/GEMSiRV/en/GEMSiRV

8. metaSHARK: software for automated metabolic network prediction from DNA sequence and its application to the genomes of Plasmodium falciparum and Eimeria tenella.

The metabolic SearcH And Reconstruction Kit (metaSHARK) is a new fully automated software package for the detection of enzyme-encoding genes within unannotated genome data and their visualization in the context of the surrounding metabolic network.

Available at:

9. The SuBliMinaL Toolbox: automating steps in the reconstruction of metabolic networks.

The SuBliMinaL Toolbox (http://www.mcisb.org/subliminal/) facilitates the reconstruction process by providing a number of independent modules to perform common tasks, such as generating draft reconstructions, determining metabolite protonation state, mass and charge balancing reactions, suggesting intracellular compartmentalisation, adding transport reactions and a biomass function, and formatting the reconstruction to be used in third-party analysis packages.

Available at: http://www.mcisb.org/resources/subliminal/

Sunday, 8 February 2015

16S Classifier: A Tool for Fast and Accurate Taxonomic Classification of 16S rRNA Hypervariable Regions in Metagenomic Datasets

Time to analyze your 16S rRNA data using 16S Classifier

This is a recent publication from our lab. Please explore the tool and write to ashok@iiserb.ac.in if you run into any problem. Comments are welcome.

http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0116106

To the best of our knowledge, 16S Classifier is the only available tool that can carry out efficient, sensitive and accurate taxonomic assignment of any of the 16S rRNA hypervariable regions commonly used in metagenomic projects. For the complete 16S rRNA it also displayed exceptional performance (precision of 0.97) on the test dataset. Thus, wide usage of this tool is anticipated in different metagenomic projects. 16S Classifier is freely available at
http://metagenomics.iiserb.ac.in/16Sclassifier
http://metabiosys.iiserb.ac.in/16Sclassifier

Instructions for running the stand-alone version of 16S Classifier on a Linux PC:
1. Download the zip file for a particular hypervariable region or for the complete 16S, freely available at http://metagenomics.iiserb.ac.in/16Sclassifier/download.html
2. Extract the zip file, which contains a model file (*.Rdata), a script file (*.sh) and an executable file (16sclassifier.exe).
Other dependencies:
1. Install R from http://cran.r-project.org/
2. Install the randomForest package: type R in the terminal to start R, then run install.packages('randomForest')

Command line usage: ./16sclassifier.exe 'queryfile' 'modelname'

The query file should be in Fasta format and the model name could be v2, v3, v4, v5, v6, v7, v8, v23, v34, v35, v45, v56, v67, v78 and Complete16S.
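A typo in the model name is an easy mistake to make, so a small wrapper that checks the inputs before calling the program can save a failed run. The sketch below is only a suggestion: the function names is_valid_model and run_16s_classifier and the file name reads.fasta are my own placeholders, and it assumes the executable sits in the current directory as described above.

```shell
# Model names as listed in this post.
VALID_MODELS="v2 v3 v4 v5 v6 v7 v8 v23 v34 v35 v45 v56 v67 v78 Complete16S"

# Return 0 if the given name is one of the listed models, 1 otherwise.
is_valid_model() {
    for m in $VALID_MODELS; do
        [ "$m" = "$1" ] && return 0
    done
    return 1
}

# Hypothetical wrapper: check the inputs, then call the classifier
# with the documented argument order (query file, model name).
run_16s_classifier() {
    query="$1"
    model="$2"
    if ! is_valid_model "$model"; then
        echo "unknown model: $model" >&2
        return 2
    fi
    if [ ! -f "$query" ]; then
        echo "query file not found: $query" >&2
        return 3
    fi
    ./16sclassifier.exe "$query" "$model"
}

# Example call (reads.fasta is a placeholder file name):
# run_16s_classifier reads.fasta v34
```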

Monday, 17 February 2014

Editing Multiple Files in Linux

Editing multiple files in Linux can speed up your work. Here I discuss some basic commands that may be helpful for all of us. Please download the file from here.

Editing multiple files in Linux for advanced users

Here is a list of commands that let you switch directly between multiple files. They are very useful for frequent Linux users.

1. Give all the file names to vi

vi file1 file2 .... (list the names of all the files you want to edit)

The first file opens in the terminal; use the normal vi commands, e.g. Esc i, to insert text

Esc :w (write the current file)

:n (switch to the next file)

Esc :wq (write the current file and quit; in Vim, :wqa writes all open files and quits)

2. You do not need to give all the file names up front; you can start with a single file name

vi file1 (edit this file using normal vi commands)

Esc :w (write the file)

:e file2 (file2 is the name of the file you want to open after file1)

3. vi remembers two file names at a time, the current and the alternate file name:

symbol % (current file name)

symbol # (alternate file name)

vi file1 file2

Suppose you wrote something in file1 using Esc i, saved it with Esc :w, and then moved to the next file with :n or :e. You made some changes in file2, but you do not want to save them and want to go back to the first file. For this you can use the following command.

:e! # (discards the unsaved edits in the current file, file2, and returns to the alternate file, file1; :e! on its own reverts the current file to its last saved version)

:w %.new (writes a copy of the file you are currently editing with the suffix .new, e.g. file1.new when file1 is the current file)

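To find and replace a string in multiple files in the same directory, you can use the following commands in vi (Vim):

vi *.txt (open all the files)

:argdo %s/findme/replaceme/ge | update (run the substitution in every opened file and save each file that changed; the e flag suppresses the error when a file has no match)

:q (quit once all files are saved)

You can also use the sed command for the same task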

sed -i 's/findme/replaceme/g' *.txt
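Because sed -i overwrites the files directly, it is worth trying the command on copies first. The sketch below runs the same replacement in a throwaway directory and keeps a backup of every file by giving -i a suffix; the file names and their contents are made up for the example.

```shell
# Work in a throwaway directory so that no real files are touched.
sed_dir="$(mktemp -d)"
printf 'findme once\nnothing here\n' > "$sed_dir/a.txt"
printf 'findme and findme\n' > "$sed_dir/b.txt"

# Preview: list the files that contain the pattern.
grep -l 'findme' "$sed_dir"/*.txt

# Replace in place; the originals are kept as a.txt.bak and b.txt.bak.
sed -i.bak 's/findme/replaceme/g' "$sed_dir"/*.txt

# Show the edited files.
cat "$sed_dir/a.txt" "$sed_dir/b.txt"
```

If the result is not what you wanted, the .bak files still hold the original content.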

Tuesday, 11 February 2014

Basic commands for linux

Linux commands are very useful for file handling. Here I discuss some basic Linux commands. Please download printable version here.

**** cut ****

The cut command extracts bytes, characters or fields (separated by a delimiter) from each line of its input file.

Syntax

cut [-b list] [-c list] [-f list] [-d delimiter] [file] (use one of -b, -c or -f; -d is only used together with -f)

-b: select the given byte positions or ranges

-c: select characters by position; you can give single positions or ranges, for example

cut -c2 file (Outputs the second character of every line of the file)

cut -c1,4 file (Outputs the first and fourth characters of every line of the file)

cut -c2-5 file (Outputs the second to fifth characters of every line of the file)

cut -c-5 file (Outputs the first five characters, because only the last position is defined)

cut -c2- file (Outputs from the second character to the end of the line, because only the first position is defined)

-f: select specific fields separated by a delimiter such as space (' '), tab ('\t'), colon (:) or semicolon (;); tab is the default delimiter

-d: sets the delimiter

cut -d' ' -f3 file (Outputs the third field of each line, treating space as the delimiter)

cut -d' ' -f3,4 file (Outputs several fields, given as a comma-separated list of positions)

cut -d' ' -f2-5 file (Outputs a range of fields, here the second to fifth)

cut -d' ' -f-5 file (Outputs the first five fields, because only the last position is defined)

cut -d' ' -f2- file (Outputs the second to the last field, because only the first position is defined)
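To see these options on real input, here is a small sketch that writes a toy file with five space-separated fields per line and runs a few of the commands on it. The file content is invented for the example.

```shell
# Toy input: five space-separated fields per line.
cut_demo="$(mktemp)"
printf 'gene1 chr1 100 200 +\ngene2 chr2 300 400 -\n' > "$cut_demo"

cut -c1-5 "$cut_demo"          # characters 1 to 5 of every line
cut -d' ' -f3 "$cut_demo"      # third field
cut -d' ' -f2,4 "$cut_demo"    # second and fourth fields
cut -d' ' -f4- "$cut_demo"     # fourth field to the end of the line
```

For the first line of the toy file these four commands print gene1, 100, "chr1 200" and "200 +" respectively.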

**** comm ****

comm command is used for comparing the two randomly sorted files line by line.

Syntax

comm [options] file1 file2

comm file1 file2 (outputs three columns: the first contains lines unique to the first file, the second contains lines unique to the second file, and the third contains lines common to both files)

comm -12 file1 file2 (outputs lines common to both files; -12 suppresses the first and second columns)

comm -13 file1 file2 (outputs lines unique to the second file; -13 suppresses the first and third columns)

comm -23 file1 file2 (outputs lines unique to the first file; -23 suppresses the second and third columns)
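A small worked example of the options above, as a sketch. The two genus lists are invented and deliberately unsorted, so they are sorted before being passed to comm. Setting LC_ALL=C is an assumption made here so that sort and comm agree on the same byte-wise ordering.

```shell
# Hypothetical genus lists, deliberately unsorted.
dir=$(mktemp -d)
printf 'Vibrio\nBacillus\nEscherichia\n' > "$dir/sample1.txt"
printf 'Escherichia\nClostridium\nBacillus\n' > "$dir/sample2.txt"

# comm expects sorted input, so sort each list first.
LC_ALL=C sort "$dir/sample1.txt" > "$dir/s1.sorted"
LC_ALL=C sort "$dir/sample2.txt" > "$dir/s2.sorted"

LC_ALL=C comm -12 "$dir/s1.sorted" "$dir/s2.sorted"   # genera found in both lists
LC_ALL=C comm -23 "$dir/s1.sorted" "$dir/s2.sorted"   # genera only in the first list
LC_ALL=C comm -13 "$dir/s1.sorted" "$dir/s2.sorted"   # genera only in the second list
```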

**** paste and cat ****

These two commands are used for combining two or more files.

paste file1 file2 file3 ... (outputs the lines of all files side by side, separated by tabs)

cat file1 file2 file3 ... (outputs the content of all files one after another, in the order given)
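The difference between the two commands is easiest to see on two short files. The following is a sketch with invented file names and content; the output is redirected to new files so that the results can be inspected afterwards.

```shell
# Hypothetical input files: one column of IDs, one column of genus names.
tmp=$(mktemp -d)
printf 'seq1\nseq2\n' > "$tmp/ids.txt"
printf 'Bacillus\nVibrio\n' > "$tmp/genus.txt"

# paste joins line 1 with line 1, line 2 with line 2, and so on (tab-separated).
paste "$tmp/ids.txt" "$tmp/genus.txt" > "$tmp/side_by_side.txt"

# cat writes the whole first file, then the whole second file below it.
cat "$tmp/ids.txt" "$tmp/genus.txt" > "$tmp/stacked.txt"

cat "$tmp/side_by_side.txt"
cat "$tmp/stacked.txt"
```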

**** rm ****

The rm command removes files.

Syntax

rm [options] file

rm file (removes the file from the file system)

rm -i file (prompts before removing each file)

rm -I file1 file2 file3 ... (prompts once before removing more than three files)

rm -fr file (removes the file or directory recursively, without prompting)

rm -r directory/ (removes directories and their contents recursively)
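Because rm deletes without a recycle bin, it is worth practising in a scratch directory first. The sketch below uses invented file and directory names created under a temporary directory, so nothing outside it is affected.

```shell
# Build a throwaway directory tree to practise on (hypothetical names).
scratch=$(mktemp -d)
mkdir -p "$scratch/old_run/logs"
touch "$scratch/notes.txt" "$scratch/old_run/logs/run.log"

rm "$scratch/notes.txt"      # remove a single file
rm -r "$scratch/old_run"     # remove a directory and everything under it

ls -A "$scratch"             # lists whatever is left in the scratch directory
```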

******* The End ******

Saturday, 8 February 2014

Basic commands for the Linux vi Editor

vi is a powerful screen-oriented text editor for Linux/Unix operating systems, which makes it very useful for file handling, especially in the field of computational biology. Please download the printable version here.

*** Some Basic vi Commands ***

For editing the file

1. vi filename [create or edit a file]

2. vi -r filename [recover a file that was being edited when the editor or system crashed]

3. Esc i [insert text before the cursor]

4. Esc I [insert text at the beginning of the current line]

5. Esc o [open a new line below the current line and insert text]

6. Esc O [open a new line above the current line and insert text]

7. Esc :wq [save and quit]

8. Esc :q! [quit without saving]

9. Esc u [undo whatever you just did]

10. Esc $ [move the cursor to the end of the current line]

11. arrow keys, or k (up), j (down), h (left) and l (right) [move the cursor]

12. :s/findme/replaceme/g [find and replace a character or word on the current line; use :%s/findme/replaceme/g for the whole file]

For deleting the text

1. Esc x [delete the single character under the cursor; Nx deletes N characters]

2. Esc dw [delete the word beginning with the character under the cursor; dNw deletes N words]

3. dd [delete the entire current line; Ndd deletes N lines beginning with the current line]

For copy paste

4. yy [copy the current line]

5. p [paste the line]

For more information, you can type vi -h.

******* The end ******
