Although the number of conceivable protein structures is astronomical, the protein structure space actually observed is remarkably limited. The reasons for this distribution have been investigated at the level of sequence, structure and function, but the problem remains unresolved. Protein structure space has been shaped by evolution under the constraint of organismal survival, so understanding this space and its variations is no trivial task. With Pro-Lego we attempt to provide a systematic framework to study, understand and explore protein topology space using current structural data and a component-based approach (Lego blocks, or patterns). The current version (v1.1) has been developed for all-alpha proteins. Given a PDB ID and chain specifier, Pro-Lego v1.1 identifies the inherent pattern and tests whether that pattern is prevalent, based on the study reported in [1]. In addition, Pro-Lego analyzes important structural descriptors of the protein and links to functional analysis.
[1] (submitted manuscript) Khan T, Ghosh I (2014). Modularity in Protein Structures: Study on all-alpha Proteins.

ORIS: Origin Search Tool

We have implemented various existing methods for finding replication origins and have also incorporated a few new measures that are not yet part of any freely available origin-finding software. The software modules are arranged in a contextual view, and simple visualisation plots are provided to aid inference. Our software suite, ORIS (ORIgin Search), is meant to help researchers identify origin of replication sites in the genomic data of prokaryotes, archaea and eukaryotes. The Java-based package is free and has modules for GC and AT skew, cumulative skew, autocorrelation, cross-correlation, origin-specific motif search with mismatches, weight matrix search, Shannon and Renyi entropy, and bending profiles. ORIS also supports user-defined expressions, which can be useful in developing new methods for predicting replication origins. ORIS is written in Java, which was chosen for features such as object orientation, platform independence, architectural neutrality, robustness and multithreading (10). ORIS has a Java Swing based GUI. Because it is implemented in Java, any machine with a JRE (Java Runtime Environment) can run ORIS, regardless of operating system.
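To make the skew measures concrete, here is a minimal sketch of windowed GC skew and cumulative GC skew, written in Python rather than in ORIS's Java. The function names, the window parameters, and the use of the cumulative-skew minimum as an origin estimate are illustrative assumptions (the minimum is a common heuristic for many bacterial chromosomes); this is not ORIS's own code.

```python
def gc_skew(sequence, window=1000, step=1000):
    """Return (window_start, skew) pairs, with skew = (G - C) / (G + C)."""
    seq = sequence.upper()
    result = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        # A window with no G or C has an undefined skew; report 0.0 for it.
        skew = (g - c) / (g + c) if (g + c) else 0.0
        result.append((start, skew))
    return result


def cumulative_skew(skews):
    """Running sum of the windowed skew values."""
    total = 0.0
    out = []
    for start, value in skews:
        total += value
        out.append((start, total))
    return out


def predicted_origin(sequence, window=1000, step=1000):
    """Start of the window where the cumulative GC skew is minimal.

    Assumption: the minimum marks the origin, which is only a heuristic.
    Raises ValueError if the sequence is shorter than one window.
    """
    cum = cumulative_skew(gc_skew(sequence, window, step))
    return min(cum, key=lambda pair: pair[1])[0]
```

For example, `predicted_origin("C" * 20 + "G" * 20, window=10, step=10)` returns the start of the last C-rich window, where the running sum stops falling. AT skew follows the same pattern with A and T in place of G and C.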

Computational Pipeline for analysis of known and prediction of novel miRNAs from the deep sequencing data:

MicroRNAs are a class of small non-coding RNAs that regulate mRNA expression at the post-transcriptional level and thereby influence many fundamental biological processes. A number of methods, such as multiplex polymerase chain reaction and microarrays, have been developed for profiling levels of known miRNAs. These methods cannot identify novel miRNAs or accurately determine expression across a range of concentrations. Deep (massively parallel) sequencing methods provide suitable platforms for genome-wide transcriptome analysis and can identify novel transcripts.
Our major aim is to understand the microRNAome through deep sequencing technology, which provides several interesting observations. These include: a) quantitative profiles of known miRNAs and a valuable list of differentially expressed miRNAs, which help to understand the role of miRNAs in a specific biological process and reveal altered miRNA levels in different systems such as oncogenesis and development; b) most importantly, the discovery of novel miRNAs, even those with low expression levels.
We have developed an improved automated computational pipeline for the analysis of deep sequencing data, consisting of an exhaustive elimination pipeline and a novel miRNA prediction pipeline. The elimination pipeline filters out known, annotated sRNA sequences. The novel miRNA prediction pipeline involves: a) extraction of the flanking sequences of the unannotated, unmatched sequences from the elimination pipeline that matched intergenic/intronic regions; b) analysis of the extended sequences by precursor-prediction algorithms; c) finally, filtering of the predictions to check for the presence of isomiRs and star sequences.
Publication details:- Vaz C, Ahmad H M, Sharma P, Gupta R, Kumar L, Kulshreshtha R, Bhattacharya A. Analysis of microRNA transcriptome by deep sequencing of small RNA libraries of peripheral blood. BMC Genomics 2010, 11:288.
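The elimination step and the flank extraction in step (a) of the prediction pipeline can be sketched in Python as follows. This is a simplified illustration, not the published pipeline: the function names, the exact-match filter (the real pipeline matches reads against annotated sRNA databases) and the default flank length of 100 bases are all assumptions made for the example.

```python
def eliminate_known(reads, annotated):
    """Drop reads that exactly match a known, annotated sRNA sequence.

    Assumption: exact string matching stands in for the alignment-based
    matching a real elimination pipeline would use.
    """
    known = set(annotated)
    return [read for read in reads if read not in known]


def extract_with_flanks(genome, start, end, flank=100):
    """Extend the mapped locus [start, end) by `flank` bases on each side.

    Coordinates are 0-based and half-open; the extension is clipped at the
    genome boundaries. Returns (left, right, extended_sequence), which can
    then be passed to a precursor-prediction tool.
    """
    left = max(0, start - flank)
    right = min(len(genome), end + flank)
    return left, right, genome[left:right]
```

The flank length would in practice be chosen so that the extended sequence can contain a full hairpin precursor around the mapped read.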

ABWGAT: Anchor Based Whole Genome Analysis Tool:

Large numbers of genomes are being sequenced regularly, and the rate will rise further as new genome sequencing techniques become available. To understand genotype to phenotype relationships it is necessary to identify sequence variations at the genomic level. Aligning a pair of genomes and parsing the alignment data is an accepted approach for identifying variations. Although a number of tools are available for whole genome alignment, none of them allows automatic parsing of the alignment and identification of different kinds of genomic variants with a high degree of sensitivity. We have developed a new algorithm and web-based interface for pairwise whole genome comparison, named ABWGAT (Anchor Based Whole Genome Analysis Tool), that is simple to use. The server finds genetic variations such as SNVs (Single Nucleotide Variations), INDELs (insertions and deletions), repeat expansions and inversions. The output is a separate list for each type of variation, giving size, gene name, predicted function, etc. The web server is available at the following link: ABWGAT: Anchor Based Whole Genome Analysis Tool
Publication details:- Sarbashis Das, Anchal Vishnoi, and Alok Bhattacharya; ABWGAT: Anchor Based Whole Genome Analysis Tool. Bioinformatics Advance Access published on October 14, 2009. doi:10.1093/bioinformatics/btp587
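As an illustration of what parsing a pairwise alignment into variant lists means, the Python sketch below walks two gapped, equal-length aligned strings and reports SNVs, insertions and deletions. It is not the ABWGAT algorithm (no anchoring, repeat expansion or inversion detection is attempted); the function name, the `-` gap character and the tuple layout are assumptions made for the example.

```python
def call_variants(ref_aln, qry_aln):
    """Parse one pairwise alignment into (type, ref_position, detail) tuples.

    ref_position is 0-based in the ungapped reference. For an insertion it
    is the index of the reference base that follows the inserted bases.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")
    variants = []
    ref_pos = 0
    i = 0
    n = len(ref_aln)
    while i < n:
        r, q = ref_aln[i], qry_aln[i]
        if r == "-":
            # Gap in the reference: the query carries extra bases.
            j = i
            while j < n and ref_aln[j] == "-":
                j += 1
            inserted = qry_aln[i:j].replace("-", "")
            if inserted:
                variants.append(("INS", ref_pos, inserted))
            i = j
        elif q == "-":
            # Gap in the query: reference bases are missing from it.
            j = i
            while j < n and qry_aln[j] == "-" and ref_aln[j] != "-":
                j += 1
            variants.append(("DEL", ref_pos, ref_aln[i:j]))
            ref_pos += j - i
            i = j
        else:
            if r != q:
                variants.append(("SNV", ref_pos, r + ">" + q))
            ref_pos += 1
            i += 1
    return variants
```

Splitting the returned list by its first field gives one list per variation type, similar in spirit to the separate output lists described above.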

Plant Stress Gene Database
Stress conditions, both biotic and abiotic, cause extensive losses to agricultural production worldwide. Individually, stress conditions such as drought, salinity or heat have been the subject of intense research. However, in the field, crops and other plants are routinely subjected to a combination of different abiotic stresses. Owing to their sessile nature, plants are constantly exposed to a multitude of environmental stresses, to which they react with a battery of responses. The result is plant tolerance to conditions such as excessive or inadequate light, water, salt and temperature, and resistance to pathogens. Not only is plant physiology known to change under abiotic or biotic stress, but changes in the genome have also been identified. This database includes 259 stress-related genes from 11 species, along with all the available information about the individual genes. Stress-related ESTs were also found for Phaseolus vulgaris. The database also includes orthologs and paralogs of the proteins encoded by stress-related genes.

DNA Scanner

DNA SCANNER is a tool that scans DNA for a number of different properties, such as biophysical properties, energy, potential for protein interactions, and sequence-based features such as T density and AT density.
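A sequence-based feature such as T density or AT density can be computed with a sliding window, as in the Python sketch below. The function name, the window and step defaults, and the definition of density as the fraction of matching bases per window are assumptions for illustration and are not taken from DNA SCANNER itself.

```python
def base_density(sequence, bases="AT", window=100, step=1):
    """Fraction of each window made up of the given bases.

    Returns one value per window start, scanning left to right.
    Use bases="T" for T density or bases="AT" for AT density.
    """
    seq = sequence.upper()
    targets = set(bases.upper())
    densities = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        count = sum(1 for base in chunk if base in targets)
        densities.append(count / window)
    return densities
```

For instance, `base_density("AATTGGCC", "AT", window=4, step=2)` gives `[1.0, 0.5, 0.0]` for the three windows `AATT`, `TTGG` and `GGCC`.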

  • ELAN - A server based tool for genome wide analysis of mobile genetic elements

  • miRNA Prediction Webserver 

    CIDmiRNA is a tool for computer-assisted identification of microRNAs using an SCFG (stochastic context-free grammar) model; it is designed to analyze either a single sequence or a complete genome.

    GOPAM: Gene Ontology Based Prediction Analysis of Microarray

    GOPAM (Gene Ontology Based Prediction Analysis of Microarray) is an integrated web-based application composed of three components: (1) GOPAM, for analysis of the GO hierarchy to find a set of interesting GO nodes; (2) GOViZ, for interactive visualization of the GO hierarchy around a specific node of interest, up to a chosen level of children, ancestors or both, and for visualizing how a set of GO nodes is minimally connected in the GO structure as well as in the GO hierarchy; (3) GOdb, which connects the database to several other databases and allows GO-centric queries with a certain degree of evidence.

    Mycobacterial Genome Divergence database

    MGDD (Mycobacterial Genome Divergence Database) is a repository of genetic differences among strains and species of organisms belonging to the Mycobacterium tuberculosis complex. The differences are based on comparison of user-chosen organisms, with query sequences compared against subject sequences. Users can also choose the type of genetic divergence they are interested in, that is, SNPs (Single Nucleotide Polymorphisms), insertions, repeat expansions and divergent sequences. Results for a specific region (with boundaries defined by nucleotide sequence) or a specific gene can be displayed according to the user's choice. At present, the database holds precomputed analyses of three fully sequenced genomes of this complex: Mycobacterium tuberculosis H37Rv, Mycobacterium tuberculosis CDC1551 and Mycobacterium bovis AF2122/97. In future it will be updated with more strains and species as fully sequenced genomes become available.

    Spectral Repeat Finder (SRF), Software for finding repeat structures in genomic DNA (in collaboration with IMTECH)

    Spectral Repeat Finder (SRF) is a program to find repeats through an analysis of the power spectrum of a given DNA sequence. By repeat we mean the repeated occurrence of a segment of N nucleotides within a DNA sequence. The repeats can be contiguous, in which case they are termed tandem repeats, or not, in which case they are dispersed. SRF is an ab initio technique as no prior assumptions need to be made regarding either the repeat length, its fidelity, or whether the repeats are in tandem or not.
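The idea of finding repeats from a power spectrum can be sketched in Python as follows. The sketch assumes the common indicator-sequence approach (one binary sequence per base, with the squared Fourier magnitudes summed over the four bases) and uses a direct O(N^2) transform for clarity; the function names are hypothetical and this is not SRF's implementation.

```python
import cmath


def power_spectrum(sequence):
    """Sum over A, C, G, T of |DFT of the base's indicator sequence|^2.

    Returns values for frequencies k = 0 .. N // 2, where N is the
    sequence length. A repeat of period p shows up as a peak near k = N / p.
    """
    seq = sequence.upper()
    n = len(seq)
    spectrum = []
    for k in range(n // 2 + 1):
        total = 0.0
        for base in "ACGT":
            # Only positions holding this base contribute to its DFT term.
            coeff = sum(
                cmath.exp(-2j * cmath.pi * k * pos / n)
                for pos, b in enumerate(seq)
                if b == base
            )
            total += abs(coeff) ** 2
        spectrum.append(total)
    return spectrum


def dominant_period(sequence):
    """Period N / k of the strongest non-zero frequency k in the spectrum."""
    spec = power_spectrum(sequence)
    best_k = max(range(1, len(spec)), key=lambda k: spec[k])
    return len(sequence) / best_k
```

For a perfect tandem repeat such as `"ACGTT" * 8`, `dominant_period` recovers the unit length of 5. Because no repeat length is supplied in advance, the sketch reflects the ab initio character described above, although it only reports the single strongest periodicity.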