Category: Bioinformatics algorithms

Blast2GO
Blast2GO, first published in 2005, is a bioinformatics software tool for the automatic, high-throughput functional annotation of novel sequence data (genes, proteins). It makes use of the BLAST algorit
High-performance Integrated Virtual Environment
The High-performance Integrated Virtual Environment (HIVE) is a distributed computing environment used for healthcare-IT and biological research, including analysis of Next Generation Sequencing (NGS)
Nussinov algorithm
The Nussinov algorithm is a nucleic acid structure prediction algorithm used in computational biology that applies dynamic programming to predict the folding of an RNA molecule. The al
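As a rough illustration of the dynamic programming idea behind the Nussinov entry above, the sketch below computes the maximum number of nested base pairs for an RNA string. The function name `nussinov_max_pairs` is hypothetical, and the sketch assumes that only Watson-Crick pairs (A-U, G-C) count and that no minimum hairpin loop length is enforced; practical tools use richer models.

```python
def nussinov_max_pairs(seq):
    """Maximum number of nested (non-crossing) base pairs in an RNA string."""
    # Assumption: only Watson-Crick pairs count; no minimum loop length.
    can_pair = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
    n = len(seq)
    if n < 2:
        return 0
    # dp[i][j] holds the best pair count for the subsequence seq[i..j].
    dp = [[0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i + 1][j]  # case 1: base i stays unpaired
            for k in range(i + 1, j + 1):  # case 2: base i pairs with base k
                if (seq[i], seq[k]) in can_pair:
                    inside = dp[i + 1][k - 1] if k - 1 > i else 0
                    after = dp[k + 1][j] if k < j else 0
                    best = max(best, 1 + inside + after)
            dp[i][j] = best
    return dp[0][n - 1]


if __name__ == "__main__":
    print(nussinov_max_pairs("GGGAAAUCC"))
```

The table is filled by increasing subsequence length, so every smaller interval is already solved when it is needed. This returns only the pair count; recovering the structure itself would need a traceback step.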
Pairwise Algorithm
A pairwise algorithm is an algorithmic technique with its origins in dynamic programming. Pairwise algorithms have several uses including comparing a protein profile (a residue scoring matrix for one
Microarray analysis techniques
Microarray analysis techniques are used in interpreting the data generated from experiments on DNA (Gene chip analysis), RNA, and protein microarrays, which allow researchers to investigate the expres
Quasi-median networks
The concept of a quasi-median network is a generalization of the concept of a median network that was introduced to represent multistate characters. Note that, unlike median networks, quasi-median net
Velvet assembler
Velvet is an algorithm package that has been designed to deal with de novo genome assembly and short read sequencing alignments. This is achieved through the manipulation of de Bruijn graphs for genom
Pseudo amino acid composition
Pseudo amino acid composition, or PseAAC, in molecular biology, was originally introduced by Kuo-Chen Chou in 2001 to represent protein samples for improving protein subcellular localization predictio
PSI Protein Classifier
PSI Protein Classifier is a program generalizing the results of both successive and independent iterations of the PSI-BLAST program. PSI Protein Classifier determines belonging of the found by PSI-BLA
Short Oligonucleotide Analysis Package
SOAP (Short Oligonucleotide Analysis Package) is a suite of bioinformatics software tools from the BGI Bioinformatics department enabling the assembly, alignment, and analysis of next generation DNA s
TopHat (bioinformatics)
TopHat is an open-source bioinformatics tool for the high-throughput alignment of shotgun cDNA sequencing reads generated by transcriptomics technologies (e.g. RNA-Seq) using Bowtie first and then mapping
Baum–Welch algorithm
In electrical engineering, statistical computing and bioinformatics, the Baum–Welch algorithm is a special case of the expectation–maximization algorithm used to find the unknown parameters of a hidde
Ukkonen's algorithm
In computer science, Ukkonen's algorithm is a linear-time, online algorithm for constructing suffix trees, proposed by Esko Ukkonen in 1995. The algorithm begins with an implicit suffix tree containin
Bowtie (sequence analysis)
Bowtie is a software package commonly used for sequence alignment and sequence analysis in bioinformatics. The source code for the package is distributed freely and compiled binaries are available for
Quartet distance
The quartet distance is a way of measuring the distance between two phylogenetic trees. It is defined as the number of subsets of four leaves that are not related by the same topology in both trees.
Robinson–Foulds metric
The Robinson–Foulds or symmetric difference metric, often abbreviated as the RF distance, is a simple way to calculate the distance between phylogenetic trees. It is defined as (A + B) where A is the
Z curve
The Z curve (or Z-curve) method is a bioinformatics algorithm for genome analysis. The Z-curve is a three-dimensional curve that constitutes a unique representation of a DNA sequence, i.e., for the Z-
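A minimal sketch of how a Z curve can be computed, assuming the commonly cited definition in which the three coordinates at position n are built from cumulative base counts: x = (A + G) - (C + T), y = (A + C) - (G + T), z = (A + T) - (C + G). The function name `z_curve` is hypothetical and the input is assumed to contain only A, C, G and T.

```python
def z_curve(dna):
    """Return the list of (x, y, z) points of the Z curve for a DNA string."""
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    points = []
    for base in dna.upper():
        if base not in counts:
            # Assumption: ambiguous symbols such as N are rejected, not skipped.
            raise ValueError(f"unexpected base: {base!r}")
        counts[base] += 1
        a, c, g, t = counts["A"], counts["C"], counts["G"], counts["T"]
        points.append((
            (a + g) - (c + t),  # purine vs. pyrimidine
            (a + c) - (g + t),  # amino vs. keto
            (a + t) - (c + g),  # weak vs. strong hydrogen bonding
        ))
    return points


if __name__ == "__main__":
    for point in z_curve("ACGT"):
        print(point)
```

Because each base moves the point by a distinct step, the sequence can be read back from the successive differences between points, which is why the curve is described as a unique representation.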
Complete-linkage clustering
Complete-linkage clustering is one of several methods of agglomerative hierarchical clustering. At the beginning of the process, each element is in a cluster of its own. The clusters are then sequenti
Sequential pattern mining
Sequential pattern mining is a topic of data mining concerned with finding statistically relevant patterns between data examples where the values are delivered in a sequence. It is usually presumed th
Needleman–Wunsch algorithm
The Needleman–Wunsch algorithm is an algorithm used in bioinformatics to align protein or nucleotide sequences. It was one of the first applications of dynamic programming to compare biological sequen
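The dynamic programming recurrence behind global alignment can be sketched as follows. This is an illustrative version only: the function name `needleman_wunsch_score` and the scoring values (match +1, mismatch -1, linear gap -1) are assumptions, and only the optimal score is returned, not the alignment itself.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of strings a and b (linear gap cost)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against the empty string costs one gap per symbol.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap    # gap inserted in b
            left = score[i][j - 1] + gap  # gap inserted in a
            score[i][j] = max(diag, up, left)
    return score[-1][-1]


if __name__ == "__main__":
    print(needleman_wunsch_score("ACGT", "AGT"))
```

The alignment itself could be recovered by tracing back from the bottom-right cell along the choices that produced each maximum.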
SCHEMA (bioinformatics)
SCHEMA is a computational algorithm used in protein engineering to identify fragments of proteins (called schemas) that can be recombined without disturbing the integrity of the proteins' three-dimens
ViennaRNA Package
The ViennaRNA Package is a set of standalone programs and libraries used for prediction and analysis of RNA secondary structures. The source code for the package is distributed freely and compiled bin
WPGMA
WPGMA (Weighted Pair Group Method with Arithmetic Mean) is a simple agglomerative (bottom-up) hierarchical clustering method, generally attributed to Sokal and Michener. The WPGMA method is similar to
Island algorithm
The island algorithm is an algorithm for performing inference on hidden Markov models, or their generalization, dynamic Bayesian networks. It calculates the marginal distribution for each unobserved no
Kabsch algorithm
The Kabsch algorithm, named after Wolfgang Kabsch, is a method for calculating the optimal rotation matrix that minimizes the RMSD (root mean squared deviation) between two paired sets of points. It is useful in gra
Sequence alignment
In bioinformatics, a sequence alignment is a way of arranging the sequences of DNA, RNA, or protein to identify regions of similarity that may be a consequence of functional, structural, or evolutiona
Neighbor joining
In bioinformatics, neighbor joining is a bottom-up (agglomerative) clustering method for the creation of phylogenetic trees, created by Naruya Saitou and Masatoshi Nei in 1987. Usually based on DNA or protein seque
De novo sequence assemblers
De novo sequence assemblers are a type of program that assembles short nucleotide sequences into longer ones without the use of a reference genome. These are most commonly used in bioinformatic studie
Shapiro Senapathy algorithm
The Shapiro Senapathy algorithm (S&S) is an algorithm for predicting splice junctions in genes of animals and plants. This algorithm has been used to discover disease-causing splice site mutations and
BLAST (biotechnology)
In bioinformatics, BLAST (basic local alignment search tool) is an algorithm and program for comparing primary biological sequence information, such as the amino-acid sequences of proteins or the nucl
Hirschberg's algorithm
In computer science, Hirschberg's algorithm, named after its inventor, Dan Hirschberg, is a dynamic programming algorithm that finds the optimal sequence alignment between two strings. Optimality is m
SAMtools
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM (Sequence Alignment/Map), BAM (Binary Alignment/Map) and CRAM formats, written by
SPAdes (software)
SPAdes (St. Petersburg genome assembler) is a genome assembly algorithm which was designed for single-cell and multi-cell bacterial data sets. Therefore, it might not be suitable for large genomes pr
Smith–Waterman algorithm
The Smith–Waterman algorithm performs local sequence alignment; that is, it determines similar regions between two strings of nucleic acid sequences or protein sequences. Instead of looking at the e
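Local alignment differs from global alignment mainly in that cell scores are never allowed to drop below zero and the best score may occur anywhere in the table. The sketch below shows this under assumed scoring values (match +2, mismatch -1, linear gap -1); the function name `smith_waterman_score` is hypothetical, and only the best local score is returned.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between strings a and b (linear gap cost)."""
    prev = [0] * (len(b) + 1)  # previous row of the scoring table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at zero lets a new local alignment start at any cell.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTT", "GGACGGG"))
```

Keeping only two rows is enough when just the score is wanted; reporting the aligned region would require storing the full table and tracing back from the highest-scoring cell until a zero is reached.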
UPGMA
UPGMA (unweighted pair group method with arithmetic mean) is a simple agglomerative (bottom-up) hierarchical clustering method. The method is generally attributed to Sokal and Michener. The UPGMA meth
UCLUST
UCLUST is an algorithm designed to cluster nucleotide or amino-acid sequences into clusters based on sequence similarity. The algorithm was published in 2010 and implemented in a program also named UCL