BIORED

Introduction

BIORED is a tool for discovering patterns in genomic and proteomic sequences. It accepts a powerful pattern language that is a subset of regular expressions. A genetic algorithm searches for patterns, and an efficient pattern-matching procedure counts their occurrences in the sequences. For higher performance we have also implemented a parallel and distributed version of BIORED using LAM-MPI.

Capabilities

With BIORED you can mine for patterns in a single data set, or in two data sets, one positive and one negative. With two data sets the system works as a classification system, and it currently supports the following measures: coverage, precision, rule-set-accuracy, recall, specificity, support and f-measure.

It is also possible to use a different source of character probabilities, which can be useful when mining patterns in a small substring whose character distribution is skewed.

We also developed bioredx, a simple implementation of an exhaustive search algorithm. It can improve an already known pattern by adding or removing character positions, and it is also used to evaluate the score of a given pattern on a data set.

Demonstration

You can try BIORED by submitting a small sequence (less than 5 kilobytes) to an online form. The demo exposes only a few of the tool's options, so it does not show its full capabilities.

Please note that the algorithm is stochastic, so running it several times on the same data set can, and usually will, produce different results.

You can download an example data set extracted from the human preproinsulin gene (chromosome 11).

Related Articles

Available soon.

Documentation

The documentation is available as man pages: pattern language, biored, biored MPI, bioredx, and bioredx MPI.

Download

The source code is written in ANSI C (C89), is available under the GNU General Public License, and can be downloaded here.

To compile the programs you need the R library; the parallel/distributed version also needs the LAM-MPI library.

A statically linked x86 Linux build and a Windows build of the sequential programs can also be downloaded.

About

BIORED was developed at LIACC and DCC by Pedro Pereira, Nuno A. Fonseca and Fernando Silva.
