BIORED is a tool to discover patterns in genomic and proteomic sequences. It accepts a powerful pattern language that is a subset of regular expressions. We use a genetic algorithm to discover patterns, together with an efficient pattern matching procedure to count pattern occurrences in the sequences. To achieve higher performance we have also implemented a parallel and distributed version of BIORED, using LAM-MPI.
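As a rough illustration of the occurrence-counting step, the sketch below counts how often a pattern matches in a set of sequences. It is not BIORED's matching procedure: it uses Python's re syntax in place of the BIORED pattern language, and the function name count_occurrences is hypothetical.

```python
import re

def count_occurrences(pattern, sequences):
    """Return (total matches, number of sequences with at least one match).

    Assumption: `pattern` is written in Python's `re` syntax, which stands in
    for the BIORED pattern language here.
    """
    # Wrapping the pattern in a lookahead makes overlapping matches count too.
    regex = re.compile(f"(?=(?:{pattern}))")
    total = 0
    matched_sequences = 0
    for seq in sequences:
        hits = sum(1 for _ in regex.finditer(seq))
        total += hits
        if hits:
            matched_sequences += 1
    return total, matched_sequences

print(count_occurrences("AC[GT]", ["ACGTACGT", "TTTT", "ACGA"]))  # (3, 2)
```

A genetic algorithm would call a function of this kind once per candidate pattern to obtain the counts that feed its fitness score.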
With BIORED you can mine for patterns in a single data set, or in two data sets, one positive and one negative. With two data sets the system runs as a classifier, and it currently recognizes the following measures: coverage, precision, rule-set-accuracy, recall, specificity, support and f-measure.
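For readers unfamiliar with these measures, here is a minimal sketch of four of them, using their textbook definitions. It assumes a pattern is scored by how many positive and negative sequences it matches; the exact formulas BIORED uses are not given here, and coverage, support and rule-set-accuracy are left out because their definitions vary between tools. The function name classification_measures is hypothetical.

```python
def classification_measures(tp, fp, fn, tn):
    """Textbook measures from match counts.

    tp: positive sequences matched     fn: positive sequences not matched
    fp: negative sequences matched     tn: negative sequences not matched
    """
    def ratio(numerator, denominator):
        # Return 0.0 rather than failing when a denominator is empty.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    f_measure = ratio(2 * precision * recall, precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f-measure": f_measure,
    }

print(classification_measures(tp=8, fp=2, fn=2, tn=8))
```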
It is also possible to use a different source of character probabilities, which might be useful when mining patterns in a small substring with a skewed probability distribution.
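To show why the source of character probabilities matters, the sketch below estimates probabilities from a reference sequence and uses them to compute the chance that a pattern matches at a random position, assuming independent characters. How BIORED actually uses the probabilities is not described here; the pattern representation (a list of character sets) and both function names are hypothetical.

```python
from collections import Counter
from math import prod

def char_probabilities(reference):
    """Estimate character probabilities from the frequencies in `reference`."""
    if not reference:
        raise ValueError("reference sequence must not be empty")
    counts = Counter(reference)
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}

def match_probability(pattern_classes, probs):
    """Probability that the pattern matches at one random position.

    `pattern_classes` holds one set of allowed characters per pattern
    position; characters are assumed to be independent of each other.
    """
    return prod(sum(probs.get(ch, 0.0) for ch in cls) for cls in pattern_classes)

probs = char_probabilities("AACG")
print(match_probability([{"A"}, {"C", "G"}], probs))  # 0.25
```

With a skewed reference, a pattern built from common characters is expected to occur often by chance, so the same raw count is less surprising than it would be under uniform probabilities.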
We also developed the program bioredx, which is a simple implementation of an exhaustive search algorithm. This program lets you improve an already known pattern by adding or removing character positions. It is also used to evaluate the score of a given pattern in a dataset.
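The idea of improving a known pattern by adding or removing positions can be sketched as one exhaustive pass over all single-edit variants. This is not the bioredx implementation: patterns are plain literal strings, the alphabet is DNA, and the score (matched sequences times pattern length) is an assumption chosen only for the example. All names are hypothetical.

```python
ALPHABET = "ACGT"  # assumed alphabet for the example

def neighbours(pattern, alphabet=ALPHABET):
    """All patterns obtained by removing or adding one character position."""
    result = set()
    for i in range(len(pattern)):
        result.add(pattern[:i] + pattern[i + 1:])       # remove position i
    for i in range(len(pattern) + 1):
        for ch in alphabet:
            result.add(pattern[:i] + ch + pattern[i:])  # add a position at i
    result.discard("")
    result.discard(pattern)
    return result

def score(pattern, sequences):
    """Assumed score: number of sequences containing the pattern, times its length."""
    return sum(1 for seq in sequences if pattern in seq) * len(pattern)

def improve_once(pattern, sequences):
    """Try every neighbour and keep the best; never returns a worse pattern."""
    candidates = sorted(neighbours(pattern)) + [pattern]
    return max(candidates, key=lambda p: score(p, sequences))

print(improve_once("AC", ["ACGT", "ACGA", "TACG"]))  # ACG
```

Calling improve_once repeatedly until the pattern stops changing gives a simple hill climb; evaluating a fixed pattern corresponds to calling score directly.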
You can try the BIORED system by submitting a small sequence (less than 5 kilobytes) to an online form. The demo does not fully demonstrate the power of the tool, since it lacks many options.
Please note that the algorithm is stochastic, so the same dataset can lead to different results in different executions.
The documentation is available as manpages: pattern language, biored, biored MPI, bioredx, and bioredx MPI.
To compile the program you need the R library. The parallel/distributed version needs the LAM-MPI library.