Efficient Subgraph Frequency Estimation with G-Tries

Pedro Ribeiro and Fernando Silva

2010

Abstract

Many biological networks contain recurring overrepresented elements, called network motifs. Finding these substructures is a computationally hard task related to graph isomorphism. G-Tries are an efficient data structure, based on multiway trees, capable of efficiently identifying common substructures in a set of subgraphs. They are highly successful in constraining the search space when finding the occurrences of those subgraphs in a larger original graph. This leads to speedups up to 100 times faster than previous methods that aim for exact and complete results. In this paper we present a new efficient sampling algorithm for subgraph frequency estimation based on g-tries. It is able to uniformly traverse a fraction of the search space, providing an accurate unbiased estimation of subgraph frequencies. Our results show that in the same amount of time our algorithm achieves better precision than previous methods, as it is able to sustain higher sampling speeds.

Keywords

Complex Networks ; Network Motifs; Subgraph Frequency; Sampling; G-Tries

Digital Object Identifier (DOI)

doi 10.1007/978-3-642-15294-8_20

Publication in PDF format

pdf Download PDF

Journal/Conference/Book

10th International Workshop on Algorithms in Bioinformatics

Reference (text)

Pedro Ribeiro and Fernando Silva. Efficient Subgraph Frequency Estimation with G-Tries. Proceedings of the 10th International Workshop on Algorithms in Bioinformatics (WABI), pp. 238-249, Springer LNCS Vol. 6293, Liverpool, UK, September, 2010.

Bibtex

@inproceedings{ribeiro-WABI2010,
  author = {Pedro Ribeiro and Fernando Silva},
  title = {Efficient Subgraph Frequency Estimation with G-Tries},
  doi = {10.1007/978-3-642-15294-8_20},
  booktitle = {10th International Workshop on Algorithms in Bioinformatics},
  pages = {238-249},
  publisher = {Springer},
  month = {September},
  year = {2010}
}