G-Tries: an efficient data structure for discovering network motifs

Pedro Ribeiro and Fernando Silva

2010

Abstract

In this paper we propose a novel specialized data structure that we call g-trie, designed to deal with collections of subgraphs. The main conceptual idea is akin to a prefix tree in the sense that we take advantage of common topology by constructing a multiway tree where the descendants of a node share a common substructure. We give algorithms to construct a g-trie, to list all stored subgraphs, and to find occurrences on another graph of the subgraphs stored in the g-trie. We evaluate the implementation of this structure and its associated algorithms on a set of representative benchmark biological networks in order to find network motifs. To assess the efficiency of our algorithms we compare their performance with other known network motif algorithms also implemented in the same common platform. Our results show that indeed, g-tries are a feasible, adequate and very efficient data structure for network motifs discovery, clearly outperforming previous algorithms and data structures.

Keywords

Biological Networks; Complex Networks; Network Motifs; Graph Mining; Algorithms; Data Structures; Tries

Reference (text)

Pedro Ribeiro and Fernando Silva. G-Tries: an efficient data structure for discovering network motifs. Proceedings of the 25th ACM Symposium On Applied Computing - Bioinformatics Track (ACMSAC), pp. 1559-1566, ACM, Sierre, Switzerland, March, 2010.

Bibtex

@inproceedings{ribeiro-ACMSAC2010,
  author = {Pedro Ribeiro and Fernando Silva},
  title = {G-Tries: an efficient data structure for discovering network motifs},
  doi = {10.1145/1774088.1774422},
  booktitle = {25th ACM Symposium On Applied Computing - Bioinformatics Track},
  pages = {1559-1566},
  publisher = {ACM},
  month = {March},
  year = {2010}
}