Efficient and Scalable Algorithms for Network Motifs Discovery

Pedro Ribeiro

2011

Abstract

Networks are a powerful representation for a multitude of natural and artificial systems. They are ubiquitous in real-world systems, presenting substantial non-trivial topological features. These are called complex networks and have received increasing attention in recent years. In order to understand their design principles, the concept of network motifs emerged. These are recurrent over-represented patterns of inter-connections, conjectured to have some significance, that can be seen as basic building blocks of networks. Algorithmically, discovering network motifs is a hard problem related to graph isomorphism. The needed execution time grows exponentially as the size of networks or motifs increases, thus limiting their applicability. Since motifs are a fundamental concept, increasing the efficiency in its detection can lead to new insights in several areas of knowledge. To develop efficient and scalable algorithms for motifs discovery is precisely the main aim of this thesis.
 
We provide a thorough survey of existing methods, complete with an associated chronology, taxonomy, algorithmic description and empirical evaluation and comparison. We propose a novel data-structure, g-tries, designed to represent a collection of graphs. Akin to a prefix tree, it takes advantage of common substructures to both reduce the memory needed to store the graphs, and to produce a new more efficient sequential algorithm to compute their frequency as subgraphs of another larger graph. We also introduce a sampling methodology for g-tries that successfully trades accuracy for faster execution times. We identify opportunities for parallelism in motif discovery, creating an associated taxonomy. We expose the whole motif computation as a tree based search and devise a general methodology for parallel execution with dynamic load balancing, including a novel strategy capable of efficiently stopping and dividing computation on the fly. In particular we provide parallel algorithms for ESU and g-tries.
 
Finally, we extensively evaluate our algorithms on a set of diversified complex networks. We show that we are able to outperform all existing sequential algorithms and are able to scale our parallel algorithms up to 128 processors almost linearly. By combining the power of g-tries and parallelism, we speedup motif discovery by several orders of magnitude, thus effectively pushing the limits in its applicability.

Publication in PDF format

pdf Download PhD Thesis as PDF

Institution

Faculty of Science of the University of Porto

Reference (text)

Pedro Ribeiro. Efficient and Scalable Algorithms for Network Motifs Discovery. PhD Thesis. Doctoral Programme in Computer Science. Faculty of Science of the University of Porto, June, 2011.

Bibtex

@phdthesis{ribeiro-PHD2011,
  author = {Pedro Ribeiro},
  title = {Efficient and Scalable Algorithms for Network Motifs Discovery},
  school = {Faculty of Science of the University of Porto},
  month = {June},
  year = {2011}
}