gtrieScanner - Quick Discovery of Network Motifs

Pedro Ribeiro - CRACS & INESC-TEC, DCC/FCUP

This webpage hosts a software tool that uses the g-trie data structure to count occurrences of subgraphs in larger graphs and to determine whether they are network motifs.

If you want to learn more about g-tries, please consult the following references (at the end of this page there is a more complete list of publications related to g-tries):

Pedro Ribeiro and Fernando Silva. G-Tries: a data structure for storing and finding subgraphs. In Data Mining and Knowledge Discovery, Vol. 28(2), pp. 337-377, Springer, March, 2014. [DOI]

Pedro Ribeiro and Fernando Silva. Querying Subgraph Sets with G-Tries. (Best Paper Award) 2nd ACM SIGMOD Workshop on Databases and Social Networks (DBSocial), Scottsdale, USA, pp 25-30, May, 2012 [DOI] [PDF]

Pedro Ribeiro. Efficient and Scalable Algorithms for Network Motifs Discovery. PhD Thesis. Doctoral Programme in Computer Science, Faculty of Science, University of Porto. June, 2011. [PDF]

Pedro Ribeiro and Fernando Silva. G-Tries: an efficient data structure for discovering network motifs. ACM 25th Symposium On Applied Computing - Bioinformatics Track, Sierre, Switzerland, pp 1559-1566, March, 2010. [DOI]

Source Code

Warning: this release is preliminary and is aimed mainly at demonstrating the capabilities of g-tries. Future releases will include more features.

Version 0.1 - 23/02/2012 - [ZIP file]
(tested with Ubuntu 11 and gcc 4.6.1)

Released under license: Artistic License 2.0
(This software uses the nauty program version 2.4 by Brendan McKay. Therefore, nauty's license restrictions also apply to the usage of gtrieScanner.)

To compile, you need a C++ compiler (GCC if possible) and the make utility; just run 'make' after uncompressing the source. This preliminary release was made for Linux systems.

Auxiliary Files

Pre-Computed G-Tries

gtries.zip: all undirected graphs from sizes 3 to 9, all directed graphs from sizes 3 to 6
(note that this is also an example of the types of files that the program expects as g-tries files)

Subgraph Lists

lists.zip: all undirected graphs from sizes 3 to 9, all directed graphs from sizes 3 to 6
(note that this is also an example of the types of files that the program expects as subgraph lists)
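As we understand the g-tries papers listed on this page, a g-trie is a prefix tree of subgraphs: each tree node adds one vertex together with its connections to the vertices represented by its ancestors, so subgraphs whose adjacency matrices begin the same way share nodes. The C++ sketch below illustrates only that sharing idea and does not read the files above. The input encoding (a k-by-k 0/1 adjacency matrix flattened row by row into a string) and the names GTrie and GTrieNode are hypothetical, and the real construction first puts each subgraph into a canonical vertex order chosen to maximize sharing, a step this sketch skips.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Simplified g-trie: the node at depth d stands for vertex d of a subgraph
// plus its connections to vertices 0..d-1. Hypothetical input encoding: a
// k*k adjacency matrix flattened row by row into '0'/'1' characters.
// No canonical relabeling is performed here.
struct GTrieNode {
    std::map<std::string, std::unique_ptr<GTrieNode>> children;
    bool isLeaf = false; // a complete subgraph ends at this node
};

class GTrie {
public:
    void insert(const std::string& matrix, int k) {
        assert(static_cast<int>(matrix.size()) == k * k);
        GTrieNode* node = &root_;
        for (int d = 0; d < k; ++d) {
            // Connections of vertex d to every earlier vertex j, recorded in
            // both directions so directed subgraphs are also told apart.
            std::string key;
            for (int j = 0; j < d; ++j) {
                key += matrix[d * k + j]; // edge d -> j
                key += matrix[j * k + d]; // edge j -> d
            }
            auto& child = node->children[key];
            if (!child) {                 // new prefix: create a tree node
                child = std::make_unique<GTrieNode>();
                ++nodeCount_;
            }
            node = child.get();
        }
        node->isLeaf = true;
    }

    int nodeCount() const { return nodeCount_; }

private:
    GTrieNode root_;     // empty root above the first vertex
    int nodeCount_ = 0;  // tree nodes created so far (root excluded)
};
```

Inserting a triangle ("011101110") and the path 0-1-2 ("010101010") creates 4 tree nodes instead of 6, because both begin with the same two vertices joined by an edge. Listing the same path with its two endpoints as vertices 0 and 1 would produce a different second-level node, which is one reason vertex ordering matters for g-tries.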

Example Graphs

These graphs are in 'simple_weight' format (edge list) and are available from mfinder's page (and therefore are also readable by that software and, incidentally, by Fanmod).

s420_st.txt: an undirected electronic circuit (252 nodes, 399 edges)
yeastInter_st.txt: a directed network of gene regulation interactions in yeast (688 nodes, 1079 edges)
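The 'simple' and 'simple_weight' formats described in the options section below are plain edge lists. Here is a minimal C++ sketch of a reader that accepts both, assuming one edge per line with whitespace-separated integer fields. The names EdgeList and readEdgeList are hypothetical and not taken from gtrieScanner's source.

```cpp
#include <algorithm>
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Edges read from a 'simple' ("a b") or 'simple_weight' ("a b c") file.
struct EdgeList {
    int numNodes = 0;                        // largest node label seen
    std::vector<std::pair<int, int>> edges;  // (a, b) pairs, labels from 1
};

EdgeList readEdgeList(std::istream& in) {
    EdgeList g;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        int a = 0, b = 0;
        if (!(fields >> a >> b)) continue;  // skip blank or malformed lines
        // Any further column (the ignored weight 'c') is simply not read.
        assert(a >= 1 && b >= 1);           // labels start from 1
        g.edges.emplace_back(a, b);
        g.numNodes = std::max({g.numNodes, a, b});
    }
    return g;
}
```

Because a third column is never read, the same function covers both formats. For an undirected graph, each pair would still have to be stored in both directions when building adjacency lists.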

Example Output

You can check an example output file: example.html

Very Short Manual

A limitation of this particular release is that you can only search for subgraphs of one specific size per run; you cannot search for subgraphs of different sizes at the same time (but nothing prevents you from doing more than one search).
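Each run therefore looks for occurrences of a single size k. To illustrate what the "esu" method has to do, here is a C++ sketch of ESU (Wernicke's EnumerateSubgraphs algorithm, which we assume is what the option refers to). It only counts the connected induced subgraphs with exactly k nodes in an undirected graph with 0-based vertices, and it does not classify occurrences into isomorphism classes. Graph and countConnectedSubgraphs are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Undirected graph as adjacency lists, vertices numbered from 0.
using Graph = std::vector<std::vector<int>>;

static bool contains(const std::vector<int>& xs, int x) {
    return std::find(xs.begin(), xs.end(), x) != xs.end();
}

// Grows 'sub' with vertices from 'ext'; returns how many k-node
// connected subgraphs rooted at 'root' were completed.
static long long extend(const Graph& g, std::vector<int>& sub,
                        std::vector<int> ext, int root, int k) {
    if (static_cast<int>(sub.size()) == k) return 1;
    long long count = 0;
    while (!ext.empty()) {
        int w = ext.back();
        ext.pop_back();
        // Add exclusive neighbors of w: larger than root, not in sub and
        // not adjacent to any vertex already in sub.
        std::vector<int> next = ext;
        for (int u : g[w]) {
            if (u <= root || contains(sub, u) || contains(next, u)) continue;
            bool nearSub = false;
            for (int s : sub)
                if (contains(g[s], u)) { nearSub = true; break; }
            if (!nearSub) next.push_back(u);
        }
        sub.push_back(w);
        count += extend(g, sub, next, root, k);
        sub.pop_back();
    }
    return count;
}

long long countConnectedSubgraphs(const Graph& g, int k) {
    assert(k >= 1);
    long long total = 0;
    for (int v = 0; v < static_cast<int>(g.size()); ++v) {
        std::vector<int> sub{v}, ext;
        for (int u : g[v])
            if (u > v) ext.push_back(u);  // only extend with larger vertices
        total += extend(g, sub, ext, v, k);
    }
    return total;
}
```

For example, the path 0-1-2-3 contains two connected 3-node subgraphs and a star with three leaves contains three. Extending only with vertices larger than the starting vertex, together with the exclusive-neighborhood test, is what makes ESU report each subgraph exactly once.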
Examples of usage

gtrieScanner -s 3 -m esu -g s420_st.txt
Compute the frequencies of subgraphs of size 3 in the undirected s420_st.txt network, using the ESU algorithm.
gtrieScanner -s 4 -m gtrie dir4.gt -g yeastInter_st.txt -d -t html -o yeast.html
Compute the frequencies of subgraphs of size 4 in the directed yeastInter_st.txt network, using the g-trie stored in dir4.gt. Produce an HTML output in the yeast.html file.
gtrieScanner -s 5 -m subgraphs undir5.str -g s420_st.txt -r 100 -oc dump.txt
Compute the motifs of size 5 in the undirected s420_st.txt network, using the subgraphs listed in undir5.str and 100 random networks. Dump all occurrences of the subgraphs in the original network to 'dump.txt'.
gtrieScanner -s 5 -c dir5.str -o mygtrie5.gt -d
Produce the directed g-trie containing the subgraph list of dir5.str and output it to the pre-computed g-trie file 'mygtrie5.gt'.
Note that in all cases results are first ordered by z-score and then by frequency.
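This page does not define the z-score. A common definition in motif analysis compares the frequency f of a subgraph in the original network with its frequencies in the r random networks: z = (f - mean) / sd. The C++ sketch below assumes that definition using the population standard deviation, assumes both orderings are descending, and reports z = 0 when sd is 0. All three are choices made for this illustration, not documented gtrieScanner behavior. SubgraphResult, zScore and rankResults are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <string>
#include <vector>

struct SubgraphResult {
    std::string id;                   // hypothetical subgraph identifier
    long long freq;                   // frequency in the original network
    std::vector<long long> randFreq;  // frequency in each random network
    double z = 0.0;
};

// Assumed definition: (freq - mean) / sd over the random networks, with the
// population standard deviation; 0 when sd is 0 (a choice of this sketch).
double zScore(long long freq, const std::vector<long long>& randFreq) {
    if (randFreq.empty()) return 0.0;
    double mean = 0.0;
    for (long long r : randFreq) mean += r;
    mean /= randFreq.size();
    double var = 0.0;
    for (long long r : randFreq) var += (r - mean) * (r - mean);
    double sd = std::sqrt(var / randFreq.size());
    return sd > 0.0 ? (freq - mean) / sd : 0.0;
}

// Order by z-score first, then by frequency (both assumed descending).
void rankResults(std::vector<SubgraphResult>& results) {
    for (auto& r : results) r.z = zScore(r.freq, r.randFreq);
    std::sort(results.begin(), results.end(),
              [](const SubgraphResult& a, const SubgraphResult& b) {
                  if (a.z != b.z) return a.z > b.z;
                  return a.freq > b.freq;
              });
}
```

With no random networks (the default '-r 0'), every z-score in this sketch is 0, so the order falls back to frequency alone.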

Command Line Syntax

You should call the program like this:
gtrieScanner -s <motif_size> [other_options]

Possible Options
- [-s <int>] or [--size <int>] Subgraph/motif size to consider (mandatory)
- [-g <file>] or [--graph <file>] File containing the graph (mandatory except when just creating a g-trie)
- [-d] or [--directed] Graph is directed (default is undirected)
- [-u] or [--undirected] Graph is undirected (default is undirected)
- [-f <format>] or [--format <format>] Format of the graph file (default is simple_weight). 'format' can be:
  . "simple": list of pairs "a b", meaning an edge between a and b
  . "simple_weight": list of triples "a b c", meaning an edge between a and b with weight c (c is ignored)
  In all cases node labels are integers starting from 1. See above for example files.
- [-m <method>] or [--method <method>] Method for searching for motifs (mandatory except when just creating a g-trie). 'method' can be:
  . "esu": use ESU on the original network
  . "gtrie <file>": use the g-trie in 'file' on the original network
  . "subgraphs <file>": insert the subgraph list in 'file' (one subgraph per line, as exemplified above) into a g-trie and use it on the original network
  In any case, for computing the census on the random networks, a g-trie is created with the subgraphs that appear at least once.
- [-c <file>] or [--create <file>] Create a g-trie from 'file' containing a subgraph list (one subgraph per line, see the examples above). The g-trie is written to the file indicated by '-o'.
- [-o <file>] or [--output <file>] Name of the file that will contain the results of the computation
- [-oc <file>] or [--occurrences <file>] Dump all individual occurrences of subgraphs in the original network to 'file'
- [-t <format>] or [--type <format>] Format of the results. 'format' can be:
  . "txt": text file
  . "html": HTML file, ready to be viewed in a browser (an internet connection is needed for drawing the graphs)
- [-r <int>] or [--random <int>] Number of random networks to generate (default is 0). Leave at zero to just compute frequencies.
- [-rs <int>] or [--rseed <int>] Seed for random number generation (default is time())
- [-re <int>] or [--rexchanges <int>] Number of exchanges per edge during randomization (default is 3)
- [-rt <int>] or [--tries <int>] Number of tries per edge during randomization (default is 10)
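The -r, -rs, -re and -rt options control how the random networks are generated. A common approach in motif tools is degree-preserving edge switching: pick two edges (a,b) and (c,d) and rewire them to (a,d) and (c,b) when that creates no self-loop or duplicate edge. The C++ sketch below assumes that scheme for an undirected simple graph, reading 'exchanges' as successful swaps per edge and 'tries' as attempts allowed per swap. gtrieScanner's exact procedure may differ, and randomizeEdges and norm are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <set>
#include <utility>
#include <vector>

using Edge = std::pair<int, int>;

// Undirected edge key with the smaller endpoint first.
static Edge norm(int a, int b) { return {std::min(a, b), std::max(a, b)}; }

// Assumed semantics: about exchanges * |E| swaps, each given up after
// 'tries' failed attempts. Every accepted swap keeps all degrees intact.
void randomizeEdges(std::vector<Edge>& edges, int exchanges, int tries,
                    unsigned seed) {
    if (edges.size() < 2) return;
    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::size_t> pick(0, edges.size() - 1);
    std::set<Edge> present;
    for (const Edge& e : edges) present.insert(norm(e.first, e.second));

    const std::size_t swaps = edges.size() * static_cast<std::size_t>(exchanges);
    for (std::size_t s = 0; s < swaps; ++s) {
        for (int t = 0; t < tries; ++t) {
            std::size_t i = pick(rng), j = pick(rng);
            if (i == j) continue;
            int a = edges[i].first, b = edges[i].second;
            int c = edges[j].first, d = edges[j].second;
            // Rewire (a,b),(c,d) -> (a,d),(c,b).
            if (a == d || c == b) continue;  // would create a self-loop
            if (present.count(norm(a, d)) || present.count(norm(c, b)))
                continue;                    // would create a duplicate edge
            present.erase(norm(a, b));
            present.erase(norm(c, d));
            present.insert(norm(a, d));
            present.insert(norm(c, b));
            edges[i] = {a, d};
            edges[j] = {c, b};
            break;                           // swap accepted
        }
    }
}
```

Seeding std::mt19937 explicitly makes a run reproducible, in the same spirit as the -rs option. Since every accepted swap leaves each node's degree unchanged, the random networks keep the degree sequence of the original one.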

Future - Planned updates

Many features already have code written and will be included in the following public releases. For example:

Documentation of code using Doxygen
Other types of graph formats (GML, ...)
Possibility of using sampling
Multiple subgraph sizes on same operation
Graphical User Interface
...

Publications

This is the list of our publications directly related to g-tries:

Querying Subgraph Sets with G-Tries (Best Paper Award)
Pedro Ribeiro and Fernando Silva.
2nd ACM SIGMOD Workshop on Databases and Social Networks (DBSocial), Scottsdale, USA, pp 25-30, May, 2012

Efficient and Scalable Algorithms for Network Motifs Discovery
Pedro Ribeiro.
PhD Thesis. Doctoral Programme in Computer Science, Faculty of Science, University of Porto. June, 2011.

Efficient Parallel Subgraph Counting using G-Tries
Pedro Ribeiro, Fernando Silva and Luís Lopes.
IEEE International Conference on Cluster Computing, Crete, Greece, pp 217-226, IEEE CS Press, September, 2010.

Efficient Subgraph Frequency Estimation with G-Tries
Pedro Ribeiro and Fernando Silva.
10th International Workshop on Algorithms in Bioinformatics, Liverpool, UK, pp 238-249, Springer LNBI, September, 2010.

G-Tries: an efficient data structure for discovering network motifs
Pedro Ribeiro and Fernando Silva.
Proceedings of the ACM 25th Symposium On Applied Computing - Bioinformatics Track, Sierre, Switzerland, pp 1559-1566, March, 2010.

Not using g-tries, but also related to network motif discovery:

Parallel Discovery of Network Motifs
Pedro Ribeiro, Fernando Silva and Luís Lopes.
Journal of Parallel and Distributed Computing, Vol 72(2), pp 144-154, Elsevier, February, 2012.

A Parallel Algorithm for Counting Subgraphs in Complex Networks
Pedro Ribeiro, Fernando Silva and Luís Lopes.
3rd International Joint Conference on Biomedical Engineering Systems and Technologies - BIOSTEC, Revised Selected Papers, pp 380-393, Springer CCIS Vol. 127, 2011.

Strategies for Network Motifs Discovery
Pedro Ribeiro, Fernando Silva and Marcus Kaiser.
5th IEEE International Conference on e-Science, Oxford, UK, pp 80-87, IEEE CS Press, December, 2009.

Other Related Software

Fanmod - a tool for fast network motif detection
mfinder - Network motifs detection tool
Kavosh - a new algorithm for finding network motifs
MAVisto - Motif Analysis and Visualisation Tool
