gtrieScanner  

gtrieScanner - Quick Discovery of Network Motifs

Pedro Ribeiro - CRACS & INESC-TEC, DCC/FCUP

| Source Code | Auxiliary Files | Graphs | Example Output | Short Manual | Future | Publications | Other |


This webpage hosts a software tool that uses the g-trie data structure to count occurrences of subgraphs on larger graphs, determining if they are network motifs.

If you want to learn more about g-tries, please consult the following references (at the end of this page there is a more complete list of publications related to g-tries):


Source Code

Warning: this release is preliminary and is aimed mainly for demonstrating the capabilits of g-tries. Future releases will include more features.

To compile you need a C++ compiler (use GCC if possible) and the make utility (just do 'make' after uncompressing the source). This preliminary release was made for Linux systems.


Auxiliary Files

Pre-Computed G-Tries

Subgraph Lists


Example Graphs

These graphs are in 'simple_weight' format (edge list) and are available from mfinder's page (and therefore are also readable by that software and, incidentally, by Fanmod).


Example Output

You can check an example output file: example.html


Very Short Manual

A limitation of this particular release is that you can only search for subgraphs of a specific size, that is, you cannot search for subgraphs of different sizes at the same time (but nothing forbids you to do more than search)

Examples of usage

gtrieScanner -s 3 -m esu -g s420_st.txt
Compute the frequencies of subgraphs of size 3 in the undirected s420_st.txt network, using ESU algorithm

gtrieScanner -s 4 -m gtrie dir4.gt -g yeastInter_st.txt -d -t html -o yeast.html
Compute the frequencies of subgraphs of size 4 in directed yeastInter_st.txt network, using the g-trie stored in undir4.gt. Produce an HTML output to yeast.html file.

gtrieScanner -s 5 -m subgraphs undir5.str -g s420_st.txt -r 100 -oc dump.txt
Compute the motifs of size 5 in undirected s420_st.txt network, using the subgraphs listed in undir5.str and 100 random networks. Dump all occurrences of the subgbaphs in the original network 'dump.txt'.

gtrieScanner -s 5 -c dir5.str -o mygtrie5.gt -d
Produce the directed g-trie containing the subgraph list of dir5.str and output it to a pre-computed g-trie file 'mygtrie.gt'

Note that in all cases results are first ordered by z-score and then by frequency.

Command Line Syntax

You should call the program like this:

gtrieScanner -s <motif_size> [other_option]

Possible Options

 - [-s <int>] or [--size <int>]
   Subgraph/motif size to consider (mandatory)

 - [-g <file>] or [--graph <file>]
   File containing the graph (mandatory except when just creating a g-trie)

 - [-d] or [--directed]
   Graph is directed (default is undirected)

 - [-u] or [--undirected]
   Graph is undirected (default is undirected)

 - [-f <format>] of [--format <format>]
   Format of the graph file. 'format' can be: (simple_weight)
   . "simple": list of pairs "a b", meaning an edge between a and b
   . "simple_weight": list of triples "a b c", meaning an edge between a and b with weight c (c is ignored)
   In all cases node labels are integers starting from 1. See above for example files.

 - [-m <method>] or [--method <method>]
   Method for searching for motifs. 'method' can be: (mandatory except when just creating a g-trie)
   . "esu": Use ESU on original graph
   . "gtrie <file>": use the g-trie of 'file' on original network
   . "subgraphs <file>": insert the subgraph list (one subgraph per line, as exemplified above) 
                               on a g-trie and use it on the original network.
   In any case, for computing the census on the random networks, a g-trie will be created with the
   subgraphs that appear at least once.

 - [-c <file>] or [--create <file>]
   Create g-trie from 'file' with subgraph list (one subgraph per line, see above examples)
   G-Trie is written to the file indicated by '-o'

 - [-o <file>] or [--output <file>]
   Name for the file which will contain the results of the computation.

 - [-oc <file>] or [--occurrences <file>]
   Show/Dump all individual occurrences of subgraphs in the original network to 'file'

 - [-t <format>] or [--type <format>]
   Format of the results. 'format' can be:
   . "txt": text file
   . "html": html file, ready for being seen on a browser
     (need connection to internet for graph drawing)

 - [-r <int>] or [--random <int>]
   Number of random networks to generate. (default is 0)
   Leave at zero to just compute frequency.

 - [-rs <int>] or [--rseed <int>]
   Seed for random number generation (default is time())

 - [-re <int>] or [--rexchanges <int>]
   Number of exchanges per edge on randomization. (default is 3)

 - [-rt <int>] or [--tries <int>]
   Number of tries per edge on randomization. (default is 10)

Future - Planned updates

There are many features already with code written that will be rolled out and included in the following public releases. For example:


Publications

This is the list of our publications directed related to g-tries:

Not using g-tries, but also related to network motif discovery:


Other Related Software


Pedro Ribeiro - CRACS & INESC-TEC, DCC/FCUP

tumblr visit counter