gtrieScanner - Quick Discovery of Network Motifs
| Source Code | Auxiliary Files | Graphs | Example Output | Short Manual | Future | Publications | Other |
This webpage hosts a software tool that uses the g-trie data structure to count occurrences of subgraphs on larger graphs, determining if they are network motifs.
If you want to learn more about g-tries, please consult the following references (at the end of this page there is a more complete list of publications related to g-tries):
Warning: this release is preliminary and is aimed mainly for demonstrating the capabilits of g-tries. Future releases will include more features.
To compile you need a C++ compiler (use GCC if possible) and the make utility (just do 'make' after uncompressing the source). This preliminary release was made for Linux systems.
These graphs are in 'simple_weight' format (edge list) and are available from mfinder's page (and therefore are also readable by that software and, incidentally, by Fanmod).
You can check an example output file: example.html
A limitation of this particular release is that you can only search for subgraphs of a specific size, that is, you cannot search for subgraphs of different sizes at the same time (but nothing forbids you to do more than search)
gtrieScanner -s 3 -m esu -g s420_st.txt
Compute the frequencies of subgraphs of size 3 in the undirected s420_st.txt network, using ESU algorithm
gtrieScanner -s 4 -m gtrie dir4.gt -g yeastInter_st.txt -d -t html -o yeast.html
Compute the frequencies of subgraphs of size 4 in directed yeastInter_st.txt network, using the g-trie stored in undir4.gt. Produce an HTML output to yeast.html file.
gtrieScanner -s 5 -m subgraphs undir5.str -g s420_st.txt -r 100 -oc dump.txt
Compute the motifs of size 5 in undirected s420_st.txt network, using the subgraphs listed in undir5.str and 100 random networks. Dump all occurrences of the subgbaphs in the original network 'dump.txt'.
gtrieScanner -s 5 -c dir5.str -o mygtrie5.gt -d
Produce the directed g-trie containing the subgraph list of dir5.str and output it to a pre-computed g-trie file 'mygtrie.gt'
Note that in all cases results are first ordered by z-score and then by frequency.
You should call the program like this:
gtrieScanner -s <motif_size> [other_option]
- [-s <int>] or [--size <int>] Subgraph/motif size to consider (mandatory) - [-g <file>] or [--graph <file>] File containing the graph (mandatory except when just creating a g-trie) - [-d] or [--directed] Graph is directed (default is undirected) - [-u] or [--undirected] Graph is undirected (default is undirected) - [-f <format>] of [--format <format>] Format of the graph file. 'format' can be: (simple_weight) . "simple": list of pairs "a b", meaning an edge between a and b . "simple_weight": list of triples "a b c", meaning an edge between a and b with weight c (c is ignored) In all cases node labels are integers starting from 1. See above for example files. - [-m <method>] or [--method <method>] Method for searching for motifs. 'method' can be: (mandatory except when just creating a g-trie) . "esu": Use ESU on original graph . "gtrie <file>": use the g-trie of 'file' on original network . "subgraphs <file>": insert the subgraph list (one subgraph per line, as exemplified above) on a g-trie and use it on the original network. In any case, for computing the census on the random networks, a g-trie will be created with the subgraphs that appear at least once. - [-c <file>] or [--create <file>] Create g-trie from 'file' with subgraph list (one subgraph per line, see above examples) G-Trie is written to the file indicated by '-o' - [-o <file>] or [--output <file>] Name for the file which will contain the results of the computation. - [-oc <file>] or [--occurrences <file>] Show/Dump all individual occurrences of subgraphs in the original network to 'file' - [-t <format>] or [--type <format>] Format of the results. 'format' can be: . "txt": text file . "html": html file, ready for being seen on a browser (need connection to internet for graph drawing) - [-r <int>] or [--random <int>] Number of random networks to generate. (default is 0) Leave at zero to just compute frequency. - [-rs <int>] or [--rseed <int>] Seed for random number generation (default is time()) - [-re <int>] or [--rexchanges <int>] Number of exchanges per edge on randomization. (default is 3) - [-rt <int>] or [--tries <int>] Number of tries per edge on randomization. (default is 10)
There are many features already with code written that will be rolled out and included in the following public releases. For example:
This is the list of our publications directed related to g-tries:
Not using g-tries, but also related to network motif discovery:
Pedro Ribeiro - CRACS & INESC-TEC, DCC/FCUP