Friday, February 4, 2011

Even faster gene predictions

One question we sometimes get from our users is, “I have lots of gene lists. Can I do predictions with them all in one shot instead of filling out the form each time?”

Query Runner is our solution to this problem. It's a scriptable command-line tool bundled with the GeneMANIA Cytoscape plugin and highly optimized to eat gene lists for breakfast.

You can choose from a bunch of different output formats for the prediction results, ranging from plain gene lists with scores to full-fledged reports packed with provenance, network adjacency lists and GO annotation statistics.

Loading the prediction networks is currently the biggest bottleneck in GeneMANIA. Our human data set, for example, contains about 650 MB of compressed networks, and most of the time in the prediction process is spent loading them. In our example human query, loading takes up 61% of the time of a single prediction.

The GeneMANIA website and Cytoscape plugin get around this by loading the data once and performing multiple queries with it, one at a time:

Query Runner tops this by running a query on each available core on the machine:
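The pattern described here (load the data once, then hand one query to each core) can be sketched in a few lines of Python. This is only an illustration of the idea, not Query Runner's actual code: `load_networks`, `predict` and `run_all` are hypothetical names, the tiny network is made-up toy data, and the scoring is a trivial stand-in for the real prediction algorithm.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def load_networks():
    """Hypothetical stand-in for the expensive, one-time network load (toy data)."""
    return {
        "TP53": {"MDM2": 0.9, "ATM": 0.7},
        "BRCA1": {"BARD1": 0.8, "ATM": 0.6},
    }


def predict(networks, gene_list):
    """Toy scoring: sum the edge weights from the query genes to each neighbour."""
    query = set(gene_list)
    scores = {}
    for gene in query:
        for neighbour, weight in networks.get(gene, {}).items():
            if neighbour not in query:
                scores[neighbour] = scores.get(neighbour, 0.0) + weight
    # Highest score first; ties broken alphabetically.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def run_all(gene_lists, workers=None):
    """Load the networks once, then run one query per worker."""
    networks = load_networks()  # paid once, shared read-only by every query
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as gene_lists.
        return list(pool.map(lambda genes: predict(networks, genes), gene_lists))


results = run_all([["TP53"], ["BRCA1"], ["TP53", "BRCA1"]])
```

One caveat: in CPython, threads running pure-Python CPU-bound work are limited by the global interpreter lock, so this sketch shows the structure rather than the speedup. The point is that the loaded networks are shared by all workers instead of being reloaded per query.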

On a quad core processor, it takes just as long to run four predictions as it does for one:
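A bit of back-of-the-envelope arithmetic shows why this works out. The sketch below takes the 61% loading share from our example query and treats one standalone prediction as 1.0 time units. It assumes an idealized machine: the remaining 39% is pure computation, it scales perfectly across cores, and there is no contention. The function name `total_time` and its parameters are made up for illustration.

```python
import math

LOAD_FRACTION = 0.61  # share of a single prediction spent loading (from the example query)


def total_time(n_queries, cores=1, reload_each_time=False, load=LOAD_FRACTION):
    """Estimated time, in units of one standalone prediction (idealized model)."""
    compute = 1.0 - load                       # assumption: everything else is computation
    loads = n_queries if reload_each_time else 1
    rounds = math.ceil(n_queries / cores)      # queries run in batches of `cores`
    return loads * load + rounds * compute


reload_every_time = total_time(4, reload_each_time=True)  # four independent runs
load_once_serial = total_time(4)                          # load once, one query at a time
load_once_parallel = total_time(4, cores=4)               # load once, one query per core
```

Under these assumptions, four predictions cost about 4.0 units if the data is reloaded every time, about 2.17 units if it is loaded once and the queries run one after another, and about 1.0 unit on four cores, which is the same as a single prediction.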
