[Neo4j] BFS help using neo4j on a graph with 200 million edges, 10 million nodes, (branching factor ca. 50)

david lightstone david.lightstone at gmail.com
Thu Oct 28 19:38:04 CEST 2010


Hi everyone,

I'm running Neo4j on both an Ubuntu box and a Windows 7 box. My dataset has
200 million edges and 10 million nodes, with a median branching factor of
about 50 outgoing directed edges per node. I'm trying to run a breadth-first
search (BFS) on the data but have not been able to get it to finish in a
timely fashion. I have tried to follow the advice at
http://wiki.neo4j.org/content/Neo4j_Performance_Guide but some queries still
take up to 300 seconds to run. My Ubuntu box has 6GB of RAM and a 7200RPM
hard drive, while my Windows box has 8GB of RAM and runs off SSDs (HD Tune
reports ~300 MB/s reads).
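
To make the kind of query concrete, here is a minimal sketch of a
depth-limited BFS for a shortest directed path. It is plain Python over an
in-memory adjacency dict, not the Neo4j API, and the names
bfs_shortest_path, adjacency and max_depth are hypothetical, chosen only for
illustration:

```python
from collections import deque


def bfs_shortest_path(adjacency, start, goal, max_depth=4):
    """Return the shortest directed path from start to goal, or None.

    adjacency maps each node to an iterable of its outgoing neighbours.
    Nodes further than max_depth hops from start are never expanded.
    """
    if start == goal:
        return [start]
    parent = {start: None}      # also serves as the visited set
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue            # do not expand beyond the depth limit
        for neighbor in adjacency.get(node, ()):
            if neighbor in parent:
                continue        # already reached by a path at least as short
            parent[neighbor] = node
            if neighbor == goal:
                # Walk the parent links back to start, then reverse.
                path = [neighbor]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append((neighbor, depth + 1))
    return None
```

Every node expanded means reading all of its outgoing edges, so the cost is
driven by how many nodes lie within max_depth hops of the start node.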

I have also added an index for the nodes.

Can anyone offer advice on why this might be taking so long? CPU usage on
both machines is very low (2-5%), so I'm pretty sure the whole thing is
limited by disk I/O. Are there any techniques for making the query run
faster?

From what I had read about Neo4j, I assumed my dataset was not large enough
to make a BFS take this long (finding a path to a node just 4 nodes away can
take up to 300 seconds).
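
For a sense of scale, a rough worst-case estimate of how many nodes a BFS
could reach within 4 hops is sketched below. It assumes a uniform branching
factor of 50 (the figure above is only a median) and no overlap between
neighbourhoods, so a real graph would likely touch fewer nodes. The function
name bfs_upper_bound is hypothetical:

```python
def bfs_upper_bound(branching_factor, depth, total_nodes):
    """Upper bound on newly reached nodes per BFS level.

    Assumes every node has exactly branching_factor outgoing edges and that
    no two paths lead to the same node, capped by the size of the graph.
    """
    levels = []
    reached = 1     # the start node
    frontier = 1
    for _ in range(depth):
        frontier = min(frontier * branching_factor, total_nodes - reached)
        levels.append(frontier)
        reached += frontier
    return levels


levels = bfs_upper_bound(50, 4, 10_000_000)
print(levels)       # [50, 2500, 125000, 6250000]
print(sum(levels))  # 6377550
```

Under those assumptions, a 4-hop search could touch more than 6 million of
the 10 million nodes. If most of those records are not already cached, each
may require a disk read, which would fit the low CPU usage I'm seeing.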

Thank you in advance.

