[Neo] Traverser node return process

Bhuvan bifss at yahoo.co.in
Thu Apr 29 21:23:55 CEST 2010


Hello,

We are evaluating Neo4j for a graph with a very large number of nodes and relationships.
Say there are about 6 million users across the world and about 6 million user address elements (postal code, city, state, country, etc.).
I am trying to get all users in a given country that has about 3 million users. I found that the traverser returned the first 0.6 million nodes or so quickly and then slowed down sharply, as shown below:
--------------------------
 INFO  [2010-04-28 20:15:13,082] [test.TraversalTest] - Starting...
 INFO  [2010-04-28 20:15:39,030] [test.TraversalTest] - 100,000
 INFO  [2010-04-28 20:15:41,734] [test.TraversalTest] - 200,000
 INFO  [2010-04-28 20:15:44,022] [test.TraversalTest] - 300,000
 INFO  [2010-04-28 20:15:51,353] [test.TraversalTest] - 400,000
 INFO  [2010-04-28 20:15:53,433] [test.TraversalTest] - 500,000
 INFO  [2010-04-28 20:15:55,721] [test.TraversalTest] - 600,000

 INFO  [2010-04-28 20:20:54,433] [test.TraversalTest] - 700,000
 INFO  [2010-04-28 20:25:32,407] [test.TraversalTest] - 800,000
 INFO  [2010-04-28 20:30:33,274] [test.TraversalTest] - 900,000
 INFO  [2010-04-28 20:35:26,405] [test.TraversalTest] - 1,000,000
 INFO  [2010-04-28 20:39:17,099] [test.TraversalTest] - 1,100,000
 INFO  [2010-04-28 20:42:52,856] [test.TraversalTest] - 1,200,000
 INFO  [2010-04-28 20:46:57,318] [test.TraversalTest] - 1,300,000
 INFO  [2010-04-28 20:50:58,397] [test.TraversalTest] - 1,400,000
 INFO  [2010-04-28 20:54:53,570] [test.TraversalTest] - 1,500,000
--------------------------
The number at the end of each line above is the running count of returned nodes, printed from the for-loop after every 100,000 nodes.
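
To put a rough number on the slowdown, the timestamps in the log can be turned into average throughput figures. The sketch below is only an illustration: the class and method names (ThroughputEstimate, nodesPerSecond) are made up for this purpose, and the three timestamps are copied from the "Starting...", 600,000 and 1,500,000 lines of the log.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of the original test code.
final class ThroughputEstimate {
    // Matches the timestamp layout used in the log lines, e.g. "2010-04-28 20:15:13,082".
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    /** Average number of nodes returned per second between two log timestamps. */
    static double nodesPerSecond(String from, String to, long nodes) {
        Duration elapsed = Duration.between(
                LocalDateTime.parse(from, FMT), LocalDateTime.parse(to, FMT));
        return nodes * 1000.0 / elapsed.toMillis();
    }

    public static void main(String[] args) {
        // Start of the run up to the 600,000 line: the fast phase.
        double fast = nodesPerSecond(
                "2010-04-28 20:15:13,082", "2010-04-28 20:15:55,721", 600_000);
        // 600,000 line up to the 1,500,000 line: the slow phase.
        double slow = nodesPerSecond(
                "2010-04-28 20:15:55,721", "2010-04-28 20:54:53,570", 900_000);
        System.out.println("fast phase: " + Math.round(fast) + " nodes/s");
        System.out.println("slow phase: " + Math.round(slow) + " nodes/s");
    }
}
```

By this estimate the first 600,000 nodes arrive at roughly 14,000 nodes per second, while the following 900,000 arrive at under 400 nodes per second.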
I used the following traverser:

Traverser traverser = startNode.traverse(
        Traverser.Order.BREADTH_FIRST,
        StopEvaluator.DEPTH_ONE,
        ReturnableEvaluator.ALL_BUT_START_NODE,
        TestRelationshipType.HAS_COUNTRY, Direction.INCOMING);

where startNode is the country node to which users are connected by the HAS_COUNTRY relationship.
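
For reference, the counting loop that produced the log has roughly the following shape. This is a minimal sketch, not the actual test code: the class and method names (ProgressCounter, countWithProgress) are invented here, and a plain Iterable stands in for the traverser so that the example runs without Neo4j. In the real test the iterable would be the traverser itself (assuming, as in the Neo4j API used here, that Traverser is an Iterable of Node) and the report callback would be the logger call.

```java
import java.util.Collections;
import java.util.function.LongConsumer;

// Hypothetical helper, not part of the original test code.
final class ProgressCounter {
    /**
     * Iterates over every item and calls report with the running count
     * each time the count reaches a multiple of interval.
     * Returns the total number of items seen.
     */
    static <T> long countWithProgress(Iterable<T> items, long interval, LongConsumer report) {
        long count = 0;
        for (T ignored : items) {
            count++;
            if (count % interval == 0) {
                report.accept(count);
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Stand-in for the traverser: 250 placeholder elements.
        Iterable<String> fakeNodes = Collections.nCopies(250, "node");
        long total = countWithProgress(fakeNodes, 100,
                n -> System.out.println("returned so far: " + n));
        System.out.println("total: " + total);
    }
}
```

Note that the loop keeps no reference to the nodes it has already seen, so any slowdown would have to come from the traversal or the store underneath it rather than from the loop body.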

My question is: why does the traverser slow down after returning nodes for a while, and is there anything that can be done to avoid it?

Thanks
Bhuvan



