[Neo4j] How to boost performance?

Vinicius Carvalho java.vinicius at gmail.com
Wed Nov 23 16:09:44 CET 2011


Hi Michael, this is going to be a newbie question, so please forgive me:

I've rerun the tests with your examples, using an embedded database.
First thing: Whopping FAST! Mind blowing :D -> 5ms

But ... I got different results. The time is the same though, which is great:
it confirms exactly what happened on my local machine, where 1k nodes took 5ms
and 250k nodes took 5ms :D

Using Cypher on the console:
start n = node(3) match n-->()-->(x) return x

I got 6475 nodes, which seems about right: every node has around 80
relations, so 80*80 (6400) would give me roughly this.

Using your first example (I probably got it wrong) with the new traversal:

Node startNode = db.getNodeById(Long.valueOf(id));
TraversalDescription traversalQuery = Traversal.description()
    .evaluator(Evaluators.atDepth(2))
    .expand(Traversal.expanderForAllTypes(Direction.OUTGOING));
long start = System.currentTimeMillis();
for (Node n : traversalQuery.traverse(startNode).nodes()) {
    count++;
}
long end = System.currentTimeMillis();
return "Fetched " + count + " nodes in " + (end - start) + " ms";

It returns 196 nodes in 5ms

And using the second one:

Node startNode = db.getNodeById(3);
long start = System.currentTimeMillis();
for (Relationship rel : startNode.getRelationships()) {
    Node other = rel.getOtherNode(startNode);
    for (Relationship rr : other.getRelationships()) {
        count++;
    }
}
long end = System.currentTimeMillis();
return "Fetched " + count + " nodes in " + (end - start) + " ms";

Returns 25896 nodes in 5ms as well.

I'm just trying to understand why I got different results. Again, a really
newbie question; I'll dive further into the traversal docs, but if
you could share a thought here, that would be great.
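
A possible reading of the three different counts, sketched under assumptions rather than verified against this database: the Cypher query returns one row per outgoing two-hop path, so an end node reachable through several middle nodes shows up several times; the traversal, if it visits each node at most once (believed to be the framework's default uniqueness), returns only distinct nodes; and getRelationships() without a direction follows both incoming and outgoing relationships, while the nested loop counts relationships rather than nodes. The toy graph, the class name and the method names below are hypothetical, and no Neo4j API is used.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a graph: plain directed edges, no Neo4j API involved.
public class TwoHopCounts {

    // Toy graph (an assumption for illustration): each entry is {from, to}.
    static final int[][] EDGES = {
        {1, 2}, {1, 3}, {2, 4}, {2, 5}, {3, 4}, {3, 5}, {4, 1}
    };

    // Like "match n-->()-->(x) return x": one result per outgoing two-hop path.
    static int outgoingPaths(int start) {
        int count = 0;
        for (int[] first : EDGES) {
            if (first[0] != start) continue;
            for (int[] second : EDGES) {
                if (second[0] == first[1]) count++;
            }
        }
        return count;
    }

    // Same outgoing paths, but every end node is counted only once.
    static int distinctEndNodes(int start) {
        Set<Integer> seen = new HashSet<>();
        for (int[] first : EDGES) {
            if (first[0] != start) continue;
            for (int[] second : EDGES) {
                if (second[0] == first[1]) seen.add(second[1]);
            }
        }
        return seen.size();
    }

    // Like the nested getRelationships() loops: both directions on each hop,
    // counting second-hop relationships rather than nodes.
    static int undirectedSecondHopRelationships(int start) {
        int count = 0;
        for (int[] first : EDGES) {
            if (first[0] != start && first[1] != start) continue;
            int other = (first[0] == start) ? first[1] : first[0];
            for (int[] second : EDGES) {
                if (second[0] == other || second[1] == other) count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(outgoingPaths(1));                    // 4
        System.out.println(distinctEndNodes(1));                 // 2
        System.out.println(undirectedSecondHopRelationships(1)); // 9
    }
}
```

On this toy graph the three counts are 4, 2 and 9. In the numbers above, 25896 is close to 4 x 6475, which would be consistent with both directions being followed on each of the two hops.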

Thanks


On Wed, Nov 23, 2011 at 2:21 PM, Vinicius Carvalho
<java.vinicius at gmail.com> wrote:

> Thanks. For this test it's just a read-only graph for now, so I don't think I'll
> run into synchronization issues. As we proceed with tests, I do hope that
> one day we will have an HA version of neo4j and, as Jim said in that
> thread, use it for others to read the graph.
>
> Regards
>
>
> On Wed, Nov 23, 2011 at 2:15 PM, Michael Hunger <
> michael.hunger at neotechnology.com> wrote:
>
>> Just make sure that it is just a snapshot of the data and doesn't update
>> its caches.
>>
>> Otherwise you will run into synchronization issues.
>>
>> See also this thread and Tobias' explanations around it:
>>
>> http://neo4j-community-discussions.438527.n3.nabble.com/Neo4j-Synchronization-of-EmbeddedReadOnlyGraphDatabase-Bug-td3174626.html#a3213450
>>
>> Michael
>>
>> Am 23.11.2011 um 15:05 schrieb Vinicius Carvalho:
>>
>> > But wouldn't it mean that I need to have an exclusive lock on the db? I would
>> > like to keep the server running pointing at the same data directory.
>> >
>> > Regards
>> >
>> > On Wed, Nov 23, 2011 at 1:50 PM, Michael Hunger <
>> > michael.hunger at neotechnology.com> wrote:
>> >
>> >> Please use EmbeddedGraphDatabase,
>> >>
>> >> EmbeddedReadOnlyGraphDatabase caches a snapshot of the data in its caches
>> >> and doesn't pick up later updates.
>> >>
>> >> Michael
>> >>
>> >> Am 23.11.2011 um 14:39 schrieb Vinicius Carvalho:
>> >>
>> >>> Hi Michael, thanks. The data load was fine, I've used your script with the
>> >>> BatchInserter. Memory footprint was really low, I think the peak was 200mb
>> >>> of heap usage. I did something really silly and left a logger.info call in,
>> >>> which slowed things a bit, but the process was really smooth.
>> >>>
>> >>> Many thanks for the help with the query. I'll try this, I'm putting the
>> >>> read-only embedded neo inside our app right now. I expect to see a good
>> >>> performance boost :)
>> >>>
>> >>> Best Regards
>> >>>
>> >>> On Wed, Nov 23, 2011 at 12:12 PM, Michael Hunger <
>> >>> michael.hunger at neotechnology.com> wrote:
>> >>>
>> >>>> Vinicius,
>> >>>>
>> >>>> first: did you have any issues importing the data into Neo4j?
>> >>>> second: your example used cypher which is not optimized for performance
>> >>>> (yet!). This is in our plans for the next two releases of neo4j.
>> >>>>
>> >>>> So if you want to see the real performance of neo4j, please use the
>> >>>> traversal framework or the core-API:
>> >>>>
>> >>>> Cypher & Traversals:
>> >>>>
>> >>>> // define
>> >>>> cypherQuery = cypherParser.parse("start n=node({start_node}) match n-->()-->x return x")
>> >>>> traversalQuery = Traversal.description()
>> >>>>     .evaluator(Evaluators.atDepth(2))
>> >>>>     .expand(Traversal.expanderForAllTypes(Direction.OUTGOING))
>> >>>>
>> >>>> // execute
>> >>>> for (Node n : cypherQuery.execute({"start_node":startNode})) { ... }
>> >>>> for (Node n : traversalQuery.traverse(startNode).nodes()) { ... }
>> >>>>
>> >>>> If you're interested in the paths, remove the ".nodes()" call on the
>> >>>> traverser.
>> >>>>
>> >>>> In java core-api code:
>> >>>>
>> >>>> Node start=db.getNodeById(3);
>> >>>>
>> >>>> for (Relationship first : start.getRelationships()) {
>> >>>>  Node second = first.getOtherNode(start);
>> >>>>  for (Relationship next : second.getRelationships()) {
>> >>>>      Node third = next.getOtherNode(second);
>> >>>>      // do something with the 3 nodes, 2 relationships which form your path
>> >>>>  }
>> >>>> }
>> >>>>
>> >>>> In the REST API the traversal would look like this (see
>> >>>> http://docs.neo4j.org/chunked/snapshot/rest-api-traverse.html#rest-api-traversal-using-a-return-filter
>> >>>> ):
>> >>>>  * POST http://localhost:7474/db/data/node/3/traverse/node
>> >>>>  * Accept: application/json
>> >>>>  * Content-Type: application/json
>> >>>>
>> >>>> {
>> >>>> "relationships" : [ {"direction" : "out" } ],
>> >>>> "max_depth" : 3
>> >>>> }
>> >>>>
>> >>>>
>> >>>> Am 23.11.2011 um 11:54 schrieb Vinicius Carvalho:
>> >>>>
>> >>>>> Hi there, I posted a few days ago about the POC I'm doing here at my
>> >>>>> company. I have some initial numbers and I'd like to ask for some help
>> >>>>> in order to promote neo4j here in LMI Ericsson.
>> >>>>>
>> >>>>> I've loaded a MySQL db with a really simple entity, that pretty much only
>> >>>>> represents a node and relations (the only properties it has are a UID and
>> >>>>> an x/y space coordinate for each node).
>> >>>>>
>> >>>>> The DB contains 250.000 cells and 19. relations stored in a MyISAM table,
>> >>>>> indexed only by its primary key. Please find the DDL for the two tables.
>> >>>>>
>> >>>>> CREATE TABLE  `pci`.`cells` (
>> >>>>> `id` varchar(32) collate utf8_bin NOT NULL,
>> >>>>> `x_pos` double default NULL,
>> >>>>> `y_pos` double default NULL,
>> >>>>> `pci` smallint(6) default '0',
>> >>>>> PRIMARY KEY  (`id`)
>> >>>>> )
>> >>>>>
>> >>>>> CREATE TABLE  `pci`.`relations` (
>> >>>>> `id` int(11) NOT NULL auto_increment,
>> >>>>> `source` varchar(32) collate utf8_bin default NULL,
>> >>>>> `target` varchar(32) collate utf8_bin default NULL,
>> >>>>> PRIMARY KEY  (`id`),
>> >>>>> KEY `src_idx` (`source`),
>> >>>>> KEY `src_target` (`target`)
>> >>>>> )
>> >>>>>
>> >>>>> So as you can see, a simple secondary table contains the relationship,
>> >>>>> with source and target pointing to the cells table.
>> >>>>>
>> >>>>> I've loaded this exact same DB into a neoserver running on the same
>> >>>>> machine: A Blade with 26 cpus (6 cores each) and 16gb RAM.
>> >>>>>
>> >>>>> One of the requirements we have is to find all associations of my
>> >>>>> associations. Something that in neo I did like this:
>> >>>>>
>> >>>>> START n = node(3)
>> >>>>> MATCH n-->()-->(x)
>> >>>>> return x
>> >>>>>
>> >>>>> For this specific node it returns 6475 nodes.
>> >>>>>
>> >>>>> I have tested this before using Hibernate in two modes: without an L2 cache,
>> >>>>> and with an L2 cache (Ehcache standalone, no replication).
>> >>>>> Here's a snippet of the code that loads it, so you can understand what's
>> >>>>> going on under the hood:
>> >>>>>
>> >>>>>
>> >>>>> @Override
>> >>>>> public List<Cell> loadCellWithRealtions(String... ids) {
>> >>>>>     Session session = (Session) em.getDelegate();
>> >>>>>     Criteria c = session.createCriteria(Cell.class)
>> >>>>>         .setFetchMode("incomingRelations", FetchMode.SELECT)
>> >>>>>         .setFetchMode("outgoingRelations", FetchMode.SELECT)
>> >>>>>         .add(Restrictions.in("id", Arrays.asList(ids)));
>> >>>>>     List<Cell> results = c.list();
>> >>>>>     for (Cell cell : results) {
>> >>>>>         Hibernate.initialize(cell.getIncomingRelations());
>> >>>>>         Hibernate.initialize(cell.getOutgoingRelations());
>> >>>>>     }
>> >>>>>     return results;
>> >>>>> }
>> >>>>>
>> >>>>> @Override
>> >>>>> public List<Cell> loadCellWithNeighbourRelations(String... ids) {
>> >>>>>     List<Cell> cells = loadCellWithRealtions(ids);
>> >>>>>     for (Cell c : cells) {
>> >>>>>         for (Relation r : c.getIncomingRelations()) {
>> >>>>>             Hibernate.initialize(r.getSource().getIncomingRelations());
>> >>>>>             Hibernate.initialize(r.getSource().getOutgoingRelations());
>> >>>>>         }
>> >>>>>         for (Relation r : c.getOutgoingRelations()) {
>> >>>>>             Hibernate.initialize(r.getTarget().getIncomingRelations());
>> >>>>>             Hibernate.initialize(r.getTarget().getOutgoingRelations());
>> >>>>>         }
>> >>>>>     }
>> >>>>>     return cells;
>> >>>>> }
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> So the first method executes one query and 2 subselects to find a cell and
>> >>>>> all its relations; the second method iterates over each relation and does the
>> >>>>> same. So I will pretty much have something like 3+r*3 selects on the db, where
>> >>>>> r is the number of relations, right?
>> >>>>>
>> >>>>> OK, to be a bit fair with the tests, I ran this for the same node 10
>> >>>>> times (to give the caches a chance to warm up), excluded the longest and
>> >>>>> shortest results, and then took the mean. Here are the results:
>> >>>>>
>> >>>>> EhCache: 70ms
>> >>>>> Plain Hibernate: 550ms
>> >>>>>
>> >>>>> I still don't have a version of the neo4j code running integrated in the app
>> >>>>> server, but the idea is to use the REST API. Running the query on the REST API
>> >>>>> took over 2 seconds on average, but given the large size of the response,
>> >>>>> network lag was the issue. So I ran the same query 10 times using the
>> >>>>> web console, and the average time for neo was 300ms.
>> >>>>>
>> >>>>> Before asking anything: I do know that we will have more complex queries
>> >>>>> where neo will shine, but I need to improve those results in order to sell
>> >>>>> it here :). With those numbers, people will just say that having a cache and
>> >>>>> using the relational model would suffice.
>> >>>>>
>> >>>>> Anything I could do to improve this?
>> >>>>>
>> >>>>> Regards
>> >>>>> _______________________________________________
>> >>>>> Neo4j mailing list
>> >>>>> User at lists.neo4j.org
>> >>>>> https://lists.neo4j.org/mailman/listinfo/user

