[Neo] Test data from the filesystem / jdk
johan at neotechnology.com
Sun Sep 28 21:05:57 CEST 2008
I am not sure I understand your problem, but performance seems to be
the main issue, so I have a few questions. First, what version of Neo
are you using? We are currently on 1.0-b7-SNAPSHOT, and the 1.0-b7
release will be out very soon.
On Sun, Sep 28, 2008 at 7:32 PM, Michael Hunger <neo at jexp.de> wrote:
> So when I learned more about neo from Emil last weekend I looked first for testdata onto the filesystem. There I have
> plenty of nodes (1 mio files) even forming a graph (count symbolic links) with lots of semi-structured information.
> So the first approach was slurping filesystem into neo. I had some problems with that as building the graph was quite
> slow. I'll look into that later (perhaps parallelizing the traversal helps).
Is it some specific traversal in Neo that is slow?
> Then I added a second visitor which is able to create neo node based objects from that. At first it was awfully slow.
> Taking from 8 seconds for the first 500 classes rising up to 300 seconds for the last 500 classes. Then I added more
> caching to it and removed some of the neo lookups (which are fast with 2ms but way too slow for constructing the graph).
> (I also committed after each 500 classes and gave the jvm 1gb of memory both not needed with the in memory approach)
You talk about a degradation of write performance (8s-300s), so could
it be that you picked the wrong layout of the node space and perform a
linear search/traversal for each lookup? I would then suggest that you
either use index-util (lets you index nodes with key-value pairs) or
change the node space layout so the lookup can be performed more
efficiently (tree structure that can be traversed given the
2ms for getting a node by id or doing a few traversals sounds awfully
slow. With warm caches you should be able to do thousands of traversals
(going from one node to another via a relationship) per ms, and even
Finally, 500 classes (I have no idea how much data that contains) could
be too little to actually max out write performance (you are still
waiting more on flushes than on writing data to disk).
> At the end I dropped all traversal lookup stuff for finding the nodes to add to but just cached all of them. So that's
> the point I'd like to discuss:
> What's the best way of building up a reasonably large database?
> What I stuck with is caching everything by name/identifier in java maps and not looking up nodes (by traversal) for
> building relationships. This can get a bit problematic with concurrent threads (say I want to use many threads for
> reading the file system or parsing java class files to increase throughput).
We do not yet have a "batch mode" for Neo (but we are working on it).
I am not sure if that is what you are looking for. When building up a
large graph, make sure you group enough writes in each transaction
so the time spent flushing data to disk is small compared to the total
time of the transaction.
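To illustrate the grouping idea, here is a minimal sketch in plain Java.
It does not use the Neo API: BatchingWriter and its commit callback are
hypothetical stand-ins, and in a real import the callback would be
replaced by whatever begin/commit calls your Neo version provides. The
batch size of 10,000 is only an assumption to experiment with.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: groups write operations so one commit covers many of them. */
class BatchingWriter {
    private final int batchSize;
    private final Runnable commit;   // stand-in for committing a real transaction
    private int pending = 0;
    private int commits = 0;

    BatchingWriter(int batchSize, Runnable commit) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.commit = commit;
    }

    /** Runs one write and commits once batchSize writes have accumulated. */
    void write(Runnable operation) {
        operation.run();
        pending++;
        if (pending == batchSize) {
            flush();
        }
    }

    /** Commits whatever is pending; call once at the end for the last partial batch. */
    void flush() {
        if (pending == 0) {
            return;
        }
        commit.run();
        commits++;
        pending = 0;
    }

    int commits() {
        return commits;
    }
}

class BatchCommitDemo {
    public static void main(String[] args) {
        List<String> store = new ArrayList<>();   // stand-in for the node store
        BatchingWriter writer = new BatchingWriter(10_000, () -> { /* flush to disk here */ });
        for (int i = 0; i < 25_000; i++) {
            final int id = i;
            writer.write(() -> store.add("node-" + id));
        }
        writer.flush();   // commit the final partial batch
        System.out.println(store.size() + " writes in " + writer.commits() + " commits");
    }
}
```

With 25,000 writes and a batch size of 10,000 this performs three
commits instead of 25,000, which is the effect you want: the flush cost
is paid once per batch rather than once per write.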
Use index-util [http://components.neo4j.org/index-util/] for the
identifier->node problem. Pick a layout of the node space/graph that
lets you perform all your "application level CRUDs" in acceptable
time.
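If you stay with the in-memory map for now, the concurrency concern can
be handled with a concurrent map. The following is a sketch under
assumptions, not index-util: NodeCache is a hypothetical class, node ids
are modelled as plain longs, and the creation function is a stand-in for
real node creation (which would also have to run inside a transaction on
the calling thread).

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

/** Hypothetical thread-safe identifier -> node-id cache. */
class NodeCache {
    private final ConcurrentMap<String, Long> idsByName = new ConcurrentHashMap<>();
    private final Function<String, Long> createNode;   // stand-in for creating a node

    NodeCache(Function<String, Long> createNode) {
        this.createNode = createNode;
    }

    /** Returns the cached id for name, creating the node at most once per name. */
    long getOrCreate(String name) {
        return idsByName.computeIfAbsent(name, createNode);
    }

    int size() {
        return idsByName.size();
    }
}

class NodeCacheDemo {
    public static void main(String[] args) throws InterruptedException {
        AtomicLong nextId = new AtomicLong();   // fake id generator for the demo
        NodeCache cache = new NodeCache(name -> nextId.incrementAndGet());

        // Four threads all ask for the same 1000 names.
        Thread[] workers = new Thread[4];
        for (int t = 0; t < workers.length; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < 1000; i++) {
                    cache.getOrCreate("file-" + i);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        System.out.println(cache.size() + " names, " + nextId.get() + " nodes created");
    }
}
```

ConcurrentHashMap.computeIfAbsent applies the creation function at most
once per key, so threads asking for the same name do not create
duplicate nodes. The creation function should stay short, because other
updates to the same part of the map may block while it runs.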
> Another thing I noted was that creating nodes and relationships in neo is not as fast as it should be.
How fast was it and how fast should it be? A normal laptop usually
manages 10k-20k relationship creates/s and about 50k-100k node creates/s.
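As a rough sanity check, those rates can be turned into an expected
import time. The sketch below uses the "1 mio files" from your message
as the node count and assumes, purely for illustration, one
relationship per file (a link to its parent directory); ImportEstimate
is a hypothetical name.

```java
class ImportEstimate {
    /** Seconds needed to perform count operations at perSecond operations per second. */
    static double seconds(long count, long perSecond) {
        return (double) count / perSecond;
    }

    public static void main(String[] args) {
        long nodes = 1_000_000;           // "1 mio files" from the quoted message
        long relationships = 1_000_000;   // assumption: one parent link per file

        // Fastest and slowest estimates for each kind of write.
        System.out.printf("nodes: %.0f-%.0f s%n",
                seconds(nodes, 100_000), seconds(nodes, 50_000));
        System.out.printf("relationships: %.0f-%.0f s%n",
                seconds(relationships, 20_000), seconds(relationships, 10_000));
    }
}
```

Under those assumptions the raw writes come to roughly 10-20 s for the
nodes and 50-100 s for the relationships, so an import that takes much
longer is probably spending its time on lookups or on commits rather
than on the creates themselves.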
> And even another one - I tried using a ramdisk (on a Mac) and it even slowed things down (compared to my solid state drive).
That is weird. Try turning off memory mapping to see if you get better
performance using a ramdisk. To do this, create a normal properties
file ("settings.props") containing this:
Then you instantiate Neo like this:
    NeoService neo = new EmbeddedNeo( "/ramdisk",
        EmbeddedNeo.loadConfigurations( "settings.props" ) );