[Neo4j] Node loading question

Rick Bullotta rick.bullotta at burningskysoftware.com
Mon Aug 16 15:52:31 CEST 2010


Hi, David.

I'm writing my own specialized "timeline"-like indexing model.  It is
basically a "bucketed linked list" approach.  I'm trying to optimize overall
performance, and I felt that dedicated index nodes and a B-tree model could
be a problem, since 99% of the additions/inserts are at the end of the list.

The net result was a fairly simple model where I keep "n" bucket nodes
(typically a bucket holds one day's worth of entries, though it can be
adjusted to a smaller or larger time window), each of which can hold any
number of entry nodes (basically activity stream items).  Each bucket node
is linked via a relationship to a common parent node, and each has a
timestamp and a window size property (both longs) to indicate the bucket
interval.  The bucket node also has relationships to its first and last
entry nodes.  Each entry node has a "next" relationship to the next node in
time order, and each node has a timestamp property.  To make it easier to
search for a set of values within a given time interval, I considered also
keeping the timestamp on the relationships to avoid "touching" the node,
but I've decided against that.
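For readers who want to see the shape of this model, here is a rough
plain-Java sketch.  It is an illustration under assumptions, not Rick's code
and not the Neo4j API: ordinary objects stand in for nodes and
relationships, a List stands in for the parent node's bucket relationships,
and Entry, Bucket, Timeline and append are hypothetical names.  Only the
append-at-the-end fast path is modeled.

```java
import java.util.ArrayList;
import java.util.List;

// In-memory model of the "bucketed linked list"; plain objects stand in for
// nodes and relationships. Hypothetical names, NOT the Neo4j API.
class Entry {
    final long timestamp; // each entry node carries its own timestamp
    Entry next;           // the "next" relationship, in time order
    Entry(long timestamp) { this.timestamp = timestamp; }
}

class Bucket {
    final long start;      // bucket "timestamp" property: start of its interval
    final long windowSize; // bucket "window size" property, e.g. one day in millis
    Entry first, last;     // relationships to the first and last entry in the bucket

    Bucket(long start, long windowSize) {
        this.start = start;
        this.windowSize = windowSize;
    }

    boolean covers(long t) { return t >= start && t - start < windowSize; }
}

class Timeline {
    private final long windowSize;
    // Stand-in for the buckets hanging off the common parent node, oldest first.
    final List<Bucket> buckets = new ArrayList<>();

    Timeline(long windowSize) { this.windowSize = windowSize; }

    // Fast path for the common case: the new entry is not older than the newest
    // one, so only the last bucket (or a freshly created bucket) is touched.
    void append(long t) {
        Bucket tail = buckets.isEmpty() ? null : buckets.get(buckets.size() - 1);
        if (tail != null && t < tail.last.timestamp) {
            throw new IllegalArgumentException("out-of-order insert: slow path not modeled");
        }
        Entry e = new Entry(t);
        if (tail != null) tail.last.next = e; // the chain continues across buckets
        if (tail != null && tail.covers(t)) {
            tail.last = e;
        } else {
            // Align the new bucket to a multiple of the window size.
            Bucket b = new Bucket(Math.floorDiv(t, windowSize) * windowSize, windowSize);
            b.first = b.last = e;
            buckets.add(b);
        }
    }
}
```

An in-order append touches only the last bucket and its last entry, so its
cost does not grow with the number of stored entries, which is the
append-heavy case that motivated avoiding a B-tree.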

A few specialized aspects of our traversals: I need to query for at most
"n" entries within a given time interval (which could be "all time"); the
traversal/selection sometimes begins with the first node and sometimes with
the last (the "n" oldest within the interval or the "n" newest); and
finally, I may have to return the resulting set of domain entities in
ascending or descending time order.
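The query pattern can be sketched against a similar in-memory model.  This
is a hypothetical illustration, not the actual implementation: EntryNode,
BucketNode, TimelineQuery, query and build are made-up names, the interval
is assumed to be half-open [from, to), and a prev field stands in for
following the "next" relationship backwards, which a graph traversal can
do.  The bucket's first/last pointers give the starting point, and the chain
is walked until n entries are collected or the interval is exhausted.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical in-memory stand-ins for entry and bucket nodes (NOT the Neo4j API).
// "prev" models walking the "next" relationship backwards.
class EntryNode {
    final long ts;
    EntryNode next, prev;
    EntryNode(long ts) { this.ts = ts; }
}

class BucketNode {
    final long start, window; // bucket interval is [start, start + window)
    EntryNode first, last;
    BucketNode(long start, long window) { this.start = start; this.window = window; }
}

class TimelineQuery {
    // Returns at most n timestamps in [from, to). newestFirst selects the n newest,
    // otherwise the n oldest; the result is then put in the requested order.
    static List<Long> query(List<BucketNode> buckets, long from, long to, int n,
                            boolean newestFirst, boolean ascending) {
        List<Long> out = new ArrayList<>();
        if (newestFirst) {
            // Start at the last entry of the newest bucket that begins before `to`.
            EntryNode e = null;
            for (int i = buckets.size() - 1; i >= 0 && e == null; i--) {
                if (buckets.get(i).start < to) e = buckets.get(i).last;
            }
            while (e != null && e.ts >= to) e = e.prev; // skip entries past the interval
            for (; e != null && e.ts >= from && out.size() < n; e = e.prev) out.add(e.ts);
        } else {
            // Start at the first entry of the oldest bucket that ends after `from`.
            EntryNode e = null;
            for (int i = 0; i < buckets.size() && e == null; i++) {
                BucketNode b = buckets.get(i);
                if (b.start + b.window > from) e = b.first;
            }
            while (e != null && e.ts < from) e = e.next; // skip entries before the interval
            for (; e != null && e.ts < to && out.size() < n; e = e.next) out.add(e.ts);
        }
        // Traversal order is descending for newestFirst, ascending otherwise.
        if (newestFirst == ascending) Collections.reverse(out);
        return out;
    }

    // Helper: builds linked buckets from timestamps given in ascending order.
    static List<BucketNode> build(long window, long... sorted) {
        List<BucketNode> buckets = new ArrayList<>();
        EntryNode prev = null;
        for (long t : sorted) {
            EntryNode e = new EntryNode(t);
            if (prev != null) { prev.next = e; e.prev = prev; }
            BucketNode tail = buckets.isEmpty() ? null : buckets.get(buckets.size() - 1);
            if (tail != null && t < tail.start + tail.window) {
                tail.last = e;
            } else {
                BucketNode b = new BucketNode(Math.floorDiv(t, window) * window, window);
                b.first = b.last = e;
                buckets.add(b);
            }
            prev = e;
        }
        return buckets;
    }
}
```

For an "all time" query, Long.MIN_VALUE and Long.MAX_VALUE can be passed as
the bounds.  Reversing at most n items at the end is cheap, so which n
entries are selected and the order they are returned in stay independent.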

I've also had to optimize the write/insert process for entries by making it
somewhat asynchronous and "bucketed" as well.  I queue entries until either
a certain number are waiting in the queue or a specific time period has
elapsed, at which point the block of entries is written within a single
transaction.  Because new entries are usually added to the end of the list,
there are some optimizations for this scenario (and, as a result,
"sub-optimizations" for inserts elsewhere in the list).
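A minimal sketch of the "flush on count or on age" queueing, under stated
assumptions: BatchingWriter, offer, tick and flush are hypothetical names,
the clock is injected so the timing can be tested, and writeInOneTx is a
placeholder callback where the block of entries would be written inside a
single Neo4j transaction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.LongSupplier;

// Hypothetical write batcher: flushes when maxBatch items are queued or when
// the oldest queued item has waited maxDelayMillis.
class BatchingWriter<T> {
    private final int maxBatch;
    private final long maxDelayMillis;
    private final LongSupplier clock;             // injected clock, e.g. System::currentTimeMillis
    private final Consumer<List<T>> writeInOneTx; // placeholder: write the block in ONE transaction
    private final List<T> queue = new ArrayList<>();
    private long oldestQueuedAt;

    BatchingWriter(int maxBatch, long maxDelayMillis, LongSupplier clock,
                   Consumer<List<T>> writeInOneTx) {
        this.maxBatch = maxBatch;
        this.maxDelayMillis = maxDelayMillis;
        this.clock = clock;
        this.writeInOneTx = writeInOneTx;
    }

    synchronized void offer(T item) {
        if (queue.isEmpty()) oldestQueuedAt = clock.getAsLong();
        queue.add(item);
        if (queue.size() >= maxBatch) flush(); // count threshold reached
    }

    // Called periodically; flushes if the oldest queued item is old enough.
    synchronized void tick() {
        if (!queue.isEmpty() && clock.getAsLong() - oldestQueuedAt >= maxDelayMillis) flush();
    }

    synchronized void flush() {
        if (queue.isEmpty()) return;
        List<T> batch = new ArrayList<>(queue);
        queue.clear();
        writeInOneTx.accept(batch); // runs while holding the lock
    }
}
```

In a running system, tick() would typically be driven at a fixed rate by
something like a ScheduledExecutorService, with a final flush() on shutdown
so queued entries are not lost.  Running the callback under the lock keeps
batches in order but blocks producers during the write; a production
version might hand the batch off instead.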

Overall, what I have now seems to work quite well and performs reasonably.

Rick



-----Original Message-----
From: user-bounces at lists.neo4j.org [mailto:user-bounces at lists.neo4j.org] On
Behalf Of David Montag
Sent: Sunday, August 15, 2010 6:30 PM
To: Neo4j user discussions
Subject: Re: [Neo4j] Node loading question

Hi Rick,

I believe that once an operation touches properties for the first time for a
node or relationship, all properties are loaded from disk. They are then
cached in memory. What are you storing in the properties? Not sure I
understand the optimization you're trying to make - maybe you could explain
it a bit more?

David
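The property-loading behavior David describes can be illustrated with a toy
model.  This is not Neo4j's implementation; LazyPropertyEntity and
loadAllFromStore are hypothetical names, and the store is simulated by a
callback.  It shows that the first read of any single property loads the
whole property set, and later reads are served from the cache.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongFunction;

// Toy model of "load all properties on first touch, then cache";
// NOT Neo4j's actual code.
class LazyPropertyEntity {
    private final long id;
    // Hypothetical stand-in for "read this entity's whole property record from disk".
    private final LongFunction<Map<String, Object>> loadAllFromStore;
    private Map<String, Object> cache; // null until properties are first touched

    LazyPropertyEntity(long id, LongFunction<Map<String, Object>> loadAllFromStore) {
        this.id = id;
        this.loadAllFromStore = loadAllFromStore;
    }

    Object getProperty(String key) {
        if (cache == null) {
            // Loads every property, even though only `key` was requested.
            cache = new HashMap<>(loadAllFromStore.apply(id));
        }
        return cache.get(key);
    }

    boolean isLoaded() { return cache != null; }
}
```

Under this model, a timestamp duplicated onto a relationship could be read
during a traversal without triggering the node's full property load, which
is the trade-off in Rick's original question; the cost is keeping the
duplicate in sync.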

On Sun, Aug 15, 2010 at 6:29 PM, Rick Bullotta <
rick.bullotta at burningskysoftware.com> wrote:

> When a node is accessed, are all of its properties loaded or are they
> "lazily loaded" as needed?
>
> I'm trying to decide whether to include a subset of (duplicated) properties
> on relationships to avoid loading the entire node, if that is a concern.
>
> Thanks,
>
> Rick
>
> _______________________________________________
> Neo4j mailing list
> User at lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user
>