[Neo] Node with the millions of incoming relationships "to" - is this a proper way ?

Dmitri Livotov dmitri at livotov.eu
Wed Feb 10 14:35:25 CET 2010


Thanks for all your detailed responses. We have now moved on to stress 
testing, and the initial results show quite good performance. We are 
running random reads, traversals and updates from 100 parallel threads, 
and the average response time on neo4j with default (read "no") memory 
settings on a developer machine is about 300-400 ms. We are going to 
adjust the memory options, polish the test cases and run this on a 
production server under the typical production load of our current 
SQL-based system, which handles about 500 to 1000 concurrent users 
daily. If anyone is interested, I'll publish the results.

Any suggestions and hints for fine-tuning neo4j for high load, as well 
as ideas for other test cases to run, would also be much appreciated.

Thanks,
Dmitri

Mattias Persson wrote:
> 2010/2/8 Dmitri Livotov <dmitri at livotov.eu>:
>   
>> Thanks for such detailed clarifications on your designs.
>>
>> But if we stay with "a million relations to a single node", will it hurt
>> performance in comparison to Rick's last suggestion of having only the
>> top node model?
>>     
> Well, all those relationships will occupy some memory and disk space,
> but that's a minor issue. Also, if such a heavy node only has
> relationships of the same type it's ok, but if there should be, say, a
> million relationships of one type and one relationship of another type
> it could take some time to find that one relationship each time you
> request it and that node isn't present in the cache at the moment
> (since relationships aren't sorted on disk by type). That should
> probably be your only concern.
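
A minimal sketch of the point above, under loud assumptions: it is a toy in-memory model in Python, not Neo4j's storage format or API. A node's relationships are modelled as an unsorted list of (type, other node id) records, the type names are hypothetical, and the single rare relationship is placed last to show the worst case.

```python
from collections import defaultdict

# Toy model only: NOT Neo4j's on-disk format or API.
COMMON_TYPE = "INSTANCE_OF"  # hypothetical relationship type
RARE_TYPE = "OWNED_BY"       # hypothetical relationship type


def find_first_of_type(relationships, wanted_type):
    """Linear scan; returns (record or None, number of records examined)."""
    examined = 0
    for record in relationships:
        examined += 1
        if record[0] == wanted_type:
            return record, examined
    return None, examined


# Assumed worst case: a million relationships of one type, then one of another.
heavy_node = [(COMMON_TYPE, i) for i in range(1_000_000)]
heavy_node.append((RARE_TYPE, 42))

record, examined = find_first_of_type(heavy_node, RARE_TYPE)
print(record, examined)

# Grouping by type once (an assumption about what an in-memory, per-type
# index could look like) turns the same lookup into a direct access.
by_type = defaultdict(list)
for rel_type, other in heavy_node:
    by_type[rel_type].append(other)
print(by_type[RARE_TYPE])
```

If the heavy node only ever has relationships of one type, every record scanned is one you wanted anyway, which is why that case is fine.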
>   
>> Dmitri
>>
>>
>> Mattias Persson wrote:
>>     
>>> 2010/2/8 Rick Bullotta <rick.bullotta at burningskysoftware.com>:
>>>
>>>       
>>>> Hello, Dmitri.
>>>>
>>>> We are using the first approach - a top level "class" node with "entity"
>>>> nodes below them in the graph.  In some cases, this is a flat "collection"
>>>> of entities, in others, it is a more complex set of linkages accomplished
>>>> via relationships.  Our taxonomy model (something similar to folders) can
>>>> contain any class(es) of entities, via relationships.  Additionally, an
>>>> entity can be multiply linked to multiple taxonomy locations (and of course
>>>> to other entities).
>>>>
>>>> We are also using a hybrid approach to reduce the # of nodes that must be
>>>> traversed in certain types of searches.  Basically, we are using a sort of
>>>> "bucket" approach.  In this scenario, the "class" node links to a set of
>>>> "buckets".  These buckets represent some logical clustering of related nodes
>>>> that correspond to some commonly used key data that would be used to
>>>> query/reduce the set of nodes in the resultset.  For example, if the most
>>>> common way to retrieve these nodes would be based on a date/time key, we
>>>> might create a set of bucket nodes for each day, each day/hour,
>>>> etc...allowing rapid reduction to a smaller subset of relevant nodes for a
>>>> date oriented search.  Similarly, you could do the same for some key field
>>>> in the nodes (bucket all nodes by the first letter of the node identity
>>>> field), or use a tagging metamodel where the buckets corresponded to tag
>>>> values.  I suppose this could also be done directly or indirectly using
>>>> Lucene as well.
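
The bucket idea described above could be sketched roughly as follows. This is a toy in-memory model in Python, not Neo4j code: the class name `BucketedClass`, its methods and the sample entity ids are all hypothetical, and one bucket per calendar day is just one of the granularities mentioned.

```python
from collections import defaultdict
from datetime import date, datetime


class BucketedClass:
    """Hypothetical "class" node whose entities hang off per-day bucket nodes."""

    def __init__(self, name):
        self.name = name
        # bucket key (a calendar day) -> ids of the entities in that bucket
        self.buckets = defaultdict(list)

    def add(self, entity_id, created_at):
        # Link the entity to the bucket for its day instead of to the class node.
        self.buckets[created_at.date()].append(entity_id)

    def entities_on(self, day):
        # Only one bucket is visited; all other entities are never touched.
        return list(self.buckets.get(day, []))


events = BucketedClass("Event")
events.add("e1", datetime(2010, 2, 8, 8, 26))
events.add("e2", datetime(2010, 2, 8, 17, 5))
events.add("e3", datetime(2010, 2, 10, 14, 35))

same_day = events.entities_on(date(2010, 2, 8))
print(same_day)
```

The class node then carries one relationship per bucket rather than one per entity, and a date-oriented search starts from a much smaller set.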
>>>>
>>>> Hope this helps.
>>>>
>>>> Rick
>>>>
>>>>
>>>>         
>>> An interesting approach Rick,
>>>
>>> However I'd go for something simpler to avoid those millions of
>>> relationships that would go to that one node as you said. I'd only
>>> have the "top" (root) folders be connected to that node and then
>>> traverse the tree when you'd like to find the files/folders. The cost
>>> for traversing both (filtered) folders and files is quite low and
>>> you'd get rid of that "heavy" node with loads of relationships on it.
>>>
>>> In fact, I've created some layouts like that and they worked fine with
>>> several million files/folders at least.
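
A rough sketch of the layout suggested above, assuming a toy in-memory tree in Python rather than the Neo4j traverser API: only the top folder is reachable from the start, and files or folders are found by walking the tree and filtering on type. The `tree` contents and the function name `find_by_type` are hypothetical.

```python
from collections import deque

# Hypothetical file system: name -> type and child names.
tree = {
    "root": {"type": "folder", "children": ["docs", "readme.txt"]},
    "docs": {"type": "folder", "children": ["a.txt", "img"]},
    "img": {"type": "folder", "children": ["logo.png"]},
    "readme.txt": {"type": "file", "children": []},
    "a.txt": {"type": "file", "children": []},
    "logo.png": {"type": "file", "children": []},
}


def find_by_type(tree, top_folders, wanted_type):
    """Breadth-first walk from the top folders, keeping nodes of one type."""
    queue = deque(top_folders)
    found = []
    while queue:
        name = queue.popleft()
        node = tree[name]
        if node["type"] == wanted_type:
            found.append(name)
        queue.extend(node["children"])
    return found


files = find_by_type(tree, ["root"], "file")
folders = find_by_type(tree, ["root"], "folder")
print(files)
print(folders)
```

No node here has more relationships than it has direct children, so there is no single "heavy" node to worry about.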
>>>
>>>
>>>       
>>>> -----Original Message-----
>>>> From: user-bounces at lists.neo4j.org [mailto:user-bounces at lists.neo4j.org] On
>>>> Behalf Of Dmitri Livotov
>>>> Sent: Monday, February 08, 2010 8:26 AM
>>>> To: Neo user discussions
>>>> Subject: [Neo] Node with the millions of incoming relationships "to" - is
>>>> this a proper way ?
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> A kind of architectural question now
>>>>
>>>> We're thinking on our graph model now, where we will have a lot of nodes
>>>> of particular type. Something like the filesystem, where each fs element
>>>> (node in neo4j terms) could be of type "file" or "folder".
>>>>
>>>> We need to be able to separate and later query (filter) nodes by type.
>>>> In some cases, we'll also need to iterate over
>>>> all nodes of some type and so on. For now, I do see two ways for
>>>> defining the node types in the database:
>>>>
>>>> - first is to define so called "class nodes", say "File class" node and
>>>> "Folder class" node and then every node in the database will have the
>>>> extra relation either to "File class" node or "Folder class" node. This
>>>> way I can easily find all nodes of particular type and so on.
>>>>
>>>> - second way - to define a property in every node, say "nodetype", with
>>>> the appropriate value for every node.
>>>>
>>>> The first way seems more correct to me, but I'm concerned that there
>>>> will be millions of relations to a single node - is this OK for
>>>> database performance and efficiency (a millions-to-one relationship
>>>> looks like an unbalanced graph)?
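
The two options above can be compared in a toy in-memory model. This is a sketch in Python under stated assumptions, not Neo4j code: the node ids, the sample names and both helper functions are hypothetical; only the "nodetype" property name and the "File class" / "Folder class" node names come from the question.

```python
# Hypothetical nodes, each carrying the "nodetype" property (second way).
nodes = {
    1: {"name": "etc", "nodetype": "folder"},
    2: {"name": "hosts", "nodetype": "file"},
    3: {"name": "passwd", "nodetype": "file"},
}

# First way: every node also has a relation to its class node, modelled
# here as class node name -> ids of the nodes related to it.
class_nodes = {"File class": [], "Folder class": []}
for node_id, props in nodes.items():
    target = "File class" if props["nodetype"] == "file" else "Folder class"
    class_nodes[target].append(node_id)


def all_of_type_via_class_node(class_name):
    # Follows only the relations of one class node.
    return sorted(class_nodes[class_name])


def all_of_type_via_property(wanted_type):
    # Without an index, every node in the database has to be inspected.
    return sorted(i for i, p in nodes.items() if p["nodetype"] == wanted_type)


print(all_of_type_via_class_node("File class"))
print(all_of_type_via_property("file"))
```

Both return the same nodes; the difference is how much of the graph has to be touched to produce the answer, and how many relations end up on the class node.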
>>>>
>>>>
>>>> Thanks,
>>>> Dmitri
>>>> _______________________________________________
>>>> Neo mailing list
>>>> User at lists.neo4j.org
>>>> https://lists.neo4j.org/mailman/listinfo/user
>>>>
>>>
>>>
>>>       
>
>
>
>   


