[Neo] Node with the millions of incoming relationships "to" - is this a proper way ?
mattias at neotechnology.com
Mon Feb 8 21:31:40 CET 2010
2010/2/8 Dmitri Livotov <dmitri at livotov.eu>:
> Thanks for such detailed clarifications on your designs.
> But if we stay with "a million relations to a single node", will it hurt
> performance compared to Rick's last suggestion of having only a
> top node model?
Well, all those relationships will occupy some memory and disk space,
but that's a minor issue. If such a heavy node only has
relationships of the same type, it's OK. But if there are, say, a
million relationships of one type and one relationship of another type,
it could take some time to find that one relationship each time you
request it while the node isn't present in the cache
(since relationships aren't sorted on disk by type). That should
probably be your only concern.
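
To illustrate the lookup cost described above, here is a minimal sketch in plain Python. It does not use the Neo4j API or reflect its actual storage format; the relationship type names ("IS_A", "OWNED_BY"), the helper functions, and the scaled-down count are assumptions made for illustration only. It models a node's relationships as one unsorted list and counts how many records a scan examines before finding the requested type.

```python
# Hypothetical model: a heavy node's relationships as an unsorted list of
# (type, target) records. This is an illustration, not Neo4j's on-disk layout.
N_COMMON = 100_000  # scaled down from "a million" to keep the sketch light


def make_relationships(n_common, rare_position):
    """Build n_common "IS_A" records plus one "OWNED_BY" record at rare_position."""
    rels = [("IS_A", i) for i in range(n_common)]
    rels.insert(rare_position, ("OWNED_BY", "owner"))
    return rels


def find_first(rels, rel_type):
    """Linear scan; return (record, number of records examined)."""
    for examined, rel in enumerate(rels, start=1):
        if rel[0] == rel_type:
            return rel, examined
    return None, len(rels)


# Worst case: the single rare relationship sits at the very end of the list.
rels = make_relationships(N_COMMON, rare_position=N_COMMON)
_, cost_common = find_first(rels, "IS_A")
_, cost_rare = find_first(rels, "OWNED_BY")
print(cost_common, cost_rare)
```

Under these assumptions the common type is found on the first record, while the single rare relationship is found only after every other record has been examined.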
> Mattias Persson wrote:
>> 2010/2/8 Rick Bullotta <rick.bullotta at burningskysoftware.com>:
>>> Hello, Dmitri.
>>> We are using the first approach - a top level "class" node with "entity"
>>> nodes below them in the graph. In some cases, this is a flat "collection"
>>> of entities, in others, it is a more complex set of linkages accomplished
>>> via relationships. Our taxonomy model (something similar to folders) can
>>> contain any class(es) of entities, via relationships. Additionally, an
>>> entity can be multiply linked to multiple taxonomy locations (and of course
>>> to other entities).
>>> We are also using a hybrid approach to reduce the # of nodes that must be
>>> traversed in certain types of searches. Basically, we are using a sort of
>>> "bucket" approach. In this scenario, the "class" node links to a set of
>>> "buckets". These buckets represent some logical clustering of related nodes
>>> that correspond to some commonly used key data that would be used to
>>> query/reduce the set of nodes in the resultset. For example, if the most
>>> common way to retrieve these nodes would be based on a date/time key, we
>>> might create a set of bucket nodes for each day, each day/hour,
>>> etc...allowing rapid reduction to a smaller subset of relevant nodes for a
>>> date oriented search. Similarly, you could do the same for some key field
>>> in the nodes (bucket all nodes by the first letter of the node identity
>>> field), or use a tagging metamodel where the buckets correspond to tag
>>> values. I suppose this could also be done directly or indirectly using
>>> Lucene as well.
>>> Hope this helps.
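
The bucket idea described above can be sketched roughly as follows, in plain Python rather than with the Neo4j API. The `BucketIndex` class, its method names, and the sample entity ids are hypothetical; the dictionary stands in for a "class" node linked to one bucket node per day, and each list stands in for the relationships from a bucket to its entities.

```python
from collections import defaultdict
from datetime import date, datetime


class BucketIndex:
    """Hypothetical stand-in for a "class" node linked to per-day bucket nodes."""

    def __init__(self):
        # day -> ids of the entities attached to that day's bucket
        self.buckets = defaultdict(list)

    def add(self, entity_id, timestamp):
        """Attach an entity to the bucket for its calendar day."""
        self.buckets[timestamp.date()].append(entity_id)

    def entities_on(self, day):
        """Return only the entities in one bucket, without touching the others."""
        return list(self.buckets.get(day, []))


index = BucketIndex()
index.add("report-1", datetime(2010, 2, 8, 8, 26))
index.add("report-2", datetime(2010, 2, 8, 21, 31))
index.add("report-3", datetime(2010, 2, 9, 9, 0))

hits = index.entities_on(date(2010, 2, 8))
print(hits)
```

A date-oriented query then looks at one bucket instead of every entity of the class. Finer buckets (day/hour) would use a finer key in the same way.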
>> An interesting approach, Rick.
>> However, I'd go for something simpler to avoid the millions of
>> relationships that would go to that one node, as you said. I'd only
>> have the "top" (root) folders connected to that node and then
>> traverse the tree when you'd like to find the files/folders. The cost
>> of traversing both (filtered) folders and files is quite low, and
>> you'd get rid of that "heavy" node with loads of relationships on it.
>> In fact, I've created some layouts like that and they worked fine with
>> several million files/folders at least.
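
The layout suggested above (only root folders attached to the top node, everything else found by traversal) can be sketched as a filtered breadth-first walk. This is plain Python, not the Neo4j traversal API; the `traverse` function, the `children` and `node_type` dictionaries, and the sample paths are all hypothetical.

```python
from collections import deque


def traverse(roots, children, want):
    """Breadth-first walk from the root folders, collecting nodes that pass `want`."""
    found = []
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        if want(node):
            found.append(node)
        # Folders contribute their children; files have no entry in `children`.
        queue.extend(children.get(node, ()))
    return found


# Hypothetical tree: only "/home" and "/etc" would be linked to the top node.
children = {
    "/home": ["/home/a.txt", "/home/docs"],
    "/home/docs": ["/home/docs/b.txt"],
    "/etc": ["/etc/hosts"],
}
node_type = {
    "/home": "folder",
    "/home/docs": "folder",
    "/etc": "folder",
    "/home/a.txt": "file",
    "/home/docs/b.txt": "file",
    "/etc/hosts": "file",
}

files = traverse(["/home", "/etc"], children, lambda n: node_type[n] == "file")
folders = traverse(["/home", "/etc"], children, lambda n: node_type[n] == "folder")
print(files, folders)
```

Here no node carries more relationships than a folder has direct children, at the price of walking the tree whenever all nodes of one type are needed.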
>>> -----Original Message-----
>>> From: user-bounces at lists.neo4j.org [mailto:user-bounces at lists.neo4j.org] On
>>> Behalf Of Dmitri Livotov
>>> Sent: Monday, February 08, 2010 8:26 AM
>>> To: Neo user discussions
>>> Subject: [Neo] Node with the millions of incoming relationships "to" - is
>>> this a proper way ?
>>> A kind of architectural question now
>>> We're working on our graph model, where we will have a lot of nodes
>>> of a particular type. Something like a filesystem, where each fs element
>>> (a node in neo4j terms) could be of type "file" or "folder".
>>> We need to be able to separate and later query (filter) nodes by type.
>>> In some cases, we'll also need to iterate over
>>> all nodes of some type and so on. For now, I see two ways of
>>> defining the node types in the database:
>>> - The first is to define so-called "class nodes", say a "File class" node and
>>> a "Folder class" node; every node in the database then has an
>>> extra relation to either the "File class" node or the "Folder class" node. This
>>> way I can easily find all nodes of a particular type and so on.
>>> - The second way is to define a property on every node, say "nodetype", with
>>> the appropriate value for every node.
>>> The first way seems more correct to me, but I'm concerned that there
>>> will be millions of relations to a single node. Is this OK for
>>> database performance and efficiency (a millions-to-one relationship looks
>>> like an unbalanced graph)?
>>> Neo mailing list
>>> User at lists.neo4j.org
Mattias Persson, [mattias at neotechnology.com]
Neo Technology, www.neotechnology.com