[Neo4j] Querying for nodes that have no relationhip to a specfic node
craig at amanzi.com
Wed Aug 18 15:25:50 CEST 2010
> Mmm we cannot limit the number of relationships in this app. One the
> most important features
> is that we'll keep looking for good matches for the user.
Since the trimming algorithm will entirely delete low scoring relationships,
the search algorithm will also keep finding them again, and if something has
changed (some data, or the scoring algorithm), then the relationships might
score higher on a later round and not get trimmed (something else might be
Sure we can avoid storing low value relationships to avoid having an
> almost complete graph.
> We'll also limit the calculation of the score to users that match some
> factors, like age, geographical region, language, etc. In this way the
> graph might get dense but it won't really grow into a complete graph.
I think your factors for limitation would be part of the score. For example,
large age differences would lower the score, large geographic separation
also. Instead of making these hard constraints, just build them into the
score. This allows you to adjust the score calculation as you decide how
better to model the 'compatibility' and the graph can adjust accordingly.
I would think also that whenever a new score formula is introduced, you
would go through the graph and recalculate the scores of existing
relationships, trimming new low scoring ones, before continuing with the
ongoing search/trim cycle. This way a new scoring method would reflect very
quickly in the graph without affecting runtime capabilities. Peoples
relationships would not disappear only to appear at some unknown future
time, they would just start adjusting in order as the new score took effect.
For example, if you previously did not take into account age, a user might
see high scoring partners on their list. When you add age into the formula,
that users list would re-order, with some previous matches possibly even
being trimmed right off the list if the age difference is high enough. Very
We can choose not to store relationships with a score under N. In this
> we could end up calculating a possible match that we have already
> discarded. This would mean sacrificing CPU in favor of a less dense
> graph, but I think it might make sense.
Yes. The wasted CPU cycles would be most noticed once the graph is stable,
and the algorithm is running in circles re-calculating the same stuff. That
is a minor issue I think, but can be optimized for later. The main thing
would be to be sure that when new data comes in the graph adjusts
accordingly. So one optimization is to trigger recalculation cycles based on
the rate of data input (new users, or users editing their profiles). I think
that these edits will also re-prioritise the queue, since a new user should
get to the front of the queue to get some initial matches, before an
existing users gets his existing matches refined (possibly only marginally
More information about the User