[Neo4j] Querying for nodes that have no relationship to a specific node

Alberto Perdomo alberto.perdomo at gmail.com
Wed Aug 18 11:50:27 CEST 2010


Hi Craig,

On Thu, Jul 29, 2010 at 12:14 PM, Craig Taverner <craig at amanzi.com> wrote:
> I think leveraging existing relationships is obviously valuable, but I
> thought I'd throw in an idea for doing the original suggestion, pure random
> search:

Sounds interesting. I think the way to go is to combine existing
relationships (favorites, etc.) with the pseudo-random search.

> Reword the original problem to instead of looking for a set of random
> potential matches for every node, rather looking for new random
> relationships. What I mean is: find both A and B randomly. This can be done
> at high performance by simply generating a random number between 0 and the
> maximum node ID. Assuming most nodes are people, you will be able to
> generate a sample set of random people almost instantly (need to trim the
> set to real people nodes of course, removing invalid nodes and non-people
> nodes, hence the word 'almost').

The sample set needs to be random, but subject to certain constraints
such as age, gender, etc.
For instance, calculate the score against n users who are female or male
and within an age range of ...
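
As a minimal sketch of constrained random sampling, assuming an
in-memory dict stands in for the node store: random IDs are drawn up to
the maximum node ID, invalid and non-person nodes are discarded, and a
filter is applied to what remains. The names `NODES`, `sample_people`
and `female_30_to_55`, the property names, and the age range are all
hypothetical and only there for illustration.

```python
import random

# Hypothetical stand-in for the node store: id -> properties.
# IDs have gaps and some nodes are not people, as in a real graph.
NODES = {
    1: {"type": "person", "gender": "female", "age": 29},
    2: {"type": "person", "gender": "male", "age": 41},
    4: {"type": "tag"},
    5: {"type": "person", "gender": "female", "age": 35},
    7: {"type": "person", "gender": "female", "age": 52},
}
MAX_NODE_ID = 8


def sample_people(nodes, max_id, size, matches, rng, max_attempts=10_000):
    """Draw random IDs and keep the person nodes that satisfy `matches`."""
    picked = set()
    for _ in range(max_attempts):
        if len(picked) >= size:
            break
        node_id = rng.randint(0, max_id)
        props = nodes.get(node_id)
        if props is None or props.get("type") != "person":
            continue  # invalid ID or non-person node
        if matches(props):
            picked.add(node_id)
    return picked


def female_30_to_55(props):
    # Arbitrary example constraint, not a value from the thread.
    return props["gender"] == "female" and 30 <= props["age"] <= 55


sample = sample_people(NODES, MAX_NODE_ID, 2, female_30_to_55, random.Random(42))
```

The attempt limit keeps the loop from spinning forever when fewer than
`size` nodes satisfy the constraints.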

> The sample set can be some pre-defined size, eg. 100 nodes. Then compute all
> node-node relationships between the nodes in this set (up to 10k
> relationships) with the following rules:
>
>   - Ignore if a relationship already exists
>   - Possibly limit to only 10 relationships per node (your suggestion
>   above)

Do you mean a limit for this run, or at all times? The first is OK, the
second is not. I guess you mean stopping once I have already computed 10
new relationships for a node, right?
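
A sketch of the per-run reading of that limit, under some assumptions:
relationships are treated as unordered pairs, `score` is a placeholder
for whatever matching function is used, and the names
`new_relationships` and `MAX_NEW_PER_NODE` are made up for this example.
The counter is reset on every call, so the cap never applies across
runs.

```python
from itertools import combinations

MAX_NEW_PER_NODE = 10  # cap for a single run only, not a lifetime cap


def new_relationships(sample, existing, score, cap=MAX_NEW_PER_NODE):
    """Score unordered pairs in `sample`, skipping pairs already linked."""
    added = {node: 0 for node in sample}  # new relationships in this run
    result = {}
    for a, b in combinations(sorted(sample), 2):
        pair = frozenset((a, b))
        if pair in existing:
            continue  # ignore if a relationship already exists
        if added[a] >= cap or added[b] >= cap:
            continue  # one of the nodes has reached the cap for this run
        result[pair] = score(a, b)
        added[a] += 1
        added[b] += 1
    return result


existing = {frozenset((1, 2))}
found = new_relationships([1, 2, 3, 4], existing, lambda a, b: 1.0 / (a + b), cap=2)
```

With a cap of 2, the pair (3, 4) is skipped in this run because node 3
already received two new relationships; it stays eligible for a later
run.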

>   - If the total number of pre-existing relationships is high (or
>   relationships per node are high), invoke a trimming algorithm, for example
>   removing relationships of low weight, since you care less about them

This might be interesting for keeping the density low, but I have to
look at it, since the way it works is that people see suggestions of
other users that are a good match. I probably can't make them disappear
suddenly.
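
One possible way to reconcile trimming with that concern, as a sketch
only: remove the lowest-weight relationships of over-full nodes, but
never one that has already been shown to a user. The `shown` flag, the
`trim` function and the dict layout are assumptions for this example,
not something that exists in the application.

```python
from collections import Counter


def trim(relationships, max_per_node):
    """Return the pairs to delete, lowest weight first.

    `relationships` maps (a, b) -> {"weight": float, "shown": bool}.
    Relationships with shown=True are never selected, so a node may stay
    above `max_per_node` if its remaining relationships are all protected.
    """
    degree = Counter()
    for pair in relationships:
        for node in pair:
            degree[node] += 1

    to_delete = []
    by_weight = sorted(relationships.items(), key=lambda kv: kv[1]["weight"])
    for pair, info in by_weight:
        if info["shown"]:
            continue  # already visible to a user, keep it
        if any(degree[node] > max_per_node for node in pair):
            to_delete.append(pair)
            for node in pair:
                degree[node] -= 1
    return to_delete


rels = {
    (1, 2): {"weight": 0.9, "shown": False},
    (1, 3): {"weight": 0.1, "shown": True},
    (1, 4): {"weight": 0.2, "shown": False},
    (1, 5): {"weight": 0.5, "shown": False},
}
doomed = trim(rels, max_per_node=2)
```

Here node 1 has four relationships and a limit of two; the weakest one
is protected because it was shown, so the next two weakest are selected
instead.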

>
> This idea can be run continuously in a background thread, and if the
> trimmer works well, will allow the total graph size to reach some stable
> state with time.
>
> Then you can add features like when a node's properties are changed, delete
> all those relationships to the node and pass the id into the background
> process for immediate inclusion in the sample set (bypass the random
> sampling for new or edited nodes, so they get some relationships
> immediately).
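
The bypass for new or edited nodes could look roughly like the sketch
below. It assumes the computed relationships live in a dict keyed by
node pairs and that the application calls a hook when properties
change; `changed`, `on_properties_changed` and `next_sample` are
hypothetical names.

```python
import random
from collections import deque

changed = deque()  # IDs of new or edited nodes, filled by the application


def on_properties_changed(node_id, relationships):
    """Drop the node's computed relationships and queue it for rescoring."""
    stale = [pair for pair in relationships if node_id in pair]
    for pair in stale:
        del relationships[pair]
    changed.append(node_id)


def next_sample(all_ids, size, rng):
    """Build the next sample: queued nodes first, random nodes after."""
    sample = []
    while changed and len(sample) < size:
        node_id = changed.popleft()
        if node_id not in sample:
            sample.append(node_id)
    # Fill the remaining slots with randomly chosen other nodes.
    candidates = [i for i in all_ids if i not in sample]
    rng.shuffle(candidates)
    sample.extend(candidates[: size - len(sample)])
    return sample


relationships = {(1, 2): 0.8, (2, 3): 0.4, (3, 4): 0.6}
on_properties_changed(2, relationships)
sample = next_sample([1, 2, 3, 4, 5], 3, random.Random(0))
```

The edited node always lands in the next sample, so it gets fresh
relationships on the following background run instead of waiting to be
picked at random.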

