[Neo4j] Lucene Index on Relationships

Marius Kubatz marius.kubatz at udo.edu
Mon Jun 21 18:28:01 CEST 2010


Hi,

thanks a lot for the feedback.

There are a lot of applications where indexed relationships will provide a
speed benefit, but only up to a limit. The advantage comes from properties
that are sparse across a lot of edges (e.g. 100 relationships with the
property, as opposed to 1000 relationships without it), and it holds only up
to a "critical mass", beyond which there is no difference between indexing
and iterating. I have some ideas on this and will try to write down my
workarounds for this problem and post them.
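
A minimal sketch of that trade-off, in plain Python rather than the Neo4j or
Lucene API. The edge model, the function names and the "similarity" property
name are assumptions made for illustration only; the sketch just counts how
many relationships each approach has to visit.

```python
def touched_by_scan(relationships, key):
    """Iterating: every relationship is visited once, whether it matches or not."""
    hits = [r for r in relationships if key in r["props"]]
    return hits, len(relationships)


def build_index(relationships, key):
    """Toy index: remember only the relationships that carry the property."""
    return [r for r in relationships if key in r["props"]]


def touched_by_index(index):
    """Index lookup: only the indexed relationships are visited."""
    return index, len(index)


def make_edges(total, with_property):
    """Build `total` edges; the first `with_property` of them carry 'similarity'."""
    return [
        {"id": i, "props": {"similarity": 0.5} if i < with_property else {}}
        for i in range(total)
    ]


sparse = make_edges(1100, 100)    # 100 edges with the property, 1000 without
dense = make_edges(1100, 1100)    # every edge carries the property

for name, edges in (("sparse", sparse), ("dense", dense)):
    _, scan_cost = touched_by_scan(edges, "similarity")
    _, index_cost = touched_by_index(build_index(edges, "similarity"))
    print(name, "scan visits", scan_cost, "index visits", index_cost)
```

In the sparse case the toy index visits 100 edges instead of 1100; in the
dense case both approaches visit all 1100, which is the "critical mass"
situation. The sketch ignores the cost of building and maintaining the index.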

I also have an idea for how to avoid indexing the whole graph. Normally
one would create an index entry as (Relationship, key, value).
Would it make sense to use the property name as the key and the start node ID
as the value? Would this create smaller buckets?
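
To make the question concrete, here is a small sketch in plain Python. The
RelationshipIndex class below is a hypothetical stand-in for a
(Relationship, key, value) index and is not the Neo4j or Lucene API; the
sample data is made up. It only illustrates how the bucket size behaves when
the start node ID is used as the value.

```python
from collections import defaultdict


class RelationshipIndex:
    """Toy index mapping a (key, value) pair to a bucket of relationship ids."""

    def __init__(self):
        self._buckets = defaultdict(set)

    def add(self, rel_id, key, value):
        self._buckets[(key, value)].add(rel_id)

    def get(self, key, value):
        # Return a copy so callers cannot modify the bucket.
        return set(self._buckets.get((key, value), ()))

    def largest_bucket(self):
        return max((len(b) for b in self._buckets.values()), default=0)


# Hypothetical data: (relationship id, start node id, properties)
relationships = [
    (1, 10, {"similarity": 0.91}),
    (2, 10, {"similarity": 0.15}),
    (3, 10, {}),
    (4, 20, {"similarity": 0.40}),
    (5, 30, {"similarity": 0.77}),
]

# Proposed scheme: key = property name, value = start node ID.
per_node = RelationshipIndex()
for rel_id, start_node, props in relationships:
    for name in props:
        per_node.add(rel_id, name, start_node)

# Only the relationships of node 10 that carry the property are returned.
print(sorted(per_node.get("similarity", 10)))
print("largest bucket:", per_node.largest_bucket())
```

In this toy model a bucket can never be larger than the number of
relationships leaving one node, but the actual property value still has to be
read from each relationship in the bucket. Whether the same holds for a real
Lucene-backed index is exactly the open question.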

Best regards

Marius

2010/6/21 Craig Taverner <craig at amanzi.com>

> A side comment, since I think indexing relationships with lucene might be
> good, but think there might be alternatives for your current example.
>
> You said that the relationship property is a float from 0 to 1, so you
> cannot use relationship types, but actually, when you consider that any
> index is usually created by breaking data ranges (continuous or discrete)
> into fewer, more discrete ranges, you can use a relationship type to
> represent a range of floats. For example, if you have a roughly even
> distribution of floats between 0 and 1, try dividing that into 100 parts
> (0%-100%, or 0.01 to 1.00), and make a relationship type for each. This
> would certainly facilitate traversing relationships of specific float
> values (at least improve the performance dramatically, as in an index).
>
> Of course, this example focuses on traversing from a particular document.
> If you are searching for all relationships in the entire database with
> particular float values, then a separate index would be better.
>
> On Mon, Jun 21, 2010 at 2:11 PM, Marius Kubatz <marius.kubatz at udo.edu> wrote:
>
> > Hello guys, hello community!
> >
> > I'm currently evaluating neo4j for my thesis and have a wish :)
> > I have already opened a ticket for this
> > ( https://trac.neo4j.org/ticket/241 ), but I would like to hear what you
> > guys think about it.
> >
> > Basically it just involves the ability to index Neo4j Relationships with
> > Lucene Index.
> >
> > Neo4j works great on sparse graphs, but what happens when you have a very
> > dense graph with several thousands of neighbors to one node?
> > Additionally, as soon as you store information on Relationships you will
> > get into trouble, because you will have to iterate through all those
> > edges to find the properties you seek.
> >
> > If this sounds far-fetched, please take a look at this example where one
> > might need properties on Relationships:
> > One "Document" node is related to another Document node by a similarity
> > function whose result is stored in the Relationships between those
> > document nodes. Let's just say that we save a float in [0, 1] on those
> > relationships, which makes it impossible to create RelationshipTypes for
> > every value.
> >
> > Using Index to fetch Relationships by their indexed properties would
> > greatly speed up the process and increase the attractiveness of using
> > properties on Relationships. I would love to have quick access to
> > Relationship properties where I could add and implement fuzzy logic,
> > probabilities, Bayesian networks, similarities, ranking ... and so on ...
> > As said, thank you for Relationship properties, they are great and
> > already there, but what I miss is quick access to them.
> >
> > Thank you very much and best regards!
> >
> > Marius
> >
> > --
> > "Programs must be written for people to read, and only incidentally for
> > machines to execute."
> >
> > - Abelson & Sussman, SICP, preface to the first edition
> > _______________________________________________
> > Neo4j mailing list
> > User at lists.neo4j.org
> > https://lists.neo4j.org/mailman/listinfo/user
> >
>
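
Craig's suggestion quoted above, representing ranges of floats as
relationship types, could look roughly like the following sketch. It is
plain Python, not the Neo4j API; the "SIMILARITY_" naming scheme and both
function names are assumptions made for illustration.

```python
def similarity_type(value, buckets=100):
    """Map a float in [0, 1] to one of `buckets` relationship type names."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("similarity must lie in [0, 1]")
    # Clamp so that 1.0 falls into the last bucket instead of a bucket of its own.
    # Note: a value sitting on a bucket boundary may land one bucket lower
    # because of binary floating point rounding.
    index = min(int(value * buckets), buckets - 1)
    return f"SIMILARITY_{index:02d}"


def types_for_range(low, high, buckets=100):
    """List the type names whose buckets overlap the range [low, high]."""
    first = min(int(low * buckets), buckets - 1)
    last = min(int(high * buckets), buckets - 1)
    return [f"SIMILARITY_{i:02d}" for i in range(first, last + 1)]


print(similarity_type(0.5))
print(len(types_for_range(0.25, 0.5)), "types cover the range 0.25 to 0.5")
```

A traversal from one document node would then follow only the types returned
by types_for_range and, if exact bounds matter, check the stored float on the
relationships in the first and last bucket.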



-- 
"Programs must be written for people to read, and only incidentally for
machines to execute."

- Abelson & Sussman, SICP, preface to the first edition

