The cornerstone of this data will be an identity, with which we will want to associate information. The problem is that sometimes we want to see what is related to an identity, and other times we want to see which identities are related to some identity attribute, kind of LinkedIn-ish.


> Potentially stupid questions follow: In looking at how to add sharding
> to Neo4j, I was wondering if it made any sense to put Neo4j on top of
> Cassandra or maybe a distributed B+ tree system? I love the relationship
> modeling in Neo4j but I need the scalability of sharding, preferably not
> done at the client.

Hi Rick --

Worry not, that's not a stupid question at all. The problem with just
putting the Neo4j API on top of something like Cassandra is that it
doesn't really solve the problem. The challenge with auto-sharding a
graph isn't the engineering of writing a distributed system. It's the
science of efficiently partitioning a dynamic graph.

Cassandra shards everything by a defined key. That will lead to an
inefficient sharding scheme if you have a graph-like connected data
structure that you want to be able to traverse in an ad-hoc manner.
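
To make that concrete, here is a rough sketch (not Cassandra's actual
partitioner) of what key-based placement does to a connected structure.
Everything in it is an assumption made up for illustration: the
"identity-N" keys, the toy ring-shaped graph, and the use of CRC32 as a
stand-in key hash. It simply counts how many relationships end up with
their two endpoints on different shards.

```python
import zlib


def shard_for(key: str, num_shards: int) -> int:
    """Key-based placement: hash the key and take it modulo the shard count.

    CRC32 is only a stand-in hash here, chosen because it is deterministic.
    """
    return zlib.crc32(key.encode("utf-8")) % num_shards


def cross_shard_fraction(edges, num_shards: int) -> float:
    """Fraction of edges whose two endpoints land on different shards."""
    if not edges:
        return 0.0
    crossing = sum(
        1 for a, b in edges
        if shard_for(a, num_shards) != shard_for(b, num_shards)
    )
    return crossing / len(edges)


# Hypothetical data: 200 identities, each linked to its next three neighbours.
nodes = [f"identity-{i}" for i in range(200)]
edges = [
    (nodes[i], nodes[(i + step) % len(nodes)])
    for i in range(len(nodes))
    for step in (1, 2, 3)
]

if __name__ == "__main__":
    for num_shards in (1, 2, 4, 8):
        fraction = cross_shard_fraction(edges, num_shards)
        print(f"{num_shards} shard(s): {fraction:.2f} of edges cross shards")
```

The placement never looks at the relationships, so nothing keeps
neighbours together. With a well-mixing hash you would expect roughly
(k - 1) / k of the edges to cross shards when there are k shards, and
every crossing edge is a potential network hop during a traversal.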

Do you know any invariants about the domain, like "entity of type X
will NEVER be connected to entity of type Y"?


