[Neo4j] Neo4j in GIS Applications

Craig Taverner craig at amanzi.com
Fri Oct 7 23:07:09 CEST 2011


Hi all,

I am certainly behind on my emails, but I did just answer a related question
about OSM and fragmentation, and I think that might have answered some of
Daniel's questions.

But I can say a little more about OSM and Neo4j here, specifically about the
issue of joins in Postgres. Let me start by describing where I think
Postgres might be faster than Neo4j, and then move on to where Neo4j is
faster than Postgres.

Importing OSM data into Postgres will be faster than into Neo4j, because the
foreign keys are simple integer references between tables and are indexed
using Postgres's high-performance indexes. In Neo4j the relationships are
explicit bi-directional references that carry more detail and take more disk
space (but no index space). The disk write time is longer because more data
is written, but the advantages of not having an index make it worthwhile.
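
A rough sketch of why more data gets written per relationship. The two record
layouts below are made up for illustration; they are not the real Postgres or
Neo4j on-disk formats, and the field choices are assumptions.

```python
import struct

# Hypothetical layouts, for illustration only.
# A foreign-key row: just (from_id, to_id) as two 64-bit integers.
FK_ROW = struct.Struct("<qq")

# A graph relationship record: start node, end node, a type code, plus
# previous/next pointers into the relationship chain of each end node.
REL_RECORD = struct.Struct("<qqiqqqq")

fk_bytes = FK_ROW.pack(1, 2)
rel_bytes = REL_RECORD.pack(1, 2, 0, -1, -1, -1, -1)

# Bytes written per link in each layout.
print(FK_ROW.size, REL_RECORD.size)  # 16 52
```

The relationship record is several times larger, but the foreign-key row
still needs a separate index entry to be traversable, which this sketch does
not count.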

So that leads naturally to where Neo4j is faster. There is no index on the
foreign key because there is no need for one. Each relationship contains the
id of the node it points to (and the one it points from), and that id maps
directly to the location of the node itself on disk. Because all nodes are
the same size on disk, this is more like an array lookup. So the 'join' you
perform when traversing from one osm-node to another is extremely fast, but
more importantly it is not affected by database size. Each hop is O(1) in
performance! Fantastic! In an RDBMS, the need for an index on the foreign key
means you are building a tree structure to get the join down from O(N) to
O(log(N)) or something better, but never as good as O(1).
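
The difference between the two lookups can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the record
layout (two doubles per node) and the function names are hypothetical, and a
binary search over a sorted key list stands in for a real B-tree index.

```python
import bisect
import struct

# Hypothetical fixed-size node record: (lat, lon) as two doubles.
NODE = struct.Struct("<dd")

def build_store(coords):
    """Pack nodes back to back, so node id i lives at byte offset i * NODE.size."""
    store = bytearray(NODE.size * len(coords))
    for node_id, (lat, lon) in enumerate(coords):
        NODE.pack_into(store, node_id * NODE.size, lat, lon)
    return store

def lookup_by_offset(store, node_id):
    """Graph-store style: the id is the address, so this is one O(1) jump."""
    return NODE.unpack_from(store, node_id * NODE.size)

def lookup_by_index(sorted_keys, rows, key):
    """RDBMS style: binary search a sorted key index, O(log N) comparisons."""
    pos = bisect.bisect_left(sorted_keys, key)
    if pos == len(sorted_keys) or sorted_keys[pos] != key:
        raise KeyError(key)
    return rows[pos]

coords = [(52.5, 13.4), (48.1, 11.6), (50.9, 6.9)]
store = build_store(coords)
keys = [100, 200, 300]  # arbitrary primary keys, kept sorted

print(lookup_by_offset(store, 1))
print(lookup_by_index(keys, coords, 200))
```

Both calls return the same node, but only the second one does more work as
the number of rows grows.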

In Neo4j-Spatial, if you perform a bounding box query, you are traversing an
RTree, which does not exist in Postgres, but does exist in PostGIS. In both
Neo4j-Spatial and PostGIS you are working with a tree index that will slow
things down as the amount of data grows, and currently the PostGIS RTree is
better optimized than the Neo4j-Spatial RTree. But if you are performing
more graph-like processing, for example proximity searches or routing
analysis, then you will get the full O(1)-per-hop benefit of the graph
database, and no way can Postgres match that :-)
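
As an example of the graph-like processing meant here, a minimal routing
sketch in Python. It is plain Dijkstra over an in-memory adjacency list, not
the Neo4j API; the `roads` data and the function name are made up, and the
dict lookup only stands in for following a direct node reference.

```python
import heapq

def shortest_path_cost(adjacency, start, goal):
    """Dijkstra over an adjacency list; returns the cost, or None if unreachable."""
    best = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == goal:
            return cost
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry, a cheaper route was already found
        # Following each edge is a direct lookup, with no index search per hop.
        for neighbour, weight in adjacency.get(node, ()):
            new_cost = cost + weight
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                heapq.heappush(queue, (new_cost, neighbour))
    return None

# Hypothetical road segments: node -> [(neighbour, distance), ...]
roads = {
    "a": [("b", 1.0), ("c", 4.0)],
    "b": [("c", 2.0), ("d", 5.0)],
    "c": [("d", 1.0)],
    "d": [],
}

print(shortest_path_cost(roads, "a", "d"))  # 4.0
```

The O(1) applies to each hop; the whole search still depends on how much of
the graph it has to visit, just not on the total size of the database.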

OK. Lots of hype, but I get enthusiastic sometimes. Take anything I say with
a pinch of salt. Believe the parts that make sense to you, and try some tests
otherwise. It would be great to hear your experiences with modeling OSM in
neo4j versus postgres.

Regards, Craig

On Tue, Oct 4, 2011 at 7:18 PM, Andreas Kollegger <
andreas.kollegger at neotechnology.com> wrote:

> Hi Daniel,
>
> If you haven't yet, you should check out the work done in the Neo4j Spatial
> project - https://github.com/neo4j/spatial - which has fairly
> comprehensive
> support for GIS.
>
> Data locality, as you mention, is exactly a big advantage of using a graph
> for geospatial data. Take a look at the Neo4j Spatial project and let us
> know what you think.
>
> Best,
> Andreas
>
> On Tue, Oct 4, 2011 at 9:58 AM, danielb <danielberchtold at gmail.com> wrote:
>
> > Hello everyone,
> >
> > I am going to write my master thesis about the suitability of graph
> > databases in GIS applications (at least I hope so^^). The database has to
> > provide topological queries, network analysis and the ability to store
> > large
> > amount of mapdata for viewing - all based on OSM-data of Germany (< 100M
> > nodes). Most likely I will compare Neo4j to PostGIS.
> > As a starting point I want to know why you would recommend Neo4j to do
> the
> > job? What are the main advantages of a graph database compared to a
> > (object-)relational database in the GIS environment? The main focus and
> the
> > goal of this work should be to show a performance improvement over
> > relational databases.
> > In a student project (OSM navigation system) we worked with relational
> > (SQLite) and object-oriented (Perst) databases on netbook hardware and
> > embedded systems. The relational database approach showed us two
> problems:
> > If you transfer the OSM model directly into tables then you have a lot of
> > joins which slows everything down (and lots of redundancy when using
> > different tables for each zoom level). The other way is to store as much
> as
> > possible in one big (sparse) table. But this would also have some
> > performance issues I guess and from a design perspective it is not a nice
> > solution. The object-oriented database also suffered from many random
> reads
> > when loading a bounding box. In addition we could not say how data was
> > stored in detail.
> > The performance indeed increased after caching occurred or by the use of
> SSD
> > hardware. You can also store everything in RAM (money does the job), but
> > for
> > now you have to assume that all of the data has to be read from a slow
> disk
> > the first time. Can Neo4j be configured to read for example a bounding
> box
> > of OSM data from disk in an efficient way (data locality)?
> > Maybe you also have some suggestions where I should have a look at in
> this
> > work and what can be improved in Neo4j to get better results. I also
> would
> > appreciate related papers.
> >
> > kindly regards, Daniel
> >
> > --
> > View this message in context:
> >
> http://neo4j-community-discussions.438527.n3.nabble.com/Neo4j-in-GIS-Applications-tp3393925p3393925.html
> > Sent from the Neo4j Community Discussions mailing list archive at
> > Nabble.com.
> > _______________________________________________
> > Neo4j mailing list
> > User at lists.neo4j.org
> > https://lists.neo4j.org/mailman/listinfo/user
> >
>

