"Spatial Clustering with RapidMiner"

WaggaWagga
WaggaWagga New Altair Community Member
edited November 5 in Community Q&A
Hello,

For an analysis of POIs (points of interest), I would like to consider a spatial clustering use case. Normally, DBSCAN is suitable for clustering use cases based on geo positions (lat, long). But which distance measure should be used? The Haversine distance is not part of RapidMiner. Is it also possible to combine lat/long and nominal values in a clustering analysis?

A thread in the RapidMiner forum refers to external libraries:

http://rapid-i.com/rapidforum/index.php?topic=6888.0

Best

Answers

  • MartinLiebig
    MartinLiebig
    Altair Employee
    Hi,

    I think there is not much built in, but you might check this post by Tom: http://www.neuralmarkettrends.com/2015/11/04/Geo-Distance-In-RapidMiner-and-Python/

    ~Martin
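
    For reference, the Haversine distance mentioned in the question can be sketched in a few lines of Python. This is only a minimal illustration and not the code from the linked post: the function name `haversine_km` and the mean Earth radius of 6371 km are assumptions, and the city coordinates are approximate.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius; other values are in use


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against rounding slightly above 1 for antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


if __name__ == "__main__":
    # Approximate city-centre coordinates, for illustration only.
    berlin = (52.52, 13.405)
    paris = (48.8566, 2.3522)
    print(round(haversine_km(*berlin, *paris), 1), "km")
```

    A function like this treats the Earth as a sphere, so it is less exact than an ellipsoidal calculation, but it is usually good enough as a distance measure for clustering.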
  • WaggaWagga
    WaggaWagga New Altair Community Member
    Thanks Martin for this reference. Are there any spatial clustering extensions planned for the future?

    Best
  • MartinLiebig
    MartinLiebig
    Altair Employee
    Hi,

    Sorry, but I cannot comment on RapidMiner's roadmap. In the end, this is internal information.

    I do not know of any ongoing community project. But maybe you are the one to start this :-)

    Best.
    Martin
  • BalazsBaranyRM
    BalazsBaranyRM New Altair Community Member
    How big is the geography you're analyzing? Is it a city, a small country like Austria, or a continent?

    If the geography is small enough that the shape of the Earth (an ellipsoid) doesn't matter, you can transform your latitude/longitude coordinates into a meter-based projection. There are many open source tools for that (e.g. http://www.gdal.org/ogr2ogr.html). Just search for a projection that is commonly used by cartographers in your area.

    When you have meter-based coordinates, you can easily interpret Euclidean distances and they will be quite accurate.

    From about the size of a country like Germany or France upward, and also depending on the distance from the equator, distances computed directly on latitude/longitude coordinates no longer reflect true distances on Earth.

    You will get the best results if you use a geospatially enabled database like PostgreSQL with the PostGIS extension. You can then convert between coordinate reference systems/projections and even calculate exact distances.
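
    To make the point about meter-based coordinates concrete, here is a rough Python sketch. It uses a simple equirectangular approximation around a reference point rather than a real cartographic projection such as the ones ogr2ogr or PostGIS provide, so treat it as an illustration only. The function names, the spherical Earth radius and the sample coordinates are assumptions.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius in metres (spherical model)


def to_local_meters(lat, lon, ref_lat, ref_lon):
    """Project (lat, lon) in degrees to x/y metres relative to a reference point.

    Equirectangular approximation: only reasonable close to the reference point.
    """
    x = math.radians(lon - ref_lon) * math.cos(math.radians(ref_lat)) * EARTH_RADIUS_M
    y = math.radians(lat - ref_lat) * EARTH_RADIUS_M
    return x, y


def euclidean_m(p, q):
    """Plain Euclidean distance between two projected points, in metres."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def meters_per_degree_longitude(lat):
    """Ground length of one degree of longitude at the given latitude."""
    return math.radians(1.0) * math.cos(math.radians(lat)) * EARTH_RADIUS_M


if __name__ == "__main__":
    ref = (48.2, 16.37)  # roughly Vienna, illustration only
    a = to_local_meters(48.20, 16.37, *ref)
    b = to_local_meters(48.21, 16.39, *ref)
    print(round(euclidean_m(a, b)), "m between the two sample points")
    # One degree of longitude shrinks with the cosine of the latitude.
    for lat in (0, 48, 60):
        print(lat, round(meters_per_degree_longitude(lat)), "m per degree of longitude")
```

    The last loop shows why raw degrees are a poor distance unit: one degree of longitude covers only about half as much ground at 60° latitude as it does at the equator.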
  • WaggaWagga
    WaggaWagga New Altair Community Member
    Hello Martin,

    I guess I still lack the experience with RapidMiner to implement new RapidMiner functions ;-)

    Best
  • WaggaWagga
    WaggaWagga New Altair Community Member
    Hi,

    Thanks for your comments. The data corpus is based on European POIs (from Germany and France to the UK). We are using PostgreSQL and PostGIS (e.g., the data type geometry), and I also found the function ST_ClusterIntersecting during my research. But I guess it is not a full geospatial clustering algorithm. I have to admit, the documentation is very sparse.

    Best
  • BalazsBaranyRM
    BalazsBaranyRM New Altair Community Member
    OK, if you're using PostGIS, you can easily create a distance matrix.
    Just do a self join of the dataset (select ... from data d1 cross join data d2) and calculate the distance between the geometries: ST_Distance(d1.geom, d2.geom) (assuming that you're using a meter-based projection/CRS).

    Or, even better, use a Geography-based instead of a Geometry-based calculation (slower, but more precise):

        ST_Distance(ST_Transform(d1.geom, 4326)::geography, ST_Transform(d2.geom, 4326)::geography) as distance
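
    Once such a distance matrix exists, DBSCAN can run on it directly. The following is a minimal Python sketch under a few assumptions: the rows of the cross join have been exported as (id1, id2, distance) tuples (the id columns are not part of the query above), distances are in meters, and `dbscan_from_pairs`, `eps` and `min_pts` are illustrative names. It is a plain textbook DBSCAN, not a RapidMiner or PostGIS function.

```python
from collections import defaultdict

NOISE = -1


def dbscan_from_pairs(pairs, eps, min_pts):
    """Cluster ids with DBSCAN, given (id1, id2, distance) rows.

    min_pts counts the point itself. Returns a dict mapping each id to a
    cluster label (0, 1, ...) or NOISE (-1).
    """
    neighbors = defaultdict(set)
    for a, b, dist in pairs:
        neighbors[a].add(a)  # every point is its own neighbour
        neighbors[b].add(b)
        if dist <= eps:
            neighbors[a].add(b)
            neighbors[b].add(a)

    labels = {}
    cluster = 0
    for point in sorted(neighbors):
        if point in labels:
            continue
        if len(neighbors[point]) < min_pts:
            labels[point] = NOISE  # may still become a border point later
            continue
        labels[point] = cluster
        queue = list(neighbors[point])
        while queue:
            other = queue.pop()
            if labels.get(other) == NOISE:
                labels[other] = cluster  # noise reachable from a core point is a border point
            if other in labels:
                continue
            labels[other] = cluster
            if len(neighbors[other]) >= min_pts:
                queue.extend(neighbors[other])  # expand only from core points
        cluster += 1
    return labels


if __name__ == "__main__":
    # Made-up distances in metres: two tight groups and one isolated point.
    rows = [
        (1, 2, 100), (1, 3, 120), (2, 3, 90),
        (4, 5, 80), (4, 6, 110), (5, 6, 100),
        (1, 4, 5000), (1, 7, 9000), (4, 7, 8000),
    ]
    print(dbscan_from_pairs(rows, eps=150, min_pts=3))
```

    Pairs that are missing from the input are simply treated as "not neighbours", so for large datasets it may be enough to export only the pairs below the chosen `eps`.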
  • WaggaWagga
    WaggaWagga New Altair Community Member
    Hi,

    Maybe this is interesting for you: I have found the ELKI library. It is the result of a research project at LMU Munich. It is Java-based, but I recommend using the frontend. One weak point of the source code is its missing or sparse documentation.

    http://elki.dbs.ifi.lmu.de/

    ELKI contains five variations of the OPTICS algorithm and a long list of distance metrics. For geospatial analysis with OPTICS, it provides a lat/long distance metric.

    All the best

    WaggaWagga