Parsing latitudes and longitudes

Shourya
Shourya New Altair Community Member
edited November 2024 in Community Q&A
Hello,
I am using RapidMiner Studio 9.6.000 on Ubuntu 18.04.
I have latitude and longitude values expressed as e.g. 50.833.333, where 50 is the degrees, 833 the minutes and 333 the seconds. By default the values load as nominal, and I am not able to use Format Numbers or Parse Numbers on them.
Can anyone please help me understand how I can use these values to plot maps in visualization?

Answers

  • jacobcybulski
    jacobcybulski New Altair Community Member
    edited April 2020
    I am not aware of a general function to achieve conversion of map coordinates. However, you could parse your nominal lat and long values using regular expressions into their three components, i.e. degrees, minutes and seconds and then generate a new attribute with a formula:
    decimal_degrees = degrees + (minutes/60) + (seconds/3600)
    Alternatively you can convert it in R or Python via the scripting extension.
    Jacob
    P.S. By the way, I'd expect some additional information in your coordinates, i.e. E, W, N or S, or at least an optional sign in front of each coordinate. Also, is that precision sufficient on the map? I'd expect some additional decimal places at the end, e.g. 37.58.01.8.S,145.25.02.5.E.
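The regex-plus-formula approach described above can be sketched in Python, for anyone taking the scripting route. This is only an illustration under an assumed layout of `degrees.minutes.seconds.fraction.hemisphere` (as in 37.58.01.8.S); the function name `dms_to_decimal` and the pattern are made up for this example, and southern and western values are given a negative sign by the usual convention.

```python
import re

# Assumed layout: degrees.minutes.seconds.fraction.hemisphere, e.g. "37.58.01.8.S".
# Both the pattern and the function name are illustrative, not from any library.
DMS_PATTERN = re.compile(r"^(\d+)\.(\d+)\.(\d+)\.(\d+)\.([NSEW])$")


def dms_to_decimal(text):
    """Convert a dotted degrees/minutes/seconds string to decimal degrees."""
    match = DMS_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised coordinate: {text!r}")
    degrees, minutes, whole_sec, frac_sec, hemisphere = match.groups()
    # Re-join whole and fractional seconds, so "01" and "8" become 1.8 seconds.
    seconds = float(f"{whole_sec}.{frac_sec}")
    value = int(degrees) + int(minutes) / 60 + seconds / 3600
    # South and West are negative by the usual convention.
    return -value if hemisphere in "SW" else value


print(f"{dms_to_decimal('37.58.01.8.S'):.6f}")   # -37.967167
print(f"{dms_to_decimal('145.25.02.5.E'):.6f}")  # 145.417361
```

Python floats are double precision, so the arithmetic itself should not introduce errors that matter at map scale.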


  • jacobcybulski
    jacobcybulski New Altair Community Member
    Having said this, you really need to be careful when converting map coordinates, as rounding errors can move your points by 20-30 meters in real terms. You need to convert the coordinates in double precision. It is probably best to convert your geo-locations outside RapidMiner using a specialist package. On Linux there is a package called GeoConvert; I have never used it, but if you are going to put places back on maps, it will pay to spend some time doing it right.
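To get a feel for how a small error in decimal degrees turns into meters on the ground, here is a rough Python sketch. It assumes about 111 km per degree of latitude (an approximation, not a geodetic calculation), and the helper names `to_float32` and `error_in_meters` are invented for the example.

```python
import struct

# Rough figure: one degree of latitude is about 111 km (assumption for illustration).
METERS_PER_DEGREE_LAT = 111_000


def to_float32(value):
    """Round-trip a Python float (double) through single precision."""
    return struct.unpack("f", struct.pack("f", value))[0]


def error_in_meters(exact_deg, approx_deg):
    """Approximate north-south displacement caused by an error in degrees."""
    return abs(exact_deg - approx_deg) * METERS_PER_DEGREE_LAT


latitude = 37 + 58 / 60 + 1.8 / 3600  # 37 deg 58 min 1.8 s

# Storing the value in single precision costs well under a meter here...
print(error_in_meters(latitude, to_float32(latitude)))
# ...while rounding to three decimal places moves the point by roughly 18 meters.
print(error_in_meters(latitude, round(latitude, 3)))
```

The point of the comparison is that losing decimal places in the result matters far more than the floating-point type itself, so it is worth checking how many digits survive each step of the conversion.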
  • Shourya
    Shourya New Altair Community Member
    I am making the file available as well. I also have numbers like 50.23 or 50.2, and I have not been able to write a regex that catches them all. So we have 6 formats to catch:

    1. 50.333.333
    2. 50.2
    3. 50.333
    4. 50.33
    5. 4.333.333
    6. 433.333

  • jacobcybulski
    jacobcybulski New Altair Community Member
    edited April 2020
    I attach an example with the formats I think should be there, but you are welcome to adapt it to your needs. For example, when the minutes or seconds are missing, you can include an "if" statement that matches the pattern and, if the match fails, inserts a zero for the missing parts (the way I dealt with W, E, S and N). Check this out (you will need to save this XML as an RMP file in your repository).
    I made a correction to the millisecond translation.
    <?xml version="1.0" encoding="UTF-8"?><process version="9.6.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.6.000" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="-1"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="utility:create_exampleset" compatibility="9.6.000" expanded="true" height="68" name="Create ExampleSet" width="90" x="45" y="34">
            <parameter key="generator_type" value="comma separated text"/>
            <parameter key="number_of_examples" value="100"/>
            <parameter key="use_stepsize" value="false"/>
            <list key="function_descriptions"/>
            <parameter key="add_id_attribute" value="false"/>
            <list key="numeric_series_configuration"/>
            <list key="date_series_configuration"/>
            <list key="date_series_configuration (interval)"/>
            <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
            <parameter key="time_zone" value="SYSTEM"/>
            <parameter key="input_csv_text" value="id,place,lat,long&#10;1,Cardinia Reservoir Emerald VIC 3782,37.58.01.8.S,145.25.02.5.E&#10;2,French Island VIC 3921,38.20.48.5.S,145.20.56.2.E&#10;3,1020 Studewood St Houston TX 77008 United States,29.47.22.1.N,95.23.15.3.W"/>
            <parameter key="column_separator" value=","/>
            <parameter key="parse_all_as_nominal" value="false"/>
            <parameter key="decimal_point_character" value="."/>
            <parameter key="trim_attribute_names" value="true"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="9.6.000" expanded="true" height="82" name="Generate Attributes" width="90" x="179" y="34">
            <list key="function_descriptions">
              <parameter key="dd_long_deg" value="parse(replaceAll(long,&quot;^([0-9]+)\\..*$&quot;,&quot;$1&quot;))"/>
              <parameter key="dd_long_min" value="parse(replaceAll(long,&quot;^[0-9]+\\.([0-9]+)\\..*$&quot;,&quot;$1&quot;))"/>
              <parameter key="dd_long_sec" value="parse(replaceAll(long,&quot;^[0-9]+\\.[0-9]+\\.([0-9]+)\\..*$&quot;,&quot;$1&quot;))"/>
              <parameter key="dd_long_msec" value="parse(replaceAll(long,&quot;^[0-9]+\\.[0-9]+\\.[0-9]+\\.([0-9]+)\\..*$&quot;,&quot;$1&quot;))"/>
              <parameter key="dd_long_sign" value="if(matches(long,&quot;^[0-9]+\\.[0-9]+\\.[0-9]+\\.([0-9]+)\\.W.*$&quot;),-1,1)"/>
              <parameter key="dd_long" value="dd_long_sign*(dd_long_deg + (dd_long_min/60.0) + (dd_long_sec/3600.0) + (dd_long_msec/36000.0))"/>
              <parameter key="dd_lat_deg" value="parse(replaceAll(lat,&quot;^([0-9]+)\\..*$&quot;,&quot;$1&quot;))"/>
              <parameter key="dd_lat_min" value="parse(replaceAll(lat,&quot;^[0-9]+\\.([0-9]+)\\..*$&quot;,&quot;$1&quot;))"/>
              <parameter key="dd_lat_sec" value="parse(replaceAll(lat,&quot;^[0-9]+\\.[0-9]+\\.([0-9]+)\\..*$&quot;,&quot;$1&quot;))"/>
              <parameter key="dd_lat_msec" value="parse(replaceAll(lat,&quot;^[0-9]+\\.[0-9]+\\.[0-9]+\\.([0-9]+)\\..*$&quot;,&quot;$1&quot;))"/>
              <parameter key="dd_lat_sign" value="if(matches(lat,&quot;^[0-9]+\\.[0-9]+\\.[0-9]+\\.([0-9]+)\\.S.*$&quot;),-1,1)"/>
              <parameter key="dd_lat" value="dd_lat_sign*(dd_lat_deg + (dd_lat_min/60.0) + (dd_lat_sec/3600.0) + (dd_lat_msec/36000.0))"/>
            </list>
            <parameter key="keep_all" value="true"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="9.6.000" expanded="true" height="82" name="Check Precision" width="90" x="313" y="34">
            <list key="function_descriptions">
              <parameter key="dd_long_x" value="dd_long*1000"/>
              <parameter key="dd_lat_x" value="dd_lat*1000"/>
            </list>
            <parameter key="keep_all" value="true"/>
          </operator>
          <connect from_op="Create ExampleSet" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_op="Check Precision" to_port="example set input"/>
          <connect from_op="Check Precision" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>

    Also note that if you incorporate milliseconds, RapidMiner seems to lose precision, which translates into a difference of several meters on the map.

    Jacob
  • jacobcybulski
    jacobcybulski New Altair Community Member
    edited April 2020
    BTW, without information about East, West, North and South, your translation will be incorrect for southern or western coordinates, which need a negative sign. So if the coordinates you received are specific to one region, e.g. somewhere in the USA, you will have to assume the appropriate E/W and N/S hemisphere.
  • jacobcybulski
    jacobcybulski New Altair Community Member
    I have just discovered that RapidMiner has a GeoProcessing extension. I have never used it, as all my coordinate processing has always been done outside RapidMiner (in R or Python), but it may do the trick.
  • Shourya
    Shourya New Altair Community Member
    Answer ✓
    I have solved the problem using R and I am attaching the working solution here.
  • BalazsBaranyRM
    BalazsBaranyRM New Altair Community Member
    Hi,

    I'm the author of the GeoProcessing extension, feel free to ask me. It doesn't have an operator or conversion for this format, though.

    Regards,

    Balázs