[Solved] Range Join: Joining two tables --

uenge-san
uenge-san New Altair Community Member
edited November 5 in Community Q&A

Hello Rapidminer Community,

 

I'm trying to join two tables: one contains a start- and an end-date_time, the other only a timestamp.

The goal is, to classify, between which timestamp the event in table 2 has been

 

Example:

Table 1

ID # shift #  start  # end

1 ; "A" ; 2017-01-01 00:00:00 ; 2017-01-03 23:59:59

2 ; "B" ; 2017-01-04 00:00:00 ; 2017-01-05 23:59:59

3 ; "C" ; 2017-01-06 00:00:00 ; 2017-02-01 23:59:59

 

Table 2

ID # attr1 # timestamp

1; "error 1" ; 2017-01-01 07:00

2; "error 2" ; 2017-01-02 09:00

3; "error 4" ; 2017-01-06 07:00

 

Expected joined table

ID # attr1 # timestamp # shift (Table1)

1; "error 1" ; 2017-01-01 07:00 ; "A"

2; "error 2" ; 2017-01-02 09:00 ; "A"

3; "error 4" ; 2017-01-06 07:00 ; "C"

 

Any hint, how to get to the expected joined table??

 

Thanks in advance

 

 

 

Tagged:

Best Answers

  • BalazsBaranyRM
    BalazsBaranyRM New Altair Community Member
    Answer ✓

    Hi,

     

    you could try using my generic join solution in a Groovy script:

    https://datascientist.at/2016/06/generic-joins-in-rapidminer/#english

     

    There's also a template building block: 

    http://community.rapidminer.com/t5/RapidMiner-Building-Blocks/Generic-join-script/m-p/33908#U33908

     

    You would extend the script to compare for the timestamp between the start and the end times.

     

    Regards,

     

    Balázs

  • uenge-san
    uenge-san New Altair Community Member
    Answer ✓

    Amazing,

     

    never heard about Groovy Script, but got it working in less than 30 minutes (and most of the time was consumed regarding a bad typo during copying your script into notepad++ ...)

     

    Many, many thanks for your advice! 

     

     

     

    I copied the adjusted script below and highlighted the changes with bold fonts

     

     

     

     

    /**
    * Configuration
    * es1AttName: join attribute name in ExampleSet 1
    * es2AttName: join attribute name in ExampleSet 2
    * joinFunc: Closure (function) to call on both arguments
    */

    es1AttName1 = "start";
    es1AttName2 = "end";

    es2AttName = "TIMESTAMP";

    def joinFunc = { e11, e12, e2 ->
    (e11 <= e2 && e12 >= e2)
    || e2 == null
    }

    /**
    * End of configuration
    */

    //RapidMiner data structures
    import com.rapidminer.tools.Ontology;

    //Determines if an attribute is nominal
    def isNominal = { att ->
    type = att.getValueType();
    type == Ontology.NOMINAL || type == Ontology.BINOMINAL || type == Ontology.POLYNOMINAL;
    }

    //Fetches the correct value type from the example, depending on the attribute type
    def getExampleValue = { ex, att ->
    if (isNominal(att)) {
    ret = ex.getNominalValue(att);
    } else {
    ret = ex.getValue(att);
    }
    ret;
    }

    // First input example set
    ExampleSet es1 = input[0];
    Attributes es1Attributes = es1.getAttributes();
    es1att1 = es1Attributes.get(es1AttName1);
    es1att2 = es1Attributes.get(es1AttName2);

    // Second input example set
    ExampleSet es2 = input[1];
    Attributes es2Attributes = es2.getAttributes();
    es2att = es2Attributes.get(es2AttName);

    int fields = es1Attributes.size() + es2Attributes.size();

    //Arrays for attributes and field metadata
    Attribute[] attributes= new Attribute[fields];

    //Field counter
    int fld = 0;

    //Copy each attribute from ExampleSets 1 and 2 to the result attribute list
    es1Attributes.each{f ->
    attributes[fld] = AttributeFactory.createAttribute(f.name, f.getValueType());
    fld++;
    }
    es2Attributes.each{f ->
    attributes[fld] = AttributeFactory.createAttribute(f.name, f.getValueType());
    fld++;
    }

    MemoryExampleTable table = new MemoryExampleTable(attributes);
    DataRowFactory rowFactory = new DataRowFactory(0);

    // Loop ExampleSet 1
    for ( Example e1 : es1 ) {

    //Attribute value from ExampleSet 1, current example
    e1val1 = getExampleValue(e1, es1att1);
    e1val2 = getExampleValue(e1, es1att2);


    for (Example e2: es2) {

    e2val = getExampleValue(e2, es2att);

    //Join condition - function configured at the top of the script
    if (joinFunc(e1val1, e1val2, e2val)) {

    data = new Object[fields];
    fld = 0;

    // Copy values from both example sets into the new record
    es1Attributes.each{f ->
    if (f.getValueType() == Ontology.NOMINAL || f.getValueType() == Ontology.BINOMINAL || f.getValueType() == Ontology.POLYNOMINAL) {
    data[fld] = e1.getNominalValue(f);
    } else {
    data[fld] = e1.getValue(f);
    }
    fld++;
    }
    es2Attributes.each{f ->
    if (f.getValueType() == Ontology.NOMINAL || f.getValueType() == Ontology.BINOMINAL || f.getValueType() == Ontology.POLYNOMINAL) {
    data[fld] = e2.getNominalValue(f);
    } else {
    data[fld] = e2.getValue(f);
    }
    fld++;
    }

    DataRow row = rowFactory.create(data, attributes);
    table.addDataRow(row);
    }
    }
    }

    ExampleSet exampleSet = table.createExampleSet();

    return(exampleSet);

Answers

  • Thomas_Ott
    Thomas_Ott New Altair Community Member

    Make sure that the date-times are recongized as a date-time in RapidMiner. Then you can select Start Time in Table 1 to join with Timestamp in Table 2 in the Join operator. Just toggle of "Use attribute ID as key" option. 

  • uenge-san
    uenge-san New Altair Community Member

    Hello T-Bone,

     

    thanks for your reply.

     

    Trying to solve the problem with your suggested solution I found out, that there was an import problem with the date_time in table 1 --> solved

     

    so trying to use the Join Operator after this I only get one result out of the joining, which is the event, where the timestamp and the start date matches exactly. 

    But unfortunately didn't come to my expected table...

     

    Any other hints??

     

    BR

  • Thomas_Ott
    Thomas_Ott New Altair Community Member

    Are you using the right join type? Right, Left, Inner, or Outer?

  • uenge-san
    uenge-san New Altair Community Member

    Hi T-bone

     

    I think so, the join type INNER should be the correct one, isn't it?

     

    Thank you very much

  • BalazsBaranyRM
    BalazsBaranyRM New Altair Community Member
    Answer ✓

    Hi,

     

    you could try using my generic join solution in a Groovy script:

    https://datascientist.at/2016/06/generic-joins-in-rapidminer/#english

     

    There's also a template building block: 

    http://community.rapidminer.com/t5/RapidMiner-Building-Blocks/Generic-join-script/m-p/33908#U33908

     

    You would extend the script to compare for the timestamp between the start and the end times.

     

    Regards,

     

    Balázs

  • uenge-san
    uenge-san New Altair Community Member
    Answer ✓

    Amazing,

     

    never heard about Groovy Script, but got it working in less than 30 minutes (and most of the time was consumed regarding a bad typo during copying your script into notepad++ ...)

     

    Many, many thanks for your advice! 

     

     

     

    I copied the adjusted script below and highlighted the changes with bold fonts

     

     

     

     

    /**
    * Configuration
    * es1AttName: join attribute name in ExampleSet 1
    * es2AttName: join attribute name in ExampleSet 2
    * joinFunc: Closure (function) to call on both arguments
    */

    es1AttName1 = "start";
    es1AttName2 = "end";

    es2AttName = "TIMESTAMP";

    def joinFunc = { e11, e12, e2 ->
    (e11 <= e2 && e12 >= e2)
    || e2 == null
    }

    /**
    * End of configuration
    */

    //RapidMiner data structures
    import com.rapidminer.tools.Ontology;

    //Determines if an attribute is nominal
    def isNominal = { att ->
    type = att.getValueType();
    type == Ontology.NOMINAL || type == Ontology.BINOMINAL || type == Ontology.POLYNOMINAL;
    }

    //Fetches the correct value type from the example, depending on the attribute type
    def getExampleValue = { ex, att ->
    if (isNominal(att)) {
    ret = ex.getNominalValue(att);
    } else {
    ret = ex.getValue(att);
    }
    ret;
    }

    // First input example set
    ExampleSet es1 = input[0];
    Attributes es1Attributes = es1.getAttributes();
    es1att1 = es1Attributes.get(es1AttName1);
    es1att2 = es1Attributes.get(es1AttName2);

    // Second input example set
    ExampleSet es2 = input[1];
    Attributes es2Attributes = es2.getAttributes();
    es2att = es2Attributes.get(es2AttName);

    int fields = es1Attributes.size() + es2Attributes.size();

    //Arrays for attributes and field metadata
    Attribute[] attributes= new Attribute[fields];

    //Field counter
    int fld = 0;

    //Copy each attribute from ExampleSets 1 and 2 to the result attribute list
    es1Attributes.each{f ->
    attributes[fld] = AttributeFactory.createAttribute(f.name, f.getValueType());
    fld++;
    }
    es2Attributes.each{f ->
    attributes[fld] = AttributeFactory.createAttribute(f.name, f.getValueType());
    fld++;
    }

    MemoryExampleTable table = new MemoryExampleTable(attributes);
    DataRowFactory rowFactory = new DataRowFactory(0);

    // Loop ExampleSet 1
    for ( Example e1 : es1 ) {

    //Attribute value from ExampleSet 1, current example
    e1val1 = getExampleValue(e1, es1att1);
    e1val2 = getExampleValue(e1, es1att2);


    for (Example e2: es2) {

    e2val = getExampleValue(e2, es2att);

    //Join condition - function configured at the top of the script
    if (joinFunc(e1val1, e1val2, e2val)) {

    data = new Object[fields];
    fld = 0;

    // Copy values from both example sets into the new record
    es1Attributes.each{f ->
    if (f.getValueType() == Ontology.NOMINAL || f.getValueType() == Ontology.BINOMINAL || f.getValueType() == Ontology.POLYNOMINAL) {
    data[fld] = e1.getNominalValue(f);
    } else {
    data[fld] = e1.getValue(f);
    }
    fld++;
    }
    es2Attributes.each{f ->
    if (f.getValueType() == Ontology.NOMINAL || f.getValueType() == Ontology.BINOMINAL || f.getValueType() == Ontology.POLYNOMINAL) {
    data[fld] = e2.getNominalValue(f);
    } else {
    data[fld] = e2.getValue(f);
    }
    fld++;
    }

    DataRow row = rowFactory.create(data, attributes);
    table.addDataRow(row);
    }
    }
    }

    ExampleSet exampleSet = table.createExampleSet();

    return(exampleSet);
  • BalazsBaranyRM
    BalazsBaranyRM New Altair Community Member

    Hi uenge-san,

     

    glad it's working for you!

     

    This will be a good reference for everyone trying to do range joins in RapidMiner.

     

    Regards,

     

    Balázs