Relating datasets
artdijk
New Altair Community Member
I am new to datamining and RM,
I want to relate two data tables. Each table has a unique key of date and number. Thus the tables should be related via this key. How can this be done in RM ?
Thanks
Arthur
I want to relate two data tables. Each table has a unique key of date and number. Thus the tables should be related via this key. How can this be done in RM ?
Thanks
Arthur
Tagged:
0
Answers
-
Hi Arthur,
what exactly do you mean by relate? I guess it's the same you do, when you join some relational data base tables? We have an operator called "Join" which will do this.
Greetings,
Sebastian0 -
Thanks Sebastian,
The problem is before the join is executed.
Table A: has as field names: Flight Date Delay Pax
Table B: has as field names: Flight Date Report
In the result of the Join I want to have a table C which has as rows the flights that appears in both tables and as fieldnames the fieldnames that appear in both tables. So I can see how many pax had delay.
Table A and Table B have no Id. They are linked by Flight and Date.
How is this done in Rapid Miner,
Thanks
Arthur0 -
Hi Arthur,
so neither flight and date are unique, but flight + date is unique? Then I would suggest building a new id column from the values of flight and date column. This column can be used for the join. Let make a short sample process:
After we generated some data and transformed it for having an comparable situation, we build a new id like attribute by concatenating the values of this two attribute:<?xml version="1.0" encoding="UTF-8" standalone="no"?>
I hope, it will help you.
<process version="5.0">
<context>
<input>
<location/>
</input>
<output>
<location/>
<location/>
</output>
<macros/>
</context>
<operator activated="true" class="process" expanded="true" name="Process">
<process expanded="true" height="296" width="1018">
<operator activated="true" class="generate_data" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
<parameter key="number_examples" value="150"/>
<parameter key="use_local_random_seed" value="true"/>
</operator>
<operator activated="true" class="rename" expanded="true" height="76" name="Rename" width="90" x="179" y="30">
<parameter key="old_name" value="att1"/>
<parameter key="new_name" value="Flight"/>
</operator>
<operator activated="true" class="rename" expanded="true" height="76" name="Rename (2)" width="90" x="313" y="30">
<parameter key="old_name" value="att2"/>
<parameter key="new_name" value="Date"/>
</operator>
<operator activated="true" class="numerical_to_polynominal" expanded="true" height="76" name="Numerical to Polynominal" width="90" x="447" y="30"/>
<operator activated="true" class="generate_attributes" expanded="true" height="76" name="Generate Attributes" width="90" x="581" y="30">
<list key="function_descriptions">
<parameter key="FlightDate" value="Flight + " - " + Date"/>
</list>
</operator>
<operator activated="true" class="set_role" expanded="true" height="76" name="Set Role" width="90" x="715" y="30">
<parameter key="name" value="FlightDate"/>
<parameter key="target_role" value="id"/>
</operator>
<operator activated="true" class="generate_data" expanded="true" height="60" name="Generate Data (2)" width="90" x="45" y="210">
<parameter key="use_local_random_seed" value="true"/>
</operator>
<operator activated="true" class="rename" expanded="true" height="76" name="Rename (3)" width="90" x="175" y="210">
<parameter key="old_name" value="att1"/>
<parameter key="new_name" value="Flight"/>
</operator>
<operator activated="true" class="rename" expanded="true" height="76" name="Rename (4)" width="90" x="310" y="210">
<parameter key="old_name" value="att2"/>
<parameter key="new_name" value="Date"/>
</operator>
<operator activated="true" class="numerical_to_polynominal" expanded="true" height="76" name="Numerical to Polynominal (2)" width="90" x="445" y="210"/>
<operator activated="true" class="generate_attributes" expanded="true" height="76" name="Generate Attributes (2)" width="90" x="580" y="210">
<list key="function_descriptions">
<parameter key="FlightDate" value="Flight + " - " + Date"/>
</list>
</operator>
<operator activated="true" class="set_role" expanded="true" height="76" name="Set Role (2)" width="90" x="715" y="210">
<parameter key="name" value="FlightDate"/>
<parameter key="target_role" value="id"/>
</operator>
<operator activated="true" class="join" expanded="true" height="76" name="Join" width="90" x="916" y="75"/>
<connect from_op="Generate Data" from_port="output" to_op="Rename" to_port="example set input"/>
<connect from_op="Rename" from_port="example set output" to_op="Rename (2)" to_port="example set input"/>
<connect from_op="Rename (2)" from_port="example set output" to_op="Numerical to Polynominal" to_port="example set input"/>
<connect from_op="Numerical to Polynominal" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
<connect from_op="Generate Attributes" from_port="example set output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="Join" to_port="left"/>
<connect from_op="Generate Data (2)" from_port="output" to_op="Rename (3)" to_port="example set input"/>
<connect from_op="Rename (3)" from_port="example set output" to_op="Rename (4)" to_port="example set input"/>
<connect from_op="Rename (4)" from_port="example set output" to_op="Numerical to Polynominal (2)" to_port="example set input"/>
<connect from_op="Numerical to Polynominal (2)" from_port="example set output" to_op="Generate Attributes (2)" to_port="example set input"/>
<connect from_op="Generate Attributes (2)" from_port="example set output" to_op="Set Role (2)" to_port="example set input"/>
<connect from_op="Set Role (2)" from_port="example set output" to_op="Join" to_port="right"/>
<connect from_op="Join" from_port="join" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
Greetings,
Sebastian0 -
Thanks for your process. It is not yet the solution but we are getting closer.
I can never get the same Id in both file for the same FlightNr and Data combination because the number of records is different in both files.
I try to explain the problem in different words:
File A has fields: FlightNr, Date, Pax, Fuel
File B has fields: FlightNr, Date, Event
File B has less records. Only the flight that have an event are in file B.
I want to get
File C with fields: FlightNr, Date, Pax, Fuel, Event
On most flights no events are recorded so many event fields will be empty in file C.
Resulting file example:
Flt112,1-1-2010,34,2300,?
Flt116,1-1,2010,45,2900,turbulence
Flt112,2-1-2010,23,2100,?
Flt116,2-1-2010,41,2800,?
Thanks
Arthur0