Hi Guys,
My data is in the format shown below:
VehicleId  DrivenDistance  Time   TotalConsumption  Weight  DriverNote  RoutDificulty
1          582             39060  143               27      9           3.5
2          478             45980  135               38      9.3         4.4
It is real data from a transport agency where I am currently writing my bachelor thesis. I will first explain the data, even though most of it should be pretty clear. The first attribute is just the id of the vehicle sending the data. The second attribute, "DrivenDistance", is the distance the truck travelled. The third attribute, "Time", is the time the truck travelled, in seconds. The fourth attribute, "TotalConsumption", is the number of litres the truck used for the travelled distance. The fifth attribute, "Weight", is the average weight of the truck during the journey. The sixth attribute, "DriverNote", is the mark calculated for the driver based on his driving style. The seventh attribute, "RoutDificulty", describes how hard the route is to drive; for example, driving through the mountains with a lot of weight and speed gives a higher mark.
What I would like to find out is how these variables relate to each other on average, so that I can check the plausibility of each variable. For example, I would like to be able to draw conclusions like: "If DrivenDistance, Time, Weight, RoutDificulty and DriverNote have these values, then TotalConsumption should be between x and y".
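To make the kind of check I have in mind concrete, here is a minimal Python sketch using only the two sample rows above. It assumes that DrivenDistance is in kilometres, and the bounds LOW and HIGH are made-up placeholders for the x and y I am actually trying to find; the function name is just for illustration.

```python
# The two sample rows from the table above: (vehicle id, distance, litres).
rows = [(1, 582, 143), (2, 478, 135)]


def litres_per_100(distance, litres):
    """Fuel used per 100 distance units (kilometres are an assumption)."""
    return litres / distance * 100


# Hypothetical plausibility bounds, NOT derived from the real data.
LOW, HIGH = 15.0, 45.0

for vehicle_id, distance, litres in rows:
    ratio = litres_per_100(distance, litres)
    # Print the id, the ratio and whether it falls inside the assumed range.
    print(vehicle_id, round(ratio, 2), LOW <= ratio <= HIGH)
```

This only looks at two variables at a time, which is exactly the limitation I would like to get past.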
I started by calculating the correlations between the attributes, and they are pretty weak, with one exception: TotalConsumption is strongly correlated with DrivenDistance (0.944), which is pretty logical. But I know from field tests that Weight and RoutDificulty should influence it more than the correlations show (0.0177 and 0.22).
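For reference, this is how I understand the pairwise (Pearson) correlation to be computed. It is a minimal Python sketch with made-up numbers, not my real data, and the function name is my own. Note that squared_correlation is set to true in the process below, so the matrix may be showing r squared rather than r; the sketch prints both.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (covariance up to a constant factor).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up illustrative values, NOT the real fleet data.
distance = [100.0, 200.0, 300.0, 400.0]
consumption = [30.0, 55.0, 95.0, 120.0]

r = pearson(distance, consumption)
print(round(r, 3), round(r * r, 3))  # r and squared r
```

Each value in the matrix only describes one pair of attributes in isolation, which is why I suspect it cannot show the combined influence of several attributes.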
So my question is: is there any way to draw conclusions about the relationships between more than two variables? Should I use a method other than the correlation matrix? Or should I change my process, listed below?
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.1.006">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.1.006" expanded="true" name="Process">
<process expanded="true" height="446" width="628">
<operator activated="true" class="read_excel" compatibility="5.1.006" expanded="true" height="60" name="Read Excel" width="90" x="45" y="120">
<parameter key="excel_file" value="C:\Users\Rojas\Desktop\BA_A-z\Analyse\Rapidminer_Forum.xls"/>
<parameter key="imported_cell_range" value="A1:G77"/>
<parameter key="first_row_as_names" value="false"/>
<list key="annotations">
<parameter key="0" value="Name"/>
</list>
<list key="data_set_meta_data_information">
<parameter key="0" value="VEHICLEID.true.integer.id"/>
<parameter key="1" value="DrivenDistance.true.numeric.attribute"/>
<parameter key="2" value="Time.true.integer.attribute"/>
<parameter key="3" value="TotalConsumptio.true.real.attribute"/>
<parameter key="4" value="Weight.true.numeric.attribute"/>
<parameter key="5" value="DriverNote.true.real.attribute"/>
<parameter key="6" value="RoutDificulty.true.real.attribute"/>
</list>
</operator>
<operator activated="true" class="correlation_matrix" compatibility="5.1.006" expanded="true" height="94" name="Correlation Matrix" width="90" x="246" y="120">
<parameter key="squared_correlation" value="true"/>
</operator>
<connect from_op="Read Excel" from_port="output" to_op="Correlation Matrix" to_port="example set"/>
<connect from_op="Correlation Matrix" from_port="example set" to_port="result 1"/>
<connect from_op="Correlation Matrix" from_port="matrix" to_port="result 2"/>
<connect from_op="Correlation Matrix" from_port="weights" to_port="result 3"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
<portSpacing port="sink_result 4" spacing="0"/>
</process>
</operator>
</process>
Any advice would be highly appreciated (if I have not explained it in sufficient detail or logically enough, please ask me; English is not my native language) ;D