[Solved] Calculating the deltas between following examples
qwertz
Dear all,
I have a data set like this
i=id att1 att2
1 5 1
2 8 4
3 3 3
4 4 7
Now I would like to transform this into a new example set by applying the following rule:
Subtract example i of attribute x from example i+1 of the same attribute (e.g. "8-5").
Even better would be a custom formula that calculates the percentage change between two consecutive examples (e.g. "(8-5)/5*100").
I tried the "distance transformation" operator of the series extension for RapidMiner. However, it only provides absolute values, so it remains unclear whether the delta is positive or negative. Moreover, this operator requires an additional transformation from data to series and back.
Another way I could think of is to use the "windowing" operator to generate additional attributes shifted by one example. Then one could apply the "generate attributes" operator for the calculation. However, so far I haven't been able to figure out a working process.
I also have to run it with different attributes all the time, so automated handling of the attribute names would be highly appreciated.
Searching for the tags "delta" and "distance" returned no useful results.
Looking forward to hearing from you
Sachs
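For reference, the transformation asked for above can be written out as a small worked example. This is only a plain Python sketch of the arithmetic, not a RapidMiner process; the function names `deltas` and `percent_changes` are made up for the illustration, and it assumes the previous value is never zero.

```python
# Sample data from the post: one list of values per attribute, in id order.
data = {
    "att1": [5, 8, 3, 4],
    "att2": [1, 4, 3, 7],
}


def deltas(values):
    """Signed difference between each example and the one before it."""
    return [curr - prev for prev, curr in zip(values, values[1:])]


def percent_changes(values):
    """Percentage change relative to the previous example.

    Assumes the previous value is non-zero (otherwise this divides by zero).
    """
    return [(curr - prev) / prev * 100 for prev, curr in zip(values, values[1:])]


for name, values in data.items():
    print(name, deltas(values), percent_changes(values))
```

For att1 this gives the deltas 3, -5 and 1, and percentage changes of 60, -62.5 and roughly 33.3. The result has one value fewer than the input because the first example has no predecessor.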
Answers
Hi,
Loop through the examples using macros.
Best, H
Hi Sachs,
I think you're on the right track with the series/windowing operators. The ones you're looking for are "Lag" (which finds the previous value) and "Differentiate" (which finds the signed difference). Then all you need to do is generate the % based on these two values. Since both operators require an attribute as argument, you need to wrap them in a Loop Attributes operator to repeat for multiple attributes in an example set. I'll attach examples for single and multiple attributes using the Iris dataset (nonsense values of course), which you should be able to adapt - as soon as I've worked out how!
Cheers,
Russ
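As a rough illustration of what the Lag, Differentiate and Generate Attributes chain computes when it is repeated per attribute, here is a column-wise sketch in plain Python. It is not RapidMiner code: the helper names and the `lag_`, `change_` and `pcchange_` column prefixes are assumptions chosen for the example, and missing values are represented by `None`.

```python
def lag(values, steps=1):
    """Shift a column down by `steps`, padding the start with None (missing)."""
    return [None] * steps + values[:len(values) - steps]


def difference(values, lagged):
    """Signed change: current value minus lagged value, None where no lag exists."""
    return [None if p is None else c - p for c, p in zip(values, lagged)]


def percent(change, lagged):
    """Change as a percentage of the lagged value; None if undefined."""
    return [
        None if p is None or p == 0 else d / p * 100
        for d, p in zip(change, lagged)
    ]


def add_change_columns(table, attributes):
    """Repeat lag -> difference -> percent for every listed attribute."""
    out = dict(table)  # shallow copy so the input table keeps its columns
    for name in attributes:  # plays the role of the attribute loop
        lagged = lag(table[name])
        change = difference(table[name], lagged)
        out[f"lag_{name}"] = lagged
        out[f"change_{name}"] = change
        out[f"pcchange_{name}"] = percent(change, lagged)
    return out


# Sample data from the original question.
table = {"id": [1, 2, 3, 4], "att1": [5, 8, 3, 4], "att2": [1, 4, 3, 7]}
result = add_change_columns(table, ["att1", "att2"])
for column, values in result.items():
    print(column, values)
```

The first row of every new column is missing because there is no previous example to compare with, which matches what a lag of 1 implies.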
Hi Sachs,
Looks like attachments aren't possible (really?!), so here's the XML. Just copy it out, save and import.
Single Attribute:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.006">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.2.006" expanded="true" name="Process">
<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="2001"/>
<parameter key="send_mail" value="never"/>
<parameter key="notification_email" value=""/>
<parameter key="process_duration_for_mail" value="30"/>
<parameter key="encoding" value="SYSTEM"/>
<process expanded="true" height="521" width="955">
<operator activated="true" class="retrieve" compatibility="5.2.006" expanded="true" height="60" name="Retrieve" width="90" x="45" y="75">
<parameter key="repository_entry" value="//Samples/data/Iris"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="5.2.006" expanded="true" height="76" name="Select Attributes" width="90" x="179" y="75">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attribute" value=""/>
<parameter key="attributes" value="id|a1|"/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="attribute_value"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="time"/>
<parameter key="block_type" value="attribute_block"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="value_matrix_row_start"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="false"/>
</operator>
<operator activated="true" class="series:differentiate_example_set" compatibility="5.2.000" expanded="true" height="76" name="Differentiate" width="90" x="313" y="75">
<parameter key="attribute_name" value="a1"/>
<parameter key="change_mode" value="difference"/>
<parameter key="lag" value="1"/>
<parameter key="keep_original_attribute" value="true"/>
</operator>
<operator activated="true" class="series:lag_series" compatibility="5.2.000" expanded="true" height="76" name="Lag Series" width="90" x="447" y="75">
<list key="attributes">
<parameter key="a1" value="1"/>
</list>
</operator>
<operator activated="true" class="rename" compatibility="5.2.006" expanded="true" height="76" name="Rename" width="90" x="581" y="75">
<parameter key="old_name" value="change(a1)"/>
<parameter key="new_name" value="change_a1"/>
<list key="rename_additional_attributes">
<parameter key="a1-1" value="lag_a1"/>
</list>
</operator>
<operator activated="true" class="generate_attributes" compatibility="5.2.006" expanded="true" height="76" name="Generate Attributes" width="90" x="715" y="75">
<list key="function_descriptions">
<parameter key="pcchange(a1)" value="change_a1/lag_a1*100"/>
</list>
<parameter key="use_standard_constants" value="true"/>
<parameter key="keep_all" value="true"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="5.2.006" expanded="true" height="76" name="Select Attributes (2)" width="90" x="849" y="75">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="change_a1"/>
<parameter key="attributes" value=""/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="attribute_value"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="time"/>
<parameter key="block_type" value="attribute_block"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="value_matrix_row_start"/>
<parameter key="invert_selection" value="true"/>
<parameter key="include_special_attributes" value="false"/>
</operator>
<connect from_op="Retrieve" from_port="output" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Differentiate" to_port="example set input"/>
<connect from_op="Differentiate" from_port="example set output" to_op="Lag Series" to_port="example set input"/>
<connect from_op="Lag Series" from_port="example set output" to_op="Rename" to_port="example set input"/>
<connect from_op="Rename" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
<connect from_op="Generate Attributes" from_port="example set output" to_op="Select Attributes (2)" to_port="example set input"/>
<connect from_op="Select Attributes (2)" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
Multiple Attributes:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.006">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.2.006" expanded="true" name="Process">
<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="2001"/>
<parameter key="send_mail" value="never"/>
<parameter key="notification_email" value=""/>
<parameter key="process_duration_for_mail" value="30"/>
<parameter key="encoding" value="SYSTEM"/>
<process expanded="true" height="521" width="882">
<operator activated="true" class="retrieve" compatibility="5.2.006" expanded="true" height="60" name="Retrieve" width="90" x="45" y="75">
<parameter key="repository_entry" value="//Samples/data/Iris"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="5.2.006" expanded="true" height="76" name="Select Attributes" width="90" x="179" y="75">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attribute" value=""/>
<parameter key="attributes" value="|id|a2|a1"/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="attribute_value"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="time"/>
<parameter key="block_type" value="attribute_block"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="value_matrix_row_start"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="false"/>
</operator>
<operator activated="true" class="loop_attributes" compatibility="5.2.006" expanded="true" height="60" name="Loop Attributes" width="90" x="380" y="75">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attribute" value=""/>
<parameter key="attributes" value="|a2|a1"/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="attribute_value"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="time"/>
<parameter key="block_type" value="attribute_block"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="value_matrix_row_start"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="false"/>
<parameter key="iteration_macro" value="loop_attribute"/>
<process expanded="true" height="575" width="992">
<operator activated="true" class="series:differentiate_example_set" compatibility="5.2.000" expanded="true" height="76" name="Differentiate" width="90" x="45" y="30">
<parameter key="attribute_name" value="%{loop_attribute}"/>
<parameter key="change_mode" value="difference"/>
<parameter key="lag" value="1"/>
<parameter key="keep_original_attribute" value="true"/>
</operator>
<operator activated="true" class="series:lag_series" compatibility="5.2.000" expanded="true" height="76" name="Lag Series" width="90" x="179" y="75">
<list key="attributes">
<parameter key="%{loop_attribute}" value="1"/>
</list>
</operator>
<operator activated="true" class="rename" compatibility="5.2.006" expanded="true" height="76" name="Rename" width="90" x="380" y="120">
<parameter key="old_name" value="change(%{loop_attribute})"/>
<parameter key="new_name" value="change_%{loop_attribute}"/>
<list key="rename_additional_attributes">
<parameter key="%{loop_attribute}-1" value="lag_%{loop_attribute}"/>
</list>
</operator>
<operator activated="true" class="generate_attributes" compatibility="5.2.006" expanded="true" height="76" name="Generate Attributes" width="90" x="514" y="120">
<list key="function_descriptions">
<parameter key="pcchange(%{loop_attribute})" value="change_%{loop_attribute}/lag_%{loop_attribute}*100"/>
</list>
<parameter key="use_standard_constants" value="true"/>
<parameter key="keep_all" value="true"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="5.2.006" expanded="true" height="76" name="Select Attributes (2)" width="90" x="715" y="120">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attribute" value="change_%{loop_attribute}"/>
<parameter key="attributes" value="|change_%{loop_attribute}|lag_%{loop_attribute}"/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="attribute_value"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="time"/>
<parameter key="block_type" value="attribute_block"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="value_matrix_row_start"/>
<parameter key="invert_selection" value="true"/>
<parameter key="include_special_attributes" value="false"/>
</operator>
<connect from_port="example set" to_op="Differentiate" to_port="example set input"/>
<connect from_op="Differentiate" from_port="example set output" to_op="Lag Series" to_port="example set input"/>
<connect from_op="Lag Series" from_port="example set output" to_op="Rename" to_port="example set input"/>
<connect from_op="Rename" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
<connect from_op="Generate Attributes" from_port="example set output" to_op="Select Attributes (2)" to_port="example set input"/>
<connect from_op="Select Attributes (2)" from_port="example set output" to_port="example set"/>
<portSpacing port="source_example set" spacing="0"/>
<portSpacing port="sink_example set" spacing="0"/>
</process>
</operator>
<connect from_op="Retrieve" from_port="output" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Loop Attributes" to_port="example set"/>
<connect from_op="Loop Attributes" from_port="example set" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
HTH,
Russ
Hi there!
That really worked out. Thank you!
I would never have expected the function "difference" under an operator called "differentiate".
Isn't that something completely different?
Anyway, glad this operator is part of RapidMiner.
@haddock: Just to get the idea behind your approach: Do you mean something like in the attached code? While I loop through the examples, I store the last one in a macro to do the calculation before iterating to the next example.
Observation 1: The very first calculated value is wrong because I need to initialize the macro. Of course, this could be filtered / corrected later after the loop.
Observation 2: It is not possible to use the "generate attributes" operator in the loop because it would overwrite the new attribute every time, and in the end all rows would hold the same value.
That's probably not surprising to the more experienced user but I wanted to share what I came across on my learning curve.
PS: Indeed, there is no upload function - at least not to my knowledge, as I was looking for one as well.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.003">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.2.003" expanded="true" name="Process">
<process expanded="true" height="161" width="547">
<operator activated="true" class="generate_data" compatibility="5.2.003" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30"/>
<operator activated="true" class="set_macro" compatibility="5.2.003" expanded="true" height="76" name="Set Macro" width="90" x="179" y="30">
<parameter key="macro" value="last"/>
<parameter key="value" value="1"/>
</operator>
<operator activated="true" class="generate_attributes" compatibility="5.2.003" expanded="true" height="76" name="Generate Attributes (2)" width="90" x="313" y="30">
<list key="function_descriptions">
<parameter key="new" value="0"/>
</list>
</operator>
<operator activated="true" class="loop_examples" compatibility="5.2.003" expanded="true" height="76" name="Loop Examples" width="90" x="447" y="30">
<process expanded="true" height="512" width="640">
<operator activated="true" class="extract_macro" compatibility="5.2.003" expanded="true" height="60" name="Extract Macro (2)" width="90" x="45" y="30">
<parameter key="macro" value="current"/>
<parameter key="macro_type" value="data_value"/>
<parameter key="attribute_name" value="att1"/>
<parameter key="example_index" value="%{example}"/>
</operator>
<operator activated="true" class="generate_macro" compatibility="5.2.003" expanded="true" height="76" name="Generate Macro" width="90" x="179" y="30">
<list key="function_descriptions">
<parameter key="result" value="%{current}/%{last}"/>
</list>
</operator>
<operator activated="true" class="set_data" compatibility="5.2.003" expanded="true" height="76" name="Set Data" width="90" x="313" y="30">
<parameter key="example_index" value="%{example}"/>
<parameter key="attribute_name" value="new"/>
<parameter key="value" value="%{result}"/>
<list key="additional_values"/>
</operator>
<operator activated="true" class="extract_macro" compatibility="5.2.003" expanded="true" height="60" name="Extract Macro" width="90" x="447" y="30">
<parameter key="macro" value="last"/>
<parameter key="macro_type" value="data_value"/>
<parameter key="attribute_name" value="att1"/>
<parameter key="example_index" value="%{example}"/>
</operator>
<connect from_port="example set" to_op="Extract Macro (2)" to_port="example set"/>
<connect from_op="Extract Macro (2)" from_port="example set" to_op="Generate Macro" to_port="through 1"/>
<connect from_op="Generate Macro" from_port="through 1" to_op="Set Data" to_port="example set input"/>
<connect from_op="Set Data" from_port="example set output" to_op="Extract Macro" to_port="example set"/>
<connect from_op="Extract Macro" from_port="example set" to_port="example set"/>
<portSpacing port="source_example set" spacing="0"/>
<portSpacing port="sink_example set" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
</process>
</operator>
<connect from_op="Generate Data" from_port="output" to_op="Set Macro" to_port="through 1"/>
<connect from_op="Set Macro" from_port="through 1" to_op="Generate Attributes (2)" to_port="example set input"/>
<connect from_op="Generate Attributes (2)" from_port="example set output" to_op="Loop Examples" to_port="example set"/>
<connect from_op="Loop Examples" from_port="example set" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
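The idea behind the macro loop above can also be sketched in plain Python. This is only an illustration of the pattern, not what RapidMiner executes: `last` and `current` mirror the macro names, `values` stands in for att1, and the sketch computes the percentage change rather than the plain ratio used in the process. It assumes the previous value is never zero.

```python
values = [5, 8, 3, 4]      # stands in for attribute att1
new = [0] * len(values)    # like the "new" attribute initialised to 0

last = None                # no artificial start value, so no wrong first result
for index, current in enumerate(values):
    if last is None:
        # Observation 1: the first example has no predecessor, so mark it missing.
        new[index] = None
    else:
        # Observation 2: write only the current row, never the whole column.
        new[index] = (current - last) / last * 100
    last = current         # remember this value for the next iteration

print(new)
```

Starting with `last = None` instead of a dummy value makes the first row explicitly missing, so nothing has to be filtered or corrected after the loop.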
Thank you all
Sachs
No problem! I guess differentiate is usually associated with the meaning assigned to it in calculus, but I'm not a series specialist, so maybe it's the correct expression in that context!
Glad it helped
Russ