Generate attribute

I am trying to add a column that checks where an attribute falls relative to its [Average -/+ one Standard Deviation] range: if the value lies within that range, the new column gets "2"; if it is less than Average - one Standard Deviation, it gets "1"; and if it is greater than Average + one Standard Deviation, it gets "3". Please see the example below.
ID | X1 | Average of X1 | Standard deviation of X1 | Average - STD | Average + STD | Generated Attribute |
1 | 1,000 | 2,417 | 1,017 | 1,400 | 3,434 | 1 |
2 | 2,000 | 2,417 | 1,017 | 1,400 | 3,434 | 2 |
3 | 2,000 | 2,417 | 1,017 | 1,400 | 3,434 | 2 |
4 | 3,500 | 2,417 | 1,017 | 1,400 | 3,434 | 3 |
5 | 4,000 | 2,417 | 1,017 | 1,400 | 3,434 | 3 |
6 | 2,000 | 2,417 | 1,017 | 1,400 | 3,434 | 1 |
Thank you,
Hi @mario_sark,
You can use the Generate Attributes operator:
Here is the process with a sample of your data:
<?xml version="1.0" encoding="UTF-8"?>
<process version="9.2.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.2.001" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
      <operator activated="true" class="utility:create_exampleset" compatibility="9.2.001" expanded="true" height="68" name="Create ExampleSet" width="90" x="112" y="85">
        <parameter key="generator_type" value="comma separated text"/>
        <parameter key="number_of_examples" value="100"/>
        <parameter key="use_stepsize" value="false"/>
        <list key="function_descriptions"/>
        <parameter key="add_id_attribute" value="false"/>
        <list key="numeric_series_configuration"/>
        <list key="date_series_configuration"/>
        <list key="date_series_configuration (interval)"/>
        <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
        <parameter key="time_zone" value="SYSTEM"/>
        <parameter key="input_csv_text" value="X1,average-std,average+std&#10;1,1.4,3.434&#10;2,1.4,3.434&#10;3.5,1.4,3.434"/>
        <parameter key="column_separator" value=","/>
        <parameter key="parse_all_as_nominal" value="false"/>
        <parameter key="decimal_point_character" value="."/>
        <parameter key="trim_attribute_names" value="true"/>
      </operator>
      <operator activated="true" class="generate_attributes" compatibility="9.2.001" expanded="true" height="82" name="Generate Attributes" width="90" x="246" y="85">
        <list key="function_descriptions">
          <parameter key="generated_attribute" value="if(X1&lt;[average-std],1,if(X1&gt;[average+std],3,2))"/>
        </list>
        <parameter key="keep_all" value="true"/>
      </operator>
      <connect from_op="Create ExampleSet" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
      <connect from_op="Generate Attributes" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
Hope this helps,
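For readers who want to check the logic outside RapidMiner, here is a minimal Python sketch of the same nested-if rule used in the Generate Attributes expression. The function name `band` and the variable names are hypothetical; the three rows are the sample values from the process above, with the thresholds already supplied as columns.

```python
def band(x, lower, upper):
    """Return 1 if x is below lower, 3 if x is above upper, otherwise 2."""
    if x < lower:
        return 1
    if x > upper:
        return 3
    return 2


# Sample rows from the process above: (X1, average-std, average+std)
rows = [(1.0, 1.4, 3.434), (2.0, 1.4, 3.434), (3.5, 1.4, 3.434)]
labels = [band(x, lo, hi) for x, lo, hi in rows]
print(labels)
```

Values exactly equal to a threshold fall into the middle band (2), which matches the strict `<` and `>` comparisons in the expression.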
Hi @lionelderkrikor, thank you for your reply.
I forgot to mention that I don't have the Average and Standard Deviation columns in my data; I need to generate them. How can I do that? (That is, repeat the average and standard deviation of X1 in their own columns, as in the table I shared above.)
Hope that you can help with this!
Thank you,
Mario
Hi,
In this case you can extract the average and the standard deviation values from your data with the Extract Macro operator and use the macros in the Generate Attributes operator. You can also do that inside a Loop Attributes operator if you want to perform the same operation on multiple columns.
Below is a little example process.
Hope this helps,
Ingo
<?xml version="1.0" encoding="UTF-8"?>
<process version="9.2.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.2.001" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="UTF-8"/>
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="9.2.001" expanded="true" height="68" name="Retrieve Iris" width="90" x="45" y="34">
        <parameter key="repository_entry" value="//Samples/data/Iris"/>
      </operator>
      <operator activated="true" class="extract_macro" compatibility="9.2.001" expanded="true" height="68" name="Extract Avg" width="90" x="179" y="34">
        <parameter key="macro" value="a1_avg"/>
        <parameter key="macro_type" value="statistics"/>
        <parameter key="statistics" value="average"/>
        <parameter key="attribute_name" value="a1"/>
        <list key="additional_macros"/>
      </operator>
      <operator activated="true" class="extract_macro" compatibility="9.2.001" expanded="true" height="68" name="Extract SD" width="90" x="313" y="34">
        <parameter key="macro" value="a1_std_dev"/>
        <parameter key="macro_type" value="statistics"/>
        <parameter key="statistics" value="deviation"/>
        <parameter key="attribute_name" value="a1"/>
        <list key="additional_macros"/>
      </operator>
      <operator activated="true" class="generate_attributes" compatibility="9.2.001" expanded="true" height="82" name="Generate Attributes" width="90" x="447" y="34">
        <list key="function_descriptions">
          <parameter key="a1_new" value="if(a1&lt;eval(%{a1_avg})-eval(%{a1_std_dev}),1,if(a1&gt;eval(%{a1_avg})+eval(%{a1_std_dev}),3,2))"/>
        </list>
        <parameter key="keep_all" value="true"/>
      </operator>
      <connect from_op="Retrieve Iris" from_port="output" to_op="Extract Avg" to_port="example set"/>
      <connect from_op="Extract Avg" from_port="example set" to_op="Extract SD" to_port="example set"/>
      <connect from_op="Extract SD" from_port="example set" to_op="Generate Attributes" to_port="example set input"/>
      <connect from_op="Generate Attributes" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
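As a rough cross-check of what this process computes, the following Python sketch derives the average and standard deviation from the column itself and then applies the same 1/2/3 rule. The function name `band_by_std` is hypothetical, and the X1 values are taken from the table earlier in the thread. It assumes the population standard deviation (`statistics.pstdev`), because that reproduces the 1,017 shown in that table; a tool that uses the sample standard deviation would give slightly wider thresholds.

```python
from statistics import mean, pstdev


def band_by_std(values):
    """Label each value 1, 2 or 3 relative to mean -/+ one standard deviation."""
    avg = mean(values)
    std = pstdev(values)  # assumption: population standard deviation
    lower, upper = avg - std, avg + std
    return [1 if v < lower else 3 if v > upper else 2 for v in values]


# X1 column from the table in the original question
x1 = [1000, 2000, 2000, 3500, 4000, 2000]
labels = band_by_std(x1)
print(round(mean(x1)), round(pstdev(x1)))
print(labels)
```

With these inputs the thresholds come out at roughly 1,400 and 3,434, and every row with X1 = 2,000, including the last one, falls inside the band and is labelled 2.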
To accomplish this even more easily, you can use the Normalize operator with the Z-transformation (z-score) method; it computes, for each value, the number of standard deviations from the mean automatically.
You can then use Generate Attributes to recode the result (e.g., round it, truncate it, or take the absolute value) as desired.
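The z-score route can be sketched in Python as follows. This is only an illustration of the idea, not the operator's implementation: `z_scores` and `recode` are hypothetical names, the population standard deviation is an assumption, and the X1 values come from the table in the original question. A z-score below -1 corresponds to "less than Average - one Standard Deviation" and a z-score above 1 to "greater than Average + one Standard Deviation".

```python
from statistics import mean, pstdev


def z_scores(values):
    """Return (v - mean) / std for each value; all zeros if the column is constant."""
    avg = mean(values)
    std = pstdev(values)  # assumption: population standard deviation
    if std == 0:
        return [0.0 for _ in values]
    return [(v - avg) / std for v in values]


def recode(z):
    """Map a z-score to 1 (below -1), 3 (above +1) or 2 (in between)."""
    if z < -1:
        return 1
    if z > 1:
        return 3
    return 2


x1 = [1000, 2000, 2000, 3500, 4000, 2000]
z = z_scores(x1)
z_labels = [recode(v) for v in z]
print(z_labels)
```

Because the thresholds become the constants -1 and 1, the recoding step no longer needs the average or standard deviation as separate columns or macros.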