Binominalize multiple polynominal columns together
I have many columns, A1-A10 (10 in this example).
Each column contains polynominal values.
I want to binominalize the columns, but not each on its own. Instead, I want one attribute for each value that appears in any of the columns A1-A10.
Example:
Input:
A1       A2      A3
"green"  "red"   ?
"red"    ?       "blue"
Output:
A = green  A = red  A = blue
True       True     False
False      True     True
Answers
Use the Nominal to Numerical operator and leave the default parameter of Dummy Coding set.
Hi, thanks, but I still get:
A1 = green A2 = green A1 = red A2 = red
I want something like:
A = green A = red
I found a solution. First, create an id for each example. Then unpivot all attributes A1-A10 into one attribute A. Then apply Nominal to Binominal to A. Finally, pivot with the group attribute set to your id and the index attribute set to the index produced by the unpivot.
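The id / unpivot / binominalize / pivot steps can be illustrated outside RapidMiner with a minimal Python sketch. It is not a RapidMiner process: the function name `binominalize_via_unpivot` and the sample rows are made up for illustration, and `None` stands in for a missing value (`?`).

```python
# Hypothetical data mirroring the example in the question; None marks a missing value.
rows = [
    {"A1": "green", "A2": "red", "A3": None},
    {"A1": "red", "A2": None, "A3": "blue"},
]


def binominalize_via_unpivot(rows):
    # Steps 1 and 2: use the row position as id and unpivot all columns
    # into (id, value) pairs, skipping missing values.
    long_format = [
        (row_id, value)
        for row_id, row in enumerate(rows)
        for value in row.values()
        if value is not None
    ]
    # Step 3: collect the distinct values in order of first appearance.
    values = list(dict.fromkeys(value for _, value in long_format))
    # Step 4: pivot back to one row per id with one boolean per value.
    present = {row_id: set() for row_id in range(len(rows))}
    for row_id, value in long_format:
        present[row_id].add(value)
    return [
        {f"A = {v}": v in present[row_id] for v in values}
        for row_id in range(len(rows))
    ]


result = binominalize_via_unpivot(rows)
for row in result:
    print(row)
```

Each output row carries one `A = <value>` flag per distinct value, regardless of which original column the value came from.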
I think that the combination of Nominal to Numerical and Generate Aggregation is an elegant solution.
Have a look at this process (an artificial dataset is generated using an R script):
<?xml version="1.0" encoding="UTF-8"?><process version="7.5.003">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.5.003" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="r_scripting:execute_r" compatibility="7.2.000" expanded="true" height="82" name="Execute R" width="90" x="112" y="34">
<parameter key="script" value="# rm_main is a mandatory function,&#10;# the number of arguments has to be the number of input ports (can be none)&#10;rm_main = function()&#10;{&#10;  cat1 &lt;- c(&quot;A&quot;, &quot;B&quot;, &quot;C&quot;, NA)&#10;  df &lt;- data.frame(sample(cat1, 2500, replace = TRUE),&#10;                   sample(cat1, 2500, replace = TRUE),&#10;                   sample(cat1, 2500, replace = TRUE))&#10;  colnames(df) &lt;- paste(&quot;Att&quot;, 1:3)&#10;  # connect 2 output ports to see the results&#10;  return(df)&#10;}&#10;"/>
</operator>
<operator activated="true" class="nominal_to_numerical" compatibility="7.5.003" expanded="true" height="103" name="Nominal to Numerical" width="90" x="380" y="34">
<list key="comparison_groups"/>
</operator>
<operator activated="true" class="generate_aggregation" compatibility="7.5.003" expanded="true" height="82" name="Generate Aggregation" width="90" x="581" y="34">
<parameter key="attribute_name" value="HasA"/>
<parameter key="attribute_filter_type" value="regular_expression"/>
<parameter key="regular_expression" value=".*A"/>
<parameter key="aggregation_function" value="maximum"/>
</operator>
<connect from_op="Execute R" from_port="output 1" to_op="Nominal to Numerical" to_port="example set input"/>
<connect from_op="Nominal to Numerical" from_port="example set output" to_op="Generate Aggregation" to_port="example set input"/>
<connect from_op="Generate Aggregation" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
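</process>
Here are the parameters of Generate Aggregation for a quick view:
attribute name: HasA
attribute filter type: regular_expression
regular expression: .*A
aggregation function: maximum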
Best,
Sebastian
Edit: Here is a version that uses Loop Values:
<?xml version="1.0" encoding="UTF-8"?><process version="7.5.003">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.5.003" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="r_scripting:execute_r" compatibility="7.2.000" expanded="true" height="82" name="Execute R" width="90" x="112" y="34">
<parameter key="script" value="# rm_main is a mandatory function,&#10;# the number of arguments has to be the number of input ports (can be none)&#10;rm_main = function()&#10;{&#10;  cat1 &lt;- c(&quot;A&quot;, &quot;B&quot;, &quot;C&quot;, NA)&#10;  df &lt;- data.frame(sample(cat1, 2500, replace = TRUE),&#10;                   sample(cat1, 2500, replace = TRUE),&#10;                   sample(cat1, 2500, replace = TRUE))&#10;  colnames(df) &lt;- paste(&quot;Att&quot;, 1:3)&#10;  # connect 2 output ports to see the results&#10;  return(df)&#10;}&#10;"/>
</operator>
<operator activated="true" class="concurrency:loop_values" compatibility="7.5.003" expanded="true" height="82" name="Loop Values" width="90" x="447" y="34">
<parameter key="attribute" value="Att 1"/>
<parameter key="reuse_results" value="true"/>
<process expanded="true">
<operator activated="true" class="nominal_to_numerical" compatibility="7.5.003" expanded="true" height="103" name="Nominal to Numerical" width="90" x="246" y="34">
<list key="comparison_groups"/>
</operator>
<operator activated="true" class="generate_aggregation" compatibility="7.5.003" expanded="true" height="82" name="Generate Aggregation" width="90" x="514" y="34">
<parameter key="attribute_name" value="Has%{loop_value}"/>
<parameter key="attribute_filter_type" value="regular_expression"/>
<parameter key="regular_expression" value=".*%{loop_value}"/>
<parameter key="aggregation_function" value="maximum"/>
</operator>
<connect from_port="input 1" to_op="Nominal to Numerical" to_port="example set input"/>
<connect from_op="Nominal to Numerical" from_port="example set output" to_op="Generate Aggregation" to_port="example set input"/>
<connect from_op="Generate Aggregation" from_port="example set output" to_port="output 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
</process>
</operator>
<connect from_op="Execute R" from_port="output 1" to_op="Loop Values" to_port="input 1"/>
<connect from_op="Loop Values" from_port="output 1" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
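To make the idea behind Nominal to Numerical plus Generate Aggregation easier to follow, here is a rough Python sketch of the same logic: dummy-code every column, then take the maximum over all dummy columns that belong to one value. The helper names `dummy_code` and `aggregate_max` and the sample rows are assumptions for illustration, not RapidMiner APIs. Unlike the Loop Values process, which iterates over the values of "Att 1" only, this sketch loops over values seen in any column, and it matches on the full " = value" suffix rather than on a regular expression.

```python
# Hypothetical data; None marks a missing value.
rows = [
    {"A1": "green", "A2": "red", "A3": None},
    {"A1": "red", "A2": None, "A3": "blue"},
]


def dummy_code(rows):
    """Create one 0/1 column per (attribute, value) pair, named like 'A1 = green'."""
    pairs = sorted(
        {(attr, val) for row in rows for attr, val in row.items() if val is not None}
    )
    return [
        {f"{attr} = {val}": int(row.get(attr) == val) for attr, val in pairs}
        for row in rows
    ]


def aggregate_max(dummy_rows, value):
    """Row-wise maximum over all dummy columns whose name ends in ' = value'."""
    suffix = f" = {value}"
    return [
        max(flag for name, flag in row.items() if name.endswith(suffix))
        for row in dummy_rows
    ]


dummy = dummy_code(rows)
# Loop over every value that occurs in any column.
values = sorted({val for row in rows for val in row.values() if val is not None})
has = {f"Has{val}": aggregate_max(dummy, val) for val in values}
print(has)
```

The maximum of the 0/1 dummy columns is 1 exactly when the value occurs in at least one of the original columns of that row, which is why `maximum` works as the aggregation function here.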
Edit: instead of the last Pivot step, you have to aggregate by the id attribute. Select all A attributes with a regular expression and set the type to "only occurring".