Get Dispersion Information About Categorical Data
joshhazel
New Altair Community Member
I have attributes that are categorical (a finite number of different values, such as 1st class, 2nd class, 3rd class, 4th class, etc.). I would like to know whether RapidMiner can output information about such an attribute, for example the count of each value (e.g. there are 100 1st class, 150 2nd class, etc.).
Can I do this? I noticed that after I run my process, the meta data view gives me the most common value and its count and the least common value and its count, but how do I see the rest of them?
Answers
You can get the count of each value by using the "Aggregate" operator with the count function, grouping by the attribute itself. Here is an example on generated data.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.009">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.3.009" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="generate_nominal_data" compatibility="5.3.009" expanded="true" height="60" name="Generate Nominal Data" width="90" x="45" y="30">
        <parameter key="number_of_attributes" value="1"/>
      </operator>
      <operator activated="true" class="aggregate" compatibility="5.3.009" expanded="true" height="76" name="Aggregate" width="90" x="246" y="30">
        <list key="aggregation_attributes">
          <parameter key="att1" value="count"/>
        </list>
        <parameter key="group_by_attributes" value="|att1"/>
      </operator>
      <connect from_op="Generate Nominal Data" from_port="output" to_op="Aggregate" to_port="example set input"/>
      <connect from_op="Aggregate" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
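For anyone who wants to check what this kind of aggregation computes, here is a minimal sketch in plain Python (not RapidMiner). It only illustrates the idea of grouping by a categorical attribute and counting the rows per value. The sample values and the helper name `count_values` are made up for illustration and do not come from the process above.

```python
from collections import Counter

# Hypothetical sample values standing in for one categorical attribute
values = [
    "1st class", "2nd class", "1st class",
    "3rd class", "2nd class", "1st class",
]

def count_values(column):
    """Return a mapping from each distinct value to how often it occurs."""
    return Counter(column)

counts = count_values(values)

# Print every value with its count, most frequent first
for value, n in counts.most_common():
    print(f"{value}: {n}")
```

The result has one entry per distinct value, which is the same shape as the example set the Aggregate operator returns: one row per group with its count.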