transform example set to histogram
keyser84
New Altair Community Member
Is there a way to work with the histogram of the data (as shown by the plotter)?
In particular:
- given an example set with several instances
- discretize the values of one attribute into f bins
- return an example set with f attributes (corresponding to the f bins) that contains only one instance (the values are the instance counts from the source example set)
I want to use this to get a representation of an example set and then compare its histogram against the histograms of other example sets (e.g. by applying a Euclidean or Manhattan distance measure to the histogram vectors).
There are operators like BinDiscretization and Aggregation, but is there another operator that does exactly what the histogram plotter shows?
Answers
Hi,
Unfortunately, there is no operator that performs this task in one step, but it's quite easy to combine discretization and aggregation, as you already suggested. Combining the operators as shown below should do the trick.
<operator name="Root" class="Process" expanded="yes">
    <operator name="ExampleSource" class="ExampleSource">
        <parameter key="attributes" value="C:\Dokumente und Einstellungen\sland\Eigene Dateien\yale\workspace\sample\data\iris.aml"/>
    </operator>
    <operator name="ToHistogram" class="OperatorChain" expanded="yes">
        <operator name="AttributeSubsetPreprocessing" class="AttributeSubsetPreprocessing" expanded="yes">
            <parameter key="condition_class" value="attribute_name_filter"/>
            <parameter key="attribute_name_regex" value="a1"/>
            <operator name="BinDiscretization" class="BinDiscretization">
                <parameter key="number_of_bins" value="10"/>
                <parameter key="range_name_type" value="short"/>
            </operator>
        </operator>
        <operator name="Aggregation" class="Aggregation">
            <list key="aggregation_attributes">
                <parameter key="a1" value="count"/>
            </list>
            <parameter key="group_by_attributes" value="a1"/>
        </operator>
    </operator>
</operator>
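To illustrate the comparison step from the question, here is a minimal Python sketch, separate from the RapidMiner process above. It assumes that every example set is binned over the same fixed value range, so that bin i means the same thing in each histogram vector. The function names, the sample values, and the range 4.0 to 8.0 are made up for illustration.

```python
import math

def histogram_vector(values, bins, low, high):
    """Count values per equal-width bin over the fixed range [low, high]."""
    width = (high - low) / bins
    counts = [0] * bins
    for v in values:
        if not low <= v <= high:
            continue  # values outside the shared range are ignored
        # A value equal to `high` is placed in the last bin.
        idx = min(int((v - low) / width), bins - 1)
        counts[idx] += 1
    return counts

def manhattan(h1, h2):
    """Sum of absolute differences between two histogram vectors."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def euclidean(h1, h2):
    """Square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h1, h2)))

# Two hypothetical example sets, reduced to one numeric attribute each.
set_a = [4.3, 4.9, 5.1, 5.8, 6.4, 7.0, 7.9]
set_b = [4.4, 5.0, 5.0, 6.1, 6.3, 6.9, 7.7]

# Same number of bins and same range for both sets (assumption).
hist_a = histogram_vector(set_a, bins=4, low=4.0, high=8.0)
hist_b = histogram_vector(set_b, bins=4, low=4.0, high=8.0)

print(hist_a, hist_b)
print(manhattan(hist_a, hist_b), euclidean(hist_a, hist_b))
```

If the example sets differ in size, dividing each count by the number of instances before computing the distance may give a fairer comparison.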
Thank you... this would help.