<operator name="Root" class="Process" expanded="yes"> <operator name="ExampleSetGenerator" class="ExampleSetGenerator"> <parameter key="target_function" value="random"/> <parameter key="number_of_attributes" value="20"/> </operator> <operator name="IdTagging" class="IdTagging"> </operator> <operator name="CSVExampleSetWriter" class="CSVExampleSetWriter"> <parameter key="csv_file" value="bla"/> </operator> <operator name="Normalization" class="Normalization"> </operator> <operator name="KMeans" class="KMeans"> </operator> <operator name="CSVExampleSource" class="CSVExampleSource"> <parameter key="filename" value="bla"/> </operator> <operator name="IdTagging (2)" class="IdTagging"> </operator> <operator name="ExampleSetJoin" class="ExampleSetJoin"> <parameter key="remove_double_attributes" value="false"/> </operator> <operator name="FeatureNameFilter" class="FeatureNameFilter"> <parameter key="skip_features_with_name" value="att[0-9]*"/> </operator> <operator name="ChangeAttributeNamesReplace" class="ChangeAttributeNamesReplace"> <parameter key="replace_what" value="_from_ES2"/> <parameter key="apply_on_special" value="false"/> </operator></operator>
<operator name="Root" class="Process" expanded="yes"> <operator name="ExampleSetGenerator" class="ExampleSetGenerator"> <parameter key="target_function" value="random"/> <parameter key="number_of_attributes" value="20"/> </operator> <operator name="IdTagging" class="IdTagging"> </operator> <operator name="IOStorer" class="IOStorer"> <parameter key="name" value="original"/> <parameter key="io_object" value="ExampleSet"/> <parameter key="remove_from_process" value="false"/> </operator> <operator name="Normalization" class="Normalization"> <parameter key="return_preprocessing_model" value="true"/> <parameter key="create_view" value="true"/> </operator> <operator name="KMeans" class="KMeans"> </operator> <operator name="IORetriever" class="IORetriever"> <parameter key="name" value="original"/> <parameter key="io_object" value="ExampleSet"/> </operator> <operator name="ExampleSetJoin" class="ExampleSetJoin"> <parameter key="remove_double_attributes" value="false"/> </operator> <operator name="FeatureNameFilter" class="FeatureNameFilter"> <parameter key="skip_features_with_name" value="att[0-9]*"/> </operator> <operator name="ChangeAttributeNamesReplace" class="ChangeAttributeNamesReplace"> <parameter key="replace_what" value="_from_ES2"/> <parameter key="apply_on_special" value="false"/> </operator></operator>
The problem now is that I want to de-normalize the values of all 20 fields to the original values
The problem is, though, that the clustering output still does not show you the DE-normalized
The problem now is that I want to de-normalize the values of all 20 fields to the original values so that the cluster values make sense
I'm always amused by posts that start "i do not mean to sound rude".
Versions one and two of the code did the job. Did you run them?
Version three was only put in to make things clearer for you. Something got flipped and the clusters got lost. So I'll edit version three out.
Maybe you'll want to edit your last post as well.
No, they didn't; they did the job the way you perceived it. And yes, I did run all of them.
So that means that there can be an output like the one I explained? To have the numbers in the cluster model as they were prior to the normalization? I sure would like to see how this is possible, because this is actually what I wanted originally.
Quote: Maybe you'll want to edit your last post as well.
Sure, if you explain why I should, no problem!
4) Show the CLUSTERING MODEL'S RESULTS DENORMALIZED. I do *not* want the associated de-normalized value for every row!!
The problem now is that I want to de-normalize the values of all 20 fields to the original values so that the cluster values make sense.
While RM computes the sum and std dev as part of the meta data view of an ExampleSet, I'm not sure there's a way to get to those values.
<operator name="Root" class="Process" expanded="yes"> <operator name="ExampleSetGenerator" class="ExampleSetGenerator"> <parameter key="target_function" value="random"/> <parameter key="number_of_attributes" value="1"/> </operator> <operator name="MovingAverage" class="MovingAverage"> <parameter key="attribute_name" value="att1"/> <parameter key="window_width" value="100"/> <parameter key="result_position" value="start"/> </operator> <operator name="ChangeAttributeNamesReplace" class="ChangeAttributeNamesReplace"> <parameter key="replace_what" value="\(|\)"/> </operator> <operator name="ChangeAttributeName" class="ChangeAttributeName"> <parameter key="old_name" value="moving_averageatt1"/> <parameter key="new_name" value="avg_att1"/> </operator> <operator name="MovingAverage (2)" class="MovingAverage"> <parameter key="attribute_name" value="att1"/> <parameter key="window_width" value="100"/> <parameter key="aggregation_function" value="standard_deviation"/> <parameter key="result_position" value="start"/> </operator> <operator name="ChangeAttributeNamesReplace (2)" class="ChangeAttributeNamesReplace"> <parameter key="replace_what" value="\(|\)"/> </operator> <operator name="ChangeAttributeName (2)" class="ChangeAttributeName"> <parameter key="old_name" value="moving_averageatt1"/> <parameter key="new_name" value="stddev_att1"/> </operator> <operator name="MissingValueReplenishment" class="MissingValueReplenishment"> <list key="columns"> </list> </operator></operator>
<operator name="Root" class="Process" expanded="yes"> <operator name="ExampleSetGenerator" class="ExampleSetGenerator"> <parameter key="target_function" value="random"/> <parameter key="number_of_attributes" value="20"/> </operator> <operator name="IdTagging" class="IdTagging"> </operator> <operator name="IOStorer" class="IOStorer"> <parameter key="name" value="original"/> <parameter key="io_object" value="ExampleSet"/> <parameter key="remove_from_process" value="false"/> </operator> <operator name="Normalization" class="Normalization"> <parameter key="return_preprocessing_model" value="true"/> <parameter key="create_view" value="true"/> </operator> <operator name="KMeans" class="KMeans"> </operator> <operator name="IORetriever" class="IORetriever"> <parameter key="name" value="original"/> <parameter key="io_object" value="ExampleSet"/> </operator> <operator name="ExampleSetJoin" class="ExampleSetJoin"> <parameter key="remove_double_attributes" value="false"/> </operator> <operator name="FeatureNameFilter" class="FeatureNameFilter"> <parameter key="skip_features_with_name" value="att[0-9]*"/> </operator> <operator name="ChangeAttributeNamesReplace" class="ChangeAttributeNamesReplace"> <parameter key="replace_what" value="_from_ES2"/> <parameter key="apply_on_special" value="false"/> </operator> <operator name="ChangeAttributeRole" class="ChangeAttributeRole"> <parameter key="name" value="cluster"/> </operator> <operator name="Aggregation" class="Aggregation"> <list key="aggregation_attributes"> <parameter key="att1" value="average"/> <parameter key="att2" value="average"/> <parameter key="att3" value="average"/> <parameter key="att4" value="average"/> <parameter key="att5" value="average"/> <parameter key="att6" value="average"/> <parameter key="att7" value="average"/> <parameter key="att8" value="average"/> <parameter key="att9" value="average"/> <parameter key="att10" value="average"/> <parameter key="att11" value="average"/> <parameter key="att12" 
value="average"/> <parameter key="att13" value="average"/> <parameter key="att14" value="average"/> <parameter key="att15" value="average"/> <parameter key="att16" value="average"/> <parameter key="att17" value="average"/> <parameter key="att18" value="average"/> <parameter key="att19" value="average"/> <parameter key="att20" value="average"/> </list> <parameter key="group_by_attributes" value="cluster"/> </operator></operator>
hgwelec wrote: @keith, This is what I am talking about, and steffen understood what I meant right from my 1st post.
I did not know the MovingAverage operator yet ... really nice. However, it seems the calculation of stdev is messed up, isn't it?
haddock wrote: Nice one Keith. Now that I do understand, and curiously he'll still need the original/raw data. I think this does the necessary. <code deleted>
and this works out the average for each cluster - just added a change of role on the cluster and an OLAP operator to my original offering. <code deleted>
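For readers who want to see the idea outside RapidMiner: the "average for each cluster" approach boils down to grouping the original (un-normalized) rows by the cluster label that was assigned on the normalized data, then averaging each attribute. The following is only a minimal Python sketch of that grouping step, not the process from the post; the function name `cluster_means` and the toy data are made up for illustration.

```python
from collections import defaultdict

def cluster_means(rows, labels):
    """Average each attribute of the original rows per cluster label."""
    sums = {}
    counts = defaultdict(int)
    for row, label in zip(rows, labels):
        if label not in sums:
            sums[label] = [0.0] * len(row)
        for i, value in enumerate(row):
            sums[label][i] += value
        counts[label] += 1
    # Divide the per-attribute sums by the number of rows in each cluster.
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }

# Toy data (hypothetical): original values plus labels from clustering
# that was run on the normalized copy of the same rows.
original = [[1.0, 10.0], [3.0, 30.0], [100.0, 1000.0]]
labels = ["cluster_0", "cluster_0", "cluster_1"]
means = cluster_means(original, labels)
print(means)
```

Note that this only reproduces the centroids in original units when the clustering uses the plain mean as the cluster center, which is the caveat raised elsewhere in this thread.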
Thanks again for bringing clarity to the question; how we were meant to get that from the original question remains a mystery to me.
Despite the frustrations expressed on this thread, this forum is still a friendlier place for earnest newbies (which I was not that long ago) to learn RapidMiner than the R-help list is for R, and is one of the many things I think is great about RM.
So by using attribute construction it can be done, but imagine building new attributes for 60 input variables! So the question is whether some node can be used to calculate all this information for all (say) 60 attributes.
<operator name="Root" class="Process" expanded="yes"> <operator name="ExampleSetGenerator" class="ExampleSetGenerator"> <parameter key="target_function" value="random"/> <parameter key="number_of_attributes" value="20"/> </operator> <operator name="IdTagging" class="IdTagging"> </operator> <operator name="IOStorer" class="IOStorer"> <parameter key="name" value="original"/> <parameter key="io_object" value="ExampleSet"/> <parameter key="remove_from_process" value="false"/> </operator> <operator name="Normalization" class="Normalization"> <parameter key="return_preprocessing_model" value="true"/> <parameter key="create_view" value="true"/> </operator> <operator name="KMeans" class="KMeans"> </operator> <operator name="IORetriever" class="IORetriever"> <parameter key="name" value="original"/> <parameter key="io_object" value="ExampleSet"/> </operator> <operator name="ExampleSetJoin" class="ExampleSetJoin"> <parameter key="remove_double_attributes" value="false"/> </operator> <operator name="FeatureNameFilter" class="FeatureNameFilter"> <parameter key="skip_features_with_name" value="att[0-9]*"/> </operator> <operator name="ChangeAttributeNamesReplace" class="ChangeAttributeNamesReplace"> <parameter key="replace_what" value="_from_ES2"/> <parameter key="apply_on_special" value="false"/> </operator> <operator name="ChangeAttributeRole" class="ChangeAttributeRole"> <parameter key="name" value="cluster"/> </operator> <operator name="ValueIterator" class="ValueIterator" expanded="yes"> <parameter key="attribute" value="cluster"/> <operator name="ExampleFilter" class="ExampleFilter"> <parameter key="condition_class" value="attribute_value_filter"/> <parameter key="parameter_string" value="cluster=%{loop_value}"/> </operator> <operator name="AttributeFilter" class="AttributeFilter"> <parameter key="condition_class" value="attribute_name_filter"/> <parameter key="parameter_string" value="att.*"/> <parameter key="apply_on_special" value="true"/> </operator> <operator 
name="ExampleSetTranspose" class="ExampleSetTranspose"> </operator> <operator name="AttributeAggregation" class="AttributeAggregation"> <parameter key="attribute_name" value="Centroid_%{loop_value}"/> <parameter key="aggregation_attributes" value="att_.*"/> <parameter key="aggregation_function" value="average"/> <parameter key="keep_all" value="false"/> </operator> </operator> <operator name="ExampleSetJoin (2)" class="ExampleSetJoin"> <parameter key="remove_double_attributes" value="false"/> </operator> <operator name="ExampleSetTranspose (2)" class="ExampleSetTranspose"> </operator></operator>
If it was possible to access the centroid values directly and apply the mean/stdev calculations from your first code sample, that would probably be a more scalable solution than joining the data to itself and computing the sum/stdev across the entire data set (depends on how many rows he's dealing with). It would also (I think) handle the case where the cluster centers are calculated by something other than mean (as steffen alludes to).
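To make the "apply the mean/stdev to the centroid values directly" idea concrete, here is a rough Python sketch. It assumes the normalization was a Z-transformation (subtract the mean, divide by the standard deviation), so the inverse for a centroid is simply `value * std + mean` per attribute. The function names and the toy data are hypothetical, and the use of the population standard deviation is an assumption: whichever variant the normalization step used has to be used for the inverse as well.

```python
import statistics

def column_stats(rows):
    """Per-attribute mean and standard deviation of the original data."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Assumption: population std dev. Use the same variant as the
    # normalization step, otherwise the round trip will be slightly off.
    stds = [statistics.pstdev(col) for col in columns]
    return means, stds

def z_normalize(rows, means, stds):
    """Z-transform every row (assumes no attribute is constant, so std > 0)."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def denormalize(centroid, means, stds):
    """Map a centroid from normalized space back to original units."""
    return [c * s + m for c, m, s in zip(centroid, means, stds)]

original = [[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]]
means, stds = column_stats(original)
normalized = z_normalize(original, means, stds)

# Hypothetical centroid: the mean of the first two normalized rows.
centroid_norm = [(a + b) / 2 for a, b in zip(normalized[0], normalized[1])]
centroid_orig = denormalize(centroid_norm, means, stds)
print(centroid_orig)
```

Because this only touches one row per cluster, it scales with the number of clusters rather than the number of examples, and it does not care how the cluster center was computed.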
<operator name="Root" class="Process" expanded="yes"> <operator name="ExampleSetGenerator" class="ExampleSetGenerator"> <parameter key="target_function" value="random"/> <parameter key="number_of_attributes" value="20"/> </operator> <operator name="KMeans" class="KMeans"> <parameter key="k" value="3"/> </operator> <operator name="Model_To_ExampleSet" class="OperatorChain" expanded="yes"> <operator name="ResultWriter" class="ResultWriter"> <parameter key="result_file" value="Z:\Clus.csv"/> </operator> <operator name="CSVExampleSource" class="CSVExampleSource"> <parameter key="filename" value="Z:\clus.csv"/> <parameter key="read_attribute_names" value="false"/> <parameter key="column_separators" value=";\s*"/> <parameter key="trim_lines" value="true"/> </operator> <operator name="ChangeAttributeNames2Generic" class="ChangeAttributeNames2Generic"> </operator> <operator name="ExampleFilter (1)" class="ExampleFilter"> <parameter key="condition_class" value="attribute_value_filter"/> <parameter key="parameter_string" value="att1=.*\t.*|Cluster \d"/> </operator> <operator name="Split (1)" class="Split"> <parameter key="attributes" value="att1"/> <parameter key="split_pattern" value=" "/> </operator> <operator name="NominalNumbers2Numerical (1)" class="NominalNumbers2Numerical"> </operator> <operator name="AttributeConstruction" class="AttributeConstruction"> <list key="function_descriptions"> <parameter key="mid" value="if(att1_2>1,1,att1_2)"/> </list> </operator> <operator name="CumulateSeries" class="CumulateSeries"> <parameter key="attribute_name" value="mid"/> <parameter key="keep_original_attribute" value="false"/> </operator> <operator name="ExampleFilter (2)" class="ExampleFilter"> <parameter key="condition_class" value="attribute_value_filter"/> <parameter key="parameter_string" value="att1_1=.*\t.*"/> </operator> <operator name="Split (2)" class="Split"> <parameter key="attributes" value="att1_1"/> <parameter key="split_pattern" value=":\t"/> </operator> 
<operator name="AttributeFilter" class="AttributeFilter"> <parameter key="condition_class" value="attribute_name_filter"/> <parameter key="parameter_string" value="att1_2"/> <parameter key="invert_filter" value="true"/> </operator> <operator name="NominalNumbers2Numerical (2)" class="NominalNumbers2Numerical"> </operator> <operator name="ChangeAttributeName (1)" class="ChangeAttributeName"> <parameter key="old_name" value="att1_1_2"/> <parameter key="new_name" value="Centroid"/> </operator> <operator name="ChangeAttributeName (2)" class="ChangeAttributeName"> <parameter key="old_name" value="cumulative(mid)"/> <parameter key="new_name" value="cluster_num"/> </operator> <operator name="Example2AttributePivoting" class="Example2AttributePivoting"> <parameter key="group_attribute" value="cluster_num"/> <parameter key="index_attribute" value="att1_1_1"/> <parameter key="consider_weights" value="false"/> </operator> </operator></operator>
<operator name="Root" class="Process" expanded="yes"> <operator name="ExampleSetGenerator" class="ExampleSetGenerator"> <parameter key="target_function" value="random"/> <parameter key="number_of_attributes" value="20"/> </operator> <operator name="KMeans" class="KMeans"> <parameter key="k" value="3"/> </operator> <operator name="Model_To_ExampleSet" class="OperatorChain" expanded="yes"> <operator name="ResultWriter" class="ResultWriter"> <parameter key="result_file" value="Clus.csv"/> </operator> <operator name="CSVExampleSource" class="CSVExampleSource"> <parameter key="filename" value="clus.csv"/> <parameter key="read_attribute_names" value="false"/> <parameter key="column_separators" value=";\s*"/> <parameter key="trim_lines" value="true"/> </operator> <operator name="ChangeAttributeNames2Generic" class="ChangeAttributeNames2Generic"> </operator> <operator name="ExampleFilter (1)" class="ExampleFilter" breakpoints="after"> <parameter key="condition_class" value="attribute_value_filter"/> <parameter key="parameter_string" value="att1=.*\t.*|Cluster \d"/> </operator> <operator name="Split (1)" class="Split"> <parameter key="attributes" value="att1"/> <parameter key="split_pattern" value=" "/> </operator> <operator name="NominalNumbers2Numerical (1)" class="NominalNumbers2Numerical"> </operator> <operator name="AttributeConstruction" class="AttributeConstruction"> <list key="function_descriptions"> <parameter key="mid" value="if(att1_2>1,1,att1_2)"/> </list> </operator> <operator name="CumulateSeries" class="CumulateSeries"> <parameter key="attribute_name" value="mid"/> <parameter key="keep_original_attribute" value="false"/> </operator> <operator name="ExampleFilter (2)" class="ExampleFilter"> <parameter key="condition_class" value="attribute_value_filter"/> <parameter key="parameter_string" value="att1_1=.*\t.*"/> </operator> <operator name="Split (2)" class="Split"> <parameter key="attributes" value="att1_1"/> <parameter key="split_pattern" value=":\t"/> 
</operator> <operator name="AttributeFilter" class="AttributeFilter"> <parameter key="condition_class" value="attribute_name_filter"/> <parameter key="parameter_string" value="att1_2"/> <parameter key="invert_filter" value="true"/> </operator> <operator name="NominalNumbers2Numerical (2)" class="NominalNumbers2Numerical"> </operator> <operator name="ChangeAttributeName (1)" class="ChangeAttributeName"> <parameter key="old_name" value="att1_1_2"/> <parameter key="new_name" value="Centroid"/> </operator> <operator name="ChangeAttributeName (2)" class="ChangeAttributeName"> <parameter key="old_name" value="cumulative(mid)"/> <parameter key="new_name" value="cluster_num"/> </operator> <operator name="Example2AttributePivoting" class="Example2AttributePivoting"> <parameter key="group_attribute" value="cluster_num"/> <parameter key="index_attribute" value="att1_1_1"/> <parameter key="consider_weights" value="false"/> </operator> </operator></operator>
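The Model_To_ExampleSet chain above writes the cluster model to a text file and then parses it back into a table. The same parsing idea can be sketched in a few lines of Python. The input layout used here is an assumption inferred from the filter and split patterns in the process (a `Cluster N` header line followed by `name:<TAB>value` lines); the real ResultWriter output may differ, and `parse_centroids` is a made-up name.

```python
import re

def parse_centroids(text):
    """Parse an assumed cluster-model dump into {cluster: {attribute: value}}."""
    table = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        header = re.match(r"Cluster (\d+)", line)
        if header:
            # Start collecting values for a new cluster.
            current = int(header.group(1))
            table[current] = {}
        elif current is not None and ":\t" in line:
            name, value = line.split(":\t", 1)
            try:
                table[current][name] = float(value)
            except ValueError:
                # Skip lines whose value part is not a number.
                continue
    return table

# Hypothetical sample in the assumed layout.
sample = (
    "Cluster 0\n"
    "att1:\t0.5\n"
    "att2:\t-1.25\n"
    "Cluster 1\n"
    "att1:\t2.0\n"
    "att2:\t3.5\n"
)
centroids = parse_centroids(sample)
print(centroids)
```

Once the centroids are in a table like this, de-normalizing them is a per-attribute calculation on a handful of rows instead of a join against the full data set.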