"Clustering and Normalization"
hgwelec
New Altair Community Member
Dear All,
I have a dataset which consists of 20 numeric variables.
I would like to apply a z-score transformation to all variables: I use the Normalization node and all is OK until here.
The problem now is that I want to de-normalize the values of all 20 fields back to the original values so that the cluster values make sense.
1) Is there a node to do this for all 20 fields?
2) If not, can someone provide an example of how to do it for a single field only?
Thanks!
Answers
Hello
The only hint I can give you is to use AttributeConstruction. Unfortunately you have to include the mean and stdev manually.
regards,
Steffen
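For what it's worth, the arithmetic behind that hint is just the z-score transform and its inverse, x = z * sigma + mu, applied per attribute. A minimal stand-alone sketch in plain Python (made-up numbers, outside RM entirely):

```python
from statistics import mean, pstdev

def zscore(values):
    """Z-score transform: z = (x - mu) / sigma."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    return [(v - mu) / sigma for v in values], mu, sigma

def denormalize(z_values, mu, sigma):
    """Inverse transform: x = z * sigma + mu."""
    return [z * sigma + mu for z in z_values]

raw = [10.0, 12.0, 14.0, 18.0]             # one hypothetical field
normed, mu, sigma = zscore(raw)
restored = denormalize(normed, mu, sigma)  # recovers the original values
```

This is exactly what an AttributeConstruction expression would compute, one attribute at a time, with mu and sigma typed in by hand.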
Hi,
The nice thing about RM is that you can do things in many different ways... Bit of a mess, because normalization seems to hit objects even if you store them away, but it does the job... I think.
<operator name="Root" class="Process" expanded="yes">
<operator name="ExampleSetGenerator" class="ExampleSetGenerator">
<parameter key="target_function" value="random"/>
<parameter key="number_of_attributes" value="20"/>
</operator>
<operator name="IdTagging" class="IdTagging">
</operator>
<operator name="CSVExampleSetWriter" class="CSVExampleSetWriter">
<parameter key="csv_file" value="bla"/>
</operator>
<operator name="Normalization" class="Normalization">
</operator>
<operator name="KMeans" class="KMeans">
</operator>
<operator name="CSVExampleSource" class="CSVExampleSource">
<parameter key="filename" value="bla"/>
</operator>
<operator name="IdTagging (2)" class="IdTagging">
</operator>
<operator name="ExampleSetJoin" class="ExampleSetJoin">
<parameter key="remove_double_attributes" value="false"/>
</operator>
<operator name="FeatureNameFilter" class="FeatureNameFilter">
<parameter key="skip_features_with_name" value="att[0-9]*"/>
</operator>
<operator name="ChangeAttributeNamesReplace" class="ChangeAttributeNamesReplace">
<parameter key="replace_what" value="_from_ES2"/>
<parameter key="apply_on_special" value="false"/>
</operator>
</operator>
PS Can someone prod Ingo towards his PM box here, thanks.
Silly me :-\ If I tick "create view" on the normalization operator I don't need to write and read back the CSV, like this...
<operator name="Root" class="Process" expanded="yes">
<operator name="ExampleSetGenerator" class="ExampleSetGenerator">
<parameter key="target_function" value="random"/>
<parameter key="number_of_attributes" value="20"/>
</operator>
<operator name="IdTagging" class="IdTagging">
</operator>
<operator name="IOStorer" class="IOStorer">
<parameter key="name" value="original"/>
<parameter key="io_object" value="ExampleSet"/>
<parameter key="remove_from_process" value="false"/>
</operator>
<operator name="Normalization" class="Normalization">
<parameter key="return_preprocessing_model" value="true"/>
<parameter key="create_view" value="true"/>
</operator>
<operator name="KMeans" class="KMeans">
</operator>
<operator name="IORetriever" class="IORetriever">
<parameter key="name" value="original"/>
<parameter key="io_object" value="ExampleSet"/>
</operator>
<operator name="ExampleSetJoin" class="ExampleSetJoin">
<parameter key="remove_double_attributes" value="false"/>
</operator>
<operator name="FeatureNameFilter" class="FeatureNameFilter">
<parameter key="skip_features_with_name" value="att[0-9]*"/>
</operator>
<operator name="ChangeAttributeNamesReplace" class="ChangeAttributeNamesReplace">
<parameter key="replace_what" value="_from_ES2"/>
<parameter key="apply_on_special" value="false"/>
</operator>
</operator>
Hello, and thanks for the reply.
However, I do not understand the example given: where is the de-normalization happening for every attribute?
Thanks again!
Haddock,
Really interesting method
The problem is though that the clustering output still does not show you the DE-normalized values such as in:
Cluster 0 :
attr1 : x
attr2 : y
attr3 : z
with x,y,z being DE-normalized
Perhaps a DE-normalize operator would be useful!?
Hi,
The original problem was:
"The problem now is that i want to de-normalize values of all 20 fields to the original values"
The method shows the original values, or do you not agree?
You wrote: "The problem is though that the clustering output still does not show you the DE-normalized values"
If by "DE-normalized" you mean something other than the "original" values then perhaps so, but that was not the question.
In short, I disagree that a de-normalizer operator is necessary, because you can always just keep the originals!
Hi again Haddock,
First of all: ***thanks for your help***, I do not mean to sound rude :-)
However, the *full* quote was: "The problem now is that i want to de-normalize values of all 20 fields to the original values so that cluster values make sense"
Notice that the last part says: "so that cluster values make sense".
Unfortunately this is not the case with your solution. Again, I do not want to appear rude; I am just giving my opinion that perhaps an operator would prove helpful. Just trying to add my 2 cents...
Thanks!
Hi,
I'm always amused by posts that start "i do not mean to sound rude".
Versions one and two of the code did the job. Did you run them? Version three was only put in to make things clearer for you. Something got flipped and the clusters got lost. So I'll edit version three out.
Maybe you'll want to edit your last post as well.
You wrote: "I'm always amused by posts that start 'i do not mean to sound rude'."
Great! Now on with the problem.
You wrote: "Versions one and two of the code did the job. Did you run them?"
No they didn't, they did the job the way you perceived it / Yes, I did run all of them.
You wrote: "Version three was only put in to make things clearer for you. Something got flipped and the clusters got lost. So I'll edit version three out."
So that means that there can be an output like the one I explained? To have the numbers in the cluster model prior to the normalization? I sure would like to see how this is possible, because this is actually what I wanted originally.
You wrote: "Maybe you'll want to edit your last post as well."
Sure, if you explain why I should, no problem!
You wrote: "No they didn't, they did the job the way you perceived it / Yes i did run all of them"
Excellent, in which case you can explain in what way the original values are not tied to the clusters.
You wrote: "So that means that there can be an output like the one i explained? To have the numbers in the cluster model prior the normalization? I sure would like to see how this is possible because this is actually what i wanted originally."
No, it means exactly what it says: I tried to clarify my solution by adding better titles to the operators, and things stopped working.
You wrote: "Maybe you'll want to edit your last post as well." / "Sure if you explain why should i, no problem!"
Because you are wrong. Do you disagree that if you normalise numbers and then de-normalise them you should end up with the numbers you started with? De-normalising can be effected just by keeping the originals, which is what my solution does, and I'm sorry you can't understand that.
Haddock,
The point is that your solution does NOT output a ***Clustering Model window*** with de-normalized values! The sequence should be the following:
1) Get unnormalized values
2) Normalize them
3) Run the clustering model using the normalized values
4) Show the CLUSTERING MODEL'S RESULTS DENORMALIZED. I do *not* want, for every row, its associated de-normalized value!!
Your solution does not do step (4): it writes the de-normalized values to a table! Do you understand the difference, Haddock??
Please try to understand what is sought here...
From what I can tell (as steffen said) there is no way to do this automatically in RM. If someone else can help on this, please do so.
Thanks!
You wrote: "4) show the CLUSTERING MODEL'S RESULTS DENORMALIZED. I do *not* want for every row it's associated de-normalized value!!"
Please explain this term, and how we were meant to guess it from your original question; let me remind you of what it actually was:
"The problem now is that i want to de-normalize values of all 20 fields to the original values so that cluster values make sense."
A word of advice: when you can't see over the top of the hole you are digging, stop digging.
If I understand what hgwelec is asking for, he wants to be able to express the centroid values of each cluster in the scale of the original data.
He's not talking about having an ExampleSet that contains both the raw values and normalized values for each data point. He wants to describe the clusters in the data's natural scale. This would help, for example, in explaining the clusters to other people, or even just in better interpreting the model himself.
If my reading of the problem is correct, then the following discussion may be helpful...
You'd need to know the mean and standard deviation of each attribute in the original data to convert the normalized centroid values to original scale values (i.e. "denormalize"). While RM computes the sum and std dev as part of the meta data view of an ExampleSet, I'm not sure there's a way to get to those values. If you're reading data from a database, you might be able to have a second DatabaseExampleSource with a query that returns the mean and std dev for each attribute.
Once you have the mean and std dev, you need to get the centroid values into an example set. I haven't worked with clustering models, so I don't know how this would be done in RM. But once you have both the mean+stddev and the centroid values, you can probably use one of the Join operators to match up the clusters with their mean+stdev, and then use AttributeConstruction (as steffen mentioned in the first reply to this thread) to build the centroid values on the original data's scale.
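If my reading is right, the conversion itself is simple once the per-attribute mean and std dev of the *original* data are available: each normalized centroid coordinate maps back via x = z * sigma + mu. A stand-alone sketch in plain Python, with hypothetical data and a hypothetical centroid standing in for what the clustering model would report:

```python
from statistics import mean, pstdev

# Hypothetical original (unnormalized) data: 4 rows, 2 attributes
data = [[2.0, 100.0], [4.0, 110.0], [6.0, 120.0], [8.0, 130.0]]

# Per-attribute mean and std dev of the ORIGINAL data
columns = list(zip(*data))
mus = [mean(c) for c in columns]
sigmas = [pstdev(c) for c in columns]

# A normalized centroid, as a clustering model might report it
normalized_centroid = [-1.0, 1.0]

# Convert each coordinate back to the original scale: x = z * sigma + mu
original_scale_centroid = [z * s + m
                           for z, m, s in zip(normalized_centroid, mus, sigmas)]
```

The RM-side difficulty is only in getting the mean/std dev and the centroid values into ExampleSets so a Join plus AttributeConstruction can do this per attribute.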
Hopefully this doesn't add further confusion to the situation...
Keith
Nice one Keith,
Now that I do understand, and curiously he'll still need the original/raw data.
Keith wrote: "While RM computes the sum and std dev as part of the meta data view of an ExampleSet, I'm not sure there's a way to get to those values."
I think this does the necessary; the first process below works out the mean and std dev, and the second works out the average for each cluster - I just added a change of role on the cluster and an OLAP operator to my original offering.
<operator name="Root" class="Process" expanded="yes">
<operator name="ExampleSetGenerator" class="ExampleSetGenerator">
<parameter key="target_function" value="random"/>
<parameter key="number_of_attributes" value="1"/>
</operator>
<operator name="MovingAverage" class="MovingAverage">
<parameter key="attribute_name" value="att1"/>
<parameter key="window_width" value="100"/>
<parameter key="result_position" value="start"/>
</operator>
<operator name="ChangeAttributeNamesReplace" class="ChangeAttributeNamesReplace">
<parameter key="replace_what" value="\(|\)"/>
</operator>
<operator name="ChangeAttributeName" class="ChangeAttributeName">
<parameter key="old_name" value="moving_averageatt1"/>
<parameter key="new_name" value="avg_att1"/>
</operator>
<operator name="MovingAverage (2)" class="MovingAverage">
<parameter key="attribute_name" value="att1"/>
<parameter key="window_width" value="100"/>
<parameter key="aggregation_function" value="standard_deviation"/>
<parameter key="result_position" value="start"/>
</operator>
<operator name="ChangeAttributeNamesReplace (2)" class="ChangeAttributeNamesReplace">
<parameter key="replace_what" value="\(|\)"/>
</operator>
<operator name="ChangeAttributeName (2)" class="ChangeAttributeName">
<parameter key="old_name" value="moving_averageatt1"/>
<parameter key="new_name" value="stddev_att1"/>
</operator>
<operator name="MissingValueReplenishment" class="MissingValueReplenishment">
<list key="columns">
</list>
</operator>
</operator>
Thanks again for bringing clarity to the question; how we were meant to get that from the original question remains a mystery to me.
<operator name="Root" class="Process" expanded="yes">
<operator name="ExampleSetGenerator" class="ExampleSetGenerator">
<parameter key="target_function" value="random"/>
<parameter key="number_of_attributes" value="20"/>
</operator>
<operator name="IdTagging" class="IdTagging">
</operator>
<operator name="IOStorer" class="IOStorer">
<parameter key="name" value="original"/>
<parameter key="io_object" value="ExampleSet"/>
<parameter key="remove_from_process" value="false"/>
</operator>
<operator name="Normalization" class="Normalization">
<parameter key="return_preprocessing_model" value="true"/>
<parameter key="create_view" value="true"/>
</operator>
<operator name="KMeans" class="KMeans">
</operator>
<operator name="IORetriever" class="IORetriever">
<parameter key="name" value="original"/>
<parameter key="io_object" value="ExampleSet"/>
</operator>
<operator name="ExampleSetJoin" class="ExampleSetJoin">
<parameter key="remove_double_attributes" value="false"/>
</operator>
<operator name="FeatureNameFilter" class="FeatureNameFilter">
<parameter key="skip_features_with_name" value="att[0-9]*"/>
</operator>
<operator name="ChangeAttributeNamesReplace" class="ChangeAttributeNamesReplace">
<parameter key="replace_what" value="_from_ES2"/>
<parameter key="apply_on_special" value="false"/>
</operator>
<operator name="ChangeAttributeRole" class="ChangeAttributeRole">
<parameter key="name" value="cluster"/>
</operator>
<operator name="Aggregation" class="Aggregation">
<list key="aggregation_attributes">
<parameter key="att1" value="average"/>
<parameter key="att2" value="average"/>
<parameter key="att3" value="average"/>
<parameter key="att4" value="average"/>
<parameter key="att5" value="average"/>
<parameter key="att6" value="average"/>
<parameter key="att7" value="average"/>
<parameter key="att8" value="average"/>
<parameter key="att9" value="average"/>
<parameter key="att10" value="average"/>
<parameter key="att11" value="average"/>
<parameter key="att12" value="average"/>
<parameter key="att13" value="average"/>
<parameter key="att14" value="average"/>
<parameter key="att15" value="average"/>
<parameter key="att16" value="average"/>
<parameter key="att17" value="average"/>
<parameter key="att18" value="average"/>
<parameter key="att19" value="average"/>
<parameter key="att20" value="average"/>
</list>
<parameter key="group_by_attributes" value="cluster"/>
</operator>
</operator>
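The join-then-aggregate idea in the second process can be paraphrased outside RM. A stand-alone sketch in plain Python with made-up rows and labels (in the real process the labels come from KMeans run on the normalized copy):

```python
from collections import defaultdict
from statistics import mean

# Hypothetical original (unnormalized) rows, plus the cluster label
# that KMeans assigned to each row after clustering a normalized copy
original_rows = [[1.0, 10.0], [2.0, 12.0], [9.0, 50.0], [11.0, 54.0]]
labels = ["cluster_0", "cluster_0", "cluster_1", "cluster_1"]

# Group the original rows by cluster label ...
groups = defaultdict(list)
for row, label in zip(original_rows, labels):
    groups[label].append(row)

# ... and average each attribute per group; because KMeans centroids
# are means, these group averages are the centroids on the original scale
centroids = {label: [mean(col) for col in zip(*rows)]
             for label, rows in groups.items()}
```

This is exactly what the ChangeAttributeRole plus Aggregation (group by cluster, average per attribute) stage does.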
@keith,
This is what I am talking about, and steffen understood what I meant right from my 1st post.
So by using attribute construction it can be done, but imagine building new attributes for 60 input variables! So the question is whether some node can be used to calculate all this information for all - say - 60 attributes, and I guess this cannot happen (?) as steffen originally said.
@haddock
It appears that you still don't get it, but maybe I am wrong... Can you do the same example that you last posted for 60 input variables? How much time will it take you to do it? Let alone also having to do a log transformation on each of the 60 variables to fix their skewed distributions...
Hello
hgwelec wrote: "@keith, This is what i am talking about and steffen understood what i meant right from my 1st post."
I'd like to see myself in such a glorious light, but sorry: I did understand it exactly as haddock did, until keith made your point clear.
@haddock:
I did not know the operator MovingAverage yet ... really nice. However, it seems the calculation of stdev is messed up, isn't it ?
@hgwelec:
The second process of haddock does exactly what you want. He was able to calculate the cluster centroids for the denormalized (i.e. not normalized) values and hence the denormalized cluster centers (this is only correct if the cluster centroids of the cluster operator are calculated as the mean... which is correct for KMeans). The issue of scalability remains, but: either you add an entry for each attribute in the aggregation operator manually, OR you use a loop... in Java, which means hacking an operator yourself. I do not see another option.
Again we have faced an example of the law of leaky abstraction ...
kind regards,
Steffen
PS: the process of haddock is ok, but I did not check the calculation of the values by an example (just to be sure) .. my head is a little fuzzy today...
Greets Steff!
Steffen wrote: "I did not know the operator MovingAverage yet ... really nice. However, it seems the calculation of stdev is messed up, isn't it?"
Needs checking - but if you think so, that'll do for me. You'll probably understand if I say that my interest in this thread has waned somewhat ;D
Reminds me of an old Oxford philosophy exam story.....
Is this a question?
Yes, if this is an answer.
haddock wrote: "Nice one Keith, Now that I do understand, and curiously he'll still need the original/raw data. I think this does the necessary. <code deleted>"
Ah, clever. Using the moving average to create a window that spans the entire dataset, and calculating the mean/stdev. Wouldn't have thought to approach it that way.
haddock wrote: "and this works out the average for each cluster - just added a change of role on the cluster and an OLAP operator to my original offering. <code deleted>"
Also a smarter approach to the problem than I would have thought of. I was fixated on trying to access the centroid values and convert them back to the original, non-normalized scale. Instead, you're labelling all the original data rows with the cluster, and calculating the means directly. Clever...
If it was possible to access the centroid values directly and apply the mean/stdev calculations from your first code sample, that would probably be a more scalable solution than joining the data to itself and computing the sum/stdev across the entire data set (depends on how many rows he's dealing with). It would also (I think) handle the case where the cluster centers are calculated by something other than the mean (as steffen alludes to). But what you presented certainly solves the problem as presented. Thanks, I learned something today.
That's what's great about having a forum where you get many eyeballs looking at a question. For example, when haddock wrote "Thanks again for bringing clarity to the question, how we were meant to get that from the original question remains a mystery to me", to me it was pretty quickly apparent from "The problem now is that i want to de-normalize values of all 20 fields to the original values so that cluster values make sense" that, even if he didn't have the terminology quite right, he was talking about data that describe the clusters ("cluster values" a.k.a. centroids), and meant "original scale" rather than "original values". But I never would have come up with the solution haddock did.
Despite the frustrations expressed on this thread, this forum is still a friendlier place for earnest newbies (which I was not that long ago) to learn RapidMiner than the R-help list is for R, and is one of the many things I think is great about RM.
Keith
Hi Keith!
Both you and Steffen come out of this episode as very solid citizens who deserve the respect you get, so many thanks to you both on behalf of all Rapido heads.
Keith wrote: "Despite the frustrations expressed on this thread, this forum is still a friendlier place for earnest newbies (which I was not that long ago) to learn RapidMiner than the R-help list is for R, and is one of the many things I think is great about RM."
I've learnt from two sources, Ralf's most excellent course, and trying to answer the puzzles set right here, so absolutely spot on, my friend, spot on.
Hi,
hgwelec wrote: "So by using attribute construction it can be done but imagine building new attributes for 60 input variables! so the question is whether some node can be used to calculate all this information for all -say- 60 attributes"
The code below is nothing but haddock's Aggregation operator replaced by a set of operators at the end... Also, as pointed out, the same approach of finding the average cannot be taken if, say, you are dealing with KMedoids...
<operator name="Root" class="Process" expanded="yes">
<operator name="ExampleSetGenerator" class="ExampleSetGenerator">
<parameter key="target_function" value="random"/>
<parameter key="number_of_attributes" value="20"/>
</operator>
<operator name="IdTagging" class="IdTagging">
</operator>
<operator name="IOStorer" class="IOStorer">
<parameter key="name" value="original"/>
<parameter key="io_object" value="ExampleSet"/>
<parameter key="remove_from_process" value="false"/>
</operator>
<operator name="Normalization" class="Normalization">
<parameter key="return_preprocessing_model" value="true"/>
<parameter key="create_view" value="true"/>
</operator>
<operator name="KMeans" class="KMeans">
</operator>
<operator name="IORetriever" class="IORetriever">
<parameter key="name" value="original"/>
<parameter key="io_object" value="ExampleSet"/>
</operator>
<operator name="ExampleSetJoin" class="ExampleSetJoin">
<parameter key="remove_double_attributes" value="false"/>
</operator>
<operator name="FeatureNameFilter" class="FeatureNameFilter">
<parameter key="skip_features_with_name" value="att[0-9]*"/>
</operator>
<operator name="ChangeAttributeNamesReplace" class="ChangeAttributeNamesReplace">
<parameter key="replace_what" value="_from_ES2"/>
<parameter key="apply_on_special" value="false"/>
</operator>
<operator name="ChangeAttributeRole" class="ChangeAttributeRole">
<parameter key="name" value="cluster"/>
</operator>
<operator name="ValueIterator" class="ValueIterator" expanded="yes">
<parameter key="attribute" value="cluster"/>
<operator name="ExampleFilter" class="ExampleFilter">
<parameter key="condition_class" value="attribute_value_filter"/>
<parameter key="parameter_string" value="cluster=%{loop_value}"/>
</operator>
<operator name="AttributeFilter" class="AttributeFilter">
<parameter key="condition_class" value="attribute_name_filter"/>
<parameter key="parameter_string" value="att.*"/>
<parameter key="apply_on_special" value="true"/>
</operator>
<operator name="ExampleSetTranspose" class="ExampleSetTranspose">
</operator>
<operator name="AttributeAggregation" class="AttributeAggregation">
<parameter key="attribute_name" value="Centroid_%{loop_value}"/>
<parameter key="aggregation_attributes" value="att_.*"/>
<parameter key="aggregation_function" value="average"/>
<parameter key="keep_all" value="false"/>
</operator>
</operator>
<operator name="ExampleSetJoin (2)" class="ExampleSetJoin">
<parameter key="remove_double_attributes" value="false"/>
</operator>
<operator name="ExampleSetTranspose (2)" class="ExampleSetTranspose">
</operator>
</operator>
Keith wrote: "If it was possible to access the centroid values directly and apply the mean/stdev calculations from your first code sample, that would probably be a more scalable solution than joining the data to itself and computing the sum/stdev across the entire data set (depends on how many rows he's dealing with). It would also (I think) handle the case where the cluster centers are calculated by something other than mean (as steffen alludes to)."
The below is a tricky (in fact, a very tricky) way of extracting the centroid values directly from the model:
<operator name="Root" class="Process" expanded="yes">
<operator name="ExampleSetGenerator" class="ExampleSetGenerator">
<parameter key="target_function" value="random"/>
<parameter key="number_of_attributes" value="20"/>
</operator>
<operator name="KMeans" class="KMeans">
<parameter key="k" value="3"/>
</operator>
<operator name="Model_To_ExampleSet" class="OperatorChain" expanded="yes">
<operator name="ResultWriter" class="ResultWriter">
<parameter key="result_file" value="Z:\Clus.csv"/>
</operator>
<operator name="CSVExampleSource" class="CSVExampleSource">
<parameter key="filename" value="Z:\clus.csv"/>
<parameter key="read_attribute_names" value="false"/>
<parameter key="column_separators" value=";\s*"/>
<parameter key="trim_lines" value="true"/>
</operator>
<operator name="ChangeAttributeNames2Generic" class="ChangeAttributeNames2Generic">
</operator>
<operator name="ExampleFilter (1)" class="ExampleFilter">
<parameter key="condition_class" value="attribute_value_filter"/>
<parameter key="parameter_string" value="att1=.*\t.*|Cluster \d"/>
</operator>
<operator name="Split (1)" class="Split">
<parameter key="attributes" value="att1"/>
<parameter key="split_pattern" value=" "/>
</operator>
<operator name="NominalNumbers2Numerical (1)" class="NominalNumbers2Numerical">
</operator>
<operator name="AttributeConstruction" class="AttributeConstruction">
<list key="function_descriptions">
<parameter key="mid" value="if(att1_2>1,1,att1_2)"/>
</list>
</operator>
<operator name="CumulateSeries" class="CumulateSeries">
<parameter key="attribute_name" value="mid"/>
<parameter key="keep_original_attribute" value="false"/>
</operator>
<operator name="ExampleFilter (2)" class="ExampleFilter">
<parameter key="condition_class" value="attribute_value_filter"/>
<parameter key="parameter_string" value="att1_1=.*\t.*"/>
</operator>
<operator name="Split (2)" class="Split">
<parameter key="attributes" value="att1_1"/>
<parameter key="split_pattern" value=":\t"/>
</operator>
<operator name="AttributeFilter" class="AttributeFilter">
<parameter key="condition_class" value="attribute_name_filter"/>
<parameter key="parameter_string" value="att1_2"/>
<parameter key="invert_filter" value="true"/>
</operator>
<operator name="NominalNumbers2Numerical (2)" class="NominalNumbers2Numerical">
</operator>
<operator name="ChangeAttributeName (1)" class="ChangeAttributeName">
<parameter key="old_name" value="att1_1_2"/>
<parameter key="new_name" value="Centroid"/>
</operator>
<operator name="ChangeAttributeName (2)" class="ChangeAttributeName">
<parameter key="old_name" value="cumulative(mid)"/>
<parameter key="new_name" value="cluster_num"/>
</operator>
<operator name="Example2AttributePivoting" class="Example2AttributePivoting">
<parameter key="group_attribute" value="cluster_num"/>
<parameter key="index_attribute" value="att1_1_1"/>
<parameter key="consider_weights" value="false"/>
</operator>
</operator>
</operator>
A Note:
1. This method can be applied even for KMedoids... I meant to say, this also sidesteps the issue of "What if the cluster centers are not the mean?".
2. The centroid values are accurate to three decimal places, because the centroid values are read as-is from the "Text View" of the model. If the "Text View" gave, say, five digits after the decimal point, then the same would be the result in the ExampleSet produced.
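The same read-the-model-text trick can be sketched in plain Python. The dump format below is hypothetical (the real "Text View" layout may differ), but it shows both the parsing idea and why note 2 holds: you only recover as many decimal places as the text contains.

```python
import re

# A hypothetical "Text View" dump of a centroid cluster model; treat
# this exact layout as an assumption, not RapidMiner's real format
model_text = (
    "Cluster 0:\n"
    "att1:\t0.512\n"
    "att2:\t-1.034\n"
    "Cluster 1:\n"
    "att1:\t-0.512\n"
    "att2:\t1.034\n"
)

centroids = {}
current = None
for line in model_text.splitlines():
    header = re.match(r"Cluster (\d+):", line)
    if header:
        current = int(header.group(1))   # start a new cluster section
        centroids[current] = {}
        continue
    value = re.match(r"(\w+):\t(-?\d+\.\d+)", line)
    if value and current is not None:
        # precision is limited to what the text dump shows
        centroids[current][value.group(1)] = float(value.group(2))
```

The RM process above does the equivalent with ResultWriter, CSVExampleSource, filters, and splits instead of a regex loop.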
Best,
Shubha Karanth
Hi Shubha,
I think there is a problem with your first example, because it only covers the case where there are two clusters, and with the second there is no data by the time of the first split, so I'm not sure why it is here at all. Perhaps you could explain what I've missed? ;D Bemused readers should run to the break, like this (I've just removed the drive letter and put in a break)...
<operator name="Root" class="Process" expanded="yes">
<operator name="ExampleSetGenerator" class="ExampleSetGenerator">
<parameter key="target_function" value="random"/>
<parameter key="number_of_attributes" value="20"/>
</operator>
<operator name="KMeans" class="KMeans">
<parameter key="k" value="3"/>
</operator>
<operator name="Model_To_ExampleSet" class="OperatorChain" expanded="yes">
<operator name="ResultWriter" class="ResultWriter">
<parameter key="result_file" value="Clus.csv"/>
</operator>
<operator name="CSVExampleSource" class="CSVExampleSource">
<parameter key="filename" value="clus.csv"/>
<parameter key="read_attribute_names" value="false"/>
<parameter key="column_separators" value=";\s*"/>
<parameter key="trim_lines" value="true"/>
</operator>
<operator name="ChangeAttributeNames2Generic" class="ChangeAttributeNames2Generic">
</operator>
<operator name="ExampleFilter (1)" class="ExampleFilter" breakpoints="after">
<parameter key="condition_class" value="attribute_value_filter"/>
<parameter key="parameter_string" value="att1=.*\t.*|Cluster \d"/>
</operator>
<operator name="Split (1)" class="Split">
<parameter key="attributes" value="att1"/>
<parameter key="split_pattern" value=" "/>
</operator>
<operator name="NominalNumbers2Numerical (1)" class="NominalNumbers2Numerical">
</operator>
<operator name="AttributeConstruction" class="AttributeConstruction">
<list key="function_descriptions">
<parameter key="mid" value="if(att1_2>1,1,att1_2)"/>
</list>
</operator>
<operator name="CumulateSeries" class="CumulateSeries">
<parameter key="attribute_name" value="mid"/>
<parameter key="keep_original_attribute" value="false"/>
</operator>
<operator name="ExampleFilter (2)" class="ExampleFilter">
<parameter key="condition_class" value="attribute_value_filter"/>
<parameter key="parameter_string" value="att1_1=.*\t.*"/>
</operator>
<operator name="Split (2)" class="Split">
<parameter key="attributes" value="att1_1"/>
<parameter key="split_pattern" value=":\t"/>
</operator>
<operator name="AttributeFilter" class="AttributeFilter">
<parameter key="condition_class" value="attribute_name_filter"/>
<parameter key="parameter_string" value="att1_2"/>
<parameter key="invert_filter" value="true"/>
</operator>
<operator name="NominalNumbers2Numerical (2)" class="NominalNumbers2Numerical">
</operator>
<operator name="ChangeAttributeName (1)" class="ChangeAttributeName">
<parameter key="old_name" value="att1_1_2"/>
<parameter key="new_name" value="Centroid"/>
</operator>
<operator name="ChangeAttributeName (2)" class="ChangeAttributeName">
<parameter key="old_name" value="cumulative(mid)"/>
<parameter key="new_name" value="cluster_num"/>
</operator>
<operator name="Example2AttributePivoting" class="Example2AttributePivoting">
<parameter key="group_attribute" value="cluster_num"/>
<parameter key="index_attribute" value="att1_1_1"/>
<parameter key="consider_weights" value="false"/>
</operator>
</operator>
</operator>
Good weekend!
It appears that the way I described my problem was not the right one.
I have seen other users note that my terminology was not correct; I have no reason to think otherwise, and so I have to agree. It wasn't.
But since the essence of discussions in this forum is both to solve our problems *and* to draw some insights as to how RM can become better, I feel that even though Java code could be a solution (when the dataset contains MANY attributes), for users who do not have the necessary programming skills the problem cannot be easily fixed.
Since normalization prior to any clustering process is usually required, perhaps a De-Normalize node would prove to be very useful.
Many Thanks!
And I still disagree!