NaN problems with MinMaxNormalization and precision measure

Username New Altair Community Member
edited November 5 in Community Q&A
Hi,

I noticed two bugs (?) in the MinMaxNormalization and WeightedMultiClassPerformance classes.

MinMaxNormalization:
If an attribute always has the same value, its values are normalized to NaN. Is this normalization behaviour really intended? It can lead to strange results from learning operators, since some of them (e.g. LibSVM) don't handle unknown values well. Here's my proposed fix:
### Eclipse Workspace Patch 1.0
#P yale
Index: src/com/rapidminer/operator/preprocessing/normalization/MinMaxNormalizationModel.java
===================================================================
RCS file: /cvsroot/yale/yale/src/com/rapidminer/operator/preprocessing/normalization/MinMaxNormalizationModel.java,v
retrieving revision 1.11
diff -u -r1.11 MinMaxNormalizationModel.java
--- src/com/rapidminer/operator/preprocessing/normalization/MinMaxNormalizationModel.java 14 Jan 2009 13:45:34 -0000 1.11
+++ src/com/rapidminer/operator/preprocessing/normalization/MinMaxNormalizationModel.java 12 Mar 2009 10:56:13 -0000

             double value = example.getValue(attribute);
             double minA = range.getFirst().doubleValue();
             double maxA = range.getSecond().doubleValue();
-            example.setValue(attribute, (value - minA) / (maxA - minA) * (max - min) + min);
+            if (maxA == minA || min == max) {
+                example.setValue(attribute, Math.min(Math.max(minA, min), max));
+            } else {
+                example.setValue(attribute, (value - minA) / (maxA - minA) * (max - min) + min);
+            }
         }
     }
 }
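
To make the effect easier to see outside of RapidMiner, here is a minimal standalone sketch of the same guard. The class name MinMaxSketch and the method normalize are made up for illustration and are not part of the RapidMiner API; minA/maxA stand for the observed range of the attribute and min/max for the target range, as in the patch.

```java
// Hypothetical standalone sketch; not the actual RapidMiner class.
final class MinMaxSketch {

    /**
     * Maps value from the observed range [minA, maxA] to the target range [min, max].
     * Falls back to a clamped constant when either range is degenerate.
     */
    static double normalize(double value, double minA, double maxA, double min, double max) {
        if (maxA == minA || min == max) {
            // A constant attribute (maxA == minA) would otherwise compute 0/0 = NaN.
            // As in the patch, min == max is routed here as well.
            return Math.min(Math.max(minA, min), max);
        }
        return (value - minA) / (maxA - minA) * (max - min) + min;
    }

    public static void main(String[] args) {
        // Unguarded formula on a constant attribute whose values are all 5.0
        double unguarded = (5.0 - 5.0) / (5.0 - 5.0) * (1.0 - 0.0) + 0.0;
        System.out.println(Double.isNaN(unguarded));            // true
        System.out.println(normalize(5.0, 5.0, 5.0, 0.0, 1.0)); // 1.0 (5.0 clamped into [0, 1])
        System.out.println(normalize(7.5, 5.0, 10.0, 0.0, 1.0)); // 0.5
    }
}
```

Note that the fallback clamps the constant into the target range, so a constant attribute of 5.0 ends up as 1.0 for the range [0, 1]. Other choices (e.g. always using min) would work too; the main point is that the result is no longer NaN.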

WeightedMultiClassPerformance:
The average precision is NaN if there is a class that the model never predicts, because the precision for that class is 0/0 = NaN. Here's another possible fix:
### Eclipse Workspace Patch 1.0
#P yale
Index: src/com/rapidminer/operator/performance/WeightedMultiClassPerformance.java
===================================================================
RCS file: /cvsroot/yale/yale/src/com/rapidminer/operator/performance/WeightedMultiClassPerformance.java,v
retrieving revision 1.6
diff -u -r1.6 WeightedMultiClassPerformance.java
--- src/com/rapidminer/operator/performance/WeightedMultiClassPerformance.java 9 May 2008 19:22:43 -0000 1.6
+++ src/com/rapidminer/operator/performance/WeightedMultiClassPerformance.java 12 Mar 2009 11:02:28 -0000

                 }
                 result = 0.0d;
                 for (int r = 0; r < rowSums.length; r++) {
-                    result += classWeights[r] * (counter[r][r] / rowSums[r]);
+                    double p = counter[r][r] / rowSums[r];
+                    result += classWeights[r] * (Double.isNaN(p) ? 0 : p);
                 }
                 result /= weightSum;
                 return result;
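
As a rough illustration of this second fix, the following standalone sketch computes a weighted average precision with the NaN guard. It is not the actual RapidMiner code: the class name WeightedPrecisionSketch and the method weightedPrecision are hypothetical, and I am assuming that counter[r][c] counts the examples predicted as class r whose true class is c, so that a row sum is the number of predictions for that class.

```java
// Hypothetical standalone sketch; not the actual RapidMiner class.
final class WeightedPrecisionSketch {

    /**
     * Assumed layout: counter[r][c] = number of examples predicted as class r
     * whose true class is c. classWeights[r] is the weight of class r.
     */
    static double weightedPrecision(double[][] counter, double[] classWeights) {
        double result = 0.0d;
        double weightSum = 0.0d;
        for (int r = 0; r < counter.length; r++) {
            double rowSum = 0.0d;
            for (double cell : counter[r]) {
                rowSum += cell;
            }
            // For a class that is never predicted this is 0.0 / 0.0 = NaN.
            double p = counter[r][r] / rowSum;
            // Count the precision of such a class as 0 instead of NaN.
            result += classWeights[r] * (Double.isNaN(p) ? 0 : p);
            weightSum += classWeights[r];
        }
        return result / weightSum;
    }

    public static void main(String[] args) {
        // The third class is never predicted, so its row is all zeros.
        double[][] counter = {
            {3, 1, 0},
            {1, 2, 1},
            {0, 0, 0}
        };
        double[] classWeights = {1, 1, 2};
        // Precisions are 0.75, 0.5 and 0 (guarded): (0.75 + 0.5 + 2 * 0) / 4
        System.out.println(weightedPrecision(counter, classWeights)); // 0.3125
    }
}
```

Treating the undefined precision as 0 is one convention among several; skipping the class and its weight would be another, but that would change the meaning of the average.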

Answers

  • IngoRM
    IngoRM New Altair Community Member
    Hi,

    thanks for sending in those fixes. Both seemed very reasonable to me, and we have just incorporated them into the latest CVS developer branch. They will of course also be part of the upcoming new release.

    Thanks again and cheers,
    Ingo