"Bug in MinimalEntropyPartitioning?"
Legacy User
Hello everybody,
I get strange results when I apply MinimumEntropyPartitioning on some datasets and wonder whether this is due to a bug in the implementation.
Let me illustrate the problem: I have a dataset with one attribute ("X") and one label with two possible values.
There are 6 possible values for X, 1 to 6. In total, I have 1116 rows, with the following target label distributions:
X-value  #negatives  #positives  #rows
1.0      124         62          186
2.0      124         62          186
3.0      0           186         186
4.0      0           186         186
5.0      124         62          186
6.0      124         62          186
Now of course I would expect a discretization into [-infty,2], ]2,4], ]4,infty], with 372 rows in each range. Instead, I get:
range1 [-∞ - 2] (372), range2 [2 - 5] (558), range3 [5 - ∞] (186)
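The expectation can be checked by hand with a small sketch. It computes the size-weighted class entropy for every candidate cut point on the table above, first on the full data set and then on the upper part that remains after the first cut. The class and method names are made up for illustration and are not taken from the operator's source; the sketch also assumes the usual convention that 0 * log2(0) counts as 0.

```java
public class SplitEntropyDemo {

    // Label counts per X value (1..6), copied from the table in the post.
    static final int[] NEGATIVES = {124, 124, 0, 0, 124, 124};
    static final int[] POSITIVES = {62, 62, 186, 186, 62, 62};

    /** Entropy in bits of a (neg, pos) count pair; 0 * log2(0) is treated as 0. */
    public static double entropy(int neg, int pos) {
        double total = neg + pos;
        double h = 0.0;
        for (int count : new int[] {neg, pos}) {
            if (count > 0) {
                double p = count / total;
                h -= p * Math.log(p) / Math.log(2);
            }
        }
        return h;
    }

    /** Size-weighted entropy of indices from..to (inclusive) when cut after index cut. */
    public static double weightedEntropy(int from, int to, int cut) {
        int n1 = 0, p1 = 0, n2 = 0, p2 = 0;
        for (int i = from; i <= to; i++) {
            if (i <= cut) {
                n1 += NEGATIVES[i];
                p1 += POSITIVES[i];
            } else {
                n2 += NEGATIVES[i];
                p2 += POSITIVES[i];
            }
        }
        double size1 = n1 + p1;
        double size2 = n2 + p2;
        return (size1 * entropy(n1, p1) + size2 * entropy(n2, p2)) / (size1 + size2);
    }

    /** Index of the cut with the lowest weighted entropy (the first one wins on ties). */
    public static int bestCut(int from, int to) {
        int best = -1;
        double bestEntropy = Double.POSITIVE_INFINITY;
        for (int cut = from; cut < to; cut++) {
            double e = weightedEntropy(from, to, cut);
            if (e < bestEntropy) {
                bestEntropy = e;
                best = cut;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int first = bestCut(0, 5);          // cut on the full data set
        int second = bestCut(first + 1, 5); // cut on the remaining upper part
        System.out.println("first cut after X = " + (first + 1));
        System.out.println("second cut after X = " + (second + 1));
    }
}
```

On the full data set, the cuts after X = 2 and after X = 4 tie for the lowest weighted entropy (about 0.918 bits). On the upper part (X = 3 to 6), the cut after X = 4 is clearly the best, because it isolates the two pure values 3 and 4.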
It seems like there is a bug in the operator that does not correctly distinguish open and closed interval limits.
Does anybody know of a solution or a workaround?
Best,
Henrik
land
Hi Henrik,
this does indeed seem to be a problem. Perhaps you could add a tiny bit of noise to your values, which would resolve the non-uniqueness causing your problem.
But to solve it in general I will take a look at the code.
Greetings,
Sebastian
Legacy User
Hi Sebastian,
thanks for the reply. I also thought that the problem could be diminished if I had more continuous values. But of course it would be best if you could fix the problem in general.
Best,
Henrik
Legacy User
Hi,
in the meantime I found the bug and fixed it. The bug is in the function
private Double getMinEntropySplitpoint(LinkedList<double[]> truncatedExamples, Attribute label) {
in the class MinimalEntropyDiscretization. It does not consider the case where a split results in 0 examples of one class. Here is the fix:
// Calculate entropies.
double entropy1 = 0.0d;
for (int i = 0; i < label.getMapping().size(); i++) {
    entropy1 -= frequencies1[i] * MathFunctions.ld(frequencies1[i]);
}
double entropy2 = 0.0d;
for (int i = 0; i < label.getMapping().size(); i++) {
    entropy2 -= frequencies2[i] * MathFunctions.ld(frequencies2[i]);
}
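To make the zero-examples case concrete, here is a minimal standalone sketch of why a class frequency of 0 is a problem for an entropy sum and how a guard avoids it. It does not use the RapidMiner classes: `ld` is a local stand-in for `MathFunctions.ld`, assumed here to be a plain base-2 logarithm, and the class and method names are hypothetical.

```java
public class ZeroFrequencyEntropy {

    /** Base-2 logarithm; a stand-in for MathFunctions.ld (an assumption, not the library code). */
    public static double ld(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    /** No guard: a zero frequency yields 0 * ld(0) = 0 * -Infinity = NaN. */
    public static double unguardedEntropy(double[] frequencies) {
        double entropy = 0.0d;
        for (int i = 0; i < frequencies.length; i++) {
            entropy -= frequencies[i] * ld(frequencies[i]);
        }
        return entropy;
    }

    /** With guard: zero frequencies are skipped, following the convention 0 * ld(0) = 0. */
    public static double guardedEntropy(double[] frequencies) {
        double entropy = 0.0d;
        for (int i = 0; i < frequencies.length; i++) {
            if (frequencies[i] > 0.0d) {
                entropy -= frequencies[i] * ld(frequencies[i]);
            }
        }
        return entropy;
    }

    public static void main(String[] args) {
        // A pure partition, such as X = 3 and X = 4 in the data set: no negatives at all.
        double[] pure = {0.0, 1.0};
        System.out.println(unguardedEntropy(pure)); // NaN
        System.out.println(guardedEntropy(pure));   // 0.0
        // Any comparison with NaN is false, so a "lower than the best so far" test
        // would never select a split whose entropy came out as NaN.
        System.out.println(unguardedEntropy(pure) < 0.9); // false
    }
}
```

If the split search compares entropies with `<`, a NaN result would cause the cuts that isolate the pure values 3 and 4 to be passed over. That would be consistent with the reported boundary at 5 instead of 4, although this is only one plausible reading of the behavior.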
Best,
Henrik
IngoRM
Hi Henrik,
thanks for sending this in! We will check and integrate your suggestion as soon as possible.
Cheers,
Ingo