"[SOLVED] Bug: Distinct Values in Advanced Charts"
Q-Dog
New Altair Community Member
Hello,
I think there might be a bug in the new Advanced Charts, more precisely in "Grouping: Distinct Values".
Let's assume I have a dataset of 1000 examples and I want to create a histogram of a certain attribute a1:
- I drag a1 to the "domain" and the "range" dimension
- I select "grouping: distinct values" in the domain dimension
- I select "aggregation: count" in the range dimension
When I now sum up all the count values for the attribute, I get a total that is far less than 1000.
Is this a bug, or did I misunderstand "grouping: distinct values"?
If you want, I can either post a process or pictures showing this (or both of course).
Cheers Q-Dog
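For reference, the expectation behind this question can be written down as a small check. The snippet below is only a sketch in plain Python, not RapidMiner code: the list `a1` is made-up data standing in for the attribute, and `Counter` plays the role of "grouping: distinct values" combined with "aggregation: count".

```python
from collections import Counter
import random

# Hypothetical stand-in for attribute a1: 1000 integer values, none missing.
random.seed(0)
a1 = [random.randint(0, 20) for _ in range(1000)]

# Grouping by distinct value and counting per group is a frequency table.
counts = Counter(a1)

# Every example belongs to exactly one group, so the counts must add up
# to the number of examples.
total = sum(counts.values())
print(total == len(a1))  # True
```

If the chart reports a smaller total, either some rows were excluded before counting or the counting itself is wrong.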
Answers
Hi,
Does a1 contain missing values? Those are not counted, so the total count can be lower than the number of examples.
If you don't have missing values, we would be very interested in your process and the data, so that we can reproduce the problem.
Best regards,
Marius
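To illustrate the point about missing values, here is a minimal sketch in plain Python. It assumes missing values are represented as `None`; the data is invented, and this is not how RapidMiner stores or counts values internally.

```python
from collections import Counter

# Hypothetical attribute with two missing values (None).
a1 = [0, 0, 1, None, 2, None, 2, 2]

# Count only the non-missing values, as described in the reply above.
counts = Counter(v for v in a1 if v is not None)

counted = sum(counts.values())
missing = sum(1 for v in a1 if v is None)
print(counted, missing, len(a1))  # 6 2 8
```

The gap between the summed counts and the number of examples equals the number of missing values, which makes this cause easy to rule in or out.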
Hi Marius,
No, a1 does not contain any missing values. Is it somehow possible to attach the ExampleSet so that you can view it directly in RapidMiner (without importing the logfile first)?
Would the ".ioo" file do the job?
Anyway, here is a screenshot of my problem:
The example set has 17639 examples, but the plot shows far fewer.
The values in the x-axis are 0-163. If you assume that each value on the y-axis is 100 (which clearly is not the case), you will end up with 164*100 = 16400 < 17639.
Cheers Q-Dog
// Edit
I just checked: for example, 0 appears 177 times in my example set, but in the plot the count of 0 is only 45.
Hm, the plotters reduce the number of data points by sampling, because drawing an example set with a large number of data points would otherwise be very slow. However, when aggregation and grouping are used, it *should* not sample. Anyway, can you please try increasing the property rapidminer.gui.plotter.rows.maximum in the Gui tab of Tools->Properties in RapidMiner to a value greater than your number of examples (17639)?
Best,
Marius
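The effect described here, sampling rows before grouping and counting, can be reproduced outside RapidMiner. The following is a sketch under stated assumptions: the data is random, `max_rows` is a hypothetical limit (not the actual default of rapidminer.gui.plotter.rows.maximum), and uniform sampling without replacement is assumed rather than confirmed as the plotter's method.

```python
from collections import Counter
import random

random.seed(1)
n_examples = 17639
# Hypothetical attribute with values 0..163, as in the screenshot description.
a1 = [random.randint(0, 163) for _ in range(n_examples)]

# Hypothetical row limit applied before plotting.
max_rows = 5000
sampled = random.sample(a1, max_rows) if len(a1) > max_rows else a1

full_counts = Counter(a1)
sampled_counts = Counter(sampled)

print(sum(full_counts.values()))     # 17639
print(sum(sampled_counts.values()))  # 5000
```

With sampling in place, the per-value counts can only stay equal or shrink, and their sum equals the row limit instead of the number of examples. Raising the limit above the number of examples removes the discrepancy.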
We have now also fixed this in the code: if any of the grouping functions is set for a Plot, no sampling is applied for that Plot. The fix didn't make it into yesterday's release, though.
Best, Marius
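The rule described in this fix can be sketched as follows. This is an illustration in plain Python, not the RapidMiner source: the function `rows_for_plot` and its parameters `max_rows` and `grouping` are hypothetical names chosen for the example.

```python
import random

def rows_for_plot(rows, max_rows, grouping=None):
    """Return the rows a plot should draw from (hypothetical helper)."""
    # If a grouping function is set, keep all rows so aggregates stay exact.
    if grouping is not None:
        return list(rows)
    # Otherwise sample down to the row limit to keep drawing fast.
    if len(rows) > max_rows:
        return random.sample(rows, max_rows)
    return list(rows)

data = list(range(100))
print(len(rows_for_plot(data, 10, grouping="distinct values")))  # 100
print(len(rows_for_plot(data, 10)))                              # 10
```

The design choice is that sampling is harmless for a scatter of raw points but changes the result of any aggregation, so it is skipped whenever rows are grouped.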