Visualizing a data set with tons of small values
gerby
New Altair Community Member
Hi all, I'm new to RapidMiner and just started using it today. I have an extremely large data set that I am trying to visualize. However, most of the attributes contain lots of zeroes, so when I visualize them with a histogram, it ends up looking like this:
If I turn on the logarithmic scale on the y-axis, it still looks pretty weird to me.
So my question is: is there anything I can do to make the data look better? I thought of removing outliers, but because there are so many small values, the "outliers" end up being most of the data with higher values. I also tried splitting zeroes from non-zeroes, but since most of the values are small rather than exactly zero, it ends up looking pretty much the same. Thanks in advance!
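To illustrate why outlier removal backfires on this kind of column, here is a minimal Python sketch (run outside RapidMiner; the toy values and the `tukey_outliers` helper are hypothetical, not the poster's data or a RapidMiner operator). It assumes the outlier rule is the common 1.5 × IQR (Tukey) fence. When more than three quarters of the values are zero, both quartiles are zero, the IQR collapses to zero, and every non-zero value gets flagged.

```python
from statistics import quantiles

# Hypothetical toy column: 13 zeros plus a few small and large values.
values = [0.0] * 13 + [2.0, 40.0, 12_000.0]

def tukey_outliers(data: list[float], k: float = 1.5) -> list[float]:
    """Return the points outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]

q1, _, q3 = quantiles(values, n=4, method="inclusive")
print(q1, q3)                  # both quartiles are 0.0, so the IQR is 0
print(tukey_outliers(values))  # every non-zero value is flagged as an outlier
```

In other words, an IQR-based filter on a zero-dominated attribute removes exactly the values you most want to see, which matches what the poster observed.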
Best Answer
Hi @gerby,
Unfortunately, this is the sort of question where the answer really depends on what your end goal for visualizing the data is. If it's just to produce a visual overview of the data, then I think the logarithmic scale does a reasonable job. You'll see that each gridline corresponds to a power of 10, so moving up one tick mark corresponds to a 10x increase in count. I'd also consider filtering out values above 10k to get more granularity among the smaller values. What do you think?
Best,
Roland-1
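The two suggestions in this answer can be sketched in plain Python (outside RapidMiner; the toy values, the hypothetical `histogram` helper, and the choice of 10 equal-width bins are assumptions for illustration). A log10 y-axis plots log10(count), so a bin with 999 values sits about three gridlines above a bin with 1 value. Filtering at the 10k cutoff from the answer shrinks the range the bins must cover, so each bin gets narrower.

```python
import math

# Hypothetical toy column: many zeros, many small values, one large value.
values = [0.0] * 900 + [5.0] * 90 + [500.0] * 9 + [50_000.0]

def histogram(data: list[float], bins: int = 10) -> tuple[list[int], float]:
    """Equal-width histogram over [min, max]; returns (counts, bin_width).

    Assumes data contains at least two distinct values (width > 0).
    """
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in data:
        # Clamp the maximum value into the last bin instead of overflowing.
        i = min(int((x - lo) / width), bins - 1)
        counts[i] += 1
    return counts, width

counts, width = histogram(values)
# A log10 y-axis plots log10(count): each gridline step is a 10x larger count.
print([round(math.log10(c), 2) for c in counts if c > 0])

# Dropping values above 10k (the cutoff suggested above) shrinks the range
# the bins must cover, so each bin becomes narrower.
filtered_counts, filtered_width = histogram([x for x in values if x <= 10_000])
print(width, filtered_width)
```

With the single 50,000 value present, each bin is 5,000 wide, so the zeros, the 5s, and the 500s all land in the first bin; after filtering, bins are 50 wide and the 500s get a bin of their own.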
Answers
I see, thanks for the input. Filtering out extreme values seems to be my best bet. Thanks once again!
Thank you for sharing valuable insights.