Visualizing a data set with tons of small values

gerby New Altair Community Member
edited November 5 in Community Q&A
Hi all, I'm new to RapidMiner and just started using it today. I have an extremely large dataset that I'm trying to visualize. However, most of the attributes contain lots of zeroes, so when I visualize them with a histogram, it ends up looking like this:
Even with a logarithmic scale on the y-axis, it still looks pretty weird to me.

So my question is: is there anything I can do to make the data look better? I thought of removing outliers, but because there are so many small values, most of the data with higher values ends up being treated as outliers. I also tried splitting zeroes and non-zeroes, but since most of the values are small rather than exactly zero, it ends up looking pretty much the same. Thanks in advance!
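To illustrate why outlier removal backfires on this kind of data, here is a small plain-Python sketch using Tukey's IQR rule (flag anything above Q3 + 1.5 * IQR). This rule is an assumption for illustration only; RapidMiner's own outlier operators may use a different method. The data is synthetic, and the helper names `iqr_upper_fence` and `high_outliers` are hypothetical.

```python
import statistics

def iqr_upper_fence(values, k=1.5):
    """Tukey-style upper fence: Q3 + k * IQR (illustrative rule, not RapidMiner's)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 + k * (q3 - q1)

def high_outliers(values, k=1.5):
    """Return the values lying above the upper fence."""
    fence = iqr_upper_fence(values, k)
    return [v for v in values if v > fence]

# Hypothetical attribute (synthetic, not the poster's data):
# 60 zeros, 30 small values, and 5 large values.
values = [0] * 60 + [1] * 20 + [2] * 10 + [50, 100, 1_000, 5_000, 20_000]

# On the full column, Q1 and Q3 both fall among the zeros and ones,
# so the fence sits just above the small values and every larger value is flagged.
print(iqr_upper_fence(values), high_outliers(values))
# → 2.5 [50, 100, 1000, 5000, 20000]

# Dropping exact zeros first barely moves the fence,
# because the remaining values are still mostly small.
nonzeros = [v for v in values if v != 0]
print(iqr_upper_fence(nonzeros), high_outliers(nonzeros))
# → 3.5 [50, 100, 1000, 5000, 20000]
```

When most rows are zero or close to it, the interquartile range is tiny, so "outlier" effectively means "any value that isn't small", which matches what happens when removing outliers or splitting zeroes from non-zeroes.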

Best Answer

  • Roland Jones_21245
    Altair Employee
    Hi @gerby,

    Unfortunately, this is the sort of question where the answer really depends on your end goal for visualizing the data. If it's just to produce a visual overview of the data, then I think the logarithmic scale does a reasonable job. Each gridline on the y-axis corresponds to a power of 10, so moving up one tick mark corresponds to a 10x increase in count. I'd also consider filtering out values above 10k to get more granularity in the rest of the distribution. What do you think?

    Best,

    Roland 
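Here is a rough Python sketch of the two suggestions in the answer: filtering out values above a cutoff (10k) before binning, and reading bar heights on a log10 y-axis. It is not RapidMiner code. The equal-width binning with 10 bins, the synthetic data, and the helper names `histogram` and `log10_heights` are assumptions for illustration, and the sketch assumes non-negative values with a positive maximum.

```python
import math

def histogram(values, n_bins=10):
    """Equal-width bins from 0 to max(values); returns (bin_width, counts).

    Assumes non-negative values and max(values) > 0.
    """
    width = max(values) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum lands exactly on the right edge, so clamp it into the last bin.
        counts[min(int(v // width), n_bins - 1)] += 1
    return width, counts

def log10_heights(counts):
    """Bar heights on a log10 y-axis: +1 means 10x more rows. Empty bins get no bar."""
    return [math.log10(c) if c > 0 else None for c in counts]

# Hypothetical attribute (synthetic): mostly 0-2, plus a few large values.
values = [0] * 60 + [1] * 20 + [2] * 10 + [500, 2_000, 8_000, 9_500, 40_000]

width, counts = histogram(values)
print(width, counts)  # → 4000.0 [92, 0, 2, 0, 0, 0, 0, 0, 0, 1]

# Filter out values above the cutoff before binning:
# the same 10 bins now span a much narrower range.
CUTOFF = 10_000
kept = [v for v in values if v <= CUTOFF]
width, counts = histogram(kept)
print(width, counts)  # → 950.0 [91, 0, 1, 0, 0, 0, 0, 0, 1, 1]

print(log10_heights(counts))
```

With the cutoff, each bin covers 950 units instead of 4,000, which is the extra granularity. On the log axis, a bar of height 2 represents 10x as many rows as a bar of height 1, and empty bins simply show no bar.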

Answers

  • gerby
    New Altair Community Member
    I see, thanks for the input. I think filtering out extreme values is my best bet. Thanks once again!
  • nataliarelish
    New Altair Community Member
    Thank you for sharing valuable insights.