"Charts with more than 5000 rows"

Lior17
Lior17 New Altair Community Member
edited November 5 in Community Q&A
Hi there,

I've a data set of roughly 25k rows.
I'm trying to create a chart for it but I get a message in the studio:
"Data set was sampled down to 5000 rows to accelerate chart creation"

Now the facts:
In my license (an Educational one) I have:
 Unlimited data rows
 Unlimited logical processor(s)
I found the following configuration in the studio preferences:
User Interface -> Maximum number of rows in charts
I changed it from 5000 to 50000, but it didn't help. I tried restarting as well, still nothing.

Is this a bug, or am I unaware of something?

Thanks,

Lior.

Answers

  • David_A
    David_A New Altair Community Member
    edited January 2019
    Hi,

    no, in this case it's definitely not a bug, but a feature :wink:

    Under Settings -> User Interface -> Charts  you can find an option "Maximum number of rows in charts", which is set by default to 5k.
    But be aware that with 25k rows, depending on your data and machine, it might take a while to render the plots.

    The good news is, with the upcoming version, there will be an updated charting engine, which can better handle large plots.

    Best,
    David
  • Lior17
    Lior17 New Altair Community Member
    Hi David,

    Thank you for your reply, but as I stated in my question, setting this value to 50k hasn't helped, even after reloading the data into Turbo Prep and restarting the entire Studio.

    Best,

    Lior.
  • IngoRM
    IngoRM New Altair Community Member
    Hi Lior,
    Thanks for pointing this out.  Turbo Prep indeed uses a hard limit of 5,000 data points for the charts and takes a random sample if there are more.  It unfortunately does not use the setting mentioned by others.  This was done to ensure a smooth user experience while you are doing some basic explorations during the data prep phase.  In the long run, the charts in Turbo Prep will be replaced by the new visualization framework David mentioned above.  But for the time being, we decided to offer the old charts on a data sample to give TP users at least some charting options.
    There is a workaround though which allows you to see all the data (up to the limit defined in the setting discussed here).  It is a bit cumbersome, but may be helpful in your case.  You can "export" the data from TP and save it in one of your repositories.  A simple double click on the saved data in the repository panel will bring up the data in the results view where you will get the full charting experience.
    Sorry for this inconvenience.  Best,
    Ingo
  • jithinpaul89
    jithinpaul89 New Altair Community Member
    Hello Lior,
    You can edit the Visualizations row limit modifier to an appropriate value, such as 5.0, to get it working.
  • jacobcybulski
    jacobcybulski New Altair Community Member
    The simplest solution is of course to downsample your data to 5,000 rows before plotting. In fact, when using the legacy charts, which had no limit on the amount of data for plotting, RapidMiner would often freeze for long periods of time, or Java would crash.
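
For anyone who prepares the data outside the tool, the downsampling idea above can be sketched in plain Python. This is only an illustration of uniform random sampling without replacement, not how RapidMiner or Turbo Prep does it internally. The function name `downsample_rows` and the toy data are made up for the example; the 5,000-row cap is the one discussed in this thread.

```python
import random


def downsample_rows(rows, max_rows=5000, seed=None):
    """Return at most max_rows rows, sampled uniformly without replacement.

    Hypothetical helper for illustration; not part of any RapidMiner API.
    """
    if max_rows < 0:
        raise ValueError("max_rows must be non-negative")
    if len(rows) <= max_rows:
        # Nothing to do: the data already fits under the chart limit.
        return list(rows)
    rng = random.Random(seed)
    # Sample row indices, then sort them so the original row order is kept.
    keep = sorted(rng.sample(range(len(rows)), max_rows))
    return [rows[i] for i in keep]


# Toy data set of 25k rows, similar in size to the one in the question.
rows = [(i, i * 0.5) for i in range(25_000)]
sample = downsample_rows(rows, max_rows=5000, seed=42)
print(len(sample))  # 5000
```

Fixing the seed makes the sample reproducible, so the chart looks the same each time it is redrawn. Sorting the sampled indices keeps time-ordered data in order.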