Not normally distributed data

jeroenheijlen
jeroenheijlen New Altair Community Member
edited November 2024 in Community Q&A
Hi,
I'm trying to find a model to predict the execution time of a process step. I have data from over 200 different recurring process steps from the past 2 years (160,000 rows in an Excel sheet). When I plot the execution-time data per event, the data is not normally distributed but looks more like a Poisson distribution. Just loading the data into RapidMiner Studio and applying the models does not return a good fit. What can I do? (For data pre-processing in Python or R I would need a step-by-step guide, because I'm pretty new to all of this.)
Some help would really be appreciated!
Best regards
Jeroen  

Answers

  • lionelderkrikor
    lionelderkrikor New Altair Community Member
    Hi @jeroenheijlen,

    Have you tried submitting your data to Auto Model (the AutoML tool of RapidMiner)?

    Regards,

    Lionel
  • jeroenheijlen
    jeroenheijlen New Altair Community Member
    edited May 2020
    Hi @lionelderkrikor, thanks for your reply.
    Yes, I tried Auto Model, but even after I seriously reduced the variation in the input data, no model does a good job on my data.

  • lionelderkrikor
    lionelderkrikor New Altair Community Member
    Hi @jeroenheijlen,

    Maybe there is no relationship between your independent features and your label (your target).
    In that case, it is impossible to find a good model and machine learning is of no use...
    In the meantime, you can try to:
     - enable feature selection / feature generation in the options of AutoModel
     - for your best models, you can tune hyper-parameters to try to increase the accuracy/decrease the error rate.
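
One rough way to check whether the features carry any signal at all, before spending time on tuning, is to compare two trivial baselines. This is only a sketch, not RapidMiner functionality: the `(step_name, execution_time)` row layout, the function names and the sample values are made up for illustration, and both errors are measured on the same rows the medians were computed from, so they are optimistic.

```python
from collections import defaultdict
from statistics import median


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def baseline_errors(rows):
    """Compare two trivial predictors on (step_name, execution_time) pairs.

    Returns (error when always predicting the overall median,
             error when predicting the median of the row's own step).
    """
    times = [t for _, t in rows]
    global_median = median(times)

    # Group execution times by process step.
    by_step = defaultdict(list)
    for step, t in rows:
        by_step[step].append(t)
    step_median = {step: median(values) for step, values in by_step.items()}

    mae_global = mean_absolute_error(times, [global_median] * len(times))
    mae_step = mean_absolute_error(times, [step_median[s] for s, _ in rows])
    return mae_global, mae_step


# Hypothetical sample: two process steps with clearly different durations (hours).
rows = [("A", 1.0), ("A", 1.5), ("A", 2.0), ("B", 8.0), ("B", 9.0), ("B", 10.0)]
mae_global, mae_step = baseline_errors(rows)
print(f"global median MAE: {mae_global:.2f}, per-step median MAE: {mae_step:.2f}")
```

If the per-step error is not clearly lower than the global one, the step identity alone explains little of the variation, and any trained model should at least beat both numbers to be worth keeping.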

    Regards,

    Lionel
  • jeroenheijlen
    jeroenheijlen New Altair Community Member
    Hi @lionelderkrikor
    I'm indeed afraid the variation within each process step is too large, and therefore no model can find a correlation or a predictive fit.
    Thanks for your advice.
    I will try a few more things (automatic feature selection fails), such as starting with a smaller dataset (data from only a few of the process steps, with more of the outliers removed, although the data will still never be normally distributed) and also recasting the target as a binary outcome (for example, more than 2 hours versus less than 2 hours).
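
The binary recoding described above could look roughly like this in Python. It is a sketch under assumptions: execution times are taken to be in hours, the 2-hour cut-off is the one mentioned in the post, and the function names are made up. The log transform is not something proposed in this thread; it is included only as a common option for right-skewed, non-negative durations when a numeric target is still wanted.

```python
import math

THRESHOLD_HOURS = 2.0  # cut-off taken from the post; adjust to the real data


def binarize(hours, threshold=THRESHOLD_HOURS):
    """Label each execution time as 'long' (above threshold) or 'short'."""
    return ["long" if h > threshold else "short" for h in hours]


def log_transform(hours):
    """Compress the long right tail with log(1 + x); defined for x >= 0."""
    return [math.log1p(h) for h in hours]


def inverse_log_transform(values):
    """Map predictions made on the log scale back to hours."""
    return [math.expm1(v) for v in values]


# Hypothetical execution times in hours.
times = [0.5, 1.0, 2.0, 3.5, 12.0]
labels = binarize(times)
log_times = log_transform(times)
print(labels)
print([round(v, 3) for v in log_times])
```

With the binary label the task becomes classification instead of regression, so it makes sense to check how many rows fall in each class before training, since a very lopsided split makes accuracy misleading.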

    If I ever succeed, I will post the outcome ;-).
    Best regards
    Jeroen 
  • lionelderkrikor
    lionelderkrikor New Altair Community Member
    You're welcome, @jeroenheijlen.

    Good luck!

    regards,

    Lionel
