Not normally distributed data

User: "jeroenheijlen"
New Altair Community Member
Updated by Jocelyn
Hi,
I'm trying to find a model that predicts the execution time of a process step. I have data from over 200 different recurring process steps from the past 2 years (160,000 rows in an Excel sheet). When I plot the execution-time data per event, the data is not normally distributed but looks more like a Poisson distribution. Just loading the data into RapidMiner Studio and applying the models does not return a good fit. What can I do? (For data pre-processing in Python or R, I would need a step-by-step guide because I'm pretty new to all of this.)
Some help would really be appreciated!
Best regards
Jeroen  

    User: "lionelderkrikor"
    New Altair Community Member
    Hi @jeroenheijlen,

    Have you tried submitting your data to Auto Model (RapidMiner's AutoML tool)?

    Regards,

    Lionel
    User: "jeroenheijlen"
    New Altair Community Member
    OP
    Updated by jeroenheijlen
    Hi @lionelderkrikor , thanks for your reply.
    Yes, sure, I tried Auto Model, but even after I seriously reduced the variation in the input data, no model does a good job on my data.

    User: "lionelderkrikor"
    New Altair Community Member
    Hi @jeroenheijlen,

    Maybe there is no relationship between your independent features and your label (your target).
    In that case, it is impossible to find a good model, and machine learning is of no use...
    In the meantime, you can try to:
     - enable feature selection / feature generation in the Auto Model options
     - tune the hyper-parameters of your best models to try to increase the accuracy / decrease the error rate.
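
    Before tuning, one rough way to check whether a categorical feature such as the process step carries any signal is to measure how much of the variance in execution time it explains (the correlation ratio, eta squared). This is only a sketch in plain Python, not something RapidMiner does for you; the function name and the toy numbers are made up for illustration.

```python
from statistics import fmean

def variance_explained_by_group(rows):
    """Return eta squared: between-group sum of squares / total sum of squares.

    rows is a list of (group, value) pairs, e.g. (process_step, hours).
    A result near 1 means the group almost determines the value;
    a result near 0 means the group tells you nothing about it.
    """
    values = [value for _, value in rows]
    grand_mean = fmean(values)
    total_ss = sum((value - grand_mean) ** 2 for value in values)
    if total_ss == 0:
        return 0.0  # all values identical, nothing to explain

    groups = {}
    for group, value in rows:
        groups.setdefault(group, []).append(value)

    between_ss = sum(
        len(members) * (fmean(members) - grand_mean) ** 2
        for members in groups.values()
    )
    return between_ss / total_ss

# Toy data (hypothetical): the step clearly separates the durations.
informative = [("A", 1.0), ("A", 1.2), ("A", 0.8),
               ("B", 5.0), ("B", 5.2), ("B", 4.8)]
# Toy data (hypothetical): both steps have the same mean and a wide spread.
uninformative = [("A", 1.0), ("A", 5.0), ("A", 3.0),
                 ("B", 3.0), ("B", 1.0), ("B", 5.0)]

eta_informative = variance_explained_by_group(informative)
eta_uninformative = variance_explained_by_group(uninformative)
```

    If the value stays close to 0 for every feature you have, that supports the suspicion that the inputs simply do not explain the execution time.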

    Regards,

    Lionel
    User: "jeroenheijlen"
    New Altair Community Member
    OP
    Hi @lionelderkrikor
    I'm indeed afraid that the variation within each of the process steps is too large, and that therefore no model can find a correlation or a predictive fit.
    Thanks for your advice.
    I will try a few more things (automatic feature selection fails), such as starting with a smaller dataset (data from only a few of the process steps, with more of the outliers removed, although the data will still never be normally distributed) and also recasting the target as a binary outcome (for example, more than 2 hours versus less than 2 hours).
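
    For anyone who wants to try the same two steps in Python, here is a minimal sketch using only the standard library. It assumes each row is a (process step, duration in hours) pair; the 1.5 x IQR outlier rule, the function names, and the sample values are my own assumptions for illustration, and the 2-hour threshold is just the example mentioned above.

```python
from statistics import quantiles

def remove_outliers_iqr(rows, k=1.5):
    """Drop rows whose duration falls outside [Q1 - k*IQR, Q3 + k*IQR],
    with the quartiles computed separately for each process step."""
    by_step = {}
    for step, hours in rows:
        by_step.setdefault(step, []).append(hours)

    bounds = {}
    for step, values in by_step.items():
        if len(values) < 4:
            # Too few points to estimate quartiles: keep everything.
            bounds[step] = (float("-inf"), float("inf"))
            continue
        q1, _, q3 = quantiles(values, n=4, method="inclusive")
        iqr = q3 - q1
        bounds[step] = (q1 - k * iqr, q3 + k * iqr)

    return [(step, hours) for step, hours in rows
            if bounds[step][0] <= hours <= bounds[step][1]]

def to_binary_label(hours, threshold=2.0):
    """Turn a duration into a two-class label around the threshold."""
    return "long" if hours > threshold else "short"

# Hypothetical sample: step A has one extreme value (9.0 hours).
rows = [("A", 0.5), ("A", 0.6), ("A", 0.7), ("A", 0.8), ("A", 9.0),
        ("B", 2.6), ("B", 2.8), ("B", 2.9), ("B", 3.0), ("B", 3.2)]

cleaned = remove_outliers_iqr(rows)
labelled = [(step, hours, to_binary_label(hours)) for step, hours in cleaned]
```

    The labelled rows could then be exported and used as a classification target instead of a regression target.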

    If I ever succeed, I will post the outcome ;-).
    Best regards
    Jeroen 
    User: "lionelderkrikor"
    New Altair Community Member
    You're welcome, @jeroenheijlen.

    Good luck!

    Regards,

    Lionel