Not normally distributed data
jeroenheijlen
Hi,
I'm trying to find a model that predicts the execution time of a process step. I have data on over 200 different recurring process steps from the past 2 years (160,000 rows in an Excel sheet). When I plot the execution-time data per event, it is not normally distributed but looks more like a Poisson distribution. Just loading the data into RapidMiner Studio and applying the models does not give a good fit. What can I do? (For data pre-processing in Python or R, I would need a step-by-step guide, because I'm pretty new to all of this.)
Some help would really be appreciated!
Best regards
Jeroen
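Right-skewed targets like execution times (many short runs, a few very long ones) are often easier to model after a log transform. A Poisson distribution describes counts, so for continuous times a log transform is a common first step. Below is a minimal sketch, assuming positive execution times in minutes. The sample values and the helper `skewness` are hypothetical and only for illustration. The sketch compares the skewness of the raw times with that of `log1p(time)`. A model would then be trained on the transformed target, and its predictions would be mapped back to minutes with `expm1`.

```python
import math
import statistics


def skewness(values):
    """Fisher-Pearson skewness using population moments; 0 for a symmetric sample."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


# Hypothetical execution times (minutes) for one process step: many short runs, a few long ones.
times = [12, 15, 14, 18, 20, 22, 25, 30, 35, 45, 60, 90, 150, 240]

# log1p compresses the long right tail; expm1 is its exact inverse.
log_times = [math.log1p(t) for t in times]

print(f"skewness raw: {skewness(times):.2f}")
print(f"skewness log: {skewness(log_times):.2f}")

# A prediction made on the log scale (here simply the mean) is mapped back to minutes with expm1.
predicted_log = statistics.fmean(log_times)
print(f"back-transformed prediction: {math.expm1(predicted_log):.1f} min")
```

A back-transformed log-scale mean sits below the arithmetic mean of the raw times, closer to a typical run. Whether that is desirable depends on which error measure matters for the prediction.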
lionelderkrikor
Hi @jeroenheijlen,
Have you tried submitting your data to Auto-Model (RapidMiner's AutoML tool)?
Regards,
Lionel
jeroenheijlen
Hi @lionelderkrikor, thanks for your reply.
Yes, I tried Auto-Model, but even after I had already substantially reduced the variation in the input data, none of the models does a good job on my data.
lionelderkrikor
Hi @jeroenheijlen,
Maybe there is no relationship between your independent features and your label (your target).
In that case, it is impossible to find a good model, and machine learning is of no use.
In the meantime, you can try to:
- enable feature selection / feature generation in the Auto-Model options;
- tune the hyperparameters of your best models to try to increase the accuracy or decrease the error rate.
Regards,
Lionel
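One way to test the "no relationship" suspicion before more tuning is to compare against trivial baselines on held-out data. Below is a minimal sketch, assuming each row is a (process step id, execution time in minutes) pair. The data is synthetic, and the names `per_step_median_model` and the step ids "A"/"B" are hypothetical. The sketch compares predicting the global median against predicting the median of each process step, scored by mean absolute error (MAE). The median is the constant that minimizes absolute error. If a trained model cannot clearly beat the per-step median, the other features probably add little beyond the step id.

```python
import random
import statistics
from collections import defaultdict


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def per_step_median_model(train):
    """Fit the median time per process step; unseen steps fall back to the global median."""
    by_step = defaultdict(list)
    for step, minutes in train:
        by_step[step].append(minutes)
    medians = {step: statistics.median(v) for step, v in by_step.items()}
    fallback = statistics.median(m for _, m in train)
    return lambda step: medians.get(step, fallback)


# Hypothetical, synthetic data: two process steps with skewed (log-normal) execution times.
rng = random.Random(0)
rows = [("A", rng.lognormvariate(3.0, 0.8)) for _ in range(300)] + \
       [("B", rng.lognormvariate(4.5, 0.8)) for _ in range(300)]
rng.shuffle(rows)
train, test = rows[:450], rows[450:]  # simple holdout split

global_median = statistics.median(m for _, m in train)
predict = per_step_median_model(train)

y_true = [m for _, m in test]
baseline_mae = mae(y_true, [global_median] * len(test))
step_mae = mae(y_true, [predict(s) for s, _ in test])
print(f"global-median MAE: {baseline_mae:.1f}  per-step MAE: {step_mae:.1f}")
```

On real data the same comparison would use the actual process steps and a time-based split (train on older rows, test on newer ones), since the goal is to predict future executions.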
jeroenheijlen
Hi @lionelderkrikor,
I'm indeed afraid the variation within each process step is too large, and therefore no model can find a correlation or a good predictive fit.
Thanks for your advice.
I will try a few more things (automatic feature selection fails), such as starting with a smaller dataset (data from only a few of the process steps) and removing more of the outliers, though the data will still never be normally distributed. I will also try framing the target as a binary outcome (more than 2 hours vs. less than 2 hours, or similar).
If I ever succeed, I will post the outcome ;-).
Best regards
Jeroen
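The two preparation steps mentioned above, dropping outliers per process step and turning the target into a two-class label, could look roughly like this. It is a minimal sketch, assuming rows of (process step id, minutes). The function names are hypothetical. The outlier rule is Tukey's IQR fences with the common `k=1.5`, which is an assumption and not something specified in the thread. Fences are computed per step because each step has its own typical duration.

```python
import statistics
from collections import defaultdict


def iqr_filter(rows, k=1.5):
    """Keep rows whose time lies within [Q1 - k*IQR, Q3 + k*IQR] of their own process step."""
    by_step = defaultdict(list)
    for step, minutes in rows:
        by_step[step].append(minutes)
    bounds = {}
    for step, values in by_step.items():
        q1, _, q3 = statistics.quantiles(values, n=4)  # needs at least 2 values per step
        iqr = q3 - q1
        bounds[step] = (q1 - k * iqr, q3 + k * iqr)
    return [(s, m) for s, m in rows if bounds[s][0] <= m <= bounds[s][1]]


def to_binary_label(rows, threshold_minutes=120):
    """Replace the numeric target with 'long' (above the threshold) or 'short'."""
    return [(s, "long" if m > threshold_minutes else "short") for s, m in rows]


# Hypothetical example: step A has one extreme run (600 min) that the per-step fences remove.
rows = [("A", 30), ("A", 35), ("A", 40), ("A", 45), ("A", 50), ("A", 600),
        ("B", 100), ("B", 110), ("B", 130), ("B", 140), ("B", 150)]
cleaned = iqr_filter(rows)
labels = to_binary_label(cleaned)
print(labels)
```

With a two-class label, a classifier's accuracy should be compared with the share of the majority class. If most runs are "short", always predicting "short" already scores well.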
lionelderkrikor
You're welcome, @jeroenheijlen.
Good luck !
Regards,
Lionel