Design a model to do data cleaning
JoeJoe
New Altair Community Member
I have a big data set with over 100 thousand instances. Can someone suggest a model to help me with the data cleaning? Thanks!
Answers
-
Hi @JoeJoe,
Do you have access to Turbo Prep in RapidMiner?
If yes, you can go to CLEANSE --> AUTO CLEANSING.
Hope this helps,
Regards,
Lionel
-
Thx, I'm gonna try it!
-
Thx! I also noticed that auto cleansing has two options: PCA and normalization. Which one should I choose if I want to design a template for association rules?
-
Hi,
Probably neither of those two settings would be best. However, for association rules you need binary input data, so you should first clean the data (without those two settings) and then discretize all numerical attributes into binary bins. Finally, you may need to perform one-hot encoding for nominal attributes with more than two values. The cut-off points for discretization, and which value counts as positive vs. negative, will depend on the business problem you want to solve.
Best,
Ingo
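
For anyone who wants to see the clean, discretize, one-hot sequence spelled out, here is a minimal sketch in plain Python. It is not a RapidMiner process and not what Turbo Prep does internally. The function name `to_binary_items`, the "drop any record with a missing value" cleaning policy, and the median as the default cut-off are all assumptions made for illustration; your own cut-offs should come from the business problem.

```python
from statistics import median


def to_binary_items(rows, numeric_cutoffs=None):
    """Turn records (dicts with identical keys) into True/False columns.

    Hypothetical helper for illustration only.
    """
    # 1. Basic cleaning: drop records with any missing value (assumed policy).
    clean = [r for r in rows if all(v is not None for v in r.values())]
    if not clean:
        return []

    numeric_cutoffs = numeric_cutoffs or {}
    encoded = [dict() for _ in clean]

    for col in clean[0]:
        values = [r[col] for r in clean]
        is_numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in values
        )
        if is_numeric:
            # 2. Discretize into two bins around a cut-off.
            #    The median is only a fallback when no cut-off is given.
            cut = numeric_cutoffs.get(col, median(values))
            for out, v in zip(encoded, values):
                out[f"{col}>{cut}"] = v > cut
        else:
            # 3. One-hot encode nominal values: one True/False column per level.
            #    A two-valued nominal simply yields two complementary columns.
            for level in sorted(set(values)):
                for out, v in zip(encoded, values):
                    out[f"{col}={level}"] = v == level
    return encoded
```

Calling `to_binary_items(rows, {"age": 35})` would split a hypothetical `age` column at 35 and fall back to the median for every other numerical column. The resulting all-binary records are the kind of input an association rule learner expects.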