Design a model to do data cleaning

JoeJoe
JoeJoe New Altair Community Member
edited November 2024 in Community Q&A
I have a big data set with over 100 thousand instances. Can someone suggest a model to help me do the data cleaning? Thanks!

Answers

  • lionelderkrikor
    lionelderkrikor New Altair Community Member
    Answer ✓
    Hi @JoeJoe,

    Do you have access to Turbo Prep inside RapidMiner?

    If yes, you can go to CLEANSE --> AUTO CLEANSING.

    Hope this helps,

    Regards,

    Lionel
  • JoeJoe
    JoeJoe New Altair Community Member
    Thx, I'm gonna try it!
  • JoeJoe
    JoeJoe New Altair Community Member
    Thx! I also notice that auto cleansing has two options: PCA and normalization. Which one should I choose if I want to design a template for association rules?
  • IngoRM
    IngoRM New Altair Community Member
    Answer ✓
    Hi,
    Probably neither of those settings would be best.  For association rules you need binary input data, so you should first clean the data (without those two settings) and then discretize all numerical attributes into binary bins.  Finally, you may need to perform one-hot encoding for nominal attributes with more than two values.  The cut-off points for discretization, and which value counts as positive vs. negative, will depend on the business problem you want to solve.
    Best,
    Ingo
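
For readers who want to see the preparation steps from the answer above outside of Turbo Prep, here is a minimal Python sketch. It is not RapidMiner's implementation. The function name `to_binary_items`, the sample records, and the cut-off values are all hypothetical, and dropping records with missing values stands in for whatever cleaning your data actually needs.

```python
def to_binary_items(rows, cutoffs):
    """Convert cleaned records into sets of binary items, one set per record.

    cutoffs maps each numerical attribute to an illustrative cut-off point.
    """
    transactions = []
    for row in rows:
        items = set()
        for name, value in row.items():
            if name in cutoffs:
                # Numerical attribute: one binary bin. Treating "above the
                # cut-off" as the positive value is an assumption.
                if value > cutoffs[name]:
                    items.add(f"{name}=high")
            else:
                # Nominal attribute: one-hot style, one item per value.
                items.add(f"{name}={value}")
        transactions.append(items)
    return transactions


# Hypothetical sample data; None marks a missing value.
raw = [
    {"age": 23, "income": 30000, "segment": "a"},
    {"age": 45, "income": 82000, "segment": "b"},
    {"age": None, "income": 51000, "segment": "c"},
    {"age": 61, "income": 47000, "segment": "a"},
]

# Step 1: clean (here simply drop records with any missing value).
cleaned = [r for r in raw if all(v is not None for v in r.values())]

# Steps 2 and 3: discretize numericals and one-hot encode nominals.
transactions = to_binary_items(cleaned, {"age": 40, "income": 50000})
```

Each resulting set lists the items that are "true" for one record, which is the binary form an association rule learner expects. Which side of a cut-off counts as the positive item is a modeling choice that depends on the business problem.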