Is it worth it to apply TurboPrep?

tonyboy9
tonyboy9 New Altair Community Member
edited November 5 in Community Q&A
I finished reading RapidMiner's "What you need to know about Data Preparation."
In the real world of work, it's good to know how screwed up data can present itself. When I import a data set in RapidMiner Studio, at the bottom right it says, "No problems." To me that says the data set is clean for learning purposes. Is it worth it to apply TurboPrep?

Best Answers

  • David_A
    David_A New Altair Community Member
    Answer ✓

    The words "No problems" can have a lot of meanings. In the case of importing your data, they simply state that this particular task went well (no corrupted files, wrong data types, unreadable date formats, and so on).

    At this stage, RapidMiner can't give you any feedback on the actual data quality or on whether it's worth applying TurboPrep (big guess: it probably is). How to prepare and improve a data set always depends on the actual use case. In some cases you might want to keep the data as raw as possible (for education, to define a baseline for improvement, for compliance, and so on). In other cases, you might spend 90% of your time on data preparation and understanding to solve a particular problem.

    To summarize:
    "No problems" during import is only a technical statement that the import ran smoothly. Data preparation is a separate, independent step that comes afterwards.

    Best,
    David
  • tonyboy9
    tonyboy9 New Altair Community Member
    Answer ✓
    Nicely phrased response indeed. Thank you for your time.
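
To make the distinction above concrete, here is a minimal sketch in plain Python (not RapidMiner or TurboPrep code). The sample data, the column names, and the `quality_report` helper are all made up for illustration. The file parses without a single error, which is roughly what "No problems" at import means, yet a few simple checks still turn up quality issues that a preparation step would need to handle.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data: it parses cleanly, but the content is messy.
RAW = """id,age,city
1,34,Berlin
2,,Paris
3,29,berlin
3,29,berlin
4,-5,Rome
"""


def quality_report(text):
    """Count a few common data quality issues in CSV text (illustrative only)."""
    # Parsing succeeds: this is the "import went well" part.
    rows = list(csv.DictReader(io.StringIO(text)))

    # Empty cells count as missing values.
    missing = sum(1 for r in rows for v in r.values() if v == "")

    # Rows that repeat an earlier row exactly.
    row_counts = Counter(tuple(r.values()) for r in rows)
    duplicates = sum(c - 1 for c in row_counts.values())

    # Values that are well-formed numbers but make no sense for an age.
    negative_ages = sum(1 for r in rows if r["age"] and int(r["age"]) < 0)

    # City names that differ only by letter case ("Berlin" vs "berlin").
    cities = {r["city"] for r in rows}
    inconsistent_case = len(cities) - len({c.lower() for c in cities})

    return {
        "rows": len(rows),
        "missing_values": missing,
        "duplicate_rows": duplicates,
        "negative_ages": negative_ages,
        "inconsistent_city_case": inconsistent_case,
    }


report = quality_report(RAW)
print(report)
```

Which of these findings actually matter still depends on the use case, as noted in the answer: for a baseline you might leave the data untouched, while for a production model you would probably fix all of them.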
