Finding an incorrect grading pattern

marketa_vackova New Altair Community Member
edited November 5 in Community Q&A

I was given a labelled data set and was told that a few of the labels are wrongly assigned, i.e. some of the examples were graded inaccurately. I'm supposed to find which ones. Which tool in RapidMiner should I use?

I tried the operator Find Outliers (Density), but somehow I feel that is not the one I'm looking for.

Thank you very much for any advice. Markéta

Answers

  • IngoRM
    IngoRM New Altair Community Member

    Here is an idea: you could train a model that generalizes well on the data set (no overfitting, no k-NN with only 1 neighbor, you get the idea...) and then apply this model to the training data set again. Whenever the prediction differs from the label, that example could be a good candidate for a wrongly assigned label.

     

    Just my 2c,

    Ingo
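To make the idea above concrete outside of RapidMiner, here is a minimal Python sketch. It assumes purely numeric attributes and uses a nearest-centroid classifier as a stand-in for "a model that generalizes well"; that model choice, the function names, and the toy data are all illustrative assumptions, not something taken from the thread or from RapidMiner.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def fit_centroids(rows, labels):
    """Train a deliberately simple model: one mean point per class."""
    sums = {}
    counts = defaultdict(int)
    for row, label in zip(rows, labels):
        if label not in sums:
            sums[label] = [0.0] * len(row)
        for i, value in enumerate(row):
            sums[label][i] += value
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def predict(centroids, row):
    """Predict the class whose centroid is closest to the row."""
    return min(centroids, key=lambda label: dist(row, centroids[label]))


def suspect_labels(rows, labels):
    """Apply the model to its own training data and return the indices
    where the prediction disagrees with the given label."""
    centroids = fit_centroids(rows, labels)
    return [
        i
        for i, (row, label) in enumerate(zip(rows, labels))
        if predict(centroids, row) != label
    ]


# Hypothetical toy data: two well-separated groups, with the labels of
# examples 3 and 7 swapped on purpose.
rows = [
    (1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (1.1, 1.0),
    (5.0, 5.2), (5.1, 4.9), (4.8, 5.0), (5.2, 5.1),
]
labels = ["low", "low", "low", "high", "high", "high", "high", "low"]

suspects = suspect_labels(rows, labels)
print(suspects)  # [3, 7]
```

The flagged indices are only candidates for manual review. A simple model is used on purpose: a very flexible one would memorize the wrong labels and report no disagreements at all.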

  • Telcontar120
    Telcontar120 New Altair Community Member

    Another potential approach would be to run a clustering analysis on each labeled class separately and then look for individual outliers that way.
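A rough Python sketch of this per-class check follows. As a simplification it treats each class as a single cluster with a median-based center instead of running a full clustering algorithm, and it flags examples that lie much farther from that center than is typical for their class. The function name, the `factor` threshold, and the toy data are assumptions made for illustration only.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+
from statistics import median


def per_class_outliers(rows, labels, factor=3.0):
    """Return indices of examples that are unusually far from the
    center of their own labeled class.

    Assumption: each class forms one compact cluster, so a single
    coordinate-wise median serves as its center. `factor` is a
    hypothetical tuning knob, not a recommended value.
    """
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    flagged = []
    for indices in by_class.values():
        members = [rows[i] for i in indices]
        # The median is less distorted by a few mislabeled points
        # than the mean would be.
        center = [median(column) for column in zip(*members)]
        distances = [dist(row, center) for row in members]
        typical = median(distances)
        for i, d in zip(indices, distances):
            if d > factor * typical:
                flagged.append(i)
    return sorted(flagged)


# Hypothetical toy data: two well-separated groups, with the labels of
# examples 3 and 7 swapped on purpose.
rows = [
    (1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (1.1, 1.0),
    (5.0, 5.2), (5.1, 4.9), (4.8, 5.0), (5.2, 5.1),
]
labels = ["low", "low", "low", "high", "high", "high", "high", "low"]

outliers = per_class_outliers(rows, labels)
print(outliers)  # [3, 7]
```

If a class really consists of several sub-groups, a single center is too crude; in that case a proper clustering per class, with outliers measured against the nearest cluster center, would be closer to what is described above.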