Finding an incorrect grading pattern

marketa_vackova New Altair Community Member
edited November 2024 in Community Q&A

I was given a labelled data set and told that a few of the labels are wrongly assigned, i.e. some of the examples were graded inaccurately. I'm supposed to find which ones. Which tool in RapidMiner should I use?

I tried the operator Find Outliers (Density), but somehow I feel that is not the one I'm looking for.

Thank you very much for any advice. Markéta

Answers

  • IngoRM
    IngoRM New Altair Community Member

    Here is an idea: you could train a model on the data set that generalizes well (no overfitting, no k-NN with only 1 neighbor, you get the idea...) and then apply this model to the training data set again. Whenever the prediction differs from the label, that example is a good candidate for being wrongly labeled.

     

    Just my 2c,

    Ingo
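
For readers who want to see the idea outside of RapidMiner, here is a minimal Python sketch of it. It is only an illustration under assumptions: the toy data, the choice of a nearest-centroid classifier as the "simple model that does not overfit", and the function names `fit_centroids`, `predict` and `suspect_labels` are all made up for this example and are not RapidMiner operators.

```python
import math
from collections import defaultdict


def fit_centroids(rows, labels):
    """Train a deliberately simple model: one mean vector per class."""
    sums = {}
    counts = defaultdict(int)
    for row, label in zip(rows, labels):
        if label not in sums:
            sums[label] = [0.0] * len(row)
        for i, value in enumerate(row):
            sums[label][i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in vec] for label, vec in sums.items()}


def predict(centroids, row):
    """Predict the class whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(row, centroids[label]))


def suspect_labels(rows, labels):
    """Apply the model back to the training data; return indices where it disagrees."""
    centroids = fit_centroids(rows, labels)
    return [i for i, (row, label) in enumerate(zip(rows, labels))
            if predict(centroids, row) != label]


# Hypothetical toy data: two well-separated groups, with rows 5 and 11
# given the wrong label on purpose.
rows = [
    [1.0, 1.1], [0.9, 1.0], [1.2, 0.8], [1.1, 1.2], [0.8, 0.9], [1.0, 1.0],
    [5.0, 5.1], [5.2, 4.9], [4.8, 5.0], [5.1, 5.2], [4.9, 4.8], [5.0, 5.0],
]
labels = ["A", "A", "A", "A", "A", "B",
          "B", "B", "B", "B", "B", "A"]

suspects = suspect_labels(rows, labels)
print(suspects)  # [5, 11]
```

The flagged rows are only candidates for manual review; a disagreement can also mean the model is too simple for that region of the data.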

  • Telcontar120
    Telcontar120 New Altair Community Member

    Another potential approach would be to run a clustering analysis on each labeled class separately and then look for individual outliers that way.
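
A rough Python sketch of this per-class clustering idea follows. Everything in it is an assumption made for illustration: the tiny k-means implementation, the choice of `k=2`, the `min_share` threshold for calling a cluster "suspiciously small", the toy data, and the names `kmeans` and `suspects_by_class`. None of it corresponds to a specific RapidMiner operator.

```python
import math


def kmeans(points, k=2, iterations=20):
    """Tiny k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from its nearest existing center.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest center, then recompute the centers.
        assignment = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                      for p in points]
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def suspects_by_class(rows, labels, k=2, min_share=0.2):
    """Cluster each class on its own; flag members of very small clusters."""
    flagged = []
    for cls in sorted(set(labels)):
        idx = [i for i, label in enumerate(labels) if label == cls]
        assignment = kmeans([rows[i] for i in idx], k)
        for j in range(k):
            members = [i for i, a in zip(idx, assignment) if a == j]
            # A cluster holding only a small share of its class looks like outliers.
            if 0 < len(members) < min_share * len(idx):
                flagged.extend(members)
    return sorted(flagged)


# Hypothetical toy data: rows 5 and 11 carry the wrong label on purpose.
rows = [
    [1.0, 1.1], [0.9, 1.0], [1.2, 0.8], [1.1, 1.2], [0.8, 0.9], [1.0, 1.0],
    [5.0, 5.1], [5.2, 4.9], [4.8, 5.0], [5.1, 5.2], [4.9, 4.8], [5.0, 5.0],
]
labels = ["A", "A", "A", "A", "A", "B",
          "B", "B", "B", "B", "B", "A"]

suspects = suspects_by_class(rows, labels)
print(suspects)  # [5, 11]
```

On clean classes the two clusters tend to be of comparable size and nothing is flagged; the threshold would need tuning on real data, where classes may legitimately contain small subgroups.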

     

     
