"discarding attributes with many missing values"

dan_agape New Altair Community Member
edited November 2024 in Community Q&A

Hi there

I'm just enquiring whether there is a preprocessing operator that discards attributes with more missing values than a specified threshold (given as a percentage, for instance).

Thanks!
Dan

Answers

  • land
    land New Altair Community Member
    Hi Dan,
    I think you can use the Remove Useless Attributes operator if the missing values exceed the number of identical nominal values.

    Anyway, you could post a feature request on our bug tracker, since I think a dedicated "less than x% missing values" filter makes absolute sense.

    Greetings,
      Sebastian
  • IngoRM
    IngoRM New Altair Community Member
    Hi,

    You are right, such an operator would be nice. Using our new Community Extension, I have uploaded a process that performs exactly the desired task. It is called "Discard Attribute with More than x% Missing Values (Loops + Macros)", and you can download and execute it with a few clicks after installing our new myExperiment Community Extension from the Help menu of RapidMiner.

    This process loops over all attributes and calculates the fraction of missing values for each attribute. If this fraction is larger than the fraction defined in the first "Set Macro" operator (macro: max_unknown), the attribute is removed from the example set.

    Cheers,
    Ingo
  • dragoljub
    dragoljub New Altair Community Member
    Hey, how can we access this operator? Do we have to sign up for a service?

    Thanks,
    -Gagi
  • land
    land New Altair Community Member
    Hi,
    In fact, Ingo uploaded a complete process, not a single operator. You can download the Community Extension as usual with the Update Manager, and you don't have to sign in to the community itself to download publicly available processes.

    Greetings,
      Sebastian
  • dragoljub
    dragoljub New Altair Community Member
    Thanks Guys,

    For some reason I did not see the list of public processes. This will help a lot.

    While this works, it seems very cumbersome. Is there any way to extract the metadata and filter based on the number of missing values? ;)

    Thanks,
    -Gagi
  • land
    land New Altair Community Member
    Hi,
    I guess Ingo wouldn't have posted this process if an easier way existed without coding on either your side or ours. If you find an easier solution or extend RapidMiner on your own, please keep the community informed about this issue.
    Greetings,
      Sebastian
  • haddock
    haddock New Altair Community Member
    Greets Seb,

    I must be missing something: wouldn't transposing the data and applying Ingo's stuff work?

    Just a thought.

    Ciao
  • wanglu2014
    wanglu2014 New Altair Community Member

    Thanks for your suggestion. However, I ran into two problems:

    1. The Community Extension is installed; however, no operators were added.

    2. At https://www.myexperiment.org/workflows/1276/versions/1.html, only a .txt file is downloaded, and it cannot be opened as XML.
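The logic of Ingo's process (loop over all attributes, compute each attribute's fraction of missing values, remove it when that fraction is larger than max_unknown) can be sketched outside RapidMiner as follows. This is a minimal Python illustration, not the RapidMiner process itself. Assumptions: the example set is a plain list of rows, `None` marks a missing value, and max_unknown is a fraction (0.5 = 50%). The function names are hypothetical and not part of any RapidMiner API. The first function also shows the "extract metadata, then filter" idea (per-attribute missing fractions), and the last one follows the transposition idea: attributes become rows, rows with too many missing values are dropped, and the result is transposed back.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple

# Hypothetical data model: one list per example, None marks a missing value.
Row = List[Optional[Any]]


def missing_fractions(attributes: Sequence[str], rows: Sequence[Row]) -> Dict[str, float]:
    """Metadata step: fraction of missing (None) values per attribute."""
    n = len(rows)
    fractions = {}
    for j, name in enumerate(attributes):
        missing = sum(1 for row in rows if row[j] is None)
        fractions[name] = missing / n if n else 0.0
    return fractions


def discard_sparse_attributes(
    attributes: Sequence[str], rows: Sequence[Row], max_unknown: float
) -> Tuple[List[str], List[Row]]:
    """Loop-style filter: keep an attribute only if its missing fraction <= max_unknown."""
    fractions = missing_fractions(attributes, rows)
    keep = [j for j, name in enumerate(attributes) if fractions[name] <= max_unknown]
    new_attributes = [attributes[j] for j in keep]
    new_rows = [[row[j] for j in keep] for row in rows]
    return new_attributes, new_rows


def discard_sparse_attributes_transposed(
    attributes: Sequence[str], rows: Sequence[Row], max_unknown: float
) -> Tuple[List[str], List[Row]]:
    """Transposition-style filter: attributes become rows, filter them, transpose back."""
    n = len(rows)
    # Each column (former attribute) is now one tuple of values.
    columns = list(zip(*rows)) if rows else [() for _ in attributes]
    kept = [
        (name, col)
        for name, col in zip(attributes, columns)
        if n == 0 or sum(v is None for v in col) / n <= max_unknown
    ]
    new_attributes = [name for name, _ in kept]
    kept_columns = [col for _, col in kept]
    # Transpose back; if every attribute was dropped, keep one empty row per example.
    if kept_columns:
        new_rows = [list(r) for r in zip(*kept_columns)]
    else:
        new_rows = [[] for _ in range(n)]
    return new_attributes, new_rows


attributes = ["age", "income", "city"]
rows = [
    [25, None, "Berlin"],
    [32, None, None],
    [None, 4100, "Paris"],
    [41, None, "Rome"],
]

print(missing_fractions(attributes, rows))
# {'age': 0.25, 'income': 0.75, 'city': 0.25}
print(discard_sparse_attributes(attributes, rows, 0.5))
# (['age', 'city'], [[25, 'Berlin'], [32, None], [None, 'Paris'], [41, 'Rome']])
```

Both filters return the same result. The loop version mirrors the per-attribute iteration in the process, while the transposed version turns the per-attribute missing count into an ordinary per-row filter, which is why transposing the data can work as an alternative.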