Replace missing values with the average within each subgroup

XiaoHui_0206
XiaoHui_0206 New Altair Community Member
edited November 2024 in Community Q&A
Hello! ;)

I'm a new user of RapidMiner and I've encountered an issue while working with some packages. Specifically, I'm trying to replace the missing values of an attribute with the average of that attribute, computed separately within each group defined by another attribute. I'd appreciate any assistance in solving this problem. For example, I have:

Countries  Year  Value
Malaysia   2015  1
Malaysia   2014  2
Malaysia   2013  3
Malaysia   2012  4
Malaysia   2011  ?
Malaysia   2010  ?
Malaysia   2009  7
Malaysia   2008  ?
Malaysia   2007  8
Malaysia   2006  9
Malaysia   2005  10
Malaysia   2004  ?

Indonesia  2015  1
Indonesia  2014  2
Indonesia  2013  3
Indonesia  2012  ?
Indonesia  2011  5
Indonesia  2010  6
Indonesia  2009  7
Indonesia  2008  ?
Indonesia  2007  8
Indonesia  2006  9
Indonesia  2005  10
Indonesia  2004  ?

I want to compute the average separately for each country (I have 190+ countries). When I use the Replace Missing Values operator, it computes the average over the values of all countries, which is not what I need. How can I compute each country's average using only that country's own values?
Example:
Malaysia = (1+2+3+4+7+8+9+10)/8 = 5.5
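
To make the difference concrete, here is a small Python check of the arithmetic on the sample table above. It is only an illustration outside RapidMiner; the dictionary layout and the helper name `group_mean` are assumptions made for this sketch, with `None` standing in for "?".

```python
from statistics import mean

# Values copied from the sample table; None stands for "?" (missing).
values = {
    "Malaysia": [1, 2, 3, 4, None, None, 7, None, 8, 9, 10, None],
    "Indonesia": [1, 2, 3, None, 5, 6, 7, None, 8, 9, 10, None],
}

def group_mean(xs):
    """Average of the non-missing entries only (hypothetical helper)."""
    present = [x for x in xs if x is not None]
    return mean(present)

# Per-country average: each country is divided by its own count.
per_country = {country: group_mean(xs) for country, xs in values.items()}

# Overall average: every country's values are pooled together.
pooled = [x for xs in values.values() for x in xs]
overall = group_mean(pooled)

print(per_country)
print(overall)
```

Malaysia comes out as 44/8 = 5.5 and Indonesia as 51/9 (about 5.67), while the pooled average is 95/17 (about 5.59), so a single global average gives a different fill value than the per-country one.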

Here is my dataset, thanks for helping me!  :)

Answers

  • MartinLiebig
    MartinLiebig Altair Employee
    Hi,

    My first intuition would be to use:
    Group Into Collection, grouping by country
    Loop Collection
    Replace Missing Values inside the loop

    Best,
    Martin
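
The three steps above follow a split, loop, fill pattern. A minimal Python sketch of that logic is shown here, only to illustrate the idea and not how the RapidMiner operators are implemented. The function name `impute_by_group`, the dictionary row layout, and the sample rows are assumptions for this example; `None` stands in for a missing value.

```python
from collections import defaultdict
from statistics import mean

def impute_by_group(rows, group_key, value_key):
    """Replace missing values (None) with the mean of the same group."""
    # Step 1: split the rows into one list per group
    # (the role played by Group Into Collection).
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)

    # Step 2: visit each group on its own (the role of Loop Collection).
    result = []
    for members in groups.values():
        present = [r[value_key] for r in members if r[value_key] is not None]
        # A group with no observed values has nothing to average,
        # so its missing values stay missing.
        fill = mean(present) if present else None
        # Step 3: fill the gaps inside this group only
        # (the role of Replace Missing Values inside the loop).
        for r in members:
            filled_row = dict(r)  # copy, so the input rows are not modified
            if filled_row[value_key] is None:
                filled_row[value_key] = fill
            result.append(filled_row)
    return result

rows = [
    {"Countries": "Malaysia", "Year": 2012, "Value": 4},
    {"Countries": "Malaysia", "Year": 2011, "Value": None},
    {"Countries": "Malaysia", "Year": 2009, "Value": 7},
    {"Countries": "Indonesia", "Year": 2013, "Value": 3},
    {"Countries": "Indonesia", "Year": 2012, "Value": None},
    {"Countries": "Indonesia", "Year": 2011, "Value": 5},
]
filled = impute_by_group(rows, "Countries", "Value")
```

With these sample rows, the Malaysia gap is filled from Malaysia's values only and the Indonesia gap from Indonesia's values only. Note that the result comes back ordered group by group rather than in the original row order.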
  • XiaoHui_0206
    XiaoHui_0206 New Altair Community Member
    Hi, how do I group by country using Loop Collection? :'( After I drag Loop Collection into the process, I can't connect my dataset output to it. I get an error: "Expected IOObjectCollection but received ExampleSet." Also, I put a Replace Missing Values operator inside Loop Collection. What should this operator be connected to?
  • Caperez
    Caperez Altair Community Member
    Hi @XiaoHui_0206,

    Another option is to use the Loop Values operator: for each value of the grouping attribute, filter the data to that group, extract the average of the target attribute, and finally replace the missing values in that group with the average.

    Please find attached a simple example.

    Best, 

    Cesar
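
The loop, filter, average, replace idea described in this post can be sketched in Python as follows. This is an illustration of the logic only, not the attached process; the function name `impute_by_filtering`, the row layout, and the sample rows are assumptions, and `None` stands in for a missing value.

```python
from statistics import mean

def impute_by_filtering(rows, group_key, value_key):
    """For each distinct group value: filter, average, then fill."""
    filled = [dict(r) for r in rows]  # work on copies of the input rows
    # Loop over the distinct values of the grouping attribute.
    for group in sorted({r[group_key] for r in filled}):
        # Filter: scan the whole table for the rows of this group.
        members = [r for r in filled if r[group_key] == group]
        present = [r[value_key] for r in members if r[value_key] is not None]
        if not present:
            continue  # nothing to average, leave this group's gaps as they are
        average = mean(present)
        # Replace: fill only the gaps that belong to this group.
        for r in members:
            if r[value_key] is None:
                r[value_key] = average
    return filled

rows = [
    {"Countries": "Malaysia", "Year": 2012, "Value": 4},
    {"Countries": "Malaysia", "Year": 2011, "Value": None},
    {"Countries": "Malaysia", "Year": 2009, "Value": 7},
    {"Countries": "Indonesia", "Year": 2013, "Value": 3},
    {"Countries": "Indonesia", "Year": 2012, "Value": None},
    {"Countries": "Indonesia", "Year": 2011, "Value": 5},
]
filled = impute_by_filtering(rows, "Countries", "Value")
```

This version keeps the rows in their original order, at the price of one pass over the full table per distinct value.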
  • MartinLiebig
    MartinLiebig Altair Employee
    You need to use Group Into Collection first. It's an operator in the Operator Toolbox extension.

    @ceaperez your solution works, but it gets slow on large data sets with many nominal values, because you have to filter the data once for every value.
    Best,
    Martin
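
A rough way to see the cost difference is to count how many rows each strategy has to look at. The Python sketch below is a simplified model, not a measurement of RapidMiner; the function names are made up for the example, and the data size (190 countries with 12 yearly rows each) is an assumption based on the numbers in the question.

```python
from collections import defaultdict

def count_visits_filter_per_value(keys):
    """Rows inspected when the table is filtered once per distinct value."""
    visits = 0
    for _ in set(keys):
        for _ in keys:
            visits += 1  # each filter pass looks at every row
    return visits

def count_visits_group_once(keys):
    """Rows inspected when the table is split into groups in one pass."""
    visits = 0
    groups = defaultdict(list)
    for index, key in enumerate(keys):
        visits += 1
        groups[key].append(index)
    return visits

# Assumed size: 190 countries, 12 rows per country.
keys = [f"country_{i % 190}" for i in range(190 * 12)]

print(count_visits_filter_per_value(keys))
print(count_visits_group_once(keys))
```

Under these assumptions, filtering per value inspects 2,280 x 190 = 433,200 rows, while grouping once inspects 2,280, and the gap widens as the number of distinct values grows.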
  • Caperez
    Caperez Altair Community Member
    Good point @MartinLiebig,