🎉Community Raffle - Win $25

An exclusive raffle opportunity for active members like you! Complete your profile, answer questions and get your first accepted badge to enter the raffle.
Join and Win

sum up column value with group by

mengkoon007User: "mengkoon007"
New Altair Community Member
Updated by Jocelyn

Hi,

 

I have the following columns:

a) id

b) month

c) year

d) value_1

e) value_2

with many other columns.

 

All the values in 'value_1' column, I want to sum up with group by id,month,year.

All the values in 'value_2' column, I want to sum up with group by id,month,year.

 

========

Example Set:

id   month  year     value_1    value_2       

1   Jan       2017     5               6

1   Jan       2017     7               8

1   Feb       2017     8               9

============

 

=============

Expected result:

id   month  year     value_1    value_2       

1   Jan       2017     12             14

1   Feb       2017     8               9

===========

 

I tried to use 1 Aggregate operator with the following input:

  • aggregation attribute (value_1) and aggregation functions (sum), then add entry
  • aggregation attribute (value_2) and aggregation functions (sum), 
  • then select group by attribute id,month,year

 

The following is the result I gotten which is incorrect,

1) There is now only 3 columns - id,value_1,value_2

2) It seems to add up all values with the same id => I need id,month,year matches then sum up

Find more posts tagged with

Sort by:
1 - 1 of 11
    landUser: "land"
    New Altair Community Member
    Accepted Answer

    Hi,

    if you aggregate a dataset all other columns need to be removed because their value would be undefined. 

    In your scenario, you will have multiple rows of the original data set that are belonging to one id / month / year combination. Each of these combinations will be shown as one row in the resulting data set. For the defined aggreagtion functions you will get a column, because the average of any number of rows is defined. But what to do with the other columns? You need to drop them, because you don't know which value to assign for this particular group.

    If you want to calculate the average for all of them, you can use the default aggregation parameter in RapidMiner. It will then use one aggregation function for every attribute.

     

    Greetings,

      Sebastian