Adding a column that performs a distinct count

Hi all,
Very new to RapidMiner!
I need to add a column to my table that performs a distinct count.
This is a sample of my table currently:
CaseNumber | Date |
1A | 1/1/2015 |
1A | 1/2/2015 |
1A | 1/2/2015 |
2A | 2/1/2015 |
3A | 3/1/2015 |
2A | 4/1/2015 |
The Distinct Count would perform a distinct count on Case Number so:
CaseNumber | Date | DistinctCount |
1A | 1/1/2015 | 1 |
1A | 1/2/2015 | 0 |
1A | 1/2/2015 | 0 |
2A | 2/1/2015 | 1 |
2A | 3/1/2015 | 0 |
3A | 4/1/2015 | 1 |
Basically I just want to count how many unique case numbers there are.
So casenumber 1A occurs 3 times, but it's still just the 1 case number.
Same thing for casenumber 2A, it occurs twice, but still just the 1 case number.
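To make the requested column concrete, here is a minimal sketch in plain Python rather than RapidMiner. It assumes "distinct count" means flagging the first row of each case number with 1 and every later row with 0, so that the column sums to the number of unique case numbers. The function name `first_occurrence_flags` and the variable names are made up for illustration.

```python
def first_occurrence_flags(case_numbers):
    """Return 1 the first time each case number appears, 0 for every repeat."""
    seen = set()
    flags = []
    for case in case_numbers:
        if case in seen:
            flags.append(0)
        else:
            seen.add(case)
            flags.append(1)
    return flags


# Sample rows in the same order as the input table above.
rows = [
    ("1A", "1/1/2015"),
    ("1A", "1/2/2015"),
    ("1A", "1/2/2015"),
    ("2A", "2/1/2015"),
    ("3A", "3/1/2015"),
    ("2A", "4/1/2015"),
]

flags = first_occurrence_flags(case for case, _ in rows)

# Keep every original row and append the new DistinctCount column.
table = [(case, date, flag) for (case, date), flag in zip(rows, flags)]

# Summing the flag column gives the number of unique case numbers.
distinct_cases = sum(flags)

for row in table:
    print(" | ".join(str(value) for value in row))
print("Distinct case numbers:", distinct_cases)
```

All existing rows are kept, and the flag depends on row order: whichever row of a case number comes first gets the 1.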
Hi Thomas,
Thanks for the reply.
I might be missing something here, but I don't want just the distinct count. I want to keep all my existing data and add a new column that performs a distinct count on my CaseNumber field.
When I run the Aggregate operator and select "Only Distinct", I'm given just the distinct count and nothing else.
Why not just use the "Remove Duplicates" operator? Then your dataset will only contain the non-duplicated entries, and the total record count will equal the distinct record count. You can specify the fields that define a unique record in that operator, so you can use both case number and date or any other combination of attributes.
Best,
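As a rough illustration of the idea behind this suggestion (plain Python, not the RapidMiner operator itself), the sketch below keeps the first row for each combination of chosen key fields, so the remaining row count equals the distinct count for those fields. The helper name `remove_duplicates` and its `key_fields` parameter are hypothetical.

```python
def remove_duplicates(rows, key_fields):
    """Keep only the first row for each distinct combination of key_fields."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Sample data from the original question.
rows = [
    {"CaseNumber": "1A", "Date": "1/1/2015"},
    {"CaseNumber": "1A", "Date": "1/2/2015"},
    {"CaseNumber": "1A", "Date": "1/2/2015"},
    {"CaseNumber": "2A", "Date": "2/1/2015"},
    {"CaseNumber": "3A", "Date": "3/1/2015"},
    {"CaseNumber": "2A", "Date": "4/1/2015"},
]

# Uniqueness defined by case number only.
by_case = remove_duplicates(rows, ["CaseNumber"])

# Uniqueness defined by case number and date together.
by_case_and_date = remove_duplicates(rows, ["CaseNumber", "Date"])

print(len(by_case), "distinct case numbers")
print(len(by_case_and_date), "distinct case number/date pairs")
```

Note that this drops rows, which only helps if the other rows are not needed afterwards.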
There are a number of additional fields in this dataset that I need.
In this dataset, "casenumber" refers to a specific survey of about 50 questions, so there will be duplicate casenumbers: each casenumber has a row for every question.
Unfortunately, removing duplicates is not an option.
Hi,
That's pretty simple to do if you use the Aggregate operator and select "Only Distinct".
I've attached a sample process.