Hi,
What would be the easiest way to remove empty columns / rows in a dataset?
My data is imported from a source that I can't easily modify before loading, so I'd prefer to deal with it within RapidMiner.
What I'm looking for is a simple way to remove both Col1 and Row 1 in the example below. In reality I can have many more empty rows and columns, in much larger tables. I've tried the filter for missing / no missing attributes, but it basically removes everything as soon as it finds a single missing value, so it's not what I need.
|   | Col1 | Col2      | Col3      |
| 1 | ?    | ?         | ?         |
| 2 | ?    | ?         | Some data |
| 3 | ?    | ?         | Some data |
| 4 | ?    | ?         | Some data |
| 5 | ?    | Some data | ?         |
For now I have a fairly heavy workflow. I first create an id, then multiply my data so that I have one copy where every missing value is replaced with 0 and everything else with 1. On that copy I aggregate to get the sum per row and remove every row where the sum is 0. Next I loop through all my attributes, do something similar per column, and remove all columns where the aggregated sum is 0.
It does the trick but seems a bit overkill, so I'm wondering if I'm missing some easy way to deal with this.
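To make the logic of my current workflow easier to follow, here is a minimal sketch of it in plain Python. This is not RapidMiner code, just an illustration of the 0/1 mask and sum idea; the function name `drop_empty_rows_and_columns` and the use of `None` to stand in for `?` are assumptions made for the example.

```python
MISSING = None  # assumption: None stands in for RapidMiner's "?" missing value


def drop_empty_rows_and_columns(ids, header, rows):
    """Remove rows and columns that contain only missing values.

    ids    -- one id per row (the id created in the first step)
    header -- column names, excluding the id
    rows   -- list of rows, each a list with one value per column
    """
    # Replace every missing value with 0 and everything else with 1.
    mask = [[0 if value is MISSING else 1 for value in row] for row in rows]

    # Keep a row only if the sum of its mask is greater than 0.
    keep_rows = [i for i, mask_row in enumerate(mask) if sum(mask_row) > 0]

    # Keep a column only if the sum of its mask over all rows is greater than 0.
    keep_cols = [
        j for j in range(len(header))
        if sum(mask_row[j] for mask_row in mask) > 0
    ]

    kept_ids = [ids[i] for i in keep_rows]
    new_header = [header[j] for j in keep_cols]
    new_rows = [[rows[i][j] for j in keep_cols] for i in keep_rows]
    return kept_ids, new_header, new_rows


# The example table from this post.
ids = [1, 2, 3, 4, 5]
header = ["Col1", "Col2", "Col3"]
rows = [
    [None, None, None],
    [None, None, "Some data"],
    [None, None, "Some data"],
    [None, None, "Some data"],
    [None, "Some data", None],
]

kept_ids, new_header, new_rows = drop_empty_rows_and_columns(ids, header, rows)
print(kept_ids)
print(new_header)
```

On the example table this drops Row 1 and Col1 and keeps everything else, including the cells that are still missing in rows 2 to 5, which is the behaviour I'm after.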