Transform data into a smaller table that represents every attribute value

Question

Hey there,

Since my data set is too big to analyze with a clustering algorithm (and I don't want to wait as long as that would take), I want to reduce it to a smaller set.

My question is whether it is possible to transform it into a smaller data set in which every attribute value is still represented in a representative proportion. For example, I have a data set with 10 million rows and 3 columns, each of which can take 5 different values (e.g. 1-5). Now I want a data set that contains all 3 columns with all of their values, but only 100k rows, so that I can analyze it. Is there an option to do this automatically in RM? If not, I think I will have to do it manually somehow.
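What is being asked for here is essentially stratified sampling: group the rows by their value combination and draw from each group in proportion to its size. The sketch below shows the idea in plain Python, outside RM; the function name `stratified_sample` and its parameters are hypothetical, and the data is synthetic and scaled down (100k rows sampled to about 1,000) so that it runs quickly.

```python
import random
from collections import defaultdict


def stratified_sample(rows, key, n, seed=0):
    """Draw roughly n rows so that each stratum (key(row)) keeps its share.

    Every non-empty stratum contributes at least one row, so rare value
    combinations are not lost; because of this and of rounding, the result
    can be slightly larger or smaller than n.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[key(row)].append(row)

    total = len(rows)
    sample = []
    for members in strata.values():
        # Proportional allocation, rounded, but never fewer than one row
        # and never more rows than the stratum actually has.
        k = max(1, round(n * len(members) / total))
        k = min(k, len(members))
        sample.extend(rng.sample(members, k))
    return sample


# Synthetic demo: 3 columns, each with values 1-5 (scaled down from 10M rows).
data_rng = random.Random(42)
data = [tuple(data_rng.randint(1, 5) for _ in range(3)) for _ in range(100_000)]

# Stratify on the full combination of the 3 columns (5 * 5 * 5 = 125 strata).
sample = stratified_sample(data, key=lambda row: row, n=1_000)
print(len(sample))
```

Stratifying on the whole value combination guarantees that every combination that occurs in the data also occurs in the sample. With many columns or many values per column the number of strata grows quickly, so stratifying on a single column at a time may be the more practical choice.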

Thanks and greetings,

Moritz

SGolbert · Accepted Answer

Hi Moritz,
I haven't found any dimensionality reduction techniques for polynominal attributes in RM. Feature selection might be an option.
Regarding the rows: these are the examples you use for training and testing, and it is up to you how many of them you use. There is no need to use all the rows, at least as long as you are not deploying the final model. Of course, it also depends on the kind of data: if it is a time series, the approach should be different.
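To illustrate the difference mentioned above, here is a minimal Python sketch (not RM) of two ways to pick a subset of rows. The function `subsample` and its `time_ordered` flag are hypothetical. For ordinary data it draws a uniform random sample without replacement; for time series it assumes the rows are already sorted by time and keeps the most recent `n` rows as one contiguous block, which is only one possible strategy among several.

```python
import random


def subsample(rows, n, time_ordered=False, seed=0):
    """Return n rows to use as training/testing examples.

    time_ordered=False: uniform random sample without replacement.
    time_ordered=True: the last n rows as one contiguous block (assumes the
    rows are sorted by time), so the temporal order is preserved.
    """
    if n >= len(rows):
        return list(rows)
    if time_ordered:
        return list(rows[-n:])
    return random.Random(seed).sample(rows, n)


rows = list(range(1_000))  # stand-in for examples, sorted by time
random_part = subsample(rows, 100)
recent_part = subsample(rows, 100, time_ordered=True)
print(len(random_part), recent_part[0], recent_part[-1])  # 100 900 999
```

Random sampling breaks up the time order, which is harmless for independent rows but can hide trends or leak future information in a time series; that is why a contiguous window is used in the second case here.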
Regards,
Sebastian