Applying attribute elimination to original data
I have a large dataset that has been tokenized. Many of the token attributes capture identical information, so I need to eliminate the attributes that are 100% correlated with one another.
Because the dataset is large, I'd like to run "Remove Correlated Attributes" on a sample rather than on the full dataset, then apply the result from the sample back to the original, removing the same attributes there (about 1,000 of them).
What's the best way to do this? I've been messing around with the "Work on Subset" operator, but it seems to only want to pull the sample back without applying the attribute removal to the original.
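To make the goal concrete, here is a rough sketch of the logic I'm after, written in Python with pandas rather than as a RapidMiner process. The function name `correlated_columns`, the toy data, and the small floating-point tolerance are all made up for illustration; I'm not claiming this is how "Remove Correlated Attributes" works internally.

```python
import numpy as np
import pandas as pd


def correlated_columns(sample: pd.DataFrame, threshold: float = 1.0) -> list:
    """Return names of columns whose |correlation| with an earlier column
    reaches the threshold. Hypothetical helper, for illustration only."""
    corr = sample.corr().abs()
    # Keep only the strict upper triangle so each pair is examined once
    # and a column is never compared with itself.
    mask = np.triu(np.ones(corr.shape), k=1).astype(bool)
    upper = corr.where(mask)
    # Small tolerance because a "perfect" correlation can come out as 0.999...
    return [col for col in upper.columns if (upper[col] >= threshold - 1e-12).any()]


# Toy stand-in for the tokenized dataset: b and d carry the same information as a.
full = pd.DataFrame({
    "a": [1, 2, 3, 4, 5, 6],
    "b": [2, 4, 6, 8, 10, 12],
    "c": [1, 0, 1, 1, 0, 0],
    "d": [6, 5, 4, 3, 2, 1],
})

# Find the redundant attributes on a sample only...
sample = full.sample(n=4, random_state=0)
to_drop = correlated_columns(sample)

# ...then remove those same attributes from the full data.
reduced = full.drop(columns=to_drop)
```

The key point is that only the list of attribute names travels from the sample back to the original. One caveat I'm aware of: attributes that look perfectly correlated in a sample might not be in the full data, so the sample needs to be large enough to trust.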
Thanks for any insight.