"PCA vs PrincipalComponentGenerator?"
Legacy User
New Altair Community Member
Hi,
From what I could see, experiments ExampleSource-PrincipalComponentsGenerator (1) and ExampleSource-PCA-ModelApplier (2) generate the same output data sets if the input set contains a label attribute. If the input does not have a label, experiment (1) crashes at runtime, even though it passes validation. In addition, experiment (2) outputs the PCA model and has more controls (number of PCs).
If the PCA operator is clearly superior to the PrincipalComponentsGenerator, why do you keep the PrincipalComponentsGenerator? Or does it have any advantages I missed?
Victor
Answers
Hi,
You are right, they deliver the same output. There are basically two reasons for keeping the PrincipalComponentsGenerator:
1. backwards compatibility
2. only one operator instead of two in cases where you are interested in the PCA only (without the model)
It is, however, very likely that this operator will be marked as deprecated and removed in a future release.
Cheers,
Ingo
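To illustrate the point about one operator versus two, here is a minimal sketch in Python with NumPy, not RapidMiner code. The function names `fit_pca`, `apply_pca` and `generate_principal_components` are hypothetical stand-ins for the PCA operator, the ModelApplier and the PrincipalComponentsGenerator; how the real operators are implemented is not known from this thread.

```python
import numpy as np

def fit_pca(X, n_components=None):
    """Fit a PCA 'model': column means, principal axes and their variances."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    # eigh returns eigenvalues in ascending order, so reverse them
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    if n_components is not None:
        # the extra control mentioned in the question: keep only some PCs
        eigvals, eigvecs = eigvals[:n_components], eigvecs[:, :n_components]
    return {"mean": mean, "components": eigvecs, "variances": eigvals}

def apply_pca(model, X):
    """Project data onto the axes stored in a fitted model."""
    return (np.asarray(X, dtype=float) - model["mean"]) @ model["components"]

def generate_principal_components(X):
    """One-step variant: fit and project, discarding the model."""
    return apply_pca(fit_pca(X), X)

# Synthetic data, purely for illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))

model = fit_pca(X)                  # step 1: the model is available for reuse
two_step = apply_pca(model, X)      # step 2: apply it
one_step = generate_principal_components(X)
print(np.allclose(two_step, one_step))
```

The two-step route gives the same transformed data but keeps `model`, which can then be applied to other data sets.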
Hi,
... there seems to be another reason: Performance!
I have a data set with 20 attributes, 5094 examples.
1. PrincipalComponentsGenerator returns in a matter of a couple of seconds.
2. PCA has been running for 2900 s so far and is still going at 100% CPU load
When I put a sampling operator in front of PCA and sample 70% of the data, I get a result in ~10 s. That is still slower than PrincipalComponentsGenerator, but at least tolerable.
The dataset is such that PC-1 explains 99.97% of the variance. I don't know whether that has any impact.
Kind regards Stefan
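For readers unfamiliar with the figure quoted above, the share of variance explained by each component is its eigenvalue divided by the sum of all eigenvalues of the covariance matrix. The following is a small sketch in Python with NumPy on synthetic data; the data and the helper name `explained_variance_ratio` are assumptions for illustration and say nothing about the actual data set in this post.

```python
import numpy as np

def explained_variance_ratio(X):
    """Fraction of total variance carried by each principal component."""
    X = np.asarray(X, dtype=float)
    cov = np.cov(X, rowvar=False)
    # eigvalsh returns ascending eigenvalues; reverse to get PC-1 first
    eigvals = np.linalg.eigvalsh(cov)[::-1]
    return eigvals / eigvals.sum()

# Synthetic example: one attribute on a much larger scale than the others
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3)) * np.array([100.0, 1.0, 1.0])

ratios = explained_variance_ratio(X)
print(ratios)  # the first entry dominates
```

One attribute with a much larger scale is only one possible cause of a dominant first component; strongly correlated attributes can produce the same picture.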
Hmm... my dataset contained a row with missing values.
Not very elegant of PCA, of course, to just go off to nirvana on such an input, but if I delete that row, it works.
Kind regards Stefan
Hi,
We will increase the elegance of PCA by having it throw an error in the next version.
Thanks for the hint,
Sebastian
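Until such a check exists, the kind of guard discussed in this thread can be sketched as follows. This is Python with NumPy, not the RapidMiner fix itself, and the helper names `check_no_missing` and `drop_missing_rows` are hypothetical. It shows that a single missing value already makes the covariance matrix unusable, and two ways to react: fail early with a clear error, or drop the affected rows as Stefan did by hand.

```python
import numpy as np

def check_no_missing(X):
    """Raise a clear error instead of running PCA on incomplete data."""
    X = np.asarray(X, dtype=float)
    bad_rows = np.flatnonzero(np.isnan(X).any(axis=1))
    if bad_rows.size:
        raise ValueError(
            f"PCA input has missing values in rows {bad_rows.tolist()}"
        )
    return X

def drop_missing_rows(X):
    """Alternative: remove every row that has at least one missing value."""
    X = np.asarray(X, dtype=float)
    return X[~np.isnan(X).any(axis=1)]

# Tiny illustrative data set with one missing value (NaN)
X = np.array([[1.0, 2.0],
              [3.0, np.nan],
              [5.0, 6.0],
              [7.0, 9.0]])

# A single NaN propagates into the covariance matrix
cov_with_nan = np.cov(X, rowvar=False)
print(np.isnan(cov_with_nan).any())

clean = drop_missing_rows(X)
check_no_missing(clean)   # passes
# check_no_missing(X)     # would raise ValueError naming row 1
```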