"PCA Operator taking to much time"
Hello, I am performing sentiment analysis on text data consisting of 1700 tweets. After preprocessing the data, I want to visualize it using PCA to check the relationship between the different classes. After generating TF-IDF vectors, I use the PCA operator with 2 components and the fixed number setting, but it takes a very long time, approximately 2 to 3 hours. I also put a Normalize operator before PCA, but that did not help.
Answers
Did you apply any pruning when you generated your word vector? If not, then you probably have thousands of attributes, many of which have extremely low values, and that is why PCA is taking so long! You should definitely prune your wordlist first, since tokens that have only a handful of occurrences are not going to be meaningful, but they are causing a lot of computational effort on the part of the PCA operator.
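To make the pruning idea concrete outside of RapidMiner, here is a minimal sketch in plain Python. It is not what the operator does internally; it only illustrates dropping tokens by document frequency before building the TF-IDF vectors. The function name `prune_vocabulary`, the thresholds `min_docs` and `max_doc_fraction`, and the toy tweets are all made up for illustration.

```python
from collections import Counter


def prune_vocabulary(token_lists, min_docs=2, max_doc_fraction=0.9):
    """Keep tokens found in at least `min_docs` documents and in at most
    `max_doc_fraction` of all documents (both thresholds are illustrative)."""
    n_docs = len(token_lists)
    # Document frequency: in how many documents each token appears at least once.
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))
    return sorted(
        tok
        for tok, df in doc_freq.items()
        if df >= min_docs and df / n_docs <= max_doc_fraction
    )


# Toy stand-in for tokenized tweets (hypothetical data).
tweets = [
    ["great", "phone", "love", "it"],
    ["terrible", "phone", "battery"],
    ["love", "the", "battery", "life"],
    ["phone", "arrived", "late"],
]

full_vocab = sorted({tok for tokens in tweets for tok in tokens})
pruned_vocab = prune_vocabulary(tweets, min_docs=2, max_doc_fraction=0.9)

# Each surviving token becomes one attribute; everything else is dropped
# before the data ever reaches PCA.
print(len(full_vocab), "->", len(pruned_vocab), pruned_vocab)
```

On real tweets the same filter typically removes the long tail of tokens that occur in only one or two documents, which is where most of the attributes come from.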
What @Telcontar120 said. Work on your wordlist first before you put it into PCA. Even just 50 attributes could chew up runtime if your computer doesn't have a lot of memory.