I would like to identify the themes in a corpus of tweets using PCA. I built the process with the following operators: Read Excel, Nominal to Numeric, and PCA, and connected the ports. There are no errors, but I am not sure how to identify the hidden themes from the PCA output, which reports the standard deviation, proportion of variance, and cumulative variance for each component. The proportion of variance ranges from 0 to .001. I set the variance threshold to .95.
Can you please help me? Thank you.
| component | std dev | proportion of variance | cumulative variance |
| --- | --- | --- | --- |
| PC 1 | 0.157 | 0.025 | 0.025 |
| PC 2 | 0.137 | 0.019 | 0.045 |
| PC 3 | 0.123 | 0.016 | 0.06 |
| PC 4 | 0.118 | 0.014 | 0.075 |
| PC 5 | 0.115 | 0.014 | 0.089 |
| PC 6 | 0.112 | 0.013 | 0.101 |
| PC 7 | 0.104 | 0.011 | 0.113 |
| PC 8 | 0.1 | 0.01 | 0.123 |
| PC 9 | 0.098 | 0.01 | 0.133 |
| PC 10 | 0.097 | 0.01 | 0.143 |
| PC 11 | 0.097 | 0.01 | 0.153 |
| PC 12 | 0.093 | 0.009 | 0.161 |
| PC 13 | 0.093 | 0.009 | 0.17 |
| PC 14 | 0.092 | 0.009 | 0.179 |
| PC 15 | 0.09 | 0.008 | 0.187 |
| PC 16 | 0.089 | 0.008 | 0.196 |
| PC 17 | 0.087 | 0.008 | 0.204 |
| PC 18 | 0.087 | 0.008 | 0.211 |
| PC 19 | 0.086 | 0.008 | 0.219 |
| PC 20 | 0.084 | 0.007 | 0.226 |
| PC 21 | 0.083 | 0.007 | 0.234 |
| PC 22 | 0.082 | 0.007 | 0.241 |
| PC 23 | 0.082 | 0.007 | 0.248 |
| PC 24 | 0.081 | 0.007 | 0.254 |
| PC 25 | 0.08 | 0.007 | 0.261 |
| PC 26 | 0.08 | 0.007 | 0.268 |
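
For reference, here is how I understand the cumulative variance column: it should be the running sum of the proportion of variance column. The sketch below is a minimal check in Python, which is not part of my actual process; the variable names are my own, and the numbers are the rounded values from the table above, so intermediate sums may differ from the table in the last digit. It suggests that the 26 components listed cover only about 27% of the variance, well short of the .95 threshold.

```python
from itertools import accumulate

# Proportion of variance for PC 1 .. PC 26, copied from the table above
# (rounded to three decimals, as reported).
proportion_of_variance = [
    0.025, 0.019, 0.016, 0.014, 0.014, 0.013, 0.011,
    0.010, 0.010, 0.010, 0.010,
    0.009, 0.009, 0.009,
    0.008, 0.008, 0.008, 0.008, 0.008,
    0.007, 0.007, 0.007, 0.007, 0.007, 0.007, 0.007,
]

# The threshold I set in the PCA step.
VARIANCE_THRESHOLD = 0.95

# Cumulative variance is the running sum of the per-component proportions.
cumulative_variance = list(accumulate(proportion_of_variance))

# First component at which the running sum reaches the threshold,
# or None if the listed components never reach it.
components_needed = next(
    (i for i, c in enumerate(cumulative_variance, start=1)
     if c >= VARIANCE_THRESHOLD),
    None,
)

for i, (p, c) in enumerate(
    zip(proportion_of_variance, cumulative_variance), start=1
):
    print(f"PC {i:>2}  proportion={p:.3f}  cumulative={c:.3f}")

print("components needed to reach threshold:", components_needed)
```

If this reading is right, the standard deviation and variance columns only say how much variation each component captures, not what the component is about, so they cannot name a theme by themselves.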