SVD Performance on Large TF-IDF Matrices
Hi all, I have 25K relatively short survey responses (most under 255 words) that I am trying to cluster into similar groups. My plan was to run the TF-IDF matrix through SVD and then cluster the result. Unfortunately, the TF-IDF matrix is very large (25K x 140K). Building the term-document matrix (TDM) alone took 15 minutes on my machine, and the SVD locks up after a few minutes of processing. This is an educational application, and I am considering running the SVD in the cloud with my 100 credits, but I fear that will not come close to being enough. Does anyone have any ideas, suggestions, or alternatives? Thanks.
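
To make the plan concrete, here is a minimal sketch of the TF-IDF, SVD, then clustering pipeline. It assumes Python with scikit-learn (an assumption; the toolkit actually in use is not stated), and the `responses` list, the vocabulary-pruning settings, the number of components, and the number of clusters are all hypothetical placeholders. It keeps the matrix sparse and computes only a truncated SVD, which may or may not resolve the lock-up described above.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer

# Hypothetical stand-in for the 25K survey responses.
responses = [
    "the checkout process was slow and confusing",
    "checkout took too long and the page was slow",
    "slow checkout page, confusing payment process",
    "support staff were friendly and helpful",
    "very helpful and friendly support team",
    "the support team was helpful and quick to respond",
]

# TfidfVectorizer returns a SciPy sparse matrix. min_df and max_features
# are illustrative values: on real data, raising min_df or capping
# max_features drops rare terms and shrinks the 140K-column vocabulary.
vectorizer = TfidfVectorizer(stop_words="english", min_df=1, max_features=20000)
tfidf = vectorizer.fit_transform(responses)

# TruncatedSVD accepts sparse input and computes only the leading
# n_components singular vectors instead of a full decomposition.
# n_components=2 suits this toy corpus only; it is an assumption.
lsa = make_pipeline(
    TruncatedSVD(n_components=2, algorithm="randomized", random_state=0),
    Normalizer(copy=False),  # unit-length rows, so Euclidean k-means tracks cosine similarity
)
reduced = lsa.fit_transform(tfidf)

# Cluster the reduced vectors; n_clusters=2 is a placeholder.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(reduced)

print(tfidf.shape, reduced.shape, labels)
```

On the real data, `responses` would hold the 25K texts, and `n_components` and `n_clusters` would need tuning; the values above are chosen only so the toy example runs.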