Dimensionality Reduction with SVD

Muhammed_Fatih_ New Altair Community Member
edited November 5 in Community Q&A
Dear RapidMiner Community, 

I am currently running a simple Singular Value Decomposition (SVD) process on a TDM (Term Document Matrix) that I generated from communication data. The matrix has 9662 attribute columns (terms) and 72826 rows in total. My SVD process has been running for four days and has not finished yet. The SVD operator itself in particular is still loading (see attached).

Could you tell me what a "normal" computation time is for dimensionality reduction with SVD on a matrix of this size? Do I really need several days to compute the SVD?

Thank you in advance for your help and support!

Best regards

Muhammed 


Answers

  • rfuentealba New Altair Community Member

    Given the 72826x9662 matrix (72826 rows by 9662 term columns) and the nature of the calculations, I would say it's very likely that your process is still working. I would take a look at how the computer's memory, processor and swap are behaving, because if the data doesn't fit into RAM, it will begin swapping to disk, which makes the entire process much slower.

    Hope this can help,

    Rod.
  • Muhammed_Fatih_ New Altair Community Member
    Hello @rfuentealba,

    Thank you for your answer. The memory and processor behaviour looks like this:



    I have 32 GB of memory on my computer. What would you say? Do you think that additional memory is needed to run the process? Or are there alternative ways to compute the SVD that would speed up the process?

    Thank you in advance for your answer. 

    Muhammed
  • rfuentealba New Altair Community Member

    I am not a Microsoft user (I haven't used it since 1995) and I don't know the internals of that OS, so I had a lot of trouble working out what memory paging means there. I don't see the machine using swap memory; swap usage would be a clear indication that your computer needs more memory and is trying to use the hard disk for it. However, I read somewhere else that Windows only uses swap in case of a crash.

    I think @Marco_Boeck or @pschlunder may have a better technical understanding of what's happening here. Tagging @mschmitz too.

    Now, NLP processes always take a lot of time, which is why I can't say whether "it's normal" or "it's not". If you could provide a small sample, I could test a few things for you on my servers.

    All the best,

    Rodrigo.
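
For anyone who wants to sanity-check the two questions raised in this thread (does the data fit in RAM, and is there a cheaper way to compute the SVD), here is a minimal sketch in Python. It is not the RapidMiner SVD operator and says nothing about how that operator works internally. It assumes NumPy and SciPy are installed; the helper name `dense_bytes`, the toy matrix shape and density, and the choice `k = 10` are all made up for illustration. The first part only does arithmetic: 72826 x 9662 values at 8 bytes each come to about 5.6 GB for one dense copy, and a full decomposition will usually need additional working memory on top of that. The second part shows a truncated SVD on a sparse matrix, which computes only the `k` largest singular values instead of all of them.

```python
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds


def dense_bytes(rows, cols, bytes_per_value=8):
    """Bytes needed for one dense rows x cols matrix (hypothetical helper)."""
    return rows * cols * bytes_per_value


# Size of the matrix described in the question: 72826 rows, 9662 terms.
full_size = dense_bytes(72826, 9662)
print(f"one dense copy: {full_size / 1e9:.2f} GB")

# Toy stand-in for a term-document matrix. The shape and density are
# assumptions chosen so the example runs in well under a second.
tdm = sparse_random(500, 200, density=0.01, format="csr", random_state=0)

# Truncated SVD: compute only the k largest singular triplets.
k = 10  # hypothetical number of dimensions to keep
U, s, Vt = svds(tdm, k=k)

# Do not rely on the order svds returns; sort to descending explicitly.
order = np.argsort(s)[::-1]
U, s, Vt = U[:, order], s[order], Vt[order, :]

# Each row is one document expressed in k dimensions instead of 200.
reduced = U * s
print(reduced.shape)
```

Whether this helps in a given case depends on how sparse the real matrix is and on how many dimensions are actually needed. Term-document matrices are usually mostly zeros, so keeping them in a sparse format and asking for a few dozen or a few hundred components is often much cheaper than a full decomposition, but the numbers above are for the toy matrix only and are not a prediction for the 72826x9662 case.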