BUG REPORT: text mining, the clustering process

YungCheng
YungCheng New Altair Community Member
edited November 2024 in Community Q&A
When I try to run the clustering process for text mining, I get an error message. The process, error message and CSV files are attached below.

Best Answer

  • jacobcybulski
    jacobcybulski New Altair Community Member
    Answer ✓
    Hi, you have not included the actual RMP file, so I am only guessing at what may have gone wrong. Your data has over 20K examples and your text has thousands of unique terms, and k-means clustering does not cope well with thousands of attributes. So I assume you have run out of memory on your computer. To test this, I suggest reducing your sample size to 1000 (just for testing). More importantly, you need to reduce the number of terms generated by the parsing process. I suggest enabling pruning within Process Documents from Data; keep it simple, e.g. percentual pruning from 5% to 30%, which would possibly bring the number of attributes below 300. If that works, use 100% of the data. I also note that you have not normalised your data before clustering, so it will be difficult to analyse your data visually. Good luck!
    Jacob
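
For readers who want to see what percentual pruning does to the vocabulary, here is a minimal Python sketch of the idea. It is not RapidMiner's implementation: the function name `prune_terms`, the whitespace tokenisation and the inclusive 5%/30% bounds are all assumptions made for illustration. It keeps only the terms that occur in at least 5% and at most 30% of the documents.

```python
from collections import Counter


def prune_terms(docs, min_frac=0.05, max_frac=0.30):
    """Return the sorted terms whose document frequency is within the bounds.

    Hypothetical helper: illustrates percentual pruning only, and makes no
    claim about how Process Documents from Data tokenises or counts terms.
    """
    # One set of lowercase terms per document, so a term counts once per doc.
    tokenized = [set(doc.lower().split()) for doc in docs]
    n_docs = len(tokenized)
    if n_docs == 0:
        return []

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for terms in tokenized for term in terms)

    # Drop terms that are too rare (below min_frac) or too common (above max_frac).
    return sorted(
        term
        for term, count in doc_freq.items()
        if min_frac <= count / n_docs <= max_frac
    )
```

Calling `prune_terms(documents)` on a list of strings returns the surviving vocabulary; its length is the number of attributes a clustering step would then have to handle.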
