"Text Mining / Too slow"

User: "veve"
New Altair Community Member
Updated by Jocelyn
Hello,

I have a process that turns text from my data into TF-IDF vectors (pruning below 3% and above 40%). The preprocessing includes tokenizing, converting to uppercase, stop word filtering and stemming (Snowball).
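To make the setup concrete, here is a minimal plain-Python sketch of the same kind of pipeline. It is not my RapidMiner process and not RapidMiner's implementation. It assumes the pruning percentages refer to the share of documents a term appears in, uses a common textbook TF-IDF variant (relative term frequency times log(N/df)), leaves out stemming, and the names `STOP_WORDS`, `tokenize` and `tfidf_with_pruning` are made up for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop word list (illustration only).
STOP_WORDS = {"THE", "A", "OF", "AND"}


def tokenize(text):
    """Split on non-letters, convert to uppercase, drop stop words."""
    tokens = [t.upper() for t in re.findall(r"[A-Za-z]+", text)]
    return [t for t in tokens if t not in STOP_WORDS]


def tfidf_with_pruning(docs, min_df=0.03, max_df=0.40):
    """Return (vocabulary, list of sparse TF-IDF dicts), one dict per document.

    Assumption: terms are pruned when the fraction of documents containing
    them is below min_df or above max_df.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    vocab = {t for t, c in df.items() if min_df <= c / n_docs <= max_df}

    vectors = []
    for tokens in tokenized:
        counts = Counter(t for t in tokens if t in vocab)
        total = sum(counts.values())
        vectors.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vocab, vectors
```

In this sketch each document ends up as a sparse dictionary, so the cost per document depends mostly on the number of tokens it contains.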

My data is an example set of about 800,000 lines, and I am processing 3 text attributes.

The attributes:
- First one: several words
- Second one: empty, or 2-3 words
- Third one: about 300 words


My machine has 15.5 GB of memory, of which 12 GB is allocated to RapidMiner.

My process handled 20,000 lines in three and a half hours, so at that rate I estimate the full run would take around 6 days, which is not really acceptable.
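As a rough check on that estimate, here is a small sketch of the extrapolation. It assumes the throughput stays constant over the whole data set, which may not hold in practice (for example if memory fills up as the word list grows).

```python
# Figures from the test run described above.
rows_done = 20_000
hours_spent = 3.5
total_rows = 800_000

# Linear extrapolation: scale the measured time by the ratio of row counts.
estimated_hours = total_rows / rows_done * hours_spent
estimated_days = estimated_hours / 24

print(f"{estimated_hours:.0f} hours, about {estimated_days:.1f} days")
```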

1. Are there any ways to optimize a text processing process?
2. Does it seem to you that there is a problem in my process? (I followed the tutorials, and it does not do anything particularly special.)
3. Are there any benchmark studies on the speed of RapidMiner?

Thank you in advance,

Best regards,
