"Text Mining / Too slow"
Hello,
I have a process that processes text from data (pruning below 3% and above 40%, TF-IDF vectors), which involves stemming (Snowball), tokenizing, converting to uppercase, and removing stop words.
My data is an example set of about 800,000 lines, and I'm processing 3 text attributes.
The attributes:
- First one: has several words
- Second one: has none or 2-3 words
- Third one: has about 300 words
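To make the preprocessing described above concrete, here is a minimal, dependency-free Python sketch of a comparable pipeline: tokenize, uppercase, remove stop words, prune by document frequency (keep terms in 3% to 40% of documents), then weight with TF-IDF. It is not RapidMiner's implementation. The stop word list, the tokenizer regex, the inclusive pruning bounds, and the TF-IDF variant (relative term frequency times log(N/df)) are all assumptions, and Snowball stemming is left out. `preprocess`, `build_tfidf` and `STOP_WORDS` are hypothetical names. One thing the sketch does illustrate is that percentage-based pruning needs document frequencies over the whole example set before any vector can be finished.

```python
import math
import re
from collections import Counter

# Hypothetical minimal stop word list (uppercase, to match the case transform below).
STOP_WORDS = {"THE", "A", "AN", "AND", "OR", "OF", "TO", "IN", "IS", "ON"}


def preprocess(text):
    """Tokenize on non-alphanumerics, uppercase, and drop stop words.

    Stemming (e.g. Snowball) is deliberately left out to avoid a dependency.
    """
    tokens = (t.upper() for t in re.split(r"[^0-9A-Za-z]+", text) if t)
    return [t for t in tokens if t not in STOP_WORDS]


def build_tfidf(docs, lower=0.03, upper=0.40):
    """Return (vocabulary, sparse TF-IDF vectors) after document-frequency pruning."""
    token_lists = [preprocess(d) for d in docs]
    n = len(token_lists)

    # First full pass: document frequency of every term.
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))

    # Keep only terms occurring in between `lower` and `upper` of all documents.
    vocab = sorted(t for t, c in df.items() if lower <= c / n <= upper)
    idf = {t: math.log(n / df[t]) for t in vocab}

    # Second pass: term count relative to all tokens of the document, times IDF.
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({t: counts[t] / total * idf[t] for t in counts if t in idf})
    return vocab, vectors


if __name__ == "__main__":
    docs = ["the cat sat", "the dog sat", "a cat sat", "the bird flew", "dogs bark"]
    vocab, vectors = build_tfidf(docs)
    print(vocab)
    print(vectors[0])
```

In the usage example, "SAT" appears in 3 of 5 documents (60%) and is pruned by the 40% upper bound, while "CAT" (2 of 5, exactly 40%) is kept.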
My machine has 15.5 GB of RAM, and 12 GB of it is allocated to RapidMiner.
My process handled 20,000 lines in three and a half hours, so at that rate the full 800,000 lines would take about 140 hours, almost 6 days, which is not really acceptable.
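The runtime estimate is a linear extrapolation from the partial run. A small Python sketch of that arithmetic, assuming runtime grows linearly with the number of rows (memory pressure, or the pruning pass over the full example set, could make the real scaling different); `extrapolate_hours` is a hypothetical helper name:

```python
def extrapolate_hours(done_rows, elapsed_hours, total_rows):
    """Scale the elapsed time of a partial run up to the full row count.

    Assumes runtime is proportional to the number of rows processed.
    """
    return elapsed_hours * total_rows / done_rows


if __name__ == "__main__":
    hours = extrapolate_hours(20_000, 3.5, 800_000)
    print(f"{hours:.0f} hours = {hours / 24:.1f} days")
```

With 800,000 / 20,000 = 40 times as many rows, 40 x 3.5 h gives 140 hours, a little under 6 days.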
1. Are there any ways to optimize a text processing process?
2. Does it look to you like there is a problem with my process? (I followed the tutorials; there is nothing really special about it.)
3. Are there any benchmark studies on the speed of RapidMiner?
Thank you in advance,
Best regards,