I have a large number of English texts (online reviews). I need to clean and preprocess them, and then find the meaningful high-frequency words. For example, "bathroom" and "traffic" are likely to appear as valid high-frequency words. Are there specific steps for doing this?
Yes, you can use RapidMiner for that, and there is specific content on this available for free in RapidMiner Academy.
Please check this thread:
https://community.rapidminer.com/discussion/59797/can-process-documents-calculate-term-occurences-of-all-words-without-having-to-give-it-a-word-list#latest
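If it helps to see the general idea outside RapidMiner, the usual steps (lowercase, tokenize, drop stop words and very short tokens, count) can be sketched in a few lines of Python. This is only a rough sketch, not the RapidMiner process from the thread: the function name `top_words`, the tiny stop-word list, and the sample reviews are all made up for illustration, and a real run would need a fuller stop-word list.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; use a fuller list in practice).
STOP_WORDS = {
    "the", "a", "an", "and", "or", "but", "is", "was", "were",
    "it", "to", "of", "in", "very", "had", "no",
}


def top_words(reviews, n=10, min_len=3):
    """Return the n most frequent non-stop-word tokens across all reviews."""
    counts = Counter()
    for text in reviews:
        # Lowercase, then keep only runs of letters (drops punctuation and digits).
        tokens = re.findall(r"[a-z]+", text.lower())
        # Filter out stop words and very short tokens before counting.
        counts.update(
            t for t in tokens if len(t) >= min_len and t not in STOP_WORDS
        )
    return counts.most_common(n)


# Hypothetical sample reviews, for illustration only.
reviews = [
    "The bathroom was clean, but the traffic was terrible.",
    "Great location; the bathroom is small.",
    "Traffic noise all night, and the bathroom had no window.",
]

print(top_words(reviews, n=2))  # [('bathroom', 3), ('traffic', 2)]
```

Words that are frequent but not useful for your analysis can simply be added to the stop-word list and the count rerun.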
Best,
Cesar