"How to process Tamil text in RapidMiner?"
Hello everybody
I am in the process of mining Tamil-language text in RapidMiner.
Is there an option to process the Tamil language in RapidMiner? (I have seen posts related to Arabic, Cyrillic, etc.)
I have set the encoding to UTF-8 in RapidMiner's preferences, and I saved the .txt file encoded as UTF-8.
But I am unable to read the file using the Read Document operator in the Text Mining extension.
Is there any other solution? Kindly suggest.
Thank you
Answers
Hi Aruna,
There is no Tamil-specific operator comparable to Filter Stopwords (German), Stem (German), etc.
But you could try the dictionary-based operators instead, such as Filter Stopwords (Dictionary) and Stem (Dictionary), and supply a file containing the Tamil (or any other language) words to accomplish your task.
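To make the dictionary idea concrete, here is a minimal Python sketch of what a dictionary-based stopword filter does. It is only an illustration, not RapidMiner's implementation: the helper names, the three sample stopwords, and the file name `tamil_stopwords.txt` are all made up for the example, and I am assuming the dictionary file is plain UTF-8 text with one word per line.

```python
def load_stopwords(lines):
    """Build a stopword set from an iterable of lines (one word per line)."""
    return {line.strip() for line in lines if line.strip()}


def filter_stopwords(text, stopwords):
    """Split the text on whitespace and drop tokens found in the stopword set."""
    return [token for token in text.split() if token not in stopwords]


# Hypothetical mini dictionary. A real list would be loaded from a UTF-8 file:
#   with open("tamil_stopwords.txt", encoding="utf-8") as fh:
#       stopwords = load_stopwords(fh)
stopwords = load_stopwords(["ஒரு", "இந்த", "மற்றும்"])

tokens = filter_stopwords("இந்த நூல் ஒரு நல்ல நூல்", stopwords)
print(tokens)
```

Whitespace splitting is the simplest possible tokenizer and is used here only to keep the sketch short.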
Hope this helps.
Cheers,
Pavithra
Hi Aruna,
After checking the error screenshot you attached in the following post:
https://community.rapidminer.com/t5/RapidMiner-Studio-Forum/Text-mining-in-utf-8/m-p/34525#M24221
I figured out that the encoding used to read/write the Tamil text is TSCII, and as of now the "Read Document" operator in RapidMiner does not support this encoding.
But there is a workaround: you could try using the Python code described in the following blog post inside the "Execute Python" operator in RapidMiner Studio to do the conversion.
https://ezhillang.blog/tag/open-tamil-text-processing/
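Before converting anything, it may be worth checking whether the file really contains valid UTF-8. The sketch below uses only the Python standard library; the function names and sample bytes are made up for illustration. It does not perform the TSCII conversion itself. As far as I know the standard library has no TSCII codec, so that step would still need the code from the blog post above.

```python
import codecs


def to_clean_utf8(raw: bytes) -> str:
    """Decode bytes as UTF-8, dropping a leading byte-order mark if present.

    Raises UnicodeDecodeError when the bytes are not valid UTF-8, which is
    what you would expect for a file saved in a legacy encoding.
    """
    if raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]
    return raw.decode("utf-8")


def is_utf8(raw: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        to_clean_utf8(raw)
        return True
    except UnicodeDecodeError:
        return False


# Sample input: Tamil text encoded as UTF-8 with a BOM in front.
sample = codecs.BOM_UTF8 + "தமிழ்".encode("utf-8")
text = to_clean_utf8(sample)

# Arbitrary bytes that are not valid UTF-8, standing in for a legacy-encoded file.
legacy_like = bytes([0xAB, 0xAC, 0xAD])

print(is_utf8(sample), is_utf8(legacy_like))
```

For a real file, you would read it with `open(path, "rb").read()` and pass the bytes to `is_utf8`. If the check fails, the file is probably not UTF-8 despite the setting used when saving.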
Hope this helps.
Cheers,