"How to process tamil text in rapidminer?"

arunasethupathy
arunasethupathy New Altair Community Member
edited November 5 in Community Q&A

Hello everybody

I am trying to mine Tamil-language text in RapidMiner.

Is there an option to process the Tamil language in RapidMiner? (I have seen posts related to Arabic, Cyrillic, etc.)

I have set the encoding to UTF-8 in the RapidMiner preferences, and I saved the .txt file with UTF-8 encoding.

But I am unable to read the file using the Read Document operator in the Text Mining extension.

Is there any other solution? Kindly suggest.

Thank you

Answers

  • Pavithra_Rao
    Pavithra_Rao New Altair Community Member

    Hi Aruna,

     

    There is no Tamil-specific operator comparable to Filter Stopwords (German), Stem (German), etc.

     

    But you could try the Filter Stopwords (Dictionary) and Stem (Dictionary) operators, and provide a file containing the Tamil (or any other language) words to accomplish your task.

     

     

    Hope this helps.

     

    Cheers,

    Pavithra
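
To make the dictionary approach above concrete, here is a minimal Python sketch of what dictionary-based stopword filtering does conceptually. It is not RapidMiner's implementation. The function names, the three sample stopwords, and the one-word-per-line file layout are assumptions for illustration only.

```python
import unicodedata


def load_stopwords(lines):
    """Build a stopword set from an iterable of lines (assumed: one word per line)."""
    words = set()
    for line in lines:
        word = line.strip()
        if word:
            # Normalize so visually identical Tamil strings compare equal.
            words.add(unicodedata.normalize("NFC", word))
    return words


def filter_stopwords(text, stopwords):
    """Split on whitespace and drop tokens found in the stopword set.

    Whitespace splitting is used on purpose: a regex such as \\w+ would cut
    Tamil words apart at vowel signs, which are combining marks.
    """
    tokens = unicodedata.normalize("NFC", text).split()
    return [tok for tok in tokens if tok not in stopwords]


if __name__ == "__main__":
    # Hypothetical dictionary content; a real file would be read with
    # open(path, encoding="utf-8") and passed to load_stopwords.
    sample_dictionary = ["இது", "ஒரு", "மற்றும்"]
    stopwords = load_stopwords(sample_dictionary)
    print(filter_stopwords("இது ஒரு நல்ல புத்தகம்", stopwords))
```

The same idea applies to a stemming dictionary: the quality of the result depends entirely on the word list you supply.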

  • Pavithra_Rao
    Pavithra_Rao New Altair Community Member

    Hi Aruna,

     

    I checked the error screenshot you attached in the following post:

    https://community.rapidminer.com/t5/RapidMiner-Studio-Forum/Text-mining-in-utf-8/m-p/34525#M24221

     

    I figured out that the encoding needed to read/write your Tamil text is TSCII, and as of now the "Read Document" operator in RapidMiner does not support this encoding.

     

    But there is a workaround. You could use the Python code described in the following blog post inside the "Execute Python" operator in RapidMiner Studio to do the conversion.

     

    https://ezhillang.blog/tag/open-tamil-text-processing/
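
Before converting anything, it may help to confirm whether the file really is UTF-8 Unicode Tamil or a legacy byte encoding such as TSCII. The sketch below is a standard-library-only heuristic, not the code from the blog post. The function name and the 0.5 threshold are assumptions. It does not perform the TSCII conversion itself, which would still need a library such as the one the blog describes.

```python
def looks_like_unicode_tamil(raw: bytes, min_ratio: float = 0.5) -> bool:
    """Heuristic: True if raw decodes as UTF-8 and is mostly Tamil-block text.

    min_ratio is an arbitrary, illustrative threshold.
    """
    try:
        # "utf-8-sig" also accepts files that start with a byte order mark.
        text = raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        # Legacy 8-bit encodings usually fail strict UTF-8 decoding.
        return False

    non_ascii = [ch for ch in text if ord(ch) > 127]
    if not non_ascii:
        # Pure ASCII: valid UTF-8, but it contains no Tamil at all.
        return False

    # The Unicode Tamil block spans U+0B80 to U+0BFF.
    tamil = sum(1 for ch in non_ascii if "\u0b80" <= ch <= "\u0bff")
    return tamil / len(non_ascii) >= min_ratio


if __name__ == "__main__":
    sample = "தமிழ் உரை".encode("utf-8")
    print(looks_like_unicode_tamil(sample))
```

To check a real file, read it in binary mode, for example `open(path, "rb").read()`, and pass the bytes to the function. If it returns False for a file that displays as Tamil in an editor, the file is probably in a legacy encoding and needs conversion first.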

     

    Hope this helps.

     

    Cheers,