"How to process tamil text in rapidminer?"

User: "arunasethupathy"
New Altair Community Member
Updated by Jocelyn

Hello everybody

I am in the process of mining Tamil-language text in RapidMiner.

Is there an option to process the Tamil language in RapidMiner? (I have seen posts related to Arabic, Cyrillic, etc.)

I have set the encoding to UTF-8 in the RapidMiner preferences, and I saved the .txt file in UTF-8 as well.

But I am unable to read the file using the Read Document operator in the Text Mining Extension.

Is there any other solution? Kindly suggest.

Thank you

    User: "Pavithra_Rao"
    New Altair Community Member

    Hi Aruna,

     

    There is no Tamil-specific operator comparable to Filter Stopwords (German), Stem (German), etc.

     

    But you could try Filter Stopwords (Dictionary), Stem (Dictionary) and similar operators, supplying a file that contains the Tamil (or any other language) words, to accomplish your task.
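To illustrate what a dictionary-based stop word filter does with such a word file, here is a minimal Python sketch. It is not the RapidMiner operator's implementation, and the function names `load_stopwords` and `filter_stopwords` are hypothetical. It assumes a word list with one word per line and splits tokens on whitespace, because a regex like `\w+` would break Tamil words apart at their combining vowel signs.

```python
import string


def load_stopwords(lines):
    """Build a stop word set from an iterable of lines (one word per line)."""
    return {line.strip() for line in lines if line.strip()}


def filter_stopwords(text, stopwords):
    """Split on whitespace, trim ASCII punctuation, drop stop words."""
    tokens = (tok.strip(string.punctuation) for tok in text.split())
    return [tok for tok in tokens if tok and tok not in stopwords]


# Illustrative word list; in practice this would come from a UTF-8 file,
# e.g. load_stopwords(open("stopwords_ta.txt", encoding="utf-8")).
stopwords = load_stopwords(["ஒரு\n", "\n", "இந்த\n"])
tokens = filter_stopwords("இந்த புத்தகம் ஒரு நல்ல புத்தகம்.", stopwords)
print(tokens)
```

The same word file could then be handed to Filter Stopwords (Dictionary) once the filtering behaves as expected on a small sample.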

     

     

    Hope this helps.

     

    Cheers,

    Pavithra

    User: "Pavithra_Rao"
    New Altair Community Member

    Hi Aruna,

     

    I checked the error screenshot you attached in the following post:

    https://community.rapidminer.com/t5/RapidMiner-Studio-Forum/Text-mining-in-utf-8/m-p/34525#M24221

     

    I figured out that the encoding used to read and write the Tamil text is TSCII, and as of now the "Read Document" operator in RapidMiner does not support this encoding.

     

    But there is a workaround: you could run the Python code described in the following blog post inside the "Execute Python" operator in RapidMiner Studio to do the conversion.

     

    https://ezhillang.blog/tag/open-tamil-text-processing/
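As a rough idea of what such a script might look like, here is a minimal sketch. It is not the code from the blog post. It assumes the "Execute Python" operator calls a function named `rm_main` and accepts a pandas DataFrame as the result; `INPUT_PATH`, `read_tamil_text` and the `legacy_converter` parameter are hypothetical names. The sketch tries UTF-8 first and only falls back to a caller-supplied converter (for example a TSCII-to-Unicode function from the open-tamil package, whose exact signature should be checked in its documentation).

```python
INPUT_PATH = "tamil_input.txt"  # hypothetical path, adjust to your file


def read_tamil_text(path, legacy_converter=None):
    """Read a text file as UTF-8, optionally falling back to a converter.

    legacy_converter is an assumed callable that takes a str in which each
    character holds one raw byte value (latin-1 decoding) and returns
    Unicode text, e.g. a TSCII-to-Unicode conversion function.
    """
    with open(path, "rb") as fh:
        raw = fh.read()
    try:
        # "utf-8-sig" also strips a byte order mark if an editor added one.
        return raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        if legacy_converter is None:
            raise
        return legacy_converter(raw.decode("latin-1"))


def rm_main():
    # pandas is imported here so read_tamil_text stays usable without it.
    import pandas as pd

    text = read_tamil_text(INPUT_PATH)
    # One row, one column; downstream operators can convert it to documents.
    return pd.DataFrame({"text": [text]})
```

If the file really is UTF-8, the fallback is never used, which also makes this a quick way to check whether the encoding is the actual cause of the read error.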

     

    Hope this helps.

     

    Cheers,