"Text Mining: How to split data according to language"
Hi there,
I am currently trying to split the text corpus I am working with by the language each text is written in, but I have not managed to do so and would appreciate some help.
First, I classified the language of each text in my corpus using a Naive Bayes-based language detector, so I already know which texts are, for example, German or English. Now I want to select only the German or only the English texts in order to analyze them separately, but I don't know the correct operators to use. I already tried the Filter Examples operator, but it looks like only the prediction labels for the languages are filtered, and the corresponding texts are omitted from the result.
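To make the goal concrete, here is a minimal Python sketch of the result I am after, written outside of the tool. The field names `text` and `prediction`, the helper `split_by_language`, and the sample sentences are all made up for illustration; they are not my real data or the actual attribute names in my process. The point is only that each text should stay attached to its predicted language when the corpus is split.

```python
from collections import defaultdict


def split_by_language(examples, label_key="prediction", text_key="text"):
    """Group texts by their predicted language label.

    `examples` is a list of dicts; `label_key` and `text_key` are
    hypothetical field names chosen for this sketch.
    """
    groups = defaultdict(list)
    for example in examples:
        # Keep the text itself, not just the label, in each group.
        groups[example[label_key]].append(example[text_key])
    return dict(groups)


# Made-up sample corpus with already-predicted language labels.
corpus = [
    {"text": "Das ist ein Beispiel.", "prediction": "German"},
    {"text": "This is an example.", "prediction": "English"},
    {"text": "Noch ein Satz.", "prediction": "German"},
]

by_language = split_by_language(corpus)
german_texts = by_language["German"]
english_texts = by_language["English"]
```

In other words, I would like to end up with one subset per language that still contains the texts, so that each subset can be analyzed on its own.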
Can anybody help?
Thanks in advance!!
Ute