Text Mining
Hello,
I'm writing my diploma thesis on "structured and unstructured data in Business Intelligence". I need some information about the Text Mining plugin. How many languages are supported? Can I export the analyzed text, and how, for example as XML? Is there a text mining example? How many formats (PDF, XLS) are supported?
Thanks
Best regards
Chris
Can I ask where to find the language packages? I'm using the Tokenize operator, but I don't know where to find the "Italian" tokenizer...
Ingo Mierswa wrote:
Hi,
In principle, every language is supported that a) can be represented by characters at all and b) consists of words that can be detected by some separator character or mechanism. There are, however, some specific operators (preprocessing steps) within the Text Extension that support only a fixed set of languages. One example is the 'stemming' step, which is supported for German, English, French, Spanish, Portuguese, Italian, Romanian, Dutch, Swedish, Norwegian, Danish, Finnish, Russian, Hungarian, and Turkish. But not every process needs stemming, so more often than not there is no language restriction at all. That is, in my opinion, one of the major advantages of a statistical approach compared to linguistic approaches.
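To illustrate the point about separator-based word detection, here is a minimal Python sketch. It is not the Text Extension's Tokenize operator, just a generic example of the idea; the function names `tokenize` and `term_counts` and the sample sentence are made up for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase tokens at anything that is not a word character."""
    # [^\W\d_]+ matches runs of word characters excluding digits and underscore,
    # so accented letters such as "è" are kept as part of a token.
    return [t.lower() for t in re.findall(r"[^\W\d_]+", text)]


def term_counts(text):
    """Count how often each token occurs (a simple statistical representation)."""
    return Counter(tokenize(text))


# Illustrative Italian sentence; no Italian-specific resources are needed.
italian = "La casa è grande. La casa è bella!"
counts = term_counts(italian)
print(tokenize(italian))
print(counts["casa"])
```

Nothing in this sketch knows anything about Italian: splitting and counting only rely on separators, which is why only steps such as stemming or stopword removal need language-specific resources.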
Thanks!
Hi,
Having a look at the operators, it seems there is no Italian stopword filter. In that case you will probably have to use the "Filter Stopwords (Dictionary)" operator, which lets you define your own stopwords. Maybe there is a public list of common Italian stopwords available somewhere?
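To show what a dictionary-based stopword filter does conceptually, here is a small Python sketch. It is not the operator itself; it assumes a plain-text list with one stopword per line, and the names `load_stopwords`, `filter_stopwords` and the tiny word list are hypothetical and for illustration only.

```python
def load_stopwords(lines):
    """Build a stopword set from an iterable of lines, one word per line."""
    return {line.strip().lower() for line in lines if line.strip()}


def filter_stopwords(tokens, stopwords):
    """Keep only the tokens that are not in the stopword set."""
    return [t for t in tokens if t.lower() not in stopwords]


# Hypothetical, very small Italian stopword list (assumption, not a real resource).
ITALIAN_STOPWORDS = """\
il
la
di
e
è
un
una
che
"""

stopwords = load_stopwords(ITALIAN_STOPWORDS.splitlines())
tokens = ["la", "casa", "è", "una", "bella", "casa"]
filtered = filter_stopwords(tokens, stopwords)
print(filtered)  # ['casa', 'bella', 'casa']
```

In practice the list would come from a file you maintain yourself, so the quality of the filter depends entirely on the word list you supply.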
Regards
Matthias
You name the format. I will say 'Yes. This is supported.' In rare cases I would have to answer: 'Huh? Never heard of this one...' ;D
Cheers,
Ingo