text mining - linguistic preprocessing (thesaurus, synonyms, concepts, ...)

bentlage New Altair Community Member
edited November 5 in Community Q&A
Hello,

I am trying to use RapidMiner for some text mining. To improve my classification results, I would like to clean up my raw data first.
My raw data consists of short descriptions (two sentences at most) of machine failures. It is free text with no fixed format or conventions at all.

My problem is therefore: RapidMiner cannot recognize that "part is overheated" and "part got too warm" mean the same thing. To solve this I have two ideas (there are probably many more, and better ones):

first: find and replace similar words
Use some preprocessing to recognize that "overheated" and "too warm" mean nearly the same thing. --> Is it possible to integrate a thesaurus to identify synonyms (for example OpenThesaurus)?

second: use categories
Replace words with their categories, so that "apple" and "pear" are both replaced by "fruit". But again, I would need to integrate a tool or an add-on to do this in RapidMiner.
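
Both ideas boil down to a lookup table applied before the text reaches RapidMiner. The following is only a minimal Python sketch of that idea, not a RapidMiner feature: the mappings `SYNONYMS` and `CATEGORIES` and the function `normalize` are made up for illustration, and a real list would have to come from a thesaurus or from domain experts.

```python
import re

# Hypothetical, hand-made mappings (assumptions for illustration only).
SYNONYMS = {
    "too warm": "overheated",
    "too hot": "overheated",
}
CATEGORIES = {
    "apple": "fruit",
    "pear": "fruit",
}

def normalize(text, mapping):
    """Lower-case `text` and replace every key of `mapping` by its value."""
    # Longest keys first, so multi-word phrases win over shorter ones.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], text.lower())

print(normalize("Part got too warm", SYNONYMS))   # part got overheated
print(normalize("Apple and pear", CATEGORIES))    # fruit and fruit
```

The normalized text could then be written back to the file or table that RapidMiner reads, so that both failure descriptions end up with the same token.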

--> Have you done anything similar with RapidMiner before? Or could you give me a tip on how to integrate this kind of linguistic preprocessing into RapidMiner?
And to make it a bit more complicated: my raw data is in German...

Thanks a lot for your help and your ideas
Aaron

Answers

  • meskadinf
    meskadinf New Altair Community Member
    Hello Aaron
    I think the solution to your problem is:
      - First, install the 'Text Mining Extension'.
      - Second, install a linguistic dictionary tool such as the WordNet Extension.
    To do this, I suggest the following steps:
    In the RapidMiner menu bar, go to "Help -> Updates and Extensions", search for "Text Mining Extension" and install it.
    Then search for the WordNet Extension and install it as well.
  • bentlage
    bentlage New Altair Community Member
    Hello meskadinf,

    Thanks for your help. I have already installed the "Text Mining Extension" and produced a first clustering of my data.
    Now I would like to improve the results.

    The WordNet extension is really great for this purpose. But my data is German text.
    --> Do you have an idea how to use WordNet with another language?
    Or is it even possible to program a similar (but much less powerful) extension on my own?
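
A much less powerful home-grown variant could be sketched in Python as below. It assumes a plain-text synonym list with one synonym set per line, terms separated by semicolons and comment lines starting with "#" (as far as I know the OpenThesaurus text export looks roughly like this, but that is an assumption to verify). The sample lines and the function names `build_synonym_map` and `replace_tokens` are made up for illustration.

```python
def build_synonym_map(lines):
    """Map every term of a synonym set to the first term of that set."""
    canonical = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        terms = [t.strip().lower() for t in line.split(";") if t.strip()]
        for term in terms:
            # If a term occurs in several sets, keep its first mapping.
            canonical.setdefault(term, terms[0])
    return canonical

def replace_tokens(text, canonical):
    """Replace each whitespace-separated token by its canonical synonym."""
    tokens = text.lower().split()
    return " ".join(canonical.get(tok, tok) for tok in tokens)

# Made-up sample data, not copied from a real thesaurus.
SAMPLE = [
    "# comment line",
    "überhitzt;heißgelaufen",
    "defekt;kaputt;beschädigt",
]
syn = build_synonym_map(SAMPLE)
print(replace_tokens("Lager heißgelaufen und kaputt", syn))
# lager überhitzt und defekt
```

This only handles single tokens and ignores inflection and word sense, so German text would probably need stemming or lemmatization first.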

    Thanks
    Aaron
  • meskadinf
    meskadinf New Altair Community Member
    Sorry,
    I don't have any idea how to use WordNet with another language.
    My own problem is also with WordNet in RapidMiner: how can I detect negation between two sentences?
    If there is a solution, please help me.
    Thank you in advance.
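
One possible reading of "negation between two sentences" is: two sentences use the same words, but only one of them contains a negation word. A rough Python sketch of that check, independent of WordNet and RapidMiner, could look like this. The cue list `NEGATION_CUES` is an assumed, incomplete English list, and both function names are made up for illustration.

```python
# Assumed, incomplete list of English negation cues.
NEGATION_CUES = {"not", "no", "never", "n't", "without"}

def split_negation(sentence):
    """Return (set of non-negation words, number of negation cues)."""
    # Separate the contraction "n't" from its verb: "isn't" -> "is n't".
    tokens = sentence.lower().replace("n't", " n't").split()
    tokens = [t.strip(".,;:!?") for t in tokens]
    cues = [t for t in tokens if t in NEGATION_CUES]
    content = {t for t in tokens if t and t not in NEGATION_CUES}
    return content, len(cues)

def differ_by_negation(a, b):
    """True if a and b share all other words but differ in negation parity."""
    content_a, neg_a = split_negation(a)
    content_b, neg_b = split_negation(b)
    return content_a == content_b and (neg_a % 2) != (neg_b % 2)

print(differ_by_negation("The pump is running.", "The pump is not running."))
# True
```

This is a word-level heuristic only; it does not understand scope, paraphrases, or antonyms.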