"Simple Text Classification - Help"

User2170 New Altair Community Member
edited November 5 in Community Q&A
Hello,

I am trying to classify documents (.txt), i.e. sort them into groups.

What I've done so far:

Process Documents from Files (2 categories / classes) -> Tokenize -> Filter Stopwords -> Learner -> Apply Model. The document to classify comes from Read Document -> Process Documents (Tokenize, Filter), as you can see below:

image

There are 6 documents for each class (Process Documents from Files) and a single document to classify.

Is this the right way to classify text / documents in RapidMiner? I am asking because the results are confusing. Just to make sure: I want RapidMiner to tell me "Your single .txt file belongs to class/category A or B".

Thanks in advance!

Answers

  • B_
    B_ New Altair Community Member
    Search BI Processes for the post "Example - Classify Text Language" and remove the N-Gram operator. You will then have a working text classifier. I use it for several text classification applications.
  • land
    land New Altair Community Member
    Hi,
    you will have to make sure that the same word list is used when applying the model as when training it! Otherwise the document to classify won't have the same attributes as the training data and its TF-IDF values will differ. So forward the word list from the Process Documents from Files operator in the training part to the word list input port of Process Documents in the application part.

    We have a webinar that introduces text classification tasks in more detail.

    Greetings,
      Sebastian
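
The point about reusing the training word list can be illustrated outside RapidMiner with a rough Python sketch. Everything in it is an assumption made for illustration: the helper names (`build_word_list`, `vectorize`, `classify`), the tiny stopword set, the toy documents, and the nearest-centroid scoring, which only stands in for whatever Learner the process actually uses. The part that mirrors the advice is that the vocabulary and IDF weights are learned once from the training documents and then passed unchanged to the step that vectorizes the new document.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword set (assumption, not RapidMiner's list).
STOPWORDS = {"the", "a", "an", "is", "of", "and", "to", "in"}

def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def build_word_list(train_docs):
    """Learn the vocabulary and IDF weights from the training documents only."""
    tokenized = [tokenize(d) for d in train_docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    doc_freq = Counter(t for doc in tokenized for t in set(doc))
    n = len(tokenized)
    idf = {t: math.log(n / doc_freq[t]) + 1.0 for t in vocab}
    return vocab, idf

def vectorize(text, vocab, idf):
    """Map a text onto the fixed word list; words outside it are ignored."""
    counts = Counter(tokenize(text))
    vec = [counts[t] * idf[t] for t in vocab]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def classify(text, vocab, idf, centroids):
    """Return the label whose centroid has the largest dot product with the text."""
    vec = vectorize(text, vocab, idf)
    scores = {label: sum(a * b for a, b in zip(vec, c))
              for label, c in centroids.items()}
    return max(scores, key=scores.get)

# Toy training data for two classes (made up for this sketch).
train = {
    "A": ["the striker scored a goal in the match",
          "the team won the football match"],
    "B": ["the bank raised the interest rate",
          "the stock market and the bank shares fell"],
}

# Training part: the word list is built once, here.
all_docs = [d for docs in train.values() for d in docs]
vocab, idf = build_word_list(all_docs)
centroids = {label: centroid([vectorize(d, vocab, idf) for d in docs])
             for label, docs in train.items()}

# Application part: the same vocab and idf are reused, not rebuilt.
new_doc = "a late goal decided the football match in extra time"
label = classify(new_doc, vocab, idf, centroids)
print(label)
```

If `build_word_list` were called again on the new document alone, its vector would have a different length and different columns than the training vectors, which is the mismatch described above.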