Textual ETL: Stemming from dictionary

Wanttoknow
Wanttoknow New Altair Community Member
edited November 5 in Community Q&A
Hi,

First of all I have to say that RM5.0 is a wonderful tool. :o Congratulations.

I started preprocessing text for classification and I am having some problems with the "Stem (Dictionary)" component.

I am referring to a text file for the patterns, but I am not sure about the syntax of the entries/records in that file. The help is very brief about this.

Right now the first line in my designated TXT file looks like this:

"move: moving moved move"

But it is not replacing any of the terms with their stem.

Any idea?

Answers

  • arminmania
    arminmania New Altair Community Member
    Hi,

    I am not sure, but I think you have to write it as follows:

    move , moving moved move
  • TobiasMalbrecht
    TobiasMalbrecht New Altair Community Member
    Hi,
    Wanttoknow wrote:

    Right now the first line in my designated TXT file looks like this:

    "move: moving moved move"

    Did you try putting a blank before the colon?

    Kind regards,
    Tobias
  • Wanttoknow
    Wanttoknow New Altair Community Member
    Well, after a lot of trial and error, this seems to work:

    "
    aanleveren:aanlever.*
    aanleveren:aangelever.*
    zorgverzekering:zorgverzeker.*
    "
    But putting multiple patterns on one line, like "aanleveren : aanlever* aangelever*", doesn't work.
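
To make the working format concrete, here is a minimal Python sketch of how such a dictionary could be read: one `stem:regex` rule per line, with the first matching rule winning. This is not RapidMiner's implementation. The helper names (`parse_rules`, `stem_token`, `expand`) are hypothetical, and matching the regex against the whole token is an assumption based only on the behaviour described above.

```python
import re

def parse_rules(text):
    """Parse 'stem:regex' lines into (stem, compiled pattern) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue  # skip blank or malformed lines
        stem, pattern = line.split(":", 1)
        rules.append((stem.strip(), re.compile(pattern.strip())))
    return rules

def stem_token(token, rules):
    """Return the stem of the first rule matching the whole token (assumed semantics)."""
    for stem, pattern in rules:
        if pattern.fullmatch(token):
            return stem
    return token  # no rule matched: leave the token unchanged

def expand(mapping):
    """Turn {stem: [patterns]} into one 'stem:pattern' line per pattern."""
    return "\n".join(
        f"{stem}:{pattern}"
        for stem, patterns in mapping.items()
        for pattern in patterns
    )

# The three rules from the post above.
DICTIONARY = """\
aanleveren:aanlever.*
aanleveren:aangelever.*
zorgverzekering:zorgverzeker.*
"""

rules = parse_rules(DICTIONARY)
tokens = ["aangeleverd", "zorgverzekeringen", "fiets"]
stemmed = [stem_token(t, rules) for t in tokens]
print(stemmed)  # ['aanleveren', 'zorgverzekering', 'fiets']
```

Since several patterns on one line do not seem to be accepted, `expand` can generate the file from a more compact mapping, for example `expand({"aanleveren": ["aanlever.*", "aangelever.*"]})` yields two lines, one per pattern.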

  • Wanttoknow
    Wanttoknow New Altair Community Member
    Another question:

    Is it possible to use an external list for the ReplaceToken component? That would be more convenient than entering records with the list editor of the component.