"[SOLVED] Stemming: Keep Information {original word, stem}"

Urselinho New Altair Community Member
edited November 5 in Community Q&A
Hi there,
I'm currently doing some text processing using the different stemming operators. Right now I'm wondering whether there is a way to keep or show which words are conflated to which stem. Without any adjustment, the results of stemming (word list, example set) contain only the stems and associated information such as occurrences.

What I primarily need is something like {original word, stem}.

I'm sure this is quite an easy task, but as I'm not that familiar with RM yet, I don't see it. Any idea how to do this?

Many thanks in advance,
Regards,
Urs

Answers

  • MariusHelf New Altair Community Member
    Hi Urs,

    actually, the stemming operators discard the original tokens, so it is not possible to see which stem results from which token. The only solution may be to compare the stemmed document with the original document token by token in a rather complex process and write the mapping manually into an example set.

    Best, Marius
  • Urselinho New Altair Community Member
    Hi Marius,
    that's quite unpleasant. But OK, I do see the workaround. Thanks for your help.

    Best,
    Urs
  • Urselinho New Altair Community Member
    Hi Marius,
    it's me once again. I really have to ask; otherwise it will take me a long time to find the right operators/functions.

    How can I use the stemming operator so that words are "replaced" within a given document rather than "conflated"? Right now, if I have, for example, a document with the words "Autos" and "Auto", the word list will only contain the stem "auto".

    Thanks in advance,
    Urs
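
For illustration only, the token-by-token comparison Marius describes and the "replace rather than conflate" behaviour Urs asks about can be sketched outside RapidMiner in a few lines of Python. This is a rough sketch under stated assumptions, not a RapidMiner process: `toy_stem` is a hypothetical stand-in for a real stemmer (it only lowercases and strips a trailing "s"), and `tokenize` and `stem_with_mapping` are made-up helper names. The idea is to stem each token individually, so the {original word, stem} pairs are kept and the document can be rebuilt with every token replaced by its stem.

```python
import re
from collections import defaultdict


def toy_stem(token):
    """Hypothetical stand-in for a real stemmer (assumption, not RapidMiner's).

    Lowercases the token and strips a single trailing 's' from longer words.
    """
    token = token.lower()
    if len(token) > 3 and token.endswith("s"):
        return token[:-1]
    return token


def tokenize(text):
    """Split text into word tokens (simple regex tokenizer)."""
    return re.findall(r"\w+", text)


def stem_with_mapping(text, stem=toy_stem):
    """Stem token by token and keep the original next to each stem.

    Returns:
        pairs:   list of (original word, stem) in document order
        by_stem: dict mapping each stem to the set of originals conflated to it
    """
    tokens = tokenize(text)
    pairs = [(token, stem(token)) for token in tokens]

    by_stem = defaultdict(set)
    for original, stemmed in pairs:
        by_stem[stemmed].add(original)

    return pairs, dict(by_stem)


pairs, by_stem = stem_with_mapping("Autos und Auto")

# "Replaced" document: every token position is kept, only the surface form changes.
replaced_text = " ".join(stemmed for _, stemmed in pairs)
```

Here `pairs` holds the {original word, stem} information in document order, `by_stem` shows which originals were conflated to each stem, and `replaced_text` keeps both occurrences of "auto" instead of collapsing them into a single word list entry. A real stemmer could be swapped in by passing any function that maps a token to its stem as the `stem` argument.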