Stuck at LDA process. No results are returned

lambamanika07
lambamanika07 New Altair Community Member
edited November 2024 in Community Q&A
I updated RapidMiner, and since then I cannot get any results from my LDA process. I am attaching screenshots of the process and the sub-processes I have been trying for LDA over the last 2-3 days, but the results only show 'NA'. Kindly help.

Answers

  • MartinLiebig
    MartinLiebig
    Altair Employee
    Hi,
    can you please check whether the collection of documents contains proper documents? I.e., are there items in it, and do they actually contain text?

    Best,
    Martin
  • lambamanika07
    lambamanika07 New Altair Community Member
    Hi Martin

    Yes, I have checked many times. I tried with both text files and PDF files. I even tried different text samples, but had no luck. The results came out as NA, as in the screenshot.
  • MartinLiebig
    MartinLiebig
    Altair Employee
    edited September 2019
    Hi,
    is this 'western' text? LDA uses a default tokenization that splits on characters like spaces and so on. This may fail completely if the text is not in the Latin alphabet.

    Best,
    Martin
  • lambamanika07
    lambamanika07 New Altair Community Member
    Hi Martin

    The text is in English. I ran the same samples for testing a few weeks ago and they worked fine. At that time I was using version 8 of RapidMiner. I have been facing this problem since I upgraded to the latest version 9. I do not think the upgrade is causing the problem, but I am mentioning it just in case.
  • MartinLiebig
    MartinLiebig
    Altair Employee
    Hi,
    can you maybe share data and processes via private message? I would love to have a look at this.

    BR,
    Martin
  • lambamanika07
    lambamanika07 New Altair Community Member
    Hi Martin

    I have sent you a personal message with the sample text and the process. Thank you in advance for your help.
  • MartinLiebig
    MartinLiebig
    Altair Employee
    Answer ✓

    Your file is encoded in UTF-8. If you are using Windows, you want to change the Encoding of Read Document to UTF-8. Otherwise strange things happen with characters like é.

    Further, you should use a Tokenize operator before your text mining operators. Operators like 'Stem' or 'n-grams' work on the tokens. This may have duplicated your data.

    Lastly: can you quickly confirm that the number of topics you search for is less than the number of documents? If you search for 5 topics in 2 documents, that is doomed to fail.

    Best,
    Martin
  • lambamanika07
    lambamanika07 New Altair Community Member
    It worked! Thank you so much. 
  • MartinLiebig
    MartinLiebig
    Altair Employee
    What was the problem here? UTF or the tokenization?

    BR,
    Martin
  • lambamanika07
    lambamanika07 New Altair Community Member
    Hey Martin

    I made both changes as suggested, selecting UTF-8 as the encoding and adding a Tokenize operator to the process, and then it worked.

    With regards
    Manika
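
For readers who land on this thread later, the three checks from the accepted answer (matching the file encoding, tokenizing before token-level steps, and requesting fewer topics than documents) can be illustrated outside RapidMiner. The following is a minimal sketch in plain Python, not RapidMiner operators. The function names and the sample sentence are hypothetical and chosen only for illustration, and `cp1252` is assumed here as a typical Windows default encoding.

```python
# Illustrative sketch only: plain Python, not RapidMiner operators.
# All function names and the sample sentence are hypothetical.
import re


def decode_document(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode raw file bytes with an explicitly chosen encoding."""
    return raw.decode(encoding)


def tokenize(text: str) -> list[str]:
    """Split text into lowercase word tokens with a simple regex."""
    return re.findall(r"\w+", text.lower())


def bigrams(tokens: list[str]) -> list[str]:
    """Build 2-grams from a token list; this needs tokens, not raw text."""
    return [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]


def check_topic_count(n_topics: int, n_documents: int) -> bool:
    """Return True when fewer topics are requested than there are documents."""
    return n_topics < n_documents


# A UTF-8 encoded "file" containing accented characters.
raw = "Le café est fermé".encode("utf-8")

# Check 1: reading UTF-8 bytes with a Windows-style codec (assumed cp1252)
# garbles the accented characters; reading them as UTF-8 does not.
wrong = decode_document(raw, "cp1252")
right = decode_document(raw, "utf-8")
print(wrong == right)  # False: the two decodings differ

# Check 2: tokenize first, then apply token-level steps such as n-grams.
tokens = tokenize(right)
pairs = bigrams(tokens)
print(len(tokens), len(pairs))  # 4 tokens, 3 bigrams

# Check 3: 5 topics for 2 documents fails the sanity check.
print(check_topic_count(5, 2))  # False
```

In the mis-decoded string each `é` turns into the two characters `Ã©`, which is the kind of "strange" output the answer warns about, and any tokens built from that string are corrupted as well.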