Stuck at LDA process: no results are returned
lambamanika07
New Altair Community Member
I updated RapidMiner, and since then I cannot get any results from my LDA process. I am attaching screenshots of the process and its sub-processes. I have been trying to get LDA working for the last 2-3 days, but the results only show 'NA'. Kindly help.
Best Answer
-
Hi @lambamanika07,
Your file is encoded in UTF-8. If you are using Windows, you need to set the Encoding parameter of Read Document to UTF-8. Otherwise strange things happen with characters like é.
Further, you should use a Tokenize operator before your other text mining operators. Operators like 'Stem' or 'n-grams' work on tokens. This may have duplicated your data.
Lastly: can you quickly confirm that the number of topics you search for is less than the number of documents? If you search for 5 topics in 2 documents, that is doomed to fail.
Best,
Martin
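To make the three points above concrete outside of RapidMiner, here is a minimal Python sketch. It is not how RapidMiner implements Read Document, Tokenize, or LDA. The function names (`decode_as_cp1252`, `tokenize`, `check_topic_count`) are hypothetical, and Windows-1252 is only an assumption about what a Windows default encoding might be.

```python
import re


def decode_as_cp1252(text: str) -> str:
    """Encode text as UTF-8, then decode the bytes as Windows-1252.

    This mimics reading a UTF-8 file with the wrong encoding setting
    (assumption: the wrong setting is Windows-1252).
    """
    return text.encode("utf-8").decode("cp1252")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and return its runs of letters as tokens.

    A crude stand-in for a tokenization step; stemming or n-gram
    generation would then operate on this list, not on the raw string.
    """
    return re.findall(r"[^\W\d_]+", text.lower())


def check_topic_count(n_topics: int, n_documents: int) -> None:
    """Raise ValueError unless there are fewer topics than documents."""
    if n_topics >= n_documents:
        raise ValueError(
            f"{n_topics} topics requested for {n_documents} documents; "
            "use fewer topics than documents"
        )


if __name__ == "__main__":
    print(decode_as_cp1252("é"))     # the single character turns into two
    print(tokenize("Café is open"))  # tokens from correctly decoded text
    check_topic_count(5, 20)         # passes silently
```

Calling `check_topic_count(5, 2)` raises a `ValueError`, which matches the "5 topics in 2 documents" case described above.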
Answers
-
Hi,
Can you please check whether the collection of documents contains proper documents? I.e., are there items in it, and do they contain text?
Best,
Martin
-
Hi Martin
Yes, I have checked many times. I tried with both text files and PDF files. I even tried different text samples, but I had no luck! The results kept showing NA, as in the screenshot.
-
Hi,
Is this 'western' text? LDA uses a default tokenization that splits on characters like spaces and so on. This may fail completely if the text is not in the Latin alphabet.
Best,
Martin
-
Hi Martin
The text is in English. I ran the same samples for testing a few weeks ago and it worked fine. At that time I was using version 8 of RapidMiner. I have been facing this problem since I upgraded to the latest version 9. I do not think the upgrade would cause any problem, but I am mentioning it just in case.
-
Hi,
Can you maybe share the data and processes via private message? I would love to have a look at this.
BR,
Martin
-
Hi Martin
I have sent you a personal message with the sample text and the process. Thank you in advance for your help.
-
It worked! Thank you so much.
-
Hey Martin
I made both changes as suggested, selecting UTF-8 as the encoding and adding a tokenization operator to the process, and then it worked.
With regards
Manika