Can I conduct LDA model and emotion analysis with Rapidminer in Chinese text?
Polly
Hi everyone,
I am a newbie here and this is my question.
I need to apply a Latent Dirichlet Allocation (LDA) model and emotion analysis to Chinese text, but I don't know whether I can do this with RapidMiner, or which additional extensions I would need to install.
I have already searched the discussions about Chinese/Mandarin and installed the Hanminer extensions mentioned in one of them. But I don't think the Hanminer extensions are enough for both analyses, and no one seems to have asked this question before.
Please give me some suggestions. Any ideas would be much appreciated!
Best,
Polly
MartinLiebig
Hi,
From my understanding, it should work. But @yyhuang is our Mandarin expert.
Cheers,
Martin
Polly
Hi Martin @mschmitz,
Thank you for your reply.
I read other discussions about LDA and, just to make sure: if I want to run a Latent Dirichlet Allocation model, is 'Linear Discriminant Analysis' the operator I should use? Or is it the 'Extract Topic from Data' operator that most people mentioned in those discussions?
Also, which operator should I use for emotion analysis? Is it Singular Value Decomposition (SVD)?
Besides, in a discussion about an LDA process that showed no results, you asked: "is this 'western' text? LDA uses a default tokenization on this tokens like spaces and so on. This may totally fail if this is not in latin alphabet?" So I guess the language of the text has a great influence on the results. To analyze Chinese text, are there any extensions or operators I need to install or combine?
Sorry for the large number of questions. I would much appreciate any advice you could give me. Thanks in advance!
Regards,
Polly
MartinLiebig
Hi @Polly,
The operator you want to use is Extract Topics from Data, not Linear Discriminant Analysis.
And yes, LDA tokenizes internally. I just realized that the default tokenization splits on \s and cannot be changed, so I guess it is very hard to apply to Mandarin. As I said, I only speak German and English and am not an expert on tokenizing Mandarin/Cantonese, so I don't know whether it would even help if I offered the tokenization as an option.
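To illustrate why splitting on \s is a problem for Chinese, here is a minimal Python sketch (not RapidMiner code). It shows that a whitespace tokenizer returns a whole Chinese sentence as one token, and that pre-segmenting the text and joining the words with spaces gives a whitespace tokenizer something to split on. The function names, the toy dictionary `VOCAB`, and the naive forward maximum matching segmenter are all assumptions made for illustration; a real workflow would use a dedicated Chinese word segmenter, and whether pre-segmented text behaves well inside Extract Topics from Data is untested here.

```python
import re


def whitespace_tokenize(text):
    # Split on runs of whitespace, mimicking a \s-based default tokenizer.
    return [tok for tok in re.split(r"\s+", text) if tok]


def forward_max_match(text, vocab, max_len=4):
    # Naive forward maximum matching (illustrative only): at each position,
    # take the longest dictionary word; fall back to a single character.
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical toy dictionary; a real segmenter ships a much larger one.
VOCAB = {"我", "喜欢", "文本", "挖掘"}
sentence = "我喜欢文本挖掘"  # "I like text mining"

# Without spaces, the whole sentence comes back as a single token.
raw_tokens = whitespace_tokenize(sentence)

# After segmentation, words are joined with spaces so \s splitting works.
segmented = " ".join(forward_max_match(sentence, VOCAB))
resegmented_tokens = whitespace_tokenize(segmented)

print(raw_tokens)
print(resegmented_tokens)
```

If pre-segmentation like this is done before the text reaches the topic model, each space-separated word can be treated as its own token rather than the sentence being treated as one.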
Cheers,
Martin
Polly
Hi Martin,
Thank you for your help. I hope @yyhuang can give me some advice on this.
Cheers,
Polly