Summary of comments

jozeftomas_2020 New Altair Community Member
edited November 2024 in Community Q&A

Hello,

I have a question and would be grateful for an answer.
I searched for a solution but did not find one.
I have a set of comments that I want to summarize by content into four major categories.
Do you know how to do this?
This is an exercise for a data mining course at my university.
Thanks a lot

Answers

  • SGolbert

    Hi Jozef,

     

    You can check the latest webinar by @sgenzer. Although it covers some web scraping and API concepts that you may not need, it introduces two techniques for classifying chatbot conversations: K-Means clustering and LDA (Latent Dirichlet Allocation). Both should apply to your problem.

     

    https://rapidminer.com/resource/text-mining-online-chats/?utm_content=buffere3fad&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
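
    For readers who want to see the idea outside RapidMiner, here is a minimal Python sketch of the K-Means half of that suggestion, assuming a handful of made-up comments and four clusters. It is not the process shown in the webinar: the sample comments, the stop-word list, and the helper names (tokenize, tfidf_vectors, kmeans, top_terms) are all illustrative assumptions, and only the standard library is used. Listing the highest-weighted terms of each cluster center is one simple way to see what a cluster is about.

```python
import math
import random
from collections import Counter

# Hypothetical comments; replace with your own data.
comments = [
    "delivery was late and the package arrived damaged",
    "late delivery again, package was damaged",
    "great price and good discount on this order",
    "the price is great, nice discount",
    "customer support was rude and unhelpful",
    "support staff unhelpful and rude on the phone",
    "the app crashes when I open the login screen",
    "login screen crashes the app every time",
]
# Tiny illustrative stop-word list (an assumption, not a standard list).
STOPWORDS = {"the", "and", "was", "is", "on", "this", "a", "i", "when", "again", "every"}


def tokenize(text):
    words = text.lower().replace(",", " ").split()
    return [w for w in words if w not in STOPWORDS]


def tfidf_vectors(docs):
    """Return the vocabulary and one L2-normalized TF-IDF vector per document."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({w for doc in tokenized for w in doc})
    doc_freq = Counter(w for doc in tokenized for w in set(doc))
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vec = [counts[w] * (math.log(len(docs) / doc_freq[w]) + 1.0) for w in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=50, seed=0):
    """Plain k-means: assign points to the nearest center, then recompute centers."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centers[c]))
            for v in vectors
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def top_terms(center, vocab, n=3):
    """Words with the largest weight in a cluster center."""
    ranked = sorted(zip(center, vocab), reverse=True)
    return [word for _, word in ranked[:n]]


vocab, vectors = tfidf_vectors(comments)
labels, centers = kmeans(vectors, k=4)
for c, cluster_center in enumerate(centers):
    members = [text for text, lab in zip(comments, labels) if lab == c]
    print(f"cluster {c}: top terms {top_terms(cluster_center, vocab)}")
    for text in members:
        print("   ", text)
```

    The result depends on the randomly chosen starting centers, so it is worth trying several seeds. With real data you would most likely use a library implementation and a proper stop-word list rather than this hand-written version.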

     

    Regards,

    Sebastian

  • jozeftomas_2020

    Hello,
    Thank you so much for your answer :heart:

    Did I understand correctly? Should I first cluster the comments with K-means and then apply LDA to each cluster?
    How do I figure out what content is in each cluster?

    (Is it possible to view the shape of the clusters and their centers?)
    Thanks in advance for your help :smileyhappy:
    Looking forward to your reply.

  • SGolbert

    Hi Jozef,

     

    I'm not sure I understand the questions, but K-means and LDA are two different techniques. Both will assign each sample to one of the clusters. I'm afraid that deciding which to use and with which parameters is problem-dependent and requires a good dose of trial and error.
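
    To make the difference concrete, here is a rough sketch of LDA fitted with collapsed Gibbs sampling in plain Python. LDA describes each document as a mix of topics; taking the most frequent topic per document gives a single cluster-like label, which is the sense in which it "assigns" a sample. The toy documents, the function name lda_gibbs, and the values chosen for num_topics, alpha, beta, and iterations are assumptions for illustration only, and this is not how RapidMiner implements it.

```python
import random
from collections import Counter

# Hypothetical, already tokenized comments (stop words removed).
docs = [
    ["delivery", "late", "package", "damaged"],
    ["late", "delivery", "package", "damaged"],
    ["price", "great", "discount", "order"],
    ["price", "great", "discount"],
    ["support", "rude", "unhelpful"],
    ["support", "unhelpful", "rude", "phone"],
    ["app", "crashes", "login", "screen"],
    ["login", "screen", "crashes", "app"],
]


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns per-document and per-topic counts."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # words of doc d in topic k
    topic_word = [Counter() for _ in range(num_topics)]  # times word w is in topic k
    topic_total = [0] * num_topics                       # words assigned to topic k
    assignments = []

    # Start from a random topic for every word occurrence.
    for d, doc in enumerate(docs):
        topics = []
        for w in doc:
            z = rng.randrange(num_topics)
            topics.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(topics)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current assignment, then resample it.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word


NUM_TOPICS = 4  # assumed, matching the four categories asked about
doc_topic, topic_word = lda_gibbs(docs, NUM_TOPICS)

# Dominant topic per document: the topic most of its words were assigned to.
dominant = [max(range(NUM_TOPICS), key=lambda k: counts[k]) for counts in doc_topic]
for k in range(NUM_TOPICS):
    words = [w for w, n in topic_word[k].most_common(3) if n > 0]
    members = [d for d, z in enumerate(dominant) if z == k]
    print(f"topic {k}: top words {words}, documents {members}")
```

    The top words per topic play the same role as the top terms of a K-means center: they hint at what each group is about. The number of topics and the priors still have to be chosen by trial and error.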

     

    Regarding the visualization, that would be possible only with two dimensions (like the classic example of the iris dataset).
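
    When the data has more than two dimensions, one common workaround (not discussed in this thread) is to project it onto two dimensions first, for example with PCA, and plot the projected points. Below is a rough sketch in plain Python, assuming a few four-dimensional rows in the style of the iris dataset. The function names and the use of power iteration are illustrative choices, and any such projection loses some information.

```python
import math
import random

# A few four-dimensional measurements in the style of the iris dataset.
rows = [
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [4.7, 3.2, 1.3, 0.2],
    [7.0, 3.2, 4.7, 1.4],
    [6.4, 3.2, 4.5, 1.5],
    [6.9, 3.1, 4.9, 1.5],
]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def center(data):
    """Subtract the column means so every column averages to zero."""
    means = [sum(col) / len(data) for col in zip(*data)]
    return [[x - m for x, m in zip(row, means)] for row in data]


def leading_direction(data, iterations=200, seed=0):
    """Power iteration: direction of largest variance in centered data."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in data[0]]
    for _ in range(iterations):
        scores = [dot(row, v) for row in data]
        w = [dot(scores, col) for col in zip(*data)]  # multiplies by X^T X
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:  # no variance left in the data
            break
        v = [x / norm for x in w]
    return v


def project_2d(data):
    """Project the rows onto their first two principal directions."""
    centered = center(data)
    pc1 = leading_direction(centered)
    # Remove the first direction from the data, then find the next one.
    deflated = [[x - dot(row, pc1) * p for x, p in zip(row, pc1)] for row in centered]
    pc2 = leading_direction(deflated, seed=1)
    return [(dot(row, pc1), dot(row, pc2)) for row in centered]


points = project_2d(rows)
for x, y in points:
    print(f"{x:8.3f} {y:8.3f}")
```

    The resulting (x, y) pairs can be drawn as a scatter plot and colored by cluster label. For text data with thousands of terms the picture is only a coarse summary, so it should be read with care.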

     

    Regards,

    Sebastian

  • jozeftomas_2020

    Hi,

    Thank you so much, my friend @SGolbert.

    I want to know what content is in each cluster. Can LDA tell me that? And how can I use LDA to find the best K? Thanks in advance for your help.

    With respect