
What's the best way to determine the number of topics in the Extract Topics from Data (LDA) operator?

User: "cmoten"
New Altair Community Member
Updated by Jocelyn
I have a dataset containing thousands of ways users have listed product names, for example Apple MacBook, MacBook, MacBookPro, etc. All sorts of products are included, and I'm trying to group similar descriptions into clusters. The Extract Topics from Data operator seems to do the trick, but I have to choose the number of groups manually. Is there a way to determine the number of groups based on similarity? I hope this makes sense.

    User: "lionelderkrikor"
    New Altair Community Member
    Accepted Answer
    Hi @cmoten,

    In RapidMiner, as a first approximation, I see the following method (to be confirmed by @mschmitz, since the Extract Topics from Data (LDA) operator is Martin's baby ... :) ):

    Use an Optimize Parameters (Grid) operator to vary the number of topics k, and plot the perplexity against k.
    The lower the perplexity, the better the model.
    In the example below, the "optimal" number of topics k is 6:





    The attached file contains an example process that finds the optimal number of topics using the Optimize Parameters (Grid) operator.
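
If the perplexity values from the grid search are exported, the final "pick the lowest perplexity" step can also be done outside RapidMiner. The following is only a minimal Python sketch, not part of the attached process. It assumes a CSV export with columns named `k` and `perplexity`; both column names and all numbers in `EXAMPLE_LOG` are made up for illustration. The `perplexity` helper uses the usual definition, the exponential of the negative log-likelihood per token, which is why lower values indicate a better fit.

```python
import csv
import io
import math


def perplexity(log_likelihood: float, n_tokens: int) -> float:
    """Perplexity = exp(-log-likelihood per token); lower is better."""
    if n_tokens <= 0:
        raise ValueError("n_tokens must be positive")
    return math.exp(-log_likelihood / n_tokens)


def best_num_topics(log_csv: str, k_col: str = "k", score_col: str = "perplexity"):
    """Return (best_k, rows), where best_k is the k with the lowest perplexity.

    rows is a list of (k, perplexity) tuples sorted by k.
    Ties are broken in favour of the smaller k.
    The column names are assumptions about how the grid log was exported.
    """
    reader = csv.DictReader(io.StringIO(log_csv))
    rows = sorted((int(r[k_col]), float(r[score_col])) for r in reader)
    if not rows:
        raise ValueError("the log contains no rows")
    best_k, _ = min(rows, key=lambda pair: (pair[1], pair[0]))
    return best_k, rows


# Made-up values for illustration only; they are not results from a real run.
EXAMPLE_LOG = """k,perplexity
2,410.5
4,352.1
6,331.0
8,318.7
10,325.4
12,339.9
"""

if __name__ == "__main__":
    k, rows = best_num_topics(EXAMPLE_LOG)
    for num_topics, score in rows:
        print(f"k={num_topics:>2}  perplexity={score:.1f}")
    print("lowest perplexity at k =", k)
```

Breaking ties toward the smaller k is a design choice in this sketch: when two settings score the same, the model with fewer topics is usually easier to interpret.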

    Regards,

    Lionel