"Text Mining - Document Similarity/Clustering"

User: "rahi84"
New Altair Community Member
Updated by Jocelyn
Hello All,

I am trying to perform document similarity/clustering in RapidMiner on a survey text field, and so far I am having problems. The data is saved in an Excel file (.xlsx), and I need to process the documents so that the text is lowercased, tokenized and stemmed, with the stopwords filtered out. Could you please walk me through the nodes that I need to apply to the data so that I can compute document similarity and run clustering? I have watched the 'el chief' tutorials on YouTube, but unfortunately they haven't worked for me. I have tried the following nodes (in order) and I get a blank output:

1. Read Excel
2. Data to Documents
3. Process Documents (+ Tokenize, Filter Stopwords (English), Transform Cases, Stem (Porter))
4. Data Similarity
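
For reference, the result I expect from the steps above can be sketched outside RapidMiner in plain Python. This is only a rough illustration under several assumptions: the stopword list is a tiny made-up one, `naive_stem` is a crude suffix stripper standing in for the Porter stemmer, the TF-IDF weighting is one common smoothed variant, and the three sample responses are invented, not my survey data.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (assumption; not RapidMiner's English list).
STOPWORDS = {"a", "an", "and", "are", "is", "it", "of", "the", "to",
             "was", "were", "with", "very", "i", "my"}


def naive_stem(token):
    """Crude suffix stripper; a stand-in for Porter stemming, not the real algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lowercase, tokenize on letters, drop stopwords, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOPWORDS]


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [preprocess(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(docs):
    """Pairwise cosine similarity between all documents."""
    vectors = tfidf_vectors(docs)
    return [[cosine(a, b) for b in vectors] for a in vectors]


# Invented sample survey responses, for illustration only.
responses = [
    "The staff were friendly and helpful",
    "Friendly staff, very helpful service",
    "Parking was expensive and crowded",
]
sims = similarity_matrix(responses)
```

In this sketch, `sims[0][1]` comes out higher than `sims[0][2]` because the first two responses share terms after preprocessing, while the third shares none. A matrix like this is what I would then want to pass on to a clustering step.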