Text mining classification with multiple classes

New Altair Community Member

Jan 25, 2018

Updated Nov 5, 2024 by Jocelyn

Hi,

I am relatively new to data science and therefore I have some questions:

I’m working on a multi-class text mining classification problem for a study assignment. The aim of the assignment is to build a model that predicts the ‘score’ attribute of textual product reviews. The possible ‘score’ values (classes) are 1, 2, 3, 4 or 5, so it is essentially a star rating. My dataset contains 6 attributes:

ReviewerID, ReviewerName, ReviewText, Score, Summary, and the length of the review text.
There are 5000 reviews (rows) in my dataset, with a few missing values in ReviewerName.
- 3000 reviews are 5-star reviews, 1000 are 4-star reviews, and the remaining 1000 are 1-, 2- or 3-star reviews, so the classes are imbalanced.
I've uploaded the dataset.
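Because the classes are imbalanced, a plain random 70%/30% split can give the test set a different class mix than the training set, which matters most for the small 1 to 3 star classes. A minimal sketch of a stratified split in plain Python, assuming you want each class shuffled separately and 30% of every class held out: the function name `stratified_split` and the fixed seed are illustrative choices, and since the post only gives the 5-star and 4-star counts, the remaining 1000 reviews are lumped into one hypothetical "1-3" label.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.3, seed=42):
    """Split row indices so every class keeps about the same share in train and test."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)  # shuffle within the class only
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Illustrative labels: 3000 five-star and 1000 four-star reviews as in the post;
# the remaining 1000 (1 to 3 stars) are lumped into one hypothetical "1-3" label.
# With the real data you would stratify on the actual 1, 2 and 3 labels.
labels = ["5"] * 3000 + ["4"] * 1000 + ["1-3"] * 1000
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 3500 1500
print(sum(labels[i] == "5" for i in test_idx) / len(test_idx))  # 0.6
```

The same idea applies inside cross-validation: each fold should keep roughly 60% 5-star reviews, otherwise fold-to-fold accuracy can swing just because of the class mix.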

I have tried several classification methods (k-NN, naïve Bayes and logistic regression (SVM)), but I cannot get the accuracy of my model above 62%. I don’t know whether this is a good accuracy; a random guess would be 20%, but I have the feeling there are things I can do to build a more accurate model. When I rebalance the dataset, the accuracy drops to at most 40%.
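To judge the 62%, a majority-class baseline is a more informative reference than the 20% random guess: with 3000 of 5000 reviews being 5-star, a model that always predicts 5 stars is already right 60% of the time. If the rebalancing also makes the evaluated data balanced, that baseline drops to 20%, so the 62% and the 40% are not measured against the same reference. The sketch below computes both baselines and a balanced accuracy (mean per-class recall), which the majority class cannot inflate. The function names are hypothetical, and the 10-label example is a toy, not taken from the dataset.

```python
from collections import Counter


def majority_baseline(class_counts):
    """Accuracy of a classifier that always predicts the most frequent class."""
    return max(class_counts.values()) / sum(class_counts.values())


def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean recall per class, so every class weighs the same regardless of its size."""
    support = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(correct[c] / support[c] for c in support) / len(support)


# Counts from the post; the 1000 reviews rated 1 to 3 stars are lumped together
# because their exact split is not given (it does not change the maximum).
imbalanced = {"5": 3000, "4": 1000, "1-3": 1000}
balanced = {str(stars): 1000 for stars in range(1, 6)}
print(majority_baseline(imbalanced))  # 0.6
print(majority_baseline(balanced))    # 0.2

# Toy example (not from the dataset): a model that always answers 5 stars.
y_true = [1, 2, 3, 4, 5, 5, 5, 5, 5, 5]
y_pred = [5] * len(y_true)
print(accuracy(y_true, y_pred))           # 0.6
print(balanced_accuracy(y_true, y_pred))  # 0.2
```

In the toy case, plain accuracy looks like 60% while balanced accuracy shows the model is no better than chance across the five classes, which is why per-class recall or a confusion matrix is worth checking alongside accuracy.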

The process is:
1. Read CSV (using quotes)
2. Numerical to Polynominal
3. Set Role (‘score’ as label)
4. Nominal to Text
5. Select Attributes (ReviewerID is left out)
6. Split Data (70%/30%)
7. Process Documents (tokenize, stem, filter stop words, transform cases, generate n-grams with n = 2)
8. Cross Validation (10 folds) with k-NN
9. Performance
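To see what the text steps and k-NN do with a review, they can be mirrored in plain Python. This is a rough sketch under several assumptions, not a reproduction of the RapidMiner operators: the stop-word list is a tiny hypothetical one, stemming is left out because the standard library has no stemmer, TF-IDF weighting with smoothed IDF is assumed, and k-NN uses cosine similarity (a common choice for sparse text vectors). Lowercasing happens before stop-word filtering so that "The" and "the" are both removed. All function names and the toy reviews are invented for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list, kept tiny for illustration.
STOP_WORDS = {"the", "a", "an", "and", "is", "it", "this", "of", "to"}


def tokens(text):
    """Transform cases, tokenize, filter stop words, then append word bigrams."""
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]
    bigrams = [f"{a}_{b}" for a, b in zip(words, words[1:])]
    return words + bigrams


def fit_idf(docs):
    """Smoothed inverse document frequency, learned from training documents only."""
    df = Counter(term for doc in docs for term in set(tokens(doc)))
    n = len(docs)
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf(text, idf):
    """Sparse TF-IDF vector (dict), L2-normalised; terms unseen in training are dropped."""
    tf = Counter(t for t in tokens(text) if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else vec


def cosine(u, v):
    """Dot product of two sparse vectors; equals cosine similarity for unit-length vectors."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def knn_predict(train_vecs, train_labels, query_vec, k=3):
    """Majority vote over the labels of the k most similar training reviews."""
    ranked = sorted(range(len(train_vecs)),
                    key=lambda i: cosine(query_vec, train_vecs[i]),
                    reverse=True)
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy reviews, invented for illustration (not from the uploaded dataset).
train_docs = [
    "great product, works perfectly",
    "terrible, broke after one day",
    "great value, works great",
    "awful quality, broke quickly",
]
train_labels = ["5", "1", "5", "1"]

idf = fit_idf(train_docs)
train_vecs = [tfidf(d, idf) for d in train_docs]
print(knn_predict(train_vecs, train_labels, tfidf("works great", idf)))  # 5
print(knn_predict(train_vecs, train_labels, tfidf("broke after one day, terrible", idf)))  # 1
```

Fitting `fit_idf` on the training part only (or inside each cross-validation fold) keeps the test reviews from leaking into the word weights.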

I don’t know whether I am missing steps in my process, making mistakes, or whether 62% accuracy is simply the maximum. I hope someone can help me out or give me some tips!

Thanks!

Greetings, Marijn

Tags: AI Studio, Classification, Text Mining + NLP
