classification or clustering

Question

Hi,

I am currently working with a dataset that consists of text, and I have some questions about how to handle it.
- Because of the size of the dataset, I want to use Filter Examples to keep only one type of title and then sample to reduce the number of items. How exactly can this be done?
- I want to apply the appropriate classification to solve the business problem. I use the operators Retrieve, Nominal to Text, Process Documents, and Tokenize. Can somebody tell me what I am doing wrong here?

BalazsBaranyRM · Answer

Hi!

The Filter Examples operator offers comparison conditions for nominal attributes, such as "contains", "starts with" or "matches". These should help you filter on the title.
Sampling is done with one of the Sample operators.
Academy video: https://academy.rapidminer.com/learn/video/sampling-weighting-intro
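
For readers who think more easily in code, here is a rough sketch of the same filter-then-sample idea in plain Python. It is not RapidMiner code. The rows, the column names "title" and "text", and the helper functions are all made up for illustration, and I am assuming that "matches" means a regular expression that has to match the whole value.

```python
import random
import re

# Hypothetical rows standing in for the text dataset; the column names are assumptions.
rows = [
    {"title": "Review: phone", "text": "great battery life"},
    {"title": "Review: laptop", "text": "poor keyboard"},
    {"title": "News: launch", "text": "new model announced"},
    {"title": "Review: tablet", "text": "bright display"},
    {"title": "Guide: setup", "text": "how to get started"},
]

def contains(value, needle):
    """True if needle occurs anywhere in value."""
    return needle in value

def starts_with(value, prefix):
    """True if value begins with prefix."""
    return value.startswith(prefix)

def matches(value, pattern):
    """True if the regular expression matches the whole value (assumed semantics)."""
    return re.fullmatch(pattern, value) is not None

# Step 1: keep only one type of title.
filtered = [row for row in rows if starts_with(row["title"], "Review")]

# Step 2: draw a random sample without replacement; the seed makes it repeatable.
rng = random.Random(42)
sample = rng.sample(filtered, k=min(2, len(filtered)))
```

Filtering first and sampling afterwards means the sample size refers to the rows you actually care about, not to the full dataset.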

I don't think you're doing anything wrong with the steps you describe for your document classification. You need a target (label) attribute for the classification, and you should then apply a learner like Naive Bayes or Support Vector Machine to the data inside a cross validation.
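
To make the idea concrete outside of RapidMiner, here is a minimal Python sketch of tokenizing, training a Naive Bayes learner, and evaluating it in a cross validation. It is a hand-written multinomial Naive Bayes with add-one smoothing, not what the RapidMiner operators do internally. The documents, the labels, and all function names are invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def train_nb(docs, labels):
    """Count documents per class and words per class for multinomial Naive Bayes."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = tokenize(doc)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict_nb(model, doc):
    """Return the class with the highest log posterior (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(doc):
            if token in vocab:  # words never seen in training are ignored
                count = word_counts[label][token]
                score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def cross_validate(docs, labels, k):
    """Mean accuracy over k folds; example i is held out in fold i % k."""
    accuracies = []
    for fold in range(k):
        test_idx = [i for i in range(len(docs)) if i % k == fold]
        train_idx = [i for i in range(len(docs)) if i % k != fold]
        model = train_nb([docs[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct = sum(predict_nb(model, docs[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k

# Hypothetical labelled documents; the label plays the role of the target attribute.
docs = [
    "great phone with great battery",
    "terrible screen and poor battery",
    "great camera, love it",
    "poor sound, terrible build",
    "love the great display",
    "terrible and poor support",
]
labels = ["positive", "negative", "positive", "negative", "positive", "negative"]

accuracy = cross_validate(docs, labels, k=3)
full_model = train_nb(docs, labels)
prediction = predict_nb(full_model, "great, love it")
```

The point of the cross validation is that each document is scored by a model that never saw it during training, so the accuracy is an honest estimate and not just a measure of memorization.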

Text mining is a large topic. Please check out this course in the Academy:
https://academy.rapidminer.com/courses/text-and-web-mining-with-rapidminer

Regards,
Balázs