Text mining: data cleaning and model ensembling?
kasper2304
Hi guys.
I need some help elaborating on my choice of method, and on how best to clean the data and to create and apply several trained models.
My case is the following:
Dataset: 2998 cases -> 337 positives & 2661 negatives
Partitioning: 85% for training and validation and 15% for testing -> 2262 negatives / 286 positives for training and validation, and 399 negatives / 51 positives for testing
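These counts are consistent with a stratified split, where 85% of each class (rounded) goes to training and validation and the remainder to testing. A minimal plain-Python sketch, assuming a per-class shuffle and per-class rounding (the function name `stratified_split` is hypothetical and this is not RapidMiner's implementation), reproduces the numbers:

```python
import random

def stratified_split(labels, train_frac, seed=0):
    """Split example indices into train/test so each class keeps the same ratio."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_train = round(len(idx) * train_frac)  # assumption: per-class rounding
        train.extend(idx[:n_train])
        test.extend(idx[n_train:])
    return train, test

# Class sizes from the post: 337 positives (1) and 2661 negatives (0).
labels = [1] * 337 + [0] * 2661
train, test = stratified_split(labels, 0.85)
print(sum(labels[i] == 0 for i in train), sum(labels[i] == 1 for i in train))  # 2262 286
print(sum(labels[i] == 0 for i in test), sum(labels[i] == 1 for i in test))    # 399 51
```

Stratifying keeps the roughly 1:8 positive/negative ratio identical in both partitions, so the test set reflects the original class distribution.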
I have read that one can cluster the negative cases, train a separate model on each cluster together with all the positive cases, and then combine the models at the end. Has anyone applied this method, or can anyone explain a variant that can be performed in RapidMiner?
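The clustering idea can be sketched in plain Python (outside RapidMiner), assuming each document is already a numeric feature vector such as a TF-IDF row. Two substitutions are assumptions of this sketch: a nearest-centroid classifier stands in for the SVM to keep the code dependency-free, and the per-cluster models are combined with a unanimity rule (predict positive only if every model does). Majority voting or averaging confidences are other common choices. All function names are hypothetical.

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means; returns a cluster id for every point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        assign = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def mean(points):
    return [sum(col) / len(points) for col in zip(*points)]

def train_nearest_centroid(pos, neg):
    """Stand-in base learner (NOT an SVM): predicts the class whose centroid is closer."""
    cp, cn = mean(pos), mean(neg)
    return lambda x: 1 if math.dist(x, cp) < math.dist(x, cn) else 0

def cluster_ensemble(pos, neg, k):
    """Cluster the negatives into k groups; train one model per group against all positives."""
    assign = kmeans(neg, k)
    models = [train_nearest_centroid(pos, [p for p, a in zip(neg, assign) if a == c])
              for c in range(k) if c in assign]
    # Unanimity rule: each model is the "expert" for its own group of negatives,
    # so a single negative vote is enough to reject the positive class.
    return lambda x: 1 if all(m(x) == 1 for m in models) else 0

# Toy 2-D data: positives near the origin, negatives in two separate groups.
pos = [(0.0, 0.0), (0.5, 0.5), (-0.5, 0.5)]
neg = [(10.0, 0.0), (10.5, 0.5), (9.5, -0.5), (-10.0, 0.0), (-10.5, 0.5), (-9.5, -0.5)]
predict = cluster_ensemble(pos, neg, k=2)
print([predict(x) for x in [(0.2, 0.1), (10.2, 0.1), (-9.8, 0.2)]])  # [1, 0, 0]
```

Each sub-model sees a far more balanced training set (all positives against one cluster of negatives), which is the motivation for the approach compared with a single model trained on the full 1:8 imbalance.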
I have also looked into data cleaning, but I have no idea which technique to use for text mining, as RapidMiner provides several.
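For text, "data cleaning" typically means normalizing case, tokenizing, removing stopwords and very short tokens, and dropping empty or duplicate documents before features are built. A minimal plain-Python sketch of those steps follows. The stopword list is a tiny hypothetical placeholder, and the regex tokenizer is an assumption, not what RapidMiner's operators do internally.

```python
import re

# Tiny illustrative stopword list (hypothetical); real lists are much longer.
STOPWORDS = {"a", "an", "and", "the", "is", "of", "to", "in", "for", "on"}

def clean(text):
    """Lowercase, keep letters/digits only, split into tokens, drop stopwords and 1-char tokens."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS and len(t) > 1]

def clean_corpus(docs):
    """Clean each document, then drop empty results and exact duplicates."""
    seen, out = set(), []
    for doc in docs:
        tokens = clean(doc)
        key = " ".join(tokens)
        if tokens and key not in seen:
            seen.add(key)
            out.append(tokens)
    return out

docs = ["The model is trained on the reviews!",
        "the MODEL is trained on the reviews",  # duplicate after cleaning
        "and the of",                           # empty after stopword removal
        "Stopword removal helps a lot."]
print(clean_corpus(docs))
# [['model', 'trained', 'reviews'], ['stopword', 'removal', 'helps', 'lot']]
```

Removing duplicates matters here in particular: if the same document lands in both the training and test partitions, test results will look better than they really are.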
So far my method has simply been to downsample the majority class in my training and validation set, which has given the best results on my test set. I am using an SVM with a linear kernel; the RBF kernel has not yielded better results. For preprocessing I used 3-grams and stopword removal.
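The two techniques mentioned here can be sketched in plain Python: random downsampling of the majority class (applied only to the training/validation partition, so the test set keeps its original class ratio) and word n-grams up to length 3. This is an illustration under assumptions: the function names are hypothetical, "3-grams" is read as word n-grams of length 1 to 3 (character n-grams are another possible reading), and joining with "_" is an arbitrary choice.

```python
import random

def downsample_indices(labels, seed=0):
    """Randomly keep as many examples of each class as the smallest class has."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    n_min = min(len(idx) for idx in by_class.values())
    kept = []
    for idx in by_class.values():
        kept.extend(rng.sample(idx, n_min))
    return sorted(kept)

def word_ngrams(tokens, max_n=3):
    """All contiguous word n-grams of length 1..max_n, joined with '_'."""
    return ["_".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]

# Training/validation partition from the post: 2262 negatives (0), 286 positives (1).
labels = [0] * 2262 + [1] * 286
kept = downsample_indices(labels)
print(len(kept), sum(labels[i] for i in kept))  # 572 286
print(word_ngrams(["linear", "svm", "kernel"]))
# ['linear', 'svm', 'kernel', 'linear_svm', 'svm_kernel', 'linear_svm_kernel']
```

Note that downsampling to 286 + 286 discards about 87% of the negatives (1976 of 2262), which is one reason the cluster-per-negative-group ensemble is attractive: every negative is still used by some model.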
Best
Kasper