Problem with special preprocessing of texts

User: "gham"
Updated by Jocelyn
Hi, is it possible to do the following preprocessing steps in RapidMiner? If so, how?

Removing links/URLs and hashtags: Tweets may contain URLs, hashtags, and words starting with the ‘@’ character. We removed these entities since we found no significance for them in our scoring approach.
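For illustration, this step could be sketched in Python (for instance through RapidMiner's Python scripting extension, if it is installed). The function name `strip_entities` and the regular expressions are my own assumptions, not taken from the quoted description, and the patterns are deliberately crude.

```python
import re

# Assumed patterns: anything starting with http://, https:// or www.
# counts as a URL; '@name' and '#tag' count as mentions and hashtags.
_URL = re.compile(r"(?:https?://|www\.)\S+")
_TAG = re.compile(r"[@#]\w+")

def strip_entities(tweet):
    """Remove URLs, hashtags and @mentions, then tidy up whitespace."""
    text = _URL.sub(" ", tweet)
    text = _TAG.sub(" ", text)
    # Collapse the gaps left behind by the removed entities.
    return " ".join(text.split())

print(strip_entities("Check this https://t.co/abc #cool @bob wow"))
```

URLs are removed first so that a '#' or '@' inside a link does not leave half a URL behind.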

Replacing words with contractions: Contractions such as ‘didn’t’, ‘ain’t’, and ‘couldn’t’ are common in tweets.
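The description does not say what the contractions are replaced with; assuming they are expanded to their full forms, a minimal Python sketch could look like this. The lookup table and the name `expand_contractions` are hypothetical, and a real table would need many more entries.

```python
import re

# Hypothetical lookup table; "ain't" is ambiguous (am/is/are not),
# so "is not" is only one possible choice.
CONTRACTIONS = {
    "didn't": "did not",
    "couldn't": "could not",
    "ain't": "is not",
}

_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(c) for c in CONTRACTIONS) + r")\b",
    re.IGNORECASE,
)

def expand_contractions(text):
    """Replace known contractions with their expanded forms."""
    # Tweets often contain the typographic apostrophe; normalise it first.
    text = text.replace("\u2019", "'")
    return _PATTERN.sub(lambda m: CONTRACTIONS[m.group(0).lower()], text)

print(expand_contractions("I didn\u2019t know she couldn't come"))
```

The replacement is always lower case, so a capitalised "Didn't" at the start of a sentence becomes "did not".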

Elongation replacer: People often use elongation like ‘loooooooove’ to emphasise words. Elongation can be at the beginning (‘ooooooh’), at the end (‘toooooo’), or in between (‘loooove’).
Example: ooooooooh what a coooooool breeze => ooh what a cool breeze
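One simple way to reproduce the example above is a regular expression that cuts any run of three or more identical characters down to two. This is only a sketch under that assumption; the name `shorten_elongations` is mine, and the quoted description does not say which rule was actually used.

```python
import re

# A word character followed by two or more copies of itself.
_REPEATS = re.compile(r"(\w)\1{2,}")

def shorten_elongations(text):
    """Reduce every run of 3+ identical characters to exactly two."""
    return _REPEATS.sub(r"\1\1", text)

print(shorten_elongations("ooooooooh what a coooooool breeze"))
# -> ooh what a cool breeze
```

This rule turns ‘loooove’ into ‘loove’ rather than ‘love’, so getting a valid word in every case would need an extra dictionary check on top of it.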

WordNet lemmatizing: A WordNet lemmatizer is used to get a valid, meaningful root word. Each word (except slang/abbreviations) is lemmatized after tokenizing.


Explicit negation handling: We used an antonym replacer based on WordNet to replace a word preceded by ‘not’, ‘never’, etc. with its antonym.
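The idea of the antonym replacer can be sketched in Python as follows. To keep the example self-contained, a small hand-written antonym table stands in for the WordNet lookup; the table, the negation list, and the name `replace_negations` are all hypothetical.

```python
# Hypothetical stand-ins: a real version would look antonyms up in WordNet.
NEGATIONS = {"not", "never"}
ANTONYMS = {"good": "bad", "happy": "unhappy", "like": "dislike"}

def replace_negations(tokens):
    """Replace 'negation + word' with the word's antonym when one is known."""
    result = []
    i = 0
    while i < len(tokens):
        word = tokens[i]
        if word.lower() in NEGATIONS and i + 1 < len(tokens):
            antonym = ANTONYMS.get(tokens[i + 1].lower())
            if antonym is not None:
                # Drop the negation and the negated word, keep the antonym.
                result.append(antonym)
                i += 2
                continue
        result.append(word)
        i += 1
    return result

print(replace_negations("this is not good".split()))
```

When no antonym is known for the following word, the negation is left in place, so "never mind" stays unchanged.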

thanks
