Problem with special preprocessing of texts
Removing links/URLs and hashtags: A tweet may
contain URLs, hashtags and words that start with the ‘@’
character. We removed these entities since we found they had no
significance in our scoring approach.
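As a rough illustration of this step, the sketch below strips URLs, hashtags and @-mentions with Python's standard `re` module. The two patterns and the function name `strip_entities` are assumptions made for this example; the quoted text does not say which patterns were actually used.

```python
import re

# Assumed patterns; a real tweet corpus may need stricter ones.
URL_RE = re.compile(r"(?:https?://|www\.)\S+")
TAG_RE = re.compile(r"[#@]\w+")

def strip_entities(tweet: str) -> str:
    """Remove URLs, hashtags and @-mentions, then tidy the whitespace."""
    tweet = URL_RE.sub(" ", tweet)
    tweet = TAG_RE.sub(" ", tweet)
    return " ".join(tweet.split())

print(strip_entities("Loving the breeze #summer @anna http://example.com/pic"))
# Loving the breeze
```

URLs are removed first so that a '#' or '@' inside a link is not handled separately by the second pattern.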
Replacing contractions: Contractions such as
‘didn’t’, ‘ain’t’ and ‘couldn’t’ are common in tweets.
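One way such a replacement could look is a simple lookup table, sketched below. The `CONTRACTIONS` mapping is hypothetical and deliberately tiny, and ‘ain’t’ is left out because its expansion depends on context; the quoted text does not describe the list that was actually used.

```python
import re

# Hypothetical, deliberately small mapping; a real list would be much longer.
CONTRACTIONS = {
    "didn't": "did not",
    "couldn't": "could not",
    "can't": "cannot",
    "won't": "will not",
}

_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in CONTRACTIONS) + r")\b",
    re.IGNORECASE,
)

def expand_contractions(text: str) -> str:
    """Replace each known contraction with its expanded (lowercase) form."""
    text = text.replace("\u2019", "'")  # normalise curly apostrophes first
    return _PATTERN.sub(lambda m: CONTRACTIONS[m.group(1).lower()], text)

print(expand_contractions("I didn’t go because I couldn't"))
# I did not go because I could not
```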
Elongation replacer: People often use elongation like
‘loooooooove’ to emphasise words. Elongation can be
at the beginning (‘ooooooh’), at the end (‘toooooo’) or in
between (‘loooove’).
Example: ooooooooh what a coooooool breeze => ooh what a cool breeze
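The example above is consistent with collapsing every run of three or more identical characters down to two, which can be sketched with a back-reference as follows. This is only one possible reading; the function name `shorten_elongations` is made up for the example, and a word like ‘loooove’ would come out as ‘loove’, so reaching ‘love’ would need an extra dictionary check that is not shown here.

```python
import re

def shorten_elongations(text: str) -> str:
    """Collapse any run of 3+ identical word characters to exactly two."""
    return re.sub(r"(\w)\1{2,}", r"\1\1", text)

print(shorten_elongations("ooooooooh what a coooooool breeze"))
# ooh what a cool breeze
```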
WordNet lemmatizing: The WordNet lemmatizer is used to
get a valid, meaningful root word. Each word (except
slang and abbreviations) is lemmatized after tokenizing.
Explicit negation handling: We used a WordNet-based antonym
replacer to replace a word preceded by
‘not’, ‘never’, etc. with its antonym.
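The idea can be sketched without WordNet by using a small lookup table in its place. Everything named below (`ANTONYMS`, `NEGATIONS`, `replace_negations`) is hypothetical and only stands in for the WordNet antonym lookup mentioned in the text.

```python
# Hypothetical stand-in for WordNet antonym lookups.
ANTONYMS = {"good": "bad", "happy": "unhappy", "like": "dislike"}
NEGATIONS = {"not", "never"}

def replace_negations(tokens):
    """Replace 'negation + word' with the word's antonym when one is known."""
    out = []
    i = 0
    while i < len(tokens):
        word = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if word.lower() in NEGATIONS and nxt is not None and nxt.lower() in ANTONYMS:
            out.append(ANTONYMS[nxt.lower()])
            i += 2  # skip both the negation and the replaced word
        else:
            out.append(word)
            i += 1
    return out

print(replace_negations("this is not good".split()))
# ['this', 'is', 'bad']
```

When no antonym is known, the negation and the following word are left unchanged.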
thanks
Hello,
I want to delete internet links like
http: // jhghjgjh / jhghjgh
from my texts, but I do not know which operator to use.
I also want to remove the repeated letters from elongated words like
veeeeryyy
so that the word becomes "very", but I do not know which operator does that.
Finally, I want to expand abbreviations to their original words. How can I do that?
Please help.
Hello @gham - you're likely going to want to learn how to write "Regular Expressions" (also known as "RegEx") to deal with situations like this. RegEx is a language that takes some time to learn and understand, but it is well worth it. Two great resources for you are (1) this book, recommended to me by @Telcontar120, which now sits permanently on my desk, and (2) the website https://regexr.com/, which is the go-to reference for many people.
In RapidMiner, you're going to do this by using the Replace Tokens operator (as mentioned above) -> Parameters -> replace dictionary -> click on "Edit List..." -> click on the "RegEx" button as shown below -> type in your regular expression or use one of the pre-made ones.
Hello,
Thank you very much.
I wrote the following to remove the link, but the entire text was deleted!
[http://a-z.az/a-z]
Please tell me how I should write it.
Also, do you have a dictionary for removing stop words and for reducing words to their roots? Do I need to download one?
And how do I write a regular expression to remove the repeated letters,
like veeerrryyyyy -> very?
Thank you for helping me.
hi
please help me..
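A likely explanation for the deleted text: square brackets in a regular expression define a character class, so [http://a-z.az/a-z] matches any single lowercase letter, colon, slash or dot, one character at a time. Replacing every match with nothing therefore strips almost everything. The sketch below uses Python's `re` module purely for illustration (the sample text and variable names are made up). RapidMiner uses Java regular expressions, where these patterns should behave the same way as far as I know, except that a back-reference in the replacement is written $1 rather than \1.

```python
import re

text = "Visit http://abc.de/xy NOW, it is veeerrryyyyy nice"

# The bracketed attempt is a character class: it matches ONE character at a
# time from the set {lowercase letters, ':', '/', '.'}.
broken = re.sub(r"[http://a-z.az/a-z]", "", text)

# Assumed replacement pattern: "http://" or "https://" followed by non-spaces.
no_links = re.sub(r"https?://\S+", "", text)

# Collapse every run of a repeated letter to a single letter.
# Caveat: this also damages genuine double letters ("cool" -> "col").
collapsed = re.sub(r"([a-z])\1+", r"\1", no_links, flags=re.IGNORECASE)

print(broken.split())     # ['V', 'NOW,']
print(no_links.split())   # ['Visit', 'NOW,', 'it', 'is', 'veeerrryyyyy', 'nice']
print(collapsed.split())  # ['Visit', 'NOW,', 'it', 'is', 'very', 'nice']
```

Because collapsing to a single letter breaks words such as "cool" or "breeze", it may be safer to collapse runs of three or more letters to two and then check the result against a dictionary.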