Text Mining: How to remove particular phrases in pre-processing

User: "mob"
New Altair Community Member
Updated by Jocelyn
What's the best way to remove repeated sentences from my documents during pre-processing?

I have an example set that includes a "text" column and some other attributes. The text column was read in from files in a folder. The text contains a number of repeated phrases that I "think" I should remove before mining, because I suspect they would skew the word frequencies.
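One way to check which phrases really repeat before deciding what to remove is to count how many documents each sentence appears in. The sketch below is plain Python, not a RapidMiner operator. The function names `split_sentences` and `find_repeated_sentences` and the `min_docs` threshold are hypothetical, and the sentence splitter is deliberately naive: it splits on `.`, `!` or `?` followed by whitespace, and on line breaks.

```python
import re
from collections import Counter


def split_sentences(text: str) -> list[str]:
    """Naive splitter (assumption): break on ., ! or ? followed by whitespace, and on newlines."""
    parts = re.split(r"(?<=[.!?])\s+|\n+", text)
    # Normalise whitespace and case so trivial differences don't hide repeats
    return [" ".join(p.split()).lower() for p in parts if p.strip()]


def find_repeated_sentences(docs: list[str], min_docs: int = 2) -> list[str]:
    """Return sentences that occur in at least `min_docs` different documents."""
    doc_counts: Counter[str] = Counter()
    for doc in docs:
        # set() so a sentence repeated inside one document is only counted once
        doc_counts.update(set(split_sentences(doc)))
    return sorted(s for s, n in doc_counts.items() if n >= min_docs)


docs = [
    "Assessment and Grading\nThe exam is in week 10.",
    "Assessment and Grading\nThe essay is due in week 6.",
]
print(find_repeated_sentences(docs))  # ['assessment and grading']
```

Counting per document rather than per occurrence means a heading that appears once in every file stands out, while a phrase that one author happens to repeat does not.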

Given that "Filter Stopwords (Dictionary)" only takes one stopword per line, and so matches single words, how do I remove a phrase like "Assessment and Grading" while still keeping the words "assessment" and "grading" wherever they appear elsewhere in the document? And how do I extend this so I can add other sentences that need removing?
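The underlying idea is to remove whole phrases from the raw text *before* it is tokenized, so that the individual words are only dropped when they occur as part of the full phrase. Below is a minimal plain-Python sketch of that idea using one case-insensitive regular expression built from a list of phrases. It is not a RapidMiner operator. The names `build_phrase_pattern` and `remove_phrases` are hypothetical, and it assumes each phrase begins and ends with a letter or digit so the `\b` word boundaries behave as expected.

```python
import re


def build_phrase_pattern(phrases: list[str]) -> re.Pattern:
    """Compile one case-insensitive regex that matches any of the given phrases."""
    # Longest phrases first, so e.g. "assessment and grading policy" wins over
    # "assessment and grading" if both are in the list
    ordered = sorted(phrases, key=len, reverse=True)
    # Escape each word literally and allow any run of whitespace between words
    alternatives = [r"\s+".join(re.escape(w) for w in p.split()) for p in ordered]
    # \b on both ends (assumes phrases start/end with word characters)
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)


def remove_phrases(text: str, pattern: re.Pattern) -> str:
    """Delete every phrase match, then collapse the leftover whitespace."""
    return " ".join(pattern.sub(" ", text).split())


pattern = build_phrase_pattern(["Assessment and Grading", "Course Outline"])
text = "Assessment and Grading\nThe final assessment counts for 60% and grading is strict."
print(remove_phrases(text, pattern))
# The final assessment counts for 60% and grading is strict.
```

Adding another sentence to remove is then just another entry in the phrase list, which could be read from a text file with one phrase per line, much like a stopword dictionary.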
