How to split strings contained in a text column of a CSV file into words

User: "Ayushi_Aggarwal"
New Altair Community Member
Updated by Jocelyn
I am currently reading a CSV file with the columns review (text), n1, n2, n3, and overall (text).
I am using Select Attributes to keep only the review column, which gives me output in RapidMiner of the form:
Row                                   Review
1                                        Poor service
2                                        There were torn seats

What I want to do is split the contents of the Review column into individual words, like: Poor, service, There, etc.
I am using Process Documents to Data > Tokenize, but somehow I am not getting the required output.

Please help.

    User: "David_A"
    New Altair Community Member
    Accepted Answer
    Hi,

    If you don't necessarily have to use the Text extension, you could also simply use the "Split" operator (not to be confused with "Split Data") with a regular expression. Something simple like \s+|\W+ should do the trick: it splits on whitespace or on non-word characters (anything other than letters, digits, and underscores).

    Best,
    David
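
To see what the suggested pattern does outside RapidMiner, here is a minimal sketch using Python's `re` module. The function name `split_words` and the sample strings are illustrative assumptions, and RapidMiner's Split operator may treat edge cases (such as empty pieces) differently.

```python
import re

# The pattern from the answer: split on runs of whitespace or runs of non-word characters.
PATTERN = re.compile(r"\s+|\W+")


def split_words(text):
    """Split text on the pattern and drop the empty pieces.

    Empty strings can appear when the text ends with punctuation or when
    whitespace is directly followed by punctuation, so they are filtered out.
    """
    return [piece for piece in PATTERN.split(text) if piece]


# Sample rows taken from the question.
reviews = ["Poor service", "There were torn seats"]
words = [split_words(review) for review in reviews]
print(words)
```

Because every whitespace character is also a non-word character, `\W+` alone yields the same words once empty pieces are dropped.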

    User: "Telcontar120"
    New Altair Community Member
    Accepted Answer
    Can you be clearer about why Tokenize is not giving you what you expect? What are you getting? If you share your process and a data sample, it will be easier to troubleshoot. In general, Tokenize should do exactly what you are asking for: take a text column and split it into individual words.
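
For comparison, the expected result of tokenizing a text column can be sketched in plain Python. This is only an illustration, not RapidMiner's implementation: the sample CSV, the `tokenize_reviews` helper, and the choice to treat every run of letters as one token are all assumptions.

```python
import csv
import io
import re

# Hypothetical sample mirroring the columns described in the question.
SAMPLE_CSV = """review,n1,n2,n3,overall
Poor service,1,2,1,Bad
There were torn seats,2,1,3,Poor
"""

# Assumption: a token is a run of letters; digits, underscores, and punctuation separate tokens.
LETTERS = re.compile(r"[^\W\d_]+")


def tokenize_reviews(csv_text, column="review"):
    """Return (row number, list of tokens) for the chosen text column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    result = []
    for row_number, row in enumerate(reader, start=1):
        result.append((row_number, LETTERS.findall(row[column])))
    return result


for row_number, tokens in tokenize_reviews(SAMPLE_CSV):
    print(row_number, tokens)
```

Comparing output like this against what the process actually returns is one way to describe the mismatch when posting a follow-up.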