How to split strings contained in a text column of a CSV file into words

Ayushi_Aggarwal
Ayushi_Aggarwal New Altair Community Member
edited November 5 in Community Q&A
At the moment I am reading a CSV file that has the columns review (text), n1, n2, n3, and overall (text).
I am using Select Attributes to keep only the review column, which gives me output in RapidMiner of the form:
Row                                   Review
1                                        Poor service
2                                        There were torn seats

What I want to do is split the contents of the Review column into individual words, like: Poor, service, There, etc.
I am using Process Documents to Data > Tokenize, but somehow I am not getting the required output.

Please help.

Best Answers

  • David_A
    David_A New Altair Community Member
    Answer ✓
    Hi,

    If you don't necessarily have to use the Text extension, you could also simply use the "Split" operator (not to be confused with "Split Data") with a regular expression. I would say something simple like \s+|\W+ should do the trick: it splits on whitespace or on runs of non-word characters (anything other than letters, digits, and underscores).

    Best,
    David
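
For anyone who wants to check what that regular expression does outside RapidMiner, here is a minimal Python sketch. It is not the RapidMiner "Split" operator itself, only the same pattern applied with Python's `re` module. The sample CSV text and the function names (`split_review`, `words_per_row`) are made up for illustration, with the column name taken from the question.

```python
import csv
import io
import re

# Pattern suggested in the answer: split on whitespace or non-word characters.
SPLIT_PATTERN = re.compile(r"\s+|\W+")

# Hypothetical sample data standing in for the CSV file from the question.
SAMPLE_CSV = (
    "review,n1,n2,n3,overall\n"
    "Poor service,1,2,1,bad\n"
    "There were torn seats,2,1,1,bad\n"
)


def split_review(text):
    """Split one review string into words, dropping empty pieces.

    Empty strings can appear when the text ends with punctuation or when a
    space is followed directly by punctuation, so they are filtered out.
    """
    return [piece for piece in SPLIT_PATTERN.split(text) if piece]


def words_per_row(csv_text, column="review"):
    """Return one list of words per CSV row, using the given text column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [split_review(row[column]) for row in reader]


if __name__ == "__main__":
    for words in words_per_row(SAMPLE_CSV):
        print(words)
```

Note that with this pattern an apostrophe also counts as a non-word character, so a word like "don't" would come out as two pieces ("don" and "t"). If that matters for your reviews, the pattern would need adjusting.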
