How to split strings contained in a text column of csv file into words

Ayushi_Aggarwal New Altair Community Member
edited November 2024 in Community Q&A
I am reading a CSV file that has the columns review (text), n1, n2, n3, and overall (text).
I use Select Attributes to keep only the review column, which gives me output in RapidMiner of the form:
Row                                   Review
1                                        Poor service
2                                        There were torn seats

What I want to do is split the contents of the Review column into individual words, like: Poor, service, There, etc.
I am using Process Documents to Data > Tokenize, but somehow I am not getting the required output.

Please help.

Best Answers

  • David_A
    David_A New Altair Community Member
    Answer ✓
    Hi,

    If you don't necessarily have to use the Text extension, you could also simply use the "Split" operator (not to be confused with "Split Data") with a regular expression. I would say something simple like \s+|\W+ should do the trick: it splits on whitespace or on non-word characters (anything other than letters, digits, and the underscore).

    Best,
    David
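
To see what the suggested pattern does outside RapidMiner, here is a minimal Python sketch. It is only an illustration of the regular expression, not of the Split operator itself; the helper name `split_review` and the sample list are assumptions made up for this example, with the two rows taken from the question.

```python
import re

# Pattern from the answer above: split on runs of whitespace
# or on runs of non-word characters.
TOKEN_SPLIT = re.compile(r"\s+|\W+")

def split_review(text):
    """Split one review string into words, dropping empty pieces."""
    # re.split can yield empty strings (e.g. after trailing punctuation),
    # so those are filtered out.
    return [tok for tok in TOKEN_SPLIT.split(text) if tok]

# Sample rows copied from the question (hypothetical stand-in for the CSV column).
reviews = ["Poor service", "There were torn seats"]
words = [split_review(r) for r in reviews]
print(words)
# prints [['Poor', 'service'], ['There', 'were', 'torn', 'seats']]
```

Since `\W` already covers whitespace, `\W+` on its own would likely give the same tokens here; the pattern is kept as written in the answer.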

