How to split strings contained in a text column of csv file into words

New Altair Community Member

Apr 15, 2019

Updated Nov 5, 2024 by Jocelyn

As of now, I am reading a CSV file which has review (text), n1, n2, n3, and overall (text) columns.
I am using Select Attributes to include only the review column, which gives me output in RapidMiner of the form:
Row Review
1 Poor service
2 There were torn seats

What I want to do is split the contents of the Review column into individual words, like: Poor, service, There, etc.
I am using Process Documents from Data > Tokenize, but somehow I am not getting the required output.

Please help.
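For readers working outside RapidMiner, the setup described in the question (read a CSV, keep only the review column) can be sketched in plain Python. This is a minimal sketch, not RapidMiner's behavior: the column names follow the post, but the n1..n3 and overall values in `SAMPLE_CSV` are hypothetical placeholders, and `read_review_column` is an illustrative helper name.

```python
import csv
import io

# Hypothetical stand-in for the CSV file from the question. Column names follow
# the post; the n1..n3 and overall values are made-up placeholders.
SAMPLE_CSV = """review,n1,n2,n3,overall
Poor service,1,2,1,bad
There were torn seats,2,1,1,bad
"""


def read_review_column(text: str, column: str = "review") -> list[str]:
    """Keep only one column, roughly what Read CSV + Select Attributes does."""
    reader = csv.DictReader(io.StringIO(text))
    return [row[column] for row in reader]


if __name__ == "__main__":
    # Prints a numbered list similar to the "Row Review" view above.
    for i, review in enumerate(read_review_column(SAMPLE_CSV), start=1):
        print(i, review)
```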


David_A

New Altair Community Member

Accepted Answer

Apr 15, 2019

Hi,

If you don't necessarily have to use the Text extension, you could also simply use the "Split" operator (not to be confused with "Split Data") with a regular expression. I would say something simple like \s+|\W+ should do the trick (splitting on whitespace or on non-word characters, i.e. anything other than letters, digits, and the underscore).

Best,
David
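To see what the suggested pattern does to the sample rows, here is a minimal Python sketch using the standard `re` module. It only illustrates the regex, not the Split operator itself; `split_words` is a hypothetical helper, and the `re.ASCII` flag is an assumption meant to approximate Java's default meaning of `\W` (RapidMiner runs on Java).

```python
import re

# The pattern from the answer: a run of whitespace OR a run of non-word characters.
# re.ASCII makes \w mean [A-Za-z0-9_], close to Java's default (an assumption about
# how RapidMiner's Split operator interprets the pattern).
SPLIT_PATTERN = re.compile(r"\s+|\W+", re.ASCII)


def split_words(text: str) -> list[str]:
    # re.split leaves empty strings when a delimiter sits at the start or end,
    # or when two alternatives match back to back (e.g. " " then ","), so drop them.
    return [part for part in SPLIT_PATTERN.split(text) if part]


for review in ["Poor service", "There were torn seats"]:
    print(split_words(review))
# ['Poor', 'service']
# ['There', 'were', 'torn', 'seats']
```

Since whitespace is itself a non-word character, `\W+` alone yields the same words. The alternation can produce an empty piece for input like "seats ,torn" (the space and the comma match separately), which the filter above removes; whether the Split operator keeps such empty parts as attributes is not confirmed here.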


Telcontar120

New Altair Community Member

Accepted Answer

Apr 15, 2019

Can you be more specific about why Tokenize is not giving you what you expect? What are you getting? If you share your process and a data sample, it will be easier to troubleshoot. In general, Tokenize should do exactly what you are asking for: take a text column and split it up into individual words.
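As a rough illustration of "split it up into individual words", the sketch below mimics tokenizing on non-letter characters and emits one (row, token) pair per word. Assumptions: it approximates Tokenize's "non letters" mode, treats only ASCII letters as letters, and `tokenize_rows` is a hypothetical helper. One possible source of the confusion (unconfirmed, since the process wasn't shared) is that Process Documents from Data outputs a word vector with one attribute per word rather than a list of tokens like this.

```python
import re

# Assumption: approximates Tokenize in "non letters" mode, where every character
# that is not a letter acts as a separator. Only ASCII letters count here.
LETTER_RUN = re.compile(r"[A-Za-z]+")


def tokenize_rows(reviews: list[str]) -> list[tuple[int, str]]:
    """Return (row number, token) pairs, one token per output row."""
    pairs = []
    for row, text in enumerate(reviews, start=1):
        for token in LETTER_RUN.findall(text):
            pairs.append((row, token))
    return pairs


for row, token in tokenize_rows(["Poor service", "There were torn seats"]):
    print(row, token)
# 1 Poor
# 1 service
# 2 There
# 2 were
# 2 torn
# 2 seats
```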

