How to split strings in a text column of a CSV file into words
I am currently reading a CSV file that has the columns review (text), n1, n2, n3, and overall (text).
I use Select Attributes to keep only the review column, which gives me output in RapidMiner of the form:
Row Review
1 Poor service
2 There were torn seats
What I want to do is split the contents of the Review column into individual words, like: Poor, service, There, etc.
I am using Process Documents from Data > Tokenize, but I am not getting the required output.
Please help.
Hi,
If you don't necessarily have to use the Text extension, you could also simply use the "Split" operator (not to be confused with "Split Data") with a regular expression. Something simple like \s+|\W+ should do the trick: it splits on runs of whitespace or of non-word characters (anything other than letters, digits, and underscores).
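To check what that pattern does before wiring it into a process, it can be tried outside RapidMiner. The following is only a rough Python sketch of the same regular expression, not the Split operator itself; the helper name split_words and the sample list are made up for illustration, using the two reviews from the question.

```python
import re

# Same pattern as suggested above: runs of whitespace or of non-word characters.
PATTERN = re.compile(r"\s+|\W+")

def split_words(text):
    """Split text on the pattern and drop empty strings.

    Empty strings show up when the text starts or ends with a separator,
    or when two separators are matched back to back.
    """
    return [word for word in PATTERN.split(text) if word]

# Sample rows taken from the question (hypothetical stand-in for the Review column).
reviews = ["Poor service", "There were torn seats"]
words_per_review = [split_words(review) for review in reviews]

print(words_per_review)
# [['Poor', 'service'], ['There', 'were', 'torn', 'seats']]
```

In Python, \W already matches whitespace, so \W+ on its own should give the same words after the empty strings are filtered out. Whether RapidMiner's regex engine treats every case identically is an assumption worth testing on a few rows.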
Best,
David
Can you be clearer about why Tokenize is not giving you what you expect? What are you getting? If you share your process and a data sample, it will be easier to troubleshoot. In general, Tokenize should do exactly what you are asking for: take a text column and split it into individual words.
David