"Split Text"
fmueller
Hi guys
I have several text files with different numbers of words, and I want to split them into text files with a maximum of 500 words each.
How can I segment these text files in RapidMiner? I tried the SplitSegmenter operator, but I have no idea how to set the regular expression.
Can anybody help me?
Regards
Florian
Ryujakk
Hi,
I'm not really sure what you want...
BUT! You can always try this regex:
([^\s]+\s){500}
It searches for one or more non-whitespace characters followed by a single whitespace character, with this pattern repeated 500 times. It works on the site
http://www.regexplanet.com/simple/index.html
at least (credits to Sebastian for the URL)!
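To see what this pattern actually matches, here is a minimal sketch in plain Python (an assumption for illustration; it is not RapidMiner and not the SplitSegmenter operator). The helper name `full_blocks` is made up, the repeat count is a parameter so it can be tried with a small number, and the group is written as non-capturing, which does not change what is matched.

```python
import re

def full_blocks(text, n):
    # Same idea as ([^\s]+\s){n}: n words, each followed by exactly one
    # whitespace character. (?:...) is only a non-capturing group.
    pattern = re.compile(r"(?:[^\s]+\s){%d}" % n)
    return [m.group(0) for m in pattern.finditer(text)]

text = "one two three four five six seven"
print(full_blocks(text, 3))
# -> ['one two three ', 'four five six ']
```

Note that the last word "seven" is not returned: the pattern only matches complete blocks of n words, so a remainder shorter than n words is left over. Words separated by more than one whitespace character also do not match as written, because `\s` covers a single character.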
- R
fmueller
Thanks for your answer...
So I will clarify my problem a little bit. For example:
Text1.txt (Total: 110 words)
Text2.txt (Total: 410 words)
Text3.txt (Total: 50 words)
I need the text files in 50-word blocks. The result should be:
Text1.txt -> 3 segment files: Text1_Seg1.txt (50 words), Text1_Seg2.txt (50 words), Text1_Seg3.txt (10 words) = Total 110 words
......
Can I do this with the operator SplitSegmenter or TextSegmenter (Text Mining plugin)?
Thanks for your answers
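The splitting described above (blocks of at most 50 words, with a shorter last block) can be sketched outside RapidMiner in plain Python. This is only an illustration under assumptions: the function names `split_words` and `write_segments` are hypothetical, words are taken to be whitespace-separated tokens, and the original line breaks inside a segment are not preserved. It does not show how SplitSegmenter or TextSegmenter behave.

```python
from pathlib import Path

def split_words(text, max_words):
    # Split on any run of whitespace, then regroup into blocks of
    # at most max_words words; the last block may be shorter.
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]

def write_segments(src, out_dir, max_words=50):
    # Write each block to <name>_Seg<k>.txt, following the naming
    # used in the example above (Text1_Seg1.txt, Text1_Seg2.txt, ...).
    src = Path(src)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    blocks = split_words(src.read_text(encoding="utf-8"), max_words)
    for k, block in enumerate(blocks, start=1):
        target = out_dir / f"{src.stem}_Seg{k}.txt"
        target.write_text(block, encoding="utf-8")
        written.append(target)
    return written

# A 110-word text yields blocks of 50, 50 and 10 words.
sizes = [len(b.split()) for b in split_words(" ".join(["word"] * 110), 50)]
print(sizes)
# -> [50, 50, 10]
```

Calling `write_segments("Text1.txt", "segments")` on a 110-word file would then produce three files in the (assumed) folder `segments`.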
land
Hi,
Did you try the regular expression above? I don't know why it shouldn't work...
Greetings,
Sebastian