"Reading Microsoft word documents (word count)"
SergeMerz
Hi,
I did some searching on this topic and found almost nothing on reading DOC and DOCX documents with the 'Read Document' step. Is this possible without converting the MS Word documents to a supported format (e.g. CSV, PDF, RTF, HTML)? I have thousands of Word documents, so I would like to read them without pre-processing.
Regards,
Serge
Marco_Boeck
Hi,
I'm afraid that is currently not possible.
Regards,
Marco
johan_CG
Hi
I have the same problem.
Currently I use a bash script to convert the DOC and DOCX files, but I would like to avoid this pre-processing step.
Please let me know if you find something that can help.
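For reference, a conversion step of this kind can be sketched in Python for DOCX files using only the standard library. This is a minimal sketch, not the script mentioned above: it relies on the fact that a DOCX file is a ZIP archive whose main body lives in word/document.xml, and the function names docx_to_text and word_count are hypothetical. It ignores headers, footers and footnotes, and it does not handle the older binary DOC format.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(path):
    """Return the body text of a DOCX file, one line per paragraph."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's text is split across one or more <w:t> elements.
        runs = [node.text or "" for node in para.iter(W_NS + "t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)


def word_count(text):
    """Count whitespace-separated tokens."""
    return len(re.findall(r"\S+", text))
```

The extracted text could then be written out as plain text files, which is an assumption about how the result would be fed into the process rather than something confirmed in this thread.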
Regards
Johan
MariusHelf
Unfortunately, RapidMiner is not capable of dealing with Word documents natively. You have to use a command-line tool to extract the text, e.g. antiword:
http://www-stud.rbi.informatik.uni-frankfurt.de/~markus/antiword/
You can run the program from your RapidMiner process with the Execute Program operator.
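Outside of RapidMiner, the same idea (calling an external converter and capturing its text output) can be sketched in Python. This is only an illustration under assumptions: the helper name doc_to_text is hypothetical, antiword is assumed to be installed and on the PATH, and the output is assumed to be decodable as UTF-8, which may depend on your locale and antiword settings.

```python
import shutil
import subprocess


def doc_to_text(path, tool="antiword"):
    """Run an external converter on a .doc file and return its stdout as text."""
    if shutil.which(tool) is None:
        raise FileNotFoundError(f"{tool} was not found on PATH")
    # antiword prints the extracted text to standard output when given a file name.
    result = subprocess.run([tool, path], capture_output=True, check=True)
    # Assumption: UTF-8 output; undecodable bytes are replaced rather than raising.
    return result.stdout.decode("utf-8", errors="replace")
```

A word count would then be something like len(doc_to_text("report.doc").split()), where report.doc is a placeholder file name.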
Best regards,
Marius