"Reading Microsoft Word documents (word count)"
Hi,
I did some searching on this topic and found almost nothing on reading DOC and DOCX documents with the 'Read Document' step. Is this possible without first converting the MS Word documents to a supported format (e.g. CSV, PDF, RTF, HTML)? I have thousands of Word documents, so I would like to read them without pre-processing.
Regards,
Serge
Unfortunately, RapidMiner cannot read Word documents natively. You have to use a command-line tool to extract the text, e.g. antiword: http://www-stud.rbi.informatik.uni-frankfurt.de/~markus/antiword/
You can run the program from your RapidMiner process with the Execute Program operator.
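For a folder with thousands of files, the extraction can also be batched in one go. The following is only a rough sketch, assuming antiword is installed and on the PATH and that it prints the extracted text to standard output when called as `antiword file.doc`. The function names `extract_text` and `convert_folder` and the folder names are made up for illustration. As far as I know, antiword only reads the older binary .doc format, so DOCX files would need a different tool.

```python
import subprocess
from pathlib import Path


def extract_text(doc_path, tool="antiword", run=subprocess.run):
    """Return the plain text that `tool` prints to stdout for one .doc file."""
    # Assumption: the tool is invoked as `<tool> <file>` and writes text to stdout.
    result = run([tool, str(doc_path)], capture_output=True, text=True, check=True)
    return result.stdout


def convert_folder(src_dir, dst_dir, tool="antiword", run=subprocess.run):
    """Write one .txt file per .doc file found in src_dir; return the paths written."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    # "*.doc" does not match .docx files, which are skipped here.
    for doc in sorted(Path(src_dir).glob("*.doc")):
        out = dst / (doc.stem + ".txt")
        out.write_text(extract_text(doc, tool, run), encoding="utf-8")
        written.append(out)
    return written


if __name__ == "__main__":
    # Hypothetical folder names; adjust to your own layout.
    for path in convert_folder("word_docs", "extracted_text"):
        print(path)
```

The resulting text files could then be loaded into the process, or the script itself could be started from Execute Program before the reading step.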
Best regards,
Marius
I'm afraid that is currently not possible.
Regards,
Marco