[SOLVED] Format input documents

User: "MarcosRL"
New Altair Community Member
Updated by Jocelyn
Hello friends of the community.
I have a question regarding the format of the input documents.
I tried the "tokenize" procedure on ".txt" files and it runs smoothly.
The original files I need to work with are in ".docx" and ".doc" (Microsoft Word) format. When I repeat the "tokenize" procedure on them, the documents are read as strange characters.
Is there a way to process documents in ".docx" and ".doc" format?

    User: "johan_CG"
    New Altair Community Member
    Hi MarcosRL

    Did you find a solution for .docx and .doc? I've got the same problem.
    Thanks in advance.

    Johan
    User: "MarcosRL"
    New Altair Community Member
    OP
    Hi Johan
    Yes, I solved it.
    What I did was convert the documents from ".pdf" format to ".txt" (plain text format) instead of transforming the Microsoft Word formats (.docx, .doc).
    Greetings from Argentina  :)
    User: "johan_CG"
    New Altair Community Member
    Hi Marcos

    Thank you for the tips.

    Greetings from France ;)
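
For anyone who lands here with the same problem: a ".docx" file is a ZIP archive of XML parts, so reading it as plain text is a likely cause of the strange characters. The workaround in this thread is to convert to ".txt" before tokenizing. Below is a minimal sketch of that conversion in Python using only the standard library. The function name docx_to_text is hypothetical, it assumes the main text is stored in word/document.xml, and it does not handle the older binary ".doc" format, headers, footers, or footnotes.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by WordprocessingML elements inside a .docx file.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_to_text(path):
    """Return the body text of a .docx file, one line per paragraph.

    Hypothetical helper for illustration; `path` may be a file name
    or a binary file-like object.
    """
    # A .docx is a ZIP archive; the main body is in word/document.xml.
    with zipfile.ZipFile(path) as archive:
        xml_bytes = archive.read("word/document.xml")

    root = ET.fromstring(xml_bytes)
    paragraphs = []
    # <w:p> is a paragraph, <w:t> is a run of text inside it.
    for para in root.iter(f"{{{W_NS}}}p"):
        text = "".join(node.text or "" for node in para.iter(f"{{{W_NS}}}t"))
        paragraphs.append(text)
    return "\n".join(paragraphs)
```

The returned string can be written to a ".txt" file and then passed to the tokenize step as usual. For ".doc" files, saving them as ".docx" or ".txt" from Word first is probably the simplest route.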