When loading text files, RapidMiner is introducing spaces and special chars

lavramu
lavramu New Altair Community Member
edited November 5 in Community Q&A
Hi,

I am trying to do a very simple task: loading text files with the "Process Documents from Files" operator. After loading, I see a space between each character in the file and also a special character sequence (ÿþ) at the beginning of every file.

Example:

b a l a n c e  s h e e t

I am really stuck and would appreciate any help.
I chose the regular options while loading the files and didn't see this problem in any of the tutorials, yet it is happening to me.

Answers

  • lavramu
    lavramu New Altair Community Member
    Adding to my question -- I notice this does not happen to all files, only to the ones I exported from NVivo. I exported them as normal text files and they look normal to me, but they turn up weird in RapidMiner. Please help.
  • aborg
    aborg New Altair Community Member
    Hello,
    Are you sure those second characters are spaces and not characters with code 0? (Spaces have code 32.) Assuming they are 0s, it seems the NVivo files are saved as UTF-16 with a byte order mark (BOM) set. (I guess RM does not try to use the encoding indicated by the BOM.)
    Cheers, gabor
  • lavramu
    lavramu New Altair Community Member
    Thanks a lot for the reply. They look like spaces to me. I am not sure if they are anything else. In Notepad I see them as white space.
  • MariusHelf
    MariusHelf New Altair Community Member
    Did you try changing the encoding parameter of Process Documents from Files?

    Best regards,
    Marius
  • lavramu
    lavramu New Altair Community Member
    Let me try and post back. Thanks!
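
The diagnosis suggested in the answers (a UTF-16 file with a byte order mark being read as a single-byte encoding) can be checked outside RapidMiner. The following is a minimal Python sketch, not part of the original thread. The sample bytes and the helper names `detect_bom_encoding` and `to_utf8` are hypothetical and only for illustration. It shows how the bytes FF FE appear as "ÿþ" and how every second byte becomes a character with code 0 when the file is misread, and then how the BOM could be used to pick the right decoder.

```python
import codecs

# Hypothetical sample: the text from the question, stored the way a
# UTF-16 little-endian export with a byte order mark would be.
raw = codecs.BOM_UTF16_LE + "balance sheet".encode("utf-16-le")

# Reading those bytes with a single-byte encoding reproduces the symptom:
# the BOM bytes FF FE become "ÿþ", and every second byte is NUL (code 0),
# which some editors display as a blank.
misread = raw.decode("latin-1")
print(repr(misread[:6]))


def detect_bom_encoding(data: bytes, default: str = "utf-8") -> str:
    """Return a codec name based on a leading BOM, or `default` if none."""
    # UTF-32 is checked first because its little-endian BOM starts with FF FE.
    if data.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    return default


def to_utf8(data: bytes) -> bytes:
    """Decode with the BOM-detected encoding and re-encode as plain UTF-8."""
    return data.decode(detect_bom_encoding(data)).encode("utf-8")


print(to_utf8(raw))
```

If the misread string contains characters with code 0 rather than code 32, the files are most likely UTF-16. In that case, either converting them to UTF-8 beforehand or setting the encoding parameter of Process Documents from Files to a UTF-16 variant, as suggested above, would be the things to try. Whether the operator honors the BOM on its own is not confirmed in this thread.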