Process document from data

barthos
barthos New Altair Community Member
edited November 5 in Community Q&A
Hello,
I'm a complete beginner with RapidMiner and am following the video tutorials found at http://vancouverdata.blogspot.com/2010/11/text-analytics-with-rapidminer-part-3.html (text mining).
I have a problem at the very basic level.
I want to use the "Process Documents from Data" operator to compute binary word vectors.
To do so, I load an Excel file with the Read Excel operator. My file has a single column with 500 rows, each containing text data. I then send this to the "exa" input of the Process Documents from Data box. Inside the box, I do some basic processing (tokenize, single case, word filter and token filter), and I connect the "exa" output of the box to the results connector.
The problem is that I don't get vectors but only a two-column table: the first column holds the row numbers (1, 2, etc.) and the second is titled "text" but has empty cells. The description of the data is: ExampleSet(437 examples, 1 special attribute, 0 regular attributes). What can I do?

When I put a breakpoint after the Read Excel operator, I get (in the results) a two-column table, the first column with Row No. and the second with the rows from my Excel file. So it looks like the file is read properly...
Help!
Thanks,
Barthélémy

Answers

  • colo
    colo New Altair Community Member
    Hi Barthélémy,

    you have to tell the "Process Documents from Data" operator which attribute should be treated as text. With the similar operators for files or documents this is unambiguous: the document or file body is used as the text. With an example set, however, several attributes could potentially contain the text, so you have to set this before the processing starts (even if you only have one single attribute). To do so, use the "Nominal to Text" operator after "Read Excel". The attribute of type text is then used as the document content for the processing inside the following operator.

    Best regards
    Matthias
  • barthos
    barthos New Altair Community Member
    Fantastic!
    Thanks a lot Matthias, you saved me about two days of work!
    I'd like to offer you a beer. I'm in Paris, what about you?
    Thanks again,
    Barthélémy
  • mrfabrittzio
    mrfabrittzio New Altair Community Member

    You sir are brilliant, thx so much!

  • laurahajnalka
    laurahajnalka New Altair Community Member

    Dear Matthias,

     

    I have a similar problem, but not quite the same. I have CSV files with two columns: one contains the words of a document, the other contains the number of occurrences of each word. I would like to filter out the rows I do not need for my model. I used the "Nominal to Text" operator, but I still cannot filter the stopwords, because the "Process Documents from Data" operator does not seem to be working. Whatever I put inside it, the result has 0 rows. What can I do?

     

    Thank you in advance!

    Laura
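
For readers who want to see the idea behind Matthias's answer to the original question outside of RapidMiner, here is a minimal sketch in plain Python. It is not RapidMiner's API or implementation: the function names, the `column_types` dictionary and the tiny stopword list are all hypothetical, chosen only to illustrate why a column that is not typed as text yields no word attributes, and what binary word vectors look like once it is.

```python
import re

# Tiny illustrative stopword list (an assumption, not RapidMiner's list).
STOPWORDS = {"the", "a", "is", "and", "of"}


def tokenize(text):
    """Lowercase the text and split it on anything that is not a-z."""
    return [tok for tok in re.split(r"[^a-z]+", text.lower()) if tok]


def binary_word_vectors(rows, column_types, min_len=2):
    """Build binary word vectors from the columns typed as "text".

    Columns with any other type (for example "nominal") are ignored,
    which mimics the symptom described in the question.
    """
    text_cols = [col for col, typ in column_types.items() if typ == "text"]
    docs = []
    for row in rows:
        tokens = set()
        for col in text_cols:
            tokens.update(
                tok for tok in tokenize(row[col])
                if tok not in STOPWORDS and len(tok) >= min_len
            )
        docs.append(tokens)
    # The vocabulary is the sorted union of all tokens seen in any document.
    vocab = sorted(set().union(*docs)) if docs else []
    # 1 if the word occurs in the document, 0 otherwise.
    vectors = [[1 if word in doc else 0 for word in vocab] for doc in docs]
    return vocab, vectors


rows = [{"text": "The cat sat"}, {"text": "A dog and the cat"}]

# Column still typed as nominal: nothing is treated as document content.
print(binary_word_vectors(rows, {"text": "nominal"}))
# → ([], [[], []])

# Column converted to text: word attributes appear.
print(binary_word_vectors(rows, {"text": "text"}))
# → (['cat', 'dog', 'sat'], [[1, 0, 1], [1, 1, 0]])
```

The first call corresponds to the "0 regular attributes" result in the question; the second corresponds to what happens after the column has been converted with "Nominal to Text".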