problems in tokenizing imported Excel text documents via

gk New Altair Community Member
edited November 5 in Community Q&A
I am new to RapidMiner and have been watching videos and reading the postings. I still don't have the foggiest idea what is going wrong, hence this post.

I want to import text from Excel (with each row = a short document), and then tokenize to generate a wordlist. It's simple, but I don't know why I can't get any wordlist by executing the following steps.

1. Read Excel (one column with 317 rows; each row stores a short document)
2. Data to Documents
3. Process Documents
3a. Tokenize within Process Documents

The results:

For the ExampleSet (Process Documents), two columns are shown: the first column (Row No.) runs from 1 to 317, but the second column (text) is empty.

The Wordlist (Process Documents) is totally empty.

When I execute only steps 1 and 2, it works and I can see the outputs, but I get nothing from steps 3 and 3a.

I have been trying for the last two days, so I am a bit drained and demoralized, as I believe it's a simple problem.

I would appreciate it if someone could point out my mistake.

Thank you,

George

Answers

  • colo
    colo New Altair Community Member
    Hi George,

    Did you convert the attribute's type to text before using the "Data to Documents" operator? Otherwise the former attribute is just added to the document as metadata and is not set as the document's content.
    I just replied to a post dealing with such a problem: http://rapid-i.com/rapidforum/index.php/topic,3457.0.html

    This might be the solution to a simple problem (assuming you forgot the conversion). If you have already done the type conversion, perhaps you could post your process XML here...

    Regards
    Matthias
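
To make the point in the answer above concrete, here is a toy Python model of the behaviour it describes. This is not RapidMiner code: the `Document` class, the `data_to_documents` and `build_wordlist` functions, the `column_is_text` flag, and the sample rows are all hypothetical, and the tokenizer is assumed to split on non-letter characters. The sketch only shows why a wordlist comes out empty when the text ends up as metadata instead of as the document's content.

```python
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Document:
    """Toy stand-in for a text document: body text plus metadata."""
    content: str = ""
    metadata: dict = field(default_factory=dict)


def data_to_documents(rows, column, column_is_text):
    """Turn each row into a Document.

    Assumption (hypothetical model of the answer's explanation): only a
    column typed as text becomes the document's content; a column of any
    other type is attached as metadata and the content stays empty.
    """
    docs = []
    for row in rows:
        if column_is_text:
            docs.append(Document(content=row[column]))
        else:
            docs.append(Document(metadata={column: row[column]}))
    return docs


def build_wordlist(docs):
    """Tokenize each document's content on non-letters and count tokens."""
    counts = Counter()
    for doc in docs:
        tokens = re.split(r"[^A-Za-z]+", doc.content)
        counts.update(t.lower() for t in tokens if t)
    return counts


# Made-up sample rows standing in for the Excel sheet.
rows = [{"text": "Text mining is fun"}, {"text": "Mining text from Excel"}]

# Without the type conversion the text is only metadata, so nothing is tokenized.
wordlist_without_conversion = build_wordlist(
    data_to_documents(rows, "text", column_is_text=False)
)

# With the conversion the text is the document content and gets tokenized.
wordlist_with_conversion = build_wordlist(
    data_to_documents(rows, "text", column_is_text=True)
)

print(len(wordlist_without_conversion))    # 0
print(wordlist_with_conversion["mining"])  # 2
```

In this model the rows are still all present in the first case (one document per row), which matches the symptom in the question: 317 rows in the ExampleSet but no tokens and an empty wordlist.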