"Text Input from DB in RM-5"

klerx
klerx New Altair Community Member
edited November 5 in Community Q&A
Hi

I tried to import text fields from a MySQL DB, in order to calculate word vectors, with the following process chain:

Read Database
Nominal to Text
Data to Documents
Tokenize

and I got the failure message "Expected Document but received IOObjectCollection".

Is there a mistake in my chain?

I tried more or less every other combination of text processing operators, but I was not able to calculate the word vectors.

With the old plugin I used the StringTextInput operator, but as mentioned in another post this operator is deprecated in RM-5.

Did anyone manage this with RM-5?

bw joachim

Answers

  • land
    land New Altair Community Member
    Hi Joachim.
    you need to include a Process Documents operator for processing single documents. In your case, where you have the data in an example set's text attribute, you must choose the Process Documents from Data operator. All Process Documents operators are super operators that have a subprocess. You must put the Tokenize operator into this subprocess.
    The Data to Documents operator just generates Documents from an Example Set. This might be needed for other purposes, but not in this special case. If you already have a Collection of Documents (which you can recognize by the doubled line on the document output port), you can process it with the Process Documents operator.

    Greetings,
      Sebastian
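
For readers who think in code, the nesting described above can be sketched in plain Python. This is only an analogy and not RapidMiner's API: `process_documents_from_data`, `tokenize`, and the sample rows are hypothetical names made up for illustration, and the vectors hold raw term counts as a simplification. The outer function plays the role of the super operator, and the function passed in plays the role of the subprocess that runs once per document.

```python
import re
from collections import Counter


def tokenize(text):
    """Stand-in for a tokenizer: split a text into lowercase runs of letters."""
    return [t.lower() for t in re.findall(r"[A-Za-z]+", text)]


def process_documents_from_data(rows, text_attribute, subprocess):
    """Hypothetical analogy of a super operator.

    Runs `subprocess` on the text attribute of every row, then builds
    one term-count vector per row over the shared vocabulary.
    """
    token_lists = [subprocess(row[text_attribute]) for row in rows]
    vocabulary = sorted({tok for toks in token_lists for tok in toks})
    vectors = []
    for toks in token_lists:
        counts = Counter(toks)
        vectors.append([counts[term] for term in vocabulary])
    return vocabulary, vectors


# Made-up rows, as they might come out of a database query.
rows = [
    {"id": 1, "text": "Text mining with RapidMiner"},
    {"id": 2, "text": "Mining text from a database"},
]
vocabulary, vectors = process_documents_from_data(rows, "text", tokenize)
```

The point of the sketch is that the tokenizer never sees the whole collection. It is handed one document at a time by the enclosing step, which is why the Tokenize operator belongs inside the subprocess rather than after it in the main chain.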
  • klerx
    klerx New Altair Community Member


    After a few hours of frustrating search, I discovered that you can access the subprocess by double-clicking on the super operator ;-) (maybe this is helpful for others ...)

    Now it works ...

    Thank you for your help, ...

    bw Joachim
  • IngoRM
    IngoRM New Altair Community Member
    Glad you found it  ;)

    Without nesting processes, RapidMiner is only worth half as much  :D

    The documentation - which of course also covers this - is on its way. Until then, the video tutorials at

    http://rapid-i.com/content/view/189/198/

    might be useful. There you can see how you can access subprocesses (among other nice features...).

    Cheers,
    Ingo
  • guitarslinger
    guitarslinger New Altair Community Member
    Hi there,

    I am facing an issue on a related topic:

    I use a Retrieve operator to get a column with text out of a MySQL DB, and have it connected to a "Process Documents from Files" operator.

    Here I get the error "The example set must contain at least one text attribute".

    I set an alias in the SQL query when building the repository entry for the DB, naming it "text", and I set the field type in MySQL to "text" as well, but I still can't manage to get it connected.

    What am I doing wrong?

    Thanks for your help in advance!

    Regards GS
  • land
    land New Altair Community Member
    Hi,
    you must change the attribute type of the attribute that contains the text to "text". Use the Nominal to Text operator on this attribute.

    And I guess you mean Process Documents from Data instead of Process Documents from Files? Otherwise you cannot use the ExampleSet at all.

    Greetings,
      Sebastian
  • guitarslinger
    guitarslinger New Altair Community Member
    Hi, worked!

    Thank you very much, such an incredible software you created!
  • guitarslinger
    guitarslinger New Altair Community Member
    Hi, me again:

    I am now trying to create a word list as the result of my process, showing the occurrence and the frequency of the tokenized terms in the texts coming from the database.

    But I can't manage to get the columns "occurrence" and "frequency" in the resulting word list, as I have seen in the tutorial video on text mining.
    The only difference seems to be that in the video the text is loaded from various documents, whereas I load the texts from a database, convert them to text, and then process them.

    Thanks in advance for your help
  • land
    land New Altair Community Member
    Hi,
    which versions of RapidMiner and the Text Processing Extension do you use? If I remember correctly, this feature was added in one of the update releases of the final 5.0.

    Greetings,
      Sebastian
  • guitarslinger
    guitarslinger New Altair Community Member
    RapidMiner 5.0.3
    Text Ext: 5.0.2

    Thx GS
  • land
    land New Altair Community Member
    Hi,
    are there columns named Total Occurrences and Document Occurrences? These are the renamed columns from the tutorial. "occurrence" and "frequency" aren't very meaningful names, so we decided to rename them.

    Greetings,
      Sebastian
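
To make the two column names concrete, here is a minimal Python sketch of how such counts could be derived from tokenized texts. It assumes that Total Occurrences counts every appearance of a term across all documents and that Document Occurrences counts the documents containing the term at least once. That reading is inferred from the names and is not confirmed in this thread. The function `word_list` and the sample tokens are hypothetical.

```python
from collections import Counter


def word_list(token_lists):
    """Return {term: (total_occurrences, document_occurrences)}.

    Assumed meaning: total = all appearances across documents,
    document = number of documents containing the term.
    """
    total = Counter()
    documents = Counter()
    for tokens in token_lists:
        total.update(tokens)           # count every appearance
        documents.update(set(tokens))  # count each document at most once
    return {term: (total[term], documents[term]) for term in sorted(total)}


# Made-up tokenized texts, one inner list per database row.
token_lists = [
    ["text", "mining", "text"],
    ["mining", "database"],
]
wl = word_list(token_lists)
```

In this sample, "text" appears twice but only in the first document, while "mining" appears once in each document, so the two columns differ for "text" and agree for "mining".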