Textmining with Excel Source

mario_playing_w New Altair Community Member
edited November 5 in Community Q&A
Hi,

I am trying to build my first text mining process on an Excel example set. The data source has around 5000 rows, each consisting of a label and a text comment.

If I run my process on a subset of around 300 rows, everything works fine. If I use the whole dataset, the following error occurs:

Process failed!
The setup does not seem to contain any obvious errors, but you should check the log messages or activate the debug mode in the settings...

Adding breakpoints reveals that the problem lies within the StringTextInput operator:

    <operator name="Nominal2String" class="Nominal2String">
    </operator>
    <operator name="StringTextInput" class="StringTextInput" expanded="yes">
        <parameter key="default_content_language" value="german"/>
        <list key="namespaces">
        </list>
        <operator name="StringTokenizer (2)" class="StringTokenizer">
        </operator>
        <operator name="ToLowerCaseConverter (2)" class="ToLowerCaseConverter">
        </operator>
        <operator name="GermanStopwordFilter" class="GermanStopwordFilter">
        </operator>
        <operator name="TokenLengthFilter" class="TokenLengthFilter">
            <parameter key="max_chars" value="40"/>
        </operator>
        <operator name="GermanStemmer" class="GermanStemmer">
        </operator>
    </operator>

Did I miss something? What can I do to prevent this error? For the whole dataset, the log only says the following:

G Mar 3, 2010 11:01:33 AM: [Fatal] NullPointerException occured in 1st application of StringTextInput (StringTextInput)
G Mar 3, 2010 11:01:33 AM: [Fatal] Process failed: operator cannot be executed. Check the log messages...
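
Since the 300-row subset runs but the full set does not, one thing that could be checked outside RapidMiner is whether some rows have an empty label or comment cell. That empty cells trigger the NullPointerException is only a guess, as the log does not say so. A minimal Python sketch, assuming the sheet has been exported to CSV and that the columns are named `label` and `text` (both names are placeholders, as is the sample data):

```python
import csv
import io


def find_suspect_rows(lines, text_column="text", label_column="label"):
    """Return (row_number, reason) pairs for rows with a missing label or text.

    `lines` is any iterable of CSV lines, e.g. an open file object.
    The column names are assumptions and must match the exported header.
    """
    suspects = []
    reader = csv.DictReader(lines)
    # Row 1 is the header, so data rows start at 2 (spreadsheet numbering).
    for row_number, row in enumerate(reader, start=2):
        for column in (label_column, text_column):
            value = row.get(column)
            # None means the cell is missing entirely; blank means whitespace only.
            if value is None or not value.strip():
                suspects.append((row_number, f"empty {column}"))
    return suspects


# Tiny inline sample standing in for the exported sheet (made-up data).
sample = io.StringIO(
    "label,text\n"
    "positiv,Sehr guter Service\n"
    "negativ,\n"
    ",Lieferung kam zu spaet\n"
)
print(find_suspect_rows(sample))
```

For a real export, the same function can be called on a file opened with `open(path, newline="", encoding="utf-8")`. If it reports nothing, the cause is probably something else and the empty-cell guess can be dropped.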


Thanks in advance for any hints,

Mario

Answers

  • land
    land New Altair Community Member
    Hi Mario,
    I would suggest switching to RapidMiner 5.0. The new Text Processing Extension will ease your work a lot; for example, this process would work :)
    Unfortunately, we could not maintain process compatibility with the old Text Mining Plugin of 4.x, so if you switch at some later point, you will have to rebuild your processes completely. Starting directly with 5.0 would therefore make things easier.

    Greetings,
      Sebastian
  • mario_playing_w
    mario_playing_w New Altair Community Member
    Hi Sebastian,

    thank you for your reply. It seems that RapidMiner 5 is far more complicated than 4 was, since the processes aren't that easy to build. It won't even let me connect the data flow inside the subprocess correctly, even though I have a text input. I will probably have to play around with the tool first before coming back to text mining.

    The funny thing was that the first error I ran into was nearly the same as in version 4.  ::)

    I'll come back with something as soon as version 5 likes me.

    Mario
  • land
    land New Altair Community Member
    Hi Mario,
    it might surprise you, but you are the first one to say that process design is more complicated in 5.0 than in 4.x. You can bet I am surprised. I thought it would be more natural to drag and drop operators onto the canvas and connect wires between input and output ports?

    Greetings,
      Sebastian
  • mario_playing_w
    mario_playing_w New Altair Community Member
    Hi Sebastian,

    there is probably a strong correlation with the fact that I had training on RapidMiner 4 and not on 5. ;)

    In the meantime I got at least some results and a working process. Maybe you could tell me how I can export the distribution table of a Naive Bayes classification? I tried the report and write CSV functionality, but it either throws several errors or produces an empty file. :(

    Thanks,

    Mario