Java Null Pointer exception in simple text process

crcowan New Altair Community Member
edited November 5 in Community Q&A
I'm getting a null pointer exception in a very simple text process.  Here is the log message:
Jun 11, 2010 1:02:31 PM INFO: No filename given for result file, using stdout for logging results!
Jun 11, 2010 1:02:31 PM INFO: Loading initial data.
Jun 11, 2010 1:02:31 PM INFO: Process starts
Jun 11, 2010 1:02:31 PM WARNING: Insufficient input for Process.input 1
Jun 11, 2010 1:02:32 PM INFO: Executing process concurrently: Vector Creation
Jun 11, 2010 1:02:32 PM INFO: Executing process concurrently: Vector Creation
Jun 11, 2010 1:02:32 PM INFO: Executing process concurrently: Vector Creation
Jun 11, 2010 1:02:32 PM INFO: Executing process concurrently: Vector Creation
Jun 11, 2010 1:02:32 PM WARNING: Caught exception in concurrent execution of Filter Tokens (by Length) (Filter Tokens (by Length)): java.lang.NullPointerException
Jun 11, 2010 1:02:32 PM WARNING: Caught exception in concurrent execution of Filter Stopwords (English) (Filter Stopwords (English)): java.lang.NullPointerException
Jun 11, 2010 1:02:32 PM WARNING: Caught exception in concurrent execution of Stem (Porter) (Stem (Porter)): java.lang.NullPointerException
Jun 11, 2010 1:02:32 PM INFO: Executing process concurrently: Vector Creation
Jun 11, 2010 1:02:32 PM INFO: Executing process concurrently: Vector Creation
Jun 11, 2010 1:02:32 PM WARNING: Caught exception in concurrent execution of Filter Stopwords (English) (Filter Stopwords (English)): java.lang.NullPointerException
Jun 11, 2010 1:02:32 PM WARNING: Caught exception in concurrent execution of Filter Tokens (by Length) (Filter Tokens (by Length)): java.lang.NullPointerException
Jun 11, 2010 1:02:32 PM INFO: Stem (Porter): Process stopped.
Jun 11, 2010 1:02:32 PM WARNING: Caught exception in concurrent execution of Stem (Porter) (Stem (Porter)): com.rapidminer.operator.ProcessStoppedException: Process stopped in Stem (Porter)
Jun 11, 2010 1:02:32 PM SEVERE: Process failed: operator cannot be executed. Check the log messages...
Jun 11, 2010 1:02:32 PM SEVERE: Here:          Process[1] (Process)
          subprocess 'Main Process'
            +- Process Documents from Files[1] (Process Documents from Files)
          subprocess 'Vector Creation'
                  +- Tokenize[6] (Tokenize)
                  +- Filter Stopwords (English)[8] (Filter Stopwords (English))
      ==>        +- Filter Tokens (by Length)[8] (Filter Tokens (by Length))
                  +- Stem (Porter)[7] (Stem (Porter))
                  +- Generate n-Grams (Terms)[6] (Generate n-Grams (Terms))
Jun 11, 2010 1:02:32 PM SEVERE: java.lang.NullPointerException
Jun 11, 2010 1:02:32 PM INFO: Generate n-Grams (Terms): Process stopped.
Jun 11, 2010 1:02:32 PM WARNING: Caught exception in concurrent execution of Generate n-Grams (Terms) (Generate n-Grams (Terms)): com.rapidminer.operator.ProcessStoppedException: Process stopped in Generate n-Grams (Terms)
And here is the XML for the process flow:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" expanded="true" name="Process">
    <process expanded="true" height="161" width="765">
      <operator activated="true" breakpoints="after" class="text:process_document_from_file" expanded="true" height="76" name="Process Documents from Files" width="90" x="45" y="30">
        <list key="text_directories">
          <parameter key="Test Input" value="C:\Documents and Settings\ccowan5\Desktop\Mining\TestPP"/>
        </list>
        <parameter key="parallelize_vector_creation" value="true"/>
        <process expanded="true" height="365" width="783">
          <operator activated="true" class="text:tokenize" expanded="true" height="60" name="Tokenize" width="90" x="45" y="30"/>
          <operator activated="true" class="text:filter_stopwords_english" expanded="true" height="60" name="Filter Stopwords (English)" width="90" x="179" y="30"/>
          <operator activated="true" class="text:filter_by_length" expanded="true" height="60" name="Filter Tokens (by Length)" width="90" x="313" y="30">
            <parameter key="min_chars" value="3"/>
          </operator>
          <operator activated="true" class="text:stem_porter" expanded="true" height="60" name="Stem (Porter)" width="90" x="447" y="30"/>
          <operator activated="true" class="text:generate_n_grams_terms" expanded="true" height="60" name="Generate n-Grams (Terms)" width="90" x="581" y="30"/>
          <connect from_port="document" to_op="Tokenize" to_port="document"/>
          <connect from_op="Tokenize" from_port="document" to_op="Filter Stopwords (English)" to_port="document"/>
          <connect from_op="Filter Stopwords (English)" from_port="document" to_op="Filter Tokens (by Length)" to_port="document"/>
          <connect from_op="Filter Tokens (by Length)" from_port="document" to_op="Stem (Porter)" to_port="document"/>
          <connect from_op="Stem (Porter)" from_port="document" to_op="Generate n-Grams (Terms)" to_port="document"/>
          <connect from_op="Generate n-Grams (Terms)" from_port="document" to_port="document 1"/>
          <portSpacing port="source_document" spacing="0"/>
          <portSpacing port="sink_document 1" spacing="0"/>
          <portSpacing port="sink_document 2" spacing="0"/>
        </process>
      </operator>
      <connect from_port="input 1" to_op="Process Documents from Files" to_port="word list"/>
      <connect from_op="Process Documents from Files" from_port="example set" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="source_input 2" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
Am I doing something wrong or is there a bug?

Thank you.

      Charles
Answers

  • haddock
    haddock New Altair Community Member
    Hi Charles,

    Looking at the source of the text processing operators, not many of them allow for the possibility that earlier operators have stripped a document down to nothing, and judging by the log that seems a reasonable explanation for the NullPointerException. If you run the process sequentially rather than concurrently, you should find the file that causes the failure. I've taken to catching exceptions when dealing with the great wide world, as you never can tell what is in free text, and life's too short to clean it up. For example, this process wraps Read RSS Feed in Handle Exception:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" expanded="true" name="Process">
        <process expanded="true" height="370" width="685">
          <operator activated="true" class="web:crawl_web" expanded="true" height="60" name="FT" width="90" x="45" y="165">
            <parameter key="url" value="http://www.ft.com/servicestools/newstracking/rss"/>
            <list key="crawling_rules">
              <parameter key="0" value="http://www.ft.com/rss.*"/>
              <parameter key="2" value="http://www.ft.com/rss.*"/>
            </list>
            <parameter key="write_pages_into_files" value="false"/>
            <parameter key="output_dir" value="C:\Documents and Settings\Administrator\My Documents\WebCrawler"/>
            <parameter key="max_pages" value="1000"/>
            <parameter key="max_depth" value="1"/>
            <parameter key="delay" value="10"/>
            <parameter key="max_threads" value="12"/>
            <parameter key="user_agent" value="haddock checking rapid-miner-crawler"/>
            <parameter key="obey_robot_exclusion" value="false"/>
            <parameter key="really_ignore_exclusion" value="true"/>
          </operator>
          <operator activated="true" class="web:crawl_web" expanded="true" height="60" name="BBC" width="90" x="45" y="75">
            <parameter key="url" value="http://news.bbc.co.uk/2/hi/help/3223484.stm"/>
            <list key="crawling_rules">
              <parameter key="0" value="http://newsrss.bbc.co.uk/rss/newsonline_world_edition/.*rss.xml"/>
              <parameter key="2" value="http://newsrss.bbc.co.uk/rss/newsonline_world_edition/.*rss.xml"/>
            </list>
            <parameter key="write_pages_into_files" value="false"/>
            <parameter key="output_dir" value="C:\Documents and Settings\Administrator\My Documents\WebCrawler"/>
            <parameter key="max_pages" value="1000"/>
            <parameter key="max_depth" value="1"/>
            <parameter key="delay" value="10"/>
            <parameter key="max_threads" value="12"/>
            <parameter key="user_agent" value="haddock checking rapid-miner-crawler"/>
            <parameter key="obey_robot_exclusion" value="false"/>
            <parameter key="really_ignore_exclusion" value="true"/>
          </operator>
          <operator activated="true" breakpoints="after" class="append" expanded="true" height="94" name="Append (2)" width="90" x="179" y="165"/>
          <operator activated="true" class="loop_values" expanded="true" height="76" name="Loop Values" width="90" x="203" y="50">
            <parameter key="attribute" value="Link"/>
            <parameter key="parallelize_iteration" value="true"/>
            <process expanded="true" height="353" width="809">
              <operator activated="true" class="handle_exception" expanded="true" height="76" name="Handle Exception" width="90" x="227" y="118">
                <process expanded="true" height="353" width="809">
                  <operator activated="true" class="web:read_rss" expanded="true" height="60" name="Read RSS Feed" width="90" x="210" y="99">
                    <parameter key="url" value="%{loop_value}"/>
                  </operator>
                  <connect from_op="Read RSS Feed" from_port="output" to_port="out 1"/>
                  <portSpacing port="source_in 1" spacing="0"/>
                  <portSpacing port="source_in 2" spacing="0"/>
                  <portSpacing port="sink_out 1" spacing="0"/>
                  <portSpacing port="sink_out 2" spacing="0"/>
                </process>
              </operator>
              <connect from_port="example set" to_op="Handle Exception" to_port="in 1"/>
              <connect from_op="Handle Exception" from_port="out 1" to_port="out 1"/>
              <portSpacing port="source_example set" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="append" expanded="true" height="76" name="Append" width="90" x="372" y="48"/>
          <connect from_op="FT" from_port="Example Set" to_op="Append (2)" to_port="example set 2"/>
          <connect from_op="BBC" from_port="Example Set" to_op="Append (2)" to_port="example set 1"/>
          <connect from_op="Append (2)" from_port="merged set" to_op="Loop Values" to_port="example set"/>
          <connect from_op="Loop Values" from_port="out 1" to_op="Append" to_port="example set 1"/>
          <connect from_op="Append" from_port="merged set" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
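
    For the sequential run suggested earlier, a minimal sketch of the change to the original process might look like the fragment below. It assumes that the `parallelize_vector_creation` parameter from the posted XML is what triggers the "Executing process concurrently" lines in the log, and that setting it to "false" makes the documents go through the subprocess one at a time; layout attributes and the inner subprocess are left out for brevity.

    ```xml
    <!-- Sketch only: the operator from the original post, with parallel vector creation switched off -->
    <operator activated="true" class="text:process_document_from_file" expanded="true" name="Process Documents from Files">
      <list key="text_directories">
        <parameter key="Test Input" value="C:\Documents and Settings\ccowan5\Desktop\Mining\TestPP"/>
      </list>
      <!-- Assumption: "false" disables the concurrent execution seen in the log -->
      <parameter key="parallelize_vector_creation" value="false"/>
      <!-- The inner 'Vector Creation' subprocess (Tokenize ... Generate n-Grams) stays unchanged -->
    </operator>
    ```

    With only one document in flight at a time, the log should make it easier to tell which file was being processed when the exception occurred.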

    But not having the files to check against I could just be speaking through the thing I'm sitting on!


  • land
    land New Altair Community Member
    Hi,
    If you take a look at the RapidMiner preferences, which are available in the menu, you will find one called debug mode.
    If you activate it, you will receive a detailed error message. If you post that here, I may be able to work out which code caused this error and fix it in one of the next updates.

    Greetings,
      Sebastian