"Java Heap space ERROR"

User: "noah977"
Updated by Jocelyn
Hello,

I am attempting to do some basic tokenization of text files. I will then attempt to cluster them.

Right now, I am testing with only 200 small text files. RapidMiner (RM) processes for a while and then gives me an out-of-memory error (Java heap space). I have given 1 GB of memory to RM.
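
One way to confirm that the 1 GB setting actually reaches the JVM is to print the maximum heap from a tiny Java program started with the same options. This is only a sketch: the class name HeapCheck and the helper toMiB are hypothetical, and it assumes the memory was set through the standard -Xmx JVM option, which may not be how RM was configured here.

```java
public class HeapCheck {
    // Converts a byte count to whole mebibytes (hypothetical helper).
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // maxMemory() reports the most heap this JVM will try to use,
        // which reflects the -Xmx option if one was passed.
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap available to this JVM: " + toMiB(maxBytes) + " MiB");
    }
}
```

Started as `java -Xmx1g HeapCheck`, the reported value should be close to 1024 MiB; a much smaller number would suggest the memory setting is not being picked up.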

I would eventually like to use RM to cluster batches of 1,000 or even 10,000 files, but I am concerned that I cannot even complete the basic tokenization of only 200.

Please let me know if you have any ideas or suggestions.

Thanks!!

---------------------

Below is the XML of my process:

<process version="4.2">

  <operator name="Root" class="Process" expanded="yes">
      <operator name="TextInput" class="TextInput" expanded="yes">
          <parameter key="create_text_visualizer" value="true"/>
          <parameter key="default_content_language" value="english"/>
          <list key="namespaces">
          </list>
          <parameter key="on_the_fly_pruning" value="0"/>
          <parameter key="prune_below" value="10%"/>
          <list key="texts">
            <parameter key="News_Articles" value="/Users/noah/Desktop/test_files"/>
          </list>
          <operator name="StringTokenizer" class="StringTokenizer">
          </operator>
          <operator name="EnglishStopwordFilter" class="EnglishStopwordFilter">
          </operator>
          <operator name="TokenLengthFilter" class="TokenLengthFilter">
              <parameter key="min_chars" value="3"/>
          </operator>
          <operator name="TermNGramGenerator" class="TermNGramGenerator">
          </operator>
      </operator>
  </operator>

</process>
