Hello,
I am attempting to do some basic tokenization of text files, which I will then attempt to cluster.
Right now, I am testing with only 200 small text files. RM processes for a while and then gives me an out-of-memory error. I have given 1 GB of memory to RM.
I would eventually like to use RM to cluster batches of 1,000 or even 10,000 files, but I am concerned that I cannot even do the basic tokenization of only 200.
Please let me know if you have any ideas or suggestions.
Thanks!!
---------------------
Below is the XML of my process:
<process version="4.2">
<operator name="Root" class="Process" expanded="yes">
<operator name="TextInput" class="TextInput" expanded="yes">
<parameter key="create_text_visualizer" value="true"/>
<parameter key="default_content_language" value="english"/>
<list key="namespaces">
</list>
<parameter key="on_the_fly_pruning" value="0"/>
<parameter key="prune_below" value="10%"/>
<list key="texts">
<parameter key="News_Articles" value="/Users/noah/Desktop/test_files"/>
</list>
<operator name="StringTokenizer" class="StringTokenizer">
</operator>
<operator name="EnglishStopwordFilter" class="EnglishStopwordFilter">
</operator>
<operator name="TokenLengthFilter" class="TokenLengthFilter">
<parameter key="min_chars" value="3"/>
</operator>
<operator name="TermNGramGenerator" class="TermNGramGenerator">
</operator>
</operator>
</operator>
</process>