"Out of Memory when doing text classification without GUI"

New Altair Community Member

Dec 18, 2011

Updated Nov 5, 2024 by Jocelyn

Hi all,

I want to do some text classification tasks out of a self-written Java program using RapidMiner. I already learned a SVN Classification model and stored it to the repository. In my Java application, I read ids out of a database which points to my HDD where the text data is stored. This data is passed to RapidMiner. In order to save memory, the classification task isn't done for all data at once. Instead, I use block sizes. This is basically my application:

public class ApplyModel {

	static String process_definition_file = "apply_model.xml";
	static int num_of_domains = 100000;
	static int block_size = 100; // determines the number of examples classified at once
	static Boolean debug = true;

	public static void main(String[] args) {

		System.out.println("START OF APPLY MODEL");

		try {

			// set RapidMiner confs
			RapidMiner.setExecutionMode(ExecutionMode.COMMAND_LINE);

			int start = 0;
			int iteration = 1;
			while(start < num_of_domains) {

				// init RapidMiner
				RapidMiner.init();

				// read process definition
				Process rm = new Process(new File(process_definition_file));

				// avoid to fetch block size if limit is smaller than block size
				int current_limit = block_size;

				if(num_of_domains < block_size)

					current_limit = num_of_domains;

				// get data
				ImmutableList<RapidMiner2Row> data = [...]

				// transform to ExampleSet
				ExampleSet ex = new CData2ExampleSet().getExampleSet(data);

				// create IO Object
				IOObject ioo = ex;
				IOContainer ioc = new IOContainer(new IOObject[] {ioo});

				// run RapidMiner process
				IOContainer res_ioc = rm.run(ioc);

				// analyze results
				if(res_ioc.getElementAt(0) instanceof ExampleSet) {

					ExampleSet resultSet = (ExampleSet)res_ioc.getElementAt(0);

					// go through results
					for (Example example : resultSet) {

						[...]

					}

				}

				start += current_limit;
				iteration++;

				// clean up
				cdata = null;
				data = null;
				ex = null;
				ioo = null;
				ioc = null;
				rm = null;

			} // end of while

		}

		catch(Exception e) {

			[...]

		}

		System.out.println("END OF APPLY MODEL");

	}

}

Although the RapidMiner process is reinitiated for every data block, i am running into an OutOfMemory Exception (GC overhead limit exceeded). The memory problem depends on the actual amount of data. It only makes a small difference whether I run 100 iterations with 10 data sets or 10 iterations with 100 data sets. Does anyone have an idea?

Regards
Merlot

Find more posts tagged with

AI Studio

Developer

Text Mining + NLP

Sort by:

1 - 4 of 41

MariusHelf

New Altair Community Member

Dec 19, 2011

Hi Merlot,

can you process the data from within RapidMiner's GUI? Then you probably assigned more memory to the GUI application than to your own program. You can set the maximum amount of memory which is available for the Java Virtual Machine by specifying the -Xmx parameter, e.g. java -Xmx2048m to assign 2GB of RAM.

If you are using eclipse, you can set that parameter in the run configuration of your project.

Best regards,
Marius

Merlot

New Altair Community Member

Dec 19, 2011

Hi Marius,

thanks for your advice. I didn't try to run the process within RM's GUI yet as my data is split into database values (id + label) and files on my HDD (textual content) and I would like to avoid to implement the "logic" into RM.

I already set the -Xms and -Xmx option in Eclipse to 4 GB. As far as I can see, this amount of memory is really in use. Is there a way to destroy RapidMiner objects explicitely (maybe within my while loop) to free all used space after processing each data block?

Regards
Merlot

Marco_Boeck

New Altair Community Member

Dec 19, 2011

Hi,

you can drop a hint to the garbage collector that now might be a good time to do some work via use of


System.gc();

However that is not guarantueed to work.

You could also try to use a dirty hack, though I would not advise using it:


Object obj = new Object();
WeakReference ref = new WeakReference<Object>(obj);
obj = null;
while(ref.get() != null) {
       System.gc();
}

Use at your own risk.

And please remove the RapidMiner.init() from the loop and place it just after


RapidMiner.setExecutionMode(ExecutionMode.COMMAND_LINE);

Regards,
Marco

Merlot

New Altair Community Member

Dec 19, 2011

Hi Marco,

I already tried to call System.gc(); at the end of the while loop. No effect. :-(

I would like to avoid to use your dirty hack because this code will be part of my thesis. So it looks like I'm stuck in my memory problem.

Regards
Merlot

"Out of Memory when doing text classification without GUI"

Find more posts tagged with

Quick Links