"Feature Selection for Text Categorization"
olschimke
New Altair Community Member
Hi,
I am using the Brown corpus for an experiment and am trying to limit the number of features with feature selection.
I used the wizard that comes with RapidMiner to set up the process. The data is loaded from a sparse matrix file into a sparse matrix.
How can I prevent RapidMiner from running out of memory?
Thank you
P Jan 12, 2009 2:12:21 PM: Initialising process setup
P Jan 12, 2009 2:12:21 PM: [NOTE] No filename given for result file, using stdout for logging results!
P Jan 12, 2009 2:12:21 PM: Checking properties...
P Jan 12, 2009 2:12:21 PM: Properties are ok.
P Jan 12, 2009 2:12:21 PM: Checking process setup...
P Jan 12, 2009 2:12:21 PM: Inner operators are ok.
P Jan 12, 2009 2:12:21 PM: Checking i/o classes...
P Jan 12, 2009 2:12:21 PM: i/o classes are ok. Process output: ExampleSet, AttributeWeights, PerformanceVector.
P Jan 12, 2009 2:12:21 PM: Process ok.
P Jan 12, 2009 2:12:21 PM: Process initialised
P Jan 12, 2009 2:12:21 PM: [NOTE] Process starts
P Jan 12, 2009 2:12:21 PM: Process:
Root[1] (Process)
+- SparseFormatExampleSource[1] (SparseFormatExampleSource)
+- FS[1] (FeatureSelection)
+- FSChain[0] (OperatorChain)
+- XValidation[0] (XValidation)
| +- Learner[0] (LibSVMLearner)
| +- ApplierChain[0] (OperatorChain)
| +- Applier[0] (ModelApplier)
| +- Evaluator[0] (Performance)
+- ProcessLog[0] (ProcessLog)
P Jan 12, 2009 2:12:21 PM: [NOTE] SparseFormatExampleSource: The ID attribute 'id' is defined with a nominal value type but the possible values are not defined! Although this often does not lead to problems (unlike for labels or regular nominal attributes) you might want to specify the possible values by inner tags <value>first</value><value>second</value>....
G Jan 12, 2009 2:13:23 PM: [Fatal] OutOfMemoryError occured in 1st application of FS (FeatureSelection)
G Jan 12, 2009 2:13:23 PM: [Fatal] Process failed: Java heap space
Root[1] (Process)
+- SparseFormatExampleSource[1] (SparseFormatExampleSource)
here ==> +- FS[1] (FeatureSelection)
+- FSChain[0] (OperatorChain)
+- XValidation[0] (XValidation)
| +- Learner[0] (LibSVMLearner)
| +- ApplierChain[0] (OperatorChain)
| +- Applier[0] (ModelApplier)
| +- Evaluator[0] (Performance)
+- ProcessLog[0] (ProcessLog)
G Jan 12, 2009 2:13:24 PM: [Fatal] Java heap space
java.lang.OutOfMemoryError: Java heap space
at com.rapidminer.operator.features.selection.FeatureSelectionOperator.createInitialPopulation(FeatureSelectionOperator.java:172)
at com.rapidminer.operator.features.FeatureOperator.apply(FeatureOperator.java:264)
at com.rapidminer.operator.features.selection.FeatureSelectionOperator.apply(FeatureSelectionOperator.java:151)
at com.rapidminer.operator.Operator.apply(Operator.java:663)
at com.rapidminer.operator.OperatorChain.apply(OperatorChain.java:377)
at com.rapidminer.operator.Operator.apply(Operator.java:663)
at com.rapidminer.Process.run(Process.java:667)
at com.rapidminer.Process.run(Process.java:637)
at com.rapidminer.Process.run(Process.java:627)
at com.rapidminer.gui.ProcessThread.run(ProcessThread.java:61)
Answers
Hi,
Could you please post your process file (*.xml) so that I can see what RapidMiner is doing when it runs out of memory?
I also need to know how much RAM you have, the size of the data set, and the number of attributes.
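If it helps with collecting those numbers, here is a minimal Java sketch (not part of RapidMiner) that prints the maximum heap the JVM is allowed to use, via the standard `Runtime.maxMemory()` call, together with a rough dense-storage upper bound. The class name `HeapReport`, the helper methods, and the example/attribute counts are hypothetical placeholders, and the dense estimate is only an assumption for comparison; it does not describe how RapidMiner actually stores sparse data or what FeatureSelection allocates.

```java
public class HeapReport {

    // Converts a byte count to whole mebibytes (rounded down).
    public static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    // Rough upper bound: bytes needed if every value were stored
    // as an 8-byte double in a dense examples x attributes table.
    // This is an illustrative assumption, not RapidMiner's layout.
    public static long denseBytes(long examples, long attributes) {
        return examples * attributes * 8L;
    }

    public static void main(String[] args) {
        // Maximum heap this JVM may use (governed by the -Xmx option).
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.println("Max JVM heap (MiB): " + toMiB(maxHeap));

        // Placeholder sizes; replace with the real data set figures.
        long examples = 500L;
        long attributes = 40000L;
        long dense = denseBytes(examples, attributes);
        System.out.println("Dense estimate (MiB): " + toMiB(dense));
    }
}
```

Run it with the same JVM settings that RapidMiner is started with, so that the reported heap limit matches the one in effect when the error occurs.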
Greetings,
Sebastian