"Only a limited amount of data can be handled by k-means"
venkat
New Altair Community Member
Hi All,
When I run the k-means algorithm, RapidMiner can only process up to 335 MB of data, not more.
Here is my process XML:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.013">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.3.013" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="read_csv" compatibility="5.3.013" expanded="true" height="60" name="Read CSV" width="90" x="45" y="75">
<parameter key="csv_file" value="/root/Desktop/rapidsimpletest/337mb.csv"/>
<list key="annotations"/>
<list key="data_set_meta_data_information"/>
</operator>
<operator activated="true" class="normalize" compatibility="5.3.013" expanded="true" height="94" name="Normalize" width="90" x="246" y="255"/>
<operator activated="true" class="k_means" compatibility="5.3.013" expanded="true" height="76" name="Clustering" width="90" x="447" y="120">
<parameter key="k" value="3"/>
<parameter key="measure_types" value="MixedMeasures"/>
</operator>
<connect from_op="Read CSV" from_port="output" to_op="Normalize" to_port="example set input"/>
<connect from_op="Normalize" from_port="example set output" to_op="Clustering" to_port="example set"/>
<connect from_op="Clustering" from_port="cluster model" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
Error description:
Apr 09, 2014 4:01:53 PM com.rapidminer.tools.WrapperLoggingHandler log
WARNING: Read CSV: Could not parse line 24 in input: com.rapidminer.tools.CSVParseException: Value quote misplaced at position 88. Last characters read: ormation,"
Apr 09, 2014 4:01:53 PM com.rapidminer.tools.WrapperLoggingHandler log
WARNING: Read CSV: Could not parse line 24 in input: com.rapidminer.tools.CSVParseException: Value quote misplaced at position 42. Last characters read: Archive,"
Apr 09, 2014 4:01:53 PM com.rapidminer.tools.WrapperLoggingHandler log
WARNING: Read CSV: Could not parse line 24 in input: com.rapidminer.tools.CSVParseException: Value quote misplaced at position 67. Last characters read: Home,,,,"
Apr 09, 2014 4:01:53 PM com.rapidminer.tools.WrapperLoggingHandler log
WARNING: Read CSV: Maximum number of warnings exceeded. Will display no further warnings.
Apr 09, 2014 4:02:06 PM com.rapidminer.gui.ProcessThread run
SEVERE: Process failed: Java heap space
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2367)
at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:535)
at java.lang.StringBuffer.append(StringBuffer.java:322)
at java.io.BufferedReader.readLine(BufferedReader.java:351)
at java.io.BufferedReader.readLine(BufferedReader.java:382)
at com.rapidminer.gui.tools.dialogs.wizards.dataimport.csv.LineReader.readLine(LineReader.java:55)
at com.rapidminer.operator.nio.model.CSVResultSet.readNext(CSVResultSet.java:149)
at com.rapidminer.operator.nio.model.CSVResultSet.next(CSVResultSet.java:195)
at com.rapidminer.operator.nio.model.DataResultSetTranslator.read(DataResultSetTranslator.java:148)
at com.rapidminer.operator.nio.model.AbstractDataResultSetReader.createExampleSet(AbstractDataResultSetReader.java:147)
at com.rapidminer.operator.io.AbstractExampleSource.read(AbstractExampleSource.java:52)
at com.rapidminer.operator.io.AbstractExampleSource.read(AbstractExampleSource.java:36)
at com.rapidminer.operator.io.AbstractReader.doWork(AbstractReader.java:126)
at com.rapidminer.operator.Operator.execute(Operator.java:867)
at com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:51)
at com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:711)
at com.rapidminer.operator.OperatorChain.doWork(OperatorChain.java:375)
at com.rapidminer.operator.Operator.execute(Operator.java:867)
at com.rapidminer.Process.run(Process.java:949)
at com.rapidminer.Process.run(Process.java:873)
at com.rapidminer.Process.run(Process.java:832)
at com.rapidminer.Process.run(Process.java:827)
at com.rapidminer.Process.run(Process.java:817)
at com.rapidminer.gui.ProcessThread.run(ProcessThread.java:63)
Apr 09, 2014 4:02:06 PM com.rapidminer.gui.ProcessThread run
SEVERE: Here: Process[1] (Process)
subprocess 'Main Process'
==> +- Read CSV[1] (Read CSV)
+- Normalize[0] (Normalize)
+- Clustering[0] (k-Means)
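The warnings above say that Read CSV could not parse some lines because of a misplaced quote character, and the log's `==>` marker shows the process ran out of heap while still inside Read CSV[1], before Normalize and Clustering ran. A stray or unclosed quote can make a quote-aware CSV reader treat text that spans several physical lines as one field, so it may be worth locating such lines before tuning memory. Below is a minimal stand-alone Java sketch, not part of RapidMiner; it assumes a comma separator and RFC 4180-style double-quote quoting, and its 0-based positions may not line up exactly with the positions RapidMiner reports.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class QuoteCheck {
    /**
     * Returns the 0-based index of the first quoting problem in one CSV line,
     * or -1 if the line looks well-formed under RFC 4180-style quoting.
     */
    static int firstMisplacedQuote(String line, char sep) {
        boolean inQuotes = false;   // currently inside a "quoted" field
        boolean fieldStart = true;  // next char is the first char of a field
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        i++;                  // "" is an escaped quote inside the field
                    } else {
                        inQuotes = false;     // closing quote
                        if (i + 1 < line.length() && line.charAt(i + 1) != sep) {
                            return i + 1;     // text directly after a closing quote
                        }
                    }
                }
            } else if (c == '"') {
                if (!fieldStart) {
                    return i;                 // quote in the middle of an unquoted field
                }
                inQuotes = true;
                fieldStart = false;
            } else {
                fieldStart = (c == sep);      // a separator starts a new field
            }
        }
        // An unclosed quote means the field runs on into the next physical line.
        return inQuotes ? line.length() : -1;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the CSV file. The comma separator is an assumption.
        // ISO-8859-1 decodes any byte sequence, so an unexpected encoding does not abort the scan.
        try (BufferedReader in = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.ISO_8859_1)) {
            String line;
            long lineNo = 0;
            int reported = 0;
            while ((line = in.readLine()) != null && reported < 20) {
                lineNo++;
                int pos = firstMisplacedQuote(line, ',');
                if (pos >= 0) {
                    System.out.println("line " + lineNo + ", position " + pos);
                    reported++;
                }
            }
        }
    }
}
```

For example, `javac QuoteCheck.java && java QuoteCheck /root/Desktop/rapidsimpletest/337mb.csv` prints the first 20 suspicious lines (1-based line numbers) and then stops.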
I am trying to find 3 clusters with k-means, but the results do not look right:
Cluster Model
Cluster 0: 22292 items
Cluster 1: 1 items
Cluster 2: 1 items
Total number of items: 22294
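A split like 22292 / 1 / 1 is a pattern k-means tends to produce when a few rows lie far from the rest: putting each extreme row in its own cluster can lower the sum of squared distances far more than splitting the bulk of the data. This is one possible explanation, not one confirmed in this thread. The sketch below is plain 1-D Lloyd's k-means with farthest-first initialization, not RapidMiner's k_means implementation, and the values in the hypothetical `demoData()` are made up purely for illustration.

```java
import java.util.Arrays;

public class KMeansOutlierDemo {
    /** Plain Lloyd's k-means on 1-D data; returns the number of points per cluster. */
    static int[] clusterSizes(double[] x, int k, int maxIter) {
        double[] centers = new double[k];
        centers[0] = x[0];
        // Farthest-first initialization: each new center is the point
        // farthest from all centers chosen so far.
        for (int c = 1; c < k; c++) {
            int best = 0;
            double bestDist = -1;
            for (int i = 0; i < x.length; i++) {
                double d = Double.MAX_VALUE;
                for (int j = 0; j < c; j++) {
                    d = Math.min(d, Math.abs(x[i] - centers[j]));
                }
                if (d > bestDist) {
                    bestDist = d;
                    best = i;
                }
            }
            centers[c] = x[best];
        }
        int[] assign = new int[x.length];
        for (int iter = 0; iter < maxIter; iter++) {
            // Assignment step: each point goes to its nearest center.
            for (int i = 0; i < x.length; i++) {
                int bestC = 0;
                for (int c = 1; c < k; c++) {
                    if (Math.abs(x[i] - centers[c]) < Math.abs(x[i] - centers[bestC])) {
                        bestC = c;
                    }
                }
                assign[i] = bestC;
            }
            // Update step: each center moves to the mean of its points.
            double[] sum = new double[k];
            int[] count = new int[k];
            for (int i = 0; i < x.length; i++) {
                sum[assign[i]] += x[i];
                count[assign[i]]++;
            }
            for (int c = 0; c < k; c++) {
                if (count[c] > 0) {
                    centers[c] = sum[c] / count[c];
                }
            }
        }
        int[] sizes = new int[k];
        for (int a : assign) {
            sizes[a]++;
        }
        return sizes;
    }

    /** Hypothetical data: 20 ordinary values plus two extreme outliers. */
    static double[] demoData() {
        double[] x = new double[22];
        for (int i = 0; i < 20; i++) {
            x[i] = (i % 5) * 0.1;   // bulk of the data in [0, 0.4]
        }
        x[20] = 100.0;
        x[21] = -100.0;
        return x;
    }

    public static void main(String[] args) {
        int[] sizes = clusterSizes(demoData(), 3, 20);
        Arrays.sort(sizes);
        System.out.println(Arrays.toString(sizes)); // prints [1, 1, 20]
    }
}
```

Here the lopsided result is not a failure of the algorithm: grouping the two outliers together would cost a squared error of 20000, while isolating each of them costs nothing. Inspecting the rows that end up in the two singleton clusters would show whether the same thing is happening in the real data.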
Could you please help me find where I am making a mistake?
Thanks.
Venkat
Answers
How much main memory did you assign to RapidMiner?
How many rows and columns do your datasets have?
Best regards,
Marius
Hi Marius,
I have assigned 1 GB of main memory. My input file contains 143,000 rows and 5 columns and is 337 MB in size.
I am able to process the 335 MB file.
Regards,
Venkat
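A back-of-the-envelope estimate with the numbers above suggests why 1 GB can run out. If the 143,000 × 5 values were all numeric 8-byte doubles they would need only about 5.5 MiB, so a 337 MB file must consist mostly of text. Java 7 strings store characters in UTF-16 (2 bytes per character), so if most of that text is kept as distinct string values (an assumption about how the data ends up in memory), it alone roughly doubles to about 674 MiB before any per-object overhead. The sketch below reads "337 MB" as MiB and assumes single-byte characters in the file.

```java
public class MemoryEstimate {
    static final long MIB = 1024L * 1024L;

    /** Bytes needed if every value were a numeric 8-byte double. */
    static long numericBytes(long rows, long cols) {
        return rows * cols * 8L;
    }

    /** Rough lower bound for the same text held as UTF-16 (Java 7) strings,
     *  assuming one byte per character in the file and ignoring object overhead. */
    static long utf16LowerBound(long fileBytes) {
        return 2L * fileBytes;
    }

    public static void main(String[] args) {
        long fileBytes = 337L * MIB;  // "337 MB" from the thread, read as MiB (assumption)
        long rows = 143_000L;
        long cols = 5L;

        double bytesPerRow = (double) fileBytes / rows;
        System.out.println("avg bytes per row:   " + Math.round(bytesPerRow));
        System.out.println("avg bytes per value: " + Math.round(bytesPerRow / cols));
        System.out.println("as doubles (MiB):    " + (double) numericBytes(rows, cols) / MIB);
        System.out.println("as UTF-16 text (MiB, lower bound): " + utf16LowerBound(fileBytes) / MIB);
    }
}
```

With these inputs the file averages about 2,471 bytes per row (roughly 494 bytes per value), which also means the rows are far wider than five short numeric fields.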
1 GB is not much for a data mining process. RapidMiner often expands the data it reads into a larger in-memory representation that is optimized for data mining tasks, and the algorithms then often need more memory than the raw data itself to build their models.
You should definitely try to increase the amount of available memory.
Best regards,
Marius
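Before and after raising the limit, it can help to confirm how much heap the JVM actually received. This minimal sketch uses only the standard `java.lang.Runtime` API; run it with the same JVM options RapidMiner is started with (for example `java -Xmx4g HeapCheck`, where `-Xmx` is the standard JVM maximum-heap option). Where RapidMiner 5 reads its own memory setting depends on the installation and start script, so that part is not shown here.

```java
public class HeapCheck {
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // Upper bound the heap may grow to (set by -Xmx); Long.MAX_VALUE if there is no limit.
        System.out.println("max heap   (MiB): " + toMiB(rt.maxMemory()));
        // Heap currently reserved by the JVM, and the unused part of it.
        System.out.println("total heap (MiB): " + toMiB(rt.totalMemory()));
        System.out.println("free heap  (MiB): " + toMiB(rt.freeMemory()));
    }
}
```

If "max heap" still reports about 1024 MiB after changing the setting, the new value is not reaching the JVM.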