Time Optimization
Hi,
I am running KMedoids clustering on 1.7 MB of text data, but it has been running for the last three and a half days. The other operators took only 10 minutes; KMedoids accounts for all the remaining time. Is there any way to optimize the process? The process is shown below.
<operator name="Root" class="Process" expanded="yes">
<description text="#ylt#h3#ygt#Optimizing vector creation for text classification#ylt#/h3#ygt##ylt#p#ygt#This experiments shows how to apply a cross validation to a classifier that learns to separate two sets of texts.#ylt#/p#ygt#"/>
<operator name="ExcelExampleSource" class="ExcelExampleSource">
<parameter key="excel_file" value="C:\Documents and Settings\ADMIN\Desktop\data1.xls"/>
<parameter key="first_row_as_names" value="true"/>
</operator>
<operator name="Nominal2String" class="Nominal2String">
</operator>
<operator name="StringTextInput" class="StringTextInput" expanded="yes">
<list key="namespaces">
</list>
<operator name="ToLowerCaseConverter" class="ToLowerCaseConverter">
</operator>
<operator name="StringTokenizer" class="StringTokenizer">
</operator>
<operator name="EnglishStopwordFilter" class="EnglishStopwordFilter">
</operator>
<operator name="TokenLengthFilter" class="TokenLengthFilter">
</operator>
</operator>
<operator name="KMedoids" class="KMedoids">
<parameter key="k" value="25"/>
</operator>
<operator name="AttributeFilter" class="AttributeFilter">
<parameter key="condition_class" value="is_nominal"/>
</operator>
<operator name="ExcelExampleSetWriter" class="ExcelExampleSetWriter">
<parameter key="excel_file" value="C:\Documents and Settings\ADMIN\Desktop\cluster1.xls"/>
</operator>
</operator>
Thanks
Ratheesan
Hi,
we have parallelized many important operators for the Enterprise Edition, but KMedoids is not among them. For the price of an Enterprise Edition, however, we could write you a parallelized KMedoids. One could even think about optimizing the operator for small example sets with many attributes, as is common in text mining tasks.
Greetings,
Sebastian
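One way to picture the optimization mentioned above for small example sets with many attributes is to compute every pairwise distance once and let all later medoid evaluations use table lookups. The following is only a minimal sketch under that assumption; it is not how the RapidMiner KMedoids operator is implemented, and the function names and toy data are hypothetical.

```python
from itertools import combinations
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pairwise_distances(vectors, distance=euclidean):
    """Compute each of the n*(n-1)/2 distinct distances exactly once."""
    n = len(vectors)
    table = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = distance(vectors[i], vectors[j])
        table[i][j] = d
        table[j][i] = d  # distances are symmetric
    return table


def clustering_cost(table, medoids):
    """Sum of each example's distance to its nearest medoid (lookups only)."""
    return sum(min(row[m] for m in medoids) for row in table)


# Hypothetical toy data: 4 examples with 3 attributes each.
vectors = [[1, 0, 0], [1, 1, 0], [0, 0, 5], [0, 1, 5]]
table = pairwise_distances(vectors)

# Candidate medoid sets can now be compared without touching the attributes again.
print(clustering_cost(table, medoids=[0, 2]))
print(clustering_cost(table, medoids=[0, 1]))
```

The table needs memory quadratic in the number of examples, so this trade-off only pays off when there are few examples and each distance is expensive because of the many attributes.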
Hi Sebastian,
I have tried the above process with cosine similarity, but I always get the message "There is no obvious error, check the log file". Before applying KMedoids I used the AttributeFilter operator and selected the numeric attributes, because in KMedoids only the numerical measures offer cosine similarity.
Thanks
Ratheesan
Hi Sebastian,
Thanks for your valuable help. This is my process:
<operator name="Root" class="Process" expanded="yes">
<operator name="ExcelExampleSource" class="ExcelExampleSource">
<parameter key="excel_file" value="C:\Documents and Settings\ADMIN\Desktop\data1.xls"/>
<parameter key="first_row_as_names" value="true"/>
</operator>
<operator name="Nominal2String" class="Nominal2String">
</operator>
<operator name="StringTextInput" class="StringTextInput" expanded="yes">
<list key="namespaces">
</list>
<operator name="ToLowerCaseConverter" class="ToLowerCaseConverter">
</operator>
<operator name="StringTokenizer" class="StringTokenizer">
</operator>
<operator name="EnglishStopwordFilter" class="EnglishStopwordFilter">
</operator>
<operator name="TokenLengthFilter" class="TokenLengthFilter">
</operator>
<operator name="PorterStemmer" class="PorterStemmer">
</operator>
</operator>
<operator name="AttributeFilter" class="AttributeFilter">
<parameter key="condition_class" value="is_numerical"/>
<parameter key="parameter_string" value="sample"/>
<parameter key="apply_on_special" value="true"/>
</operator>
<operator name="KMedoids" class="KMedoids">
<parameter key="k" value="3"/>
<parameter key="max_runs" value="5"/>
<parameter key="max_optimization_steps" value="10"/>
<parameter key="measure_types" value="NumericalMeasures"/>
<parameter key="numerical_measure" value="CosineSimilarity"/>
</operator>
<operator name="ExcelExampleSetWriter" class="ExcelExampleSetWriter">
<parameter key="excel_file" value="C:\Documents and Settings\ADMIN\Desktop\modelcluster.xls"/>
</operator>
</operator>
If I use up to 250 records, it works properly, but with more than 250 records I get the above message.
Thanks
Ratheesan.
Hi,
the process runs fine here. I used 722 texts and there was no error, at least not in the first few minutes of the KMedoids run.
Of course I don't have exactly the same setup, because I'm using different texts. I suggest you switch RapidMiner to debug mode so that you can post the detailed error message. Go to the Tools menu and select Preferences, then enable the rapidminer.general.debugmode checkbox in the General tab.
Then please re-execute the process and send me the error message.
Greetings,
Sebastian
Hi Sebastian,
I re-executed the process after switching to debug mode. Here is the error message:
Root[1] (Process)
+- ExcelExampleSource[1] (ExcelExampleSource)
+- Nominal2String[1] (Nominal2String)
+- StringTextInput[1] (StringTextInput)
| +- ToLowerCaseConverter[600] (ToLowerCaseConverter)
| +- StringTokenizer[600] (StringTokenizer)
| +- EnglishStopwordFilter[600] (EnglishStopwordFilter)
| +- TokenLengthFilter[600] (TokenLengthFilter)
+- AttributeFilter (2)[1] (AttributeFilter)
here ==> +- KMedoids[1] (KMedoids)
java.lang.NullPointerException
at com.rapidminer.operator.clustering.clusterer.KMedoids.generateClusterModel(KMedoids.java:176)
at com.rapidminer.operator.clustering.clusterer.AbstractClusterer.apply(AbstractClusterer.java:60)
at com.rapidminer.operator.Operator.apply(Operator.java:671)
at com.rapidminer.operator.OperatorChain.apply(OperatorChain.java:424)
at com.rapidminer.operator.Operator.apply(Operator.java:671)
at com.rapidminer.Process.run(Process.java:735)
at com.rapidminer.Process.run(Process.java:704)
at com.rapidminer.Process.run(Process.java:694)
at com.rapidminer.gui.ProcessThread.run(ProcessThread.java:59)
Thanks
Ratheesan.
Hi,
this is a forum, not a consulting service or a course. I cannot answer EACH question regarding this or that algorithm or measure. Just try it out yourself. In fact, nobody can say in general what a good measure or algorithm is, because that always depends on the data, and I don't have your data.
Greetings,
Sebastian
Unfortunately, it takes time to calculate all the distances needed. One hint: it might be useful to switch to CosineSimilarity, which is more suitable for text mining than Euclidean distance.
Greetings,
Sebastian
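To illustrate the hint about CosineSimilarity, here is a small sketch comparing the two measures on word-count vectors. The three toy documents are hypothetical, and returning 0.0 for an all-zero vector is an assumption made for this example, not necessarily what RapidMiner does.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance; grows with differences in document length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; ignores vector length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption for this sketch: an empty document is similar to nothing.
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical word counts over a four-word vocabulary.
short_doc = [2, 1, 0, 0]
long_doc = [20, 10, 0, 0]  # same word proportions, ten times longer
other_doc = [0, 0, 2, 1]   # completely different words

print("cosine short/long:   ", cosine_similarity(short_doc, long_doc))
print("cosine short/other:  ", cosine_similarity(short_doc, other_doc))
print("euclid short/long:   ", euclidean_distance(short_doc, long_doc))
print("euclid short/other:  ", euclidean_distance(short_doc, other_doc))
```

In this toy case Euclidean distance places the short document closer to the one with entirely different words than to the longer text on the same topic, while cosine similarity rates the two same-topic documents as nearly identical regardless of their length.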