I was trying to build several types of classifiers, including SVM, Naive Bayes, and a neural network. Training finished successfully for all of these models. However, when I apply the trained models to the test set, some of them fail. Specifically, the trained SVM model can be applied to the test set as normal, but the process fails when the trained Naive Bayes model is applied to the same test set. I launched RapidMiner the same way for every model:
java -Xmx30g -jar "C:\Program Files\Rapid-I\RapidMiner5\lib\rapidminer.jar"
The error output was as follows:
Oct 29, 2012 6:26:49 PM com.rapidminer.tools.WrapperLoggingHandler logWarning
WARNING: KernelDistribution: The given example set does not contain a regular attribute with name 'êδÜ_∞'. This might cause problems for some models depending on this particular attribute.
Oct 29, 2012 6:26:49 PM com.rapidminer.tools.WrapperLoggingHandler logWarning
WARNING: KernelDistribution: The given example set does not contain a regular attribute with name 'ê∞'. This might cause problems for some models depending on this particular attribute.
Oct 29, 2012 6:26:49 PM com.rapidminer.tools.WrapperLoggingHandler logWarning
WARNING: KernelDistribution: The given example set does not contain a regular attribute with name 'ê∞_δ'. This might cause problems for some models depending on this particular attribute.
Oct 29, 2012 6:26:49 PM com.rapidminer.tools.WrapperLoggingHandler logWarning
WARNING: KernelDistribution: The given example set does not contain a regular attribute with name 'ê∞_∞'. This might cause problems for some models depending on this particular attribute.
Oct 29, 2012 6:26:49 PM com.rapidminer.tools.WrapperLoggingHandler logWarning
WARNING: KernelDistribution: The given example set does not contain a regular attribute with name 'ê∞Ü'. This might cause problems for some models depending on this particular attribute.
Oct 29, 2012 6:26:49 PM com.rapidminer.tools.WrapperLoggingHandler logWarning
WARNING: KernelDistribution: The given example set does not contain a regular attribute with name 'ê∞Ü╡δ'. This might cause problems for some models depending on this particular attribute.
Oct 29, 2012 6:26:49 PM com.rapidminer.tools.WrapperLoggingHandler logWarning
WARNING: KernelDistribution: The given example set does not contain a regular attribute with name 'ê∞Ü╡δ_êδ'. This might cause problems for some models depending on this particular attribute.
Oct 29, 2012 6:26:49 PM com.rapidminer.tools.WrapperLoggingHandler logWarning
WARNING: KernelDistribution: The given example set does not contain a regular attribute with name 'êφ'. This might cause problems for some models depending on this particular attribute.
Oct 29, 2012 6:26:49 PM com.rapidminer.tools.WrapperLoggingHandler logWarning
WARNING: KernelDistribution: The given example set does not contain a regular attribute with name 'êφ_δ'. This might cause problems for some models depending on this particular attribute.
Oct 29, 2012 6:26:49 PM com.rapidminer.tools.WrapperLoggingHandler logWarning
WARNING: KernelDistribution: The given example set does not contain a regular attribute with name 'ê∩'. This might cause problems for some models depending on this particular attribute.
Oct 29, 2012 6:26:49 PM com.rapidminer.tools.WrapperLoggingHandler logWarning
WARNING: KernelDistribution: The given example set does not contain a regular attribute with name 'ê∩_£'. This might cause problems for some models depending on this particular attribute.
Oct 29, 2012 6:26:49 PM com.rapidminer.tools.WrapperLoggingHandler logWarning
WARNING: KernelDistribution: The given example set does not contain a regular attribute with name 'ê∩_£Σ'. This might cause problems for some models depending on this particular attribute.
Oct 29, 2012 6:26:49 PM com.rapidminer.gui.ProcessThread run
SEVERE: Process failed: Input example set does not have a label attribute
com.rapidminer.operator.UserError: Input example set does not have a label attribute
        at com.rapidminer.example.Tools.isLabelled(Tools.java:380)
        at com.rapidminer.operator.performance.PolynominalClassificationPerformanceEvaluator.checkCompatibility(PolynominalClassificationPerformanceEvaluator.java:103)
        at com.rapidminer.operator.performance.AbstractPerformanceEvaluator.doWork(AbstractPerformanceEvaluator.java:234)
        at com.rapidminer.operator.Operator.execute(Operator.java:833)
        at com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:51)
        at com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:709)
        at com.rapidminer.operator.OperatorChain.doWork(OperatorChain.java:369)
        at com.rapidminer.operator.Operator.execute(Operator.java:833)
        at com.rapidminer.Process.run(Process.java:920)
        at com.rapidminer.Process.run(Process.java:843)
        at com.rapidminer.Process.run(Process.java:802)
        at com.rapidminer.Process.run(Process.java:797)
        at com.rapidminer.Process.run(Process.java:787)
        at com.rapidminer.gui.ProcessThread.run(ProcessThread.java:63)
Oct 29, 2012 6:26:49 PM com.rapidminer.gui.ProcessThread run
SEVERE: Here:
          Process[1] (Process)
            subprocess 'Main Process'
              +- Retrieve[1] (Retrieve)
              +- Process Documents from Files (2)[1] (Process Documents from Files)
                 subprocess 'Vector Creation'
                 |     +- Tokenize (2)[0] (Tokenize)
                 |     +- Transform Cases (2)[0] (Transform Cases)
                 |     +- Filter Stopwords (English)[0] (Filter Stopwords (English))
                 |     +- Generate n-Grams (Terms)[0] (Generate n-Grams (Terms))
              +- Retrieve (2)[1] (Retrieve)
              +- Apply Model[1] (Apply Model)
          ==> +- Performance[1] (Performance (Classification))
              +- Select Attributes[0] (Select Attributes)
              +- Write CSV[0] (Write CSV)
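The log contains two different kinds of messages. As the WARNING lines themselves state, the Naive Bayes model (KernelDistribution) refers to regular attributes that the test example set does not contain. The SEVERE error is a separate check: the stack trace (`PolynominalClassificationPerformanceEvaluator.checkCompatibility` calling `Tools.isLabelled`) and the `==>` marker show that it is raised by the Performance operator, because the example set it receives has no attribute with the label role. The plain-Java sketch below mimics both checks in simplified form. It is not RapidMiner's API; the class `ApplyModelChecks`, the attribute names, and the role strings are all hypothetical and chosen only for illustration.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the two checks reported in the log.
// This is NOT RapidMiner's API; names and roles are illustrative only.
public class ApplyModelChecks {

    // Regular attributes the model was trained on that the test set lacks.
    // In the log, one KernelDistribution WARNING appears per such name.
    public static Set<String> missingAttributes(List<String> trainedOn, Set<String> testRegular) {
        Set<String> missing = new LinkedHashSet<>(); // keeps training order
        for (String name : trainedOn) {
            if (!testRegular.contains(name)) {
                missing.add(name);
            }
        }
        return missing;
    }

    // Classification performance needs at least one attribute whose role is "label".
    public static boolean hasLabel(Map<String, String> roleByAttribute) {
        return roleByAttribute.containsValue("label");
    }

    public static void main(String[] args) {
        // Hypothetical vocabularies: training word list vs. test word vectors.
        List<String> trainedOn = List.of("contract", "meeting", "invoice");
        Set<String> testRegular = Set.of("contract", "meeting");
        for (String name : missingAttributes(trainedOn, testRegular)) {
            System.out.println("WARNING: test set lacks regular attribute '" + name + "'");
        }

        // Hypothetical attribute roles: a prediction column but no label column.
        Map<String, String> roles = Map.of(
                "prediction(label)", "prediction",
                "contract", "regular");
        if (!hasLabel(roles)) {
            System.out.println("ERROR: input example set does not have a label attribute");
        }
    }
}
```

Running `main` prints one warning (for the hypothetical attribute `invoice`) followed by the error line. The missing-attribute case only produces warnings, while the missing label stops the process, which matches the order of severity seen in the log.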
The model application workflow is the same for every model; only the retrieved model differs. The workflow is:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.008">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
<process expanded="true" height="386" width="711">
<operator activated="true" class="retrieve" compatibility="5.2.008" expanded="true" height="60" name="Retrieve" width="90" x="45" y="75">
<parameter key="repository_entry" value="nb_Train_F_words"/>
</operator>
<operator activated="true" class="text:process_document_from_file" compatibility="5.2.004" expanded="true" height="76" name="Process Documents from Files (2)" width="90" x="179" y="75">
<list key="text_directories">
<parameter key="Responsive" value="C:\Validation Sets\total responsive"/>
<parameter key="NonResponsive" value="C:\Validation Sets\Not Resp"/>
</list>
<parameter key="extract_text_only" value="false"/>
<parameter key="vector_creation" value="Binary Term Occurrences"/>
<parameter key="prune_method" value="absolute"/>
<parameter key="prune_below_absolute" value="5"/>
<parameter key="prune_above_absolute" value="5000000"/>
<parameter key="prune_below_rank" value="5.0"/>
<parameter key="prune_above_rank" value="5.0"/>
<process expanded="true" height="362" width="674">
<operator activated="true" class="text:tokenize" compatibility="5.2.004" expanded="true" height="60" name="Tokenize (2)" width="90" x="45" y="30"/>
<operator activated="true" class="text:transform_cases" compatibility="5.2.004" expanded="true" height="60" name="Transform Cases (2)" width="90" x="180" y="30"/>
<operator activated="true" class="text:filter_stopwords_english" compatibility="5.2.004" expanded="true" height="60" name="Filter Stopwords (English)" width="90" x="315" y="73"/>
<operator activated="true" class="text:generate_n_grams_terms" compatibility="5.2.004" expanded="true" height="60" name="Generate n-Grams (Terms)" width="90" x="447" y="165"/>
<connect from_port="document" to_op="Tokenize (2)" to_port="document"/>
<connect from_op="Tokenize (2)" from_port="document" to_op="Transform Cases (2)" to_port="document"/>
<connect from_op="Transform Cases (2)" from_port="document" to_op="Filter Stopwords (English)" to_port="document"/>
<connect from_op="Filter Stopwords (English)" from_port="document" to_op="Generate n-Grams (Terms)" to_port="document"/>
<connect from_op="Generate n-Grams (Terms)" from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="retrieve" compatibility="5.2.008" expanded="true" height="60" name="Retrieve (2)" width="90" x="179" y="300">
<parameter key="repository_entry" value="nb_Train_F_model"/>
</operator>
<operator activated="true" class="apply_model" compatibility="5.2.008" expanded="true" height="76" name="Apply Model" width="90" x="313" y="300">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="performance_classification" compatibility="5.2.008" expanded="true" height="76" name="Performance" width="90" x="447" y="75">
<list key="class_weights"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="5.2.008" expanded="true" height="76" name="Select Attributes" width="90" x="447" y="165">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="|confidence(non_res)|confidence(res)|label|prediction(label)"/>
</operator>
<operator activated="true" class="write_csv" compatibility="5.2.008" expanded="true" height="76" name="Write CSV" width="90" x="581" y="210">
<parameter key="csv_file" value="C:\Users\Desktop\rapidminerRepository\Project1\Total responsive - naivebayes\scorevalue_naiveBayesian.csv"/>
<parameter key="column_separator" value=","/>
<parameter key="quote_nominal_values" value="false"/>
<parameter key="format_date_attributes" value="false"/>
</operator>
<connect from_op="Retrieve" from_port="output" to_op="Process Documents from Files (2)" to_port="word list"/>
<connect from_op="Process Documents from Files (2)" from_port="example set" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Retrieve (2)" from_port="output" to_op="Apply Model" to_port="model"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
<connect from_op="Performance" from_port="performance" to_port="result 2"/>
<connect from_op="Performance" from_port="example set" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Write CSV" to_port="input"/>
<connect from_op="Write CSV" from_port="through" to_port="result 1"/>