Cannot find the Cluster internal validation operator in RapidMiner 7

davidtoh
davidtoh New Altair Community Member
edited November 5 in Community Q&A
Hi all,
I am new to RapidMiner. While reading the documentation (Data Mining Use Cases and Business Analytics Applications), I tried following the steps. However, one of the operators, Cluster internal validation, is missing from my Studio installation. Is there any way I could resolve the issue? Thanks.

Answers

  • thapli_64
    thapli_64 New Altair Community Member

    I'm having the same problem. Any recommendations for alternatives?

  • sgenzer
    sgenzer
    Altair Employee

    Hello @thapli_64 - yes, the processes on that website are outdated. This comes up all the time. Can you post the process so I can see what the operator used to be called?

    Scott

  • thapli_64
    thapli_64 New Altair Community Member

    @sgenzer Here is the process. The cluster internal validation operator is in the Loop Parameters sub-process.

     <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.003">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="5.2.003" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true" height="539" width="748">
    <operator activated="true" class="read_aml" compatibility="5.2.003" expanded="true" height="60" name="Read AML" width="90" x="45" y="120">
    <parameter key="attributes" value="C:\Users\Vuchko\Desktop\Grouping higher education students with RapidMiner\Grouping higher education students with RapidMiner\Datasets\ClusteringStudents.aml"/>
    <parameter key="sample_ratio" value="1.0"/>
    <parameter key="sample_size" value="-1"/>
    <parameter key="permute" value="false"/>
    <parameter key="decimal_point_character" value="."/>
    <parameter key="column_separators" value=",\s*|;\s*|\s+"/>
    <parameter key="use_comment_characters" value="true"/>
    <parameter key="comment_chars" value="#"/>
    <parameter key="use_quotes" value="true"/>
    <parameter key="quote_character" value="&quot;"/>
    <parameter key="quoting_escape_character" value="\"/>
    <parameter key="trim_lines" value="false"/>
    <parameter key="skip_error_lines" value="false"/>
    <parameter key="datamanagement" value="double_array"/>
    <parameter key="encoding" value="SYSTEM"/>
    <parameter key="use_local_random_seed" value="false"/>
    <parameter key="local_random_seed" value="1992"/>
    </operator>
    <operator activated="true" class="set_role" compatibility="5.2.003" expanded="true" height="76" name="Set Role" width="90" x="179" y="165">
    <parameter key="name" value="Students_success"/>
    <parameter key="target_role" value="batch"/>
    <list key="set_additional_roles"/>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="5.2.003" expanded="true" height="76" name="Select Attributes" width="90" x="313" y="210">
    <parameter key="attribute_filter_type" value="subset"/>
    <parameter key="attribute" value=""/>
    <parameter key="attributes" value="Region|Sex"/>
    <parameter key="use_except_expression" value="false"/>
    <parameter key="value_type" value="attribute_value"/>
    <parameter key="use_value_type_exception" value="false"/>
    <parameter key="except_value_type" value="time"/>
    <parameter key="block_type" value="attribute_block"/>
    <parameter key="use_block_type_exception" value="false"/>
    <parameter key="except_block_type" value="value_matrix_row_start"/>
    <parameter key="invert_selection" value="true"/>
    <parameter key="include_special_attributes" value="false"/>
    </operator>
    <operator activated="true" class="replace_missing_values" compatibility="5.2.000" expanded="true" height="94" name="Replace Missing Values" width="90" x="447" y="210">
    <parameter key="return_preprocessing_model" value="false"/>
    <parameter key="create_view" value="false"/>
    <parameter key="attribute_filter_type" value="all"/>
    <parameter key="attribute" value=""/>
    <parameter key="attributes" value=""/>
    <parameter key="use_except_expression" value="false"/>
    <parameter key="value_type" value="attribute_value"/>
    <parameter key="use_value_type_exception" value="false"/>
    <parameter key="except_value_type" value="time"/>
    <parameter key="block_type" value="attribute_block"/>
    <parameter key="use_block_type_exception" value="false"/>
    <parameter key="except_block_type" value="value_matrix_row_start"/>
    <parameter key="invert_selection" value="false"/>
    <parameter key="include_special_attributes" value="false"/>
    <parameter key="default" value="average"/>
    <list key="columns"/>
    </operator>
    <operator activated="true" class="normalize" compatibility="5.2.003" expanded="true" height="94" name="Normalize" width="90" x="581" y="255">
    <parameter key="return_preprocessing_model" value="false"/>
    <parameter key="create_view" value="false"/>
    <parameter key="attribute_filter_type" value="all"/>
    <parameter key="attribute" value=""/>
    <parameter key="attributes" value=""/>
    <parameter key="use_except_expression" value="false"/>
    <parameter key="value_type" value="numeric"/>
    <parameter key="use_value_type_exception" value="false"/>
    <parameter key="except_value_type" value="real"/>
    <parameter key="block_type" value="value_series"/>
    <parameter key="use_block_type_exception" value="false"/>
    <parameter key="except_block_type" value="value_series_end"/>
    <parameter key="invert_selection" value="false"/>
    <parameter key="include_special_attributes" value="false"/>
    <parameter key="method" value="range transformation"/>
    <parameter key="min" value="0.0"/>
    <parameter key="max" value="1.0"/>
    </operator>
    <operator activated="true" class="loop_parameters" compatibility="5.2.003" expanded="true" height="94" name="Loop Parameters" width="90" x="648" y="120">
    <list key="parameters">
    <parameter key="Select Subprocess.select_which" value="[1.0;4;4;linear]"/>
    </list>
    <parameter key="synchronize" value="false"/>
    <process expanded="true" height="627" width="499">
    <operator activated="true" class="select_subprocess" compatibility="5.2.003" expanded="true" height="94" name="Select Subprocess" width="90" x="112" y="165">
    <parameter key="select_which" value="4"/>
    <process expanded="true" height="627" width="224">
    <operator activated="true" class="k_means" compatibility="5.2.003" expanded="true" height="76" name="K-Means" width="90" x="45" y="165">
    <parameter key="add_cluster_attribute" value="true"/>
    <parameter key="add_as_label" value="false"/>
    <parameter key="remove_unlabeled" value="false"/>
    <parameter key="k" value="3"/>
    <parameter key="max_runs" value="10"/>
    <parameter key="determine_good_start_values" value="false"/>
    <parameter key="measure_types" value="MixedMeasures"/>
    <parameter key="mixed_measure" value="MixedEuclideanDistance"/>
    <parameter key="nominal_measure" value="NominalDistance"/>
    <parameter key="numerical_measure" value="EuclideanDistance"/>
    <parameter key="divergence" value="SquaredEuclideanDistance"/>
    <parameter key="kernel_type" value="radial"/>
    <parameter key="kernel_gamma" value="1.0"/>
    <parameter key="kernel_sigma1" value="1.0"/>
    <parameter key="kernel_sigma2" value="0.0"/>
    <parameter key="kernel_sigma3" value="2.0"/>
    <parameter key="kernel_degree" value="3.0"/>
    <parameter key="kernel_shift" value="1.0"/>
    <parameter key="kernel_a" value="1.0"/>
    <parameter key="kernel_b" value="0.0"/>
    <parameter key="max_optimization_steps" value="100"/>
    <parameter key="use_local_random_seed" value="true"/>
    <parameter key="local_random_seed" value="1992"/>
    </operator>
    <connect from_port="input 1" to_op="K-Means" to_port="example set"/>
    <connect from_op="K-Means" from_port="cluster model" to_port="output 1"/>
    <connect from_op="K-Means" from_port="clustered set" to_port="output 2"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    <portSpacing port="sink_output 3" spacing="0"/>
    </process>
    <process expanded="true" height="627" width="224">
    <operator activated="true" class="support_vector_clustering" compatibility="5.2.003" expanded="true" height="76" name="Support Vector Clustering" width="90" x="45" y="120">
    <parameter key="add_cluster_attribute" value="true"/>
    <parameter key="add_as_label" value="false"/>
    <parameter key="remove_unlabeled" value="false"/>
    <parameter key="min_pts" value="2"/>
    <parameter key="kernel_type" value="radial"/>
    <parameter key="kernel_gamma" value="1.0"/>
    <parameter key="kernel_degree" value="2"/>
    <parameter key="kernel_a" value="1.0"/>
    <parameter key="kernel_b" value="0.0"/>
    <parameter key="kernel_cache" value="200"/>
    <parameter key="convergence_epsilon" value="0.0010"/>
    <parameter key="max_iterations" value="100000"/>
    <parameter key="p" value="0.0"/>
    <parameter key="r" value="-1.0"/>
    <parameter key="number_sample_points" value="20"/>
    </operator>
    <connect from_port="input 1" to_op="Support Vector Clustering" to_port="example set"/>
    <connect from_op="Support Vector Clustering" from_port="cluster model" to_port="output 1"/>
    <connect from_op="Support Vector Clustering" from_port="clustered set" to_port="output 2"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    <portSpacing port="sink_output 3" spacing="0"/>
    </process>
    <process expanded="true" height="627" width="145">
    <operator activated="true" class="k_medoids" compatibility="5.2.003" expanded="true" height="76" name="K-Medoids" width="90" x="45" y="165">
    <parameter key="add_cluster_attribute" value="true"/>
    <parameter key="add_as_label" value="false"/>
    <parameter key="remove_unlabeled" value="false"/>
    <parameter key="k" value="3"/>
    <parameter key="max_runs" value="10"/>
    <parameter key="max_optimization_steps" value="100"/>
    <parameter key="use_local_random_seed" value="true"/>
    <parameter key="local_random_seed" value="1992"/>
    <parameter key="measure_types" value="MixedMeasures"/>
    <parameter key="mixed_measure" value="MixedEuclideanDistance"/>
    <parameter key="nominal_measure" value="NominalDistance"/>
    <parameter key="numerical_measure" value="EuclideanDistance"/>
    <parameter key="divergence" value="GeneralizedIDivergence"/>
    <parameter key="kernel_type" value="radial"/>
    <parameter key="kernel_gamma" value="1.0"/>
    <parameter key="kernel_sigma1" value="1.0"/>
    <parameter key="kernel_sigma2" value="0.0"/>
    <parameter key="kernel_sigma3" value="2.0"/>
    <parameter key="kernel_degree" value="3.0"/>
    <parameter key="kernel_shift" value="1.0"/>
    <parameter key="kernel_a" value="1.0"/>
    <parameter key="kernel_b" value="0.0"/>
    </operator>
    <connect from_port="input 1" to_op="K-Medoids" to_port="example set"/>
    <connect from_op="K-Medoids" from_port="cluster model" to_port="output 1"/>
    <connect from_op="K-Medoids" from_port="clustered set" to_port="output 2"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    <portSpacing port="sink_output 3" spacing="0"/>
    </process>
    <process expanded="true" height="376" width="158">
    <operator activated="true" class="dbscan" compatibility="5.2.003" expanded="true" height="76" name="DBScan" width="90" x="26" y="150">
    <parameter key="epsilon" value="1.0"/>
    <parameter key="min_points" value="5"/>
    <parameter key="add_cluster_attribute" value="true"/>
    <parameter key="add_as_label" value="false"/>
    <parameter key="remove_unlabeled" value="false"/>
    <parameter key="measure_types" value="MixedMeasures"/>
    <parameter key="mixed_measure" value="MixedEuclideanDistance"/>
    <parameter key="nominal_measure" value="NominalDistance"/>
    <parameter key="numerical_measure" value="EuclideanDistance"/>
    <parameter key="divergence" value="GeneralizedIDivergence"/>
    <parameter key="kernel_type" value="radial"/>
    <parameter key="kernel_gamma" value="1.0"/>
    <parameter key="kernel_sigma1" value="1.0"/>
    <parameter key="kernel_sigma2" value="0.0"/>
    <parameter key="kernel_sigma3" value="2.0"/>
    <parameter key="kernel_degree" value="3.0"/>
    <parameter key="kernel_shift" value="1.0"/>
    <parameter key="kernel_a" value="1.0"/>
    <parameter key="kernel_b" value="0.0"/>
    </operator>
    <connect from_port="input 1" to_op="DBScan" to_port="example set"/>
    <connect from_op="DBScan" from_port="cluster model" to_port="output 1"/>
    <connect from_op="DBScan" from_port="clustered set" to_port="output 2"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    <portSpacing port="sink_output 3" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="whibo:gc_validation" compatibility="0.9.001" expanded="true" height="76" name="Cluster internal validation" width="90" x="380" y="255">
    <parameter key="Distance_Measure" value="rs.fon.whibo.GC.component.DistanceMeasure.Euclidian"/>
    <parameter key="Intra_Cluster_Distance" value="false"/>
    <parameter key="Connectivity" value="false"/>
    <parameter key="NN_Connectivity" value="2"/>
    <parameter key="Global_Silhouette_Index" value="true"/>
    <parameter key="Min_Max_Cut" value="false"/>
    <parameter key="XB_Index" value="false"/>
    <parameter key="DaviesBouldin" value="false"/>
    </operator>
    <connect from_port="input 1" to_op="Select Subprocess" to_port="input 1"/>
    <connect from_op="Select Subprocess" from_port="output 1" to_op="Cluster internal validation" to_port="cluster model"/>
    <connect from_op="Select Subprocess" from_port="output 2" to_op="Cluster internal validation" to_port="training set"/>
    <connect from_op="Cluster internal validation" from_port="performances" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_performance" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <portSpacing port="sink_result 3" spacing="0"/>
    </process>
    </operator>
    <connect from_op="Read AML" from_port="output" to_op="Set Role" to_port="example set input"/>
    <connect from_op="Set Role" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
    <connect from_op="Select Attributes" from_port="example set output" to_op="Replace Missing Values" to_port="example set input"/>
    <connect from_op="Replace Missing Values" from_port="example set output" to_op="Normalize" to_port="example set input"/>
    <connect from_op="Normalize" from_port="example set output" to_op="Loop Parameters" to_port="input 1"/>
    <connect from_op="Loop Parameters" from_port="result 1" to_port="result 1"/>
    <connect from_op="Loop Parameters" from_port="result 2" to_port="result 2"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <portSpacing port="sink_result 3" spacing="0"/>
    </process>
    </operator>
    </process>

  • sgenzer
    sgenzer
    Altair Employee

    Aha. No wonder I do not recognize that operator. It's part of the WhiBo extension (https://marketplace.rapidminer.com/UpdateServer/faces/product_details.xhtml?productId=rmx_whibo), which I have never used before. You can see the reference in the XML if you look closely :)

     

    <operator activated="true" class="whibo:gc_validation" compatibility="0.9.001" expanded="true" height="76" name="Cluster internal validation" width="90" x="380" y="255">

    So 1) I highly recommend you upgrade from RapidMiner 5.2 (holy cow that's an old version) to 7.6.1 (most current version), and 2) download the WhiBo extension from the marketplace (you can do this by going to "Extensions" in the menu bar of RM Studio).
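
The `whibo:` prefix in the operator's `class` attribute above is what points to an extension. As a rough sketch, assuming (based only on the process in this thread) that core operators carry no prefix and extension operators use `prefix:name`, a few lines of Python can list such operators in a pasted process. The function name `extension_operators` and the shortened sample XML are made up for illustration.

```python
import xml.etree.ElementTree as ET


def extension_operators(process_xml):
    """Return (prefix, operator name) pairs for operators whose class looks like 'prefix:name'."""
    root = ET.fromstring(process_xml)
    found = []
    for op in root.iter("operator"):
        cls = op.get("class", "")
        if ":" in cls:
            prefix = cls.split(":", 1)[0]
            found.append((prefix, op.get("name")))
    return found


# Shortened stand-in for the process posted in this thread.
sample = """<process version="5.2.003">
  <operator activated="true" class="process" name="Process">
    <process expanded="true">
      <operator activated="true" class="k_means" name="K-Means"/>
      <operator activated="true" class="whibo:gc_validation" name="Cluster internal validation"/>
    </process>
  </operator>
</process>"""

print(extension_operators(sample))
# [('whibo', 'Cluster internal validation')]
```

For a process saved to disk, `ET.parse(path).getroot()` could replace the `ET.fromstring` call.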

     

    Good luck.


    Scott

     

     

  • halaalrobassy
    halaalrobassy New Altair Community Member
    I downloaded WhiBo but it didn't solve the problem. I couldn't find the Silhouette index parameter that existed in Cluster internal validation, and the design space of the WhiBo GDT evolutionary search didn't appear either. I downloaded the WhiBo extension, updated it, and approved the licence, but they still didn't appear.
  • Telcontar120
    Telcontar120 New Altair Community Member
    I don't think that extension has been updated in a very long time.
    You would probably have to do this via a Python or R script now.
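
As a rough idea of what such a script might look like, here is a minimal pure-Python sketch of a silhouette index. It assumes numeric attributes, Euclidean distance, and the mean silhouette width over all examples; WhiBo's "Global Silhouette Index" may be defined differently, so the numbers need not match the old operator. The names `silhouette_index`, `points`, and `labels` are illustrative only.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_index(points, labels):
    """Mean silhouette width over all points, using Euclidean distance."""
    clusters = defaultdict(list)
    for idx, label in enumerate(labels):
        clusters[label].append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(euclidean(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy data: two tight, well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
good = silhouette_index(points, [0, 0, 1, 1])
bad = silhouette_index(points, [0, 1, 0, 1])
print(good, bad)
```

Values close to 1 suggest compact, well-separated clusters, while values near or below 0 suggest a poor assignment. In this toy data the first labelling scores much higher than the second.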
  • halaalrobassy
    halaalrobassy New Altair Community Member
    I downloaded the WhiBo extension, but the WhiBo design space doesn't work. What is the problem?
  • sgenzer
    sgenzer
    Altair Employee
    @halaalrobassy please see note by @Telcontar120. It is still true.