K Nearest Neighbour for reducing the colour palette used in ML

robin
robin New Altair Community Member
edited November 5 in Community Q&A
We are using ML to create images, and we want to apply ML to those images to determine which colours certain customer clusters respond to.

The problem is that there are about 16 million possible colours (256*256*256 = 16,777,216) when we apply OCR to the final image.

Colours are encoded as hex codes such as #000000 (black) or #ffffff (white). But RM does not know that the RM company logo has primarily four colours: #f5e44c, #e26937, #7a7d82 and #32353d. It does not even understand the distances between these colours, or that #f5e44c and #e26937 would be nearest neighbours. It would not know to recommend #ce6033 on the grounds that it is so close to #e26937.
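
To make the distance idea concrete, here is a minimal Python sketch (outside RapidMiner) that splits each hex code into its R, G and B channels and compares colours by Euclidean distance. The helper names `hex_to_rgb`, `distance` and `nearest` are made up for this illustration, and plain RGB distance is only a rough proxy for how similar two colours look.

```python
import math

def hex_to_rgb(code):
    """Split a hex code such as '#e26937' into an (R, G, B) tuple of ints."""
    code = code.lstrip("#")
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))

def distance(a, b):
    """Euclidean distance between two hex colours in RGB space."""
    return math.dist(hex_to_rgb(a), hex_to_rgb(b))

def nearest(target, palette):
    """Return the palette colour with the smallest RGB distance to target."""
    return min(palette, key=lambda c: distance(target, c))

# The four logo colours quoted in the question.
logo = ["#f5e44c", "#e26937", "#7a7d82", "#32353d"]

print(nearest("#ce6033", logo))                   # '#e26937'
print(round(distance("#ce6033", "#e26937"), 1))   # 22.3
```

The same `nearest` lookup could later map any observed colour onto a reduced palette once that palette has been chosen.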

The colour codes are calculated as R*65536 + G*256 + B, written in hexadecimal:

White RGB code = 255*65536 + 255*256 + 255 = #FFFFFF
Blue RGB code = 0*65536 + 0*256 + 255 = #0000FF
Red RGB code = 255*65536 + 0*256 + 0 = #FF0000
Green RGB code = 0*65536 + 255*256 + 0 = #00FF00
Gray RGB code = 128*65536 + 128*256 + 128 = #808080
Yellow RGB code = 255*65536 + 255*256 + 0 = #FFFF00
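
The arithmetic above can be sketched in Python as follows. The function names are hypothetical and chosen only for this example. The last line also hints at why the single packed number is a poor similarity measure: two very different colours can have codes that differ by just 1.

```python
def rgb_to_code(r, g, b):
    """Pack three 0-255 channels into one integer: R*65536 + G*256 + B."""
    return r * 65536 + g * 256 + b

def code_to_hex(code):
    """Format the packed integer as a six-digit hex colour code."""
    return f"#{code:06X}"

def code_to_rgb(code):
    """Unpack the integer back into its (R, G, B) channels."""
    return (code >> 16) & 0xFF, (code >> 8) & 0xFF, code & 0xFF

print(code_to_hex(rgb_to_code(128, 128, 128)))  # '#808080' (gray)
print(code_to_rgb(0xFFFF00))                    # (255, 255, 0) (yellow)

# Pure blue (0, 0, 255) and a near-black green (0, 1, 0) are adjacent codes.
print(rgb_to_code(0, 1, 0) - rgb_to_code(0, 0, 255))  # 1
```

This is one reason to cluster on the separate R, G and B values instead of on the packed code.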

Has anyone used RM to reduce the number of colours used in ML by applying K Nearest Neighbour? I want to reduce that 16 million down to a much more usable number of around 117 colours. 

Answers

  • YYH
    YYH
    Altair Employee
    Hi @robin,

    Thanks for sharing this interesting use case. 

    Grouping 16 million color codes into 117 clusters could be solved with k-means clustering. The question is how best to represent the colors quantitatively: RGB, CMYK, or something else.
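
As one illustration of an alternative to RGB, Python's standard `colorsys` module can convert a colour to HSV (hue, saturation, value). This is only a sketch of a different representation, not something proposed in the thread, and the helper name `hex_to_hsv` is hypothetical.

```python
import colorsys

def hex_to_hsv(code):
    """Convert a hex colour code to an (h, s, v) tuple, each in the range 0-1."""
    code = code.lstrip("#")
    r, g, b = (int(code[i:i + 2], 16) / 255 for i in (0, 2, 4))
    return colorsys.rgb_to_hsv(r, g, b)

# RapidMiner orange and the nearby colour from the question.
for code in ("#e26937", "#ce6033"):
    h, s, v = hex_to_hsv(code)
    print(code, round(h * 360, 1), round(s, 2), round(v, 2))
```

In this representation the two oranges have almost the same hue and differ mainly in brightness, which may or may not be the notion of similarity the clustering should use.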

    For instance, I simulated all 16 million (256*256*256) colors as RGB codes. rgb(206, 96, 51) represents #ce6033, and rgb(226, 105, 55) represents RapidMiner orange (#e26937). We could use Euclidean distance as the measure in a k-means clustering model with k=117.
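
For readers who want to see the idea outside RapidMiner, here is a minimal pure-Python sketch of k-means with Euclidean distance on RGB points. It is not the RapidMiner operator: it uses plain random initialisation, a coarse 8*8*8 grid instead of the full 256*256*256 cube, and k=8 instead of 117 so that it runs in well under a second. The function name `kmeans` and all of its parameters are assumptions made for this example.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm using squared Euclidean distance."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random starting centroids
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[idx].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                n = len(members)
                new_centroids.append(tuple(sum(ch) / n for ch in zip(*members)))
            else:
                new_centroids.append(old)  # leave an empty cluster where it was
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids

# Coarse sample of the RGB cube: every 32nd level per channel -> 512 colours.
grid = [(r, g, b)
        for r in range(0, 256, 32)
        for g in range(0, 256, 32)
        for b in range(0, 256, 32)]

palette = kmeans(grid, k=8)
for r, g, b in palette:
    print(f"#{round(r):02x}{round(g):02x}{round(b):02x}")
```

Each centroid is one colour of the reduced palette. Running the same loop on the full cube with k=117 is the same idea, but a pure-Python loop would be very slow at that size.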



    YY

  • YYH
    YYH
    Altair Employee
    Answer ✓
    Sample process to get color codes and run k-means on that:

    <?xml version="1.0" encoding="UTF-8"?><process version="9.2.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.2.000" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="9.2.000" expanded="true" height="68" name="Retrieve RGB data" width="90" x="179" y="34">
            <parameter key="repository_entry" value="RGB data"/>
          </operator>
          <operator activated="false" class="radoop:spark_kmeans" compatibility="9.1.000" expanded="true" height="82" name="K-Means" width="90" x="380" y="187">
            <parameter key="use_local_random_seed" value="false"/>
            <parameter key="local_random_seed" value="1992"/>
            <parameter key="number_of_clusters" value="2"/>
            <parameter key="maximum_iterations" value="20"/>
            <parameter key="initialization_mode" value="k-means||"/>
            <parameter key="parallel_runs" value="1"/>
            <parameter key="epsilon" value="1.0E-4"/>
            <parameter key="file_format" value="TEXTFILE"/>
          </operator>
          <operator activated="true" class="concurrency:k_means" compatibility="9.2.000" expanded="true" height="82" name="Clustering (2)" width="90" x="380" y="34">
            <parameter key="add_cluster_attribute" value="true"/>
            <parameter key="add_as_label" value="true"/>
            <parameter key="remove_unlabeled" value="false"/>
            <parameter key="k" value="117"/>
            <parameter key="max_runs" value="10"/>
            <parameter key="determine_good_start_values" value="true"/>
            <parameter key="measure_types" value="NumericalMeasures"/>
            <parameter key="mixed_measure" value="MixedEuclideanDistance"/>
            <parameter key="nominal_measure" value="NominalDistance"/>
            <parameter key="numerical_measure" value="EuclideanDistance"/>
            <parameter key="divergence" value="SquaredEuclideanDistance"/>
            <parameter key="kernel_type" value="radial"/>
            <parameter key="kernel_gamma" value="1.0"/>
            <parameter key="kernel_sigma1" value="1.0"/>
            <parameter key="kernel_sigma2" value="0.0"/>
            <parameter key="kernel_sigma3" value="2.0"/>
            <parameter key="kernel_degree" value="3.0"/>
            <parameter key="kernel_shift" value="1.0"/>
            <parameter key="kernel_a" value="1.0"/>
            <parameter key="kernel_b" value="0.0"/>
            <parameter key="max_optimization_steps" value="100"/>
            <parameter key="use_local_random_seed" value="false"/>
            <parameter key="local_random_seed" value="1992"/>
          </operator>
          <connect from_op="Retrieve RGB data" from_port="output" to_op="Clustering (2)" to_port="example set"/>
          <connect from_op="Clustering (2)" from_port="cluster model" to_port="result 1"/>
          <connect from_op="Clustering (2)" from_port="clustered set" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
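
    Process that generates the 256*256*256 RGB combinations and stores them as "RGB data" (the entry retrieved above):

    <?xml version="1.0" encoding="UTF-8"?><process version="9.2.000">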
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.2.000" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="utility:create_exampleset" compatibility="9.2.000" expanded="true" height="68" name="Create ExampleSet" width="90" x="45" y="34">
            <parameter key="generator_type" value="numeric series"/>
            <parameter key="number_of_examples" value="256"/>
            <parameter key="use_stepsize" value="false"/>
            <list key="function_descriptions"/>
            <parameter key="add_id_attribute" value="false"/>
            <list key="numeric_series_configuration">
              <parameter key="r" value="linear.0\.0.255\.0"/>
            </list>
            <list key="date_series_configuration"/>
            <list key="date_series_configuration (interval)"/>
            <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
            <parameter key="time_zone" value="SYSTEM"/>
            <parameter key="column_separator" value=","/>
            <parameter key="parse_all_as_nominal" value="false"/>
            <parameter key="decimal_point_character" value="."/>
            <parameter key="trim_attribute_names" value="true"/>
          </operator>
          <operator activated="true" class="numerical_to_polynominal" compatibility="9.2.000" expanded="true" height="82" name="Numerical to Polynominal" width="90" x="179" y="34">
            <parameter key="attribute_filter_type" value="all"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="numeric"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="real"/>
            <parameter key="block_type" value="value_series"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_series_end"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
          </operator>
          <operator activated="true" class="concurrency:loop_values" compatibility="9.2.000" expanded="true" height="82" name="Loop Values" width="90" x="380" y="34">
            <parameter key="attribute" value="r"/>
            <parameter key="iteration_macro" value="Rvalue"/>
            <parameter key="reuse_results" value="false"/>
            <parameter key="enable_parallel_execution" value="true"/>
            <process expanded="true">
              <operator activated="true" class="utility:create_exampleset" compatibility="9.2.000" expanded="true" height="68" name="Create ExampleSet (2)" width="90" x="45" y="187">
                <parameter key="generator_type" value="numeric series"/>
                <parameter key="number_of_examples" value="256"/>
                <parameter key="use_stepsize" value="false"/>
                <list key="function_descriptions"/>
                <parameter key="add_id_attribute" value="false"/>
                <list key="numeric_series_configuration">
                  <parameter key="g" value="linear.0\.0.255\.0"/>
                </list>
                <list key="date_series_configuration"/>
                <list key="date_series_configuration (interval)"/>
                <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
                <parameter key="time_zone" value="SYSTEM"/>
                <parameter key="column_separator" value=","/>
                <parameter key="parse_all_as_nominal" value="false"/>
                <parameter key="decimal_point_character" value="."/>
                <parameter key="trim_attribute_names" value="true"/>
              </operator>
              <operator activated="true" class="numerical_to_polynominal" compatibility="9.2.000" expanded="true" height="82" name="Numerical to Polynominal (2)" width="90" x="179" y="187">
                <parameter key="attribute_filter_type" value="all"/>
                <parameter key="attribute" value=""/>
                <parameter key="attributes" value=""/>
                <parameter key="use_except_expression" value="false"/>
                <parameter key="value_type" value="numeric"/>
                <parameter key="use_value_type_exception" value="false"/>
                <parameter key="except_value_type" value="real"/>
                <parameter key="block_type" value="value_series"/>
                <parameter key="use_block_type_exception" value="false"/>
                <parameter key="except_block_type" value="value_series_end"/>
                <parameter key="invert_selection" value="false"/>
                <parameter key="include_special_attributes" value="false"/>
              </operator>
              <operator activated="true" class="concurrency:loop_values" compatibility="9.2.000" expanded="true" height="82" name="Loop Values (2)" width="90" x="313" y="187">
                <parameter key="attribute" value="g"/>
                <parameter key="iteration_macro" value="Gvalue"/>
                <parameter key="reuse_results" value="false"/>
                <parameter key="enable_parallel_execution" value="false"/>
                <process expanded="true">
                  <operator activated="true" class="utility:create_exampleset" compatibility="9.2.000" expanded="true" height="68" name="Create ExampleSet (3)" width="90" x="112" y="136">
                    <parameter key="generator_type" value="numeric series"/>
                    <parameter key="number_of_examples" value="256"/>
                    <parameter key="use_stepsize" value="false"/>
                    <list key="function_descriptions"/>
                    <parameter key="add_id_attribute" value="false"/>
                    <list key="numeric_series_configuration">
                      <parameter key="b" value="linear.0\.0.255\.0"/>
                    </list>
                    <list key="date_series_configuration"/>
                    <list key="date_series_configuration (interval)"/>
                    <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
                    <parameter key="time_zone" value="SYSTEM"/>
                    <parameter key="column_separator" value=","/>
                    <parameter key="parse_all_as_nominal" value="false"/>
                    <parameter key="decimal_point_character" value="."/>
                    <parameter key="trim_attribute_names" value="true"/>
                  </operator>
                  <operator activated="true" class="numerical_to_polynominal" compatibility="9.2.000" expanded="true" height="82" name="Numerical to Polynominal (3)" width="90" x="246" y="136">
                    <parameter key="attribute_filter_type" value="all"/>
                    <parameter key="attribute" value=""/>
                    <parameter key="attributes" value=""/>
                    <parameter key="use_except_expression" value="false"/>
                    <parameter key="value_type" value="numeric"/>
                    <parameter key="use_value_type_exception" value="false"/>
                    <parameter key="except_value_type" value="real"/>
                    <parameter key="block_type" value="value_series"/>
                    <parameter key="use_block_type_exception" value="false"/>
                    <parameter key="except_block_type" value="value_series_end"/>
                    <parameter key="invert_selection" value="false"/>
                    <parameter key="include_special_attributes" value="false"/>
                  </operator>
                  <operator activated="true" class="concurrency:loop_values" compatibility="9.2.000" expanded="true" height="82" name="Loop Values (3)" width="90" x="380" y="136">
                    <parameter key="attribute" value="b"/>
                    <parameter key="iteration_macro" value="Bvalue"/>
                    <parameter key="reuse_results" value="false"/>
                    <parameter key="enable_parallel_execution" value="false"/>
                    <process expanded="true">
                      <operator activated="true" class="utility:create_exampleset" compatibility="9.2.000" expanded="true" height="68" name="Create ExampleSet (4)" width="90" x="112" y="85">
                        <parameter key="generator_type" value="comma separated text"/>
                        <parameter key="number_of_examples" value="100"/>
                        <parameter key="use_stepsize" value="false"/>
                        <list key="function_descriptions"/>
                        <parameter key="add_id_attribute" value="false"/>
                        <list key="numeric_series_configuration"/>
                        <list key="date_series_configuration"/>
                        <list key="date_series_configuration (interval)"/>
                        <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
                        <parameter key="time_zone" value="SYSTEM"/>
                        <parameter key="input_csv_text" value="R, G, B&#10;%{Rvalue}, %{Gvalue}, %{Bvalue}"/>
                        <parameter key="column_separator" value=","/>
                        <parameter key="parse_all_as_nominal" value="false"/>
                        <parameter key="decimal_point_character" value="."/>
                        <parameter key="trim_attribute_names" value="true"/>
                      </operator>
                      <connect from_op="Create ExampleSet (4)" from_port="output" to_port="output 1"/>
                      <portSpacing port="source_input 1" spacing="0"/>
                      <portSpacing port="source_input 2" spacing="0"/>
                      <portSpacing port="sink_output 1" spacing="0"/>
                      <portSpacing port="sink_output 2" spacing="0"/>
                    </process>
                  </operator>
                  <operator activated="true" class="append" compatibility="9.2.000" expanded="true" height="82" name="Append" width="90" x="514" y="136">
                    <parameter key="datamanagement" value="double_array"/>
                    <parameter key="data_management" value="auto"/>
                    <parameter key="merge_type" value="all"/>
                  </operator>
                  <connect from_op="Create ExampleSet (3)" from_port="output" to_op="Numerical to Polynominal (3)" to_port="example set input"/>
                  <connect from_op="Numerical to Polynominal (3)" from_port="example set output" to_op="Loop Values (3)" to_port="input 1"/>
                  <connect from_op="Loop Values (3)" from_port="output 1" to_op="Append" to_port="example set 1"/>
                  <connect from_op="Append" from_port="merged set" to_port="output 1"/>
                  <portSpacing port="source_input 1" spacing="0"/>
                  <portSpacing port="source_input 2" spacing="0"/>
                  <portSpacing port="sink_output 1" spacing="0"/>
                  <portSpacing port="sink_output 2" spacing="0"/>
                </process>
              </operator>
              <operator activated="true" class="append" compatibility="9.2.000" expanded="true" height="82" name="Append (2)" width="90" x="447" y="187">
                <parameter key="datamanagement" value="double_array"/>
                <parameter key="data_management" value="auto"/>
                <parameter key="merge_type" value="all"/>
              </operator>
              <connect from_op="Create ExampleSet (2)" from_port="output" to_op="Numerical to Polynominal (2)" to_port="example set input"/>
              <connect from_op="Numerical to Polynominal (2)" from_port="example set output" to_op="Loop Values (2)" to_port="input 1"/>
              <connect from_op="Loop Values (2)" from_port="output 1" to_op="Append (2)" to_port="example set 1"/>
              <connect from_op="Append (2)" from_port="merged set" to_port="output 1"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="source_input 2" spacing="0"/>
              <portSpacing port="sink_output 1" spacing="0"/>
              <portSpacing port="sink_output 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="append" compatibility="9.2.000" expanded="true" height="82" name="Append (3)" width="90" x="581" y="34">
            <parameter key="datamanagement" value="double_array"/>
            <parameter key="data_management" value="auto"/>
            <parameter key="merge_type" value="all"/>
          </operator>
          <operator activated="true" class="store" compatibility="9.2.000" expanded="true" height="68" name="Store" width="90" x="715" y="34">
            <parameter key="repository_entry" value="RGB data"/>
          </operator>
          <connect from_op="Create ExampleSet" from_port="output" to_op="Numerical to Polynominal" to_port="example set input"/>
          <connect from_op="Numerical to Polynominal" from_port="example set output" to_op="Loop Values" to_port="input 1"/>
          <connect from_op="Loop Values" from_port="output 1" to_op="Append (3)" to_port="example set 1"/>
          <connect from_op="Append (3)" from_port="merged set" to_op="Store" to_port="input"/>
          <connect from_op="Store" from_port="through" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>

    K-means applied to all 16 million rows would take a while...
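    For anyone who wants to try the idea outside RM first, here is a minimal Python sketch of the same two steps: build the R, G, B grid, then run k-means on it. It is only an illustration under assumptions, not a translation of the process above. The `step` parameter (subsampling each channel so the run finishes quickly) and the helper names `rgb_grid` and `kmeans` are hypothetical; `k=24` and the seed `2001` are borrowed from the processes in this thread.

```python
import itertools
import random


def rgb_grid(step=1):
    """Return an iterator of (r, g, b) tuples, 0..255 per channel.

    step=1 gives the full 256**3 = 16,777,216 combinations;
    a larger step subsamples each channel (an assumption made here for speed).
    """
    levels = range(0, 256, step)
    return itertools.product(levels, repeat=3)


def squared_distance(p, q):
    """Squared Euclidean distance between two colours."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20, seed=2001):
    """Plain Lloyd's k-means; returns the k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                # Centroid = per-channel mean of the cluster members.
                new_centroids.append(tuple(sum(ch) / len(members) for ch in zip(*members)))
            else:
                # Keep the old centroid if a cluster ends up empty.
                new_centroids.append(centroids[i])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids


# step=51 gives 6 levels per channel, so 6**3 = 216 colours instead of 256**3.
sample = list(rgb_grid(step=51))
palette = kmeans(sample, k=24)
print(len(sample), len(palette))
```

    Clustering a subsample, or only the colours that actually occur in your images, is one way to avoid the long run time on the full grid.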


  • robin
    robin New Altair Community Member
    Thank you, that was a lot easier than I anticipated. I have included the conversion from hex to RGB below for anyone who runs into this in the future:

    <?xml version="1.0" encoding="UTF-8"?><process version="9.2.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.2.000" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="9.2.000" expanded="true" height="68" name="Retrieve" width="90" x="313" y="136"/>
          <operator activated="true" class="generate_attributes" compatibility="9.2.000" expanded="true" height="82" name="Generate Attributes (5)" width="90" x="447" y="136">
            <list key="function_descriptions">
              <parameter key="red_1" value="cut(hex,1,1)"/>
              <parameter key="red_2" value="cut(hex,2,1)"/>
              <parameter key="green_1" value="cut(hex,3,1)"/>
              <parameter key="green_2" value="cut(hex,4,1)"/>
              <parameter key="blue_1" value="cut(hex,5,1)"/>
              <parameter key="blue_2" value="cut(hex,6,1)"/>
            </list>
            <parameter key="keep_all" value="true"/>
          </operator>
          <operator activated="true" class="map" compatibility="9.2.000" expanded="true" height="82" name="Map (4)" width="90" x="581" y="136">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value="blue_1|blue_2|green_1|green_2|red_1|red_2"/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
            <list key="value_mappings">
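              <parameter key="a" value="10"/>
              <parameter key="b" value="11"/>
              <parameter key="c" value="12"/>
              <parameter key="d" value="13"/>
              <parameter key="e" value="14"/>
              <parameter key="f" value="15"/>
              <parameter key="A" value="10"/>
              <parameter key="B" value="11"/>
              <parameter key="C" value="12"/>
              <parameter key="D" value="13"/>
              <parameter key="E" value="14"/>
              <parameter key="F" value="15"/>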
            </list>
            <parameter key="consider_regular_expressions" value="false"/>
            <parameter key="add_default_mapping" value="false"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="9.2.000" expanded="true" height="82" name="Generate Attributes (6)" width="90" x="715" y="136">
            <list key="function_descriptions">
              <parameter key="red" value="parse(red_1) * 16 + parse(red_2)"/>
              <parameter key="green" value="parse(green_1) * 16 + parse(green_2)"/>
              <parameter key="blue" value="parse(blue_1) * 16 + parse(blue_2)"/>
            </list>
            <parameter key="keep_all" value="false"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="9.2.000" expanded="true" height="82" name="Select Attributes (6)" width="90" x="849" y="136">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attribute" value="blue"/>
            <parameter key="attributes" value="red|green|blue"/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
          </operator>
          <operator activated="true" class="concurrency:k_means" compatibility="9.2.000" expanded="true" height="82" name="Clustering (2)" width="90" x="983" y="136">
            <parameter key="add_cluster_attribute" value="true"/>
            <parameter key="add_as_label" value="true"/>
            <parameter key="remove_unlabeled" value="false"/>
            <parameter key="k" value="24"/>
            <parameter key="max_runs" value="10"/>
            <parameter key="determine_good_start_values" value="true"/>
            <parameter key="measure_types" value="NumericalMeasures"/>
            <parameter key="mixed_measure" value="MixedEuclideanDistance"/>
            <parameter key="nominal_measure" value="NominalDistance"/>
            <parameter key="numerical_measure" value="EuclideanDistance"/>
            <parameter key="divergence" value="SquaredEuclideanDistance"/>
            <parameter key="kernel_type" value="radial"/>
            <parameter key="kernel_gamma" value="1.0"/>
            <parameter key="kernel_sigma1" value="1.0"/>
            <parameter key="kernel_sigma2" value="0.0"/>
            <parameter key="kernel_sigma3" value="2.0"/>
            <parameter key="kernel_degree" value="3.0"/>
            <parameter key="kernel_shift" value="1.0"/>
            <parameter key="kernel_a" value="1.0"/>
            <parameter key="kernel_b" value="0.0"/>
            <parameter key="max_optimization_steps" value="100"/>
            <parameter key="use_local_random_seed" value="false"/>
            <parameter key="local_random_seed" value="1992"/>
          </operator>
          <connect from_op="Retrieve" from_port="output" to_op="Generate Attributes (5)" to_port="example set input"/>
          <connect from_op="Generate Attributes (5)" from_port="example set output" to_op="Map (4)" to_port="example set input"/>
          <connect from_op="Map (4)" from_port="example set output" to_op="Generate Attributes (6)" to_port="example set input"/>
          <connect from_op="Generate Attributes (6)" from_port="example set output" to_op="Select Attributes (6)" to_port="example set input"/>
          <connect from_op="Select Attributes (6)" from_port="example set output" to_op="Clustering (2)" to_port="example set"/>
          <connect from_op="Clustering (2)" from_port="cluster model" to_port="result 1"/>
          <connect from_op="Clustering (2)" from_port="clustered set" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
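
    The same hex-to-RGB step can be sketched in a few lines of Python for a quick sanity check. This is an illustration, not part of the RM process: the function names `hex_to_rgb`, `squared_distance` and `nearest_colour` are made up for this example, and plain Euclidean distance in RGB is assumed as the measure of closeness (it matches the `EuclideanDistance` setting in the clustering operator). The colour codes are the ones from the original question.

```python
def hex_to_rgb(code):
    """Convert '#rrggbb' (upper or lower case) to an (r, g, b) tuple of ints."""
    digits = code.lstrip("#")
    if len(digits) != 6:
        raise ValueError(f"expected six hex digits, got {code!r}")
    # int(..., 16) does the same job as mapping a-f to 10-15 and then
    # computing high_digit * 16 + low_digit for each pair of digits.
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


def squared_distance(c1, c2):
    """Squared Euclidean distance between two RGB tuples."""
    return sum((a - b) ** 2 for a, b in zip(c1, c2))


def nearest_colour(code, palette):
    """Return the palette entry (hex string) closest to the given hex code."""
    target = hex_to_rgb(code)
    return min(palette, key=lambda p: squared_distance(target, hex_to_rgb(p)))


logo = ["#f5e44c", "#e26937", "#7a7d82", "#32353d"]
print(hex_to_rgb("#e26937"))            # (226, 105, 55)
print(nearest_colour("#ce6033", logo))  # #e26937
```

    Once the colours are numeric R, G, B values, "closeness" becomes an ordinary distance, which is what lets k-means group them.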

  • YYH
    YYH
    Altair Employee
    Awesome. Thank you @robin for sharing the mapping process!
  • robin
    robin New Altair Community Member
    My pleasure, and thank you to @sgenzer for showing me how to post code on the new forum.
  • sgenzer
    sgenzer
    Altair Employee
    Note to self: when @yyhuang says that something will "take a while", she's not kidding.