How to divide ZIP Code into a cluster analysis?

a_trunk New Altair Community Member
edited November 5 in Community Q&A

Hello,

Sorry for the simple question, but I have not worked with RapidMiner for long and I need it for my studies. I have a simple case that I cannot solve properly: I have a dataset of 100,000 ZIP codes and customer numbers, and I want to analyse the best-selling areas in my country, so I decided to use cluster analysis. ZIP codes in Germany run from 00001 to 99999, and I want to build clusters by range, for example 00001 to 00500 and 70000 to 75000.

My question: how can I tell RapidMiner to build the clusters by these ranges?

 

Many thanks for your help.

 

Answers

  • lionelderkrikor
    lionelderkrikor New Altair Community Member

    Hi @a_trunk

     

    You can try using the Split Data operator to create partitions of your data, as in this process:

    <?xml version="1.0" encoding="UTF-8"?><process version="8.2.000">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="8.2.000" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="operator_toolbox:create_exampleset" compatibility="1.1.000" expanded="true" height="68" name="Create ExampleSet" width="90" x="112" y="34">
    <parameter key="generator_type" value="comma_separated_text"/>
    <parameter key="use_stepsize" value="true"/>
    <list key="function_descriptions"/>
    <list key="numeric_series_configuration">
    <parameter key="zip_code" value="linear.0\.0.1\.0"/>
    </list>
    <list key="date_series_configuration"/>
    <list key="date_series_configuration (interval)"/>
    <parameter key="input_csv_text" value="Id,att1&#10;1,&quot;0001&quot;&#10;2,&quot;0002&quot;&#10;3,&quot;0003&quot;&#10;4,&quot;0004&quot;&#10;5,&quot;0005&quot;&#10;6,&quot;0006&quot;&#10;7,&quot;0007&quot;&#10;8,&quot;0008&quot;&#10;9,&quot;0009&quot;&#10;10,&quot;0010&quot;&#10;11,&quot;0011&quot;&#10;12,&quot;0012&quot;&#10;13,&quot;0013&quot;&#10;14,&quot;0014&quot;&#10;15,&quot;0015&quot;&#10;16,&quot;0016&quot;&#10;17,&quot;0017&quot;&#10;18,&quot;0018&quot;&#10;19,&quot;0019&quot;&#10;20,&quot;0020&quot;&#10;"/>
    </operator>
    <operator activated="true" class="split_data" compatibility="8.2.000" expanded="true" height="124" name="Split Data" width="90" x="514" y="34">
    <enumeration key="partitions">
    <parameter key="ratio" value="0.1"/>
    <parameter key="ratio" value="0.1"/>
    <parameter key="ratio" value="0.8"/>
    </enumeration>
    <parameter key="sampling_type" value="linear sampling"/>
    </operator>
    <connect from_op="Create ExampleSet" from_port="output" to_op="Split Data" to_port="example set"/>
    <connect from_op="Split Data" from_port="partition 1" to_port="result 1"/>
    <connect from_op="Split Data" from_port="partition 2" to_port="result 2"/>
    <connect from_op="Split Data" from_port="partition 3" to_port="result 3"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <portSpacing port="sink_result 3" spacing="0"/>
    <portSpacing port="sink_result 4" spacing="0"/>
    </process>
    </operator>
    </process>

    I hope it helps,

     

    Regards,

     

    Lionel

  • Telcontar120
    Telcontar120 New Altair Community Member

    You might also want to create a new attribute (using Generate Attributes) that corresponds to higher-level groupings of postal codes. Using the prefix function, you can create aggregated groups at the 1-digit level, the 2-digit level, and so on. These can then be given to the clustering algorithm instead of the raw ZIP code. The problem with the raw ZIP code is that RapidMiner has no idea it encodes a hierarchical relationship; it just interprets it as a set of distinct nominal values.
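
    As a rough sketch of that idea, a Generate Attributes operator could be configured along these lines. The attribute name zip_code and the new names zip_region_1, zip_region_2 and zip_range are assumptions for illustration, and the expressions assume zip_code is a nominal attribute that holds all five digits, including leading zeros. The last expression is one possible way to label the two example ranges from the question using if and parse. It has not been tested against the original data.

    ```xml
    <!-- Sketch only: attribute names are hypothetical; zip_code is assumed to be nominal, e.g. "70173" -->
    <operator activated="true" class="generate_attributes" compatibility="8.2.000" expanded="true" name="Generate Attributes">
    <list key="function_descriptions">
    <parameter key="zip_region_1" value="prefix(zip_code, 1)"/>
    <parameter key="zip_region_2" value="prefix(zip_code, 2)"/>
    <parameter key="zip_range" value="if(parse(zip_code) &lt;= 500, &quot;00001-00500&quot;, if(parse(zip_code) &gt;= 70000 &amp;&amp; parse(zip_code) &lt;= 75000, &quot;70000-75000&quot;, &quot;other&quot;))"/>
    </list>
    <parameter key="keep_all" value="true"/>
    </operator>
    ```

    The generated attributes could then be used to aggregate customer counts per group, or passed to a clustering operator in place of the raw ZIP code.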