"ERROR using K-means clustering algorithm with text data"

basel_deeb
basel_deeb New Altair Community Member
edited November 5 in Community Q&A
Hello,

I'm using text data that contains three attributes (NAME, LABEL, DOMAIN). Here is a sample of the data:

NAME              LABEL    DOMAIN
---------------------------------
origin            from     string
destination       to       string
departure day     day      date
departure month   month    date

I want to use the k-Means clustering operator to cluster the data, but unfortunately I get this error before execution:

" The setup does not seem to contain any obvious error, but you should check the log messages or activate the debug mode in the setting dialog in order to get more information about this problem"

Here are the log messages:

Dec 26, 2012 1:23:44 AM INFO: Process //NewLocalRepository/IOS/EM starts
Dec 26, 2012 1:23:44 AM INFO: Loading initial data.
Dec 26, 2012 1:23:45 AM SEVERE: Process failed: operator cannot be executed. Check the log messages...
Dec 26, 2012 1:23:45 AM SEVERE: Here:           Process[1] (Process)
          subprocess 'Main Process'
            +- Retrieve[1] (Retrieve)
      ==>   +- Clustering[1] (k-Means)
Dec 26, 2012 1:23:45 AM SEVERE: java.lang.NullPointerException


And here is the process XML:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.008">
 <context>
   <input/>
   <output/>
   <macros/>
 </context>
 <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
   <process expanded="true" height="341" width="480">
     <operator activated="true" class="retrieve" compatibility="5.2.008" expanded="true" height="60" name="Retrieve" width="90" x="126" y="140">
       <parameter key="repository_entry" value="../EXPIO/DDP"/>
     </operator>
     <operator activated="true" class="k_means" compatibility="5.2.008" expanded="true" height="76" name="Clustering" width="90" x="313" y="120">
       <parameter key="k" value="10"/>
       <parameter key="measure_types" value="NominalMeasures"/>
       <parameter key="nominal_measure" value="DiceSimilarity"/>
     </operator>
     <connect from_op="Retrieve" from_port="output" to_op="Clustering" to_port="example set"/>
     <connect from_op="Clustering" from_port="cluster model" to_port="result 1"/>
     <portSpacing port="source_input 1" spacing="0"/>
     <portSpacing port="sink_result 1" spacing="0"/>
     <portSpacing port="sink_result 2" spacing="0"/>
   </process>
 </operator>
</process>

Any advice would be greatly appreciated.  Thanks!

Answers

  • Skirzynski
    Skirzynski New Altair Community Member
    I executed your process with your short data sample but couldn't reproduce the error. Can you provide a minimal data set (CSV) that triggers it?

    P.S.: Please use the code-tags in this forum for your processes and data.
  • basel_deeb
    basel_deeb New Altair Community Member
    Thank you so much, Mr. Marcin, for your reply.
    To my surprise, after I uninstalled RapidMiner and reinstalled it, the process worked.

    However, I have a question, if you don't mind: after k-Means generates the cluster centroids, how can I find out what they are? The clusters are only listed by name, as follows:

    Cluster_0
    Cluster_1
    Cluster_2

    Again, thanks a lot.
  • Skirzynski
    Skirzynski New Altair Community Member
    If you take a look at the cluster model in the result view, you can see several different views. For instance, the "Folder View" displays every cluster that actually contains examples as a folder. If you click on an item inside a cluster, you can see its details. The view that is interesting for you is the "Centroid Table", which lists all cluster centroids with their values. If a cluster was created but does not contain any examples (because your k was too high), its centroid will show question marks instead of values.
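
To make the "Centroid Table" idea more concrete, here is a rough sketch in plain Python of how such a table could be assembled from cluster assignments. This is not RapidMiner's code or API: the function name `centroid_table`, the toy data, and the choice of numeric attributes with arithmetic-mean centroids are all assumptions made for illustration. It only shows why a cluster that received no examples ends up with question marks.

```python
from collections import defaultdict


def centroid_table(rows, assignments, k):
    """Build a centroid per cluster id 0..k-1 (hypothetical helper).

    rows        -- list of equal-length numeric tuples (assumed numeric data)
    assignments -- cluster id for each row, same order as rows
    k           -- number of clusters that were requested
    """
    n_attrs = len(rows[0])

    # Group the rows by the cluster they were assigned to.
    members = defaultdict(list)
    for row, cluster_id in zip(rows, assignments):
        members[cluster_id].append(row)

    table = {}
    for cluster_id in range(k):
        group = members.get(cluster_id, [])
        if group:
            # Centroid = attribute-wise mean of the cluster's members.
            table[f"cluster_{cluster_id}"] = [
                sum(row[i] for row in group) / len(group)
                for i in range(n_attrs)
            ]
        else:
            # No members: the centroid is undefined, shown here as "?".
            table[f"cluster_{cluster_id}"] = ["?"] * n_attrs
    return table


# Toy data: three examples, but k=3 with nothing assigned to cluster 1.
rows = [(1.0, 2.0), (3.0, 4.0), (10.0, 10.0)]
assignments = [0, 0, 2]
table = centroid_table(rows, assignments, k=3)

for name, centroid in table.items():
    print(name, centroid)
```

In this toy run, cluster_1 has no members, so its row contains only question marks, which mirrors what happens when k is larger than the data supports.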