"Java iterate over Parameters and write out clusterModels"

shadrigo
shadrigo New Altair Community Member
edited November 5 in Community Q&A
Hi all,

I want to evaluate different parameters for a clustering algorithm (DBSCAN with epsilon and minPoints).

I wrote a small Java program with two nested for loops (one per parameter).
Outside the loops I initialize RapidMiner and get the operators from a process file.

From there I get a reference to the DBScan operator in the process file.
On this reference I change the parameters epsilon and minPoints inside the loops.

First I set minPoints and loop over the different epsilon values.
From each resulting ClusterModel I need the number of clusters generated and the average number of examples per cluster.
I write these values to a file (for plotting in Excel), one line per epsilon value.
Then a new file is created for the next minPoints value and I loop over the epsilons again.
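
The per-epsilon summary described above could be computed along these lines. This is only a minimal sketch in plain Java, not RapidMiner code: the class name `ClusterSummary`, the helper names, and the column order are hypothetical, and it assumes the size of each cluster has already been read from the ClusterModel into a list.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class ClusterSummary {

    /** Average number of examples per cluster; 0.0 when there are no clusters. */
    static double averageClusterSize(List<Integer> clusterSizes) {
        if (clusterSizes.isEmpty()) {
            return 0.0;
        }
        long total = 0;
        for (int size : clusterSizes) {
            total += size;
        }
        return (double) total / clusterSizes.size();
    }

    /** One semicolon-separated line: epsilon;clusterCount;averageSize (assumed column order). */
    static String resultLine(double eps, List<Integer> clusterSizes) {
        // Locale.ROOT keeps the decimal point independent of the system locale.
        return String.format(Locale.ROOT, "%.3f;%d;%.2f",
                eps, clusterSizes.size(), averageClusterSize(clusterSizes));
    }

    public static void main(String[] args) {
        // Hypothetical sizes of three clusters found for epsilon = 0.5
        System.out.println(resultLine(0.5, Arrays.asList(10, 20, 30)));
    }
}
```

Writing one complete line per epsilon value keeps each result file easy to import into a spreadsheet.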

My problem is that all the data I write out is exactly the same.
Do I have to reinitialize RapidMiner or clear a cache or something inside the loop in order to get the new data and no leftovers?

Here is my code:

// Method
private void runClusterNumberTest(String processFile) {

    RapidMiner.initRM();

    Process dbScanRootProcess = setProcessFile(processFile);

    // Look up the operators by the names they have in the process file
    ArffExampleSource arffSource = (ArffExampleSource) dbScanRootProcess
            .getOperator("ArffExampleSource");
    DBScan dbscanClusterAlg = (DBScan) dbScanRootProcess
            .getOperator("DBScanClustering");
    ClusterModelWriter clusterModWriter = (ClusterModelWriter) dbScanRootProcess
            .getOperator("ClusterModelWriter");

    // do the clustering for just a small subset of the data
    arffSource.setParameter("sample_ratio",
            Double.toString(this.percentageOfData));

    // loop over all epsilons and minPoints that need to be evaluated
    for (double mPts = this.minPtsStart; mPts <= this.minPtsMax; mPts += this.stepmPts) {

        // get a new file to write the data into
        BufferedWriter resultWriter = setupResultFile(this.outputFolder, mPts);

        // set the minPoints parameter
        dbscanClusterAlg.setParameter("min_Points", Double.toString(mPts));

        for (double eps = this.epsilonStart; eps <= this.epsilonMax; eps += this.stepEps) {

            System.out.println("Processing configuration with Eps=" + eps + " and mPts=" + mPts);

            resultWriter.append(eps + ";");

            dbscanClusterAlg.setParameter("epsilon", Double.toString(eps));

            // one model file per parameter combination
            String clusModelOutpf = this.outputFolder + File.separator
                    + "ClusterModels" + File.separator + "cmDBSCAN_Eps_"
                    + eps + "mPts_" + mPts + "_PercData_" + this.percentageOfData + ".clm";

            clusterModWriter.setParameter("cluster_model_file", clusModelOutpf);

            // RUN the process
            System.out.println("Calling RM clustering process...");
            IOContainer rootIOContainer = dbScanRootProcess.run();

            ClusterModel clusterModel = rootIOContainer.get(ClusterModel.class);

            System.out.println("Processing data from clustering...");
            int clustercount = clusterModel.getNumberOfClusters();

            Collection<Cluster> clusters = clusterModel.getClusters();

            // ... only reads data from the clusters, then the next loop iteration begins
        }
    }
}
Am I accessing the RapidMiner API or some of the objects incorrectly? Are these not direct references to the objects, so that a change results in new parameters in the process file?

Thanks in advance
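
One detail in loops like the ones above that may be worth ruling out, although it may well not be the cause of the identical output: stepping a `double` with `+=` accumulates rounding error (starting at 0.1 with step 0.1, the third value is 0.30000000000000004, so an upper bound of 0.3 is never reached), and `Double.toString(2.0)` yields "2.0" rather than "2". Whether RapidMiner accepts "2.0" for an integer-valued parameter is an assumption that would need checking. The sketch below uses a hypothetical helper `steps` and made-up ranges to build the grid from an integer index instead.

```java
import java.util.ArrayList;
import java.util.List;

final class ParameterGrid {

    /** Values start, start + step, ... up to and including max, computed from an integer index. */
    static List<Double> steps(double start, double max, double step) {
        List<Double> values = new ArrayList<>();
        // The small tolerance keeps max itself in the grid despite rounding error.
        int count = (int) Math.floor((max - start) / step + 1e-9);
        for (int i = 0; i <= count; i++) {
            values.add(start + i * step);
        }
        return values;
    }

    public static void main(String[] args) {
        // Hypothetical ranges; minPts is kept as an int so it is passed as "2", not "2.0".
        for (int minPts = 2; minPts <= 4; minPts++) {
            String minPtsValue = Integer.toString(minPts);
            for (double eps : steps(0.1, 0.3, 0.1)) {
                System.out.println("minPts=" + minPtsValue + " eps=" + eps);
            }
        }
    }
}
```

Logging the exact strings that are passed to `setParameter` in each iteration is a cheap way to confirm that the operator really receives different values.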

Answers

  • shadrigo
    shadrigo New Altair Community Member
    I abandoned the approach above and
    tried to do the parameter iteration
    in the GUI using the ParameterIteration operator. I want to write out the cluster models,
    but when I try to set the model_file value I cannot use an absolute path with a colon in it (C:\test\test.clm);
    the system throws a parsing error for the string.

    Is there a way to get the current parameter values into the file name of the ClusterModelWriter?
    For example ClusterModel_Param1_Param1value.clm?
  • land
    land New Altair Community Member
    Hi,
    your code should work. Perhaps you should test it with a different parameter variation, such as the number of clusters. DBScan only changes its behavior within a very small window of possible parameter values. That might be the reason why all cluster models are the same.
    As far as I know, there is no way to get the parameter values into the file name unless you use macros. Although you could simply set the macros and use their values as parameters, this can become unwieldy when numerical values are needed, because macro definitions are always nominal and every value has to be entered by hand.

    There is the predefined macro %{a}, which is filled with the apply count of the current operator. Perhaps this is enough in combination with a process log, which could save a table translating the number into parameter values.

    Greetings,
      Sebastian
  • shadrigo
    shadrigo New Altair Community Member
    Thanks for your answer.

    I wrote out the data using the iteration macro %{a} and put the parameters into the file names manually (luckily there were only a few).

    I have seen that DBSCAN behaves that way. It is very difficult to determine good values for epsilon and minPoints when the input dimension is high.
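
The manual translation from the %{a} apply count back to parameter values, as discussed in this thread, could also be scripted. The following is a rough sketch under two assumptions that are not confirmed here: that the count starts at 1, and that epsilon varies fastest while minPoints changes only after all epsilon values have been tried. The class name, helper name, and file name pattern are hypothetical.

```java
import java.util.Locale;

final class IterationIndex {

    /**
     * File name for the parameter combination belonging to a 1-based run number.
     * Assumes epsilon is the fastest-varying parameter.
     */
    static String modelFileName(int run, double[] epsValues, int[] minPtsValues) {
        int index = run - 1;
        double eps = epsValues[index % epsValues.length];
        int minPts = minPtsValues[index / epsValues.length];
        // No colon or path separator appears in the generated name.
        return String.format(Locale.ROOT, "ClusterModel_eps_%.2f_minPts_%d.clm", eps, minPts);
    }

    public static void main(String[] args) {
        // Hypothetical parameter values
        double[] epsValues = {0.1, 0.2, 0.3};
        int[] minPtsValues = {3, 5};
        int runs = epsValues.length * minPtsValues.length;
        for (int run = 1; run <= runs; run++) {
            System.out.println(run + ";" + modelFileName(run, epsValues, minPtsValues));
        }
    }
}
```

If the actual iteration order differs, swapping the roles of the modulo and the division gives the other ordering.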