Hi all,
I want to evaluate different parameters for a clustering algorithm (DBSCAN with epsilon and minPoints).
I wrote a small Java program with two nested for loops, one per parameter.
Outside the loops I initialize RapidMiner and load the operators from a process file.
From that process I get a reference to the DBScan operator.
Inside the loops I change the epsilon and minPoints parameters on this reference.
First I set minPoints, then I loop over the different epsilon values.
From the resulting ClusterModel I need the number of clusters generated and the average number of examples per cluster.
For each epsilon value I write these values to a file (for plotting in Excel).
Then a new file is created for the next minPoints value and I loop over the epsilons again.
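To make the two values per epsilon concrete, here is a minimal sketch of the bookkeeping, independent of RapidMiner. It assumes the cluster sizes have already been read from the ClusterModel into a plain list; the class name ClusterStats, the method names, and the column order epsilon;clusterCount;averageSize are only illustrative and not part of any API.

```java
import java.util.List;

public class ClusterStats {

    /** Average number of examples per cluster; 0.0 if there are no clusters. */
    public static double averageSize(List<Integer> clusterSizes) {
        if (clusterSizes.isEmpty()) {
            return 0.0;
        }
        long total = 0;
        for (int size : clusterSizes) {
            total += size;
        }
        return (double) total / clusterSizes.size();
    }

    /** One semicolon-separated line: epsilon;clusterCount;averageSize (assumed layout). */
    public static String resultLine(double eps, List<Integer> clusterSizes) {
        return eps + ";" + clusterSizes.size() + ";" + averageSize(clusterSizes);
    }

    public static void main(String[] args) {
        // Three hypothetical clusters with 10, 20 and 30 examples.
        System.out.println(resultLine(0.5, List.of(10, 20, 30))); // prints 0.5;3;20.0
    }
}
```

If every line in every file comes out identical, comparing one such line against the cluster sizes printed directly after the process run would show whether the clustering result or the writing step is at fault.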
My problem is that all the data I write out is exactly the same.
Do I have to reinitialize RapidMiner or clear a cache or something inside the loop in order to get the new data and no leftovers?
Here is my code:
// Method
private void runClusterNumberTest(String processFile) {
    RapidMiner.initRM();
    Process dbScanRootProcess = setProcessFile(processFile);
    ArffExampleSource arffSource = (ArffExampleSource) dbScanRootProcess
            .getOperator("ArffExampleSource");
    DBScan dbscanClusterAlg = (DBScan) dbScanRootProcess
            .getOperator("DBScanClustering");
    ClusterModelWriter clusterModWriter = (ClusterModelWriter) dbScanRootProcess
            .getOperator("ClusterModelWriter");
    // do the clustering for just a small subset
    arffSource.setParameter("sample_ratio",
            Double.toString(this.percentageOfData));
    // loop over all epsilons and minPoints that need to be evaluated
    for (double mPts = this.minPtsStart; mPts <= this.minPtsMax; mPts += this.stepmPts) {
        // get a new file for writing the data into
        BufferedWriter resultWriter = setupResultFile(this.outputFolder, mPts);
        // set the minPoints parameter
        dbscanClusterAlg.setParameter("min_Points", Double.toString(mPts));
        for (double eps = this.epsilonStart; eps <= this.epsilonMax; eps += this.stepEps) {
            System.out.println("Processing configuration with Eps=" + eps + " and mPts=" + mPts);
            resultWriter.append(eps + ";");
            dbscanClusterAlg.setParameter("epsilon", Double.toString(eps));
            String clusModelOutpf = this.outputFolder + File.separator
                    + "ClusterModels" + File.separator + "cmDBSCAN_Eps_"
                    + eps + "mPts_" + mPts + "_PercData_" + this.percentageOfData + ".clm";
            clusterModWriter.setParameter("cluster_model_file", clusModelOutpf);
            // RUN the process
            System.out.println("Calling RM clustering process...");
            IOContainer rootIOContainer = dbScanRootProcess.run();
            ClusterModel clusterModel = rootIOContainer.get(ClusterModel.class);
            System.out.println("Processing data from clustering...");
            int clustercount = clusterModel.getNumberOfClusters();
            Collection<Cluster> clusters = clusterModel.getClusters();
            //... read only data from clusters then the new loop begins
        }
    }
}
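A side note on the two loops above: stepping a double with += accumulates rounding error, so the last value of a range can be skipped (0.1 + 0.1 + 0.1 is already slightly larger than 0.3). The following is a minimal sketch of an index-based alternative; the class ParameterGrid, the method values, and the 1e-9 tolerance are assumptions made for illustration and are not part of RapidMiner. Since minPoints is a count, an int loop is probably the more natural choice for it anyway.

```java
import java.util.ArrayList;
import java.util.List;

public class ParameterGrid {

    /** Returns start, start + step, ... up to and including max (within a small tolerance). */
    public static List<Double> values(double start, double max, double step) {
        if (step <= 0) {
            throw new IllegalArgumentException("step must be positive");
        }
        // Number of whole steps that fit into [start, max]; the tolerance absorbs rounding error.
        int steps = (int) Math.floor((max - start) / step + 1e-9);
        List<Double> result = new ArrayList<>();
        for (int i = 0; i <= steps; i++) {
            // Each value is computed from the index, so errors do not accumulate.
            result.add(start + i * step);
        }
        return result;
    }

    public static void main(String[] args) {
        // A naive "for (double x = 0.1; x <= 0.3; x += 0.1)" visits only two values.
        System.out.println(values(0.1, 0.3, 0.1).size()); // prints 3
    }
}
```

This does not explain identical output across runs, but it removes one source of surprise when comparing the result files.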
Am I using the RapidMiner API or some of its objects incorrectly? Is there no direct reference to the operator objects, so that a change made to them actually results in new parameters in the process?
Thanks in advance.