Clustering with labels?

New Altair Community Member

Nov 15, 2016

Updated Nov 5, 2024 by Jocelyn

Hi,

Is there any way to do clustering with labels to control performance (in classification)? What operator can I use to do that (e.g. with k-means)?

And is there some way to cluster the data with "help" from the labels when the class is known? I mean clustering based on the given labels, e.g. finding out which class labels are clustered together, then getting the centroid of that local cluster, and so on.

Is there an existing operator that uses labels for clustering? I just want to find out more about my dataset and my classes (e.g. local cluster labels, centroid tables, etc.).

dang

New Altair Community Member

Nov 15, 2016

If you have labeled data, clustering is, most of the time, like carrying owls to Athens.

Of course, you can use 'Set Role' to turn the label column into a regular attribute and pretend not to have any label information. Once the data no longer has the special attribute 'label', you can run any clustering you want.

Hope that makes sense.

Fred12

New Altair Community Member

Nov 15, 2016

I know the purpose of clustering, but I want to compare the found clusters with the labeled "clusters", if you know what I mean, to judge the "goodness" of the clusters against some ground truth.

Is there any sophisticated way to do so? Any ideas?
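
One common way to score clusters against a ground truth is purity: count how often each label occurs in each cluster, take the largest count per cluster, and divide the sum by the number of examples. The following is a minimal, stand-alone Java sketch of that idea and not a RapidMiner operator. The class name ClusterPurity, its method names, and the encoding of clusters and labels as integer indices are assumptions made for this example.

```java
// Hypothetical helper for illustration; not part of RapidMiner / AI Studio.
class ClusterPurity {

    /** Counts how often each label index occurs in each cluster index. */
    static int[][] contingency(int[] clusterIds, int[] labelIds, int numClusters, int numLabels) {
        if (clusterIds.length != labelIds.length) {
            throw new IllegalArgumentException("clusterIds and labelIds must have the same length");
        }
        int[][] counts = new int[numClusters][numLabels];
        for (int k = 0; k < clusterIds.length; k++) {
            counts[clusterIds[k]][labelIds[k]]++;
        }
        return counts;
    }

    /** Purity = (sum over clusters of the largest label count) / (number of examples). */
    static double purity(int[][] counts) {
        int total = 0;
        int majoritySum = 0;
        for (int[] row : counts) {
            int rowMax = 0;
            for (int c : row) {
                total += c;
                rowMax = Math.max(rowMax, c);
            }
            majoritySum += rowMax;
        }
        return total == 0 ? 0.0 : (double) majoritySum / total;
    }

    public static void main(String[] args) {
        // Six examples: cluster 0 holds labels {0, 0, 1}, cluster 1 holds labels {1, 1, 1}.
        int[] clusters = {0, 0, 0, 1, 1, 1};
        int[] labels   = {0, 0, 1, 1, 1, 1};
        int[][] counts = contingency(clusters, labels, 2, 2);
        System.out.println("purity = " + purity(counts));
    }
}
```

In the small example, 5 of the 6 examples carry the majority label of their cluster, so the purity is 5/6. Purity allows several clusters to share the same majority label and tends to rise as the number of clusters grows, so it is best compared at a fixed number of clusters.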

MartinLiebig

Altair Employee

Accepted Answer

Nov 15, 2016

Did you try Map Clustering on Labels and then the performance operators?

Fred12

New Altair Community Member

Nov 15, 2016

Yeah, thanks, that seemed to work, but I still don't know how that operator works.

How does it choose which cluster gets which label?

MartinLiebig

Altair Employee

Nov 16, 2016

Hm, good question. The important code is in ClusterToPrediction.java, but it's quite a chunk.

	@Override
	public void doWork() throws OperatorException {
		ExampleSet exampleSet = exampleSetInput.getData(ExampleSet.class);
		ClusterModel model = clusterModelInput.getData(ClusterModel.class);

		// generate the predicted attribute
		Attribute labelAttribute = exampleSet.getAttributes().getLabel();
		PredictionModel.createPredictedLabel(exampleSet, labelAttribute);
		Attribute predictedLabel = exampleSet.getAttributes().getPredictedLabel();

		HashMap<Integer, String> intToClusterMapping = new HashMap<Integer, String>();
		int[][] mappingTable = new int[model.getNumberOfClusters()][model.getNumberOfClusters()];

		// count the occurrence of each label with every cluster
		int a = 0;
		for (int i = 0; i < model.getNumberOfClusters(); i++) {
			HashMap<String, Integer> labelOccurrence = new HashMap<String, Integer>();
			for (Example example : exampleSet) {
				String label = example.getValueAsString(labelAttribute);
				if (!labelOccurrence.containsKey(label)) {
					labelOccurrence.put(label, 0);
					if (i == 0) {
						intToClusterMapping.put(a, label);
						a++;
					}
				}
				if (example.getValue(example.getAttributes().getCluster()) == i) {
					labelOccurrence.put(label, labelOccurrence.get(label) + 1);
				}
			}

			if (i == 0 && model.getNumberOfClusters() != labelOccurrence.size()) {
				throw new UserError(this, 943, labelOccurrence.size(), model.getNumberOfClusters());
			}

			for (int j = 0; j < mappingTable[i].length; j++) {
				String clusterName = intToClusterMapping.get(j);
				int occ = labelOccurrence.get(clusterName);
				mappingTable[i][j] = occ;
			}
		}
		/*
		 * Munkres-algorithm or the hungarian method
		 */
		// find the maximum
		int maxValue = -1;
		for (int i = 0; i < mappingTable.length; i++) {
			for (int j = 0; j < mappingTable[i].length; j++) {
				if (mappingTable[i][j] > maxValue) {
					maxValue = mappingTable[i][j];
				}
			}
		}

		// compute the new (inverted) table (and column-minima)
		for (int i = 0; i < mappingTable.length; i++) {
			int minimum = Integer.MAX_VALUE;
			for (int j = 0; j < mappingTable[i].length; j++) {
				mappingTable[i][j] = maxValue - mappingTable[i][j];
				if (mappingTable[i][j] < minimum) {
					minimum = mappingTable[i][j];
				}
			}
			// subtract the column-minima
			if (minimum > 0) {
				for (int j = 0; j < mappingTable[i].length; j++) {
					mappingTable[i][j] = mappingTable[i][j] - minimum;
				}
			}
		}
		// compute and subtract the row-minima
		for (int i = 0; i < mappingTable[0].length; i++) {
			int minimum = Integer.MAX_VALUE;
			for (int j = 0; j < mappingTable.length; j++) {
				if (mappingTable[j][i] < minimum) {
					minimum = mappingTable[j][i];
				}
			}
			// subtract the row-minima
			if (minimum > 0) {
				for (int j = 0; j < mappingTable.length; j++) {
					mappingTable[j][i] = mappingTable[j][i] - minimum;
				}
			}
		}
		while (!assignmentAvailable(mappingTable)) {
			Vector<Integer> markedRows = new Vector<Integer>();
			Vector<Integer> markedColumns = new Vector<Integer>();

			// mark all rows which have no marked zero (start labeling)
			for (int i = 0; i < mappingTable[0].length; i++) {
				boolean markedZero = false;
				for (int j = 0; j < mappingTable.length; j++) {
					if (mappingTable[j][i] == Integer.MIN_VALUE) {
						markedZero = true;
						break;
					}
				}
				if (!markedZero) {
					markedRows.add(i);
				}
			}

			boolean newMarked = true;
			while (newMarked) {
				newMarked = false;
				// mark all columns with a slashed zero in a marked row
				for (int i = 0; i < mappingTable.length; i++) {
					for (int j = 0; j < mappingTable[i].length; j++) {
						if (mappingTable[i][j] == Integer.MAX_VALUE) {
							if (markedRows.contains(j) && !markedColumns.contains(i)) {
								newMarked = true;
								markedColumns.add(i);
							}
						}
					}
				}
				// mark all rows with a marked zero in a marked column
				for (int i = 0; i < mappingTable[0].length; i++) {
					for (int j = 0; j < mappingTable.length; j++) {
						if (mappingTable[j][i] == Integer.MIN_VALUE) {
							if (markedColumns.contains(j) && !markedRows.contains(i)) {
								newMarked = true;
								markedRows.add(i);
							}
						}
					}
				}
			} // end while (newMarked)

			// inverting of the marked columns
			for (int i = 0; i < mappingTable.length; i++) {
				if (!markedColumns.contains(i)) {
					markedColumns.add(i);
				} else {
					markedColumns.removeElement(i);
				}
			}

			// find the minimum in the marked range
			int minimum = Integer.MAX_VALUE;
			for (int i = 0; i < markedRows.size(); i++) {
				for (int j = 0; j < markedColumns.size(); j++) {
					if (mappingTable[markedColumns.get(j)][markedRows.get(i)] < minimum) {
						minimum = mappingTable[markedColumns.get(j)][markedRows.get(i)];
					}
				}
			}
			// subtract the minimum from all elements in the marked range
			for (int i = 0; i < markedRows.size(); i++) {
				for (int j = 0; j < markedColumns.size(); j++) {
					mappingTable[markedColumns.get(j)][markedRows.get(i)] = mappingTable[markedColumns.get(j)][markedRows
							.get(i)] - minimum;
				}
			}

			// add the minimum to all elements which are neither marked in a row nor in a column
			for (int i = 0; i < mappingTable.length; i++) {
				if (!markedColumns.contains(i)) {
					for (int j = 0; j < mappingTable[i].length; j++) {
						if (!markedRows.contains(j)) {
							mappingTable[i][j] = mappingTable[i][j] + minimum;
						}
					}
				}
			}
			// reset the Integer.MIN_VALUE and Integer.MAX_VALUE to zero
			for (int i = 0; i < mappingTable.length; i++) {
				for (int j = 0; j < mappingTable[i].length; j++) {
					if (mappingTable[i][j] == Integer.MAX_VALUE) {
						mappingTable[i][j] = 0;
					}
					if (mappingTable[i][j] == Integer.MIN_VALUE) {
						mappingTable[i][j] = 0;
					}
				}
			}
		} // end while(!assignmentAvailable)

		// compute the mapping (there must be a possible assignment)
		HashMap<Integer, String> clusterToPrediction = new HashMap<Integer, String>();
		for (int i = 0; i < mappingTable.length; i++) {
			int result = -1;
			for (int j = 0; j < mappingTable[i].length; j++) {
				if (mappingTable[i][j] == Integer.MIN_VALUE) {
					result = j;
					break;
				}
			}
			String resultCluster = intToClusterMapping.get(result);
			clusterToPrediction.put(i, resultCluster);
		}

		// insert the result in the predicted attribute
		HashMap<String, Integer> predictionToCluster = new HashMap<String, Integer>();
		// set the predictedLabel in the example table and compute the cluster for each prediction
		int i = 0;
		Attribute clusterAttribute = exampleSet.getAttributes().getCluster();
		for (Example example : exampleSet) {
			String resultLabel = clusterToPrediction.get((int) example.getValue(example.getAttributes().getCluster()));
			example.setValue(predictedLabel, resultLabel);
			if (predictionToCluster.size() < model.getNumberOfClusters()) {
				if (!predictionToCluster.containsKey(example.getValueAsString(example.getAttributes().getPredictedLabel()))) {
					String clusterNumber = example.getValueAsString(clusterAttribute).replaceAll("[^\\d]+", "");
					try {
						int number = Integer.parseInt(clusterNumber);
						predictionToCluster.put(example.getValueAsString(example.getAttributes().getPredictedLabel()),
								number);
					} catch (NumberFormatException e) {
						throw new UserError(this, 145, clusterAttribute.getName());
					}
				}
			}
			i++;
		}

		// set the confidence in the example table
		i = 0;
		for (Example example : exampleSet) {
			if (model.getClass() == FlatFuzzyClusterModel.class) {
				FlatFuzzyClusterModel fuzzyModel = (FlatFuzzyClusterModel) model;
				for (int j = 0; j < clusterToPrediction.size(); j++) {
					String label = clusterToPrediction.get(j);
					example.setConfidence(label,
							fuzzyModel.getExampleInClusterProbability(i, predictionToCluster.get(label)));
				}
			} else {
				example.setConfidence(clusterToPrediction.get((int) example.getValue(example.getAttributes().getCluster())),
						1);
			}
			i++;
		}

		exampleSetOutput.deliver(exampleSet);
		clusterModelOutput.deliver(model);
	}

	/* Returns true if there is a solution available. */
	private boolean assignmentAvailable(int[][] mappingTable) {
		int markedZeros = 0;
		boolean modificationDone = true;

		while (modificationDone) {
			while (modificationDone) {
				modificationDone = false;
				// column by column
				for (int i = 0; i < mappingTable.length; i++) {
					int position = -1;
					for (int j = 0; j < mappingTable[i].length; j++) {
						if (mappingTable[i][j] == 0) {
							if (position == -1) {
								position = j;
							} else {
								position = -1;
								break;
							}
						}
					}
					if (position != -1) {
						modificationDone = true;
						mappingTable[i][position] = Integer.MIN_VALUE; // marked zero
						for (int k = 0; k < mappingTable.length; k++) {
							if (mappingTable[k][position] == 0) {
								mappingTable[k][position] = Integer.MAX_VALUE; // slashed zeros
							}
						}
						markedZeros++;
					}
				}
				if (markedZeros == mappingTable.length) {
					return true;
				}

				// line by line
				for (int i = 0; i < mappingTable[0].length; i++) {
					int position = -1;
					for (int j = 0; j < mappingTable.length; j++) {
						if (mappingTable[j][i] == 0) {
							if (position == -1) {
								position = j;
							} else {
								position = -1;
								break;
							}
						}
					}
					if (position != -1) {
						modificationDone = true;
						mappingTable[position][i] = Integer.MIN_VALUE;// marked zero
						for (int k = 0; k < mappingTable[0].length; k++) {
							if (mappingTable[position][k] == 0) {
								mappingTable[position][k] = Integer.MAX_VALUE; // slashed zeros
							}
						}
						markedZeros++;
					}
				}
				if (markedZeros == mappingTable.length) {
					return true;
				}
			}
			// modificationDone is here always false
			// ambiguous zeros
			int aktMarkedZeros = markedZeros;
			for (int i = 0; i < mappingTable.length; i++) {
				for (int j = 0; j < mappingTable[i].length; j++) {
					if (mappingTable[i][j] == 0) {
						mappingTable[i][j] = Integer.MIN_VALUE;// marked zero
						for (int k = j + 1; k < mappingTable[i].length; k++) {
							if (mappingTable[i][k] == 0) {
								mappingTable[i][k] = Integer.MAX_VALUE; // slashed zeros in the same
																		// column
							}
						}
						for (int k = 0; k < mappingTable.length; k++) {
							if (mappingTable[k][j] == 0) {
								mappingTable[k][j] = Integer.MAX_VALUE; // slashed zeros
							}
						}
						modificationDone = true;
						markedZeros++;
						break;
					}
				}
				if (aktMarkedZeros != markedZeros) {
					break;
				}
			}
			if (markedZeros == mappingTable.length) {
				return true;
			}
		}

		return false;
	}
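
To make the first steps of the listing above easier to follow, here is a small stand-alone sketch of how a table of cluster/label counts becomes a cost table. Each count is subtracted from the overall maximum, so large overlaps become small costs. Then the minimum along each first index and along each second index is subtracted, so that zeros mark candidate assignments. The class name CostTableSketch, the method name, and the example counts are invented for illustration. The sketch assumes a square, non-empty table and covers only this preprocessing, not the marking loop that follows it in the original code.

```java
// Illustration only: mirrors the inversion and minima subtraction at the start of doWork().
class CostTableSketch {

    static int[][] toReducedCost(int[][] counts) {
        int n = counts.length;

        // find the overall maximum count
        int max = -1;
        for (int[] row : counts) {
            for (int v : row) {
                max = Math.max(max, v);
            }
        }

        // invert (max - count) and subtract the minimum along each first index
        int[][] cost = new int[n][n];
        for (int i = 0; i < n; i++) {
            int min = Integer.MAX_VALUE;
            for (int j = 0; j < n; j++) {
                cost[i][j] = max - counts[i][j];
                min = Math.min(min, cost[i][j]);
            }
            for (int j = 0; j < n; j++) {
                cost[i][j] -= min;
            }
        }

        // subtract the minimum along each second index
        for (int j = 0; j < n; j++) {
            int min = Integer.MAX_VALUE;
            for (int i = 0; i < n; i++) {
                min = Math.min(min, cost[i][j]);
            }
            for (int i = 0; i < n; i++) {
                cost[i][j] -= min;
            }
        }
        return cost;
    }

    public static void main(String[] args) {
        // counts[cluster][label]: made-up numbers for three clusters and three labels
        int[][] counts = { {5, 40, 5}, {45, 3, 2}, {2, 6, 42} };
        System.out.println(java.util.Arrays.deepToString(toReducedCost(counts)));
    }
}
```

For these example counts the reduced table is {{35, 0, 35}, {0, 42, 43}, {40, 36, 0}}. It has exactly one zero per cluster and per label, so an assignment can be read off directly (cluster 0 to label 1, cluster 1 to label 0, cluster 2 to label 2) without entering the while loop.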

student_compute

New Altair Community Member

Jun 22, 2018

Hi, how should I use this code in the program? Where should I copy it and use it?
Thank you.
Sorry for asking.

student_compute

New Altair Community Member

Jun 26, 2018

Sorry.

Please help me.

Thanks.

Muhammed_Fatih_

New Altair Community Member

Jun 24, 2020

Hi @mschmitz,

One further question in this connection: which classification model does the "Map Clustering on Labels" operator use for the subsequent calculation of performance values?

Thank you in advance for your response!

Best regards!

Telcontar120

New Altair Community Member

Jun 24, 2020

The Map Clustering on Labels "model" simply chooses a cluster for each class and maps the cluster to that class, minimizing the total number of errors produced by the mapping. Assignments are exclusive: each cluster is mapped to exactly one class. It then calculates the performance metrics by comparing the "predictions" (based on the mapped clusters) with the "actual" values (the label). You need the same number of clusters as label classes for this operator to work.
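
As a rough illustration of what "minimizing the total number of errors with exclusive assignments" means, the sketch below tries every one-to-one mapping of clusters to labels on a small count table and keeps the one with the most agreements. This is a brute-force stand-in chosen for readability, not the operator's implementation (the source posted earlier in this thread uses the Hungarian method), and it is only practical for a handful of clusters. The class name ClusterLabelMapping, its methods, and the example counts are assumptions made for this example.

```java
// Hypothetical brute-force illustration; only sensible for a small number of clusters.
class ClusterLabelMapping {

    /** Returns best[c] = label index assigned to cluster c, maximizing total agreements. */
    static int[] bestMapping(int[][] counts) {
        int n = counts.length; // assumes a square table: #clusters == #labels
        int[] best = new int[n];
        int[] bestScore = { -1 };
        search(counts, 0, 0, new int[n], new boolean[n], best, bestScore);
        return best;
    }

    // Depth-first enumeration of all one-to-one cluster -> label assignments.
    private static void search(int[][] counts, int cluster, int score,
                               int[] current, boolean[] used, int[] best, int[] bestScore) {
        int n = counts.length;
        if (cluster == n) {
            if (score > bestScore[0]) {
                bestScore[0] = score;
                System.arraycopy(current, 0, best, 0, n);
            }
            return;
        }
        for (int label = 0; label < n; label++) {
            if (!used[label]) {
                used[label] = true;
                current[cluster] = label;
                search(counts, cluster + 1, score + counts[cluster][label],
                        current, used, best, bestScore);
                used[label] = false;
            }
        }
    }

    /** Number of examples whose mapped cluster agrees with their label. */
    static int agreements(int[][] counts, int[] mapping) {
        int sum = 0;
        for (int c = 0; c < mapping.length; c++) {
            sum += counts[c][mapping[c]];
        }
        return sum;
    }

    public static void main(String[] args) {
        // counts[cluster][label]: made-up numbers
        int[][] counts = { {10, 9}, {9, 0} };
        int[] mapping = bestMapping(counts);
        System.out.println(java.util.Arrays.toString(mapping)
                + " agreements=" + agreements(counts, mapping));
    }
}
```

The 2x2 example shows why the assignment has to be solved jointly: giving cluster 0 its most frequent label (label 0, 10 agreements) forces cluster 1 onto label 1 with 0 agreements, while the exclusive mapping cluster 0 to label 1 and cluster 1 to label 0 yields 18 agreements out of 28 examples.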
