Clustering methods that are less sensitive to scale than others?
keith
New Altair Community Member
Hi,
I'm working with an unsupervised clustering problem for the first time, and had a conceptual question about how different algorithms handle attributes that are on different scales.
Given the different clustering methods and distance calculations available in RM, are there some that are less sensitive than others to the attributes being on different scales (or normalized, but with applying different attribute weights)? That is, they tend to give the same cluster results whether the attributes are normalized (or weighted) or not.
You probably can't get complete scale-insensitivity because you ultimately have to compute a distance measure of some kind, but in practice do certain approaches give more stable results when given unnormalized (or weighted-attribute) data?
Thanks,
Keith
I'm working with an unsupervised clustering problem for the first time, and had a conceptual question about how different algorithms handle attributes that are on different scales.
Given the different clustering methods and distance calculations available in RM, are there some that are less sensitive than others to the attributes being on different scales (or normalized, but with applying different attribute weights)? That is, they tend to give the same cluster results whether the attributes are normalized (or weighted) or not.
You probably can't get complete scale-insensitivity because you ultimately have to compute a distance measure of some kind, but in practice do certain approaches give more stable results when given unnormalized (or weighted-attribute) data?
Thanks,
Keith
Tagged:
0
Answers
-
Hi Keith,
you basically already said it yourself: All clustering techniques depend on a distance measure and this distance metric again depends on whether the attributes are normalized or not and on whether they are weighted or not. So, to the best of my knowledge, the results of all clustering techniques heavily depend on these data preprocessing steps. As a consequence, I would highly recommend to use normalization before clustering and, if the attributes are not equally important, I would also recommend attribute weighting.
If somebody has another view on this, I would also be interested in learning more about it.
Best regards,
Ralf0 -
Thanks Ralf. Your answer is what my intuition was telling me.
FWIW, some poking around online with the famous search engine did uncover a few papers that seem to be addressing the same (or similar) problems:
http://jorlin.scripts.mit.edu/docs/publications/115-ScaleInvariantClustering.pdf
http://www.robots.ox.ac.uk/~vgg/publications/papers/fitzgibbon02.pdf
http://www.autonlab.org/autonweb/14651/version/3/part/5/data/wong-efficient.pdf?branch=main&;language=en
http://bioinformatics.oxfordjournals.org/cgi/reprint/19/7/818
http://www.gene-quantification.de/tichopad-bioinf-2006.pdf
http://dimacs.rutgers.edu/Workshops/Depth/meer.pdf
0 -
Hi Keith,
thanks for the links to these research papers:
The cited scale-invariant clustering techniques are only invariant to affine transformations, but not invariant to normalization and arbitrary weighitng of attributes, because these are non-affine data transformations.keith wrote:
FWIW, some poking around online with the famous search engine did uncover a few papers that seem to be addressing the same (or similar) problems:
http://jorlin.scripts.mit.edu/docs/publications/115-ScaleInvariantClustering.pdf
http://www.robots.ox.ac.uk/~vgg/publications/papers/fitzgibbon02.pdf
...
Best regards,
Ralf0 -
Thanks Ralf. I was not familiar with the concept of affine transformations, and missed that implication when I came across those articles. I'm trying to wrap my head around it now. Something isn't making sense to me still as to why normalization isn't an affine transformation.Ralf Klinkenberg wrote:
The cited scale-invariant clustering techniques are only invariant to affine transformations, but not invariant to normalization and arbitrary weighitng of attributes, because these are non-affine data transformations.
After consulting this geometry book, http://books.google.com/books?id=q49lhAzXTFEC, I see the following definition of an affine transformation:
So if I had two attributes X and Y, and transformed them using:
An affine transformation of R(2) is a function t:R(2) -> R(2) of the form
t(x) = A x + b
where A is an invertible 2x2 matrix and b is in R(2)
(Affine geometry can be defined in R(n) for any n>=2; we restrict our attention here to the case when n=2)
A = [ 1/sigmaX 0
0 1/sigmaY]
b = [-meanX/sigmaX -meanY/sigmaY]
If I did my algebra right (never a sure thing), I think I get the normalized values: (X-meanX)/sigmaX and (Y-meanY)/sigmaY
Matrix A is invertible, having a nonzero determinant = 1/(sigmaX*sigmaY)
Similarly, attribute weights would seemingly be just a diagonal matrix A with no b vector, which would be invertible if all the weights were nonzero.
So my naive guess would have been that normalization is affine. But I trust your knowledge much more than my own, and I'm clearly in over my head. Where did I go astray?
Thanks,
Keith
0 -
Hi Keith,
thanks for pointing this out. The cited definition of affine transformations as well as your example are both correct. Hence both, normalization and attribute weighting, are indeed affine transformations. This makes the cited scaling invariant clustering techniques quite worth-while to look at.
Sorry for the confusion my previous reply may have caused. I had an error in my thinking. I guess I have done to much text mining lately.
In text mining, examples (document vectors, i.e. rows in the data table) are typically normalized to unit lenght, i.e. the normalization there is row-wise, while usually normalizations refer to one attribute (column, variable) at a time over all examples (rows), i.e. in most non-text-mining cases, the normalization is column-wise. While the standard column-wise (attribute-wise) normalization is an affine transformation, the row-wise document length normalization to unit length is a non-affine transformation of the data. Or in other words: For the standard column-wise normalization, each example (row) is re-scaled with the same normalizing matrix A, while in the row-based normalization, each row is re-scaled with its own individual factor, which does not preserve co-linearities across data points (document vectors) and hence is no affine transformation. So, I mixed up a few things in my head...
Sorry for that.
So, the publications you cited are really worth a look. Thanks again for the links and for pointing this out.
Best regards,
Ralf0 -
That's very interesting and unexpected. I'll be very curious to see whether your further investigation determines whether this is a useful future addition to RM.Ralf Klinkenberg wrote:
thanks for pointing this out. The cited definition of affine transformations as well as your example are both correct. Hence both, normalization and attribute weighting, are indeed affine transformations. This makes the cited scaling invariant clustering techniques quite worth-while to look at.
Not a problem. You'll have to make a lot more errors to catch up to me! :-)
Sorry for the confusion my previous reply may have caused. I had an error in my thinking.
That makes sense. Two rather different computations that are both called "normalization", but one is affine, and the other isn't.
In text mining, examples (document vectors, i.e. rows in the data table) are typically normalized to unit lenght, i.e. the normalization there is row-wise, while usually normalizations refer to one attribute (column, variable) at a time over all examples (rows), i.e. in most non-text-mining cases, the normalization is column-wise. While the standard column-wise (attribute-wise) normalization is an affine transformation, the row-wise document length normalization to unit length is a non-affine transformation of the data. Or in other words: For the standard column-wise normalization, each example (row) is re-scaled with the same normalizing matrix A, while in the row-based normalization, each row is re-scaled with its own individual factor, which does not preserve co-linearities across data points (document vectors) and hence is no affine transformation. So, I mixed up a few things in my head...
Thanks,
Keith
0