keith wrote: FWIW, some poking around online with the famous search engine did uncover a few papers that seem to be addressing the same (or similar) problems:
http://jorlin.scripts.mit.edu/docs/publications/115-ScaleInvariantClustering.pdf
http://www.robots.ox.ac.uk/~vgg/publications/papers/fitzgibbon02.pdf
...
Ralf Klinkenberg wrote: The cited scale-invariant clustering techniques are only invariant to affine transformations, but not invariant to normalization and arbitrary weighting of attributes, because these are non-affine data transformations.
An affine transformation of R(2) is a function t: R(2) -> R(2) of the form
t(x) = A x + b
where A is an invertible 2x2 matrix and b is in R(2).
(Affine geometry can be defined in R(n) for any n >= 2; we restrict our attention here to the case n = 2.)
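To make the definition concrete, here is a minimal sketch (not taken from the cited papers) of how ordinary column-wise z-score normalization can be written in the form t(x) = A x + b. It assumes A is diagonal with entries 1/sigma_j and b has entries -mu_j/sigma_j, where mu_j and sigma_j are the mean and population standard deviation of column j. The function names and the toy data are made up for illustration.

```python
from statistics import mean, pstdev

def zscore_as_affine(rows):
    """Return (a_diag, b) so that z-scoring each column equals A x + b,
    with A = diag(a_diag). Assumes every column has non-zero spread."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sigmas = [pstdev(c) for c in cols]
    a_diag = [1.0 / s for s in sigmas]            # diagonal of A (invertible if all sigma > 0)
    b = [-m / s for m, s in zip(mus, sigmas)]     # offset vector
    return a_diag, b

def apply_affine(a_diag, b, x):
    """Apply t(x) = A x + b for a diagonal A."""
    return [a * xi + bi for a, xi, bi in zip(a_diag, x, b)]

# Toy data: three examples (rows), two attributes (columns).
rows = [[1.0, 10.0], [2.0, 30.0], [3.0, 20.0]]
a_diag, b = zscore_as_affine(rows)

# The same A and b are applied to every row.
transformed = [apply_affine(a_diag, b, x) for x in rows]
```

Attribute weighting is the simpler special case with b = 0 and the weights on the diagonal of A, as long as no weight is zero.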
Ralf Klinkenberg wrote: Thanks for pointing this out. The cited definition of affine transformations and your example are both correct. Hence both normalization and attribute weighting are indeed affine transformations, which makes the cited scale-invariant clustering techniques well worth looking at.
Sorry for the confusion my previous reply may have caused. I had an error in my thinking.
In text mining, examples (document vectors, i.e. rows in the data table) are typically normalized to unit length, so the normalization there is row-wise. In most non-text-mining cases, normalization refers to one attribute (column, variable) at a time over all examples (rows), so it is column-wise. While the standard column-wise (attribute-wise) normalization is an affine transformation, the row-wise normalization of documents to unit length is a non-affine transformation of the data. In other words: for the standard column-wise normalization, every example (row) is transformed with the same normalizing matrix A, while in the row-wise normalization each row is re-scaled with its own individual factor, which does not preserve collinearity across data points (document vectors) and hence is not an affine transformation. So, I mixed up a few things in my head...
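A quick way to see the difference is to take three points on a line and check whether they are still on a line afterwards. The sketch below is only an illustration with made-up helper names and toy points: it applies a column-wise re-scaling (the same weights for every row) and a row-wise normalization to unit length, then tests collinearity with a 2D cross product.

```python
import math

def column_scale(points, weights):
    """Column-wise re-scaling: the same weight per attribute for every row."""
    return [[w * v for w, v in zip(weights, p)] for p in points]

def row_normalize(points):
    """Row-wise normalization: each row is divided by its own Euclidean length."""
    out = []
    for p in points:
        norm = math.hypot(*p)
        out.append([v / norm for v in p])
    return out

def collinear(p, q, r, tol=1e-9):
    """True if three 2D points lie on one line (cross product of q-p and r-p is ~0)."""
    cross = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return abs(cross) < tol

# Three points on the vertical line x = 1 (which does not pass through the origin).
points = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]

col_ok = collinear(*column_scale(points, [0.5, 3.0]))   # still on a line
row_ok = collinear(*row_normalize(points))              # now on the unit circle
```

With these points, `col_ok` is `True` and `row_ok` is `False`: the column-wise scaling keeps the points on a line, while unit-length normalization moves them onto an arc of the unit circle.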