Vector attributes

bazeusz · July 2009

Hey,

I've got one question regarding attribute representation. I'm doing some image preprocessing (feature extraction) and some of the extracted attributes (features) are vectors. I mean e.g.

Attribute 1 (Density): [12, 23, 23, 54, 2, 43, 6]
Attribute 2 (OffsetN): [32, 45, 3]
Attribute 3 (OffsetS): [3, 5, 2, 1, 43, 1, 2]
Attribute 4: 12
.
.
.
Attribute N ...

How should I deal with this kind of attribute? E.g. I don't want a tree learner to split on a single value but on a whole attribute (vector). I thought it had something to do with value_series, but either I cannot set it up properly or it's not what I need.

Thanks,
baze

land · July 2009

Hi,
A tree learner compares the values of one single attribute and splits the examples into two groups: greater, or smaller/equal. Unfortunately this implies that it cannot cope with vector-valued data. Which vector is greater than another? In more than one dimension you cannot say...
Even if you provided it as a value series, it won't work at all. For most learners you will have to find a transformation into a tabular format with single values. If you store the vector values as single attributes, for example Attribute1_1 .. Attribute1_7, you can use the AttributeConstruction operator to calculate complex measures for comparing the vectors, for example the distance to a hyperplane. This value could then be used by the decision tree later on, while the original data attributes could be filtered out so that they don't disturb the learning process...
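Outside RapidMiner, the flattening idea can be sketched in plain Python. This is only an illustration, not the AttributeConstruction operator itself: the helper names flatten and hyperplane_distance, the column name Density_dist, and the hyperplane (normal and offset) are all made up for the example.

```python
import math

def flatten(name, vector):
    """Turn one vector-valued attribute into single-valued columns name_1 .. name_k."""
    return {f"{name}_{i}": v for i, v in enumerate(vector, start=1)}

def hyperplane_distance(vector, normal, offset):
    """Signed distance of `vector` from the hyperplane normal . x + offset = 0."""
    norm = math.sqrt(sum(w * w for w in normal))
    return (sum(w * x for w, x in zip(normal, vector)) + offset) / norm

# The Density vector from the first post.
density = [12, 23, 23, 54, 2, 43, 6]
row = flatten("Density", density)

# Hypothetical hyperplane, chosen only for illustration: Density_1 = 10.
normal = [1, 0, 0, 0, 0, 0, 0]
offset = -10

# One scalar summary of the whole vector that a tree learner can split on.
row["Density_dist"] = hyperplane_distance(density, normal, offset)
```

A tree can then split on Density_dist alone, while Density_1 .. Density_7 are filtered out before learning.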

Greetings,
Sebastian

bazeusz · July 2009

Hey Sebastian,

thanks for your reply.

In terms of trees, I was thinking about calculating a mean vector for every feature and splitting based on the distance from the mean. Well, I'm not sure whether there's much sense in doing this.

Anyway, what would be the aml representation for this kind of attribute? Can vector-valued data be treated as one when it comes to plots (variance, std. dev. etc.)?

br,
Piotr

fischer · July 2009

Hi,

I'm not absolutely sure what you are trying to achieve. If you are trying to compute the "mean vector" over all examples in your set, you can have one attribute per vector dimension, use the Aggregation operator to compute the means, and add the means as new attributes to the old example set using a Cartesian operator. Then you can compute the distance from this mean using an AttributeConstruction. Whether or not this distance from the mean vector is a useful attribute very much depends on your domain.
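As a rough sketch of the same computation outside RapidMiner, in plain Python: the example vectors below are invented (only the first one comes from the OffsetN attribute in the original post), Euclidean distance is just one possible choice, and mean_vector and euclidean are hypothetical helper names.

```python
import math

def mean_vector(vectors):
    """Per-dimension mean over all examples (one value per vector dimension)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def euclidean(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Made-up OffsetN values for three examples; all must have the same length.
examples = [[32, 45, 3], [30, 41, 5], [34, 49, 1]]

mean = mean_vector(examples)
# One new single-valued attribute per example: its distance from the mean vector.
distances = [euclidean(v, mean) for v in examples]
```

Each entry in distances is an ordinary numeric attribute, so a tree learner can split on it like any other.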

Cheers,
Simon
