remove uncorrelated attributes with respect to label attribute
wessel
New Altair Community Member
Hello,
How do I remove uncorrelated attributes with respect to my label attribute?
RemoveCorrelatedFeatures seems to remove intercorrelated features, instead of features related to the label attribute.
Also when I make a CorrelationMatrix the label attribute doesn't show up.
I guess I don't want to make Matrix, just 1 single row, which has pairwise correlation with my label attribute.
Regards,
Wessel
Answers
Hi Wessel,
Is this the sort of thing?
<operator name="Root" class="Process" expanded="yes">
    <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
        <parameter key="target_function" value="random"/>
    </operator>
    <operator name="CorrelationMatrix" class="CorrelationMatrix">
        <parameter key="create_weights" value="true"/>
    </operator>
    <operator name="AttributeWeightSelection" class="AttributeWeightSelection">
    </operator>
</operator>
Maybe.
But how can I see if it's working properly?
How is CorrelationMatrix ranking attributes?
I think by intercorrelation, but I might be wrong. (A lot of redundancy is bad.)
I guess what I want is correlation with respect to the label. (Predictive power is good.)
Because I'm working with weather data, I have some expectations of the outcome.
I expect wind-23, wind-47, wind-71, wind-94 to have the biggest autocorrelation.
But 47 and 71 are not in the top 10!
So I think it's calculating intercorrelation,
because it's returning attributes that are on the edges of my attribute interval 23-95.
Obviously they have less intercorrelation (redundancy) than attributes in the middle.
wind-23 0.8861533285392513
wind-95 0.8828218809552825
wind-24 0.8738616064506365
wind-94 0.8707726262127967
wind-25 0.8634980953805299
wind-93 0.8607096030931946
wind-26 0.855195678341873
wind-92 0.8526740358567213
wind-27 0.8488826077290765
wind-91 0.8465315199805834
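For comparison, here is a minimal Python sketch of the ranking asked for above: the Pearson correlation of each attribute with the label, sorted by absolute value. It is not what CorrelationMatrix does internally; the function names and the toy columns `a` and `b` are made up for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: correlation is undefined, treated as 0 here
    return cov / math.sqrt(var_x * var_y)

def rank_by_label_correlation(columns, label):
    """Return (name, correlation) pairs sorted by |correlation| with the label."""
    scores = {name: pearson(values, label) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

# Hypothetical toy data: 'a' tracks the label closely, 'b' does not.
label = [1.0, 2.0, 3.0, 4.0, 5.0]
columns = {
    "a": [2.0, 4.1, 5.9, 8.2, 10.0],
    "b": [3.0, 1.0, 4.0, 1.0, 5.0],
}
ranking = rank_by_label_correlation(columns, label)
for name, score in ranking:
    print(name, score)
```

Keeping only the top entries of `ranking` gives the "single row" of label correlations rather than a full matrix.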
Hi,
Unfortunately, CorrelationMatrix does not incorporate the label column. I think we will add a correlation-based weighting in the next version.
From a data miner's perspective, correlation is not always a suitable criterion for removing attributes. Take a look at the image at http://en.wikipedia.org/wiki/Correlation to get an impression of why correlation might be a bad choice. Most of the clear dependencies shown there can be discovered and used by a learner, although the correlation is 0!
Greetings,
Sebastian
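A small Python illustration of that last point, using made-up data: `y` is fully determined by `x`, yet the Pearson correlation is zero because the relationship is symmetric rather than linear.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: y = x^2, with x symmetric around zero.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [v * v for v in x]

r = pearson(x, y)
print(r)  # zero linear correlation despite a perfect functional dependency
```

An attribute filter based on this score alone would discard `x`, even though a learner could predict `y` from it exactly.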
Yes, but ...
I want to use correlation on 1 single attribute.
And this single attribute gets multiplied 100 times when I use a history of 100:
att1-0, att1-1, ..., att1-100
Now correlation is a good measure to find the att1-.* attributes that have high autocorrelation, i.e. predictive power.
Hi,
Sorry if I misunderstood you, but then a constant attribute would be the best attribute? I mean, if it's the label attribute, then data mining becomes really easy. If not, this attribute doesn't say anything about the label?
Curious,
Sebastian
Yes, in time series data this is a bit confusing.
If you have a better suggestion for names, please share it.
You measure multiple things, so you have multiple attributes.
So let's say you measure 2 things, x and y:
x y
------
x0 y0
x1 y1
x2 y2
x3 y3
x4 y4
But after you convert this time series data into windowed examples, you have 4 attributes and 1 label attribute, "x-0":
x-0  x-2  y-2  x-3  y-3
------------------------
x3   x1   y1   x0   y0
x4   x2   y2   x1   y1
now you can learn the function:
x-0 = f(x-2, y-2, x-3, y-3)
But when you take a really big window, you get a lot more attributes, and it becomes infeasible to do CFS attribute selection on them all.
So then I want to grab all x-.* attributes and learn some kind of autoregression function, preferably also using moving average smoothing:
x-0 = f(x-2, x-3, ..., x-10000)
Then I use this autoregression function, which has probably found a trend and a seasonal component, to construct a new attribute in my database.
So then I can learn
x-0 = f(x-2, y-2, x-3, y-3, seasonal)
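The windowing step described above can be sketched in Python as follows. The function `make_windowed` and the numeric values are hypothetical, chosen only to mirror the x/y example with lags 2 and 3; it is not the RapidMiner windowing operator.

```python
def make_windowed(series, label_name, lags):
    """Turn parallel time series into windowed examples.

    series: dict mapping a name to a list of values, all the same length.
    label_name: the series whose current value ("<name>-0") becomes the label.
    lags: positive offsets into the past that become attributes.
    """
    length = len(next(iter(series.values())))
    max_lag = max(lags)
    rows = []
    for t in range(max_lag, length):
        row = {f"{label_name}-0": series[label_name][t]}
        for lag in lags:
            for name, values in series.items():
                row[f"{name}-{lag}"] = values[t - lag]
        rows.append(row)
    return rows

# Hypothetical measurements standing in for x0..x4 and y0..y4.
series = {
    "x": [10, 11, 12, 13, 14],
    "y": [20, 21, 22, 23, 24],
}
rows = make_windowed(series, "x", lags=[2, 3])
for row in rows:
    print(row)
```

Five time steps with a maximum lag of 3 yield two examples, each with the label `x-0` and the attributes `x-2`, `y-2`, `x-3`, `y-3`.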
Hi again,
I think I now understand what you are aiming at. On the other hand, I don't see where you need the correlation...
What you are proposing seems to me to be equivalent to an additive regression in which the first model is learned only on the past label values. Although the AdditiveRegression in RM would not cope with that, you could easily simulate it using an AttributeConstruction.
Greetings,
Sebastian
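A rough Python sketch of that two-stage idea, outside RapidMiner and under simplifying assumptions: one lag per series, plain least-squares lines as the models, and a made-up toy series. Stage 1 is fitted only on the past label value; stage 2 is fitted on what stage 1 leaves unexplained.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for ys ~ intercept + slope * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return mean_y - slope * mean_x, slope

# Hypothetical toy series: x depends on its own past and on the past of y.
y = [1.0, 0.0, 2.0, 1.0, 3.0, 0.0, 2.0, 1.0, 3.0, 2.0]
x = [1.0]
for t in range(1, len(y)):
    x.append(0.5 * x[t - 1] + y[t - 1])

target = x[1:]   # x-0, the label
x_prev = x[:-1]  # x-1
y_prev = y[:-1]  # y-1

# Stage 1: model learned only on the past label values.
a, b = fit_line(x_prev, target)
stage1 = [a + b * v for v in x_prev]
residual = [actual - pred for actual, pred in zip(target, stage1)]

# Stage 2: explain the leftover error using the other attribute.
c, d = fit_line(y_prev, residual)
combined = [pred + c + d * v for pred, v in zip(stage1, y_prev)]

sse_stage1 = sum(r * r for r in residual)
sse_combined = sum((actual - pred) ** 2 for actual, pred in zip(target, combined))
print(sse_stage1, sse_combined)
```

The `stage1` predictions play the role of the constructed attribute: they are computed once from the label history and then handed to the second model.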
I found this nice picture here:
http://upload.wikimedia.org/wikipedia/commons/8/84/Acf.svg
A sine signal with noise on top,
its autocorrelation on the bottom.
In R, the functions acf and pacf can be used to produce such a plot.
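For anyone without R at hand, here is a minimal Python sketch of the usual sample autocorrelation estimate on a noisy sine. The period, noise level, and series length are arbitrary choices for illustration, and the function name `acf` is simply borrowed from R.

```python
import math
import random

def acf(values, max_lag):
    """Sample autocorrelation for lags 0..max_lag (normalised by the lag-0 term)."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    denom = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]

random.seed(0)  # fixed seed so the run is repeatable
period = 20
signal = [
    math.sin(2 * math.pi * t / period) + random.gauss(0, 0.3)
    for t in range(200)
]
correlations = acf(signal, 40)

# Lags with the strongest positive autocorrelation, excluding lag 0.
best = sorted(range(1, 41), key=lambda k: correlations[k], reverse=True)[:5]
print(best)
```

Lags near a full period come out strongly positive and lags near half a period strongly negative, which is the kind of pattern one would look for among the windowed x-.* attributes.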