Trying to understand MLP output
herbert12345
New Altair Community Member
Hi,
I am currently trying to understand the output of the W-MultilayerPerceptron operator. Let us consider a toy model without hidden layers. Output might look like this.
From my understanding this should be equivalent to a linear regression. So I train a LinearRegression model on the same input data, using the predictions of the above "MLP" as the label (in order to rule out differences in the fitting algorithm). The results show that the linear model indeed reproduces the "MLP" output perfectly. The coefficients, however, are completely different:
Linear Node 0
    Inputs            Weights
    Threshold         0.4052907755005098
    Attrib O3        -0.2617907901506467
    Attrib NO2       -0.05083306647141619
    Attrib Altitude  -0.14881316186685326
    Attrib z          0.35660878655615114
    Attrib sza_rad   -0.44846864905805994
Class
    Input
    Node 0
I assume that this is because of the normalization done in the MLP operator. So here is the question: assume I want to implement the above "MLP" in my own code: how must I process my data and the results?
- 0.0000070221 * O3
- 0.0000717637 * NO2
- 0.0004435178 * Altitude
+ 0.0003188475 * z
- 0.0040543204 * SZA*pi/180.
+ 0.0145570907
Thanks for your reply
Answers
From my understanding Linear Regression and a Single Layer Perceptron should produce different weight values.
A single layer perceptron starts with random weights, then:
Takes a single data point.
Propagates the input forward through the network.
Calculates the error.
Computes the gradient of the error with respect to the weights.
Moves the weights against the gradient, scaled by the learning rate.
Repeats.
Linear regression, by contrast, calculates the optimal weights in closed form.
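The contrast can be sketched in a few lines of NumPy. The synthetic data, learning rate, and epoch count below are my own choices for illustration, not something from the operators; with a linear output node and a noise-free target, both routes land on essentially the same weights:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 2.0 * X[:, 0] - 3.0 * X[:, 1] + 1.0  # noise-free linear target

# Closed-form least squares (what linear regression does)
Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
w_closed = np.linalg.lstsq(Xb, y, rcond=None)[0]

# Single-layer perceptron with a linear output, trained by per-sample SGD
w_sgd = rng.normal(scale=0.1, size=3)  # random initial weights (w1, w2, bias)
lr = 0.01                              # learning rate
for epoch in range(200):
    for xi, yi in zip(Xb, y):
        err = xi @ w_sgd - yi          # forward pass and error
        w_sgd -= lr * err * xi         # step against the gradient

print(w_closed)  # ≈ [ 2. -3.  1.]
print(w_sgd)     # converges towards the same weights
```

So the training procedures differ, but on a well-behaved problem the SGD weights drift towards the closed-form solution.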
About data normalisation:
The Neural Net operator has an option to turn off the data normalisation.
Alternatively, I think you could normalise your data yourself beforehand, so the operator's normalisation changes nothing,
using: (value - min) / (max - min)
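As a minimal sketch of that min-max scaling in Python (the function name is mine):

```python
def minmax(values):
    """Scale a list of numbers to [0, 1] via (value - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

print(minmax([10.0, 15.0, 20.0]))  # [0.0, 0.5, 1.0]
```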
Thank you for your reply.
I understand that they might take different routes to obtain their weights. But assuming a fair amount of convergence, the weights should end up being about the same, up to normalization that is. Indeed I managed to make them the same by turning on the "I" and "C" options in the W-MLP operator.
I think I have managed to understand how things work by now. The problem was in part caused by a misunderstanding of mine as to how things work. Still it troubles me that the W-MLP output is not complete in the sense that the normalization employed is not documented. (I believe now that it normalizes both attributes and labels to the interval [-1,1] using 2*(value-min)/(max-min)-1).
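Assuming that guess about the normalisation is right, applying the "MLP" weights in your own code would mean mapping each input into [-1, 1] using the training min/max, and mapping the network's output back to the original label scale. A sketch of the two maps (function names are mine):

```python
def to_pm1(v, vmin, vmax):
    """Map v from [vmin, vmax] to [-1, 1]: 2*(v - vmin)/(vmax - vmin) - 1."""
    return 2.0 * (v - vmin) / (vmax - vmin) - 1.0

def from_pm1(v, vmin, vmax):
    """Inverse map: take a value in [-1, 1] back to the [vmin, vmax] scale."""
    return (v + 1.0) / 2.0 * (vmax - vmin) + vmin

# Round trip: normalise an attribute value, then recover it.
x = to_pm1(320.0, 250.0, 450.0)
print(from_pm1(x, 250.0, 450.0))  # ≈ 320.0
```

The important point is that vmin and vmax must come from the training data, otherwise new data is mapped onto a different scale than the one the weights were fitted on.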
What bothers me though is that my final model (i.e. with hidden layers) appears to have a certain bias. Well, I guess I can fix that.
Thanks for helping.
I believe this is standard when the tanh sigmoid function is used:
2*(value - min)/(max - min) - 1, which maps to [-1, 1].
When the normal sigmoid, 1 / (1 + exp(-x)), is used, the data is normalised to
(value - min)/(max - min), which maps to [0, 1].
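In other words, the normalisation range is chosen to match the output range of the activation function. A quick check in Python of the two saturation ranges:

```python
import math

def sigmoid(x):
    """Logistic sigmoid, 1 / (1 + exp(-x)); output lies in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Logistic sigmoid saturates towards 0 and 1, so [0, 1] scaling matches it.
print(sigmoid(-20.0), sigmoid(20.0))

# tanh saturates towards -1 and 1, so [-1, 1] scaling matches it.
print(math.tanh(-20.0), math.tanh(20.0))
```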
This is indeed poorly documented.
Should I take a look in WEKA's source code? Or the RM source code?
What do you mean that the final model has a certain bias?
Don't all learners have a certain bias?
Edit:
This link briefly mentions normalisation:
http://en.wikiversity.org/wiki/Learning_and_neural_networks
This kind of makes sense. Although through experimentation I found that the only way to get things right is to normalize to [-1, 1] and use standard sigmoid nodes, as in 1/(1+exp(-x)). Maybe a look into the source code would help to clear things up.
About the bias: Looking closer I see that for some reason the prediction is actually off by a linear map, that is, I get good correlations (as in 0.999...) but scatter plots show that the model is rather off. This could easily be fixed by applying a linear model in post of course, but I think it is strange.
Edit: My fault. Shouldn't wonder about offsets if training data and validation data are processed in different ways ... :-[