Logistic Regression - Normalization does not change Attribute Weights
Hello,
I am new here, and new to statistics and data mining in general. Apologies if I am asking a really stupid question.
My question is about logistic regression and normalizing data. I have a data set in which some columns are skewed and the columns are on different scales. So I wanted to apply normalization (centering, scaling and a Box-Cox transformation for skewness) prior to logistic regression. But first I wanted to check to what extent normalization changes the results.
I see that normalization prior to logistic regression changes the coefficients; however, the attribute weights are exactly the same with and without normalization. Am I missing something here?
Attached you can find my design for the analysis. (Logistic Regression and Normalization added with default settings)
By default, the Logistic Regression operator normalizes the data itself (it uses the word 'standardize' instead of 'normalize'). Uncheck the 'standardize' option. Whether you normalize or not does make a difference to the coefficients. Check the process below:
<?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="8.0.001" expanded="true" height="68" name="Retrieve Sonar" width="90" x="246" y="187">
<parameter key="repository_entry" value="//Samples/data/Sonar"/>
</operator>
<operator activated="true" class="multiply" compatibility="8.0.001" expanded="true" height="103" name="Multiply" width="90" x="447" y="187"/>
<operator activated="true" class="normalize" compatibility="8.0.001" expanded="true" height="103" name="Normalize" width="90" x="648" y="340"/>
<operator activated="true" class="h2o:logistic_regression" compatibility="7.6.001" expanded="true" height="124" name="Logistic Regression (2)" width="90" x="849" y="340">
<parameter key="standardize" value="false"/>
</operator>
<operator activated="true" class="h2o:logistic_regression" compatibility="7.6.001" expanded="true" height="124" name="Logistic Regression" width="90" x="849" y="187">
<parameter key="standardize" value="false"/>
</operator>
<connect from_op="Retrieve Sonar" from_port="output" to_op="Multiply" to_port="input"/>
<connect from_op="Multiply" from_port="output 1" to_op="Logistic Regression" to_port="training set"/>
<connect from_op="Multiply" from_port="output 2" to_op="Normalize" to_port="example set input"/>
<connect from_op="Normalize" from_port="example set output" to_op="Logistic Regression (2)" to_port="training set"/>
<connect from_op="Logistic Regression (2)" from_port="model" to_port="result 2"/>
<connect from_op="Logistic Regression" from_port="model" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
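The same effect can be sketched outside RapidMiner. The snippet below is only an illustration under stated assumptions: the tiny data set is made up, `fit_logistic` is a hypothetical helper (a plain, unregularized Newton-Raphson fit with one attribute), and it is not how the H2O operator is implemented. It shows that a z-transformation changes the raw coefficient by a factor of the attribute's standard deviation, while the predictions stay the same. If the attribute weights are derived from scale-free (standardized) coefficients, which is an assumption here, that would be consistent with them not changing.

```python
import math
import statistics

def fit_logistic(xs, ys, iterations=25):
    """Fit y ~ sigmoid(b0 + b1*x) by Newton-Raphson, without regularization."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Accumulate the gradient (g) and the negative Hessian (h)
        # of the log-likelihood at the current coefficients.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        # Newton step: solve the 2x2 system h * delta = g by hand.
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def predict(b0, b1, x):
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

# Made-up, non-separable example data (one attribute, binary label).
xs = [10, 20, 30, 40, 50, 60, 70, 80]
ys = [0, 0, 1, 0, 1, 0, 1, 1]

# Z-transformation: subtract the mean, divide by the standard deviation.
mean = statistics.mean(xs)
sd = statistics.pstdev(xs)
zs = [(x - mean) / sd for x in xs]

raw_b0, raw_b1 = fit_logistic(xs, ys)
std_b0, std_b1 = fit_logistic(zs, ys)

print("coefficient on raw data:         ", raw_b1)
print("coefficient on standardized data:", std_b1)
print("raw coefficient * sd:            ", raw_b1 * sd)

# Largest difference between the two models' predicted probabilities.
max_pred_diff = max(
    abs(predict(raw_b0, raw_b1, x) - predict(std_b0, std_b1, z))
    for x, z in zip(xs, zs)
)
print("largest prediction difference:   ", max_pred_diff)
```

The two fitted coefficients differ, but the raw coefficient multiplied by the standard deviation matches the standardized one. Note that this only holds for linear rescaling; a non-linear step such as Box-Cox changes the model itself.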
Try outputting the "pre" (preprocessing model) port of the Normalize operator; that will tell you how it is normalizing the data.