GLM: weights vs. coefficients

kypexin
New Altair Community Member
Hi miners,
I am training a GLM model for binary classification, so essentially I am performing logistic regression.
My question is: how do I interpret the relation between the GLM model's weights output and the regression coefficients?
In many cases they are exactly the same, but some differ, and some by a very large magnitude. For example, for one feature the weight and the regression coefficient both equal 1.841; for another feature I observe a weight of 0.328 while the regression coefficient is 0.0002; for yet another the weight is -0.617 and the coefficient is -0.001.
(I use regularisation, so the whole coefficients/weights range is not that big, roughly between -2 and 2.)
Best Answers
-
Hi,
Where are the weights coming from? I assume from the weights port of the GLM operator? And are you looking at the "standardized" coefficients? The weights are simply the standardized coefficients and should be the same if you use the weights port of the GLM...
Hope this helps,
Ingo
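The mechanical relationship is easy to check yourself. Below is a minimal NumPy sketch (not RapidMiner/H2O code; the function name and the data are made up for illustration) of the usual convention, under the assumption that the operator standardizes each predictor to unit variance, so the standardized coefficient is the raw coefficient times the predictor's standard deviation:

```python
import numpy as np

def standardize_coefficients(raw_coefs, X):
    """Convert raw GLM coefficients to standardized ones.

    Assumes the model standardized each predictor to unit variance,
    so beta_std = beta_raw * sd(x). Check your operator's docs for
    the exact convention it uses.
    """
    sds = X.std(axis=0, ddof=0)
    return raw_coefs * sds

# A feature measured in small units gets a tiny raw coefficient
# but a comparable standardized one:
rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(0, 1, 1000),      # sd around 1
                     rng.normal(0, 1500, 1000)])  # sd around 1500
raw = np.array([1.841, 0.0002])
std = standardize_coefficients(raw, X)
# std[0] stays near 1.841, while std[1] lands near 0.3:
# exactly the kind of gap described in the question.
```

This reproduces the pattern in the question: a raw coefficient of 0.0002 on a large-scale feature becomes a standardized weight around 0.3.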
-
@kypexin Sorry for the confusion! I definitely misunderstood your initial question. Hopefully I can be more helpful here.
If you are using the score on new raw data, then you will want to use the normal coefficients. The standardized coefficients are adjusted so they are comparable but they won't work to generate a score (unless you have normalized all the input data based on standard errors, which is very unlikely).
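To see why the two sets of coefficients are not interchangeable at scoring time, here is a small Python sketch (the numbers are hypothetical, not from this thread): the raw-coefficient equation applied to raw inputs and the standardized-coefficient equation applied to standardized inputs give the same probability, but only because the new observation is transformed first.

```python
import numpy as np

def score(intercept, coefs, x):
    """Logistic score: p = 1 / (1 + exp(-(b0 + b.x)))."""
    y = intercept + np.dot(coefs, x)
    return 1.0 / (1.0 + np.exp(-y))

# Hypothetical fitted values for a single feature:
mu, sd = 50.0, 10.0            # training mean / sd of the feature
b_raw = 0.04                   # raw (non-standardized) coefficient
b_std = b_raw * sd             # standardized coefficient
b0_raw = -2.0
b0_std = b0_raw + b_raw * mu   # the intercept shifts too

x_new = 63.0                   # a new raw observation
p_raw = score(b0_raw, np.array([b_raw]), np.array([x_new]))
p_std = score(b0_std, np.array([b_std]), np.array([(x_new - mu) / sd]))
# p_raw == p_std only because x_new was standardized first;
# plugging raw x_new into the standardized equation would be wrong.
```

Algebraically, b0_std + b_std*(x-mu)/sd expands back to b0_raw + b_raw*x, which is why the two scores agree.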
Answers
-
@kypexin Very interesting question!
In theory the GLM with the binomial link function/IRLSM and the logistic regression with IRLSM are the same, but only if all the other parameters are the same. See the attached simple example where you can confirm this:

<?xml version="1.0" encoding="UTF-8"?><process version="9.2.001"> <context> <input/> <output/> <macros/> </context> <operator activated="true" class="process" compatibility="9.2.001" expanded="true" name="Process" origin="GENERATED_TUTORIAL"> <parameter key="logverbosity" value="init"/> <parameter key="random_seed" value="2001"/> <parameter key="send_mail" value="never"/> <parameter key="notification_email" value=""/> <parameter key="process_duration_for_mail" value="30"/> <parameter key="encoding" value="SYSTEM"/> <process expanded="true"> <operator activated="true" class="retrieve" compatibility="9.2.001" expanded="true" height="68" name="Retrieve Deals" origin="GENERATED_TUTORIAL" width="90" x="45" y="34"> <parameter key="repository_entry" value="//Samples/data/Deals"/> </operator> <operator activated="true" class="multiply" compatibility="9.2.001" expanded="true" height="103" name="Multiply" width="90" x="179" y="34"/> <operator activated="true" class="h2o:logistic_regression" compatibility="9.2.000" expanded="true" height="124" name="Logistic Regression" origin="GENERATED_TUTORIAL" width="90" x="313" y="34"> <parameter key="solver" value="AUTO"/> <parameter key="reproducible" value="true"/> <parameter key="maximum_number_of_threads" value="4"/> <parameter key="use_regularization" value="false"/> <parameter key="lambda_search" value="false"/> <parameter key="number_of_lambdas" value="0"/> <parameter key="lambda_min_ratio" value="0.0"/> <parameter key="early_stopping" value="true"/> <parameter key="stopping_rounds" value="3"/> <parameter key="stopping_tolerance" value="0.001"/> <parameter key="standardize" value="true"/> <parameter key="non-negative_coefficients" value="false"/> <parameter key="add_intercept" value="true"/> <parameter 
key="compute_p-values" value="true"/> <parameter key="remove_collinear_columns" value="true"/> <parameter key="missing_values_handling" value="MeanImputation"/> <parameter key="max_iterations" value="0"/> <parameter key="max_runtime_seconds" value="0"/> </operator> <operator activated="true" class="apply_model" compatibility="9.2.001" expanded="true" height="82" name="Apply Model" origin="GENERATED_TUTORIAL" width="90" x="447" y="34"> <list key="application_parameters"/> <parameter key="create_view" value="false"/> </operator> <operator activated="true" class="performance_classification" compatibility="9.2.001" expanded="true" height="82" name="Performance" origin="GENERATED_TUTORIAL" width="90" x="581" y="34"> <parameter key="main_criterion" value="first"/> <parameter key="accuracy" value="true"/> <parameter key="classification_error" value="false"/> <parameter key="kappa" value="false"/> <parameter key="weighted_mean_recall" value="false"/> <parameter key="weighted_mean_precision" value="false"/> <parameter key="spearman_rho" value="false"/> <parameter key="kendall_tau" value="false"/> <parameter key="absolute_error" value="false"/> <parameter key="relative_error" value="false"/> <parameter key="relative_error_lenient" value="false"/> <parameter key="relative_error_strict" value="false"/> <parameter key="normalized_absolute_error" value="false"/> <parameter key="root_mean_squared_error" value="false"/> <parameter key="root_relative_squared_error" value="false"/> <parameter key="squared_error" value="false"/> <parameter key="correlation" value="false"/> <parameter key="squared_correlation" value="false"/> <parameter key="cross-entropy" value="false"/> <parameter key="margin" value="false"/> <parameter key="soft_margin_loss" value="false"/> <parameter key="logistic_loss" value="false"/> <parameter key="skip_undefined_labels" value="true"/> <parameter key="use_example_weights" value="true"/> <list key="class_weights"/> </operator> <operator activated="true" 
class="h2o:generalized_linear_model" compatibility="9.2.000" expanded="true" height="124" name="Generalized Linear Model" width="90" x="313" y="187"> <parameter key="family" value="AUTO"/> <parameter key="link" value="family_default"/> <parameter key="solver" value="AUTO"/> <parameter key="reproducible" value="false"/> <parameter key="maximum_number_of_threads" value="4"/> <parameter key="use_regularization" value="false"/> <parameter key="lambda_search" value="false"/> <parameter key="number_of_lambdas" value="0"/> <parameter key="lambda_min_ratio" value="0.0"/> <parameter key="early_stopping" value="true"/> <parameter key="stopping_rounds" value="3"/> <parameter key="stopping_tolerance" value="0.001"/> <parameter key="standardize" value="true"/> <parameter key="non-negative_coefficients" value="false"/> <parameter key="add_intercept" value="true"/> <parameter key="compute_p-values" value="true"/> <parameter key="remove_collinear_columns" value="true"/> <parameter key="missing_values_handling" value="MeanImputation"/> <parameter key="max_iterations" value="0"/> <parameter key="specify_beta_constraints" value="false"/> <list key="beta_constraints"/> <parameter key="max_runtime_seconds" value="0"/> <list key="expert_parameters"/> </operator> <operator activated="true" class="apply_model" compatibility="9.2.001" expanded="true" height="82" name="Apply Model (2)" origin="GENERATED_TUTORIAL" width="90" x="447" y="187"> <list key="application_parameters"/> <parameter key="create_view" value="false"/> </operator> <operator activated="true" class="performance_classification" compatibility="9.2.001" expanded="true" height="82" name="Performance (2)" origin="GENERATED_TUTORIAL" width="90" x="581" y="187"> <parameter key="main_criterion" value="first"/> <parameter key="accuracy" value="true"/> <parameter key="classification_error" value="false"/> <parameter key="kappa" value="false"/> <parameter key="weighted_mean_recall" value="false"/> <parameter 
key="weighted_mean_precision" value="false"/> <parameter key="spearman_rho" value="false"/> <parameter key="kendall_tau" value="false"/> <parameter key="absolute_error" value="false"/> <parameter key="relative_error" value="false"/> <parameter key="relative_error_lenient" value="false"/> <parameter key="relative_error_strict" value="false"/> <parameter key="normalized_absolute_error" value="false"/> <parameter key="root_mean_squared_error" value="false"/> <parameter key="root_relative_squared_error" value="false"/> <parameter key="squared_error" value="false"/> <parameter key="correlation" value="false"/> <parameter key="squared_correlation" value="false"/> <parameter key="cross-entropy" value="false"/> <parameter key="margin" value="false"/> <parameter key="soft_margin_loss" value="false"/> <parameter key="logistic_loss" value="false"/> <parameter key="skip_undefined_labels" value="true"/> <parameter key="use_example_weights" value="true"/> <list key="class_weights"/> </operator> <connect from_op="Retrieve Deals" from_port="output" to_op="Multiply" to_port="input"/> <connect from_op="Multiply" from_port="output 1" to_op="Logistic Regression" to_port="training set"/> <connect from_op="Multiply" from_port="output 2" to_op="Generalized Linear Model" to_port="training set"/> <connect from_op="Logistic Regression" from_port="model" to_op="Apply Model" to_port="model"/> <connect from_op="Logistic Regression" from_port="exampleSet" to_op="Apply Model" to_port="unlabelled data"/> <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/> <connect from_op="Apply Model" from_port="model" to_port="result 2"/> <connect from_op="Performance" from_port="performance" to_port="result 1"/> <connect from_op="Generalized Linear Model" from_port="model" to_op="Apply Model (2)" to_port="model"/> <connect from_op="Generalized Linear Model" from_port="exampleSet" to_op="Apply Model (2)" to_port="unlabelled data"/> <connect from_op="Apply Model 
(2)" from_port="labelled data" to_op="Performance (2)" to_port="labelled data"/> <connect from_op="Apply Model (2)" from_port="model" to_port="result 4"/> <connect from_op="Performance (2)" from_port="performance" to_port="result 3"/> <portSpacing port="source_input 1" spacing="0"/> <portSpacing port="sink_result 1" spacing="0"/> <portSpacing port="sink_result 2" spacing="0"/> <portSpacing port="sink_result 3" spacing="0"/> <portSpacing port="sink_result 4" spacing="0"/> <portSpacing port="sink_result 5" spacing="0"/> </process> </operator> </process>
However, for these to match exactly, you do need to make sure all other options are the same (e.g., using the same lambda value if you are using regularization).
Additionally, since neither is solved in closed form but by iterative approximation, if you have a lot of predictors with shared covariance it is also conceivable that you could get different coefficients due to random effects. Setting a random seed for both will ensure you get reproducible results (but still might not completely solve this issue). IIRC, the more shared covariance between predictor sets, the more unstable the coefficients will be (keep in mind the multicollinearity issues from linear regression which cause coefficient inflation; the same basic dynamic is at work here).
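The instability point can be illustrated with a quick NumPy check (synthetic data, not from the example process above): when two predictors are near-copies of each other, the Gram matrix that an iterative solver like IRLS repeatedly works with becomes ill-conditioned, which is what lets coefficients swing between otherwise similar fits.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)   # almost a copy of x1
x3 = rng.normal(size=n)                    # an independent predictor

X_collinear = np.column_stack([x1, x2, x3])
X_clean = np.column_stack([x1, x3])

# Condition number of X'X: large values mean the normal equations
# are ill-conditioned, so small data changes move the coefficients a lot.
cond_collinear = np.linalg.cond(X_collinear.T @ X_collinear)
cond_clean = np.linalg.cond(X_clean.T @ X_clean)
# cond_collinear is orders of magnitude larger than cond_clean.
```

Regularization (as the original poster uses) shrinks exactly this kind of inflation, which is one reason regularized coefficients stay in a modest range.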
-
Hi @Telcontar120
Thanks for the example; it makes a clear point, but it is not exactly what I was asking about. I am not trying to compare GLM and LR; I have just one GLM model, and I am comparing its model coefficients with its feature weights. I think Ingo's answer cleared it up pretty well.
Hi @IngoRM
Thanks for the advice. I was looking at the first column of coefficients (not standardized). In fact, the standardized coefficients and the weights from the GLM weights output port are the same, so my question is answered.
However, I now have a second question: if I use the derived coefficients in a regression equation (which, for example, I then put into code to make predictions on new data), should I use the normal or the standardized coefficients, or does it make no difference? To be exact, I am using the following formulas to calculate the probability on new data:
p = exp(y*) / (1 + exp(y*)), where
y* = log( p / (1-p) ) = b0 + b1*x1 + b2*x2 + ... + bk*xk
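As a sanity check, the formulas above translate directly into code. Here is a minimal plain-Python sketch (the coefficient values are made up for illustration; with raw input data this expects the non-standardized coefficients):

```python
import math

def predict_probability(intercept, coefs, xs):
    """Score a new observation with fitted logistic coefficients:
    y* = b0 + b1*x1 + ... + bk*xk,  p = exp(y*) / (1 + exp(y*)).
    """
    y_star = intercept + sum(b * x for b, x in zip(coefs, xs))
    # Note: math.exp can overflow for very large |y_star|; for
    # production scoring a numerically stable sigmoid is safer.
    return math.exp(y_star) / (1.0 + math.exp(y_star))

# Hypothetical intercept and coefficients:
p = predict_probability(-0.5, [1.841, -0.617], [0.2, 1.0])
```

With no features, the formula reduces to p = exp(b0) / (1 + exp(b0)), so a zero intercept gives p = 0.5, a handy spot check.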
-
Thanks @Telcontar120, pretty clear.