"Understanding Linear Regression Model"
aborg
New Altair Community Member
Hello,
The model produced by the Linear Regression operator has the following columns:
- Attribute
- Coefficient
- Std. Error
- Std. Coefficient
- Tolerance
- t-Stat
- p-Value
- Code
Here is my simple process for investigation:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.008">
<context>
<input>
<location>//NewLocalRepository/wiki_regression_example_mass_height</location>
</input>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
<process expanded="true" height="427" width="675">
<operator activated="true" class="linear_regression" compatibility="5.2.008" expanded="true" height="94" name="Linear Regression" width="90" x="179" y="30">
<parameter key="feature_selection" value="none"/>
<parameter key="eliminate_colinear_features" value="false"/>
<parameter key="ridge" value="0.0"/>
</operator>
<connect from_port="input 1" to_op="Linear Regression" to_port="training set"/>
<connect from_op="Linear Regression" from_port="model" to_port="result 1"/>
<connect from_op="Linear Regression" from_port="weights" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
The data (from Wikipedia http://en.wikipedia.org/wiki/Simple_linear_regression#Numerical_example); you can store it in your repository and use it as an input:
<object-stream>
<com.rapidminer.example.set.SimpleExampleSet id="1" serialization="custom">
<com.rapidminer.operator.AbstractIOObject>
<default>
<source>Linear Regression</source>
</default>
</com.rapidminer.operator.AbstractIOObject>
<com.rapidminer.operator.ResultObjectAdapter>
<default>
<annotations id="2">
<keyValueMap id="3"/>
</annotations>
</default>
</com.rapidminer.operator.ResultObjectAdapter>
<com.rapidminer.example.set.AbstractExampleSet>
<default>
<idMap id="4"/>
<statisticsMap id="5">
<entry>
<string>Mass</string>
<linked-list id="6">
<NumericalStatistics id="7">
<sum>931.1700000000001</sum>
<squaredSum>58498.5439</squaredSum>
<valueCounter>15</valueCounter>
</NumericalStatistics>
<WeightedNumericalStatistics id="8">
<sum>931.1700000000001</sum>
<squaredSum>58498.5439</squaredSum>
<totalWeight>15.0</totalWeight>
<count>15.0</count>
</WeightedNumericalStatistics>
<com.rapidminer.example.MinMaxStatistics id="9">
<minimum>52.21</minimum>
<maximum>74.46</maximum>
</com.rapidminer.example.MinMaxStatistics>
<UnknownStatistics id="10">
<unknownCounter>0</unknownCounter>
</UnknownStatistics>
</linked-list>
</entry>
<entry>
<string>Height</string>
<linked-list id="11">
<NumericalStatistics id="12">
<sum>24.759999999999998</sum>
<squaredSum>41.0532</squaredSum>
<valueCounter>15</valueCounter>
</NumericalStatistics>
<WeightedNumericalStatistics id="13">
<sum>24.759999999999998</sum>
<squaredSum>41.0532</squaredSum>
<totalWeight>15.0</totalWeight>
<count>15.0</count>
</WeightedNumericalStatistics>
<com.rapidminer.example.MinMaxStatistics id="14">
<minimum>1.47</minimum>
<maximum>1.83</maximum>
</com.rapidminer.example.MinMaxStatistics>
<UnknownStatistics id="15">
<unknownCounter>0</unknownCounter>
</UnknownStatistics>
</linked-list>
</entry>
</statisticsMap>
</default>
</com.rapidminer.example.set.AbstractExampleSet>
<com.rapidminer.example.set.SimpleExampleSet>
<default>
<attributes class="SimpleAttributes" id="16">
<attributes class="linked-list" id="17">
<AttributeRole id="18">
<special>false</special>
<attribute class="NumericalAttribute" id="19" serialization="custom">
<com.rapidminer.example.table.AbstractAttribute>
<default>
<annotations id="20">
<keyValueMap id="21"/>
</annotations>
<attributeDescription id="22">
<name>Height</name>
<valueType>2</valueType>
<blockType>1</blockType>
<defaultValue>0.0</defaultValue>
<index>0</index>
</attributeDescription>
<constructionDescription>Height</constructionDescription>
<statistics class="linked-list" id="23">
<NumericalStatistics id="24">
<sum>24.759999999999998</sum>
<squaredSum>41.0532</squaredSum>
<valueCounter>15</valueCounter>
</NumericalStatistics>
<WeightedNumericalStatistics id="25">
<sum>24.759999999999998</sum>
<squaredSum>41.0532</squaredSum>
<totalWeight>15.0</totalWeight>
<count>15.0</count>
</WeightedNumericalStatistics>
<com.rapidminer.example.MinMaxStatistics id="26">
<minimum>1.47</minimum>
<maximum>1.83</maximum>
</com.rapidminer.example.MinMaxStatistics>
<UnknownStatistics id="27">
<unknownCounter>0</unknownCounter>
</UnknownStatistics>
</statistics>
<transformations id="28"/>
</default>
</com.rapidminer.example.table.AbstractAttribute>
</attribute>
</AttributeRole>
<AttributeRole id="29">
<special>true</special>
<specialName>label</specialName>
<attribute class="NumericalAttribute" id="30" serialization="custom">
<com.rapidminer.example.table.AbstractAttribute>
<default>
<annotations id="31">
<keyValueMap id="32"/>
</annotations>
<attributeDescription id="33">
<name>Mass</name>
<valueType>2</valueType>
<blockType>1</blockType>
<defaultValue>0.0</defaultValue>
<index>1</index>
</attributeDescription>
<constructionDescription>Mass</constructionDescription>
<statistics class="linked-list" id="34">
<NumericalStatistics id="35">
<sum>931.1700000000001</sum>
<squaredSum>58498.5439</squaredSum>
<valueCounter>15</valueCounter>
</NumericalStatistics>
<WeightedNumericalStatistics id="36">
<sum>931.1700000000001</sum>
<squaredSum>58498.5439</squaredSum>
<totalWeight>15.0</totalWeight>
<count>15.0</count>
</WeightedNumericalStatistics>
<com.rapidminer.example.MinMaxStatistics id="37">
<minimum>52.21</minimum>
<maximum>74.46</maximum>
</com.rapidminer.example.MinMaxStatistics>
<UnknownStatistics id="38">
<unknownCounter>0</unknownCounter>
</UnknownStatistics>
</statistics>
<transformations id="39"/>
</default>
</com.rapidminer.example.table.AbstractAttribute>
</attribute>
</AttributeRole>
</attributes>
</attributes>
<exampleTable class="com.rapidminer.example.table.MemoryExampleTable" id="40">
<attributes id="41">
<NumericalAttribute id="42" serialization="custom">
<com.rapidminer.example.table.AbstractAttribute>
<default>
<annotations id="43">
<keyValueMap id="44"/>
</annotations>
<attributeDescription reference="22"/>
<constructionDescription>Height</constructionDescription>
<statistics class="linked-list" id="45">
<NumericalStatistics id="46">
<sum>0.0</sum>
<squaredSum>0.0</squaredSum>
<valueCounter>0</valueCounter>
</NumericalStatistics>
<WeightedNumericalStatistics id="47">
<sum>0.0</sum>
<squaredSum>0.0</squaredSum>
<totalWeight>0.0</totalWeight>
<count>0.0</count>
</WeightedNumericalStatistics>
<com.rapidminer.example.MinMaxStatistics id="48">
<minimum>Infinity</minimum>
<maximum>-Infinity</maximum>
</com.rapidminer.example.MinMaxStatistics>
<UnknownStatistics id="49">
<unknownCounter>0</unknownCounter>
</UnknownStatistics>
</statistics>
<transformations id="50"/>
</default>
</com.rapidminer.example.table.AbstractAttribute>
</NumericalAttribute>
<NumericalAttribute id="51" serialization="custom">
<com.rapidminer.example.table.AbstractAttribute>
<default>
<annotations id="52">
<keyValueMap id="53"/>
</annotations>
<attributeDescription reference="33"/>
<constructionDescription>Mass</constructionDescription>
<statistics class="linked-list" id="54">
<NumericalStatistics id="55">
<sum>0.0</sum>
<squaredSum>0.0</squaredSum>
<valueCounter>0</valueCounter>
</NumericalStatistics>
<WeightedNumericalStatistics id="56">
<sum>0.0</sum>
<squaredSum>0.0</squaredSum>
<totalWeight>0.0</totalWeight>
<count>0.0</count>
</WeightedNumericalStatistics>
<com.rapidminer.example.MinMaxStatistics id="57">
<minimum>Infinity</minimum>
<maximum>-Infinity</maximum>
</com.rapidminer.example.MinMaxStatistics>
<UnknownStatistics id="58">
<unknownCounter>0</unknownCounter>
</UnknownStatistics>
</statistics>
<transformations id="59"/>
</default>
</com.rapidminer.example.table.AbstractAttribute>
</NumericalAttribute>
</attributes>
<unusedColumnList class="linked-list" id="60"/>
<dataList id="61">
<com.rapidminer.example.table.DoubleArrayDataRow id="62">
<data id="63">
<double>1.47</double>
<double>52.21</double>
</data>
</com.rapidminer.example.table.DoubleArrayDataRow>
<com.rapidminer.example.table.DoubleArrayDataRow id="64">
<data id="65">
<double>1.5</double>
<double>53.12</double>
</data>
</com.rapidminer.example.table.DoubleArrayDataRow>
<com.rapidminer.example.table.DoubleArrayDataRow id="66">
<data id="67">
<double>1.52</double>
<double>54.48</double>
</data>
</com.rapidminer.example.table.DoubleArrayDataRow>
<com.rapidminer.example.table.DoubleArrayDataRow id="68">
<data id="69">
<double>1.55</double>
<double>55.84</double>
</data>
</com.rapidminer.example.table.DoubleArrayDataRow>
<com.rapidminer.example.table.DoubleArrayDataRow id="70">
<data id="71">
<double>1.57</double>
<double>57.2</double>
</data>
</com.rapidminer.example.table.DoubleArrayDataRow>
<com.rapidminer.example.table.DoubleArrayDataRow id="72">
<data id="73">
<double>1.6</double>
<double>58.57</double>
</data>
</com.rapidminer.example.table.DoubleArrayDataRow>
<com.rapidminer.example.table.DoubleArrayDataRow id="74">
<data id="75">
<double>1.63</double>
<double>59.93</double>
</data>
</com.rapidminer.example.table.DoubleArrayDataRow>
<com.rapidminer.example.table.DoubleArrayDataRow id="76">
<data id="77">
<double>1.65</double>
<double>61.29</double>
</data>
</com.rapidminer.example.table.DoubleArrayDataRow>
<com.rapidminer.example.table.DoubleArrayDataRow id="78">
<data id="79">
<double>1.68</double>
<double>63.11</double>
</data>
</com.rapidminer.example.table.DoubleArrayDataRow>
<com.rapidminer.example.table.DoubleArrayDataRow id="80">
<data id="81">
<double>1.7</double>
<double>64.47</double>
</data>
</com.rapidminer.example.table.DoubleArrayDataRow>
<com.rapidminer.example.table.DoubleArrayDataRow id="82">
<data id="83">
<double>1.73</double>
<double>66.28</double>
</data>
</com.rapidminer.example.table.DoubleArrayDataRow>
<com.rapidminer.example.table.DoubleArrayDataRow id="84">
<data id="85">
<double>1.75</double>
<double>68.1</double>
</data>
</com.rapidminer.example.table.DoubleArrayDataRow>
<com.rapidminer.example.table.DoubleArrayDataRow id="86">
<data id="87">
<double>1.78</double>
<double>69.92</double>
</data>
</com.rapidminer.example.table.DoubleArrayDataRow>
<com.rapidminer.example.table.DoubleArrayDataRow id="88">
<data id="89">
<double>1.8</double>
<double>72.19</double>
</data>
</com.rapidminer.example.table.DoubleArrayDataRow>
<com.rapidminer.example.table.DoubleArrayDataRow id="90">
<data id="91">
<double>1.83</double>
<double>74.46</double>
</data>
</com.rapidminer.example.table.DoubleArrayDataRow>
</dataList>
<columns>2</columns>
</exampleTable>
</default>
</com.rapidminer.example.set.SimpleExampleSet>
</com.rapidminer.example.set.SimpleExampleSet>
</object-stream>
My problem is that in the Wikipedia article the std. error for Height, for example, is 3.1539, and for the constant it is 8.63185, while in the RM results I see 0.961 and 1.558 respectively. I wondered whether I had set some parameters wrong. (I changed the ridge to 0, so I think the Tikhonov regularization [http://en.wikipedia.org/wiki/Tikhonov_regularization] reduces to ordinary linear regression, and there is no feature elimination.)
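To have an independent reference point for these numbers, here is a minimal sketch in plain Python that applies the textbook simple-regression formulas to the 15 data rows above. It is not RapidMiner's implementation; the function name `simple_ols` and the variable names are only illustrative. Under these formulas the squared standard errors come out at about 3.154 (Height) and 8.632 (constant), so the quoted Wikipedia figures look like variances, and their square roots are about 1.776 and 2.938.

```python
import math

# The 15 (Height, Mass) rows from the object-stream above.
heights = [1.47, 1.50, 1.52, 1.55, 1.57, 1.60, 1.63, 1.65,
           1.68, 1.70, 1.73, 1.75, 1.78, 1.80, 1.83]
masses = [52.21, 53.12, 54.48, 55.84, 57.20, 58.57, 59.93, 61.29,
          63.11, 64.47, 66.28, 68.10, 69.92, 72.19, 74.46]


def simple_ols(x, y):
    """Ordinary least squares for y = intercept + slope * x (textbook formulas)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Centered sums of squares and cross products.
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = ss_xy / ss_x
    intercept = mean_y - slope * mean_x
    # Residual variance with n - 2 degrees of freedom (two fitted parameters).
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    resid_var = sse / (n - 2)
    # Variances of the estimates; the standard errors are their square roots.
    var_slope = resid_var / ss_x
    var_intercept = resid_var * (1.0 / n + mean_x ** 2 / ss_x)
    return {
        "slope": slope,
        "intercept": intercept,
        "var_slope": var_slope,
        "var_intercept": var_intercept,
        "se_slope": math.sqrt(var_slope),
        "se_intercept": math.sqrt(var_intercept),
    }


fit = simple_ols(heights, masses)
for name, value in fit.items():
    print(f"{name}: {value:.4f}")
```

Neither pair matches the 0.961 and 1.558 reported by RM, so the difference does not seem to be only a variance versus standard error mix-up.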
As far as I can see, the Std. Coefficient is computed like this (https://github.com/aborg0/RapidMiner-Unuk/blob/master/src/com/rapidminer/operator/learner/functions/linear/LinearRegression.java#L329):
coeff*stddev/mean
What does this mean? When is this useful?
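For comparison, the conventional textbook standardized coefficient rescales the raw coefficient by sd(x)/sd(y), not by stddev/mean. The sketch below computes both for the data above, using plain Python. It assumes that `stddev` and `mean` in the quoted expression both refer to the Height attribute, which the expression itself does not say, so the second number is only an illustration of that reading.

```python
import math

# The 15 (Height, Mass) rows from the object-stream above.
heights = [1.47, 1.50, 1.52, 1.55, 1.57, 1.60, 1.63, 1.65,
           1.68, 1.70, 1.73, 1.75, 1.78, 1.80, 1.83]
masses = [52.21, 53.12, 54.48, 55.84, 57.20, 58.57, 59.93, 61.29,
          63.11, 64.47, 66.28, 68.10, 69.92, 72.19, 74.46]

n = len(heights)
mean_x = sum(heights) / n
mean_y = sum(masses) / n
ss_x = sum((x - mean_x) ** 2 for x in heights)
ss_y = sum((y - mean_y) ** 2 for y in masses)
ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(heights, masses))

slope = ss_xy / ss_x                # raw coefficient for Height
sd_x = math.sqrt(ss_x / (n - 1))    # sample standard deviation of Height
sd_y = math.sqrt(ss_y / (n - 1))    # sample standard deviation of Mass

# Conventional standardized coefficient: the change in Mass, in standard
# deviations, per one standard deviation change in Height.
beta_std = slope * sd_x / sd_y

# With a single predictor this equals the Pearson correlation.
pearson_r = ss_xy / math.sqrt(ss_x * ss_y)

# The expression quoted in the post, under the ASSUMPTION that stddev and
# mean are those of the Height attribute.
quoted_expr = slope * sd_x / mean_x

print(f"conventional standardized coefficient: {beta_std:.4f}")
print(f"Pearson r:                             {pearson_r:.4f}")
print(f"coeff*stddev/mean (assumed reading):   {quoted_expr:.4f}")
```

The conventional version is unit-free and useful for comparing the influence of attributes measured on different scales. The quoted expression divides by a mean instead, so it is not the same quantity.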
(I have also checked the code for the std. error, but it is much harder to interpret, and it seems to have no obvious connection to the Wikipedia definition. In the Tikhonov regularization article I could not find a formula for the std. error.)
Could you help me understand these results?
Thanks, gabor