"Details on Vector Linear Regression"

New Altair Community Member
Updated by Jocelyn
Hi
I'm doing a very simple regression with RapidMiner. I have tried several regression models, but the 'Vector Linear Regression' outperforms all of them significantly, and I am wondering why. I have looked up the docs at https://docs.rapidminer.com/latest/studio/operators/modeling/predictive/functions/vector_linear_regr , but I don't really understand the idea. Even on Google I couldn't find any valuable information about a 'Vector Linear Regression'. Can you share some details on how this algorithm works? I would be interested in somewhat more detailed info, e.g. pseudo-code...
BR
Alex

SGolbert
New Altair Community Member
Accepted Answer
Hi @castmonkeys
Are you talking about SVM or about Vector Linear Regression?
Vector Linear Regression is just linear regression applied to multiple labels. It is equivalent to regressing each label separately. It is not comparable to most model types in RapidMiner, which take only one label.
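To make the equivalence concrete, here is a minimal NumPy sketch. It is not RapidMiner's actual implementation: the function name `fit_linear`, the synthetic data, and the choice to apply the small ridge term to the bias as well are assumptions made for illustration. It fits two labels jointly, then one at a time, and checks that the coefficients agree.

```python
import numpy as np

def fit_linear(X, Y, ridge=1e-8):
    """Least-squares fit with a bias column and a small ridge term.

    X: (n, d) features; Y: (n,) or (n, k) labels.
    Returns a (d + 1, k) coefficient matrix (last row is the bias).
    """
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])   # append bias column
    Y2 = Y.reshape(len(Y), -1)                      # always 2-D: (n, k)
    A = Xb.T @ Xb + ridge * np.eye(Xb.shape[1])     # regularized normal equations
    return np.linalg.solve(A, Xb.T @ Y2)

# Synthetic data with two labels (hypothetical, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
W_true = rng.normal(size=(3, 2))
Y = X @ W_true + 0.1 * rng.normal(size=(200, 2))

# Joint fit on both labels versus one independent fit per label.
W_joint = fit_linear(X, Y)
W_separate = np.hstack([fit_linear(X, Y[:, j]) for j in range(Y.shape[1])])

same = np.allclose(W_joint, W_separate)
print("joint and per-label coefficients match:", same)
```

With a single label, the joint fit reduces to one ordinary (lightly regularized) linear regression, so under these assumptions there is nothing extra for the multi-label version to exploit.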
Regards
Sebastian

New Altair Community Member
OP, updated by castmonkeys
Hi @SGolbert
Thanks for the response!
I am talking about Vector Linear Regression, sorry for the confusion in the title. (I corrected it.)
Okay, so now I get the idea of Vector Linear Regression. What I still don't get is why it performs much better than a simple Linear Regression, even though there is only ONE label in my dataset.
BR
Alex
Hello
Ignore this comment, as I compared simple linear regression with support vector regression. The reason it performs better is that it is more flexible than a linear regression algorithm. It accounts for non-linearity in the distribution of the data and for overfitting while building the model, which linear regression does not.
Hi @varunm1
I didn't find a reference to the non-linearity part (are you referring to SVM?). I ran a sample process comparing Vector Linear Regression with Linear Regression, with feature selection and collinear feature elimination turned off, and I obtained the same predictions:
<?xml version="1.0" encoding="UTF-8"?><process version="9.2.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="6.0.002" expanded="true" name="Process" origin="GENERATED_TUTORIAL">
<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="2001"/>
<parameter key="send_mail" value="never"/>
<parameter key="notification_email" value=""/>
<parameter key="process_duration_for_mail" value="30"/>
<parameter key="encoding" value="SYSTEM"/>
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="9.2.000" expanded="true" height="68" name="Polynomial" origin="GENERATED_TUTORIAL" width="90" x="45" y="187">
<parameter key="repository_entry" value="//Samples/data/Polynomial"/>
</operator>
<operator activated="true" class="split_data" compatibility="9.2.000" expanded="true" height="103" name="Split Data" origin="GENERATED_TUTORIAL" width="90" x="112" y="340">
<enumeration key="partitions">
<parameter key="ratio" value="0.5"/>
<parameter key="ratio" value="0.5"/>
</enumeration>
<parameter key="sampling_type" value="automatic"/>
<parameter key="use_local_random_seed" value="false"/>
<parameter key="local_random_seed" value="1992"/>
</operator>
<operator activated="true" class="multiply" compatibility="9.2.000" expanded="true" height="103" name="Multiply" width="90" x="346" y="263"/>
<operator activated="true" class="vector_linear_regression" compatibility="9.2.000" expanded="true" height="82" name="Vector Linear Regression" origin="GENERATED_TUTORIAL" width="90" x="581" y="187">
<parameter key="use_bias" value="true"/>
<parameter key="ridge" value="1.0E-8"/>
</operator>
<operator activated="true" class="linear_regression" compatibility="9.2.000" expanded="true" height="103" name="Linear Regression" width="90" x="581" y="493">
<parameter key="feature_selection" value="none"/>
<parameter key="alpha" value="0.05"/>
<parameter key="max_iterations" value="10"/>
<parameter key="forward_alpha" value="0.05"/>
<parameter key="backward_alpha" value="0.05"/>
<parameter key="eliminate_colinear_features" value="false"/>
<parameter key="min_tolerance" value="0.05"/>
<parameter key="use_bias" value="true"/>
<parameter key="ridge" value="1.0E-8"/>
</operator>
<operator activated="true" class="multiply" compatibility="9.2.000" expanded="true" height="103" name="Multiply (2)" width="90" x="246" y="493"/>
<operator activated="true" class="apply_model" compatibility="7.1.001" expanded="true" height="82" name="Apply Model (2)" origin="GENERATED_TUTORIAL" width="90" x="715" y="646">
<list key="application_parameters"/>
<parameter key="create_view" value="false"/>
</operator>
<operator activated="true" class="apply_model" compatibility="7.1.001" expanded="true" height="82" name="Apply Model" origin="GENERATED_TUTORIAL" width="90" x="715" y="232">
<list key="application_parameters"/>
<parameter key="create_view" value="false"/>
</operator>
<connect from_op="Polynomial" from_port="output" to_op="Split Data" to_port="example set"/>
<connect from_op="Split Data" from_port="partition 1" to_op="Multiply" to_port="input"/>
<connect from_op="Split Data" from_port="partition 2" to_op="Multiply (2)" to_port="input"/>
<connect from_op="Multiply" from_port="output 1" to_op="Vector Linear Regression" to_port="training set"/>
<connect from_op="Multiply" from_port="output 2" to_op="Linear Regression" to_port="training set"/>
<connect from_op="Vector Linear Regression" from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_op="Linear Regression" from_port="model" to_op="Apply Model (2)" to_port="model"/>
<connect from_op="Multiply (2)" from_port="output 1" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Multiply (2)" from_port="output 2" to_op="Apply Model (2)" to_port="unlabelled data"/>
<connect from_op="Apply Model (2)" from_port="labelled data" to_port="result 2"/>
<connect from_op="Apply Model" from_port="labelled data" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="90"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
Regards
Sebastian
Hi @varunm1
They are not the same, although pointing out the difference is not as straightforward as I thought. To begin with, Vector Linear Regression on one label is just ordinary linear regression. The question is then what the difference is between SVR and linear regression.
I've found this discussion on ResearchGate:
This exceeds my knowledge of SVR, but it is clear that the cost function is different. Factors such as sparsity and the presence of outliers would then dictate which of the two is better for a given problem. Maybe @IngoRM and @mschmitz can provide further insights.
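As a rough illustration of how the cost functions differ, the sketch below compares the squared loss of ordinary linear regression with the epsilon-insensitive loss used in the standard textbook formulation of SVR. The value `epsilon=0.5` and the sample residuals are arbitrary assumptions, and this says nothing about how RapidMiner's operators are implemented internally.

```python
import numpy as np

def squared_loss(residual):
    # Ordinary least squares penalizes every residual quadratically.
    return residual ** 2

def epsilon_insensitive_loss(residual, epsilon=0.5):
    # Textbook SVR loss: zero inside the epsilon tube, linear outside it.
    return np.maximum(0.0, np.abs(residual) - epsilon)

# Hypothetical residuals; the last one plays the role of an outlier.
residuals = np.array([0.1, 0.5, 1.0, 10.0])

sq = squared_loss(residuals)
eps = epsilon_insensitive_loss(residuals)

for r, a, b in zip(residuals, sq, eps):
    print(f"residual={r:5.1f}  squared={a:7.2f}  eps-insensitive={b:5.2f}")
```

Small residuals cost nothing under the epsilon-insensitive loss, and the outlier's penalty grows linearly rather than quadratically, which is one reason the two methods can rank differently on data with outliers.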
Regards,
Sebastian