Train on subset of data, XValidate on full set of data
noah977
New Altair Community Member
Hi,
Thanks for all the help so far. I couldn't have gotten this far without all the advice of the people here. You guys are great!
My next challenging question...
I want to train a model on a subset of the data, but then test it during the XV stage on the FULL set of data.
For example, imagine data where the label is height and the input variable is birth-weight.
I want to say,
1) Train an SVM to regress height from birth-weight, but ONLY use birth-weight > 6 kg for training.
2) TEST using XValidation against ALL the input data.
The premise is that learning from a subset of data will create a more accurate model to use against all the data. (yes, for my application, this has been proven to work.)
So as I iterate through different values of the SVM parameters, I want to train on a subset, but test on the full set.
How can I do this in RM??
Thanks
Answers
Hello
I am afraid I may have misunderstood you. As far as I understand, by "all input data" you mean the input data without restrictions.
If so, you can use ExampleFilter inside the training step of the XValidation, so that only the training examples are filtered. Something like this:
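<operator name="Root" class="Process" expanded="yes">
    <operator name="XValidation" class="XValidation" expanded="yes">
        <!-- training step: filter the training examples first, then learn -->
        <operator name="OperatorChain" class="OperatorChain" expanded="yes">
            <operator name="ExampleFilter" class="ExampleFilter">
                <parameter key="condition_class" value="attribute_value_filter"/>
                <!-- placeholder condition: birth_weight is an assumed attribute name, replace it with your own -->
                <parameter key="parameter_string" value="birth_weight &gt; 6"/>
            </operator>
            <operator name="LibSVMLearner" class="LibSVMLearner">
                <!-- assumption: a regression SVM type, because the label (height) is numerical -->
                <parameter key="svm_type" value="epsilon-SVR"/>
            </operator>
        </operator>
        <!-- test step: the model is applied to the unfiltered test examples -->
        <operator name="OperatorChain (2)" class="OperatorChain" expanded="yes">
            <operator name="ModelApplier" class="ModelApplier">
            </operator>
            <!-- assumption: RegressionPerformance instead of ClassificationPerformance, because the label is numerical -->
            <operator name="RegressionPerformance" class="RegressionPerformance">
                <parameter key="root_mean_squared_error" value="true"/>
            </operator>
        </operator>
    </operator>
</operator>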
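For readers who want to see the idea outside RapidMiner, the same scheme can be sketched in plain Python. This is only an illustration under stated assumptions: an ordinary least-squares line stands in for the SVM, and the names `fit_line`, `filtered_cross_validation` and `keep_for_training` are made up for this example. The point it shows is that the filter is applied to the training part of each fold only, while the held-out part is scored unfiltered.

```python
import random


def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b (a simple stand-in for the SVM)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx else 0.0
    return a, mean_y - a * mean_x


def filtered_cross_validation(rows, keep_for_training, k=5, seed=0):
    """k-fold cross-validation with a filter on the training part only.

    rows is a list of (x, y) pairs and keep_for_training is a predicate
    on a row. Returns the root mean squared error over all held-out rows.
    """
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    folds = [rows[i::k] for i in range(k)]
    squared_errors = []
    for i, test_fold in enumerate(folds):
        # Collect the other folds, then keep only the rows passing the filter.
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        train = [r for r in train if keep_for_training(r)]
        if len(train) < 2:
            raise ValueError("filter left too few training rows")
        a, b = fit_line([x for x, _ in train], [y for _, y in train])
        # The held-out fold is scored as-is, without any filtering.
        squared_errors.extend((a * x + b - y) ** 2 for x, y in test_fold)
    return (sum(squared_errors) / len(squared_errors)) ** 0.5
```

With rows of the form (birth-weight, height), a call such as `filtered_cross_validation(rows, lambda row: row[0] > 6)` would train on the heavier examples only and report the error over all examples.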
noah977 wrote:
The premise is that learning from a subset of data will create a more accurate model to use against all the data. (yes, for my application, this has been proven to work.)
That's interesting. Normally you introduce something called "sample selection bias" or "incidental truncation" this way, and such a bias normally HARMS the performance. I am interested in this problem, so I would really appreciate some comments about this issue from your side (maybe by PM?).
kind regards,
Steffen