Hi,
I am trying to use CrossValidation with Evolutionary Weights and Nearest Neighbor learning as described by Ingo at
http://rapid-i.com/rapidforum/index.php/topic,41.msg87.html#msg87 . Specifically, I have this excerpt:
<operator name="WrapperXValidation" class="WrapperXValidation" expanded="yes">
  <parameter key="number_of_validations" value="5"/>
  <parameter key="sampling_type" value="shuffled sampling"/>
  <operator name="EvolutionaryWeighting" class="EvolutionaryWeighting" expanded="yes">
    <parameter key="maximum_number_of_generations" value="20"/>
    <parameter key="p_crossover" value="0.5"/>
    <parameter key="population_size" value="2"/>
    <operator name="XValidation" class="XValidation" expanded="yes">
      <parameter key="number_of_validations" value="5"/>
      <operator name="WeightLearner" class="NearestNeighbors">
        <parameter key="k" value="10"/>
        <parameter key="weighted_vote" value="true"/>
      </operator>
      <operator name="OperatorChain" class="OperatorChain" expanded="yes">
        <operator name="ModelApplier" class="ModelApplier">
          <list key="application_parameters">
          </list>
        </operator>
        <operator name="Performance" class="Performance">
        </operator>
      </operator>
    </operator>
  </operator>
  <operator name="WeightedModelLearner" class="NearestNeighbors">
    <parameter key="k" value="10"/>
    <parameter key="weighted_vote" value="true"/>
  </operator>
  <operator name="WeightedApplierChain" class="OperatorChain" expanded="yes">
    <operator name="WeightedModelApplier" class="ModelApplier">
      <list key="application_parameters">
      </list>
      <parameter key="keep_model" value="true"/>
    </operator>
    <operator name="WeightedPerformance" class="Performance">
    </operator>
  </operator>
</operator>
This works for me as long as all the attributes are numerical. However, I have a couple of nominal attributes I want to include, and when I add them, I get:
AttributeTypeException Process failed Message: Attribute 'myNomAttrib': Cannot map index of nominal attribute to nominal value: index -1 is out of bounds!
What I think is happening is this: when the ModelApplier operator inside the XValidation operator executes, the holdout data sometimes contains a value of the nominal attribute myNomAttrib that did not occur in the training data, and that causes the ModelApplier to fail.
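To make the situation I suspect concrete, here is a minimal Python sketch. It is not RapidMiner code and says nothing about how ModelApplier works internally; the function name `unseen_values_per_fold` and the toy column are made up for illustration. It only shows that with shuffled sampling, a rare nominal value can land in a holdout fold without ever appearing in the corresponding training folds.

```python
import random


def unseen_values_per_fold(values, n_folds=5, seed=0):
    """For each fold, return the nominal values that occur in the
    holdout part but not in the training part of a shuffled split."""
    indices = list(range(len(values)))
    random.Random(seed).shuffle(indices)  # shuffled sampling, fixed seed
    # Deal the shuffled indices round-robin into n_folds holdout sets.
    folds = [indices[i::n_folds] for i in range(n_folds)]

    result = []
    for holdout in folds:
        holdout_set = set(holdout)
        train_values = {values[i] for i in indices if i not in holdout_set}
        holdout_values = {values[i] for i in holdout}
        result.append(holdout_values - train_values)
    return result


# Hypothetical nominal column: the value "c" occurs exactly once.
column = ["a"] * 10 + ["b"] * 9 + ["c"]
unseen = unseen_values_per_fold(column)

for fold_number, missing in enumerate(unseen, start=1):
    print(fold_number, sorted(missing))
```

Because "c" occurs only once, whichever fold receives it as holdout data has no example of "c" to train on, so that fold reports it as unseen. In my real data the rare values are not that extreme, but with only a few occurrences the same thing can happen in some folds.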
If my assessment is correct, how can I avoid this situation? My first inclination was to use stratified sampling, but that only appears to work for nominal labels, not nominal attributes.
Thanks,
Keith