Basic question - which learner, other operators for probably numeric label

raymor New Altair Community Member
edited November 5 in Community Q&A
I'm an utter newbie to this.  Doing some simple tests, I'm having
success with binomial learners like decision trees, but I'm not seeing
how to get good results on even a simplified version of my actual
problem.  I _think_ the label needs to be numeric, which greatly reduces
the number of learners I can use.  It may be, though, that I'm looking
at the problem backwards and I really should have the binomial attribute
as the label.  If someone were to point me in the right direction, that would
be very helpful.

Here are a few of my examples.  Attribute #1 is what will be known when
it's time to apply the model.  We'll call this "knowledge of job". 
Attribute #3 is the outcome of doing the job - a higher number meaning
that we got better results.    Attribute #2, yes or no, is whether or not
we accepted the job.  Refusing a job always has a small negative result
(or zero).

Knowledge,Accept,Result
6,no,-2
6,yes,94
7,no,-4
7,yes,28
0,no,-3
0,yes,-23
0,no,-4
0,yes,-2
5,no,-3
5,yes,-28
2,no,-2
2,yes,-13
4,no,-1
4,yes,36
1,no,-0
1,yes,-2
4,no,-1
4,yes,11
6,no,-2
6,yes,98

We can see intuitively that the higher our knowledge of the job, the better the results
tend to be, if we accept it.  That's not always the case - accepting the job with
knowledge level 4 was more successful than the job with knowledge level 5, but it tends
to be that higher knowledge = better result.  Note that while the knowledge level appears
to be numeric, there are only a few possible values, so it COULD be treated as polynomial,
though I would think that would lose information.  (The model would do well to know that
4 is between 3 and 5, and should be expected to have results between those of 3 and 5).
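
To make that trend concrete, here is a minimal sketch in plain Python (not RapidMiner) that averages the result per knowledge level for the sample rows above. The function name `mean_result_by_knowledge` is just an illustrative choice, and the `-0` in the table is entered as `0`.

```python
from collections import defaultdict
from statistics import mean

# Rows copied from the sample table above: (knowledge, accept, result).
ROWS = [
    (6, "no", -2), (6, "yes", 94), (7, "no", -4), (7, "yes", 28),
    (0, "no", -3), (0, "yes", -23), (0, "no", -4), (0, "yes", -2),
    (5, "no", -3), (5, "yes", -28), (2, "no", -2), (2, "yes", -13),
    (4, "no", -1), (4, "yes", 36), (1, "no", 0), (1, "yes", -2),
    (4, "no", -1), (4, "yes", 11), (6, "no", -2), (6, "yes", 98),
]


def mean_result_by_knowledge(rows, accept):
    """Average the result per knowledge level for one value of accept."""
    groups = defaultdict(list)
    for knowledge, accepted, result in rows:
        if accepted == accept:
            groups[knowledge].append(result)
    return {k: mean(v) for k, v in sorted(groups.items())}


if __name__ == "__main__":
    for k, avg in mean_result_by_knowledge(ROWS, "yes").items():
        print(f"knowledge={k}: mean result when accepted = {avg}")
```

On such a small sample the averages are noisy (level 5 has a single, negative example), which is one reason to keep knowledge numeric so that a learner can smooth across neighbouring levels.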

What I had originally tried to do was test a new value of knowledge level for both
accept = yes and accept = no, then choose the one with the highest expected result.
Integer labels seem to be problematic, though, and besides maybe there is some way
to use "accept" as a binomial label and solve for it directly?
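
As a rough illustration of that "score both options, pick the better one" idea, here is a minimal sketch in plain Python. It is only an assumption about how this could be done: it uses an ordinary least-squares line for the accepted jobs and the plain mean for the refused jobs, not any particular RapidMiner learner, and the names `fit_line` and `decide` are made up for the example.

```python
# Rows copied from the sample table above: (knowledge, accept, result).
ROWS = [
    (6, "no", -2), (6, "yes", 94), (7, "no", -4), (7, "yes", 28),
    (0, "no", -3), (0, "yes", -23), (0, "no", -4), (0, "yes", -2),
    (5, "no", -3), (5, "yes", -28), (2, "no", -2), (2, "yes", -13),
    (4, "no", -1), (4, "yes", 36), (1, "no", 0), (1, "yes", -2),
    (4, "no", -1), (4, "yes", 11), (6, "no", -2), (6, "yes", 98),
]


def fit_line(points):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


accepted = [(k, r) for k, a, r in ROWS if a == "yes"]
refused = [r for _, a, r in ROWS if a == "no"]

INTERCEPT, SLOPE = fit_line(accepted)
# Refusing costs roughly the same at every knowledge level in the sample,
# so its expected result is modelled as a constant.
EXPECTED_IF_REFUSED = sum(refused) / len(refused)


def decide(knowledge):
    """Return "yes" if accepting has the higher expected result."""
    expected_if_accepted = INTERCEPT + SLOPE * knowledge
    return "yes" if expected_if_accepted > EXPECTED_IF_REFUSED else "no"


if __name__ == "__main__":
    for k in range(8):
        print(k, decide(k))
```

The binomial answer falls out of comparing two numeric predictions, so a numeric label and a yes/no decision are not really in conflict here.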

BTW - With the "real" problem, there are a dozen or more evidence attributes,  and
hundreds of thousands of examples.  I suspect that adding more attributes will be easier
once I get the simplified version working, though.  (Though it looks like neural nets may
be too slow beyond just a few examples.)

Thanks so much for any tips.

Answers

  • haddock
    haddock New Altair Community Member
    Hi there!

    Welcome to the dataminers' asylum! I've taken a quick look at your problem and concluded that the "Accept" attribute is redundant, so in the following rig I've filtered out the examples where Accept is "no" and then removed the attribute. To give you a framework to play with, I've put in an optimiser to tune the regression learner, in this case a support vector machine (these don't choke on large attribute sets, and have useful mathematical properties to commend them). Have fun!
    <operator name="Root" class="Process" expanded="yes">
       <operator name="CSVExampleSource" class="CSVExampleSource">
           <parameter key="filename" value="C:\Documents and Settings\Alien\My Documents\rm_workspace\q.csv"/>
           <parameter key="label_column" value="3"/>
       </operator>
       <operator name="ExampleFilter" class="ExampleFilter">
           <parameter key="condition_class" value="attribute_value_filter"/>
           <parameter key="parameter_string" value="Accept=no"/>
           <parameter key="invert_filter" value="true"/>
       </operator>
       <operator name="FeatureNameFilter" class="FeatureNameFilter">
           <parameter key="skip_features_with_name" value="Accept"/>
       </operator>
       <operator name="IOStorer" class="IOStorer">
           <parameter key="name" value="data"/>
           <parameter key="io_object" value="ExampleSet"/>
           <parameter key="remove_from_process" value="false"/>
       </operator>
       <operator name="EvolutionaryParameterOptimization" class="EvolutionaryParameterOptimization" expanded="yes">
           <list key="parameters">
             <parameter key="LibSVMLearner.nu" value="[0.0;0.5]"/>
             <parameter key="LibSVMLearner.C" value="[0.0;10000.0]"/>
           </list>
           <parameter key="show_convergence_plot" value="true"/>
           <operator name="XValidation" class="XValidation" expanded="yes">
               <parameter key="keep_example_set" value="true"/>
               <operator name="LibSVMLearner" class="LibSVMLearner">
                   <parameter key="svm_type" value="nu-SVR"/>
                   <parameter key="C" value="687.6071838964747"/>
                   <parameter key="nu" value="0.377369833390911"/>
                   <list key="class_weights">
                   </list>
               </operator>
               <operator name="OperatorChain" class="OperatorChain" expanded="yes">
                   <operator name="ModelApplier" class="ModelApplier">
                       <list key="application_parameters">
                       </list>
                   </operator>
                   <operator name="RegressionPerformance" class="RegressionPerformance">
                       <parameter key="root_mean_squared_error" value="true"/>
                   </operator>
               </operator>
           </operator>
       </operator>
       <operator name="IORetriever" class="IORetriever">
           <parameter key="name" value="data"/>
           <parameter key="io_object" value="ExampleSet"/>
       </operator>
       <operator name="ParameterSetter" class="ParameterSetter">
           <list key="name_map">
             <parameter key="LibSVMLearner" value="LibSVMLearner (2)"/>
           </list>
       </operator>
       <operator name="LibSVMLearner (2)" class="LibSVMLearner">
           <parameter key="svm_type" value="nu-SVR"/>
           <parameter key="C" value="687.6071838964747"/>
           <parameter key="nu" value="0.377369833390911"/>
           <list key="class_weights">
           </list>
       </operator>
       <operator name="ModelApplier (2)" class="ModelApplier">
           <parameter key="keep_model" value="true"/>
           <list key="application_parameters">
           </list>
       </operator>
    </operator>
    PS Don't worry about the warning triangles - RM sometimes thinks there won't be examples when actually there will.
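
For readers without RapidMiner to hand, the shape of the process above (drop the Accept = no rows, drop the Accept column, then tune a regression learner by cross-validated root mean squared error) can be sketched in plain Python. This is only an analogy under stated assumptions: a one-variable ridge regression stands in for the nu-SVR, a small fixed grid stands in for the evolutionary optimiser, and leave-one-out stands in for XValidation. The names `fit_ridge`, `loo_rmse` and `GRID` are hypothetical.

```python
from math import sqrt

# Rows copied from the question: (knowledge, accept, result).
RAW = [
    (6, "no", -2), (6, "yes", 94), (7, "no", -4), (7, "yes", 28),
    (0, "no", -3), (0, "yes", -23), (0, "no", -4), (0, "yes", -2),
    (5, "no", -3), (5, "yes", -28), (2, "no", -2), (2, "yes", -13),
    (4, "no", -1), (4, "yes", 36), (1, "no", 0), (1, "yes", -2),
    (4, "no", -1), (4, "yes", 11), (6, "no", -2), (6, "yes", 98),
]

# Same preparation as the ExampleFilter and FeatureNameFilter steps:
# keep only rows that are not Accept = no, then drop the Accept column.
data = [(k, r) for k, a, r in RAW if a != "no"]


def fit_ridge(points, lam):
    """Fit y = intercept + slope * x with an L2 penalty lam on the slope."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / (sxx + lam)
    return mean_y - slope * mean_x, slope


def loo_rmse(points, lam):
    """Leave-one-out cross-validated root mean squared error."""
    squared_errors = []
    for i, (x, y) in enumerate(points):
        train = points[:i] + points[i + 1:]
        intercept, slope = fit_ridge(train, lam)
        squared_errors.append((intercept + slope * x - y) ** 2)
    return sqrt(sum(squared_errors) / len(squared_errors))


# Stand-in for the optimiser: score every candidate and keep the best.
GRID = [0.0, 1.0, 10.0, 100.0]
scores = {lam: loo_rmse(data, lam) for lam in GRID}
best_lam = min(scores, key=scores.get)

if __name__ == "__main__":
    for lam, score in scores.items():
        print(f"lam={lam}: LOO RMSE={score:.2f}")
    print("best lam:", best_lam)
```

The point of the structure is that the parameter search only ever sees held-out error, which is the same idea as nesting XValidation inside the optimiser in the process above.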