Beginner's question

shone
shone New Altair Community Member
edited November 5 in Community Q&A
Hi, at my college we have a project on data mining, and the tool we use is RapidMiner. Since I'm new to RapidMiner, I have one question for you. My process looks like this:

Root
     ExampleSource
     FeatureSelection
           XValidation (number_of_validations = 10)
           MetaCost
                 DecisionTree
           OperatorChain
                 ModelApplier
                 ClassificationPerformance

I figured that model building happens in iterations and that the model we get at the end is the one with the best results. When the process is finished, it shows me a PerformanceVector in the form of a confusion matrix. The question is: is that confusion matrix for the last model, or for the best model?

Answers

  • steffen
    steffen New Altair Community Member
    Hello and welcome to RapidMiner

    The answer is: the last model. But since FeatureSelection stops when no more improvement can be achieved (see the description of FeatureSelection in tutorial.pdf, or select the operator and press F1), it is also the best model, although it may represent only a local maximum.

    See another example in <your-rm-workspace>\sample\05_Features\10_ForwardSelection.xml.
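A minimal Python sketch of the stopping rule described above, assuming a plain greedy forward search. It is not RapidMiner's implementation: `forward_selection`, `toy_score` and the `TOY_SCORES` table are hypothetical names and made-up numbers that stand in for the cross-validated performance of each feature subset.

```python
def forward_selection(features, score):
    """Greedy forward selection: add one feature at a time while the score improves."""
    selected = []
    best = float("-inf")
    remaining = list(features)
    while remaining:
        # Evaluate every one-feature extension of the current subset.
        candidates = [(score(selected + [f]), f) for f in remaining]
        cand_score, cand_feature = max(candidates)
        if cand_score <= best:
            break  # no improvement: stop, so the last accepted subset is also the best one seen
        best = cand_score
        selected.append(cand_feature)
        remaining.remove(cand_feature)
    return selected, best


# Hypothetical scores (made up for illustration), one per feature subset.
TOY_SCORES = {
    frozenset("a"): 0.60,
    frozenset("b"): 0.55,
    frozenset("c"): 0.50,
    frozenset("ab"): 0.62,
    frozenset("ac"): 0.61,
    frozenset("bc"): 0.90,
    frozenset("abc"): 0.58,
}


def toy_score(subset):
    return TOY_SCORES[frozenset(subset)]


selected, best = forward_selection(["a", "b", "c"], toy_score)
print(selected, best)  # ['a', 'b'] 0.62
```

The search stops at {a, b} with 0.62 because adding c lowers the score. The subset {b, c} would score 0.90, but the greedy path never visits it, which is what "local maximum" means here.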

    regards,

    Steffen

  • shone
    shone New Altair Community Member
    Thanks for the reply. :)
    The reason I asked is that when I save the model (the result of the given process), load it in another process and apply it to the same data set that was used in the first process, the confusion matrix produced by ClassificationPerformance is different from the one in the first process. Why is that?
  • steffen
    steffen New Altair Community Member
    OK, I think some terms have been mixed up. In future, please provide the complete setup: copy all the text from the XML tab in RapidMiner and paste it into the thread using the code (#) tag.

    Your posted setup, like the example I mentioned, does not produce a model. It only produces AttributeWeights. So to get comparable results you have to use a process like this one:

    <operator name="Root" class="Process" expanded="yes">
        <operator name="Input" class="ExampleSource">
            <parameter key="attributes" value="../data/polynomial.aml"/>
        </operator>
        <operator name="AttributeWeightsLoader" class="AttributeWeightsLoader">
        </operator>
        <operator name="AttributeWeightsApplier" class="AttributeWeightsApplier">
        </operator>
        <operator name="XValidation" class="XValidation" expanded="yes">
            <parameter key="create_complete_model" value="true"/>
            <parameter key="sampling_type" value="shuffled sampling"/>
            <operator name="NearestNeighbors" class="NearestNeighbors">
                <parameter key="k" value="5"/>
            </operator>
            <operator name="ApplierChain" class="OperatorChain" expanded="yes">
                <operator name="Applier" class="ModelApplier">
                    <list key="application_parameters">
                    </list>
                </operator>
                <operator name="Performance" class="Performance">
                </operator>
            </operator>
        </operator>
        <operator name="ProcessLog" class="ProcessLog">
            <list key="log">
              <parameter key="generation" value="operator.FS.value.generation"/>
              <parameter key="performance" value="operator.FS.value.performance"/>
            </list>
        </operator>
    </operator>
    I said "comparable", not "the same", because to get exactly the same results you have to ensure that XValidation splits the data in exactly the same way as in the last iteration of FeatureSelection. You can achieve this by setting the parameter local_random_seed to a value > 0 (in both the FeatureSelection process and the process specified above). But I do not know why this should matter.
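To illustrate why a fixed seed makes the splits repeatable, here is a small Python sketch of shuffled k-fold splitting. The function `shuffled_folds` and the seed value 1992 are hypothetical and only mimic the idea behind local_random_seed. They say nothing about how RapidMiner draws its splits internally.

```python
import random


def shuffled_folds(n_examples, n_folds, seed):
    """Return n_folds lists of example indices after a seeded shuffle."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)  # private generator, fully determined by the seed
    return [indices[i::n_folds] for i in range(n_folds)]


folds_a = shuffled_folds(20, 10, seed=1992)
folds_b = shuffled_folds(20, 10, seed=1992)
print(folds_a == folds_b)  # True: same seed, same split
```

Two runs that use the same seed assign every example to the same fold, so performance values computed on those folds can be compared directly.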

    If your process does produce a model, or if I misunderstood anything else, please post it here. Otherwise I am restricted to guessing ...

    Hope this was helpful

    regards,

    Steffen

  • shone
    shone New Altair Community Member
    I forgot to mention that I've added a ModelWriter after the ClassificationPerformance operator.
  • steffen
    steffen New Altair Community Member
    Fine.

    So do you save the model at every step of XValidation, or only the final model (by setting the related parameter)? Whichever is the case, make sure that you understand XValidation and/or read the documentation of the RapidMiner implementation (select the operator and press F1).
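One possible explanation for the differing confusion matrices, sketched in Python under stated assumptions: cross-validation scores each example with a model that never saw it, whereas a saved model applied to the original data set is scored on examples it was trained on. The sketch uses a 1-nearest-neighbour learner and eight made-up data points instead of the MetaCost/DecisionTree setup from the thread. All names (`predict_1nn`, `confusion`, `data`) are hypothetical.

```python
def predict_1nn(train, x):
    """Label of the training point closest to x (1-nearest neighbour)."""
    return min(train, key=lambda p: abs(p[0] - x))[1]


def confusion(pairs):
    """Count (true_label, predicted_label) pairs."""
    counts = {}
    for pair in pairs:
        counts[pair] = counts.get(pair, 0) + 1
    return counts


# Made-up one-dimensional data: (attribute value, label).
data = [(1, "neg"), (2, "neg"), (4, "pos"), (5, "neg"),
        (9, "pos"), (10, "neg"), (12, "pos"), (15, "pos")]

# Cross-validation: every example is predicted by a model trained without it.
n_folds = 4
cv_pairs = []
for k in range(n_folds):
    test = data[k::n_folds]
    train = [p for i, p in enumerate(data) if i % n_folds != k]
    cv_pairs += [(label, predict_1nn(train, x)) for x, label in test]

# Applying one model, trained on everything, back to the same data.
resub_pairs = [(label, predict_1nn(data, x)) for x, label in data]

cv_matrix = confusion(cv_pairs)
resub_matrix = confusion(resub_pairs)
print(cv_matrix)
print(resub_matrix)
```

Here the cross-validated matrix contains five errors out of eight predictions, while the model applied to its own training data makes none. The two matrices answer different questions, so they need not agree even when the learner and the data are identical.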