
Creating and Applying Thresholds

User: "M_Martin"
New Altair Community Member
Updated by Jocelyn

Colleagues:

I've used the Preventive Maintenance Machine Failure data set that comes with RapidMiner to experiment with creating various classification models. I saved a model I developed to predict machine failure using the "Write Model" operator. This model was the result of a fair amount of optimization and feature selection experimentation using "Optimize Parameters", "Cross Validation", and other feature selection operators.

I'd like to use the "Read Model" operator to load the model I developed, load new data that the model hasn't seen, and apply the model to it. I then want to set various thresholds with the "Set Threshold" and "Apply Threshold" operators (based on the confidence attributes added by applying the model) to see their effect on the predictions.
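For readers less familiar with these operators: conceptually, a threshold turns the confidence attribute for one class into a yes/no decision. The Python sketch below only illustrates that logic outside RapidMiner and is not RapidMiner's implementation. It assumes a binomial label with the classes "Fail" and "No Fail", confidence columns named in the "confidence(<class>)" style, rows stored as plain dictionaries, and a hypothetical helper `apply_threshold` that predicts "Fail" when its confidence exceeds the threshold.

```python
def apply_threshold(rows, threshold, first_class="No Fail", second_class="Fail"):
    """Return a thresholded prediction for each scored row (illustrative only).

    Each row is a dict with one confidence column per class, named
    "confidence(<class>)" (an assumption for this sketch). The second
    class is predicted when its confidence exceeds the threshold;
    otherwise the first class is predicted.
    """
    key = f"confidence({second_class})"
    return [second_class if row[key] > threshold else first_class for row in rows]


# Hypothetical scored rows, as they might look after applying a model.
scored = [
    {"confidence(Fail)": 0.35, "confidence(No Fail)": 0.65},
    {"confidence(Fail)": 0.55, "confidence(No Fail)": 0.45},
    {"confidence(Fail)": 0.80, "confidence(No Fail)": 0.20},
]

print(apply_threshold(scored, 0.5))  # ['No Fail', 'Fail', 'Fail']
print(apply_threshold(scored, 0.3))  # ['Fail', 'Fail', 'Fail']
```

Lowering the threshold flags more rows as failures (fewer missed failures, more false alarms), which is usually the reason to move it away from the default.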

The file "Create_and_Apply_Threshold_Example_No_Error.png" shows a very simple process (based on the tutorial example) in which I can set and apply thresholds, but only with a very generic setup: a learner (k-NN in this case) and no cross-validation. All attributes are recognized: 26 in the data plus 3 added by the model, for a total of 29.

The file "Create_and_Apply_Threshold_Example_Error.png" shows another process in which I load the aforementioned saved model using "Read Model" and apply it to new data. As the error message shows, the output of "Apply Model" lists only 26 attributes, and the "Apply Threshold" operator returns the error message shown in "Create_and_Apply_Threshold_Example_No_Error_Nr_2.jpg". For some reason, the attributes added by applying the model to the new data (the Fail / No Fail prediction and confidences) are not recognized by the "Apply Threshold" operator.
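The error comes down to a simple check: a threshold step needs the prediction and confidence attributes in the metadata it receives, while the metadata here lists only the 26 input attributes. The Python sketch below mirrors that check outside RapidMiner and is not how RapidMiner propagates metadata. The attribute names `attr_1` to `attr_26`, the label name `Label`, the "prediction(<label>)" / "confidence(<class>)" naming, and both helper functions are hypothetical.

```python
def required_threshold_columns(label="Label", classes=("Fail", "No Fail")):
    """Columns a threshold step needs: the prediction plus one confidence per class.

    "Label" is a hypothetical label name used only for illustration.
    """
    return [f"prediction({label})"] + [f"confidence({c})" for c in classes]


def missing_threshold_columns(columns, **kwargs):
    """Return the required columns that are absent from `columns`."""
    present = set(columns)
    return [c for c in required_threshold_columns(**kwargs) if c not in present]


# Hypothetical stand-ins for the 26 attributes of the new data.
raw_columns = [f"attr_{i}" for i in range(1, 27)]
# After scoring, 3 attributes are added: the prediction and two confidences.
scored_columns = raw_columns + required_threshold_columns()

print(len(raw_columns), missing_threshold_columns(raw_columns))
# 26 ['prediction(Label)', 'confidence(Fail)', 'confidence(No Fail)']
print(len(scored_columns), missing_threshold_columns(scored_columns))
# 29 []
```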

I went back to my original process, in which I created the model, and tried to apply thresholds to new data there, but I still get the same error messages. Again, the attributes added by applying the model to the new data (the Fail / No Fail prediction and confidences) are not recognized by the "Apply Threshold" operator.

The only way I can get the "Apply Threshold" operator to work is in the simplest kind of process, as described above.

I imagine I am missing a very obvious point, as setting and applying thresholds appears to be dead simple. To ensure that all metadata would be available at run time, I stored the test data, the new data, and the predictive model in my local repository before trying to build a process that sets and applies thresholds using these objects.

Thanks for any suggestions and best wishes, Michael Martin

    User: "M_Martin"
    New Altair Community Member
    OP
    Accepted Answer

    Colleagues:

    A comment by contributor xitign in a post from today (6 August) named "Unable to select attribute subset using select attributes" gave me an idea to try in a process in which I set and apply prediction confidence thresholds.

    In this process, I read a classification model from disk and apply that model to data the model hasn't seen before. After applying the model to the new data, I want to set a confidence threshold for the predictions other than the default one.

    The solution: I opened the process, went to the "Process" menu, and clicked "Validate Process", as described by xitign.

    The warning message on the "Apply Threshold" operator remains, but I can now experiment with a variety of confidence thresholds and note the effect that changing the threshold has on the model's predictions.

    RapidMiner no longer aborts the process because the new data lacks a label (the attribute to be predicted).
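    Because the new data has no label, accuracy can't be measured there; what can be compared across thresholds is how many rows end up predicted as "Fail". The Python sketch below illustrates such a threshold sweep outside RapidMiner. The confidence values and the helper `count_predicted` are hypothetical, and it assumes "Fail" is predicted when its confidence exceeds the threshold.

```python
def count_predicted(confidences, thresholds):
    """For each threshold, count rows whose confidence(Fail) exceeds it."""
    return {t: sum(c > t for c in confidences) for t in thresholds}


# Hypothetical confidence(Fail) values for five unseen, unlabeled rows.
conf_fail = [0.10, 0.35, 0.48, 0.62, 0.91]

for t, n in count_predicted(conf_fail, [0.3, 0.5, 0.7]).items():
    print(f"threshold {t:.1f}: {n} of {len(conf_fail)} rows predicted Fail")
# threshold 0.3: 4 of 5 rows predicted Fail
# threshold 0.5: 2 of 5 rows predicted Fail
# threshold 0.7: 1 of 5 rows predicted Fail
```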

    This is one tip I won't forget.  Thanks xitign!   ;-)

    Michael