Creating and Applying Thresholds

Colleagues:

I've used the Preventive Maintenance Machine Failure data set that comes with RapidMiner to experiment with creating various classification models. I saved a model I developed to predict machine failure using the "Write Model" operator. This model was the result of a fair amount of optimization and feature selection experimentation using "Optimize Parameters", "Cross Validation", and other feature selection related operators.

I'd like to use the "Read Model" operator to load the model I developed, load new data that the model hasn't seen, and generate predictions with that model. I then want to set various thresholds using the "Set Threshold" and "Apply Threshold" operators (which act on the confidence attributes added by applying the model) to see the effect on prediction outcomes.

The file "Create_and_Apply_Threshold_Example_No_Error.png" shows a very simple process (based on the tutorial example) in which I can set and apply thresholds, but only by using a very generic setup with a learner (k-NN in this case) and no cross validation. All attributes are recognized (26 in the data and 3 added by the model), for a total of 29.

The file "Create_and_Apply_Threshold_Example_Error.png" shows another process in which I load the aforementioned saved model using "Read Model" and apply it to new data. As the error message shows, the output of "Apply Model" contains only 26 attributes, and the "Apply Threshold" operator returns the error message shown in "Create_and_Apply_Threshold_Example_No_Error_Nr_2.jpg". For some reason, the attributes added by applying the model to new data (the Fail / No Fail predictions and confidences) are not recognized by the "Apply Threshold" operator.

I went back to the original process in which I created the model and tried to apply thresholds to new data there, but I still get the same error messages. Once again, the attributes added by applying the model to new data (the Fail / No Fail predictions and confidences) are not recognized by the "Apply Threshold" operator.

The only way I can get the "Apply Threshold" operator to work is within the simplest of processes, as mentioned above.

I imagine I am missing a very obvious point, as setting and applying thresholds appears to be dead simple. To ensure that all metadata would be available at run time, I stored the test data, the new data, and the predictive model in my local repository before trying to build a process that sets and applies thresholds using these objects.

Thanks for any suggestions and best wishes, Michael Martin

Accepted answers

M_Martin

Colleagues:

A comment (from contributor xitign) in a post from today (6 August) named "Unable to select attribute subset using select attributes" gave me an idea to try in a process in which I attempt to set and apply prediction confidence thresholds.

In this process, I read a classification model from disk and apply that model to data the model hasn't seen before. After applying the model to the new data, I want to set a probability threshold for prediction confidence other than the default confidence.

The Solution: I simply opened the Process, went to the "Process" Menu, and clicked on "Validate Process" as described by xitign.

The warning message within the "Apply Threshold" operator remains, but I can now experiment with a variety of confidence thresholds and note the effect that changing the confidence threshold has on model predictions.

RapidMiner no longer aborts the running of the process due to the lack of a Label (predictable attribute) in the new data.

This is one tip I won't forget. Thanks xitign! ;-)

Michael

All comments

MartinLiebig

Hi Michael,

This looks pretty strange. I use thresholds weekly and have never encountered such a problem. Any chance you can share the data/model privately?

Best,

Martin

M_Martin

Hello Martin:

Thank you for your message!

I've tried posting the data in Excel format, the .rmp file, and the model file to this thread in the forum, but I keep getting a message stating that the file contents don't match the expected file type. I've even tried zipping everything up in a .rar archive, with no luck.

Can I email everything to you privately? If so, please tell me where I should send everything to. My email is michael@informationarts.ca

Best wishes,

Michael

M_Martin

Hello Martin:

I put the .mod file (which was saved in .xml format) inside a Word document (attached), and that solved the issue I mentioned above.

You would need to copy the contents of the Word document, paste them into an empty text document, and save it with a .mod extension under the file name:

mdl_Machine_Failiure_Predictions.mod

Also attached is the data (Excel format) the model generates predictions against, and the .rmp process file with the "Create Threshold" and "Apply Threshold" operators.

What I want to do is experiment with different thresholds and see how these changes impact the number of 'yes' (i.e. failure) predictions.
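To make that goal concrete outside of RapidMiner, here is a minimal Python sketch of what a threshold sweep amounts to. It is not the RapidMiner implementation: the function names, the made-up confidence values, and the rule "predict 'yes' when confidence(yes) is strictly greater than the threshold" are all assumptions chosen for illustration.

```python
from typing import Dict, List


def apply_threshold(confidences: List[float], threshold: float,
                    positive: str = "yes", negative: str = "no") -> List[str]:
    """Label a row positive when its positive-class confidence exceeds the threshold.

    Assumption: a strict '>' comparison; the real operator may treat ties differently.
    """
    return [positive if c > threshold else negative for c in confidences]


def count_positives_by_threshold(confidences: List[float],
                                 thresholds: List[float]) -> Dict[float, int]:
    """Count how many rows are predicted 'yes' at each candidate threshold."""
    return {t: apply_threshold(confidences, t).count("yes") for t in thresholds}


# Hypothetical confidence(yes) values, standing in for the confidence
# attribute a model would add to new, unlabeled data.
confidence_yes = [0.05, 0.20, 0.35, 0.50, 0.65, 0.80, 0.95]

counts = count_positives_by_threshold(confidence_yes, [0.3, 0.5, 0.7])
for threshold, n_yes in counts.items():
    print(f"threshold={threshold:.1f} -> {n_yes} 'yes' predictions")
```

With these illustrative values, raising the threshold from 0.3 to 0.7 reduces the number of 'yes' predictions from 5 to 2. Note that no label column is needed for this step, only the confidence values.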

Thanks for considering all of this when you get a chance.

Best wishes,

Michael

Prediction_Model.docx

Run_Model_With_New_Data.rmp

M_Martin

Hello Martin:

I see that Excel files cannot be posted, so attached is the same data in .csv format. As the process expects Excel, you would need to either modify the process to read the .csv file or put the .csv file contents into an Excel file. I am also attaching the two other files (attached to my last post) to this post.

Best wishes, Michael

MM_Generated_New_Machine_Data.csv

Prediction_Model.docx

Run_Model_With_New_Data.rmp

MartinLiebig

Hi,

I am not able to get this running. Can you send it to me: mschmitz at rapidminer.com

Cheers,

Martin
