Bulk scoring in RapidMiner Server
Neel
New Altair Community Member
Hi everyone,
What is the best way to bulk score new records (hundreds of thousands, originating from an enterprise DB) using a deployed model (deployed via Deployment) in RapidMiner Server?
> I have tried using the web service, but it does not scale. The response time for a single record is currently around 3 seconds.
> There's no 'real-time' scoring requirement. It is a daily single bulk request.
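For a rough sense of scale, here is a back-of-the-envelope calculation. It assumes the calls run strictly one after another at the ~3 seconds per record mentioned above; the record counts are illustrative assumptions, not actual figures from the DB.

```python
SECONDS_PER_RECORD = 3.0  # from the post: roughly 3 s per single-record call


def sequential_hours(n_records, seconds_per_record=SECONDS_PER_RECORD):
    """Hours needed to score n_records with one web service call each."""
    return n_records * seconds_per_record / 3600


# Illustrative volumes only (assumption): "100s of thousands" of records.
for n in (100_000, 300_000):
    print(f"{n:,} records: {sequential_hours(n):.1f} h")
```

Even at the low end this is several days of sequential calls, which would not fit into a daily run.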
Best Answer
Hi,
With the upcoming RM 9.6 version you can turn off explanations for predictions, which otherwise slow down the scoring a lot. But for true bulk scoring, a single-row web service approach does not seem to be great anyway, IMHO.
If you check the repository folder of the deployed models, you will find a process called "score_set" which you can use as a blueprint. Make a copy of it and adapt it a bit (especially for the operator "Explain Prediction": turn on the parameter "only predictions" to speed things up!) and add a data source (reading from your DB) at the beginning. If you also want monitoring, you may want to add the operator MDMLogging as well (which is a bit more complicated; I suggest dealing with this last, once everything else works and you want the logging...).
Hope this helps,
Ingo
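To illustrate the general idea of scoring a whole set instead of one row per call, here is a minimal sketch in plain Python. It is not RapidMiner code and not the contents of "score_set": the in-memory SQLite database stands in for the enterprise DB, the table and column names are made up, and `score_batch` is a hypothetical placeholder for the deployed model.

```python
import sqlite3


def score_batch(rows):
    """Hypothetical stand-in for the model: one call scores many rows."""
    return [(row_id, 1 if amount > 100 else 0) for row_id, amount in rows]


def bulk_score(conn, chunk_size=1000):
    """Read input rows in chunks and write the predictions back in bulk."""
    read_cur = conn.execute("SELECT id, amount FROM new_records ORDER BY id")
    while True:
        rows = read_cur.fetchmany(chunk_size)
        if not rows:
            break
        conn.executemany(
            "INSERT INTO predictions (id, prediction) VALUES (?, ?)",
            score_batch(rows),
        )
    conn.commit()


# Demo data (assumption): 2500 records with made-up amounts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE new_records (id INTEGER PRIMARY KEY, amount REAL)")
conn.execute("CREATE TABLE predictions (id INTEGER PRIMARY KEY, prediction INTEGER)")
conn.executemany(
    "INSERT INTO new_records VALUES (?, ?)",
    [(i, float(i)) for i in range(1, 2501)],
)
bulk_score(conn, chunk_size=1000)
print(conn.execute("SELECT COUNT(*) FROM predictions").fetchone()[0])
```

The point is only that the per-call overhead is paid once per chunk rather than once per record; in RapidMiner the same effect comes from putting a DB read operator in front of the copied process.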
Answers
Hi @IngoRM,
Thank you. I was able to repurpose "score_set" as "bulk-score" by:
1. setting "select which=1" for the "Define Target" block, as there shouldn't be a target column for prediction.
2. setting "select which=1" for the "Define ID" block, as an identifier is optional in training mode but needed for prediction.
It would actually be great to have a standard "bulk-score" process auto-generated from the deployment.
Cheers,
Neel