Bulk scoring in RapidMiner Server

Neel
Neel New Altair Community Member
edited November 5 in Community Q&A
Hi everyone,

What is the best way to bulk score new records (hundreds of thousands, originating from an enterprise DB) using a deployed model (deployed via Deployment) in RapidMiner Server?

> I have tried using the web service, but it does not scale. The response time for a single record is currently around 3 seconds.
> There's no 'real-time' scoring requirement. It is a single daily bulk request.
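
For a rough sense of why the per-record route does not scale, here is a back-of-the-envelope sketch. It assumes the calls run sequentially at the observed 3 seconds each; the record counts of 100,000 and 500,000 are illustrative stand-ins for "hundreds of thousands", and the function name is made up for this example.

```python
# Rough estimate only: assumes sequential web service calls,
# each taking the observed ~3 seconds per record.
SECONDS_PER_RECORD = 3.0


def sequential_hours(n_records, seconds_per_record=SECONDS_PER_RECORD):
    """Total wall-clock hours if records are scored one call at a time."""
    return n_records * seconds_per_record / 3600


# Illustrative volumes (assumed, not measured).
for n in (100_000, 500_000):
    print(f"{n:,} records: about {sequential_hours(n):.1f} hours")
```

Even at the low end this would be well over a day of scoring for a job that has to run daily.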

Best Answer

  • IngoRM
    IngoRM New Altair Community Member
    Answer ✓
    Hi,
    With the upcoming RM 9.6 version you can turn off the explanations for predictions, which slow down the scoring a lot.  But for true bulk scoring, a single-row web service approach does not seem to be great anyway IMHO.
    If you check the repository folder of the deployed models, you will find a process called "score_set" which you can use as a blueprint.  Make a copy of it, adapt it a bit (in particular, turn on the parameter "only predictions" of the operator "Explain Prediction" to speed things up!) and add a data source (reading from your DB) at the beginning.  If you also want monitoring, you can add the operator MDMLogging as well (which is a bit more complicated - I suggest dealing with this last, once everything else works and you want the logging...).
    Hope this helps,
    Ingo

Answers

  • Neel
    Neel New Altair Community Member
    Hi @IngoRM,

    Thank you. I was able to re-purpose the "score_set" process for bulk scoring by:
    1. setting "select which=1" for the "Define Target" block, as there shouldn't be a target column for prediction.
    2. setting "select which=1" for the "Define ID" block, as the training mode doesn't need an identifier (it is optional there) but prediction does need one.

    It would actually be great to have a standard "bulk-score" process auto-generated from the deployment.

    Cheers,
    Neel