
How to predict response rate or responses in RapidMiner

User111113
New Altair Community Member
Updated by Jocelyn
Hi All,

I'm fairly new to RapidMiner and looking for a way to predict response rate based on historical data from the past 2 years.
I have customer ID and categories, and of course quantity mailed and responses.

for example

id   category   state   year   month   QtyMailed   Responses Received   Response Rate
1    a          OH      2018   Oct     5000        200                  4%
1    b          CA      2018   Nov     10000       130
1    c          PA      2018   Dec     35000       512
2    ...
2    ...

and so on. I would like to predict responses or response rate for, say, the upcoming month.

    You can try some simple ML algorithms like Decision Tree or Naive Bayes and see how they perform.  But if you only have monthly data, you actually don't have much data to train the model, so don't be surprised if the fit is not that great.  If you review the Cross Validation operator tutorial, it will provide some guidance on how to set up this process.
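Outside RapidMiner, the same idea can be sketched in Python with scikit-learn: wrap a decision tree in cross-validation and inspect the per-fold scores. The data here is synthetic and purely illustrative (roughly 2 years of monthly mailing cells), so the exact numbers mean nothing — the point is how little monthly data there is to learn from.

```python
# Rough Python/scikit-learn analogue of wrapping a Decision Tree in
# RapidMiner's Cross Validation operator. Synthetic, hypothetical data.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n = 48  # few rows, as with monthly data - expect a noisy fit
qty_mailed = rng.uniform(1_000, 50_000, n)
category = rng.integers(0, 3, n)              # encoded nominal attribute
X = np.column_stack([category, qty_mailed])
y = qty_mailed * rng.uniform(0.01, 0.05, n)   # responses at a 1-5% rate

model = DecisionTreeRegressor(max_depth=3, random_state=0)
scores = cross_val_score(model, X, y, cv=5)   # default scoring: R^2
print("Per-fold R^2:", np.round(scores, 2))
```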
    @Telcontar120
    Thank you for your response.

    I tried a few things and looked at some examples. I get a lot of errors, and RapidMiner asks me to auto-fix them; I don't understand how or why it is doing that. It only ran once, and then it took year as the prediction value when it should be either responses or response rate. I'm stuck and not sure how to move forward.
    Hi @User111113,

    So that we can understand what's going on, could you share:

     - your process ( via File --> Export Process)
     - your data

    Regards,

    Lionel
    This is my data File

    lionelderkrikor Here I am attaching the process
    Hi @User111113,

    The error means that the attributes in your training set and the attributes in your test set are not strictly the same.
    This error is caused by the Nominal to Numerical operator in the training part of your Cross Validation operator, which creates attribute(s) in the training set but not in your test set.
    The solution is to move the Nominal to Numerical operator outside the Cross Validation operator.
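The mismatch described here can be reproduced outside RapidMiner with a toy example: one-hot encoding nominal values separately on each split yields different dummy columns, while encoding once on the full set (the equivalent of moving Nominal to Numerical outside the Cross Validation operator) keeps the attribute sets aligned. The data is made up.

```python
import pandas as pd

data = pd.DataFrame({"state": ["OH", "CA", "PA", "OH", "CA", "NY"]})
train, test = data.iloc[:4], data.iloc[4:]

# Encoding each split separately: the dummy columns differ, so a model
# trained on one split cannot be applied to the other (RapidMiner reports
# this as an attribute mismatch between training and test set).
cols_train = set(pd.get_dummies(train["state"]).columns)
cols_test = set(pd.get_dummies(test["state"]).columns)
print(cols_train == cols_test)  # False

# Encoding once, before splitting (Nominal to Numerical outside the CV
# operator), gives every split the same attribute set.
encoded = pd.get_dummies(data["state"])
train_enc, test_enc = encoded.iloc[:4], encoded.iloc[4:]
print(set(train_enc.columns) == set(test_enc.columns))  # True
```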

    The working process is in the attached file.

    Regards,

    Lionel
    User111113
    New Altair Community Member
    OP
    Updated by User111113
    @lionelderkrikor

    Thank you for your response. I used a decision tree and it looks like it's working fine. I would like to know one more thing: which parameters are these models basing their predictions on? In my case, I want the model to make predictions based on category and state, or maybe category, state, and total mailed.

    Can I set it up myself so that it looks only at those 2 or 3 columns and predicts the response?
    @User111113,
    Of course ! 

     - Put a Select Attributes operator after your data retrieval.
     - In the parameters of this operator, choose attribute filter type = subset
     - Select your 2 or 3 relevant attributes:
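The same Select Attributes step can be sketched with pandas: keep only the chosen subset of columns, plus the label. The table and column names below are hypothetical, taken from the example data earlier in the thread.

```python
import pandas as pd

# Hypothetical mailing table, as in the original post
df = pd.DataFrame({
    "id":        [1, 1, 1],
    "category":  ["a", "b", "c"],
    "state":     ["OH", "CA", "PA"],
    "QtyMailed": [5000, 10000, 35000],
    "Responses": [200, 130, 512],
})
subset_cols = ["category", "state", "QtyMailed"]  # the 2-3 relevant attributes
subset = df[subset_cols + ["Responses"]]          # label stays in the set
print(list(subset.columns))
```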



    Regards,

    Lionel
     
    @lionelderkrikor

    I have a few more questions I guess....


    When I try Auto Model, it sometimes shows the "Back" and "Next" buttons and sometimes it doesn't. In the screenshot below, I cannot go back or forward, yet at other times the buttons do show up. Do you know how to resolve this?


    Hi @User111113,

    Strange!

    Try to select the attribute you want to predict (the label).

    Regards,


    Lionel
    @lionelderkrikor
    @Telcontar120

    Thank you for your help.

    I have more parameters that I want to add to my data to predict responses, but I wanted to find a better way to do it. I have indexes (0, 1, 2, 3); say responses with index 0 are higher. My data will now look like the table below.


    id   category   index   state   year   month   QtyMailed   Responses Received
    1    a          0       OH      2018   Oct     3000        150
    1    a          1       OH      2018   Oct     1000        40
    1    a          2       OH      2018   Oct     1000        10
    1    b                  CA      2018   Nov     10000       130
    1    c                  PA      2018   Dec     35000       512
    2    ...
    2    ...


    My question is this: I know the important factors that change the responses are the indexes, state, and month of the year, but can we find out how much each one affects the result, maybe percentage-wise? Is it also possible to feed in data by county or ZIP code and then see if that makes any difference? People may have responded from only 3 ZIP codes and not from the other 2.

    I have a lot on my mind; I hope I am not confusing anyone.

    When I tried Auto Model, it said to deselect the QtyMailed column. If I do that, I know it's not going to work: I looked at the predicted responses and they were not up to the mark at all, even though technically everything else was the same. So I never deselect that column.
    Hi @User111113,

    I'm having difficulty understanding your question...
    Can you explain more explicitly what you get and what you want to obtain?
    In the meantime, you can indeed feed your dataset to Auto Model. If you have doubts about one or more columns (attributes),
    first select them and enable Automatic Feature Selection before running Auto Model. If, in the end, these attributes are not relevant,
    they will be removed from the final feature set.
    Concerning the "weights": for several models, you can see the weights of each regular attribute by clicking
    on Weights for a given model.
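A rough Python analogue of that per-model Weights view: tree ensembles expose feature importances that estimate, percentage-wise, how much each attribute contributes to the predictions. Everything below (attribute names, the response formula) is made up for illustration.

```python
# Sketch: attribute weights via feature importances. Synthetic data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
n = 300
index = rng.integers(0, 3, n)            # category index 0/1/2
state = rng.integers(0, 50, n)           # encoded state
month = rng.integers(1, 13, n)
qty = rng.uniform(1_000, 50_000, n)      # QtyMailed
X = np.column_stack([index, state, month, qty])
# Responses driven mainly by quantity mailed, with an index effect
y = qty * (0.04 - 0.01 * index) + rng.normal(0, 50, n)

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
for name, w in zip(["index", "state", "month", "QtyMailed"],
                   model.feature_importances_):
    print(f"{name}: {w:.0%}")
```

Because the synthetic responses depend mostly on quantity mailed, its importance dominates; on real data the split would reflect whatever actually drives the responses.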

    Hope this helps,

    Regards,

    Lionel

    @lionelderkrikor

    I did more research, modified my data set, and generated new models. My questions are:

    How can I reduce the error rate and get better performance?

    Do I need to validate my models? If yes, how can we do it after we have deployed models using Auto Model?

    What do you think about grouping the models?
    varunm1
    New Altair Community Member
    Accepted Answer
    Hello @User111113

    how can I reduce the error rate, have better performance?

    Are you optimizing the predictive models? You can adopt concepts such as feature selection, hyperparameter optimization (the "Optimize Parameters (Grid)" operator), trying different models, and generating new features from existing features. There is no single solution for improving model performance, so try the above-mentioned concepts in your modeling and check whether you get better performance.
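The grid-optimization idea maps directly onto scikit-learn's GridSearchCV, which, like the RapidMiner operator, tries every hyperparameter combination and keeps the settings with the best cross-validated score. The data below is synthetic; only the mechanism is the point.

```python
# Sketch of grid-based hyperparameter optimization on synthetic data.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(2)
X = rng.uniform(0, 1, size=(120, 3))
y = 3 * X[:, 0] + rng.normal(0, 0.1, 120)

grid = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid={"max_depth": [2, 4, 8], "min_samples_leaf": [1, 5, 20]},
    cv=5,  # each combination is scored by 5-fold cross-validation
)
grid.fit(X, y)
print("Best settings:", grid.best_params_)
```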

    do I need to validate my models? if yes, then how can we do it after we deployed models using auto-models?
    Yes, you need to validate your models. There are different validation methods, like cross-validation, split validation, and multi-hold-out validation (used in Auto Model). Auto Model uses multi-hold-out validation while training and testing your model. As for after deployment, I am not entirely clear on this part of the question: once we deploy a model, it just predicts the labels. But if you later obtain the true labels for new data, you can always retrieve your trained model, apply it to the new data, and use a performance operator to check the performance on that data.
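That workflow can be sketched end to end in Python: validate with a hold-out split, then, once true labels for newly scored data become available, apply the stored model to them and measure performance again. All data here is synthetic.

```python
# Hold-out validation, then re-checking performance on "new" labeled data.
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(3)
X = rng.uniform(0, 1, size=(150, 2))
y = 10 * X[:, 0] + rng.normal(0, 0.5, 150)

# Split validation (Auto Model uses multi-hold-out validation instead)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X_tr, y_tr)
holdout_mae = mean_absolute_error(y_te, model.predict(X_te))

# "New" data arriving after deployment, with labels observed later
X_new = rng.uniform(0, 1, size=(30, 2))
y_new = 10 * X_new[:, 0] + rng.normal(0, 0.5, 30)
new_mae = mean_absolute_error(y_new, model.predict(X_new))
print(f"Hold-out MAE: {holdout_mae:.2f}, new-data MAE: {new_mae:.2f}")
```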