How to predict response rate or responses in RapidMiner
User111113
New Altair Community Member
Hi All,
I'm fairly new to RapidMiner and looking for a way to predict response rate based on historical data from the past 2 years.
I have customer id and categories, and of course quantity mailed and responses.
For example:
id  category  state  year  month  QtyMailed  Responses Received  Response Rate
1   a         OH     2018  Oct    5000       200                 4%
1   b         CA     2018  Nov    10000      130
1   c         PA     2018  Dec    35000      512
2
2
and so on... I would like to predict the responses or the response rate for, say, an upcoming month.
Answers
-
You can try some simple ML algorithms like Decision Tree or Naive Bayes and see what the results look like. But if you only have monthly data, you actually don't have that much data to train the model, so don't be surprised if the fit is not that great. If you review the Cross Validation operator tutorial, it will provide some guidance on how you should set up this process.
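For orientation, here is a minimal sketch of that setup outside RapidMiner, in Python/scikit-learn: a decision tree evaluated with cross-validation. The file name and column names are assumptions based on the example data in the question.

```python
# Minimal sketch: cross-validated decision tree regression on response rate.
# "mailing_history.csv" and the column names are assumptions taken from the
# example data in the question.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeRegressor

df = pd.read_csv("mailing_history.csv")
df["response_rate"] = df["Responses Received"] / df["QtyMailed"]

X = df[["category", "state", "month", "QtyMailed"]]
y = df["response_rate"]

# One-hot encode the nominal attributes; pass QtyMailed through unchanged.
preprocess = ColumnTransformer(
    [("nominal", OneHotEncoder(handle_unknown="ignore"),
      ["category", "state", "month"])],
    remainder="passthrough",
)
model = make_pipeline(preprocess, DecisionTreeRegressor(max_depth=5))

# 5-fold cross-validation, mirroring the Cross Validation operator.
scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error")
print("MAE per fold:", -scores)
```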
-
@Telcontar120
Thank you for your response.
I tried a few things and looked at some examples. It gives me a lot of errors and asks me to auto-fix, which I don't even understand how or why it is doing. Only once did it run, and it took year as the prediction target, when it should be either responses or response rate. I am stuck and not sure how to move forward.
-
Hi @User111113,
So that we can understand what's going on, could you share:
- your process (via File --> Export Process)
- your data
Regards,
Lionel
-
Hi @User111113,
The error means that the attributes in your training set and the attributes in your test set are not strictly the same.
This error is caused by the Nominal to Numerical operator in the training part of your Cross Validation operator, which creates attribute(s) in the training set that are not in your test set.
The solution is to move the Nominal to Numerical operator outside the CV operator. The working process is in the attached file.
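To see why, here is a minimal pandas sketch (not part of the attached process; the data values are made up for illustration):

```python
# Why encoding inside each fold breaks Apply Model: if a nominal value is
# missing from one split, the dummy-coded training and test sets end up
# with different attribute sets. Data values here are made up.
import pandas as pd

train = pd.DataFrame({"state": ["OH", "CA"]})
test = pd.DataFrame({"state": ["PA"]})  # value never seen in training

# Encoding each split independently (what happens inside the folds):
print(pd.get_dummies(train["state"]).columns.tolist())  # ['CA', 'OH']
print(pd.get_dummies(test["state"]).columns.tolist())   # ['PA']
# -> different attribute sets, so the trained model cannot be applied.

# Encoding once, before the split (Nominal to Numerical outside the CV),
# gives every subset the same attribute set:
encoded = pd.get_dummies(pd.concat([train, test])["state"])
print(encoded.columns.tolist())  # ['CA', 'OH', 'PA'] for all rows
```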
Regards,
Lionel
-
@lionelderkrikor
Thank you for your response. I used a decision tree and it looks like it's working fine. I would like to know one more thing: which parameters are these models' predictions based on? In my case I want the model to make predictions based on category and state, or maybe category, state, and total mailed.
Can I set it up myself so it looks only at those 2 or 3 columns to predict the response?
-
@User111113,
Of course!
- Put a Select Attributes operator after your data retrieval.
- In the parameters of this operator, choose attribute filter type = subset.
- Select your 2 or 3 relevant attributes (a pandas equivalent is sketched below).
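For intuition, here is a rough pandas equivalent of that subset selection (the file and column names are assumptions from this thread):

```python
# Rough pandas equivalent of Select Attributes with filter type = subset:
# keep only the columns the model should use. File and column names are
# assumptions based on the thread.
import pandas as pd

df = pd.read_csv("mailing_history.csv")
features = df[["category", "state", "QtyMailed"]]  # the 2-3 relevant attributes
label = df["Responses Received"]                   # the target stays separate
```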
Regards,
Lionel
-
@lionelderkrikor
I have a few more questions, I guess....
When I try Auto Model, it sometimes shows the "Back" and "Next" buttons and sometimes it doesn't. In the screenshot I attached, I cannot go back or forward, yet sometimes the buttons do show up. Do you know how to resolve this?
-
@lionelderkrikor
@Telcontar120
Thank you for your help.
I have more parameters that I want to add to my data to predict responses, but I wanted to check for a better way first. I have indexes like 0, 1, 2, 3; let's say responses with index 0 are higher. My data will now look like this:
id  category  index  state  year  month  QtyMailed  Responses Received
1   a         0      OH     2018  Oct    3000       150
1   a         1      OH     2018  Oct    1000       40
1   a         2      OH     2018  Oct    1000       10
1   b                CA     2018  Nov    10000      130
1   c                PA     2018  Dec    35000      512
2
2
My question: I know the important factors that change the responses are the indexes, the state, and the month of the year, but how much are they affecting the result? Can we find that out, maybe percentage-wise? And is it also possible to feed data by county or zip code and then see whether that makes any difference, because people may have responded from only 3 zip codes and not from the other 2?
I have a lot on my mind; I hope I am not confusing anyone.
When I tried Auto Model it told me to deselect the QtyMailed column, and if I do that, I know it's not going to work: I looked at the predicted responses and they were not up to the mark at all; technically everything was the same... so I never deselect that column.
-
Hi @User111113,
I have difficulty understanding your question...
Can you explain more explicitly what you get and what you want to obtain?
In the meantime, you can indeed apply your dataset to Auto Model. If you have doubts about one or more columns (attributes), first select them and enable the Automatic Feature Selection before running Auto Model. If, in the end, these attributes are not relevant, they will be removed from the final feature set.
Concerning the "weights": for several models you have access to the weights of each regular attribute by clicking on Weights for a given model.
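For intuition, here is a rough scikit-learn analogue of such attribute weights, using permutation importance (the file and column names are assumptions based on this thread, and permutation importance is only one of several ways such weights can be estimated):

```python
# Illustrative analogue of the Weights view: attribute weights estimated
# with permutation importance. File and column names are assumptions.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance

df = pd.get_dummies(pd.read_csv("mailing_history.csv"),
                    columns=["category", "state", "month"])
X = df.drop(columns=["Responses Received"])
y = df["Responses Received"]

model = GradientBoostingRegressor(random_state=0).fit(X, y)
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)

# Print each attribute's weight, largest effect first.
for name, weight in sorted(zip(X.columns, result.importances_mean),
                           key=lambda pair: -pair[1]):
    print(f"{name}: {weight:.3f}")
```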
Hope this helps,
Regards,
Lionel
-
@lionelderkrikor
I did more research, modified my data set, and generated new models. My questions are:
how can I reduce the error rate and get better performance?
do I need to validate my models? If yes, how can we do it after we deploy models using Auto Model?
What do you think about grouping the models?
-
Hello @User111113,
"how can I reduce the error rate, have better performance?"
Are you optimizing the predictive models? You should adopt concepts such as feature selection, hyperparameter optimization ("Optimize Parameters (Grid)"), trying different models, and generating new features from existing features, as there is no single solution to improving model performance. You can try the above-mentioned concepts in your modeling to check whether you get better performance.
"do I need to validate my models? if yes, then how can we do it after we deployed models using auto-models?"
Yes, you need to validate your models. There are different validation methods, such as cross-validation, split validation, and multi hold-out validation (used in Auto Model). Auto Model uses multi hold-out validation while training and testing your model. After deploying, you can score on new data. I am not clear on this part of the question: once we deploy a model, it just predicts the labels. If you have the new original labels, you can always retrieve your trained model, apply it to the new data, and use a Performance operator to check the performance on the new data.
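As a closing illustration, here is a minimal scikit-learn sketch of the grid-based hyperparameter optimization mentioned above, with cross-validation inside the search, analogous to wrapping Cross Validation inside "Optimize Parameters (Grid)". The file and column names are assumptions from this thread.

```python
# Sketch of grid-based hyperparameter optimization with cross-validation,
# a scikit-learn analogue of "Optimize Parameters (Grid)" around a
# Cross Validation operator. File and column names are assumptions.
import pandas as pd
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

df = pd.get_dummies(pd.read_csv("mailing_history.csv"),
                    columns=["category", "state", "month"])
X = df.drop(columns=["Responses Received"])
y = df["Responses Received"]

search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid={"max_depth": [3, 5, 10], "min_samples_leaf": [1, 5, 10]},
    cv=5,
    scoring="neg_mean_absolute_error",
)
search.fit(X, y)
print("Best parameters:", search.best_params_)
print("Best cross-validated MAE:", -search.best_score_)
```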