Problem with too many parameters to put as columns into an example set

ahaensel New Altair Community Member
edited November 5 in Altair RapidMiner
My task is this: I have customers, each with a unique ID and a set of binominal parameters, and I would like to predict the value of certain target variables (so far only one, but possibly several).
In my test case I used the following input dataset (see the meta data below). Each customer is represented by one row and the parameters are in the columns, simply the usual way.
meta data:
Role     Name         Type
id       Customer_Id  integer
label    Target       binominal
regular  Para1        binominal
regular  Para2        binominal
regular  Para3        binominal
regular  Para4        binominal
dataset:
Customer_Id Target Para1 Para2 Para3 Para4
1 M 1 0 1 0
2 V 1 0 0 1
3 M 0 1 1 1

=> With Naïve Bayes I get great prediction results in the test case with limited dimensions.

Problem with the actual dataset:
I have several hundred thousand parameters, and the number is growing quickly. The number of active parameters per customer is very small, so the table would be extremely large and sparse. My idea was therefore to use the following dataset format as input:
meta data:
Role     Name         Type
id       Customer_Id  integer
label    Target       binominal
regular  ActivePara   polynominal
data:
Customer_Id Target ActivePara
1 M Para1
1 M Para3
2 V Para1
2 V Para4
3 M Para2
3 M Para3
3 M Para4
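
To make the relationship between the two formats concrete, here is a minimal sketch in plain Python (not a RapidMiner process) that groups the long-format rows above by Customer_Id and rebuilds the wide rows of the test case. The helper names group_by_customer and to_wide are hypothetical and only serve the illustration; the data is copied from the tables in this post.

```python
from collections import defaultdict

# Long-format rows copied from the post: (Customer_Id, Target, ActivePara)
long_rows = [
    (1, "M", "Para1"),
    (1, "M", "Para3"),
    (2, "V", "Para1"),
    (2, "V", "Para4"),
    (3, "M", "Para2"),
    (3, "M", "Para3"),
    (3, "M", "Para4"),
]


def group_by_customer(rows):
    """Collect the target and the set of active parameters for each customer."""
    targets = {}
    active = defaultdict(set)
    for customer_id, target, para in rows:
        targets[customer_id] = target
        active[customer_id].add(para)
    return targets, dict(active)


def to_wide(targets, active, columns):
    """Rebuild one wide row per customer: id, target, then 0/1 per column."""
    return [
        (cid, targets[cid], *[1 if col in active[cid] else 0 for col in columns])
        for cid in sorted(targets)
    ]


targets, active = group_by_customer(long_rows)
wide_rows = to_wide(targets, active, ["Para1", "Para2", "Para3", "Para4"])
for row in wide_rows:
    print(*row)
```

The printed rows match the wide test dataset, so both formats carry the same information. The difference is that the long format spreads one customer over several rows.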

BUT now I do not get consistent predictions per customer. What I get is something like this:
Customer_Id Target ActivePara Prediction of Target
1 M Para1 V
1 M Para3 M
2 V Para1 V
2 V Para4 V
3 M Para2 M
3 M Para3 M
3 M Para4 V
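
The inconsistency can be made explicit with a small check, sketched here in plain Python under the assumption that the scored rows are exactly the ones in the table above. The function name predictions_per_customer is hypothetical. It collects the distinct predictions for each Customer_Id and lists the customers that received more than one.

```python
from collections import defaultdict

# Scored rows copied from the post: (Customer_Id, ActivePara, prediction)
scored_rows = [
    (1, "Para1", "V"),
    (1, "Para3", "M"),
    (2, "Para1", "V"),
    (2, "Para4", "V"),
    (3, "Para2", "M"),
    (3, "Para3", "M"),
    (3, "Para4", "V"),
]


def predictions_per_customer(rows):
    """Map each customer to the set of distinct predictions over its rows."""
    preds = defaultdict(set)
    for customer_id, _para, prediction in rows:
        preds[customer_id].add(prediction)
    return dict(preds)


per_customer = predictions_per_customer(scored_rows)
# Customers whose rows were given more than one distinct prediction
inconsistent = sorted(cid for cid, p in per_customer.items() if len(p) > 1)
print(inconsistent)  # [1, 3]
```

Each row is scored as an independent example, so customers 1 and 3 end up with both M and V.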

But I need the target prediction to be consistent per Customer_Id.

How do I need to set up the input data or the model to get this result?

Thanks a lot in advance for any hints and help!!!

