Attributes do not match: Apply Model fails on the classification data

Question

Hi everyone, I am having trouble with my project. My task is to generate the target variable for the training data (Retouren_Training.txt), which determines the class membership according to the case description for the return rate. Specifically, all customers whose forecast return quota (= RETOUREN_MENGE/LIEFER_MENGE) is at most 0.18 (i.e. at most 18%) are considered low returners. All customers whose predicted return rate is greater than 0.4 (i.e. greater than 40%) are considered high returners. All other customers are neutral, i.e. neither low nor high returners. In addition, a data mining model is to be created and applied, as an example, to the 9,900 customers to be classified (Retouren_Klassigung.txt), assigning each of them to the class neutral, low or high returner.
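For reference, the class rule described above can be written down as a small function. This is only a sketch in Python, not the RapidMiner process itself; the function name and the class labels "low", "high" and "neutral" are assumptions chosen for illustration.

```python
def return_class(retouren_menge: float, liefer_menge: float) -> str:
    """Map a customer's return quota to a class label (illustrative names)."""
    if liefer_menge <= 0:
        # Assumption: a quota is undefined without any delivered quantity.
        raise ValueError("LIEFER_MENGE must be positive")
    quota = retouren_menge / liefer_menge
    if quota <= 0.18:   # at most 18%: low returner
        return "low"
    if quota > 0.4:     # more than 40%: high returner
        return "high"
    return "neutral"    # everything in between


label = return_class(30, 100)  # quota 0.3 lies between the two thresholds
```

Note that a quota of exactly 0.4 falls into the neutral class, because the rule says "greater than 0.4" for high returners.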

I created the model and calculated the variable, but Apply Model still doesn't work. Where is the error? I would be very grateful for your support.

Best wishes, Lina

BalazsBaranyRM · Accepted Answer

Hi!

You are doing some preprocessing inside the cross validation, including generation of new attributes (columns).

Then you build models on this altered data.

Models are built on the attributes that go into them. Unless the model does its own attribute selection, it will expect ALL of those attributes (with the same names and types) when predicting with Apply Model.
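To make the mismatch visible, it can help to compare the attribute names and types of the two data sets directly. The following is a minimal sketch in Python rather than a RapidMiner feature; the function name, the attribute names and the type names are all hypothetical.

```python
def schema_mismatches(expected: dict, actual: dict) -> list:
    """List differences between two {attribute name: type name} schemas."""
    problems = []
    for name, type_name in expected.items():
        if name not in actual:
            problems.append(f"missing attribute: {name}")
        elif actual[name] != type_name:
            problems.append(
                f"type mismatch for {name}: expected {type_name}, got {actual[name]}"
            )
    return problems


# Hypothetical schemas: the training data received a generated attribute
# during preprocessing, the data to be classified did not.
training_schema = {"order_count": "integer", "return_quota": "real"}
prediction_schema = {"order_count": "integer"}

problems = schema_mismatches(training_schema, prediction_schema)
```

Here `problems` contains one entry for the generated attribute that exists only on the training side, which is the kind of situation that makes the model application fail.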

You need to apply the same preprocessing to the training data as you will later apply to the testing or prediction data. The easiest way to achieve this is to put the preprocessing into a separate process and then use this process as a subprocess twice in the main process: once for the modeling and once for the model application.
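The same idea, expressed outside RapidMiner: one preprocessing routine, called once for the training data and once for the data to be classified. This is a sketch in Python under assumptions; the function and the column names (`feature_a`, `feature_b`, `ratio`) are hypothetical and stand in for whatever your preprocessing actually generates.

```python
def preprocess(rows: list) -> list:
    """Shared preprocessing step: add a generated attribute to every row."""
    prepared = []
    for row in rows:
        new_row = dict(row)  # copy so the input data stays untouched
        # Hypothetical generated attribute, built from two existing columns.
        new_row["ratio"] = row["feature_a"] / row["feature_b"]
        prepared.append(new_row)
    return prepared


def attribute_names(rows: list) -> list:
    """Sorted attribute names of the first row."""
    return sorted(rows[0].keys())


training_rows = [{"feature_a": 2.0, "feature_b": 10.0}]
rows_to_classify = [{"feature_a": 5.0, "feature_b": 10.0}]

# The same routine runs on both sides, so both end up with the same attributes.
train_prepared = preprocess(training_rows)
apply_prepared = preprocess(rows_to_classify)
attributes_match = attribute_names(train_prepared) == attribute_names(apply_prepared)
```

Because both data sets pass through the identical routine, the model sees the same attribute names at training time and at application time.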

Regards,
Balázs