Attribute is missing in the input example set

bernardo_pagnon New Altair Community Member
edited November 5 in Community Q&A
Hello,

I have a simple RapidMiner process, and whenever I use the Set Role operator I get the warning message "The attribute loan_status is missing in the input example set". I can run a model, but the Performance operator gives me a confusion matrix full of zeros. How can I fix this? I checked the data: loan_status is binominal, so the type conversion worked correctly.

Regards,
Bernardo

Answers

  • Roland Jones_21245
    Altair Employee
    Hi @bernardo_pagnon,

    Usually that warning is down to a metadata issue. Looking at your process, though, I'd be surprised if that's the cause here.

    Regarding your confusion matrix being full of zeros, I couldn't see where you were applying the Performance operator. I'd be happy to test it if you're able to provide the data.

    Best,

    Roland
  • bernardo_pagnon
    bernardo_pagnon New Altair Community Member
    Dear Roland,

    thanks a lot for your reply. I am sorry, it is my bad. I have been changing the process so much that I probably deleted the performance operator. I am attaching the current version, which is giving me the zeros. 

    What I always struggle with in RapidMiner is the order of operators. Where do I put Set Role? When do I convert polynominal to binominal?
    In this case, as soon as I add the polynominal to binominal operator I start receiving warning messages. However, when I checked the data types everything seemed to be fine.

    Here is a link to the data: 

    LoansData_sample.csv

    Regards,
    Bernardo
  • Roland Jones_21245
    Altair Employee
    Answer ✓
    Hi Bernardo,

    No apologies necessary! I've got both now. 

    Regarding the location of Set Role, it is fine as is. Just make sure you've got "include special attributes" turned on in Map, as you already have.

    In general, your process can be simplified a little by removing the data duplication. This can be done using the unmatched output of the Filter Examples operator, as shown in the screenshot below. The top output carries the examples where loan status is not missing, and the unmatched output carries the examples where loan status is missing.
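
For readers who think more easily in code, the matched/unmatched split can be sketched outside RapidMiner. This is only an illustration of the idea, not how Filter Examples is implemented: the helper `split_by_label`, the `id` column, and all values are hypothetical, and only the attribute name `loan_status` comes from the thread.

```python
def split_by_label(rows, label="loan_status"):
    """Split rows into (matched, unmatched) by whether the label is present.

    Mirrors the idea of one filter with two outputs: rows that pass the
    "label is not missing" condition, and the rows that do not.
    """
    matched, unmatched = [], []
    for row in rows:
        has_label = row.get(label) not in (None, "")
        (matched if has_label else unmatched).append(row)
    return matched, unmatched


# Hypothetical data: the "id" column and every value here are made up.
rows = [
    {"id": 1, "loan_status": "Fully Paid"},
    {"id": 2, "loan_status": None},
    {"id": 3, "loan_status": "Charged Off"},
    {"id": 4, "loan_status": ""},
]

training_rows, scoring_rows = split_by_label(rows)
print([r["id"] for r in training_rows])  # rows with a known label
print([r["id"] for r in scoring_rows])   # rows whose label is missing
```

A single pass yields both subsets, so the data does not need to be read or filtered twice.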



    Regarding polynominal to binominal, are you wanting to do this to all variables? There shouldn't be a need to do this to loan status, as the software should be able to tell by default that it is binominal.

    The Performance operator gives all zeros because it compares the predicted label with the actual label. You're scoring data where there is no actual label, so there is nothing to compare against and hence no performance metric.
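
The all-zeros result can be reproduced with a small sketch. It assumes a confusion matrix is built by counting (actual, predicted) pairs and that rows without an actual label contribute nothing. The function `confusion_counts` and the class values are hypothetical and are not taken from the data set or from RapidMiner's implementation.

```python
from collections import Counter


def confusion_counts(actual, predicted, classes):
    """Return a matrix of counts: rows are actual classes, columns predicted.

    Rows whose actual label is missing (None) cannot be compared with a
    prediction, so they are skipped.
    """
    counts = Counter(
        (a, p) for a, p in zip(actual, predicted) if a is not None
    )
    return [[counts[(a, p)] for p in classes] for a in classes]


# Hypothetical class values, for illustration only.
classes = ["Fully Paid", "Charged Off"]
predicted = ["Fully Paid", "Charged Off", "Fully Paid"]

# Scoring set: every actual label is missing, so nothing is counted.
matrix_unlabelled = confusion_counts([None, None, None], predicted, classes)

# Labelled hold-out set: actual labels exist, so the counts are non-zero.
matrix_labelled = confusion_counts(
    ["Fully Paid", "Charged Off", "Charged Off"], predicted, classes
)

print(matrix_unlabelled)
print(matrix_labelled)
```

Under these assumptions, performance is only meaningful on examples that have a known label, such as a hold-out portion of the labelled data.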

    Any further questions or clarifications please let me know.

    Best,

    Roland
  • bernardo_pagnon
    bernardo_pagnon New Altair Community Member
    Thank you for your detailed reply, and for running the process on your computer.

    Let's see:
    1 - Ok with the set role
    2 - You are totally right about simplifying things, I will work on that. 
    3 - The polynominal to binominal conversion: I only applied it to the label (loan_status), and I included special attributes.
    4 - The software cannot tell by default that loan status is binominal. In the Statistics tab it says polynominal, which is why I changed it.
    5 - It is still weird to me that I am getting the zeros in the confusion matrix. If I remove Performance, I can see the unknown instances being classified by the Apply Model operator, so there is an actual label and the process is doing what I want. For some reason, the program is not recognizing it in the Performance operator.
    6 - I made a new version in which, when I import the file using Import Data, I tell RapidMiner that loan_status is binominal (so there is no need for the polynominal to binominal conversion in this case). I still get zeros in the matrix, despite each unknown instance being classified by Apply Model. I gave up... I tried everything I could... I exported the output of Apply Model and built the matrix by hand in Excel, which is what Performance should do.



    Best,
    Bernardo
  • bernardo_pagnon
    bernardo_pagnon New Altair Community Member
    Dear Roland,

    I reread your reply, more carefully this time, and I understood what you were saying. 
    Done, problem solved!

    Best,
    Bernardo