What seems to be the problem in this case?

cjjc20001 New Altair Community Member
edited November 2024 in Community Q&A
I am trying LightGBM on a dataset, and it gives the following error.




The sample data are fields such as gender, degree concentration, etc. They are mostly predefined options from a survey in which the participant simply selects the most appropriate one.

Answers

  • MartinLiebig
    Altair Employee
    Hi,

    It looks like your text field contains categories at application time that weren't present in training.

    BR,
    Martin
  • cjjc20001
    cjjc20001 New Altair Community Member
    I think the problem is that some values occur only once in the data. During sampling, such a value may not be selected for the training set, so when it appears during validation it is marked as unrecognized. When I removed the split, it worked. However, I need to both train and test the model. I tried cross-validation, but it has the same problem. What is the solution for this?
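
For readers who hit the same situation, here is a minimal sketch in plain Python (not the tool used in this thread) of the two ideas discussed above: finding categories that appear in the test split but not in the training split, and one common workaround, which is to merge rare categories into a shared placeholder before splitting. The function names, the `min_count` threshold, the `"Other"` label, and the sample rows are all hypothetical, chosen only for illustration. Nobody in the thread confirmed this as the fix.

```python
from collections import Counter


def unseen_categories(train_rows, test_rows):
    """Map each column to the values found in test_rows but not in train_rows."""
    unseen = {}
    for column in train_rows[0]:
        train_values = {row[column] for row in train_rows}
        test_values = {row[column] for row in test_rows}
        missing = test_values - train_values
        if missing:
            unseen[column] = missing
    return unseen


def lump_rare(values, min_count=2, placeholder="Other"):
    """Replace every value seen fewer than min_count times with placeholder."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else placeholder for v in values]


# Hypothetical survey rows: "Logistics" occurs only in the test split.
train = [
    {"gender": "F", "degree": "Finance"},
    {"gender": "M", "degree": "Marketing"},
]
test = [{"gender": "F", "degree": "Logistics"}]
print(unseen_categories(train, test))  # {'degree': {'Logistics'}}

# Merging singletons before the split, so no split can isolate them.
degrees = ["Finance", "Marketing", "Finance", "Logistics", "Marketing", "Archaeology"]
print(lump_rare(degrees))
```

Note that if only one value falls below the threshold, the placeholder itself occurs once and the same problem can return, so the threshold should be checked against the actual category counts. Merging also discards the distinction between the rare options, which may or may not be acceptable for the survey being analyzed.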