Unexpected predictions
dataminer99
New Altair Community Member
Hello,
Despite seeing good predictions (~70% accuracy) on my training and validation sets, I am having trouble scoring my records for real use. I have 250K records to score, and 99% of them get the same prediction (Y) with identical confidence scores: the "Yes" confidence in the scored data set is always 0.818 and the "No" confidence is always 0.182.
My expectation is that the predictions and their associated confidences will vary rather than being identical, as they are when I score my data. I have replaced my real data with the "Generate Direct Mailing Data" operator in every process; unfortunately, the generated data trains, validates, and scores consistently throughout, i.e. it shows no problems. My real training data set has 44,000 records: 2 special attributes (1 ID, 1 nominal label) and 66 regular attributes (26 integer, 12 nominal, 28 real). I would have included the code from my 4 processes, but it pushed this message past the 20K character limit. Any suggestions are much appreciated! Mike
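For illustration, here is a minimal Python/scikit-learn sketch (a stand-in for the RapidMiner processes I could not post; all data below is synthetic) showing how identical confidence rows like these can arise when the scoring set's features reach the model as constants, e.g. through a preprocessing or attribute-mapping mismatch between processes:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 5))
y_train = (X_train[:, 0] + 0.5 * X_train[:, 1] > 0).astype(int)

model = LogisticRegression().fit(X_train, y_train)

# Scoring data whose features collapsed to a constant (simulating a broken
# attribute mapping between the training process and the scoring process):
# every row gets the same prediction and the same confidence pair.
X_score_broken = np.full((10, 5), 1.0)
print(model.predict_proba(X_score_broken)[:3])

# Scoring data passed through correctly produces varied confidences.
X_score_ok = rng.normal(size=(10, 5))
print(model.predict_proba(X_score_ok)[:3])
```

With the broken scoring set, every row comes back with one repeated confidence pair, much like the 0.818/0.182 I am seeing; with the correctly passed-through set, the confidences vary row by row.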
Answers
What kind of model are you fitting? When you say you get good accuracy, is there a range of probabilities produced, or just this same identical score?
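As a quick check (a hypothetical sketch; the CSV export path and the RapidMiner-style "confidence(Y)" column name are assumptions), you could export the scored example set and look at the spread of confidences directly:

```python
import numpy as np
import pandas as pd

# Assumed: the scored example set exported to CSV with a "confidence(Y)"
# column, following RapidMiner's confidence-column naming; path is hypothetical.
scored = pd.read_csv("scored_records.csv")
conf_yes = scored["confidence(Y)"].to_numpy()

# A healthy model shows a distribution of values; a single repeated value
# points at the scoring data or process, not the model itself.
print("min/max:", conf_yes.min(), conf_yes.max())
print("distinct rounded values:", np.unique(conf_yes.round(3)).size)
```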
Hi,
if you use accuracy as a performance measure, you have to compare it with the default accuracy, i.e. the accuracy you would get by always predicting the most frequent label. If your "yes" examples make up 70% of your data, an accuracy of 70% does not sound very good.
Greetings,
Sebastian
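A short sketch of this comparison (the 70/30 label split and the counts below are assumed for illustration):

```python
from collections import Counter

# Assumed label distribution: 70% "Y", 30% "N".
labels = ["Y"] * 700 + ["N"] * 300

# Default accuracy: always predict the most frequent label.
default_acc = Counter(labels).most_common(1)[0][1] / len(labels)
print(f"default accuracy: {default_acc:.0%}")   # 70% just by always saying Y

model_acc = 0.70                                # the reported ~70%
print(f"lift over default: {model_acc - default_acc:+.2f}")  # ~0: no real lift
```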