Is it possible to get 100% for split validation accuracy?
Joannach0ng
New Altair Community Member
Is it possible to get 100% split validation accuracy, and what are the pros of getting 100% accuracy? Thank you
Answers
-
Hi @Joannach0ng,
In my opinion, most of the time this would be alarming. For some problems it may be possible, but not for most real business problems. A point of reference that might help is to ask: 'If a team of experts looked closely at the data, how good would they be at making these predictions?' That can sometimes give you an idea of what a good accuracy might be. For some simple problems it may be near or at 100%; for many problems in business it won't be anywhere close.
If you have 100% accuracy, I would check for attributes that are too closely correlated with the outcome; they may contain information that wouldn't be available until after the outcome is observed. There's some more information about correct validation in this course: https://academy.rapidminer.com/learn/course/applications-use-cases-professional/
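As a rough sketch of that check in Python/pandas (the column names here are made up; in RapidMiner you would do the equivalent with a correlation matrix on your own attributes), one quick test is to look at how strongly each attribute correlates with the label:

```python
# Hypothetical example: flag attributes whose correlation with the label
# is suspiciously close to 1, a common sign of leakage.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
label = pd.Series(rng.integers(0, 2, size=500), name="label")
df = pd.DataFrame({
    "income": rng.normal(size=500),                          # ordinary attribute
    "age": rng.normal(size=500),                             # ordinary attribute
    "leaky_flag": label + rng.normal(scale=0.01, size=500),  # leaks the label
})

# Absolute correlation of every attribute with the label, highest first.
corr = df.corrwith(label).abs().sort_values(ascending=False)
suspects = corr[corr > 0.95]
print(suspects.index.tolist())  # ['leaky_flag']
```

An attribute that correlates almost perfectly with the label deserves a hard look: is it really known before the outcome?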
I'd recommend taking a little time to go through the course. Also, if you have come up with 100% accuracy, are you able to share more about the use case and data, or the process you are using? We might be able to provide better help.
-
If you come across this problem, check whether you included any IDs in your data source. This happens especially when you are using Decision Trees (or another tree-based algorithm): the tree overfits, and the easiest way to identify a row becomes its ID. The resulting model is useless, because every single row will have an unseen ID in production.
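A minimal sketch of that failure mode (sklearn stands in for RapidMiner here, and the data is synthetic):

```python
# A decision tree given a unique row ID can memorise completely random
# labels: training accuracy is 100%, yet the model is useless on new IDs.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)
n = 200
ids = np.arange(n).reshape(-1, 1)      # unique ID per row
labels = rng.integers(0, 2, size=n)    # labels unrelated to the ID

tree = DecisionTreeClassifier().fit(ids, labels)
print(tree.score(ids, labels))         # 1.0 on the memorised rows
```

Rows seen in production carry IDs the tree has never split on, so its predictions there are essentially arbitrary.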
My 2 cents.
-
@jmergler Hi, thank you for your reply! Actually, I was told by my tutor to aim for a 100% accuracy prediction, so I was wondering if it is possible. I have tried from 0 to 1 but could not get to 100%. Can adding some operator do so? Thanks!
-
@rfuentealba Hi, thank you for your reply! Actually, I was told by my tutor to aim for a 100% accuracy prediction, so I was wondering if it is possible. I have tried from 0 to 1; could I get to 100% accuracy by adding some operator? Thanks!
-
Hi @Joannach0ng
I am taking the risk of being accused by others of teaching you bad things, but technically you can achieve it this way: train and test the model on exactly the same dataset.
But still, take the other commenters' concerns into account, because this:
- Makes no sense for any real-life machine learning problem.
- Is a serious mistake from a data science point of view.
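For illustration only (a deliberately bad sketch, not something to copy into a real process), even pure noise scores 100% when evaluated on its own training data:

```python
# A 1-nearest-neighbour model scored on its own training data is always
# "100% accurate": every point's nearest neighbour is itself.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = rng.integers(0, 2, size=100)   # labels are pure noise

model = KNeighborsClassifier(n_neighbors=1).fit(X, y)
print(model.score(X, y))           # 1.0, yet the model has learned nothing
```

The perfect score tells you nothing about performance on data the model has not seen.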
-
I want to echo the many cautions here: in real life, 100% accuracy on any test dataset is almost always an indicator that some performance leakage is occurring, such as an ID, or a surrogate for the label that would not really be available at the time of the prediction. It should be viewed very skeptically, not as a realistic goal.
One possible exception might be if you have a small number of examples in the test dataset but a large number of attributes in the model, in which case your model can be "over-specified": too many attributes will let some unique combination serve as a kind of ID for making the predictions. Or if you simply have too few examples in the test set altogether (imagine the reductio ad absurdum of a single test case, which must score either 100% or 0%!), this can also happen by random chance.
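The single-test-case reductio is easy to simulate (a toy sketch using random guessing rather than a real model):

```python
# A random 50/50 guess on a single test case scores 100% about half the
# time, so a perfect score on a tiny test set carries very little evidence.
import random

random.seed(7)
runs = 10_000
# Fraction of single-case "test sets" on which a coin flip is 100% accurate.
perfect = sum(random.random() < 0.5 for _ in range(runs)) / runs
print(perfect)  # roughly 0.5
```

The smaller the test set, the more easily chance alone produces a perfect score.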
-
Now that you mention it, I had a requirement like this once, years ago, before I was even around here. If you are familiar with logic gates, you know how they work; if not, there is an explanation here.
The thing is that I had a dataset with some 12 attributes working like this (for the sake of reducing complexity, I'm going to explain with an OR logic gate):

a1 a2 ax
 0  0  0
 0  1  1
 1  0  1
 1  1  1
The idea was to build a program that could act like that: the original program was compiled C, there was no source code, and the logic controller it ran on needed a replacement. I ended up training a decision tree because I had no clue what the order of the logic gates could be, and the replacement "logic controller" ended up being an old computer.
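In sklearn terms (a sketch of the same idea, not the original process), the reconstruction looks like this:

```python
# A decision tree can recover a deterministic logic function exactly;
# here it reproduces an OR gate from its four-row truth table.
from sklearn.tree import DecisionTreeClassifier

X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 1]                     # ax = a1 OR a2

gate = DecisionTreeClassifier(random_state=0).fit(X, y)
print(gate.predict(X).tolist())      # [0, 1, 1, 1]
```

This is one of the rare cases where 100% accuracy is legitimate: the target function is deterministic and every input combination appears in the data.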
Not the most elegant solution, but a hell of a win for data science.
All the best,
Rodrigo.