Algorithms are "cheating" by copying the correct label from other instances
Hi everyone,
I have a problem with my model. It should predict a monthly product volume from some given attributes.
My training data covers roughly the past 60 months. Each instance in the dataset represents one day. Two of the attributes are "month" and "year". The label is the product volume at the end of the month, so every instance from a given month (~30 days per month, i.e. ~30 instances) has the same label. When I train the algorithm (Deep Learning, evaluated with Cross Validation) and look at the performance measure (relative_error), it seems that the algorithm looks at the "month" and "year" attributes and simply adopts the label of another row with the same month and year as its prediction for this instance.
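To check whether this suspicion holds, here is a minimal sketch in Python (standard library only) on made-up data. Everything in it is hypothetical: the synthetic volumes, the number of folds, and `lookup_model`, which is a stand-in for a learner that has memorized the (year, month) → label mapping, not the Deep Learning model from the post. With a random, day-level split, other days of the same month almost always land in the training folds, so the lookup copies the exact label and the relative error drops to 0. Assigning whole months to folds (a grouped, leave-months-out cross-validation) removes that shortcut.

```python
import random

def make_data(n_months=60, days=30, seed=0):
    """Hypothetical synthetic data: every day of a month carries that month's volume."""
    rng = random.Random(seed)
    rows = []
    for m in range(n_months):
        year, month = 2000 + m // 12, m % 12 + 1
        volume = rng.uniform(100.0, 200.0)  # made-up monthly volume
        for day in range(days):
            rows.append({"year": year, "month": month, "day": day, "label": volume})
    return rows

def lookup_model(train):
    """Stand-in 'model' that memorizes the label seen for each (year, month) key."""
    table = {(r["year"], r["month"]): r["label"] for r in train}
    fallback = sum(r["label"] for r in train) / len(train)  # used for unseen months
    return lambda r: table.get((r["year"], r["month"]), fallback)

def mean_relative_error(model, test):
    return sum(abs(model(r) - r["label"]) / r["label"] for r in test) / len(test)

def cross_validate(rows, folds, k):
    """Average relative error over k folds; folds[i] is the fold index of rows[i]."""
    errors = []
    for f in range(k):
        train = [r for r, g in zip(rows, folds) if g != f]
        test = [r for r, g in zip(rows, folds) if g == f]
        errors.append(mean_relative_error(lookup_model(train), test))
    return sum(errors) / k

rows = make_data()
k = 5

# Random day-level split: days of the same month end up in both train and test.
rng = random.Random(1)
random_folds = [rng.randrange(k) for _ in rows]

# Grouped split: all days of one (year, month) go to the same fold.
month_ids = sorted({(r["year"], r["month"]) for r in rows})
month_fold = {key: i % k for i, key in enumerate(month_ids)}
grouped_folds = [month_fold[(r["year"], r["month"])] for r in rows]

random_err = cross_validate(rows, random_folds, k)
grouped_err = cross_validate(rows, grouped_folds, k)
print(f"random day-level CV relative error: {random_err:.3f}")
print(f"leave-months-out CV relative error: {grouped_err:.3f}")
```

If the error collapses like this under a random split but rises under a grouped split, that points to leakage between rows of the same month rather than genuine predictive skill. scikit-learn's `GroupKFold` implements the same grouping idea, with the (year, month) pair used as the group.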
I hope you can follow my description. If there is something you don't understand feel free to ask.
I would be very thankful if someone could tell me whether my guess is right and how I can avoid this mistake.
For now, I am trying to avoid this by using only the month as an attribute, not month + year.
Thanks for your replies,
Sebastian