Accounting for number of observations / evidence
User23400
New Altair Community Member
Dear RM-Enthusiasts,
Working on an online advertising dataset I have a list with Product-IDs. Every product has a number of attributes and I want to predict one of them. A fairly basic Decision Tree model is already yielding acceptable results.
However I still have one source of predictive potential that is not used yet and that is the number of observations. The data for some Product-IDs are based on 1 observation, while others are based on 20 or more observations. Obviously I would like to weigh the data for the IDs with many observations heavier than the ones with few observations.
Can anybody direct me to a way of handling this? Maybe a tutorial or youtube video?
Any advice would be greatly appreciated. Thanks in advance!
Best,
Marc
Answers
Hi,
you can use Aggregate to generate this count and then set the role of this attribute to weight. Learners will then count examples with a higher weight more heavily.
Be a bit careful with it, though. It may lead to a bias towards well-known things.
Best,
Martin
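To illustrate what "counting more in learners" can mean, here is a minimal sketch in plain Python (not RapidMiner). It assumes one row per product and treats a hypothetical `n_obs` field as the weight. The product IDs, labels and the helper `class_shares` are made up for illustration, and how RapidMiner's Decision Tree uses the weight internally is not stated in this thread.

```python
from collections import defaultdict

# Hypothetical per-product rows; "n_obs" plays the role of the weight attribute.
rows = [
    {"product_id": "P1", "label": "high", "n_obs": 20},
    {"product_id": "P2", "label": "low", "n_obs": 1},
    {"product_id": "P3", "label": "low", "n_obs": 1},
]


def class_shares(rows, weight_key=None):
    """Return each label's share of the total (weighted if weight_key is given)."""
    totals = defaultdict(float)
    for row in rows:
        # Without a weight key every row counts once; with it, a row counts n_obs times.
        totals[row["label"]] += row[weight_key] if weight_key else 1.0
    grand_total = sum(totals.values())
    return {label: total / grand_total for label, total in totals.items()}


unweighted = class_shares(rows)
weighted = class_shares(rows, weight_key="n_obs")

print(max(unweighted, key=unweighted.get))  # majority label without weights
print(max(weighted, key=weighted.get))      # majority label with weights
```

Without weights the majority label is "low" (two of three rows); with weights it becomes "high", because the single well-observed product dominates. That shift is also the bias towards well-known items that the warning above refers to.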
Dear Martin,
Thanks a lot for your response. I understand the concept and found three "Aggregate" operators: Generate Aggregation, Aggregate and Extract aggregates. I chose "Aggregate".
Next, I chose number_observations as the "aggregation attribute". When selecting the corresponding "aggregation_function" (average, concatenation, count etc.), though, I could not find "weight".
Do you have any idea where I’m going wrong?
Best,
Marc
Hi @User23400,
You have to choose count as the aggregation function in the parameters of the Aggregate operator.
Then add a Set Role operator to your process. In its parameters, select the attribute you just created under attribute name and set weight as the target role.
Regards,
Lionel
Thanks Lionel,
Clear. It has worked so far, but I now only have the aggregated attribute on the output port of the Aggregate operator. The other attributes are not passed through. I tried a few things but couldn't get it to work. Any idea?
Thanks,
Marc
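The thread does not record a resolution to this last question. In general terms, a count aggregation returns only the grouping key and the count, so a common pattern is to join the counts back onto the original rows by ID. Below is a minimal sketch of that pattern in plain Python (not RapidMiner); the attribute names `category` and `price` are hypothetical, and mapping this to joining the Aggregate output back to the original example set on Product-ID is an assumption, not something confirmed here.

```python
from collections import Counter

# Hypothetical observation-level rows (attribute names are made up).
observations = [
    {"product_id": "P1", "category": "shoes", "price": 50},
    {"product_id": "P1", "category": "shoes", "price": 50},
    {"product_id": "P2", "category": "bags", "price": 80},
]

# Count aggregation: on its own this yields only product_id -> count,
# which is why the other attributes seem to disappear.
counts = Counter(row["product_id"] for row in observations)

# Join the counts back onto the rows so every original attribute survives
# alongside the new "weight" column.
with_weight = [
    {**row, "weight": counts[row["product_id"]]} for row in observations
]

for row in with_weight:
    print(row)
```

Each output row keeps `category` and `price` and gains a `weight` equal to the number of observations for its product ID.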