"Preprocessing Data for Decision Tree (Weights)"

New Altair Community Member

Nov 24, 2009

Updated Nov 5, 2024 by Jocelyn

Hi,

I have a special problem because of the characteristics of my data. The attributes are:

- ID (I declared as ID)
- contact (nominal and declared as regular)
- product (nominal and declared as regular)
- execution (nominal and declared as label)
- quantity (numerical and declared as weight)

The data covers every possible combination of contact, product and execution. If a combination never occurred, its quantity is zero; a quantity of 300 means the case occurred 300 times in reality, although it appears only once in the data sheet. As a result, building a decision tree or rules on this data does not lead to the desired results. I tried declaring the quantity attribute as weight, but that does not seem to be the right way. Can someone tell me how to weight the data correctly?
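To make the intended meaning of the quantity column concrete, here is a minimal sketch in plain Python (not an AI Studio process). It treats quantity as a frequency count and repeats each case that many times. The row values and the function name `expand_by_quantity` are made up for illustration.

```python
# Hypothetical rows: (contact, product, execution, quantity)
rows = [
    ("c1", "p1", "e1", 2),
    ("c1", "p1", "e2", 5),
    ("c1", "p2", "e1", 0),  # combination that never occurred
]

def expand_by_quantity(rows):
    """Repeat each (contact, product, execution) case `quantity` times."""
    expanded = []
    for contact, product, execution, quantity in rows:
        # A quantity of 0 contributes no rows at all.
        expanded.extend([(contact, product, execution)] * quantity)
    return expanded

expanded = expand_by_quantity(rows)
print(len(expanded))
```

A learner trained on the expanded rows sees the real case frequencies. A learner that supports weights should behave the same way on the compact table, without the memory cost of the expansion.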

Thanks a lot!


land

New Altair Community Member

Nov 24, 2009

Hi,
I would have suggested declaring the quantity as weight. This should work with learners that support weights. What went wrong?
By the way:
I would filter out all examples with quantity = 0 using the example filter operator. That would at least make things faster.

Greetings,
Sebastian

mmaelzer

New Altair Community Member

Nov 25, 2009

Hi Sebastian,

Filtering out the examples with quantity 0 reduces the classification error to 75%. Without the filter, the classification error is 99%. Because of this, I suspected that the weights are not being used or declared correctly.
At first I used an X-Validation. As I understand it, this splits the dataset into two or more disjoint subsets, which is problematic here because every case appears only once. Classification error: 89% with filter / 99% without filter.
Then I split the data manually into two datasets (month1, month2), each covering all cases. I trained the learner on month1 and applied the resulting model to month2 as the test set. Classification error: 75% with filter / 99% without filter.
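One possible reading of these figures, offered as an assumption rather than a diagnosis: if the error is counted per row, a model that predicts a single execution per contact/product pair can match at most one row per pair when every combination is listed exactly once. The sketch below, in plain Python with made-up rows and a hypothetical `predict` function, compares a row-based error with one where each row counts quantity times.

```python
def classification_error(rows, predict, weighted=True):
    """Share of misclassified cases; each row counts `quantity` times if weighted."""
    total = 0.0
    wrong = 0.0
    for contact, product, execution, quantity in rows:
        w = quantity if weighted else 1
        total += w
        if predict(contact, product) != execution:
            wrong += w
    return wrong / total if total else 0.0

# Hypothetical test month: (contact, product, execution, quantity)
month2 = [
    ("c1", "p1", "e1", 2),
    ("c1", "p1", "e2", 500),
    ("c1", "p1", "e3", 0),
]

def predict(contact, product):
    # Stand-in for a trained model that always answers "e2" for this pair.
    return "e2"

unweighted = classification_error(month2, predict, weighted=False)
weighted = classification_error(month2, predict, weighted=True)
print(unweighted, weighted)
```

In this toy case the row-based error is two out of three rows, while the quantity-weighted error is 2 out of 502 cases. Whether the evaluation in the actual process takes the weight attribute into account is worth checking separately from the learner.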
The tree doesn't represent the data. For example:

contact - product - execution - quantity
c1 - p1 - e1 - 2
c1 - p1 - e2 - 500

leads to this path in the tree: c1 -> p1 -> e1
It seems that the learner takes the first combination and ignores the weights.
I tried it with both Decision Tree and CHAID.
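For comparison, this is what I would expect a weight-aware learner to do at the leaf for c1/p1: sum the weights per class and pick the largest. It is a minimal sketch in plain Python; `weighted_majority` is an illustrative name, not an operator.

```python
from collections import defaultdict

def weighted_majority(rows):
    """Return the label with the largest summed weight."""
    totals = defaultdict(float)
    for execution, quantity in rows:
        totals[execution] += quantity
    return max(totals, key=totals.get)

# The two rows from the example above, reduced to (execution, quantity)
leaf = [("e1", 2), ("e2", 500)]

by_weight = weighted_majority(leaf)  # uses the quantities
by_first_row = leaf[0][0]            # what the tree appears to do
print(by_weight, by_first_row)
```

With weights respected, the leaf would be labelled e2, not e1.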

Regards,

M. Mälzer

land

New Altair Community Member

Nov 25, 2009

Hi,
Sorry, but I don't see any need for classification here. If every combination of the nominal attributes is listed and each combination is assigned a label, where is the need for learning? It seems to me that the list of combinations with their labels is already a perfect classifier.
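As a rough illustration of using the list itself as the classifier, here is a sketch in plain Python. It assumes the table is keyed by contact and product and stores the share of observed cases per execution. The rows and the name `build_lookup` are made up.

```python
from collections import defaultdict

def build_lookup(rows):
    """Map (contact, product) -> {execution: share of observed cases}."""
    counts = defaultdict(lambda: defaultdict(int))
    for contact, product, execution, quantity in rows:
        if quantity > 0:  # skip combinations that never occurred
            counts[(contact, product)][execution] += quantity
    table = {}
    for key, per_exec in counts.items():
        total = sum(per_exec.values())
        table[key] = {e: q / total for e, q in per_exec.items()}
    return table

# Hypothetical rows: (contact, product, execution, quantity)
rows = [
    ("c1", "p1", "e1", 2),
    ("c1", "p1", "e2", 500),
    ("c1", "p2", "e1", 0),
]

table = build_lookup(rows)
shares = table[("c1", "p1")]
best = max(shares, key=shares.get)
print(best, shares[best])
```

Pairs that were never observed, such as c1/p2 here, have no entry, so the table cannot answer for them. That is the one place where a learned model could still add something.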

Greetings,
Sebastian