sampling / learning curve

New Altair Community Member

May 23, 2010

Updated Nov 5, 2024 by Jocelyn

Dear all,

Sampling the training set can have a major impact on classification accuracy, especially when the data is skewed.

Let's say you have a dataset of 100k negative examples and 1k positive examples, and you wish to experiment with different pos/neg ratios in the training set.

To do this you need:
example filter: select all negative
example filter: absolute amount
example filter: select all positive
example filter: absolute amount
merge
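For readers outside the GUI, the five steps above can be sketched in plain Python. This is only an illustration of the workflow, not how the operators are implemented: the function name `subsample_two_class`, the `label` key, the label values `"negative"`/`"positive"` and the use of random sampling are all assumptions made for the example.

```python
import random

def subsample_two_class(examples, n_neg, n_pos, label_key="label", seed=0):
    """Hypothetical helper: filter each class, keep an absolute amount, merge."""
    rng = random.Random(seed)
    # example filter: select all negative / select all positive
    negatives = [e for e in examples if e[label_key] == "negative"]
    positives = [e for e in examples if e[label_key] == "positive"]
    # example filter: absolute amount (random sampling without replacement)
    kept_neg = rng.sample(negatives, n_neg)
    kept_pos = rng.sample(positives, n_pos)
    # merge
    return kept_neg + kept_pos

# Small stand-in for the 100k / 1k dataset mentioned above.
data = [{"id": i, "label": "negative"} for i in range(1000)]
data += [{"id": 1000 + i, "label": "positive"} for i in range(10)]
train = subsample_two_class(data, n_neg=50, n_pos=10)
```

Every extra class adds another filter and another sampling step, which is the overhead the post is complaining about.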

When there are more than two classes, it gets even more cumbersome.

It would be cool if this could be combined into a single operator.

This might also be faster and more memory efficient.

Best regards,

Wessel


fischer

New Altair Community Member

May 27, 2010

Hi,

Just to get it right: what would the parameters of your operator be? If I understand correctly, they would be

- a ratio for each class
- an absolute number of examples you want as output?

Cheers,
Simon

wessel

New Altair Community Member

May 27, 2010

Let's see:

Input: a dataset

Parameter fields:
label = class_A [absolute amount] or [relative amount] and [sampling type]
label = class_B [absolute amount] or [relative amount] and [sampling type]
...
label = class_Z [absolute amount] or [relative amount] and [sampling type]

Defaults: absolute amount = '', relative amount = 1, sampling type = linear

Examples:
Input: a dataset with 2000 examples of class A

class_A [1000] or [] and [linear]: returns a dataset containing the first 1000 instances of class A

class_A [1000] or [] and [random]: returns a dataset containing 1000 instances of class A, randomly sampled

class_A [] or [0.5] and [linear]: returns a dataset containing the first 1000 instances of class A

class_A [] or [0.5] and [random]: returns a dataset containing 1000 instances of class A, randomly sampled

class_A [3000] or [] and [random]: returns an error?

class_A [] or [1.4] and [random]: returns an error?
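The semantics listed above could be sketched roughly as follows in Python. This is not an existing operator: `sample_per_class`, the `spec` dictionary and its keys `absolute`, `relative` and `sampling` are hypothetical names chosen for the example. It also assumes that asking for more examples than a class contains raises an error, which is only one possible answer to the two open questions (sampling with replacement would be the other).

```python
import random

def sample_per_class(examples, spec, label_key="label", seed=None):
    """Hypothetical per-class sampler.

    spec maps a class label to a dict with optional keys:
      'absolute' (int), 'relative' (float, default 1), 'sampling' ('linear' or 'random').
    Classes missing from spec use the defaults, i.e. they are kept entirely.
    """
    rng = random.Random(seed)
    # Group examples by class, preserving their original order.
    by_class = {}
    for e in examples:
        by_class.setdefault(e[label_key], []).append(e)

    result = []
    for label, members in by_class.items():
        opts = spec.get(label, {})
        absolute = opts.get("absolute")
        relative = opts.get("relative", 1.0)
        sampling = opts.get("sampling", "linear")
        # The absolute amount wins when given; otherwise use the relative amount.
        n = absolute if absolute is not None else round(relative * len(members))
        if n < 0 or n > len(members):
            # Assumption: no oversampling, so an impossible request is an error.
            raise ValueError(f"class {label!r}: requested {n} of {len(members)} examples")
        if sampling == "linear":
            chosen = members[:n]             # the first n instances
        elif sampling == "random":
            chosen = rng.sample(members, n)  # n instances without replacement
        else:
            raise ValueError(f"unknown sampling type: {sampling!r}")
        result.extend(chosen)
    return result

data = [{"id": i, "label": "class_A"} for i in range(2000)]
first_1000 = sample_per_class(data, {"class_A": {"absolute": 1000, "sampling": "linear"}})
random_half = sample_per_class(data, {"class_A": {"relative": 0.5, "sampling": "random"}}, seed=1)
```

With this sketch, the requests `[3000]` and `[1.4]` both raise `ValueError`, since a class of 2000 examples cannot supply 3000 or 2800 distinct instances.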
