⚠️Please Note

Technical discussions have been migrated to the Siemens Support Center as Knowledge Base (KB) articles. This content is no longer maintained and may be outdated; for the latest information, log in to the Siemens Support Center, search online, or contact our support team.

Classification of highly imbalanced data

bojana_trisic

Hi guys,

I'm working on a churn prediction problem and I'm struggling with highly imbalanced data (only 0.1% churners in the data set). I have tried different types of pre-processing and modeling, but still cannot get decent results (at most 20% real churners in the 10% of records with the highest propensity).
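One way to quantify a "churners in the top 10%" figure, sketched here assuming the model outputs one propensity score per record and labels are 1 for churners and 0 otherwise (the function name `top_fraction_metrics` is hypothetical): rank records by score, keep the top 10%, and report the churner share in that slice (precision), the share of all churners it captures, and the lift over the base rate. With a 0.1% churn rate, even a perfect ranking can put at most 1% churners in the top 10% of records, so capture rate and lift may be easier to compare across models than raw precision.

```python
def top_fraction_metrics(scores, labels, fraction=0.10):
    """Return (precision, capture_rate, lift) for the top `fraction` of records by score.

    precision:    share of churners among the selected records
    capture_rate: share of all churners that fall inside the selected records
    lift:         precision divided by the overall churn rate
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n = len(scores)
    k = max(1, int(round(n * fraction)))  # number of records in the top slice
    # Rank record indices by descending propensity score and keep the top k
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    hits = sum(labels[i] for i in order[:k])
    total_pos = sum(labels)
    base_rate = total_pos / n
    precision = hits / k
    capture_rate = hits / total_pos if total_pos else 0.0
    lift = precision / base_rate if base_rate else 0.0
    return precision, capture_rate, lift
```

For example, with 100 records scored 0 to 99 and churners at scores 99, 95, 50 and 10, the top 10% holds two of the four churners: precision 0.2, capture rate 0.5, lift about 5.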

I tried upsampling, downsampling, something in between, clustering the data set before classification, normalization, PCA and feature selection, along with different modeling techniques (decision trees, neural nets, SVM), bagging and boosting, and misclassification costs. This has helped me improve the accuracy of my model in the highest-propensity segment from 2% to 20%, but that is the best I have achieved.
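A minimal sketch of two of the techniques mentioned, random resampling and misclassification costs, in plain Python. The function names `rebalance` and `inverse_frequency_weights`, the 0/1 label encoding and the fixed seed are hypothetical choices for illustration. Resampling is meant for the training split only, so the evaluation set keeps the real 0.1% churn rate. The class weights follow the common n / (k * count) heuristic, the same formula scikit-learn uses for `class_weight="balanced"`.

```python
import random
from collections import Counter


def rebalance(rows, labels, target_pos_share=0.5, mode="under", seed=0):
    """Randomly resample a training set so positives make up about target_pos_share.

    mode="under": keep every positive, sample negatives without replacement.
    mode="over":  keep every record, then duplicate randomly chosen positives.
    """
    if not 0 < target_pos_share < 1:
        raise ValueError("target_pos_share must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed so the resampled set is reproducible
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    ratio = target_pos_share / (1 - target_pos_share)  # desired positive/negative ratio
    if mode == "under":
        n_neg = min(len(neg), round(len(pos) / ratio))
        keep = pos + rng.sample(neg, n_neg)
    elif mode == "over":
        n_pos = round(len(neg) * ratio)
        extra = max(0, n_pos - len(pos))
        keep = pos + neg + [rng.choice(pos) for _ in range(extra)]
    else:
        raise ValueError("mode must be 'under' or 'over'")
    rng.shuffle(keep)
    return [rows[i] for i in keep], [labels[i] for i in keep]


def inverse_frequency_weights(labels):
    """Weight each class by n / (k * count) so misclassifying rare churners costs more."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * cnt) for cls, cnt in counts.items()}
```

With 10 churners among 1,000 records, a 50/50 undersample keeps 20 records, and the churner class gets a weight of 50 versus roughly 0.5 for non-churners.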

Has anyone worked on similar problems? Which technique did you find most helpful?

Thank you in advance,

Bojana