
Determining which attributes contribute to value of a label

User: "goccolo"
New Altair Community Member
Updated by Jocelyn

Hi Community, I'm currently running my first data mining project and I'm having some serious doubts. I hope I'm posting my question in the right place and that some of you can give me a hint or some advice. It may be obvious to you, but to me the differences between many tools and techniques are still hard to tell apart.

 

I have my data in an Oracle database, stored in 4 tables: CUSTOMERS, ACCOUNTS, TRANSACTIONS and ALERTS.

The common attribute for each of them is CUSTOMER_ID.

The attribute that is most "interesting" to me is called TRUE_POSITIVE. It is a column in table ALERTS and takes the value "Yes" or "No".

The GOAL of my project is to determine which attributes contribute the most to TRUE_POSITIVE having the value "Yes".

 

My dataset is moderate in size (maybe 50 attributes in total, with tables having between 50k and 700k examples).

At this point I've imported my data into RapidMiner Studio and done some initial data cleaning (rejected certain columns, filtered out examples with missing important attributes, etc.).

Many attributes take binominal values (for example: CUSTOMERS.FACE_TO_FACE_IDENTIFIED), and many are polynominal (for example: CUSTOMERS.NATIONALITY).

I've also created some new attributes in table CUSTOMERS, like NO_OF_ALERTS_POS, which stores the number of true positive alerts for the particular customer, or HR_CASHFLOW which stores customers' average monthly value of transactions made "with" high risk countries.
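
For illustration, an attribute like NO_OF_ALERTS_POS could be computed roughly as follows. This is a minimal Python sketch, not the RapidMiner process actually used; the sample rows and the function name count_positive_alerts are hypothetical, and only the column names CUSTOMER_ID and TRUE_POSITIVE come from the tables described above.

```python
from collections import Counter

# Hypothetical sample rows standing in for the ALERTS table.
alerts = [
    {"CUSTOMER_ID": 1, "TRUE_POSITIVE": "Yes"},
    {"CUSTOMER_ID": 1, "TRUE_POSITIVE": "No"},
    {"CUSTOMER_ID": 1, "TRUE_POSITIVE": "Yes"},
    {"CUSTOMER_ID": 2, "TRUE_POSITIVE": "No"},
    {"CUSTOMER_ID": 3, "TRUE_POSITIVE": "Yes"},
]


def count_positive_alerts(rows):
    """Return {CUSTOMER_ID: number of alerts with TRUE_POSITIVE == "Yes"}."""
    counts = Counter()
    for row in rows:
        # Adding 0 still registers the customer, so customers with
        # only "No" alerts appear in the result with a count of 0.
        counts[row["CUSTOMER_ID"]] += 1 if row["TRUE_POSITIVE"] == "Yes" else 0
    return dict(counts)


no_of_alerts_pos = count_positive_alerts(alerts)
print(no_of_alerts_pos)
```

The result is one value per customer, which is what allows it to be stored as a column of CUSTOMERS without joining to ALERTS.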

 

My main question is:

Which tool / operator should I use to achieve my goal? A correlation matrix? Regression?

 

And some additional questions:

What would be the optimal number of attributes? Does my current dataset require much dimensionality reduction?

Can I use my new attributes to avoid joining tables? Does that make sense, and is there a big risk that I will miss the chance of detecting some non-obvious correlations?

 

Many thanks in advance for your help.
