Classification and clustering of clients of a bank

Question

Greetings to all members!
I have never used RapidMiner, I have no IT background, and I really need your help. I have a database of about 300 clients of a bank. The database has: name, county, age, civil status, children, active loans, home, higher education, income, company where they work. I have to sort these clients into 4 categories: A, B, C, D. Category A: customers who have a high salary and do not pose a risk. Category B: clients who receive credit, have active credit, and do not represent a risk of default. Category C: clients who find it harder to borrow and need a co-payer or a derogation from the bank. Category D: customers who are unlikely to be given credit.
What features of the application should I use to accomplish this project? The application should work like credit scoring: classify these customers and divide them into the four categories, to show what type of customer gets credit and who does not.
I would like to receive an answer from you, it would help me a lot to know how to start. 
Thanks!

SGolbert · Answer

Hi @catsta,

Before developing a RapidMiner process, you should have some idea of how to solve the problem "on paper". There are different options, depending on which data is available. From what I read, there are two options:

1. Train a decision model based on historic data, for example a database of clients who have defaulted and clients who haven't.

2. Apply decision rules coming from best practices / specialists.

The second option doesn't involve machine learning, so in that case RapidMiner wouldn't be necessary (though it can still be used for data manipulation if you want).
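
To make the second option a bit more concrete, here is a minimal sketch in Python of what "decision rules" could look like outside RapidMiner. Everything in it is an assumption for illustration: the `Client` fields are only a subset of the columns you listed, and the income and loan thresholds are made up, not real banking practice. A specialist would have to supply the actual rules.

```python
from dataclasses import dataclass


@dataclass
class Client:
    # Hypothetical subset of the columns mentioned in the question.
    name: str
    income: float       # monthly income, in whatever currency the bank uses
    active_loans: int   # number of currently active loans
    owns_home: bool


def categorize(client: Client) -> str:
    """Assign a category A-D using made-up, purely illustrative thresholds."""
    if client.income >= 5000 and client.active_loans == 0:
        return "A"  # high salary, no visible risk
    if client.income >= 3000 and client.active_loans <= 2:
        return "B"  # gets credit, may already have active credit
    if client.income >= 1500 or client.owns_home:
        return "C"  # harder to borrow: co-payer or derogation needed
    return "D"      # unlikely to be given credit


# Invented example rows, not real data.
clients = [
    Client("Ana", income=6200, active_loans=0, owns_home=True),
    Client("Bogdan", income=3400, active_loans=1, owns_home=False),
    Client("Carmen", income=1200, active_loans=0, owns_home=True),
    Client("Dan", income=900, active_loans=3, owns_home=False),
]

categories = {c.name: categorize(c) for c in clients}
for name, category in categories.items():
    print(name, category)
```

The rules are checked from the strictest category down, so each client lands in the best category whose conditions are met.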

If you can tell us a little more, we can help you further.

Regards,

Sebastian

kypexin · Answer

Hi @catsta @rfuentealba

Just to add my 5 cents to the topic, as I have had experience with credit scoring.

Though there are numerous studies on using cluster analysis for credit scoring, I am rather sceptical that it is possible to get meaningful results in this area with clustering algorithms, at least quickly and easily. Credit scoring is a classification problem by nature, so you'd need historic data on clients' performance in order to build a classification model. With clustering algorithms, you may get a good separation of different customer segments, but it's pretty hard to make sure that those segments actually represent different levels of credit risk.
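
One rough way to check whether segments reflect credit risk is to compare default rates per segment, which again requires historic default data. The sketch below is plain Python with invented numbers: `segments` stands for cluster labels produced by any clustering algorithm, and `defaulted` for the known outcome per client. Both lists and the function name `default_rate_by_segment` are hypothetical.

```python
from collections import defaultdict


def default_rate_by_segment(segments, defaulted):
    """Return the share of defaulted clients in each segment."""
    totals = defaultdict(int)
    bad = defaultdict(int)
    for segment, has_defaulted in zip(segments, defaulted):
        totals[segment] += 1
        bad[segment] += int(has_defaulted)
    return {segment: bad[segment] / totals[segment] for segment in totals}


# Invented data: cluster label and historic outcome for eight clients.
segments = [0, 0, 0, 1, 1, 1, 2, 2]
defaulted = [False, True, False, False, True, False, True, False]

rates = default_rate_by_segment(segments, defaulted)
for segment, rate in sorted(rates.items()):
    print(f"segment {segment}: default rate {rate:.2f}")
```

If the rates come out nearly the same across segments, as for segments 0 and 1 here, the clustering has separated customers by something other than credit risk.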

rfuentealba · Answer

By the way, I just remembered something that might be useful for you to begin understanding your problem. With RapidMiner Studio, when you are presented with the first screen, if you create a new process, you have predefined templates for certain common cases.

The "Credit Risk Modeling" template (light blue) uses a classification algorithm named "Support Vector Machine" to help separate your data into classes. It is also available under "Repository Tab > Samples > Templates > Credit Risk Modeling". You might want to begin building your solution based on this example.