Hi!
I'm doing a donor (customer) analysis for my master's thesis and I hope you can help me, as I'm not very experienced with RapidMiner. I have data from three different departments (dialogue marketing, campaign team, online marketing) of an NPO, as they don't have a central data warehouse yet. I have already matched the three data sheets and done some data preparation.
My problem now is that I don't know which operator my process should end with, and therefore what my next steps are.
I have the following data on the donors: donorID, e-mail, zipcode, gender (man/woman/family), creation date, product status (we distinguish nine products, e.g. "godfather", "member", "protector"), origin (e.g. "internet", "mailing"), total donation amount, number of donations and date of birth.
I want to find new insights in the data. The complete data set has never been analysed. The three departments have different goals: the dialogue marketing team tries to get high donation amounts, the campaign team wants a lot of signatures for petitions, and the online marketing team wants people to subscribe to the newsletter. I want to find the donors who donated the largest amounts of money and what characterises them. Maybe donors who are also subscribed to our newsletter donate more money, or maybe not. Maybe donors who are over 40, have signed a petition and come from a specific region donate a lot of money.
Is it better to have several separate data sheets (e.g. one with the matched donors from the dialogue and online marketing teams) or to use only one big sheet (with columns such as newsletter TRUE/FALSE, campaign TRUE/FALSE)? Which operators should I use to analyse the data?
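To make the "one big sheet" option concrete, here is a rough sketch in Python (not RapidMiner) of the layout I have in mind. The function name, the ID lists and the toy values are made up for illustration only; only the column names donorID, newsletter and campaign come from my description above.

```python
def build_donor_table(donors, newsletter_ids, petition_ids):
    """Return one row per donor with TRUE/FALSE flags for the other two teams."""
    newsletter_ids = set(newsletter_ids)
    petition_ids = set(petition_ids)
    table = []
    for donor in donors:
        row = dict(donor)  # copy so the input rows stay unchanged
        row["newsletter"] = donor["donorID"] in newsletter_ids
        row["campaign"] = donor["donorID"] in petition_ids
        table.append(row)
    return table


# Toy values, purely illustrative (not real donor data).
donors = [
    {"donorID": 1, "total_donation": 120.0},
    {"donorID": 2, "total_donation": 15.0},
]
table = build_donor_table(donors, newsletter_ids=[1], petition_ids=[2])
for row in table:
    print(row)
```

With this layout every donor appears exactly once, and the newsletter and campaign columns can be used like any other attribute.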
I also have some questions about data preparation. I want to transform the date of birth into an age. Is there an operator that calculates the age from the current date? Is there an operator I can use to generate age groups (e.g. 18-25, 26-35, 36-45, ...)?
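To show the result I'm after (not how RapidMiner does it), here is a minimal Python sketch. The function names are hypothetical, and the groups above 45 as well as the "under 18" and "66+" labels are my own assumption, since I have only decided on the first three groups so far.

```python
from datetime import date


def age_on(birth: date, today: date) -> int:
    """Age in full years on the given day."""
    years = today.year - birth.year
    # Subtract one year if this year's birthday has not happened yet.
    if (today.month, today.day) < (birth.month, birth.day):
        years -= 1
    return years


# Assumed bins; both bounds are inclusive.
AGE_BINS = [(18, 25), (26, 35), (36, 45), (46, 55), (56, 65)]


def age_group(age: int) -> str:
    """Map an age in years to a group label such as '26-35'."""
    if age < 18:
        return "under 18"
    for low, high in AGE_BINS:
        if low <= age <= high:
            return f"{low}-{high}"
    return "66+"


# Example call using the current date.
print(age_group(age_on(date(1980, 6, 15), date.today())))
```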
The zipcodes consist of five digits (Germany). To get a bigger region, I'd like to use only the first two digits. Which operator can I use to cut off the last three digits?
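Again only to illustrate the result I want, here is a small Python sketch. The function name is hypothetical, and the padding step assumes that some zipcodes may have been read as numbers and lost their leading zero; I have not checked whether that is actually the case in my data.

```python
def zip_region(zipcode) -> str:
    """Return the first two digits of a five-digit German zipcode."""
    # Treat the zipcode as text and restore a leading zero that is lost
    # when the column was read as a number (assumption, see above).
    text = str(zipcode).strip().zfill(5)
    return text[:2]


print(zip_region("80331"))
print(zip_region(1067))
```

Keeping the zipcode as text rather than as a number seems safer to me, because otherwise regions starting with 0 would be cut incorrectly.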
Thanks in advance for your help!
Tim