Email analysis to assess customer experience
Luca95
New Altair Community Member
Hello everyone!
I am a newbie in RapidMiner and I am trying to find a solution for my dissertation topic.
I am working in collaboration with an SME specialised in the design and construction of eco-recycling technologies.
Customers are always in touch with the company's export area, since the manufacturing process can take up to 18 months to complete.
During this long period, customers may make changes to their product, negotiate delivery costs, etc.
Requests, changes and complaints are usually recorded in the mailbox of the export area managers.
I would like to know if it is possible to carry out a complaints analysis, sentiment analysis or customer experience analysis by analysing requests and answers from customers. I do not have any feedback from customers, just emails.
Would it be easier to create a questionnaire for customers?
Answers
-
Hi @Luca95 - this is MarlaBot. I found these great videos on our RapidMiner Academy that you may find helpful:
MarlaBot
-
In general this is a very classical NLP problem, with a variety of options.
While in theory it is perfectly possible to create a classification workflow, it also depends on the quality and the quantity of the available material.
If you have thousands of emails that you can use to create training data, I'd say go for it; if it is just a few hundred or so, a questionnaire might be more realistic. The more training data you have, the more accurate your models will become; without enough of it, the models will underperform.
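If you want a quick sanity check on whether your volume is enough, you can plot a learning curve on a small hand-labelled sample: if accuracy is still climbing as you add data, more labelled emails will pay off. Here is a minimal sketch in Python with scikit-learn, purely illustrative and outside RapidMiner; `emails` and `labels` are placeholders for your own data, not anything from this thread:
```python
# Illustrative sketch: check whether more training data would still help,
# by measuring cross-validated accuracy at increasing training-set sizes.
# `emails` (list of str) and `labels` (list of str) are placeholders.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import learning_curve
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

pipeline = make_pipeline(TfidfVectorizer(stop_words="english"), MultinomialNB())

sizes, train_scores, val_scores = learning_curve(
    pipeline, emails, labels,
    train_sizes=np.linspace(0.1, 1.0, 5),  # 10% .. 100% of the labelled data
    cv=5, scoring="accuracy",
)

# If validation accuracy is still climbing at the largest size,
# more labelled emails (or a questionnaire) would likely pay off.
for n, score in zip(sizes, val_scores.mean(axis=1)):
    print(f"{n:>5} training emails -> {score:.2f} mean CV accuracy")
```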
-
Hello kayman,
Thank you so much for your response. Unfortunately the emails are all different, so I do not have standard responses. What I am trying to do is to extract information from all the messages between the export area and customers, to highlight the main issues in the customer experience and also to point out what customers value, based on what they request.
Then I would like to create metrics and weight them based on the most frequent keywords.
Do you think it is possible to do that with the RapidMiner community version? I have about 1000 emails from 2018-2019 and I am waiting for the 2015-2016 backup. This batch contains emails between the export area and customers.
-
The mails do not have to be standard; if they were, you could use some simple rules :-)
It is definitely possible with the community edition, but you need to do all the dirty work yourself.
What is needed is a 'defined topic'. In other words, if you take a few mails and, given the context, can decide whether each is about topic A or topic B, a machine can theoretically do the same. Now, the more data you have, the more likely a machine can be trained to come close to what you want. 1000 emails can be enough if you are looking for 2 topics, and far from enough if you are looking for 50 topics. Also, the further apart the topics are, the easier it is to catch them.
Ask 10 data scientists which solution is best and you will probably get 15 different answers, so don't take mine as gospel.
There are a few scenarios, depending on your situation.
Scenario one is when you have no real clue how to split your data into topics, or there are no real dependencies. Typically I would start with some unsupervised clustering models, so I can get a good idea of how a machine looks at my data. I can recommend LDA (part of the toolbox): it's not too intimidating and gives quite good results if the data is good enough to start with. You will have some hits and misses, but you can then use the hits to define your first labels and work more on the misses. The next step I take is to create training data for a supervised model, as I prefer some consistency.
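If you'd rather get a feel for LDA in code first, here is a minimal Python sketch with scikit-learn; it only illustrates the same idea, it is not the RapidMiner toolbox operator, and `emails` is a placeholder list of your message bodies:
```python
# Illustrative LDA sketch (scikit-learn standing in for the RapidMiner
# operator; `emails` is a placeholder list of message bodies).
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# LDA works on raw term counts rather than tf-idf weights
counts = CountVectorizer(stop_words="english", min_df=5)
X = counts.fit_transform(emails)

lda = LatentDirichletAllocation(n_components=8, random_state=42)  # 8 topics is a guess; tune it
doc_topics = lda.fit_transform(X)  # one topic distribution per email

# Print the top words per topic; the clear "hits" become your first labels
terms = counts.get_feature_names_out()
for i, weights in enumerate(lda.components_):
    top = [terms[j] for j in weights.argsort()[-10:][::-1]]
    print(f"topic {i}: {', '.join(top)}")
```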
Scenario 2 is when you have a set of labels (topics) predefined and all your data needs to fit into one of them. Then you're probably up for some combination of manual tagging first and trial and error onwards.
Once you have your data and your labels, the preprocessing begins; the text mining operators are your friends here. A typical workflow could be: set the label role, then convert your email to text > lowercase > tokenize > filter stopwords etc., using the data to documents operator. This will vectorize your content (TF-IDF gives you proper weights), and the wordlist output will show you the impact of words by label/topic, so you can use it to improve your stopword list or define additional weighting.
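The same flow, sketched in Python with scikit-learn for illustration (again, not RapidMiner's own operators; `emails` and `labels` are placeholders for your data):
```python
# Rough Python equivalent of the flow above (lowercase -> tokenize ->
# stop words -> tf-idf); `emails` and `labels` are placeholders.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer(
    lowercase=True,               # the "lowercase" step
    stop_words="english",         # stopword filtering
    token_pattern=r"[a-z]{3,}",   # crude tokenizer: runs of 3+ letters
)
X = vectorizer.fit_transform(emails)

# Mimic the wordlist output: mean tf-idf weight of each term per label,
# useful for spotting stopword candidates and strong topic markers.
terms = vectorizer.get_feature_names_out()
labels = np.array(labels)
for label in np.unique(labels):
    mean_weights = X[labels == label].mean(axis=0).A1
    top = mean_weights.argsort()[-10:][::-1]
    print(label, [terms[i] for i in top])
```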
Once you are happy with the preprocessing, pick your favorite model. SVM and Naive Bayes are pretty good on average. But again, trial and error.
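A quick way to run that trial and error, continuing the illustrative Python sketch (`X` and `labels` come from the preprocessing step above):
```python
# Quick trial-and-error comparison of the two models mentioned above;
# `X` and `labels` come from the preprocessing sketch.
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

for name, model in [("Linear SVM", LinearSVC()), ("Naive Bayes", MultinomialNB())]:
    scores = cross_val_score(model, X, labels, cv=5, scoring="accuracy")
    print(f"{name}: mean accuracy {scores.mean():.2f} (+/- {scores.std():.2f})")
```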
Just don't make the mistake of thinking preprocessing can be quick and dirty. It can't; be prepared to spend a good amount of time getting the data ready. Your models will be very grateful.
-
Thank you so much kayman.
You were really helpful.
I have one last request. Since I am a newbie, could you recommend a tutorial video or a similar solved case that could help me with this issue?
I would like to follow an example to make things easier for me, since I have no more than 2 months.
Thank you so much!