Extracting data and learning

Question

Hello!

What we have is mostly text consisting of customer opinions. For example, a customer writes "...bla bla and I had a problem with the credit card thing...". This information is stored in different databases, for example Excel, and sorted into categories like "problems with paying" and "problems with webpage".
So what we want to do is get more information out of the user opinions. For that we can either read all of them, or maybe we can data mine them.
But how? How can I tell a piece of software that everything in "problem with payment" should be sorted further: into all entries that contain the word "credit", to get all credit card problems, and into all entries that contain the word "paypal", to get all PayPal problems?

Do you know what I mean? Seeing that I have 500 problems with payment, I cannot tell whether that is 499 credit card problems and 1 PayPal problem, or 499 PayPal problems and 1 credit card problem. I have to read them all.
In my opinion, one way could be to tell the software to search for "credit" + "credit card" + "visa" + "american express" to hopefully get all problems regarding credit cards.
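
The keyword idea above can be sketched in a few lines of Python. This is only a minimal illustration, not a RapidMiner process: the group names, the keyword lists and the sample comments are assumptions made up for the example, and the matching is plain case-insensitive substring search.

```python
from collections import Counter

# Hypothetical keyword groups; the terms are assumptions and would need
# tuning against the real customer comments.
KEYWORD_GROUPS = {
    "credit card": ["credit", "visa", "american express"],
    "paypal": ["paypal"],
}

def sub_categories(text, groups=KEYWORD_GROUPS):
    """Return the names of all groups with at least one keyword in the text."""
    lowered = text.lower()
    return [name for name, words in groups.items()
            if any(word in lowered for word in words)]

def count_sub_categories(comments, groups=KEYWORD_GROUPS):
    """Count comments per group; comments matching no group count as 'other'."""
    counts = Counter()
    for comment in comments:
        matched = sub_categories(comment, groups)
        counts.update(matched or ["other"])
    return counts

# Made-up sample comments from the "problems with paying" category.
comments = [
    "bla bla and I had a problem with the credit card thing",
    "PayPal did not accept my payment",
    "My Visa was rejected twice",
    "The payment page was confusing",
]
print(count_sub_categories(comments))
```

Substring matching is deliberately crude: "visa" would also match inside an unrelated longer word, and a comment that mentions both a credit card and PayPal is counted in both groups. The "other" bucket shows how many comments still have to be read by hand.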

I have a lot of information (8,000 entries a month) but I cannot read it all. I have to sort it, data mine it, whatever!
Any good ideas, please? I was able to get the Excel data into RapidMiner. But then I got stuck. What do I have to do next? Or is RapidMiner the wrong tool for something like that?

Kind regards
Michael E.

Legacy User · Answer

Please understand that so far I am on my own. I would like to implement something like data mining, but for that I have to prove that it is necessary and that it works. It is a strange situation, but I just want to look around for a few more days, try to get some other people interested, and then get serious.

So far I have managed to get a word list which also counts the words. That is a nice thing. Right now I know how often, for example, the word "problem" appears. So I am on the right track.
The next step is to check whether data set no. 1 includes the words "problem" + "payment". If yes, then maybe sort it or mark it as a "payment problem". And so on.
Looks like this is no "rocket science" and I am able to solve this as a "dumb management guy" ;-)
But it needs some time, some googling etc.!
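
For anyone who wants to try the same two steps outside RapidMiner, here is a minimal Python sketch: a word list with counts, and a rule that marks a record when all required words occur in it. The rule table and the sample records are assumptions invented for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"\w+", text.lower())

def word_counts(records):
    """Count how often each word appears across all records."""
    counts = Counter()
    for record in records:
        counts.update(tokenize(record))
    return counts

def mark(record, rules):
    """Return the labels of all rules whose required words all occur in the record."""
    words = set(tokenize(record))
    return [label for label, required in rules.items() if required <= words]

# Hypothetical rule: a record is a "payment problem" if it contains both words.
RULES = {"payment problem": {"problem", "payment"}}

# Made-up sample records.
records = [
    "I had a problem with the payment",
    "Payment went fine, but the webpage was slow",
    "Another problem: the webpage crashed",
]

print(word_counts(records).most_common(3))
for record in records:
    print(mark(record, RULES), record)
```

The rule only fires on exact word matches, so "payments" or "problems" would be missed unless they are added to the rule or the words are reduced to a common stem first.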

I'll keep you posted!

IngoRM · Answer

Hi,

you might be interested in the work we did for mobilkom austria in the field of mail classification. This was a large-scale automatic e-mail routing problem which probably has a lot in common with your problem:

http://rapid-i.com/content/view/124/1/

After a short project which kept total costs really low, the system was ready for production. If you are interested, you can of course contact us at sales@rapid-i.com in order to discuss details or ideas.

Cheers,
Ingo

Legacy User · Answer

:)
Wow, thanks. That was fast, and it looks like more than I expected! :-)
But I need some time to read and try it out.

As you are from Germany (aren't you?) you might know Quelle.de ;-) We have a lot of customer information from phone calls, chats, forums etc. that contains a lot of good insight into where the problems are and what the customers want.
So far it is just sorted into around 10 categories and spread across different databases. And not every entry has a category, because the support team was sometimes unable to set one due to a lack of time or information.
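
One cheap way to deal with the entries that have no category is to let the already categorized entries vote. The following Python sketch is a deliberately naive stand-in for a real text classifier, not the method used in any existing system: it suggests the category whose known entries share the most words with the new entry. The sample entries are made up, and only the category names come from the description above.

```python
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"\w+", text.lower())

def build_profiles(labelled):
    """Map each category to a Counter of the words seen in its entries."""
    profiles = defaultdict(Counter)
    for text, category in labelled:
        profiles[category].update(set(tokenize(text)))
    return profiles

def suggest_category(text, profiles):
    """Return the category sharing the most distinct words with the text, or None."""
    words = set(tokenize(text))
    best, best_score = None, 0
    for category, profile in profiles.items():
        score = sum(1 for word in words if word in profile)
        if score > best_score:
            best, best_score = category, score
    return best

# Made-up entries that the support team has already categorized.
labelled = [
    ("My credit card was declined at checkout", "problems with paying"),
    ("PayPal payment failed", "problems with paying"),
    ("The webpage loads very slowly", "problems with webpage"),
    ("Search on the webpage shows no results", "problems with webpage"),
]
profiles = build_profiles(labelled)

print(suggest_category("credit card payment was rejected", profiles))
print(suggest_category("webpage is slow", profiles))
```

Common words such as "the" or "was" count as much as meaningful ones here, so on real data a stop word list or a proper learning scheme would be needed before trusting the suggestions.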

Now I am doing the MacGyver job and trying to find a good and cheap :-( way to get the information out of this data.
But the information is very unstructured, as the customers use their own words and phrases.