Text mining on service desk call logs
I am new to RapidMiner and am looking for assistance with the following, please.
I have an export of our service desk text logs and want to highlight key incident types. To that end, I have exploded the data so that each incident description is split into one word per column.
For example:
Customer requires password reset
user needs a reset of windows password
MS office will not open
Outlook not installed
So column 1 contains the words:
Customer
user
MS
Outlook
Column 2 contains:
requires
needs
office
not
and so on throughout the data.
I now want to count the number of times 'password' appears across all these columns, and then combine this keyword with others such as 'windows' or 'email' to build up a picture of the number of incident 'types' we are truly receiving, since the options for logging these incidents are not being used correctly.
I will then be able to report on the number of 'Windows password resets', 'email password resets', 'Outlook installations', etc. as I build up the keywords and search criteria.
Any assistance in achieving this would be appreciated.
Best Answer
A detailed tutorial on text mining is probably beyond the scope of a simple forum post. However, if you import the raw data into RapidMiner (before parsing the terms into separate columns) and convert the original field containing the ticket description to the "text" data type, you will be able to use the Process Documents operator. Selecting "Term Occurrences" for the word vector parameter will give you exactly the counts you are looking for. You will also need to tokenize on words in the inner process, and you can add other steps such as stopword removal, which will probably improve the results as well.
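If it helps to see the logic outside RapidMiner, here is a minimal Python sketch of the same pipeline: tokenize each description, drop stopwords, count term occurrences, and then tally incident 'types' defined as keyword combinations. The sample descriptions, the stopword list, and the incident-type definitions are all assumptions taken from the examples in your post, not anything RapidMiner produces.

```python
from collections import Counter
import re

# Sample descriptions, mirroring the examples in the question.
descriptions = [
    "Customer requires password reset",
    "user needs a reset of windows password",
    "MS office will not open",
    "Outlook not installed",
]

# A small, assumed stopword list; a real one would be much larger.
STOPWORDS = {"a", "of", "the", "will", "not"}

def tokenize(text):
    """Lowercase, split on non-letter characters, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

# Term occurrences across all descriptions (what "Term Occurrences" computes).
counts = Counter()
for desc in descriptions:
    counts.update(tokenize(desc))

print(counts["password"])  # how often 'password' appears overall

# Combine keywords into incident 'types': a description matches a type
# if it contains all of that type's keywords. The type names and keyword
# sets below are illustrative only.
INCIDENT_TYPES = {
    "Windows password reset": {"windows", "password"},
    "Outlook installation": {"outlook", "installed"},
}

type_counts = Counter()
for desc in descriptions:
    tokens = set(tokenize(desc))
    for name, keywords in INCIDENT_TYPES.items():
        if keywords <= tokens:
            type_counts[name] += 1

print(dict(type_counts))
```

With the four sample rows, 'password' is counted twice, and one ticket matches each of the two illustrative incident types.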