Text mining on service desk call logs
I am new to RapidMiner and am looking for assistance with the following, please.
I have an export of our service desk text logs and want to highlight key incident types. To that end, I have exploded the data so that each incident description is split into one word per column.
For example:
Customer requires password reset
user needs a reset of windows password
MS office will not open
Outlook not installed
So column 1 contains the words:
Customer
user
MS
Outlook
Column 2 contains:
requires
needs
office
not
and so on throughout the data.
I now want to count the number of times 'password' appears across all these columns, and then combine this keyword with others such as 'windows' or 'email' to build up a picture of the number of incident 'types' we are truly receiving, since the options for logging these incidents are not being used correctly.
I will then be able to report on the number of 'Windows password resets', 'email password resets', 'Outlook installations', etc. as I build up the keywords and search criteria.
Any assistance in achieving this would be appreciated.
Best Answer
A detailed tutorial on text mining is probably beyond the scope of a simple forum post. However, if you import the raw data into RapidMiner (before parsing the terms into separate columns) and convert the original field containing the ticket description to the "text" data type, you will be able to use the Process Documents operator. Selecting "Term Occurrences" for the word vector parameter will give you exactly the counts you are looking for. You will also need to tokenize on words in the inner process, and you can add other steps such as stopword removal, which will probably improve the results as well.
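If it helps to see the logic outside RapidMiner, here is a minimal Python sketch of the same pipeline: tokenize each description, drop stopwords, count term occurrences, and then tally incident 'types' defined as keyword combinations. The sample descriptions, the stopword list, and the incident-type definitions are all assumptions taken from the examples in your post, not anything RapidMiner produces.

```python
from collections import Counter
import re

# Sample descriptions, mirroring the examples in the question.
descriptions = [
    "Customer requires password reset",
    "user needs a reset of windows password",
    "MS office will not open",
    "Outlook not installed",
]

# A small, assumed stopword list; a real one would be much larger.
STOPWORDS = {"a", "of", "the", "will", "not"}

def tokenize(text):
    """Lowercase, split on non-letter characters, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

# Term occurrences across all descriptions (what "Term Occurrences" computes).
counts = Counter()
for desc in descriptions:
    counts.update(tokenize(desc))

print(counts["password"])  # how often 'password' appears overall

# Combine keywords into incident 'types': a description matches a type
# if it contains all of that type's keywords. The type names and keyword
# sets below are illustrative only.
INCIDENT_TYPES = {
    "Windows password reset": {"windows", "password"},
    "Outlook installation": {"outlook", "installed"},
}

type_counts = Counter()
for desc in descriptions:
    tokens = set(tokenize(desc))
    for name, keywords in INCIDENT_TYPES.items():
        if keywords <= tokens:
            type_counts[name] += 1

print(dict(type_counts))
```

With the four sample rows, 'password' is counted twice, and one ticket matches each of the two illustrative incident types.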