Hi All
I am trying to get a count of the words in a text column in Excel spreadsheets.
I have used the following process to create a word list:
Read Excel -> Select Attributes -> Process Documents from Data
and tokenized the text inside Process Documents from Data.
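Outside RapidMiner, a rough Python equivalent of this tokenize-and-flag step might look like the sketch below. It assumes a very simple tokenizer (lowercase, split on non-letters), which will not match every setting of the Tokenize operator, and the incident texts and helper names (`tokenize`, `binary_occurrences`) are hypothetical, invented for illustration.

```python
import re


def tokenize(text):
    # Rough stand-in for tokenizing: lowercase the text and split on
    # any run of non-letter characters, dropping empty strings.
    return [t for t in re.split(r"[^a-z]+", text.lower()) if t]


def binary_occurrences(docs):
    """Return (vocab, rows): vocab lists words in first-seen order, and
    rows maps each document id to 0/1 flags (1 = word appears)."""
    vocab = []
    seen = set()
    token_sets = {}
    for doc_id, text in docs.items():
        tokens = tokenize(text)
        token_sets[doc_id] = set(tokens)
        for token in tokens:
            if token not in seen:
                seen.add(token)
                vocab.append(token)
    rows = {
        doc_id: [1 if word in tokens else 0 for word in vocab]
        for doc_id, tokens in token_sets.items()
    }
    return vocab, rows


# Hypothetical incident texts, invented for illustration.
incidents = {
    "INC1": "Password account reset",
    "INC2": "Account Outlook",
    "INC3": "Computer crash",
}
vocab, rows = binary_occurrences(incidents)
print(vocab)
for doc_id, flags in rows.items():
    print(doc_id, flags)
```

Each row only records whether a word appears in an incident, not how many times, which mirrors a 0/1 word table.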
This gives me a list of words, with a 1 or 0 showing whether each word is used in the text column for each incident:
Incident ID  Password  Account  Reset  Computer  Outlook  Crash
INC1         1         1        1      0         0        0
INC2         0         1        0      0         1        0
INC3         0         0        0      1         0        1
However, what I now need is a total for each word across all incidents, like this:
Password 1
Account 2
Reset 1
Computer 1
Outlook 1
Crash 1
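Summing each column of the table gives the totals listed above. A minimal Python sketch using the three rows from the table; `column_totals` is a hypothetical helper name. Because the table holds 0/1 values, each total is the number of incidents that contain the word, not the total number of times the word was used.

```python
def column_totals(header, rows):
    """Sum each word column across all incident rows."""
    totals = dict.fromkeys(header, 0)
    for row in rows:
        for word, value in zip(header, row):
            totals[word] += value
    return totals


header = ["Password", "Account", "Reset", "Computer", "Outlook", "Crash"]
rows = [
    [1, 1, 1, 0, 0, 0],  # INC1
    [0, 1, 0, 0, 1, 0],  # INC2
    [0, 0, 0, 1, 0, 1],  # INC3
]
totals = column_totals(header, rows)
for word, count in totals.items():
    print(word, count)
```

This prints one line per word, matching the list above (Password 1, Account 2, and so on).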
What's the best way to get these results?
This will let me quickly identify words in the data that do not reach the threshold for number of uses. For example, my data set has over 75,000 different words, but I am only interested in words that have been used 25 times or more. I will then be able to add the words below the threshold to 'Filter Stopwords (Dictionary)' easily.
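The thresholding step could be sketched in Python as follows, assuming the per-word counts are already in a dict mapping word to count. `split_by_threshold`, `write_stopword_file`, the file name, and the one-word-per-line file format are assumptions for illustration, not part of RapidMiner; the counts are invented.

```python
from pathlib import Path

MIN_COUNT = 25  # only words used 25 times or more are of interest


def split_by_threshold(counts, min_count=MIN_COUNT):
    """Split a {word: count} dict into words to keep and rare words."""
    keep = {word: n for word, n in counts.items() if n >= min_count}
    rare = sorted(word for word, n in counts.items() if n < min_count)
    return keep, rare


def write_stopword_file(words, path):
    # Assumed format: plain text, one stopword per line.
    Path(path).write_text("".join(word + "\n" for word in words), encoding="utf-8")


# Hypothetical counts, invented for illustration.
counts = {"password": 30, "account": 25, "crash": 3, "outlook": 1}
keep, rare = split_by_threshold(counts)
print(keep)
print(rare)
# write_stopword_file(rare, "rare_words.txt")  # hypothetical file name
```

Sorting the rare words just makes the resulting stopword file easier to review by eye.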