Word count in Excel column

martin_red New Altair Community Member
edited November 5 in Community Q&A

Hi All

I am trying to get a count of the words in a text column of an Excel data sheet.

I have used the following to create a word list:

Read Excel -> Select Attributes -> Process Documents from Data

and tokenised the text inside Process Documents from Data.

This gives me one column per word, showing whether each incident uses that word, as follows:

Incident id   Password   Account   Reset   Computer   Outlook   Crash
INC1          1          1         1       0          0         0
INC2          0          1         0       0          1         0
INC3          0          0         0       1          0         1
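For readers outside RapidMiner, the table above is a binary document-term matrix: one row per incident, one column per word, and a 1 if the word appears in that incident. The following is a minimal Python sketch of that structure, not what Process Documents from Data does internally. The tokenizer (lowercase, runs of letters) and the incident texts are assumptions chosen to reproduce the table; the columns come out in alphabetical order rather than the order shown.

```python
import re

def tokenize(text):
    # Assumption: lowercase, then keep runs of letters a-z as tokens,
    # which roughly mimics splitting on non-letter characters.
    return re.findall(r"[a-z]+", text.lower())

def binary_term_matrix(docs):
    """Build a 0/1 document-term matrix.

    docs maps an incident id to its text. Returns the sorted vocabulary
    and, for each incident, a list with 1 where the word appears and 0
    where it does not (a repeated word still gives 1).
    """
    vocab = sorted({tok for text in docs.values() for tok in tokenize(text)})
    rows = {}
    for incident_id, text in docs.items():
        present = set(tokenize(text))
        rows[incident_id] = [1 if word in present else 0 for word in vocab]
    return vocab, rows

# Hypothetical incident texts chosen to reproduce the table above.
docs = {
    "INC1": "Password account reset",
    "INC2": "Account Outlook",
    "INC3": "Computer crash",
}
vocab, rows = binary_term_matrix(docs)
print(vocab)
for incident_id, values in rows.items():
    print(incident_id, values)
```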

 

However, what I now need is a count for each word, as follows:

Password   1
Account    2
Reset      1
Computer   1
Outlook    1
Crash      1

What's the best way to get these results?

This will let me quickly identify words in the data that do not reach the threshold number of uses. For example, my data set contains over 75,000 different words, but I am only interested in words that have been used 25 times or more. I will then be able to add the remaining words to the 'Filter Stopwords (Dictionary)' operator easily.
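Once a word-to-count list exists, the threshold step is a simple split. Below is a minimal Python sketch, assuming the counts are already available as a dict; the function name `split_by_threshold` and the sample counts are hypothetical, and writing one word per line is an assumption about the stopword dictionary file format.

```python
def split_by_threshold(counts, threshold=25):
    """Split a word -> count mapping at a minimum-usage threshold.

    Returns (frequent, rare): words used at least `threshold` times, and
    words used fewer times (candidates for a stopword dictionary).
    """
    frequent = sorted(w for w, c in counts.items() if c >= threshold)
    rare = sorted(w for w, c in counts.items() if c < threshold)
    return frequent, rare

# Hypothetical counts; real ones would come from the word list.
counts = {"password": 40, "account": 31, "outlook": 12, "crash": 3}
frequent, rare = split_by_threshold(counts)

# Assumption: the stopword dictionary file takes one word per line.
stopword_file_text = "\n".join(rare) + "\n"
print(frequent)  # ['account', 'password']
print(rare)      # ['crash', 'outlook']
```

Words used exactly 25 times are kept, matching "25 times or more".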

Answers

  • Thomas_Ott
    Thomas_Ott New Altair Community Member

    In this case I would use an Aggregate operator instead of Process Documents from Data. The Aggregate operator will take each column whose header is a word, sum up all the 1s and 0s, and give you the total.
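Summing the 0/1 columns can be sketched in Python as follows, using the values from the table in the question. With binary occurrences the total counts the incidents that contain a word, not every repetition of it. `column_totals` is a hypothetical helper, not a RapidMiner function.

```python
def column_totals(vocab, rows):
    """Sum each word column of a document-term matrix.

    With 0/1 values, each total is the number of incidents
    that contain the word.
    """
    totals = [0] * len(vocab)
    for values in rows.values():
        for i, value in enumerate(values):
            totals[i] += value
    return dict(zip(vocab, totals))

# Column order and values copied from the table in the question.
vocab = ["Password", "Account", "Reset", "Computer", "Outlook", "Crash"]
rows = {
    "INC1": [1, 1, 1, 0, 0, 0],
    "INC2": [0, 1, 0, 0, 1, 0],
    "INC3": [0, 0, 0, 1, 0, 1],
}
for word, total in column_totals(vocab, rows).items():
    print(word, total)
```

Running it prints the counts listed in the question: Password 1, Account 2, Reset 1, Computer 1, Outlook 1, Crash 1.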

  • martin_red
    martin_red New Altair Community Member

    I am using Process Documents from Data to break the sentence down, though. So

    password reset needed for account

    is split, once run through Process Documents with Tokenize, into:

                  Password   Reset   Account
    INC1          1          1       1

    When I use Aggregate, it just gives me a count of 1 for 'password reset needed for account'.
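The result described above is what you would expect if the count is taken over the untokenized text column, where each whole sentence is a single value. The contrast can be sketched in Python; the regex tokenizer is an assumption and applies no stopword filtering, so "needed" and "for" are counted as well.

```python
import re
from collections import Counter

# The example incident text from the post above.
texts = ["password reset needed for account"]

# Counting values of the untokenized column: the whole sentence is one value.
by_value = Counter(texts)

# Counting tokens instead: every word is counted on its own.
# Assumption: tokens are runs of lowercase letters.
by_word = Counter(
    tok for text in texts for tok in re.findall(r"[a-z]+", text.lower())
)

print(by_value)
print(by_word)
```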

  • Thomas_Ott
    Thomas_Ott New Altair Community Member
    Answer ✓

    I see. In that case you should use the WOR output port of the Process Documents from Data operator. That is the word list port, and it gives you a list of how many times each word occurred.

    If you want to save this data back to Excel, attach a WordList to Data operator to the WOR output, followed by a Write Excel operator.
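Outside RapidMiner, a word list with counts can be sketched in Python as below. It records both total occurrences and the number of incidents each word appears in, since the two differ when a word repeats within one incident. The tokenizer, column headers and sample texts are assumptions, and CSV is swapped in for Excel (Excel opens CSV files); this does not reproduce the format of the WOR port output.

```python
import csv
import io
import re
from collections import Counter

def word_list(texts):
    """Return (word, total occurrences, document occurrences) rows,
    sorted by total occurrences (highest first), then alphabetically."""
    total = Counter()
    in_docs = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())  # assumed tokenizer
        total.update(tokens)         # every occurrence counts
        in_docs.update(set(tokens))  # each document counts once per word
    return sorted(
        ((w, total[w], in_docs[w]) for w in total),
        key=lambda row: (-row[1], row[0]),
    )

def to_csv(rows):
    """Serialize word-list rows as CSV text (a stand-in for Write Excel)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["word", "total", "documents"])  # hypothetical headers
    writer.writerows(rows)
    return buf.getvalue()

# Hypothetical incident texts.
texts = ["reset password, reset account", "account locked"]
rows = word_list(texts)
print(to_csv(rows))
```

To write a file instead, pass `open("word_list.csv", "w", newline="")` to `csv.writer` in place of the `StringIO` buffer.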

  • martin_red
    martin_red New Altair Community Member

    Thank you.

    I had been changing all sorts of operators, and it turned out to be a very simple solution. I failed to spot what the different outputs do; I will now look into these for other operators.

    Thanks again