How can I calculate the frequency of specific words for each row in Excel data?

User: "psyduck"
New Altair Community Member
Updated by Jocelyn
Hi,
I'm working with a dataset in which each sentence is in a separate row. For each row, I want to count how often the words from a word list I have created occur. Then I would like to add these counts to my dataframe as a new variable.

For example:
Let's say my word list (my dictionary) contains "apple" and "banana", and I have independent sentences in rows like this:
1. X x x apple x x banana x apple.
2. X apple x x x x.
3. X x banana x apple x.
.
..
...
Now I want to count how many times the words in my list occur in each row. The new column I want to create would then be:
1. = 3
2. = 1
3. = 2
.
..
...
Thanks in advance.

    User: "Telcontar120"
    New Altair Community Member
    Accepted Answer
    If I understand your question, this is pretty straightforward in RapidMiner. Process your text data using the "Process Documents from Data" operator, which allows you to input both a defined wordlist and your data source. Inside that operator, use Tokenize to split your text into words, and set the word vector option to "term occurrences". The output will be a new attribute (column) for each word in your wordlist, holding the number of occurrences of that word in each text you process (each text will be its own row, or example).
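
For readers who want to check the expected numbers outside RapidMiner, the same tokenize-then-count idea can be sketched in plain Python. This is only an illustration, not the RapidMiner process itself: the names `WORDLIST`, `SENTENCES`, and `term_occurrences` are hypothetical, the tokenizer is a simple regular expression that lowercases the text (an assumption, since the original question does not say whether matching should be case-sensitive), and the `total` field is an added sum across the per-word counts to match the single column asked for in the question.

```python
import re
from collections import Counter

# Hypothetical wordlist and sentences, mirroring the example in the question.
WORDLIST = ["apple", "banana"]
SENTENCES = [
    "X x x apple x x banana x apple.",
    "X apple x x x x.",
    "X x banana x apple x.",
]


def term_occurrences(text, wordlist):
    """Count how often each wordlist entry appears as a whole token in text."""
    # Assumption: case-insensitive matching; punctuation separates tokens.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter(tokens)
    # A Counter returns 0 for tokens it has never seen.
    return {word: counts[word.lower()] for word in wordlist}


rows = []
for sentence in SENTENCES:
    per_word = term_occurrences(sentence, WORDLIST)
    # One column per word, plus the row total the question asks for.
    rows.append({"text": sentence, **per_word, "total": sum(per_word.values())})

for row in rows:
    print(row)
```

Because matching is done on whole tokens, a word such as "pineapple" would not be counted as "apple". If the data is already in a pandas dataframe, the same function could be applied to the text column with `Series.apply`.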