Time based term frequency analysis

dawidprozesky New Altair Community Member
edited November 5 in Community Q&A
Hi, I explored RapidMiner a while ago and have now returned with a specific analysis I hope to achieve. I have a data set in Excel with the following columns:

Date (dd/mm/yyyy format) | Body of Text (text) | Publisher (name)

So each record in the data set holds a body of text, the date it was published, and the name of its publisher.

My end goal is to identify words/terms in the texts that first started occurring after a given date (e.g. after 1 January 2010), and to see the frequencies of these identified words/terms over time (this can be per year) after the given date.

My current process is: Read Excel -> Nominal to Text -> Process Documents from Data (tokenizing, filtering and transforming) -> Wordlist to Data

I am very new to RapidMiner, so any assistance would be really appreciated!!

Answers

  • Telcontar120 New Altair Community Member
    You will probably want to preprocess your date/time data before the text analysis, to make the later comparisons easier. Try Date to Numerical to extract the month or year from each date. Then, once you have generated your word counts, you can aggregate them by the appropriate time window.
    To look for occurrences after a specific date, a simple Filter Examples should suffice.
    This should get you started.
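
For readers who want to check the logic outside RapidMiner, the approach above (reduce each date to a year, count words per year, then keep only words never seen up to the cutoff) could be sketched in plain Python. This is only an illustration, not the RapidMiner process itself: the function names, the sample records, and the very simple tokenizer are all made up for the example, and it assumes that "after the given date" means strictly later than the cutoff.

```python
import re
from collections import Counter, defaultdict
from datetime import date, datetime


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def new_terms_by_year(records, cutoff):
    """Return {year: {term: count}} for terms that never occur on or before `cutoff`.

    `records` is an iterable of (date_string, text, publisher) tuples,
    with dates in dd/mm/yyyy format as in the question.
    """
    seen_before = set()                  # every term used on or before the cutoff
    counts_after = defaultdict(Counter)  # per-year term counts after the cutoff

    for date_string, text, _publisher in records:
        published = datetime.strptime(date_string, "%d/%m/%Y").date()
        tokens = tokenize(text)
        if published <= cutoff:
            seen_before.update(tokens)
        else:
            counts_after[published.year].update(tokens)

    # Filter only after all records are read, so the input order does not matter.
    return {
        year: {term: n for term, n in counter.items() if term not in seen_before}
        for year, counter in counts_after.items()
    }


# Hypothetical sample data, standing in for the rows of the Excel sheet.
records = [
    ("15/06/2009", "The bank raised rates", "Publisher A"),
    ("03/02/2010", "The bank launched a blockchain pilot", "Publisher B"),
    ("20/11/2011", "Blockchain adoption and rates", "Publisher A"),
]

result = new_terms_by_year(records, date(2010, 1, 1))
```

With the sample rows, `result[2010]` and `result[2011]` hold only the words that do not appear in the 2009 text, each with its count for that year. Stop-word filtering and stemming are left out here; in RapidMiner those steps would sit inside Process Documents from Data.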