"Historical Twitter data"

LC New Altair Community Member
edited November 5 in Community Q&A
Hi all

I'm relatively new to RapidMiner but am looking at it as an analysis tool for some research I am doing with colleagues. In particular, we want to use historical Twitter data, but the biggest stumbling block is how to import the data into RapidMiner efficiently. Our current 'manual' system of creating an Excel sheet with the data separated into categories (ID, date, location, retweets, tweet text, etc.) is somewhat labour intensive.

I have looked online but have not yet found anything that would work for us. For example, http://vancouverdata.blogspot.co.uk/2010/11/text-analytics-with-rapidminer-loading.html is a brilliant set of tutorials, but it hasn't greatly helped in getting what we need at the start.

I have had a look at the Twitter connector, which is fantastic because the data is ready to go, but I'm not sure it works for historical data. We are looking at tweets from around 18 months ago, and when I limit the results by date nothing comes back. Does anyone know whether the Twitter connector would work here, and if so, how? If not, can anyone point me in the right direction on how best to import historical Twitter data into RapidMiner?

Thanks

Answers

  • MartinLiebig
    Altair Employee
    Hi,

    I think Twitter limits the API to about two weeks back or something like that. There is nothing RapidMiner can do about this.

    Cheers,
    Martin
  • JEdward
    JEdward New Altair Community Member
    I'm assuming Twitter limits access to historical tweets because it sells that data through its subsidiary Gnip: https://www.gnip.com/sources/twitter/historical/

    I don't know their pricing, but you can look at some of the partners plugged into their feed to see whether any of them have RapidMiner connectors (Splunk, for example): https://www.gnip.com/partners/plugged-in/
    Not all of these partners get as much data as others, though: Alteryx only gets the last 30 days of tweets through its Gnip connector. Brandwatch was the one I used to use.

    Another potential option is a free API tool like Snapbird (https://github.com/remy/snapbird), which claims to circumvent the limits with its search API. I haven't tested it, I'm afraid.
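
Whichever source ends up supplying the tweets, the manual Excel step described in the question could probably be replaced by a small script. The following is only a minimal sketch, assuming the tweets arrive as one JSON object per line and that each object carries the fields `id_str`, `created_at`, `user.location`, `retweet_count` and `text`. Those field names are assumptions about the export and will differ between providers, and the function name `tweets_to_csv` is purely illustrative. The output is a CSV with the categories listed in the question, which RapidMiner's Read CSV operator should be able to load.

```python
import csv
import json
import sys

# Column order mirrors the categories listed in the question.
COLUMNS = ["id", "date", "location", "retweets", "text"]


def tweets_to_csv(json_lines, out_file):
    """Write one CSV row per tweet and return the number of rows written.

    json_lines: iterable of strings, each holding one tweet as JSON.
    The field names read below are assumptions about the export format.
    """
    writer = csv.DictWriter(out_file, fieldnames=COLUMNS)
    writer.writeheader()
    count = 0
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the export
        tweet = json.loads(line)
        user = tweet.get("user") or {}
        writer.writerow({
            "id": tweet.get("id_str", ""),
            "date": tweet.get("created_at", ""),
            "location": user.get("location") or "",
            "retweets": tweet.get("retweet_count", 0),
            # Collapse line breaks and repeated spaces inside the tweet text.
            "text": " ".join((tweet.get("text") or "").split()),
        })
        count += 1
    return count


if __name__ == "__main__":
    # Made-up sample record, only to show the expected input shape.
    sample = [
        '{"id_str": "1", "created_at": "2015-01-01", '
        '"user": {"location": "Leeds"}, "retweet_count": 3, '
        '"text": "hello world"}'
    ]
    tweets_to_csv(sample, sys.stdout)
```

For a real export, the same function can be called with an open input file and an output file opened with `newline=""`, as the Python `csv` documentation recommends.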