Looking for a data set classifiable by both humans and data mining, possibly email/spam

steinar New Altair Community Member
edited November 5 in Altair RapidMiner
Hi

I am looking for a collection of email messages to classify as spam or regular mail. The only data set I've found is the Spambase set (http://archive.ics.uci.edu/ml/datasets/Spambase). Unfortunately, it does not include the actual messages, only attributes.
Finding spam should be easy: my spam folder has plenty. Finding regular email messages that can be made publicly available is more difficult. The only collection I've found is Sarah Palin's emails (http://www.crivellawest.net/palin2011/allList.html). Unfortunately, they are all addressed to the same person and are only available in PDF format anyway.

Email is just the first kind of data set I came up with. If you have ideas for other kinds of data that could be classified both by humans and by data mining methods, please let me know. It would be an advantage if the data set were tried and tested.

Best regards,
Steinar