I teach a data mining course to business management students (with little or no programming experience) using a combination of R and RapidMiner. I try to duplicate the examples from each package in the other so that students can appreciate the differences in usability, available algorithms, and results. For obvious reasons I use the graphical process approach in RapidMiner, rather than teaching XML (which I don't know anyway).
I have two CSV data sets which I use in R. I have not been able to import either of them into RapidMiner in a usable format, despite trying what look like sensible operator options.
sms_spam.csv has two columns: the first labels each message as "spam" or "ham", and the second holds the text of the message (http://www.dt.fee.unicamp.br/~tiago/smsspamcollection/). I want to import this so that I can use Naive Bayes to build a classifier for messages.
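To make the two-column layout concrete, here is a minimal sketch in Python (used purely for illustration, it is not part of the R or RapidMiner workflow). It rests on an unconfirmed assumption: that commas or quotes inside the message text are what upsets the import. The function name, the `has_header` parameter, and the column names `type` and `text` are all hypothetical. It rewrites the file with a header row and every field quoted, so that each message stays in a single column.

```python
import csv
import io


def normalise_sms_csv(src, dst, has_header=False):
    """Copy a (label, text) CSV to dst with a header row and every field quoted.

    src and dst are open text file objects. Returns the number of messages written.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(["type", "text"])  # hypothetical column names

    if has_header:
        next(reader, None)  # discard the existing header row

    count = 0
    for row in reader:
        if not row:
            continue  # skip blank lines
        label = row[0].strip()
        # If the message contained unquoted commas it was split into several
        # cells; glue them back together into one text field.
        text = ",".join(row[1:])
        writer.writerow([label, text])
        count += 1
    return count


# Tiny made-up sample, not taken from the real data set.
sample = io.StringIO('ham,Hello, how are you\nspam,"WIN a prize, now"\n')
cleaned = io.StringIO()
normalise_sms_csv(sample, cleaned)
print(cleaned.getvalue())
```

With real files, `src` and `dst` would be opened with `open(..., newline="")`, as the `csv` module documentation recommends.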
groceries.csv is an example set that comes with the arules package in R. It has multiple (unlabelled) columns, with each row representing a transaction. Each row uses only as many columns as there are items in that transaction, so the rows vary in length and the file has no fixed column structure. I want to use association rules and/or FP-Growth on this.
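The variable-length layout can be illustrated with a minimal sketch, again in Python purely for illustration. It assumes that the target is a rectangular table with one 0/1 column per item and one row per transaction, which is the shape that frequent-itemset tools commonly expect; whether that is exactly what RapidMiner needs here is an assumption. The function name and the sample rows are made up.

```python
import csv
import io


def baskets_to_binary_table(src):
    """Turn ragged basket rows (one transaction per line) into a 0/1 item table.

    src is an open text file object. Returns (item_names, rows), where each row
    has one 0/1 entry per item name, in the same order as item_names.
    """
    transactions = []
    for row in csv.reader(src):
        # Ignore empty cells and surrounding whitespace; duplicates collapse.
        items = {cell.strip() for cell in row if cell.strip()}
        if items:
            transactions.append(items)

    # One column per distinct item seen anywhere in the file.
    item_names = sorted(set().union(*transactions))
    rows = [[1 if name in basket else 0 for name in item_names]
            for basket in transactions]
    return item_names, rows


# Made-up sample with rows of different lengths.
sample = io.StringIO("milk,bread\nbread,butter,jam\nmilk\n")
names, rows = baskets_to_binary_table(sample)

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(names)   # header row: one column per item
writer.writerows(rows)   # one 0/1 row per transaction
print(out.getvalue())
```

The resulting table has a fixed number of columns, so it can be read like any ordinary CSV file with a header row.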
Any suggestions on how I can get either or both of these data sets into RapidMiner in a usable form would be greatly appreciated.