Email classification models using Naive Bayes, SVM and Neural Networks

Question

Hello,

I am a student at the University of Gloucestershire and have decided to extend some of the email classification work we did earlier this year for my dissertation. Please forgive me if my question is too vague or lacks detail; I have read through the manual and searched the forums but cannot find an answer.

I am trying to compare the performance of the three classification models mentioned in the title when classifying emails as SPAM or non-SPAM. I have a corpus of emails that is already labelled as SPAM and non-SPAM (the corpus is a set of text files and is used as an example in the book "Machine Learning for Hackers" (O'Reilly, 2012)).

I have made a start on my models but keep running into problems. I have not got very far: so far I use Process Documents from Files to create a word vector (tokenizing and stemming the text to strip out some of the unwanted data), then Wordlist to Data, then Write to Excel. That is where I get stuck: I'm not sure how to complete the models, or even whether what I have done so far is correct.
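For anyone who wants to see what that preprocessing step amounts to outside a GUI tool, here is a rough Python sketch of the same idea: tokenize each email, drop stop words, and count terms over a shared vocabulary so every email becomes one row of numbers. This is an illustration under assumptions, not what any particular tool does internally; the stop-word list and function names are hypothetical, and stemming is omitted for brevity.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"the", "a", "an", "and", "to", "of", "is", "for", "you", "your"}

def tokenize(text):
    """Lowercase the text, split it into alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def build_vocabulary(docs):
    """Collect every distinct token across all documents, in a fixed (sorted) order."""
    return sorted({tok for doc in docs for tok in tokenize(doc)})

def to_vectors(docs, vocabulary):
    """Turn each document into a term-count vector over the shared vocabulary."""
    rows = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        rows.append([counts[word] for word in vocabulary])
    return rows

emails = ["Win a FREE prize now", "Meeting notes for the project"]
vocab = build_vocabulary(emails)
matrix = to_vectors(emails, vocab)
```

Each row of `matrix` plays the role of one example in the resulting data table, with one column per word; that table is what the classifiers are then trained on.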

I know it's a big ask, but I would really appreciate it if somebody could take me through building one of the models step by step (I assume that once I have completed one model, the other two will be very similar).
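As a reference point for what "one model" involves, below is a minimal multinomial Naive Bayes classifier written from scratch in Python with add-one (Laplace) smoothing. It is a sketch under assumptions (plain-string emails, labels "spam"/"ham", a simple regex tokenizer), not a tool's built-in operator; the class and function names are hypothetical. Training just counts words per class; prediction picks the class with the highest log-probability.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

class NaiveBayes:
    """Minimal multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)          # label -> number of training emails
        self.word_counts = defaultdict(Counter)      # label -> token counts
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def _log_score(self, tokens, label):
        # log P(class) + sum of log P(token | class); smoothing keeps every term non-zero.
        score = math.log(self.class_counts[label] / sum(self.class_counts.values()))
        denom = self.total_words[label] + len(self.vocab)
        for tok in tokens:
            if tok in self.vocab:  # tokens never seen in training are ignored
                score += math.log((self.word_counts[label][tok] + 1) / denom)
        return score

    def predict(self, text):
        tokens = tokenize(text)
        return max(self.class_counts, key=lambda c: self._log_score(tokens, c))

texts = ["win free money now", "free prize claim now",
         "meeting agenda for monday", "project report attached"]
labels = ["spam", "spam", "ham", "ham"]
model = NaiveBayes().fit(texts, labels)
```

Swapping in an SVM or a neural network changes only the `fit`/`predict` internals; the input (word vectors or tokens) and the evaluation around it stay the same, which is why the other two models should indeed be similar to set up.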

Thanks for your time.
Elliot

roohishahid · Answer

Have you found a solution to this?

MariusHelf · Answer

Hi Elliot,

Have you already checked out the video tutorials on our website? They explain quite well how to create and validate models in general, and there are videos specifically about text processing. If you combine what you learn from both video series, you are almost there :)
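To make "validate models" concrete, here is a hedged Python sketch of k-fold cross-validation: shuffle the emails once, split them into k folds, train on k-1 folds, measure accuracy on the held-out fold, and average. It assumes any model can be wrapped as a hypothetical `train_fn(texts, labels)` that returns a `predict(text)` function; the majority-class baseline is an illustrative stand-in, not one of the three models. Running the same function for Naive Bayes, SVM and a neural network gives comparable accuracy figures.

```python
import random
from collections import Counter

def k_fold_indices(n, k):
    """Split positions 0..n-1 into k contiguous folds of near-equal size (requires k <= n)."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(train_fn, texts, labels, k=5, seed=0):
    """Return mean accuracy over k folds; train_fn(texts, labels) must return predict(text) -> label."""
    order = list(range(len(texts)))
    random.Random(seed).shuffle(order)  # shuffle so folds don't follow the corpus's original ordering
    accuracies = []
    for fold in k_fold_indices(len(texts), k):
        test_idx = [order[p] for p in fold]
        test_set = set(test_idx)
        train_x = [t for i, t in enumerate(texts) if i not in test_set]
        train_y = [y for i, y in enumerate(labels) if i not in test_set]
        predict = train_fn(train_x, train_y)
        correct = sum(predict(texts[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)

def majority_baseline(train_x, train_y):
    """Hypothetical baseline: always predict the most common training label."""
    most_common = Counter(train_y).most_common(1)[0][0]
    return lambda text: most_common
```

A baseline like this is also a useful sanity check: any real model should clearly beat its accuracy on the same folds.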

If you have any specific problems, please let us know.

Best regards,
Marius