Anomaly detection experiment

User: "fwood201"
New Altair Community Member
Updated by Jocelyn

I'm undertaking my final-year project on machine learning for cyber security and am a complete beginner with RapidMiner (RM). I want to create a process that demonstrates how effective machine learning techniques are at detecting both signatures and anomalies in an IDS. For this I am using the KDD Cup 99 (KDD99) dataset, for which I have labelled and unlabelled sets. The aim is to create a classifier that trains on this data and is able to spot anomalies. I have downloaded the anomaly detection extensions but am not sure how to use them.

 

Additionally, since the data is already labelled, I would like to know whether it would be better to have the results name the specific attack that happens (e.g. smurf, SQLattack, etc.) or simply output 'malicious' or 'benign', and how to do this.

 

Fraser

    User: "YYH"
    Altair Employee
    Accepted Answer

    Hi @fwood201,

     

    KDD99 is a widely used data set for anomaly detection. I would suggest using binary labels (attack or normal) as a good starting point, because several attack categories may not have enough cases for multinomial classification. Watch out for the unbalanced classes.
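
    As a rough illustration of the binary-label idea, here is a minimal Python sketch (outside RapidMiner) that collapses attack names into 'attack' versus 'normal' and counts the classes to check the balance. The sample labels are made up for illustration, and the helper name `to_binary` is hypothetical; the raw KDD99 files write labels with a trailing period (for example "smurf."), which the sketch strips.

```python
from collections import Counter


def to_binary(label: str) -> str:
    """Collapse a KDD99 label into 'normal' or 'attack' (hypothetical helper)."""
    # Raw KDD99 labels end with a period (e.g. "smurf."), so strip it first.
    return "normal" if label.strip().rstrip(".") == "normal" else "attack"


# Made-up sample of raw labels, for illustration only.
raw_labels = ["normal.", "smurf.", "smurf.", "neptune.", "normal.", "smurf."]

binary_labels = [to_binary(label) for label in raw_labels]

# Counts per attack type show which categories are too rare to learn.
per_attack = Counter(label.rstrip(".") for label in raw_labels)
# Counts per binary class show how unbalanced the two-class problem is.
per_class = Counter(binary_labels)

print(per_attack)
print(per_class)
```

    If the per-attack counts show only a handful of rows for some categories, that is the situation where the two-class version is the safer place to start.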

     

    I did a quick Google search and found that some researchers have summarized the accuracy of different learners in a paper.

    They mentioned that in the raw data you may need to be careful about duplicated records.
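
    To make the duplicate issue concrete, here is a small Python sketch that drops exact duplicate rows while keeping the first occurrence of each. The rows and their four columns are invented for illustration (the real data set has many more attributes), and `drop_duplicates` is a hypothetical helper name.

```python
def drop_duplicates(rows):
    """Return rows with exact duplicates removed, keeping first occurrences."""
    # A dict preserves insertion order, so the first copy of each row survives.
    return list(dict.fromkeys(tuple(row) for row in rows))


# Invented records for illustration: (duration, protocol, service, label).
rows = [
    (0, "icmp", "ecr_i", "smurf."),
    (0, "icmp", "ecr_i", "smurf."),
    (0, "tcp", "http", "normal."),
    (0, "icmp", "ecr_i", "smurf."),
    (2, "tcp", "smtp", "normal."),
]

unique_rows = drop_duplicates(rows)
removed = len(rows) - len(unique_rows)
print(f"{removed} of {len(rows)} rows were duplicates")
```

    Removing duplicates before splitting into training and test sets avoids the same record appearing on both sides, which would make the measured accuracy look better than it is.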

     

    You can build SVM, Decision Tree, Random Forest, Naive Bayes, GBT, etc. models in RapidMiner for binominal classification and evaluate the performance (AUC, accuracy, recall, F-measure, ...) of your own models. If you are interested in unsupervised learning algorithms, take a look at the outlier detection operators and the anomaly detection extensions from the Marketplace, for instance LOF or HBOS. Some cutting-edge fraud detection algorithms (e.g. isolation forest) are also available by drawing on R/Python libraries.
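
    For reference, the evaluation measures mentioned above can be computed by hand. The following is a minimal Python sketch, not RapidMiner's own implementation: the labels and scores are made up, the 0.5 threshold is an assumption, and `binary_metrics` and `auc` are hypothetical helper names. 'attack' is treated as the positive class.

```python
def binary_metrics(y_true, y_pred, positive="attack"):
    """Accuracy, recall, precision and F-measure for a two-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + recall
    return {
        "accuracy": (tp + tn) / len(pairs),
        "recall": recall,
        "precision": precision,
        "f_measure": 2 * precision * recall / denom if denom else 0.0,
    }


def auc(y_true, scores, positive="attack"):
    """AUC as the chance that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    # Ties between a positive and a negative count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up ground truth and model confidence that a row is an attack.
y_true = ["attack", "attack", "attack", "normal",
          "normal", "attack", "normal", "attack"]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.7, 0.1, 0.2]

# Assumed decision threshold of 0.5 on the confidence score.
y_pred = ["attack" if s >= 0.5 else "normal" for s in scores]

metrics = binary_metrics(y_true, y_pred)
metrics["auc"] = auc(y_true, scores)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

    With unbalanced classes, recall and F-measure on the attack class are usually more telling than accuracy alone, since a model that predicts the majority class everywhere can still score a high accuracy.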

     

    Happy RapidMining.

    YY