[SOLVED] Speed / Evaluation time improvement of kNN Classifier

Good day,

I've developed a Java application that uses RapidMiner.jar (and the other jars) to classify my test data. The classifier I use is kNN (k=3, distance measure = cosine similarity). I've already optimized k and the choice of distance measure.
My model comprises 25k examples (rows) with 31 attributes.
When I run test data, which is a CSV file with about 3k rows on average, the execution time is very long: more than 1 hour on average.
Do you have any suggestions or recommendations on how I can improve the execution/evaluation time of my kNN classifier application, based on the details above?

I look forward to your feedback. Thank you.

MariusHelf

Hi,

As you probably know, kNN is a lazy learner: training a model is very fast (it basically just stores the training set), but applying it is quite slow, because for each new example the k nearest neighbours have to be found among all training examples. The only way to reduce the execution time of kNN itself is to reduce the size of the training set, either by removing attributes or by removing examples (the latter will probably have the greater impact).
Otherwise, I would suggest using a learner other than kNN. Basically any learner that actually builds a model will be much faster during application than kNN. Additionally, you may be able to learn something about your data by looking at the model. A linear SVM, for example, outputs attribute weights, so you can see how much influence each attribute has on the classification. You may want to try an SVM (linear or RBF kernel), decision trees, or, if you have a regression problem, Linear Regression.
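
To make the cost argument concrete, here is a minimal plain-Java sketch of brute-force kNN with cosine similarity. It does not use the RapidMiner API; the class name `CosineKnnSketch`, its methods, and the toy data are hypothetical and only illustrate the idea. Every query scans all training rows, so with 25k training rows and 3k test rows this amounts to 75 million similarity computations over 31 attributes each. The `sampleIndices` helper shows one simple way to shrink the training set, assuming plain random sampling (not stratified by class) is acceptable for your data.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

final class CosineKnnSketch {

    /** Cosine similarity of two equal-length vectors; returns 0 if either has zero norm. */
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return (na == 0 || nb == 0) ? 0 : dot / Math.sqrt(na * nb);
    }

    /** Brute-force kNN: one pass over ALL training rows per query, O(rows * attributes). */
    static String classify(double[][] train, String[] labels, double[] query, int k) {
        k = Math.min(k, train.length);
        double[] bestSim = new double[k];
        int[] bestIdx = new int[k];
        Arrays.fill(bestSim, Double.NEGATIVE_INFINITY);
        Arrays.fill(bestIdx, -1);
        for (int r = 0; r < train.length; r++) {
            double s = cosine(train[r], query);
            // Keep the k most similar rows in descending order of similarity.
            int pos = k;
            while (pos > 0 && s > bestSim[pos - 1]) pos--;
            if (pos < k) {
                for (int j = k - 1; j > pos; j--) {
                    bestSim[j] = bestSim[j - 1];
                    bestIdx[j] = bestIdx[j - 1];
                }
                bestSim[pos] = s;
                bestIdx[pos] = r;
            }
        }
        // Majority vote; on a tie, the label that reached the count first
        // (i.e. via the more similar neighbours) wins.
        Map<String, Integer> votes = new HashMap<>();
        String best = null;
        int bestCount = 0;
        for (int j = 0; j < k; j++) {
            if (bestIdx[j] < 0) continue;
            int c = votes.merge(labels[bestIdx[j]], 1, Integer::sum);
            if (c > bestCount) {
                bestCount = c;
                best = labels[bestIdx[j]];
            }
        }
        return best;
    }

    /** Picks m distinct row indices uniformly at random (partial Fisher-Yates shuffle). */
    static int[] sampleIndices(int n, int m, long seed) {
        int[] idx = new int[n];
        for (int i = 0; i < n; i++) idx[i] = i;
        Random rnd = new Random(seed);
        m = Math.min(m, n);
        for (int i = 0; i < m; i++) {
            int j = i + rnd.nextInt(n - i);
            int t = idx[i];
            idx[i] = idx[j];
            idx[j] = t;
        }
        return Arrays.copyOf(idx, m);
    }

    public static void main(String[] args) {
        // Toy data, purely illustrative.
        double[][] train = {{1, 0}, {0.9, 0.1}, {0, 1}, {0.1, 0.9}};
        String[] labels = {"a", "a", "b", "b"};
        System.out.println(classify(train, labels, new double[] {1, 0.05}, 3));
        System.out.println(Arrays.toString(sampleIndices(train.length, 2, 42L)));
    }
}
```

Halving the number of training rows roughly halves the work per query in a scan like this, which is why removing examples tends to help more than removing a few attributes. Whether accuracy survives the reduction is something you would need to re-validate on your own data.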

Best regards,
Marius

jaysonpryde

Thank you very much for this feedback!