Which are the most important parameters to tune for k-NN, NB, RF, DL, SVM for text classification?

jochen_hartmann

Dear community,

 

I would like to compare the performance of the following five algorithms on different text classification tasks*:

 

  1. k-Nearest Neighbors (k-NN)
  2. Naive Bayes (NB)
  3. Random Forest (RF)
  4. Deep Learning (DL)
  5. Support Vector Machines (SVM)

 

Question 1: Which parameters are the most important to optimize for each of methods 1-5?

Question 2: What ranges should I give those parameters in the parameter optimization operator in order to avoid "boiling the ocean"?

 

Thanks in advance!

 

* each task has between 3 and 5 classes, and text length varies between 3 and 70 words per document/example

Best Answers

  • Thomas_Ott

    Great question!

     

    1. For k-NN, I would optimize around k.
    2. Naive Bayes I usually don't optimize.
    3. For Random Forest, I would optimize the depth of the trees, the number of trees, and the confidence.
    4. For Deep Learning I'm not sure, but I would try a few of the activation functions.
    5. For text, I would use a linear SVM and optimize C (see the sketch after this list).
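
    A minimal sketch of these searches in scikit-learn, assuming a Python workflow (in RapidMiner itself this would be set up in the parameter optimization operator instead). The toy corpus, the MLPClassifier as a stand-in for a Deep Learning operator, and all parameter ranges are illustrative assumptions, not part of this thread:

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import LinearSVC
from sklearn.model_selection import GridSearchCV

# Tiny illustrative corpus (an assumption; substitute your own documents).
texts = [
    "flight delayed by two hours", "cheap flights to rome",
    "book a hotel in paris", "lost luggage at the airport",
    "train ticket refund request", "best beaches in spain",
    "phone battery drains fast", "screen cracked after a week",
    "camera quality is great", "laptop will not boot",
    "update broke the app", "new headphones sound amazing",
]
labels = ["travel"] * 6 + ["tech"] * 6

def tune(clf, grid):
    """Grid-search one classifier on top of a shared TF-IDF step."""
    pipe = Pipeline([("tfidf", TfidfVectorizer()), ("clf", clf)])
    search = GridSearchCV(pipe, grid, cv=3)
    search.fit(texts, labels)
    return search.best_params_, round(search.best_score_, 3)

# 1. k-NN: a handful of odd values of k keeps the search small
print(tune(KNeighborsClassifier(), {"clf__n_neighbors": [1, 3, 5, 7]}))

# 4. Deep Learning stand-in: a small MLP, trying a few activation functions
print(tune(MLPClassifier(max_iter=500, random_state=0),
           {"clf__activation": ["relu", "tanh", "logistic"]}))

# 5. Linear SVM: C on a log scale
print(tune(LinearSVC(), {"clf__C": [0.01, 0.1, 1, 10, 100]}))
```

    Starting with a log-scale grid for C and a few odd values of k keeps the search cheap; widening a range only around whichever value wins is one practical way to avoid boiling the ocean.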
  • Telcontar120

    Excellent suggestions from @Thomas_Ott, as usual. I would add a couple more:

    • There isn't actually anything to optimize with Naive Bayes; it has only one parameter (the Laplace correction), and I would definitely leave it on.
    • For Random Forest, I would also optimize the growing criterion (information gain, gain ratio, Gini, accuracy).
    • For SVM, you might also try a polynomial kernel and optimize C as well as the degree in the range of 1-4 (sketched below).
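
    In the same hedged scikit-learn sketch style as above (the corpus and ranges are again illustrative assumptions, and note that scikit-learn's Random Forest only exposes the gini and entropy criteria rather than RapidMiner's four):

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Same kind of tiny illustrative corpus as in the sketch above.
texts = [
    "flight delayed by two hours", "cheap flights to rome",
    "book a hotel in paris", "lost luggage at the airport",
    "train ticket refund request", "best beaches in spain",
    "phone battery drains fast", "screen cracked after a week",
    "camera quality is great", "laptop will not boot",
    "update broke the app", "new headphones sound amazing",
]
labels = ["travel"] * 6 + ["tech"] * 6

# Polynomial-kernel SVM: C plus the kernel degree in the suggested 1-4 range
poly = Pipeline([("tfidf", TfidfVectorizer()), ("clf", SVC(kernel="poly"))])
poly_search = GridSearchCV(poly, {"clf__C": [0.1, 1, 10],
                                  "clf__degree": [1, 2, 3, 4]}, cv=3)
poly_search.fit(texts, labels)
print(poly_search.best_params_)

# Random Forest: number of trees, depth, and the split criterion
rf = Pipeline([("tfidf", TfidfVectorizer()),
               ("clf", RandomForestClassifier(random_state=0))])
rf_search = GridSearchCV(rf, {"clf__n_estimators": [50, 100, 200],
                              "clf__max_depth": [None, 10, 20],
                              "clf__criterion": ["gini", "entropy"]}, cv=3)
rf_search.fit(texts, labels)
print(rf_search.best_params_)
```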
