Which are the most important parameters to tune for k-NN, NB, RF, DL, SVM for text classification?
jochen_hartmann
New Altair Community Member
Dear community,
I would like to compare the performance of the following five algorithms on different text classification tasks*:
- k-Nearest Neighbors (k-NN)
- Naive Bayes (NB)
- Random Forest (RF)
- Deep Learning (DL)
- Support Vector Machines (SVM)
Question 1: Which parameters are the most important to optimize for each of these five methods?
Question 2: What ranges should I give those parameters in the parameter optimization operator in order to avoid "boiling the ocean"?
Thanks in advance!
* each task has between 3 and 5 classes, and the text length varies between 3 and 70 words per document/example
Best Answers
-
Great question!
- With k-NN I would optimize around k.
- Naive Bayes I usually don't optimize.
- For Random Forest I would optimize the depth of the trees, the number of trees, and the confidence.
- For Deep Learning I'm not sure, but I would try a few of the activation functions.
- For text, I would use a linear SVM and optimize C (see the sketch after this list).
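To make that concrete, here is a minimal grid-search sketch in scikit-learn, offered as an analogue of RapidMiner's parameter optimization operator rather than its actual interface. The toy corpus and every grid range below are illustrative assumptions, not recommendations.

```python
# Minimal grid-search sketch (scikit-learn analogue of parameter optimization;
# the toy corpus and all grid ranges are illustrative assumptions only).
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Placeholder corpus: 3 classes of short documents (swap in your own data).
texts = (["good fast service"] * 10 + ["bad slow support"] * 10
         + ["average okay experience"] * 10)
labels = [0] * 10 + [1] * 10 + [2] * 10

candidates = {
    # k-NN: tune k; odd values sidestep ties in majority votes
    "k-NN": (KNeighborsClassifier(), {"clf__n_neighbors": [1, 3, 5, 7, 11, 15]}),
    # Random Forest: tune tree depth and number of trees
    "RF": (RandomForestClassifier(random_state=0),
           {"clf__max_depth": [5, 10, 20, None],
            "clf__n_estimators": [50, 100, 200]}),
    # Linear SVM: tune C on a log scale
    "SVM": (LinearSVC(), {"clf__C": [0.01, 0.1, 1, 10, 100]}),
}

for name, (clf, grid) in candidates.items():
    pipe = Pipeline([("tfidf", TfidfVectorizer()), ("clf", clf)])
    search = GridSearchCV(pipe, grid, cv=3, scoring="accuracy")
    search.fit(texts, labels)
    print(name, search.best_params_, round(search.best_score_, 3))
```

Keeping C on a log scale and k to a handful of odd values keeps the grid small, which is exactly the point of not "boiling the ocean".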
-
Excellent suggestions from @Thomas_Ott as usual. I would add a couple more:
- There isn't actually anything to optimize with Naive Bayes; there is only one parameter (Laplace correction), and I would definitely leave it on.
- For Random Forest, I would also optimize the splitting criterion (information gain, gain ratio, Gini, accuracy).
- For SVM, you might also try a polynomial kernel and optimize C as well as the degree in the range 1-4 (see the sketch after this list).
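A hedged scikit-learn sketch of these additions follows; the alpha values, C range, and toy data are again illustrative assumptions. Note that scikit-learn's Random Forest only exposes gini and entropy as criteria, fewer than RapidMiner offers.

```python
# Hedged sketch of the additions above (toy data and grids are assumptions).
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

texts = (["good fast service"] * 10 + ["bad slow support"] * 10
         + ["average okay experience"] * 10)
labels = [0] * 10 + [1] * 10 + [2] * 10

# Naive Bayes: alpha=1.0 is the Laplace correction; a tiny alpha turns it off,
# so the search can confirm that leaving it on is the better choice.
nb = Pipeline([("vec", CountVectorizer()), ("clf", MultinomialNB())])
nb_grid = {"clf__alpha": [1e-10, 1.0]}

# Random Forest: compare the available split criteria.
rf = Pipeline([("vec", CountVectorizer()),
               ("clf", RandomForestClassifier(random_state=0))])
rf_grid = {"clf__criterion": ["gini", "entropy"]}

# Polynomial-kernel SVM: tune C together with the degree in the range 1-4.
poly_svm = Pipeline([("vec", CountVectorizer()), ("clf", SVC(kernel="poly"))])
svm_grid = {"clf__C": [0.1, 1, 10], "clf__degree": [1, 2, 3, 4]}

for pipe, grid in [(nb, nb_grid), (rf, rf_grid), (poly_svm, svm_grid)]:
    search = GridSearchCV(pipe, grid, cv=3, scoring="accuracy")
    search.fit(texts, labels)
    print(search.best_params_, round(search.best_score_, 3))
```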