Help with correctly understanding classification results
Serek91
New Altair Community Member
Hi, I have the following table with classification results:
I have 4 algorithms. Classification was made for 16 different training sets:
- all => all 15 predictors were used
- 1-15 => each set contains 14 predictors; in each set a different type of predictor was removed
An example set is in the attachment.
Type of excluded predictor | column name in CSV
1 | characters_number
2 | sentences_number
3 | words_number
4 | average_sentence_length
5 | average_sentence_words_number
6 | ratio_unique_words
7 | average_word_length
8 | ratio_word_length_[1-16]
9 | ratio_special_characters
10 | ratio_numbers
11 | ratio_punctuation_characters
12 | most_used_word_[1-4]
13 | ratio_letter_[a-z]
14 | ratio_questions
15 | ratio_exclamations
I have to somehow conclude why the results for sets 1-15 for each algorithm are better or worse than the results in column ALL. But I don't have any idea why. I know that in most cases, when the difference between column ALL and a column 1-15 is very small (< 1%), it is probably just randomness. But when the difference is larger, it is probably caused by something.
The most important thing: I don't know why the k-NN results are identical for columns 9-15...
It would also be good to know why Naive Bayes is the best (54%) and why k-NN is such a bad algorithm for this task (20%).
Can someone help me with that?
Best Answer
-
Hi!
Some partial answers:
k-NN might mainly learn from a single attribute or only a few attributes if you don't normalize the data. (It compares values of different attributes directly, so an attribute with a large scale (like 1000) will dominate attributes on small scales (like nominal attributes encoded as 0 or 1).) If this dominating attribute is still in your data, the result will stay more or less the same.
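To make the scale effect concrete with made-up numbers (not your actual data), here is a minimal NumPy sketch of how one large-scale attribute dominates the Euclidean distance that k-NN uses:

```python
# Minimal sketch with made-up values: characters_number is on a scale of
# thousands, ratio_unique_words is between 0 and 1.
import numpy as np

a = np.array([5200.0, 0.41])   # [characters_number, ratio_unique_words]
b = np.array([4800.0, 0.95])
print(np.linalg.norm(a - b))   # ~400.0 -> driven almost entirely by characters_number

# After (hand-done) min-max normalization both attributes contribute.
a_norm = np.array([0.52, 0.41])
b_norm = np.array([0.48, 0.95])
print(np.linalg.norm(a_norm - b_norm))   # ~0.54 -> ratio_unique_words now matters too
```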
Naive Bayes is frequently a good algorithm without tuning. On the other hand, there's not a lot to tune, so it's seldom the best.
If you try different pruning settings in the decision tree, you might even get a better result. You can use a building block to do it:
https://community.rapidminer.com/discussion/33910/optimize-decision-tree-and-optimize-svm
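As a rough analogue of that building block (a sketch in scikit-learn on synthetic data, not the RapidMiner process itself; the parameter ranges are just assumptions to start from):

```python
# Grid search over pruning-related parameters of a decision tree.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a set with 15 numeric predictors.
X, y = make_classification(n_samples=500, n_features=15, n_informative=8, random_state=0)

param_grid = {
    "max_depth": [3, 5, 10, None],     # rough analogue of "maximal depth"
    "min_samples_leaf": [1, 5, 20],    # rough analogue of "minimal leaf size"
    "ccp_alpha": [0.0, 0.001, 0.01],   # post-pruning strength
}
search = GridSearchCV(DecisionTreeClassifier(random_state=0),
                      param_grid, cv=5, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```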
Regards,
Balázs
Answers
-
Hello @Serek91
There is a concept in machine learning known as the interaction effect. When you analyze your predictors/features, it is not only the individual features that influence what the algorithm learns, but also how they interact. For example, suppose there are two features, A and B. If you run your model on only A or only B, you might get average performance. If you run it on A and B in combination, you might get a much better or a much worse result, because A and B behave differently together than they do on their own.
This is one reason to check your features with feature selection methods such as forward selection or backward elimination; you can also use automatic feature engineering for this. In your setup you removed one feature at a time, but what if, say, features 3 and 6 work better in combination than features 1-6 together? This is one important reason we use feature selection. The interaction effect plays a major role in traditional algorithms.
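A hedged sketch of what forward selection could look like (scikit-learn on synthetic data standing in for the 15 stylometric predictors; nothing here is taken from your actual results):

```python
# Forward selection: start empty and add the predictor that improves
# cross-validated accuracy the most, one at a time.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=500, n_features=15, n_informative=6, random_state=0)

# direction="backward" would instead start from all 15 and drop the least useful one.
selector = SequentialFeatureSelector(GaussianNB(), n_features_to_select=8,
                                     direction="forward", cv=5)
selector.fit(X, y)
print("selected predictor indices:", selector.get_support(indices=True))
```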
Also, did you tune the hyperparameters of these algorithms? For example, in KNN, how did you choose the K value? There is an elbow technique that can be used to determine a good K value. As @BalazsBarany mentioned, it is also important to check the hyperparameters of decision trees, like the criterion and pruning (pre and post).
KNN is a lazy algorithm and depends on the K value. If your labels or data cannot be separated in feature space, KNN misclassifies a lot. Also, you need to check what the best value of K is.
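A minimal sketch of scanning K with cross-validation (elbow-style inspection), again on synthetic data rather than your set, with normalization included in the pipeline:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=15, n_informative=6, random_state=0)

for k in [1, 3, 5, 7, 11, 15, 21]:
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"k={k:2d}  accuracy={score:.3f}")   # pick the K where accuracy levels off
```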
Hope this helps.
-
Ok, thanks. I added normalization to the k-NN and now I have better results (~46%). Is normalization not needed for the rest of the algorithms (Naive Bayes, Decision Tree)? I don't see any difference with and without it.
-
I'm not saying it's not needed, but for KNN you will definitely see a difference with normalization. The reason is the distance calculation methods used in KNN; it relies mainly on the surrounding data samples for prediction. There is a nice visual example in the Stack Exchange post below.
https://stats.stackexchange.com/questions/287425/why-do-you-need-to-scale-data-in-knn
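For illustration only, a hedged scikit-learn sketch of that effect, with one deliberately large-scale synthetic column standing in for something like characters_number:

```python
# Same k-NN, same data, with and without normalization.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=15, n_informative=6, random_state=0)
X[:, 0] *= 1000  # blow up one column's scale so it dominates the distances

raw = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=5).mean()
scaled = cross_val_score(make_pipeline(StandardScaler(),
                                       KNeighborsClassifier(n_neighbors=5)),
                         X, y, cv=5).mean()
print(f"without normalization: {raw:.3f}, with normalization: {scaled:.3f}")
```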
From my experience, there won't be much difference from normalization for the decision tree, since it calculates an impurity index for each attribute and branches on that.
-
Ok, thanks. Results for k-NN are now way better. Results for the Decision Tree are a bit better, but the difference is not significant. I will try a bit more to improve it.
-
Hi, I have another question: Decision Tree - the results in columns ALL and 12 are the same. Predictor 12 (most_used_word) has only string values (words), not numerical ones. Can the Decision Tree use predictors with text values? It seems that it can't.
-
From my understanding, the text data is treated as categorical (nominal) in this case.
-
According to the docs: "This Operator can process ExampleSets containing both nominal and numerical Attributes." So it should have some impact on the final result. But the result is still the same, no matter whether the predictor is included or not.
-
You should look at the two models and check whether that feature/attribute actually appears in the tree. Maybe that attribute got pruned.
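For what it's worth, a hedged scikit-learn sketch of how one could check which attributes actually survive into a trained tree (the column names below are hypothetical; in RapidMiner the equivalent is looking at the tree view or the model description):

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=500, n_features=5, n_informative=3, random_state=0)
names = ["words_number", "ratio_unique_words", "average_word_length",
         "ratio_numbers", "most_used_word_1"]   # hypothetical column names

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=names))       # only the splits that survived pruning
print(dict(zip(names, tree.feature_importances_)))  # importance 0.0 -> attribute never used
```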
-
I made a prediction using only this one predictor, and I got:
-
Makes sense; its accuracy is zero because it cannot predict with that one predictor, it just labeled the predictions randomly. If you want to predict from text, you should use techniques like tokenization.
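A tiny, hedged sketch of what that could look like (tokenized text turned into bag-of-words counts and fed to Naive Bayes; the texts and labels below are made up):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["the the cat cat sat", "dog dog runs fast fast",
         "cat sleeps the cat", "fast dog dog barks"]
labels = ["author_a", "author_b", "author_a", "author_b"]

# CountVectorizer tokenizes the text and counts word occurrences;
# the classifier then works on those counts instead of one raw string column.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)
print(model.predict(["the cat cat purrs"]))   # -> ['author_a'] on this toy data
```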
-
What is 792246? Is it a column name? I think there is some issue in the process structure. I'm not sure unless I see the data and the process. Based on the posted picture I am a bit confused. The only reasons I can think of are that everything got pruned because it added no value to the tree, or that there is some issue with the process input.
-
Ehhh... so it will be hard to do it now... I don't have time for it... Thanks anyway.
EDIT: Regarding the question about 792246 - I added the wrong image ^^ It should be this one: