How to get F_score in Naive Bayes sentiment analysis

HeikoeWin786 · July 2020

Dear all,

I am getting an error when I connect the performance operator (binominal) to the model.
I need to calculate the F-score because my dataset is imbalanced.

Image: https://us.v-cdn.net/6030995/uploads/editor/dg/ql1si1a7jqp7.png

I would truly appreciate it if any of you who have faced this issue before could suggest a way out.

thanks a lot in advance,
regards,
Heikoe

Telcontar120 · July 2020

This error is telling you that your label is polynominal (meaning it has many potential values) and not binominal (meaning it has exactly two values). So you need to make sure you are using a compatible label for this performance operator.
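
As a rough illustration outside RapidMiner, the Python sketch below shows why a binominal measure such as the F-score needs exactly two label values: it checks the number of distinct labels first, then computes precision, recall and F1 for a chosen positive class. The helper name `f_score` and the example labels are hypothetical and are not part of RapidMiner.

```python
def f_score(actual, predicted, positive):
    """Precision, recall and F1 for a two-class label (illustrative helper)."""
    classes = set(actual)
    if len(classes) != 2:
        # Same kind of complaint as the error above: the label is not two-valued
        raise ValueError(
            f"expected 2 label values, found {len(classes)}: {sorted(classes)}"
        )
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical sentiment labels, for illustration only
actual = ["pos", "neg", "neg", "pos", "neg", "neg"]
predicted = ["pos", "neg", "pos", "neg", "neg", "neg"]
precision, recall, f1 = f_score(actual, predicted, positive="pos")
```

If the label column contained a third value (for example "neutral"), the check would raise instead of returning a score, so the label has to be reduced to two values before an F-score can be computed.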

jacobcybulski · July 2020

You can also use a normal classification performance operator and measure Kappa, which copes better with an imbalanced class distribution. However, any model trained on imbalanced label classes may end up biased towards the majority class, so changing the performance measure may not fix your problem. Instead you could try balancing your classes before model training, e.g. using the SMOTE operator, and then apply the resulting model to a test set which has the original class distribution (to get a realistic idea of the model performance). Also, always check the whole confusion matrix rather than a single-value performance measure.
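
To make the point about Kappa and the confusion matrix concrete, here is a minimal Python sketch (not RapidMiner; the function names and the toy data are assumptions for illustration). It uses a classifier that always predicts the majority class on an 8-to-2 imbalanced label: accuracy looks good, while Kappa shows no agreement beyond chance.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count each (actual, predicted) pair; keys are label tuples."""
    return Counter(zip(actual, predicted))


def cohen_kappa(actual, predicted):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(actual)
    observed = sum(1 for a, p in zip(actual, predicted) if a == p) / n
    actual_counts = Counter(actual)
    predicted_counts = Counter(predicted)
    expected = sum(
        actual_counts[c] * predicted_counts[c] for c in actual_counts
    ) / (n * n)
    if expected == 1.0:
        # Kappa is undefined when only one class occurs; 0.0 is a
        # convention assumed for this sketch
        return 0.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical imbalanced test set: 8 negative, 2 positive examples
actual = ["neg"] * 8 + ["pos"] * 2
predicted = ["neg"] * 10  # a model that always predicts the majority class

matrix = confusion_matrix(actual, predicted)
accuracy = sum(1 for a, p in zip(actual, predicted) if a == p) / len(actual)
kappa = cohen_kappa(actual, predicted)
```

The confusion matrix makes the failure visible directly: both positive examples land in the ("pos", "neg") cell, which a single accuracy figure would hide.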

HeikoeWin786 · July 2020

@jacobcybulski

Hello Jacob, thanks a lot for the explanation. If I understood correctly, the steps are:
1) Retrieve training dataset --> SMOTE --> Pre-process the data (Process data to doc) --> NBC --> Store the model
2) Retrieve training dataset --> Pre-process the data (Process data to doc) --> Apply the model (which we stored in step 1)

Am I correct?

thanks much,
Heikoe

jacobcybulski · July 2020

Yes, pretty much spot on. You can balance the training set to avoid model bias, but you apply the model to a test set whose mix of label classes is representative of the population. Just ensure that any pre-processing you do for training is applied in exactly the same way to the test set (except the class balancing), using the pre-processing models from the training run (you can save them and then retrieve them later).
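
The idea can be sketched in Python as follows (an illustration only, not the RapidMiner process). Everything here is an assumption made for the example: the "pre-processing model" is a simple word vocabulary, balancing is done by random oversampling as a simpler stand-in for SMOTE, and in this sketch balancing happens after the texts are turned into vectors.

```python
import random
from collections import Counter


def fit_vocabulary(texts):
    """'Pre-processing model': learn the vocabulary from training texts only."""
    return sorted({tok for t in texts for tok in t.lower().split()})


def vectorize(texts, vocabulary):
    """Apply a stored vocabulary; tokens unseen in training are ignored."""
    index = {tok: i for i, tok in enumerate(vocabulary)}
    rows = []
    for t in texts:
        row = [0] * len(vocabulary)
        for tok in t.lower().split():
            if tok in index:
                row[index[tok]] += 1
        rows.append(row)
    return rows


def oversample(rows, labels, seed=0):
    """Random oversampling of minority classes (stand-in for SMOTE)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, lab in zip(rows, labels) if lab == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical data, for illustration only
train_texts = ["good film", "bad film", "bad plot", "awful plot"]
train_labels = ["pos", "neg", "neg", "neg"]
test_texts = ["good plot", "terrible film"]

vocab = fit_vocabulary(train_texts)          # fitted on training data only
X_train = vectorize(train_texts, vocab)
X_train_bal, y_train_bal = oversample(X_train, train_labels)  # training only
X_test = vectorize(test_texts, vocab)        # same vocabulary, no balancing
```

The test set keeps its original class mix and is transformed with the vocabulary learned during training, so "terrible", which never appeared in training, is simply dropped rather than creating a new attribute.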

HeikoeWin786 · July 2020

@jacobcybulski

Fully understood, Jacob. I will try as advised. I much appreciate your time and help.

Regards,
Heikoe
