binary text classification test-set problem

User: "yeahiii"
New Altair Community Member
Updated by Jocelyn
Hey,
I created a process to classify documents into two categories. Everything works fine as long as I reduce the test set (which comes from a different database/domain) to only one class: recall is 99%. If I remove the filter for the second class, the whole process stops working. I don't think it's a problem of overfitting, since the test data comes from another database. Currently my setup looks like this:

DB-Training -> Process Documents (TF-IDF) -> Train libSVM --(model)--> Apply SVM

DB-Test (different DB) -> Filter Class 1 -> Process Documents (TF-IDF) -> Apply SVM -> Performance (recall of class 2 = 99%)
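The 99% recall in this setup is measured after the test set has been filtered down to a single class. A minimal plain-Python sketch, using hypothetical labels and a deliberately degenerate model (the hypothetical helper `predict_always_class2`, which assigns every document to class 2), illustrates why recall on a one-class test set can look excellent without saying much about the classifier:

```python
# Hypothetical labels: "class1"/"class2" stand in for the two document categories.

def recall(y_true, y_pred, positive):
    """Fraction of truly `positive` items that were predicted as `positive`."""
    relevant = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not relevant:
        return float("nan")  # undefined when the class is absent from the test set
    return sum(1 for p in relevant if p == positive) / len(relevant)

def precision(y_true, y_pred, positive):
    """Fraction of items predicted as `positive` that truly are `positive`."""
    predicted = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not predicted:
        return float("nan")
    return sum(1 for t in predicted if t == positive) / len(predicted)

def predict_always_class2(labels):
    """Degenerate 'model' (hypothetical): labels every document as class2."""
    return ["class2"] * len(labels)

# Test set filtered down to one class: recall looks perfect.
one_class_true = ["class2"] * 100
print(recall(one_class_true, predict_always_class2(one_class_true), "class2"))  # 1.0

# Same model on a mixed test set: class-2 recall is unchanged, but precision
# and the recall of the other class show that nothing useful was learned.
mixed_true = ["class2"] * 50 + ["class1"] * 50
mixed_pred = predict_always_class2(mixed_true)
print(recall(mixed_true, mixed_pred, "class2"))     # 1.0
print(precision(mixed_true, mixed_pred, "class2"))  # 0.5
print(recall(mixed_true, mixed_pred, "class1"))     # 0.0
```

In other words, a test set with both classes (and per-class precision as well as recall) is needed before a single recall figure can be read as evidence that the model works.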

I did NOT connect the wordlist output of the training data's Process Documents operator to the Process Documents operator for the test data. If I do, recall drops to 0%. Am I doing something wrong in the Process Documents step for the training data, or am I missing something else?
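Connecting the wordlist means the test documents are vectorized against the training wordlist instead of their own. A minimal plain-Python sketch of that idea, assuming one common TF-IDF variant (term count divided by document length, times log(N/df), L2-normalized), which is not necessarily the exact formula RapidMiner's Process Documents operator uses; `fit_tfidf`, `transform_tfidf` and the sample documents are hypothetical:

```python
import math
from collections import Counter

def fit_tfidf(docs):
    """Build the wordlist and IDF weights from the training documents only."""
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenizer
    vocab = sorted({tok for toks in tokenized for tok in toks})
    df = Counter(tok for toks in tokenized for tok in set(toks))  # document frequency
    idf = {tok: math.log(len(docs) / df[tok]) for tok in vocab}
    return vocab, idf

def transform_tfidf(docs, vocab, idf):
    """Vectorize docs against a fixed wordlist; words not in it are dropped."""
    vectors = []
    for doc in docs:
        counts = Counter(doc.lower().split())
        total = sum(counts.values()) or 1
        vec = [counts[tok] / total * idf[tok] for tok in vocab]
        norm = math.sqrt(sum(v * v for v in vec))
        vectors.append([v / norm for v in vec] if norm else vec)
    return vectors

# Hypothetical documents: the test set uses partly different vocabulary.
train_docs = ["invoice payment due", "meeting agenda notes"]
test_docs = ["payment reminder", "weather report today"]

vocab, idf = fit_tfidf(train_docs)                  # fitted on training data only
test_vecs = transform_tfidf(test_docs, vocab, idf)  # same columns as training

unseen = sorted({t for d in test_docs for t in d.lower().split()} - set(vocab))
print("test words with no training column:", unseen)
print("all-zero test vectors:", sum(1 for v in test_vecs if not any(v)))
```

Because the wordlist and IDF weights come only from the training documents, every test vector has exactly the training columns. The flip side is that a test document sharing no words with the training wordlist becomes an all-zero vector. With test data from a different database, counting how many test documents end up empty or nearly empty is one possible diagnostic; whether it explains the drop to 0% here is an assumption, not something the process above confirms.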


Comments

No comments on this post.