Text Classification

User: "ar4o"
New Altair Community Member
Updated by Jocelyn

Hi there!

I searched this forum for something that would help me but couldn't find anything. Hopefully someone can answer and I will be able to solve the issue.

Let me first briefly describe the task. I have 2 datasets, each containing 2 columns: sentence and label. There are 2 possible labels: true or false. I also have 3 dictionaries of phrases (these can be unigrams, bigrams, trigrams, ...).
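Because the dictionaries mix phrases of different lengths, a sentence has to be broken into n-grams of the same lengths before it can be compared with them. A minimal plain-Python sketch, assuming simple lowercase whitespace tokenization (no stemming or stopword removal); the function name `ngrams_up_to` is hypothetical:

```python
def ngrams_up_to(sentence: str, max_n: int) -> list[str]:
    """Return all word n-grams of length 1..max_n, each joined by single spaces."""
    # Assumption: lowercasing plus a whitespace split stands in for a real tokenizer.
    tokens = sentence.lower().split()
    grams = []
    for n in range(1, max_n + 1):
        # Slide a window of n tokens across the sentence.
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams

print(ngrams_up_to("The motion is carried", 2))
# ['the', 'motion', 'is', 'carried', 'the motion', 'motion is', 'is carried']
```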

What I want to do:

1) Train an SVM classifier on dataset1 and test it on the same dataset (I did this successfully with cross-validation).

2) Train an SVM classifier on dataset2 and apply the model to dataset1.

3) Use the dictionaries of phrases as features for dataset1.

My questions:

1) As far as I understand, if I want to train a model on one dataset and test it on another, both have to use the same set of features. So I use the "Process Documents from Data" operator with the same steps inside (tokenizing, stemming, filtering out stopwords, ...), then take the word list produced for dataset2 and feed it into the next "Process Documents from Data" operator as its word list input.
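The reason the word list has to be shared is that a trained model only knows the feature columns it saw during training, so the second dataset must be vectorized against the training vocabulary, with unseen words dropped. A minimal plain-Python sketch of that idea (not RapidMiner's implementation), assuming whitespace tokenization and binary term-occurrence features; `build_vocabulary` and `vectorize` are hypothetical names:

```python
def build_vocabulary(sentences: list[str]) -> list[str]:
    """Collect the sorted set of lowercase tokens seen in the training sentences."""
    return sorted({tok for s in sentences for tok in s.lower().split()})

def vectorize(sentence: str, vocabulary: list[str]) -> list[int]:
    """Binary occurrence vector over a FIXED vocabulary; unknown words are ignored."""
    tokens = set(sentence.lower().split())
    return [1 if term in tokens else 0 for term in vocabulary]

# Training side (e.g. dataset2): the vocabulary is built here, and only here.
train = ["the bill passed", "the vote failed"]
vocab = build_vocabulary(train)  # ['bill', 'failed', 'passed', 'the', 'vote']

# Apply side (e.g. dataset1): same columns in the same order; "senate" is dropped.
print(vectorize("the senate vote passed", vocab))  # [0, 0, 1, 1, 1]
```

Every vector produced this way has exactly `len(vocab)` entries, which is what lets a model trained on one dataset be applied to the other.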

[Screenshot 2017-04-29 at 14.55.01.png]

But when I run it, I get this error message:

[Screenshot 2017-04-29 at 14.56.31.png]

WikiTraining has 10,000 sentences and debates has 2,000.

But I don't understand the problem. Can someone please explain it to me and tell me how to avoid it?

2) How can I use separate CSV files of phrases (let's call them dictionaries) as features in a dataset? For example, say a dictionary contains only trigger phrases which indicate that a sentence belongs to class TRUE. How can I do that?
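One common way to turn a phrase dictionary into features is one column per dictionary: 1 if the sentence contains any of its phrases as whole words, else 0. A minimal plain-Python sketch, assuming each dictionary CSV has one phrase per row in its first column and no header; the file layout, the function names, and the whole-word substring matching are all assumptions, not the poster's setup:

```python
import csv
import io

def load_dictionary(csv_file) -> set[str]:
    """Read one phrase per row (first column), normalizing case and spacing."""
    phrases = set()
    for row in csv.reader(csv_file):
        if row:
            phrase = " ".join(row[0].lower().split())
            if phrase:  # skip blank rows
                phrases.add(phrase)
    return phrases

def dictionary_features(sentence: str, dictionaries: list[set[str]]) -> list[int]:
    """One binary feature per dictionary: does any phrase occur as whole words?"""
    # Pad with spaces so a phrase only matches complete tokens, not word fragments.
    # Assumption: punctuation is not stripped, so "support," would not match "support".
    padded = " " + " ".join(sentence.lower().split()) + " "
    return [int(any(f" {p} " in padded for p in d)) for d in dictionaries]

# In-memory stand-ins for real files, e.g. open("triggers.csv", newline="").
triggers = load_dictionary(io.StringIO("I agree\nstrongly support\n"))
negations = load_dictionary(io.StringIO("not\n"))
print(dictionary_features("We strongly support this", [triggers, negations]))  # [1, 0]
```

These columns could then be appended to the word-vector features of dataset1, so the SVM learns how much weight each dictionary deserves instead of treating the triggers as a hard rule.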

Thank you in advance!