Labelling training cases in a polynominal text classification task

User22883 · November 2019

Each case in my dataset contains multiple sentences as shown below.

"Criterion 4Writing General – language and grammar and referencing.UnacceptableSentence structure and grammar inadequate for clarity and/or incomplete referencing of sourced material.AcceptableSentence structure and grammar adequate, but errors cause distraction and/or errors in referencing.GoodSentence structure and grammar adequate, with minor errors that do not distract reader from the main message.Very GoodSentence structures and grammar are good with correct referencing of all sourced material.ExcellentEmploys words with fluency for ease of reading. Writing and references are essentially error free."

I would like to classify the cases according to their main focus. My labels are ["Information Literacy", "Written Communication", "Digital Literacy", ...], 8 in total.

When developing the training set, some cases clearly relate to one area, such as Information Literacy. In those instances my training data looks like this:

ID, Text, Label

01, "string", "Information Literacy"

However, some cases relate to multiple labels.

My question is: how should these multi-label cases be documented in the training set?

Hope that makes sense.

rfuentealba · November 2019

Hello,

Let's use a simpler case here to make an example.

Label   | Text
weather | This will be a cold winter
food    | a few sandwiches for me
weather | It's raining today
food    | Give me some coffee
sports  | Michael Jordan is the greatest basketball player ever

For a new sentence that touches two topics, the result should be:

weather, food | Today it was cold, I made coffee and sandwiches.

Right?

What I did to solve this was to train three different binary models (8, in your case): one that distinguishes weather from not-weather, another that distinguishes food from not-food, and a third that distinguishes sports from not-sports. A case that relates to several labels is then simply a positive example for each of those models.
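To make the training-set side of this concrete, here is a minimal sketch in Python (not RapidMiner) of one way to document multi-label cases: keep all labels of a case in one delimited field and derive a yes/no column per label from it. The comma separator, the helper name `to_binary_targets`, and the use of the mixed weather/food sentence as a training row are assumptions for illustration only.

```python
# Hypothetical training rows: a case may carry several comma-separated labels.
rows = [
    ("weather", "This will be a cold winter"),
    ("food", "a few sandwiches for me"),
    ("weather, food", "Today it was cold, I made coffee and sandwiches."),
    ("sports", "Michael Jordan is the greatest basketball player ever"),
]


def to_binary_targets(rows, sep=","):
    """Expand multi-label rows into one True/False target per label."""
    # Split the label field and trim whitespace around each label.
    parsed = [([name.strip() for name in labels.split(sep)], text)
              for labels, text in rows]
    # Collect every distinct label seen in the data.
    all_labels = sorted({name for names, _ in parsed for name in names})
    table = []
    for names, text in parsed:
        record = {"text": text}
        for label in all_labels:
            # True when this case belongs to the label, False otherwise.
            record[label] = label in names
        table.append(record)
    return all_labels, table


labels, table = to_binary_targets(rows)
for record in table:
    print(record)
```

Each derived column can then serve as the target for its own "label vs. not-label" model, so a case with two labels is simply marked True in two columns.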

You can make use of Multiply, Macros, and a few other things to train multiple models and then apply these models iteratively.
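Outside RapidMiner, the same "train one model per label, then apply them one after another" idea can be sketched in plain Python. This is only an illustration under assumptions: the tiny naive Bayes learner, the class `BinaryNaiveBayes`, and the helpers `train_per_label` and `predict_labels` are hypothetical stand-ins, not the operators or learner used in the actual process.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class BinaryNaiveBayes:
    """Tiny multinomial naive Bayes for a single label-vs-rest decision."""

    def fit(self, texts, targets):
        self.counts = {True: Counter(), False: Counter()}
        self.docs = Counter(targets)
        for text, target in zip(texts, targets):
            self.counts[target].update(tokenize(text))
        self.vocab = set(self.counts[True]) | set(self.counts[False])
        return self

    def score(self, text, target):
        total = sum(self.counts[target].values())
        # Smoothed log prior of the class.
        logp = math.log((self.docs[target] + 1) / (sum(self.docs.values()) + 2))
        for word in tokenize(text):
            if word in self.vocab:  # words never seen in training are skipped
                # Laplace-smoothed log likelihood of the word in this class.
                logp += math.log((self.counts[target][word] + 1)
                                 / (total + len(self.vocab)))
        return logp

    def predict(self, text):
        return self.score(text, True) > self.score(text, False)


def train_per_label(rows, labels):
    """Train one binary model per label on the same texts."""
    texts = [text for _, text in rows]
    models = {}
    for label in labels:
        targets = [label in row_labels for row_labels, _ in rows]
        models[label] = BinaryNaiveBayes().fit(texts, targets)
    return models


def predict_labels(models, text):
    """Apply every model in turn and keep the labels that say yes."""
    return [label for label, model in models.items() if model.predict(text)]


# Training rows from the example above, each with a set of labels.
rows = [
    ({"weather"}, "This will be a cold winter"),
    ({"food"}, "a few sandwiches for me"),
    ({"weather"}, "It's raining today"),
    ({"food"}, "Give me some coffee"),
    ({"sports"}, "Michael Jordan is the greatest basketball player ever"),
]
models = train_per_label(rows, ["weather", "food", "sports"])
print(predict_labels(models, "Today it was cold, I made coffee and sandwiches."))
```

Because every model answers independently, a text can come back with zero, one, or several labels, which is what the multi-label cases need.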

It's not the most elegant solution and maybe @Telcontar120 has another one. I'll try to find an example to share with you, ok?

All the best,

Rodrigo.

User22883 · November 2019

Hi Rodrigo,

I understand the methodology, thank you. Could you provide some guidance on how I might go about implementing the approach?
