Text Classification/Labeling using Description

rikin_j_parekh New Altair Community Member
edited November 5 in Community Q&A

Hi All,

 

I am new to RapidMiner and would like to label rows of a CSV file based on a 'Long Description' column. I will mainly be working with 2 columns, 'Long Description' and 'Label'. The 'Label' is assigned based on the 'Long Description' value. I have 1000 rows, of which 80% already have a 'Label' value and can serve as the training set. I wish to populate the 'Label' for the remaining 20% using the 'Long Description' value.

All Label Values - 

Cancellation
Price Increase
Normal Payment
Payoff
Price Decrease
Installer Installation Issue
Past Due Payment
Change Order
Incentive Payment
Assumption
Completion Certificate
Interest
Referral

Example -

Long Description - Please review change order in installation phase - loan amount increasing from USD 21;851.00 to USD 24;501.00
Label - Price Increase

Long Description - Cancellation request with SPV Assignment
Label - Cancellation

How should I approach this in RapidMiner, and what steps are involved?

 

Thanks

Best Answer

  • Telcontar120
    Telcontar120 New Altair Community Member
    Answer ✓

    You should search the forums for threads on text mining; you will find a lot of helpful information there.  This is a classic text classification problem.  You'll use your "long description" as the text, process and tokenize it, and then use the resulting word vectors to predict the label.
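
    To make the tokenize, vectorize, predict sequence concrete, here is a minimal sketch in plain Python rather than a RapidMiner process.  The learner (multinomial Naive Bayes with add-one smoothing) is my own choice for illustration, not something prescribed above, and the function names and two of the training rows are made up; only the first two rows come from the question.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep only runs of letters as word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(rows):
    """Count rows per label and words per label from (description, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for description, label in rows:
        tokens = tokenize(description)
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return label_counts, word_counts, vocabulary


def predict(model, description):
    """Return the label with the highest Naive Bayes log-score."""
    label_counts, word_counts, vocabulary = model
    total_rows = sum(label_counts.values())
    # Words never seen in training carry no information, so they are skipped.
    tokens = [t for t in tokenize(description) if t in vocabulary]
    best_label, best_score = None, float("-inf")
    for label, n_rows in label_counts.items():
        score = math.log(n_rows / total_rows)  # prior from label frequency
        total_words = sum(word_counts[label].values())
        for token in tokens:
            # Add-one smoothing keeps unseen word/label pairs from scoring zero.
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocabulary))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# The labeled 80% would go here; the last two rows are invented for illustration.
labeled_rows = [
    ("Please review change order in installation phase - loan amount "
     "increasing from USD 21;851.00 to USD 24;501.00", "Price Increase"),
    ("Cancellation request with SPV Assignment", "Cancellation"),
    ("Customer asked to cancel the loan", "Cancellation"),
    ("Loan amount increasing after system upgrade", "Price Increase"),
]

model = train(labeled_rows)

# The unlabeled 20% would be scored like this (both descriptions are invented).
print(predict(model, "Request for cancellation of contract"))
print(predict(model, "Loan amount increasing from USD 30000"))
```

    With only four training rows this is a toy, but the shape is the same at 800 rows: fit on the rows that have a label, then score the rows that do not.  Hold back part of the labeled rows to check accuracy before trusting the predicted labels.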

    However, you may find that you need to consolidate labels.  You have a lot of distinct values, and classification problems increase in complexity when you have a lot of potential individual label values to predict.  So you may find better success by grouping some of the existing labels together into larger categories.  That's something that you will need to play around with manually; there's not an easy way to automate that in RapidMiner.
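
    As a rough illustration of what consolidating labels could look like, here is a small Python sketch that maps some of the 13 original labels onto broader categories before training.  The grouping below is purely hypothetical and chosen for illustration; which labels really belong together depends on your data and on what you need to distinguish downstream.

```python
from collections import Counter

# Hypothetical grouping, for illustration only. Labels not listed here
# are left unchanged.
LABEL_GROUPS = {
    "Price Increase": "Price Change",
    "Price Decrease": "Price Change",
    "Normal Payment": "Payment",
    "Past Due Payment": "Payment",
    "Incentive Payment": "Payment",
    "Payoff": "Payment",
}


def consolidate(label):
    """Map an original label to its broader category, if one is defined."""
    cleaned = label.strip()
    return LABEL_GROUPS.get(cleaned, cleaned)


# A few original labels, as they might appear in the 'Label' column.
original_labels = [
    "Price Increase",
    "Price Decrease",
    "Payoff",
    "Cancellation",
    "Normal Payment",
]

# Counting rows per consolidated label shows how much data each class gets.
grouped_counts = Counter(consolidate(label) for label in original_labels)
print(grouped_counts)
```

    Comparing the per-label row counts before and after grouping is a quick way to see which categories have too few examples to learn from on their own.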

     

     
