"Multilabeling in Text Mining"
MUNISHVIRANG
Dear all,
I'm trying to classify documents into predefined classes using an SVM learner.
I want to know whether RapidMiner allows me to classify one document into multiple classes, and if that is possible, how we can do it. Thanks in advance.
<operator name="Root" class="Process" expanded="yes">
<operator name="OperatorChain" class="OperatorChain" expanded="yes">
<operator name="TextInput" class="TextInput" expanded="yes">
<list key="texts">
<parameter key="Price" value="C:\Documents and Settings\munish.virang\Desktop\XMX\BARCLAYSBANK\PRICE"/>
<parameter key="Process" value="C:\Documents and Settings\munish.virang\Desktop\XMX\BARCLAYSBANK\PROCESS"/>
<parameter key="Product" value="C:\Documents and Settings\munish.virang\Desktop\SAMPLE_DATA_SET\BARCLAYSBANK\PRODUCT"/>
<parameter key="Promotion" value="C:\Documents and Settings\munish.virang\Desktop\SAMPLE_DATA_SET\BARCLAYSBANK\PROMOTION"/>
</list>
<parameter key="output_word_list" value="C:\Documents and Settings\munish.virang\Desktop\XMX\BARCLAYSBANK\words.list"/>
<list key="namespaces">
</list>
<parameter key="create_text_visualizer" value="true"/>
<operator name="StringTokenizer" class="StringTokenizer">
</operator>
<operator name="EnglishStopwordFilter" class="EnglishStopwordFilter">
</operator>
<operator name="TokenLengthFilter" class="TokenLengthFilter">
<parameter key="min_chars" value="3"/>
</operator>
<operator name="LovinsStemmer" class="LovinsStemmer">
</operator>
</operator>
<operator name="LibSVMLearner" class="LibSVMLearner">
<parameter key="kernel_type" value="poly"/>
<list key="class_weights">
</list>
<parameter key="calculate_confidences" value="true"/>
</operator>
<operator name="ModelWriter" class="ModelWriter">
<parameter key="model_file" value="C:\Documents and Settings\munish.virang\Desktop\XMX\BARCLAYSBANK\SVM.mod"/>
</operator>
</operator>
<operator name="OperatorChain (2)" class="OperatorChain" expanded="yes">
<operator name="TextInput (2)" class="TextInput" expanded="no">
<list key="texts">
<parameter key="Price" value="C:\Documents and Settings\munish.virang\Desktop\XMX\BARCLAYSBANK\PRICE"/>
<parameter key="Process" value="C:\Documents and Settings\munish.virang\Desktop\XMX\BARCLAYSBANK\PROCESS"/>
<parameter key="Product" value="C:\Documents and Settings\munish.virang\Desktop\XMX\BARCLAYSBANK\PRODUCT"/>
<parameter key="Promotion" value="C:\Documents and Settings\munish.virang\Desktop\XMX\BARCLAYSBANK\PROMOTION"/>
</list>
<parameter key="input_word_list" value="C:\Documents and Settings\munish.virang\Desktop\XMX\BARCLAYSBANK\words.list"/>
<list key="namespaces">
</list>
<parameter key="create_text_visualizer" value="true"/>
<operator name="StringTokenizer (2)" class="StringTokenizer">
</operator>
<operator name="EnglishStopwordFilter (2)" class="EnglishStopwordFilter">
</operator>
<operator name="TokenLengthFilter (2)" class="TokenLengthFilter">
<parameter key="min_chars" value="3"/>
</operator>
<operator name="LovinsStemmer (2)" class="LovinsStemmer">
</operator>
</operator>
<operator name="ModelLoader" class="ModelLoader">
<parameter key="model_file" value="C:\Documents and Settings\munish.virang\Desktop\XMX\BARCLAYSBANK\SVM.mod"/>
</operator>
<operator name="ModelApplier" class="ModelApplier">
<list key="application_parameters">
</list>
</operator>
</operator>
<operator name="ClassificationPerformance" class="ClassificationPerformance">
<parameter key="main_criterion" value="classification_error"/>
<parameter key="accuracy" value="true"/>
<parameter key="classification_error" value="true"/>
<parameter key="weighted_mean_recall" value="true"/>
<parameter key="weighted_mean_precision" value="true"/>
<list key="class_weights">
</list>
</operator>
</operator>
Answers
Hi,
that's possible, but it needs a rather complex process setup. You could define several attributes, one per class, each holding true or false to indicate whether an example is assigned to that class. You could assign these attributes roles such as "label01", "label02", and so on. Together with the multiple label iterator, you could then learn several SVM models, one per class.
That should solve your problem, although it makes the apply process a little more complicated: you would have to iterate manually over all models, loading and applying each one, and rename the predicted attribute each time so that it is not overwritten by the next model.
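To make the idea concrete outside of RapidMiner, here is a minimal Python sketch of the same scheme: one binary model per class, and one separately named prediction column per model at apply time. It is only an illustration under stated assumptions. A simple perceptron stands in for the SVM, the tokenizer only mimics the `min_chars` = 3 filter from the process above, and the toy documents, the function names and the `prediction(...)` naming are all made up for this example.

```python
from collections import Counter


def bag_of_words(text):
    """Lowercase, split on whitespace, keep tokens of at least 3 characters."""
    return Counter(tok for tok in text.lower().split() if len(tok) >= 3)


def score(model, feats):
    """Linear decision value of one binary model for one document."""
    weights, bias = model
    return bias + sum(weights.get(w, 0.0) * c for w, c in feats.items())


def train_binary(examples, epochs=20):
    """Train a perceptron (stand-in for an SVM) on (features, is_member) pairs."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for feats, is_member in examples:
            target = 1 if is_member else -1
            if target * score((weights, bias), feats) <= 0:  # misclassified
                for w, c in feats.items():
                    weights[w] = weights.get(w, 0.0) + target * c
                bias += target
    return weights, bias


def train_multilabel(docs, labels):
    """Learn one binary model per label; docs is a list of (text, set_of_labels)."""
    feats = [(bag_of_words(text), doc_labels) for text, doc_labels in docs]
    return {
        label: train_binary([(f, label in doc_labels) for f, doc_labels in feats])
        for label in labels
    }


def predict_multilabel(models, text):
    """Apply every model and store each result under its own name,
    so that no prediction overwrites another."""
    feats = bag_of_words(text)
    return {
        "prediction(" + label + ")": score(model, feats) > 0
        for label, model in models.items()
    }


# Hypothetical toy data: the third document belongs to two classes at once.
docs = [
    ("cheap fee", {"Price"}),
    ("slow signup", {"Process"}),
    ("fee signup", {"Price", "Process"}),
    ("nice staff", set()),
]
models = train_multilabel(docs, ["Price", "Process"])
print(predict_multilabel(models, "fee signup"))
```

Each document can end up with zero, one or several predictions set to true, which is what a single multi-class SVM cannot give you.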
I think we really might need a Multilabel Meta Learner that handles both steps in one operator, which would make things much easier, but unfortunately it has low priority at the moment.
Greetings,
Sebastian