"Text Classification with SVM"

Jepse
Jepse New Altair Community Member
edited November 2024 in Community Q&A
Hi,

My goal is to improve sentiment prediction by using an SVM instead of k-NN.

To do that, I added a preliminary step that checks each sentence for subjectivity (with a two-class model: subjectivity, nonsub). Only sentences classified as subjective would be relevant for my SVM-based approach. I am also using an SVM to predict the subjectivity. But when I apply the model to unclassified sentences, the result contains just one prediction class.

This is my process to create the model. The process to apply the model is further down. Am I missing something?
[tt]<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.1.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.1.001" expanded="true" name="Process">
    <process expanded="true" height="539" width="1069">
      <operator activated="false" class="store" compatibility="5.1.001" expanded="true" height="60" name="Store" width="90" x="916" y="435">
        <parameter key="repository_entry" value="01_model_svm"/>
      </operator>
      <operator activated="false" class="retrieve" compatibility="5.1.001" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
        <parameter key="repository_entry" value="06_classified_data_binomial"/>
      </operator>
      <operator activated="false" class="naive_bayes" compatibility="5.1.001" expanded="true" height="76" name="Naive Bayes" width="90" x="514" y="30">
        <parameter key="laplace_correction" value="false"/>
      </operator>
      <operator activated="true" class="retrieve" compatibility="5.1.001" expanded="true" height="60" name="Retrieve (3)" width="90" x="45" y="120">
        <parameter key="repository_entry" value="11_classified_data_subjectivity_binomial"/>
      </operator>
      <operator activated="true" class="text:process_document_from_data" compatibility="5.1.000" expanded="true" height="76" name="Process Documents from Data" width="90" x="246" y="30">
        <parameter key="vector_creation" value="Binary Term Occurrences"/>
        <parameter key="prune_method" value="absolute"/>
        <parameter key="prune_below_absolute" value="2"/>
        <parameter key="prune_above_absolute" value="10000"/>
        <list key="specify_weights"/>
        <process expanded="true" height="537" width="839">
          <operator activated="true" class="text:tokenize" compatibility="5.1.000" expanded="true" height="60" name="Tokenize" width="90" x="45" y="30"/>
          <operator activated="true" class="text:stem_snowball" compatibility="5.1.000" expanded="true" height="60" name="Stem (Snowball)" width="90" x="447" y="30">
            <parameter key="language" value="German"/>
          </operator>
          <connect from_port="document" to_op="Tokenize" to_port="document"/>
          <connect from_op="Tokenize" from_port="document" to_op="Stem (Snowball)" to_port="document"/>
          <connect from_op="Stem (Snowball)" from_port="document" to_port="document 1"/>
          <portSpacing port="source_document" spacing="0"/>
          <portSpacing port="sink_document 1" spacing="0"/>
          <portSpacing port="sink_document 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="5.1.001" expanded="true" height="76" name="Select Attributes" width="90" x="112" y="255">
        <parameter key="attribute_filter_type" value="no_missing_values"/>
        <parameter key="attributes" value="label|sentence"/>
      </operator>
      <operator activated="true" class="set_role" compatibility="5.1.001" expanded="true" height="76" name="Set Role" width="90" x="246" y="165">
        <parameter key="name" value="label"/>
        <parameter key="target_role" value="label"/>
        <list key="set_additional_roles"/>
      </operator>
      <operator activated="true" class="support_vector_machine_libsvm" compatibility="5.1.001" expanded="true" height="76" name="SVM (5)" width="90" x="514" y="345">
        <parameter key="kernel_type" value="linear"/>
        <parameter key="cache_size" value="40"/>
        <parameter key="epsilon" value="0.1"/>
        <list key="class_weights"/>
        <parameter key="calculate_confidences" value="true"/>
      </operator>
      <operator activated="true" class="store" compatibility="5.1.001" expanded="true" height="60" name="Store (3)" width="90" x="715" y="300">
        <parameter key="repository_entry" value="10_model_no_boosting_svm"/>
      </operator>
      <operator activated="false" class="k_nn" compatibility="5.1.001" expanded="true" height="76" name="k-NN" width="90" x="648" y="165">
        <parameter key="k" value="3"/>
        <parameter key="measure_types" value="NumericalMeasures"/>
        <parameter key="numerical_measure" value="CosineSimilarity"/>
      </operator>
      <operator activated="false" class="store" compatibility="5.1.001" expanded="true" height="60" name="Store (2)" width="90" x="782" y="165">
        <parameter key="repository_entry" value="//sentiment_prediction/10_model_no_boosting_knn_subjectivity"/>
      </operator>
      <connect from_op="Retrieve (3)" from_port="output" to_op="Process Documents from Data" to_port="example set"/>
      <connect from_op="Process Documents from Data" from_port="example set" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Set Role" from_port="example set output" to_op="SVM (5)" to_port="training set"/>
      <connect from_op="SVM (5)" from_port="model" to_op="Store (3)" to_port="input"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
    </process>
  </operator>
</process>[/tt]



This is the Process to apply the model:
[tt]<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.1.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.1.001" expanded="true" name="Process">
    <parameter key="parallelize_main_process" value="true"/>
    <process expanded="true" height="469" width="768">
      <operator activated="true" class="retrieve" compatibility="5.1.001" expanded="true" height="60" name="JPW_getTestData" width="90" x="45" y="120">
        <parameter key="repository_entry" value="//sentiment_prediction/05_test_dataset"/>
      </operator>
      <operator activated="true" class="retrieve" compatibility="5.1.001" expanded="true" height="60" name="Model SVM#" width="90" x="45" y="255">
        <parameter key="repository_entry" value="10_model_no_boosting_svm"/>
      </operator>
      <operator activated="true" class="text:process_document_from_data" compatibility="5.1.000" expanded="true" height="76" name="Process Documents from Data (2)" width="90" x="447" y="30">
        <parameter key="prune_method" value="absolute"/>
        <parameter key="prune_below_absolute" value="2"/>
        <parameter key="prune_above_absolute" value="10000"/>
        <list key="specify_weights"/>
        <process expanded="true" height="469" width="763">
          <operator activated="true" class="text:tokenize" compatibility="5.1.000" expanded="true" height="60" name="Tokenize (2)" width="90" x="45" y="30"/>
          <operator activated="true" class="text:stem_snowball" compatibility="5.1.000" expanded="true" height="60" name="Stem (2)" width="90" x="404" y="30">
            <parameter key="language" value="German"/>
          </operator>
          <connect from_port="document" to_op="Tokenize (2)" to_port="document"/>
          <connect from_op="Tokenize (2)" from_port="document" to_op="Stem (2)" to_port="document"/>
          <connect from_op="Stem (2)" from_port="document" to_port="document 1"/>
          <portSpacing port="source_document" spacing="0"/>
          <portSpacing port="sink_document 1" spacing="0"/>
          <portSpacing port="sink_document 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="informationExtraction:tree_svm" compatibility="1.0.000" expanded="true" height="94" name="TreeSVM" width="90" x="305" y="396"/>
      <operator activated="true" class="apply_model" compatibility="5.1.001" expanded="true" height="76" name="Apply Model" width="90" x="313" y="165">
        <list key="application_parameters"/>
      </operator>
      <operator activated="true" class="multiply" compatibility="5.1.001" expanded="true" height="94" name="Multiply" width="90" x="447" y="345"/>
      <operator activated="true" class="select_attributes" compatibility="5.1.001" expanded="true" height="76" name="Select Attributes" width="90" x="514" y="165">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attributes" value="id|prediction(label)|sentence|confidence(neutral)|confidence(positiv)|confidence(negativ)"/>
        <parameter key="include_special_attributes" value="true"/>
      </operator>
      <operator activated="true" class="write_database" compatibility="5.1.001" expanded="true" height="60" name="Write Database" width="90" x="648" y="165">
        <parameter key="connection" value="192.168.1.124 @sentiment_analysis"/>
        <parameter key="table_name" value="sentiment_unclassified_set_subjectivity"/>
        <parameter key="overwrite_mode" value="overwrite"/>
      </operator>
      <connect from_op="JPW_getTestData" from_port="output" to_op="Process Documents from Data (2)" to_port="example set"/>
      <connect from_op="Model SVM#" from_port="output" to_op="Apply Model" to_port="model"/>
      <connect from_op="Process Documents from Data (2)" from_port="example set" to_op="Apply Model" to_port="unlabelled data"/>
      <connect from_op="Apply Model" from_port="labelled data" to_op="Multiply" to_port="input"/>
      <connect from_op="Multiply" from_port="output 1" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Multiply" from_port="output 2" to_port="result 1"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Write Database" to_port="input"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>[/tt]

Answers

  • haddock
    haddock New Altair Community Member
    Hi there Jepse,

    In my own work I've found that SVMs are very sensitive to their parameters: kernel type, C and epsilon. Things aren't made any easier by the fact that those parameters are rather tricky to define (http://www.svms.org/parameters/), so your best bet is to try the combinations and check the performance of each.

    I only mention this because I see that your model is made on one pass with C set to zero; without checking against the data it is not possible to be definitive, but I'm not that surprised that you just get one class in the prediction column. There is a handy paper on this subject at http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf which will point you in the right direction.
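
    One way to try the combinations is a grid search wrapped around a cross-validation. The fragment below is only a rough sketch: the operator classes (optimize_parameters_grid, x_validation, performance), the port names, the number of folds and the value ranges are assumptions written from memory and not taken from your process, so build it in the GUI and check each name before relying on it.

    ```xml
    <!-- Sketch only: operator classes, port names and value ranges are assumptions to verify in RapidMiner. -->
    <operator activated="true" class="optimize_parameters_grid" compatibility="5.1.001" expanded="true" name="Optimize Parameters (Grid)">
      <list key="parameters">
        <!-- illustrative range: C from 0.01 to 1000 in 5 logarithmic steps -->
        <parameter key="SVM (5).C" value="[0.01;1000;5;logarithmic]"/>
        <!-- compare two kernels as well -->
        <parameter key="SVM (5).kernel_type" value="linear,rbf"/>
      </list>
      <process expanded="true">
        <operator activated="true" class="x_validation" compatibility="5.1.001" expanded="true" name="Validation">
          <parameter key="number_of_validations" value="5"/>
          <!-- training side: the same LibSVM learner as in the original process -->
          <process expanded="true">
            <operator activated="true" class="support_vector_machine_libsvm" compatibility="5.1.001" expanded="true" name="SVM (5)">
              <parameter key="kernel_type" value="linear"/>
              <list key="class_weights"/>
              <parameter key="calculate_confidences" value="true"/>
            </operator>
            <connect from_port="training" to_op="SVM (5)" to_port="training set"/>
            <connect from_op="SVM (5)" from_port="model" to_port="model"/>
          </process>
          <!-- testing side: apply the model to the held-out fold and measure performance -->
          <process expanded="true">
            <operator activated="true" class="apply_model" compatibility="5.1.001" expanded="true" name="Apply Model">
              <list key="application_parameters"/>
            </operator>
            <operator activated="true" class="performance" compatibility="5.1.001" expanded="true" name="Performance"/>
            <connect from_port="model" to_op="Apply Model" to_port="model"/>
            <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
            <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
            <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
          </process>
        </operator>
        <connect from_port="input 1" to_op="Validation" to_port="training"/>
        <connect from_op="Validation" from_port="averagable 1" to_port="performance"/>
      </process>
    </operator>
    ```

    The idea is to feed it the example set that currently goes into SVM (5), look at which combination scores best, and then enter those values into the SVM operator of the process that stores the model.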

    Hope so, have fun!
