"how to decrease model size - delete where weight

emolano New Altair Community Member
edited November 5 in Community Q&A
Hi there.. me again :)
I have a process that creates a text mining model. The model is too big, so I want it to use only the attributes where weight > 0. In the weight table I see lots of words with weight = 0 that I want to delete, i.e. not include in the model. Is there a way to do this?
Thanks again for your help!
Here is my code:
 
<operator name="Root" class="Process" expanded="yes">
    <description text="#ylt#h3#ygt#text Data Mining#ylt#/h3#ygt##ylt#p#ygt##ylt#/p#ygt#"/>
    <operator name="DatabaseExampleSource" class="DatabaseExampleSource">
        <parameter key="database_url" value="jdbc:mysql://bi01:3306/database"/>
        <parameter key="username" value="user"/>
        <parameter key="password" value="pwd"/>
        <parameter key="query" value="SELECT `ID_NUM`, `SHORT_DESC`, `PLATFORM` FROM `TABLEX`;"/>
        <parameter key="label_attribute" value="PLATFORM"/>
        <parameter key="id_attribute" value="ID_NUM"/>
    </operator>
    <operator name="StringTextInput" class="StringTextInput" expanded="yes">
        <parameter key="filter_nominal_attributes" value="true"/>
        <parameter key="remove_original_attributes" value="true"/>
        <parameter key="default_content_language" value="english"/>
        <parameter key="output_word_list" value="crmtraining_words.list"/>
        <list key="namespaces">
        </list>
        <operator name="StringTokenizer" class="StringTokenizer">
        </operator>
        <operator name="EnglishStopwordFilter" class="EnglishStopwordFilter">
        </operator>
        <operator name="TokenLengthFilter" class="TokenLengthFilter">
            <parameter key="min_chars" value="2"/>
        </operator>
        <operator name="ToLowerCaseConverter" class="ToLowerCaseConverter">
        </operator>
        <operator name="PorterStemmer" class="PorterStemmer">
        </operator>
        <operator name="StopwordFilterFile" class="StopwordFilterFile">
            <parameter key="file" value="stop_filter_platform.txt"/>
        </operator>
        <operator name="TermNGramGenerator" class="TermNGramGenerator">
            <parameter key="max_length" value="3"/>
        </operator>
    </operator>
    <operator name="LibSVMLearner" class="LibSVMLearner">
        <parameter key="kernel_type" value="linear"/>
        <list key="class_weights">
        </list>
    </operator>
    <operator name="ModelWriter" class="ModelWriter">
        <parameter key="model_file" value="model.mod"/>
    </operator>
</operator>

Answers

  • land New Altair Community Member
    Hi,
    You could apply a weighting scheme before the learner and then keep only the attributes with a non-zero weight. This would reduce the number of attributes and hence the length of the support vectors. The SVMWeighting operator gives weights similar to the SVM's own weight vector. If you need to apply the weights later on, you could use the AttributeWeightsApplier.

    Greetings,
      Sebastian
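
    As a rough sketch of the suggestion above, the two operators below could be placed between StringTextInput and LibSVMLearner in the posted process. This is an assumption-heavy fragment rather than a tested setup: AttributeWeightSelection is not named in the thread, and the operator and parameter names (`weight`, `weight_relation`, `use_absolute_weights`) should be checked against the operator reference of your RapidMiner version.

    ```xml
    <!-- Sketch only: insert between StringTextInput and LibSVMLearner. -->
    <!-- Step 1: compute one weight per attribute (word / n-gram). -->
    <operator name="SVMWeighting" class="SVMWeighting">
    </operator>
    <!-- Step 2 (assumed operator): keep only attributes whose weight is not 0. -->
    <operator name="AttributeWeightSelection" class="AttributeWeightSelection">
        <!-- threshold and relation: select weights strictly greater than 0 -->
        <parameter key="weight" value="0.0"/>
        <parameter key="weight_relation" value="greater"/>
        <!-- SVM weights can be negative, so compare absolute values -->
        <parameter key="use_absolute_weights" value="true"/>
    </operator>
    ```

    The learner then only sees the remaining attributes, which should make the written model smaller. If SVMWeighting does not accept the PLATFORM label (for example because it has more than two classes), a different weighting operator would be needed in step 1. New data has to be reduced to the same attributes before the model is applied.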