How to decrease model size - delete where weight
emolano
New Altair Community Member
Hi there.. me again
I have a process that builds a text mining model. The model is too big, so I want it to use only the data where weight > 0. In the weight table I see lots of words with weight = 0, and I want to delete them so they are not included in the model. Is there a way to do this?
Thanks again for your help!
Here is my code:
<operator name="Root" class="Process" expanded="yes">
    <description text="#ylt#h3#ygt#text Data Mining#ylt#/h3#ygt##ylt#p#ygt##ylt#/p#ygt#"/>
    <operator name="DatabaseExampleSource" class="DatabaseExampleSource">
        <parameter key="database_url" value="jdbc:mysql://bi01:3306/database"/>
        <parameter key="username" value="user"/>
        <parameter key="password" value="pwd"/>
        <parameter key="query" value="SELECT `ID_NUM`, `SHORT_DESC`, `PLATFORM` FROM `TABLEX`;"/>
        <parameter key="label_attribute" value="PLATFORM"/>
        <parameter key="id_attribute" value="ID_NUM"/>
    </operator>
    <operator name="StringTextInput" class="StringTextInput" expanded="yes">
        <parameter key="filter_nominal_attributes" value="true"/>
        <parameter key="remove_original_attributes" value="true"/>
        <parameter key="default_content_language" value="english"/>
        <parameter key="output_word_list" value="crmtraining_words.list"/>
        <list key="namespaces">
        </list>
        <operator name="StringTokenizer" class="StringTokenizer">
        </operator>
        <operator name="EnglishStopwordFilter" class="EnglishStopwordFilter">
        </operator>
        <operator name="TokenLengthFilter" class="TokenLengthFilter">
            <parameter key="min_chars" value="2"/>
        </operator>
        <operator name="ToLowerCaseConverter" class="ToLowerCaseConverter">
        </operator>
        <operator name="PorterStemmer" class="PorterStemmer">
        </operator>
        <operator name="StopwordFilterFile" class="StopwordFilterFile">
            <parameter key="file" value="stop_filter_platform.txt"/>
        </operator>
        <operator name="TermNGramGenerator" class="TermNGramGenerator">
            <parameter key="max_length" value="3"/>
        </operator>
    </operator>
    <operator name="LibSVMLearner" class="LibSVMLearner">
        <parameter key="kernel_type" value="linear"/>
        <list key="class_weights">
        </list>
    </operator>
    <operator name="ModelWriter" class="ModelWriter">
        <parameter key="model_file" value="model.mod"/>
    </operator>
</operator>
Answers
Hi,
You could apply a weighting scheme before the learner and keep only the attributes with non-zero weight. This would reduce the number of attributes and hence the length of the support vectors. The SVMWeighting operator gives a weighting similar to the SVM's own weight vector. If you need to apply the weights later on, you could use the AttributeWeightsApplier operator.
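As a rough sketch of that idea, the fragment below would sit between StringTextInput and LibSVMLearner in the process from the question. SVMWeighting is the operator named above; AttributeWeightSelection and its parameter keys (weight, weight_relation, use_absolute_weights) are assumptions based on RapidMiner 4.x and are not part of the original process, so please check them against the operator reference of your version. SVMWeighting may also be limited to two-class labels, which would matter if PLATFORM has more than two values.

```xml
<!-- Sketch only: place between StringTextInput and LibSVMLearner. -->
<!-- Operator and parameter names are assumed from RapidMiner 4.x. -->

<!-- Computes one weight per attribute (term) from a linear SVM. -->
<operator name="SVMWeighting" class="SVMWeighting">
</operator>

<!-- Keeps only attributes whose absolute weight is greater than 0. -->
<operator name="AttributeWeightSelection" class="AttributeWeightSelection">
    <parameter key="weight" value="0.0"/>
    <parameter key="weight_relation" value="greater"/>
    <parameter key="use_absolute_weights" value="true"/>
</operator>
```

The learner then only sees the remaining terms, so the stored model should shrink accordingly. The same attribute selection has to be applied to new data before the model is used on it.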
Greetings,
Sebastian