"SVM learner on Excel data"
Hello
Further to my last post, I've tried using an Excel sheet into which I put some post data.
I import the Excel file via ExcelExampleSource, then use StringTextInput to process the data set via StringTokenizer, EnglishStopwordFilter, TokenLengthFilter and PorterStemmer.
This works fine, but I get an error message when I try to run LibSVMLearner on the results, basically telling me that I can't use polynominal attributes. But there are only two labels in the dataset (positive and negative).
http://mrinterview2.gfknop.co.uk/jk/work/clipboard02.jpg
http://mrinterview2.gfknop.co.uk/jk/work/clipboard01.jpg
http://mrinterview2.gfknop.co.uk/jk/work/clipboard03.jpg
Excel is here:
http://mrinterview2.gfknop.co.uk/jk/work/Book2.xls
I'm just trying to get my head around making the learner build a model based on this data.
The XML is here:
<operator name="Root" class="Process" expanded="yes">
    <operator name="ExcelExampleSource" class="ExcelExampleSource">
        <parameter key="excel_file" value="D:\Work\Book2.xls"/>
        <parameter key="first_row_as_names" value="true"/>
        <parameter key="label_column" value="3"/>
    </operator>
    <operator name="StringTextInput" class="StringTextInput" expanded="yes">
        <parameter key="filter_nominal_attributes" value="true"/>
        <list key="namespaces">
        </list>
        <operator name="StringTokenizer" class="StringTokenizer">
        </operator>
        <operator name="EnglishStopwordFilter" class="EnglishStopwordFilter">
        </operator>
        <operator name="TokenLengthFilter" class="TokenLengthFilter">
            <parameter key="min_chars" value="3"/>
        </operator>
        <operator name="PorterStemmer" class="PorterStemmer">
        </operator>
    </operator>
    <operator name="LibSVMLearner" class="LibSVMLearner">
        <parameter key="C" value="10.0"/>
        <list key="class_weights">
        </list>
        <parameter key="keep_example_set" value="true"/>
        <parameter key="kernel_type" value="linear"/>
    </operator>
</operator>
Thanks!
Jason
Answers
Hi Jason,
The problem here is not the label (which contains positive or negative as values) but a polynominal attribute. If you look more closely at your data (or, better, at the meta data), you will see that it contains one: the attribute text, which holds the actual texts, is probably a string attribute and therefore (poly)nominal. You simply have to filter that attribute out (or give it a special role), and then your process should work.
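As a rough sketch of what that could look like, one of the two operators below might be placed between StringTextInput and LibSVMLearner in the process. The operator classes (FeatureNameFilter, ChangeAttributeRole) and their parameter keys are written from memory of RapidMiner 4.x and should be treated as assumptions to check against the operator list of your installed version; the attribute name text is taken from the description above and may differ in your data.

```xml
<!-- Sketch only: operator classes and parameter keys are assumptions
     based on RapidMiner 4.x; verify them in your version. -->

<!-- Option 1: remove the raw text attribute before learning.
     "text" is the attribute name assumed from the reply above. -->
<operator name="FeatureNameFilter" class="FeatureNameFilter">
    <parameter key="skip_features_with_name" value="text"/>
</operator>

<!-- Option 2 (use instead of option 1): keep the attribute but give it
     a special role, so the learner no longer sees it as a regular
     attribute. The role "id" is only an illustrative choice. -->
<operator name="ChangeAttributeRole" class="ChangeAttributeRole">
    <parameter key="name" value="text"/>
    <parameter key="target_role" value="id"/>
</operator>
```

Either way, the meta data view after this step should show only numerical regular attributes plus the label before the example set reaches LibSVMLearner.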
Hope that helps,
Tobias