"SVM learner on Excel data"

Unknown
edited November 5 in Community Q&A
Hello

Further to my last post, I've tried using an Excel sheet into which I put some post data.

I import the Excel file via ExcelExampleSource, then use StringTextInput to process the data set with StringTokenizer, EnglishStopwordFilter, TokenLengthFilter and PorterStemmer.

This works fine, but when I run LibSVMLearner on the result I get an error message that basically tells me the learner cannot handle polynominal attributes. But there are only two labels in the data set (positive and negative).

http://mrinterview2.gfknop.co.uk/jk/work/clipboard02.jpg
http://mrinterview2.gfknop.co.uk/jk/work/clipboard01.jpg
http://mrinterview2.gfknop.co.uk/jk/work/clipboard03.jpg

The Excel file is here:

http://mrinterview2.gfknop.co.uk/jk/work/Book2.xls

I'm just trying to get my head around making the learner build a model based on this data.

The XML is here:

<operator name="Root" class="Process" expanded="yes">
    <operator name="ExcelExampleSource" class="ExcelExampleSource">
        <parameter key="excel_file" value="D:\Work\Book2.xls"/>
        <parameter key="first_row_as_names" value="true"/>
        <parameter key="label_column" value="3"/>
    </operator>
    <operator name="StringTextInput" class="StringTextInput" expanded="yes">
        <parameter key="filter_nominal_attributes" value="true"/>
        <list key="namespaces">
        </list>
        <operator name="StringTokenizer" class="StringTokenizer">
        </operator>
        <operator name="EnglishStopwordFilter" class="EnglishStopwordFilter">
        </operator>
        <operator name="TokenLengthFilter" class="TokenLengthFilter">
            <parameter key="min_chars" value="3"/>
        </operator>
        <operator name="PorterStemmer" class="PorterStemmer">
        </operator>
    </operator>
    <operator name="LibSVMLearner" class="LibSVMLearner">
        <parameter key="C" value="10.0"/>
        <list key="class_weights">
        </list>
        <parameter key="keep_example_set" value="true"/>
        <parameter key="kernel_type" value="linear"/>
    </operator>
</operator>

Thanks!

Jason

Answers

  • TobiasMalbrecht
    Hi Jason,

    the problem here is not the label (with the values positive and negative) but a polynominal attribute. If you look more closely at your data (or, better, at the meta data), you will see that it contains a polynominal attribute: the attribute text, which holds the actual texts, is most likely a string attribute and therefore (poly)nominal. LibSVMLearner cannot handle such attributes, so you simply have to filter that attribute out (or set it to a special role, which the learner ignores), and your process should then work.
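    A minimal sketch of the second option, assuming the RapidMiner 4.x ChangeAttributeRole operator with its name and target_role parameters, and assuming the column holding the posts is called text (adjust it to the actual column name in Book2.xls). The operator would go inside the Root process, between StringTextInput and LibSVMLearner, so that the raw text becomes a special attribute the learner skips while the word vectors stay regular attributes. Please check the operator and parameter names against the operator reference of your RapidMiner version.

    ```xml
    <!-- Sketch only: insert between StringTextInput and LibSVMLearner in the Root process.
         Assumes the RapidMiner 4.x ChangeAttributeRole operator and an attribute named "text". -->
    <operator name="ChangeAttributeRole" class="ChangeAttributeRole">
        <!-- the attribute that still holds the raw (polynominal) post text -->
        <parameter key="name" value="text"/>
        <!-- a special role such as "id" keeps the attribute in the example set
             but excludes it from the regular attributes the SVM is trained on -->
        <parameter key="target_role" value="id"/>
    </operator>
    ```

    Filtering the attribute out removes it from the learner's input in the same way; changing its role instead has the advantage that, with keep_example_set set to true, the original text stays visible next to each example when you inspect the results.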

    Hope that helps,
    Tobias