"newbie: Excel to text"

User: "shilaski"
Updated by Jocelyn
Here is my project scope. I have an Excel spreadsheet of warranty claims with around 9100 entries. One of the columns in the spreadsheet is a comment field, where a tech writes what was wrong with the vehicle. These comments are what I want to text mine.

I have figured out how to load the sheet and run it through the filter so that I am concentrating only on the data I am interested in. Now I am guessing that I need to use the text plugin to create word vectors (please tell me if I am wrong). It appears that the TextInput operator expects its input to come from a directory. My question is how to correctly feed my data into the TextInput operator. Of course, I could be completely wrong... maybe there is a better way to do this?

Here is what I have:

<operator name="Root" class="Process" expanded="yes">
    <operator name="ExcelExampleSource" class="ExcelExampleSource">
        <parameter key="excel_file" value="C:\Documents and Settings\shilaski.TRICO29\Desktop\50percentWarranty\Conceptsheet_Claims_short.xls"/>
        <parameter key="first_row_as_names" value="true"/>
    </operator>
    <operator name="AttributeFilter" class="AttributeFilter">
        <parameter key="condition_class" value="attribute_name_filter"/>
        <parameter key="parameter_string" value="comments"/>
    </operator>
    <operator name="Nominal2String" class="Nominal2String">
    </operator>
    <operator name="TextInput" class="TextInput" expanded="yes">
        <parameter key="create_text_visualizer" value="true"/>
        <parameter key="id_attribute_type" value="long"/>
        <parameter key="use_content_attributes" value="true"/>
        <operator name="StringTokenizer" class="StringTokenizer">
        </operator>
    </operator>
</operator>

    User: "TobiasMalbrecht"
    Hi Stacy,

    In principle you are right. You simply have to use the StringTextInput operator instead of the TextInput operator. The former loads the texts from strings in an example set that is already present. The latter loads the texts directly from files.

    Hope that helps,
    regards,
    Tobias
    User: "shilaski"
    OP
    Alright... here is where I am at:

    <operator name="Root" class="Process" expanded="yes">
        <description text="#ylt#h3#ygt#Finding important terms#ylt#/h3#ygt##ylt#p#ygt#This experiments shows how to find terms that are characteristic for a set of texts#ylt#/p#ygt#. #ylt#p#ygt##ylt#b#ygt#Hint:#ylt#/b#ygt#In the interactive keyword selection, click on weight to sort the terms by their relevance to the class specified in the CorpusBasedWeighting operator.#ylt#/p#ygt#"/>
        <operator name="ExcelExampleSource" class="ExcelExampleSource">
            <parameter key="datamanagement" value="long_array"/>
            <parameter key="excel_file" value="C:\Documents and Settings\shilaski.TRICO29\Desktop\50percentWarranty\Conceptsheet_Claims_short.xls"/>
        </operator>
        <operator name="AttributeFilter" class="AttributeFilter">
            <parameter key="condition_class" value="attribute_name_filter"/>
            <parameter key="parameter_string" value="comments"/>
        </operator>
        <operator name="Nominal2String" class="Nominal2String">
        </operator>
        <operator name="StringTextInput" class="StringTextInput" expanded="yes">
            <parameter key="default_content_language" value="english"/>
            <parameter key="vector_creation" value="TermOccurrences"/>
            <operator name="StringTokenizer" class="StringTokenizer">
            </operator>
            <operator name="EnglishStopwordFilter" class="EnglishStopwordFilter">
            </operator>
            <operator name="TokenLengthFilter" class="TokenLengthFilter">
                <parameter key="min_chars" value="3"/>
            </operator>
            <operator name="PorterStemmer" class="PorterStemmer">
            </operator>
        </operator>
        <operator name="CorpusBasedWeighting" class="CorpusBasedWeighting">
            <parameter key="class_to_characterize" value="graphics"/>
        </operator>
        <operator name="InteractiveAttributeWeighting" class="InteractiveAttributeWeighting">
        </operator>
    </operator>

    The problem now is that I keep getting this error:

    Error in: StringTextInput (StringTextInput) The input example set does not contain any attributes with value type string. Some operators require example sets with attributes of a specific value type. Please refer to the documentation of the used operators for further details.
    User: "shilaski"
    OP
    Figured it out. Somehow I had left out the parameter that specifies which column I wanted. I had it set before, but I suppose I should have troubleshot it before posting to the forums.

    Thanks
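
For readers following the thread, here is a rough Python sketch of what the word-vector step in the second process is doing conceptually: tokenize each comment, drop stopwords, drop tokens shorter than 3 characters, and count term occurrences per comment. It is not RapidMiner's implementation. The function names, the tiny stopword list, the lowercasing, and the sample comments are all assumptions made for illustration, and the stemming step (PorterStemmer in the process) is left out to keep the sketch dependency-free.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; the stopword filter
# in the thread's process uses its own, much larger list).
STOPWORDS = {"the", "and", "was", "were", "with", "for", "that", "this", "not", "are", "from"}


def tokenize(text):
    """Split text into lowercase alphabetic tokens (lowercasing is an assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def term_occurrences(text, min_chars=3):
    """Count terms in one comment after stopword and length filtering."""
    kept = [
        token
        for token in tokenize(text)
        if token not in STOPWORDS and len(token) >= min_chars
    ]
    return Counter(kept)


def build_word_vectors(comments, min_chars=3):
    """Return (vocabulary, vectors): one term-count row per comment."""
    counts = [term_occurrences(comment, min_chars) for comment in comments]
    # The vocabulary is every term that survived filtering in any comment.
    vocabulary = sorted(set().union(*counts))
    # A Counter returns 0 for missing keys, so absent terms become 0.
    vectors = [[count[term] for term in vocabulary] for count in counts]
    return vocabulary, vectors


if __name__ == "__main__":
    # Made-up comments standing in for the spreadsheet's comments column.
    sample_comments = [
        "Wiper motor failed, replaced the motor",
        "Blade torn at the tip",
    ]
    vocabulary, vectors = build_word_vectors(sample_comments)
    print(vocabulary)
    for row in vectors:
        print(row)
```

Each row of `vectors` plays the role of one example in the resulting example set, with one column per term. A comment that is empty or contains only filtered-out tokens simply yields a row of zeros.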