"newbie: Excel to text"
shilaski
Here is my project scope. I have an Excel spreadsheet of warranty claims with around 9,100 entries. One of its columns contains comments, where a tech writes what was wrong with the vehicle. These comments are what I want to text mine.
I have figured out how to load the sheet and run it through the filter, so I am concentrating only on the data I am interested in. Now I am guessing that I need to use the Text plugin to create word vectors (please tell me if I am wrong). It appears that the TextInput operator reads its input texts from files in a directory rather than from an example set. My question is how to correctly feed the TextInput operator. Of course I could be completely wrong... maybe there is a better way to do this?
Here is what I have:
<operator name="Root" class="Process" expanded="yes">
<operator name="ExcelExampleSource" class="ExcelExampleSource">
<parameter key="excel_file" value="C:\Documents and Settings\shilaski.TRICO29\Desktop\50percentWarranty\Conceptsheet_Claims_short.xls"/>
<parameter key="first_row_as_names" value="true"/>
</operator>
<operator name="AttributeFilter" class="AttributeFilter">
<parameter key="condition_class" value="attribute_name_filter"/>
<parameter key="parameter_string" value="comments"/>
</operator>
<operator name="Nominal2String" class="Nominal2String">
</operator>
<operator name="TextInput" class="TextInput" expanded="yes">
<parameter key="create_text_visualizer" value="true"/>
<parameter key="id_attribute_type" value="long"/>
<parameter key="use_content_attributes" value="true"/>
<operator name="StringTokenizer" class="StringTokenizer">
</operator>
</operator>
</operator>
Answers
Hi Stacy,
in principle you are right. You simply have to use the [tt]StringTextInput[/tt] operator instead of [tt]TextInput[/tt]. The former loads the texts from string attributes of an already present example set; the latter loads the texts directly from files.
Hope that helps,
regards,
Tobias
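A minimal sketch of the first process with that swap applied, assuming RapidMiner 4.x with the Text plugin installed. All operator classes and parameter keys are copied from the processes posted in this thread; nothing else is assumed, and the sketch has not been run. Nominal2String stays in front because StringTextInput only picks up attributes of value type string.

```xml
<operator name="Root" class="Process" expanded="yes">
  <!-- Load the spreadsheet; the header row supplies the attribute names -->
  <operator name="ExcelExampleSource" class="ExcelExampleSource">
    <parameter key="excel_file" value="C:\Documents and Settings\shilaski.TRICO29\Desktop\50percentWarranty\Conceptsheet_Claims_short.xls"/>
    <parameter key="first_row_as_names" value="true"/>
  </operator>
  <!-- Keep only the attribute whose name matches "comments" -->
  <operator name="AttributeFilter" class="AttributeFilter">
    <parameter key="condition_class" value="attribute_name_filter"/>
    <parameter key="parameter_string" value="comments"/>
  </operator>
  <!-- Convert the nominal comments column to value type string -->
  <operator name="Nominal2String" class="Nominal2String">
  </operator>
  <!-- Build word vectors from the string attribute instead of reading files -->
  <operator name="StringTextInput" class="StringTextInput" expanded="yes">
    <parameter key="vector_creation" value="TermOccurrences"/>
    <operator name="StringTokenizer" class="StringTokenizer">
    </operator>
  </operator>
</operator>
```

Further preprocessing operators (stopword filtering, token length filtering, stemming) would go inside StringTextInput next to the tokenizer.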
Alright, here is where I am now:
<operator name="Root" class="Process" expanded="yes">
<description text="#ylt#h3#ygt#Finding important terms#ylt#/h3#ygt##ylt#p#ygt#This experiment shows how to find terms that are characteristic for a set of texts#ylt#/p#ygt#. #ylt#p#ygt##ylt#b#ygt#Hint:#ylt#/b#ygt#In the interactive keyword selection, click on weight to sort the terms by their relevance to the class specified in the CorpusBasedWeighting operator.#ylt#/p#ygt#"/>
<operator name="ExcelExampleSource" class="ExcelExampleSource">
<parameter key="datamanagement" value="long_array"/>
<parameter key="excel_file" value="C:\Documents and Settings\shilaski.TRICO29\Desktop\50percentWarranty\Conceptsheet_Claims_short.xls"/>
</operator>
<operator name="AttributeFilter" class="AttributeFilter">
<parameter key="condition_class" value="attribute_name_filter"/>
<parameter key="parameter_string" value="comments"/>
</operator>
<operator name="Nominal2String" class="Nominal2String">
</operator>
<operator name="StringTextInput" class="StringTextInput" expanded="yes">
<parameter key="default_content_language" value="english"/>
<parameter key="vector_creation" value="TermOccurrences"/>
<operator name="StringTokenizer" class="StringTokenizer">
</operator>
<operator name="EnglishStopwordFilter" class="EnglishStopwordFilter">
</operator>
<operator name="TokenLengthFilter" class="TokenLengthFilter">
<parameter key="min_chars" value="3"/>
</operator>
<operator name="PorterStemmer" class="PorterStemmer">
</operator>
</operator>
<operator name="CorpusBasedWeighting" class="CorpusBasedWeighting">
<parameter key="class_to_characterize" value="graphics"/>
</operator>
<operator name="InteractiveAttributeWeighting" class="InteractiveAttributeWeighting">
</operator>
</operator>
The problem now is that I keep getting this error:
Error in: StringTextInput (StringTextInput) The input example set does not contain any attributes with value type string. Some operators require example sets with attributes of a specific value type. Please refer to the documentation of the used operators for further details.
Figured it out. Somehow I left out the parameter that specifies which column I wanted. I had it set before, but I suppose I should have troubleshot it before posting to the forums.
Thanks
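For readers hitting the same error: comparing the two processes in this thread, the second ExcelExampleSource no longer sets first_row_as_names, which the first one did. Without it, the header row is presumably not used for attribute names, so the AttributeFilter looking for "comments" would match nothing and StringTextInput would receive no string attribute. This is an inference from the posted XML, not something confirmed in the thread. A sketch of the source operator with the parameter restored (untested):

```xml
<!-- Use the spreadsheet's header row as attribute names so that the
     AttributeFilter further down can find the column called "comments" -->
<operator name="ExcelExampleSource" class="ExcelExampleSource">
  <parameter key="datamanagement" value="long_array"/>
  <parameter key="excel_file" value="C:\Documents and Settings\shilaski.TRICO29\Desktop\50percentWarranty\Conceptsheet_Claims_short.xls"/>
  <parameter key="first_row_as_names" value="true"/>
</operator>
```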