"newbie: Excel to text"
Here is my project scope. I have an Excel spreadsheet of warranty claims with around 9,100 entries. One of its columns is a comments field, where a technician writes what was wrong with the vehicle. These comments are what I want to text mine.
I have figured out how to load the sheet and run it through the filter, so I am concentrating only on the data I am interested in. Now I am guessing that I need to use the Text plugin to create word vectors (please tell me if I am wrong). It appears that the TextInput operator expects its input to come from a directory rather than from an ExampleSet. My question is how to get my data into the TextInput operator correctly. Of course, I could be completely wrong... maybe there is a better way to do this?
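For reference, "word vectors" here means one row per comment and one column per distinct word, with each cell holding a weight such as how often that word appears in the comment. The plain-Python sketch below (not RapidMiner, and not what the Text plugin does internally) shows that idea with simple term-frequency counts. The tokenizing rule (split on non-letters, lowercase) and the two sample comments are made-up assumptions for illustration.

```python
import re
from collections import Counter


def tokenize(comment: str) -> list[str]:
    # Assumed rule: split on any run of non-letter characters and lowercase,
    # roughly what a simple string tokenizer does.
    return [tok.lower() for tok in re.split(r"[^A-Za-z]+", comment) if tok]


def word_vectors(comments: list[str]) -> tuple[list[str], list[list[int]]]:
    # Tokenize every comment once.
    tokenized = [tokenize(c) for c in comments]
    # Vocabulary: every distinct word, sorted so the column order is stable.
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        # One column per vocabulary word; the value is its count in this comment.
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors


if __name__ == "__main__":
    # Hypothetical technician comments, not taken from the real spreadsheet.
    comments = ["Wiper motor failed, replaced motor", "Wiper blade torn"]
    vocab, vecs = word_vectors(comments)
    print(vocab)
    for v in vecs:
        print(v)
```

Raw counts are the simplest weighting; text-mining tools commonly offer alternatives such as TF-IDF, which down-weights words that appear in almost every comment.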
Here is what I have:
<operator name="Root" class="Process" expanded="yes">
  <operator name="ExcelExampleSource" class="ExcelExampleSource">
    <parameter key="excel_file" value="C:\Documents and Settings\shilaski.TRICO29\Desktop\50percentWarranty\Conceptsheet_Claims_short.xls"/>
    <parameter key="first_row_as_names" value="true"/>
  </operator>
  <operator name="AttributeFilter" class="AttributeFilter">
    <parameter key="condition_class" value="attribute_name_filter"/>
    <parameter key="parameter_string" value="comments"/>
  </operator>
  <operator name="Nominal2String" class="Nominal2String">
  </operator>
  <operator name="TextInput" class="TextInput" expanded="yes">
    <parameter key="create_text_visualizer" value="true"/>
    <parameter key="id_attribute_type" value="long"/>
    <parameter key="use_content_attributes" value="true"/>
    <operator name="StringTokenizer" class="StringTokenizer">
    </operator>
  </operator>
</operator>
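Outside RapidMiner, the first three steps of this process (read the sheet with the first row as column names, keep only the `comments` attribute, treat its values as free text instead of nominal categories) can be sketched in plain Python as below, turning each claim row into one numbered text document. This is only an illustration under assumptions: an in-memory CSV string stands in for the `.xls` file, and the `claim_no` and `part` columns and their values are hypothetical.

```python
import csv
import io

# Hypothetical stand-in for the spreadsheet: a CSV export whose header row
# supplies the column names. Only the "comments" column is used below.
SAMPLE = """claim_no,part,comments
1001,wiper,Wiper motor failed
1002,mirror,
1003,wiper,Blade torn on passenger side
"""


def comment_documents(csv_text: str, column: str = "comments") -> list[tuple[int, str]]:
    reader = csv.DictReader(io.StringIO(csv_text))
    documents = []
    # Number data rows from 1 so each kept comment has a numeric document id.
    for row_id, row in enumerate(reader, start=1):
        # Keep only the comments column and treat it as plain text.
        text = (row.get(column) or "").strip()
        if text:  # skip claims whose comment field is empty
            documents.append((row_id, text))
    return documents


if __name__ == "__main__":
    for doc_id, text in comment_documents(SAMPLE):
        print(doc_id, text)
```

Using the row position as the id keeps the sketch independent of any particular claim-number column; if the real sheet has a unique claim number, that would be a natural id to carry along instead.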