"Error messages using StringTextInput"
Here is my code:
<operator name="Root" class="Process" expanded="yes">
    <operator name="ExampleSource" class="ExampleSource">
        <parameter key="attributes" value="C:\Documents and Settings\rkenney\My Documents\rm_workspace\Comments09.aml"/>
    </operator>
    <operator name="StringTextInput" class="StringTextInput" expanded="yes">
        <parameter key="remove_original_attributes" value="true"/>
        <parameter key="default_content_language" value="english"/>
        <list key="namespaces">
        </list>
        <operator name="StringTokenizer" class="StringTokenizer">
        </operator>
        <operator name="EnglishStopwordFilter" class="EnglishStopwordFilter">
        </operator>
        <operator name="TokenLengthFilter" class="TokenLengthFilter">
            <parameter key="min_chars" value="3"/>
        </operator>
        <operator name="PorterStemmer" class="PorterStemmer">
        </operator>
    </operator>
    <operator name="SVDReduction" class="SVDReduction">
        <parameter key="keep_example_set" value="true"/>
        <parameter key="return_preprocessing_model" value="true"/>
        <parameter key="dimensions" value="15"/>
    </operator>
    <operator name="EMClustering" class="EMClustering">
        <parameter key="k" value="5"/>
    </operator>
    <operator name="ExcelExampleSetWriter" class="ExcelExampleSetWriter">
        <parameter key="excel_file" value="C:\Projects\Memb Sat Survey\2009\Data\RapidMinerOutput\RMClusters.xls"/>
    </operator>
</operator>
When the StringTextInput operator runs, I get a warning message for each one of my text documents. For example, this one:
P Jul 14, 2009 3:03:44 PM: [Warning] StringTextInput: File C:\Program Files\Rapid-I\RapidMiner\RE BILLING; I GET THE FORM SHOWING WHAT CC HAS PAID, BUT UNDER PATIENT RESPONSIBILITY, IT SHOWS 0.00 WHICH ISN'T TRUE. I DON'T RECALL EVER SEEING ONE THAT HAD A FIGURE. not found. Assuming the text is directly encoded as document source...
My input file has about 300 examples with three columns. Col 1 holds the survey comments, set as a string variable; col 2 is the member number, set as an ID variable, which I use to attach demographic data; and col 3 is a grouping variable set as a label. The part of the warning message in all caps is the actual text that I want to analyze. The output file looks fine, but I'm worried about the warnings. What do you think?