Classify text stored in SQL records
Original message from SourceForge forum at http://sourceforge.net/forum/forum.php?thread_id=2044574&forum_id=390413
Hi,
Congratulations on releasing RapidMiner 4.1!!!
I am trying to classify text stored in SQL records. Each text is an email transaction I want to cluster (no supervised learning yet). I have not found an easy method to get the text into the classifier.
1- TextInput uses directories as document source, and not individual records or files.
2- DatabaseExampleSource -> StringTextInput does not provide a way to specify which field is text. I select FilterNominalAttributes but I still have an error message about stream input.
3- Operator->Preprocessing->Attributes does not have a nominal to string converter. There is ChangeAttributeType operator, but it seems to be the same as ChangeAttributeRole. (Shouldn't Type be string, boolean, integer, etc.?)
Is it possible to move the text fields directly into the text classifier, or do I have to export and transform them to a RM format (aml/dat) or import?
<operator name="Root" class="Process" expanded="yes">
<description text="#ylt#h3#ygt#Specifying texts by an example set#ylt#/h3#ygt##ylt#p#ygt#Using the parameter list or the wizard are simple methods for setting up the directories from which the text documents are read. Sometimes, however, a more flexible solution is needed. If, for instance, your text documents have different types of encoding or are written in different languages, you might wish to provide this information for each input directory separately.#ylt#/p#ygt# #ylt#p#ygt#You can do this by using an example set that contains one row for each input directory and corresponding attributes for source, encoding, type and class. If such an example set is provided, the texts in the parameter list are ignored.#ylt#/p#ygt#"/>
<operator name="DatabaseExampleSource" class="DatabaseExampleSource">
<parameter key="database_system" value="Microsoft SQL Server (JTDS)"/>
<parameter key="database_url" value="jdbc:jtds:sqlserver://localhost:1433/Rapid"/>
<parameter key="id_attribute" value="RecIDNbr"/>
<parameter key="password" value="xxx"/>
<parameter key="query" value="SELECT [RecIDNbr], [Service] FROM [CustomerHist]"/>
<parameter key="username" value="sa"/>
</operator>
<operator name="StringTextInput" class="StringTextInput" expanded="yes">
<parameter key="filter_nominal_attributes" value="true"/>
<list key="namespaces">
</list>
</operator>
</operator>
Error message:
Error in: StringTextInput (StringTextInput) The input example set does not contain any attributes with value type string. Some operators require example sets with attributes of a specific value type. Please refer to the documentation of the used operators for further details.
The Input Example Set does not contain any attributes with value type string.
Thank you.
Answer by Ingo:
Hello,
Could you please provide a screenshot of the result (meta data view and data view) right after loading the data from your database, i.e. right after
<operator name="DatabaseExampleSource" class="DatabaseExampleSource">
<parameter key="database_system" value="Microsoft SQL Server (JTDS)"/>
<parameter key="database_url" value="jdbc:jtds:sqlserver://localhost:1433/Rapid"/>
<parameter key="id_attribute" value="RecIDNbr"/>
<parameter key="password" value="xxx"/>
<parameter key="query" value="SELECT [RecIDNbr], [Service] FROM [CustomerHist]"/>
<parameter key="username" value="sa"/>
</operator>
You could for example upload the screenshots here:
http://tinypic.com/
and post the links. Please consider blacking out sensitive content if necessary. I just have to be sure that everything is fine for using the StringTextInput operator.
Thanks in advance. Cheers,
Ingo
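Regarding the export route asked about in the question: the following is only a minimal sketch, not something proposed in the thread, of writing each database record to its own text file so that a directory-based reader such as TextInput could pick the files up. The table and column names follow the query in the process above. The function name export_records, the output directory, and the one-file-per-RecIDNbr naming are assumptions made for illustration, and Python's built-in sqlite3 module stands in for the real SQL Server connection.

```python
import sqlite3
from pathlib import Path


def export_records(conn, out_dir, table="CustomerHist",
                   id_col="RecIDNbr", text_col="Service"):
    """Write one UTF-8 text file per record; return the number of files written.

    Hypothetical helper: `conn` is any DB-API connection whose execute()
    returns an iterable cursor (true for sqlite3, used here as a stand-in).
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Identifiers cannot be bound as query parameters, so they are quoted and
    # interpolated; only pass trusted table and column names.
    query = f'SELECT "{id_col}", "{text_col}" FROM "{table}"'

    count = 0
    for rec_id, text in conn.execute(query):
        if text is None:
            # Skip records that have no text at all.
            continue
        # One file per record, named after the record id.
        (out / f"{rec_id}.txt").write_text(str(text), encoding="utf-8")
        count += 1
    return count
```

Calling export_records(conn, "exported_texts") on an open connection would fill the (assumed) directory exported_texts with files such as 1.txt, 2.txt, and so on. Whether this detour is needed at all depends on the answer to Ingo's question about the value types after loading.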