Text Classification using Text Plugin

User: "pser"
Updated by Jocelyn

Hi,

I am trying to classify texts stored in a database. I'd like to describe some of the problems I ran into and the questions that came up. Since they address different topics, I decided to split the post into three parts. In this one I ask for your opinion: how would you design an experiment for text classification with RapidMiner? If anyone has built a similar experiment, I would be very grateful if they could describe the setup they used.

The setup I have in mind at the moment is something like this:

<operator name="Root" class="Process" expanded="yes">
    <operator name="DatabaseExampleSource" class="DatabaseExampleSource">
        <parameter key="database_url" value="www.example.net"/>
        <parameter key="username" value="example"/>
    </operator>
    <operator name="StringTextInput" class="StringTextInput" expanded="no">
        <parameter key="default_content_encoding" value="UTF-8"/>
        <parameter key="default_content_type" value="html"/>
        <parameter key="filter_nominal_attributes" value="true"/>
        <parameter key="input_word_list" value="example.wordlist"/>
        <list key="namespaces">
        </list>
        <parameter key="prune_above" value="5%"/>
        <parameter key="prune_below" value="3"/>
        <parameter key="remove_original_attributes" value="true"/>
        <operator name="StringTokenizer" class="StringTokenizer">
        </operator>
        <operator name="GermanStopwordFilter" class="GermanStopwordFilter">
        </operator>
        <operator name="TokenLengthFilter" class="TokenLengthFilter">
            <parameter key="max_chars" value="25"/>
            <parameter key="min_chars" value="3"/>
        </operator>
        <operator name="GermanStemmer" class="GermanStemmer">
        </operator>
    </operator>
    <operator name="XValidation" class="XValidation" expanded="yes">
        <parameter key="create_complete_model" value="true"/>
        <parameter key="number_of_validations" value="5"/>
        <operator name="W-NaiveBayesMultinomialUpdateable" class="W-NaiveBayesMultinomialUpdateable">
        </operator>
        <operator name="Testing" class="OperatorChain" expanded="yes">
            <operator name="ModelApplier" class="ModelApplier">
                <list key="application_parameters">
                </list>
                <parameter key="keep_model" value="true"/>
            </operator>
            <operator name="ClassificationPerformance" class="ClassificationPerformance">
                <list key="class_weights">
                </list>
                <parameter key="classification_error" value="true"/>
                <parameter key="correlation" value="true"/>
                <parameter key="keep_example_set" value="true"/>
            </operator>
        </operator>
    </operator>
</operator>
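
For readers who do not know the Text plugin operators, the preprocessing chain above can be approximated in plain Python. This is only a minimal sketch, not RapidMiner code: the function names, the tiny stopword sample, and the reading of `prune_below` (minimum number of documents) and `prune_above` (maximum fraction of documents) are assumptions, and the stemming step is left out.

```python
import re
from collections import Counter

# Tiny illustrative stopword sample (assumption); a real German list is much longer.
GERMAN_STOPWORDS = {"der", "die", "das", "und", "ist", "ein", "eine", "in", "zu"}


def tokenize(text, min_chars=3, max_chars=25):
    """Lowercase, split into runs of letters, drop stopwords and
    tokens whose length is outside [min_chars, max_chars]."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return [
        t for t in tokens
        if t not in GERMAN_STOPWORDS and min_chars <= len(t) <= max_chars
    ]


def build_vocabulary(docs, prune_below=3, prune_above=0.05):
    """Keep terms that occur in at least `prune_below` documents and in at
    most `prune_above` (a fraction) of all documents. Assumed semantics."""
    doc_freq = Counter()
    for doc in docs:
        # Count each term once per document (document frequency).
        doc_freq.update(set(tokenize(doc)))
    upper = prune_above * len(docs)
    return sorted(t for t, df in doc_freq.items() if prune_below <= df <= upper)


def vectorize(doc, vocabulary):
    """Turn one document into a term-count vector over the vocabulary."""
    counts = Counter(tokenize(doc))
    return [counts[t] for t in vocabulary]
```

With a fixed word list, as in the process above, `build_vocabulary` would be skipped and the given list passed to `vectorize` directly.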

This is only the part that learns the model. Normally a second part would follow in which the model is applied to unlabeled data. Later on I'd like to create the wordlist from the database entries (at the moment I work with a given wordlist) and use the UpdateModel operator to update the model incrementally with new labeled data. More about this in my other posts in "Problems and Support".
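
To illustrate what updating the model incrementally means for a multinomial Naive Bayes learner, here is a minimal plain-Python sketch. It is not the Weka or RapidMiner implementation; the class name, the `update`/`predict` methods, and the Laplace smoothing parameter `alpha` are assumptions chosen for illustration. The point is that the model only stores counts, so new labeled texts can be added without retraining on the old ones.

```python
import math
from collections import Counter, defaultdict


class IncrementalMultinomialNB:
    """Multinomial Naive Bayes that is trained one labeled document at a time."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # Laplace smoothing (assumed default)
        self.class_docs = Counter()              # documents seen per class
        self.term_counts = defaultdict(Counter)  # term counts per class
        self.class_totals = Counter()            # total tokens per class
        self.vocabulary = set()                  # all terms seen so far

    def update(self, tokens, label):
        """Add one labeled document; only the counts change."""
        self.class_docs[label] += 1
        self.term_counts[label].update(tokens)
        self.class_totals[label] += len(tokens)
        self.vocabulary.update(tokens)

    def predict(self, tokens):
        """Return the class with the highest log-posterior for the tokens."""
        if not self.class_docs:
            raise ValueError("model has not seen any labeled documents yet")
        n_docs = sum(self.class_docs.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, float("-inf")
        for label, n in self.class_docs.items():
            score = math.log(n / n_docs)  # log prior
            denom = self.class_totals[label] + self.alpha * vocab_size
            for t in tokens:
                if t in self.vocabulary:  # ignore terms never seen in training
                    score += math.log((self.term_counts[label][t] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Usage: train on two documents, then fold in a new labeled one later.
model = IncrementalMultinomialNB()
model.update(["fussball", "tor"], "sport")
model.update(["wahl", "partei"], "politik")
first = model.predict(["tor"])
model.update(["tor", "partei", "partei"], "politik")
second = model.predict(["partei"])
```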

    User: "Legacy User"
    Have you experimented with the examples for the Text plug-in?