"Regarding text mining"

User: "ratheesan"
Updated by Jocelyn
Hi,

Which text mining operator can we use to extract combinations or patterns of words in RM?
I have used StringTokenizer, the stopword filter and TokenLengthFilter, and computed TF-IDF, term frequency, etc.
Can anybody suggest a specific algorithm for solving this problem?
Thanks
Ratheesan

    User: "land"
    Hi,
    you could use BinaryOccurrences instead of TFIDF and then convert the numerical 0's and 1's to binominal values in order to apply FP-Growth. You will get FrequentItemSets containing the words that occur together in documents. Using the support threshold, you can control how frequently they have to occur together.
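To make the idea concrete outside RapidMiner, here is a minimal Python sketch of binary occurrences followed by a frequent item set search. It is only an illustration: it uses a brute-force search rather than FP-Growth, and the function names, the toy documents and the `max_size` limit are assumptions made for this example, not RapidMiner operators or parameters.

```python
from itertools import combinations

def binary_occurrences(docs):
    """Reduce each document to the set of distinct words it contains."""
    return [set(doc.lower().split()) for doc in docs]

def frequent_itemsets(transactions, min_support, max_size=3):
    """Brute-force search (not FP-Growth) for word sets that occur
    together in at least min_support (a fraction) of the documents."""
    n = len(transactions)
    vocabulary = sorted(set().union(*transactions))
    result = {}
    for size in range(1, max_size + 1):
        for candidate in combinations(vocabulary, size):
            # Count the documents that contain every word of the candidate.
            count = sum(1 for t in transactions if t.issuperset(candidate))
            support = count / n
            if support >= min_support:
                result[candidate] = support
    return result

# Hypothetical toy corpus, one string per document.
docs = [
    "text mining finds word patterns",
    "text mining uses term frequency",
    "word patterns appear in text",
    "decision tree on term frequency",
]
itemsets = frequent_itemsets(binary_occurrences(docs), min_support=0.5)
```

With this toy corpus, `itemsets` contains word combinations such as `("mining", "text")` and `("patterns", "text", "word")`, each appearing in half of the documents. The brute-force search is only workable for tiny vocabularies; FP-Growth exists precisely to avoid enumerating every candidate.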

    Greetings,
      Sebastian
    User: "ratheesan"
    OP
    Thanks Sebastian for your valued help and advice. I worked with the text as you mentioned, but I am getting the error message "Process failed. StackOverflowError caught null". I am attaching the XML here.

    <operator name="Root" class="Process" expanded="yes">
        <operator name="TextInput" class="TextInput" expanded="yes">
            <list key="texts">
              <parameter key="b" value="C:\Documents and Settings\ADMIN\Desktop\b"/>
            </list>
            <parameter key="vector_creation" value="BinaryOccurrences"/>
            <list key="namespaces">
            </list>
            <operator name="StringTokenizer" class="StringTokenizer">
            </operator>
            <operator name="EnglishStopwordFilter" class="EnglishStopwordFilter">
            </operator>
            <operator name="TokenLengthFilter" class="TokenLengthFilter">
            </operator>
        </operator>
        <operator name="Numerical2Binominal" class="Numerical2Binominal">
            <parameter key="min" value="2.0"/>
            <parameter key="max" value="30000.0"/>
        </operator>
        <operator name="FPGrowth" class="FPGrowth">
            <parameter key="keep_example_set" value="true"/>
            <parameter key="min_number_of_itemsets" value="5"/>
        </operator>
    </operator>

    How can I overcome this problem?

    Thanks
    Ratheesan
    User: "land"
    Hi,
    if you put a breakpoint after the Numerical2Binominal operator, does the program reach it?
    If yes, I guess the problem is the FP-Growth operator, which is really memory-consuming. Its memory consumption depends heavily on the support level, so you might increase the minimum support in order to get things done. Of course you will receive fewer rules, because only rules with a higher support will be included at all.
    Please take a look at the memory monitor to check that you have assigned RapidMiner enough main memory. It usually uses up to 80% of the RAM.
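The effect of the support level can be seen on a toy example. The following Python sketch only counts frequent word pairs on made-up data; it is not the FP-Growth operator, and the function name and the transactions are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations

def count_frequent_pairs(transactions, min_support):
    """Count the word pairs that occur together in at least
    min_support (a fraction) of the transactions."""
    n = len(transactions)
    pair_counts = Counter()
    for words in transactions:
        # Sort so that each pair is always counted under the same key.
        pair_counts.update(combinations(sorted(words), 2))
    return sum(1 for c in pair_counts.values() if c / n >= min_support)

# Hypothetical documents, already reduced to binary word occurrences.
transactions = [
    {"text", "mining", "word", "patterns"},
    {"text", "mining", "term", "frequency"},
    {"text", "word", "patterns"},
    {"text", "mining", "term"},
]
counts = {s: count_frequent_pairs(transactions, s) for s in (0.25, 0.5, 0.75)}
```

On this data, `counts` is `{0.25: 11, 0.5: 6, 0.75: 1}`: raising the minimum support shrinks the result quickly. On real text with thousands of word attributes the same effect is far stronger, which is why a low support level can exhaust memory.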

    Greetings,
      Sebastian
    User: "ratheesan"
    OP
    Hi,
    I applied a decision tree to text data, but I am not getting a proper result. I am attaching the process here. Can you suggest how to proceed with it? If my approach is not correct, could you please suggest an alternative?

    <operator name="Root" class="Process" expanded="yes">
        <operator name="TextInput" class="TextInput" expanded="yes">
            <list key="texts">
              <parameter key="mydata" value="C:\Documents and Settings\ADMIN\Desktop\summary"/>
            </list>
            <list key="namespaces">
            </list>
            <operator name="StringTokenizer" class="StringTokenizer">
            </operator>
            <operator name="EnglishStopwordFilter" class="EnglishStopwordFilter">
            </operator>
            <operator name="TokenLengthFilter" class="TokenLengthFilter">
            </operator>
        </operator>
        <operator name="ChangeAttributeRole" class="ChangeAttributeRole">
            <parameter key="name" value="claimant"/>
            <parameter key="target_role" value="label"/>
        </operator>
        <operator name="DecisionTree" class="DecisionTree">
        </operator>
    </operator>

    Thanks
    Ratheesan
    User: "sudheendra"
    Hi Sebastian,

    I am also getting the same memory problem. I am using Windows with 3 GB of RAM. Is that sufficient to work with? Please advise.

    Thanks,
    Sudheendra
    User: "land"
    Hi,
    Text mining usually involves a great number of attributes. A decision tree might become very large if the data is difficult to split. You would probably get much better classification performance with a linear SVM. But if your goal is an understandable model, you will have to stick with the tree, and you should limit its maximal depth to avoid the out-of-memory problem. A deeper tree wouldn't help the user anyway, because a full binary tree of depth 10 already has 2047 nodes and loses a lot of its understandability :)
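The 2047 figure follows from the node count of a full binary tree, 2^(depth + 1) - 1, counting the root as depth 0. A small Python check, assuming every split is binary and every level is completely filled (a real tree is usually smaller; the function name is made up for this example):

```python
def full_binary_tree_nodes(depth):
    """Number of nodes in a full binary tree whose root is at depth 0."""
    return 2 ** (depth + 1) - 1

# Node counts for a few maximal depths.
sizes = {depth: full_binary_tree_nodes(depth) for depth in (3, 5, 10)}
```

Here `sizes` is `{3: 15, 5: 63, 10: 2047}`, so each extra level roughly doubles the worst-case tree size.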

    Greetings,
      Sebastian