"Saving specific features from a Model for Text Mining"

GeorgeDittmar
edited November 5 in Community Q&A
Hello,

I am using RapidMiner to build a text classification model with Naive Bayes. The model builds fine and I understand how to apply it in RapidMiner, but I was wondering whether there is any way to save the features the Bayesian model extracts into, say, a database table or an Excel spreadsheet. I want to do this because I plan to use the Bayes model to select key terms, and then use those terms to rank documents with a cosine similarity and weighting scheme that I have already developed. I don't know if this is possible in RapidMiner, or whether RapidMiner already has a cosine similarity feature that I could use instead.

Any help would be much appreciated.

Thanks
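
For readers following along, the ranking step described in the question can be sketched roughly as below. This is only an illustration under assumptions: the poster's actual weighting scheme is not shown in the thread, so a smoothed TF-IDF weighting stands in for it, and the names `tfidf_vectors`, `cosine` and `rank_documents` are hypothetical helpers, not RapidMiner functions.

```python
import math
from collections import Counter

def tfidf_vectors(docs, key_terms):
    """Build TF-IDF vectors restricted to key_terms; docs is a list of token lists."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each key term.
    df = {t: sum(1 for d in docs if t in d) for t in key_terms}
    # Smoothed IDF (an assumption, not the poster's scheme): always positive.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in key_terms}
    vectors = []
    for d in docs:
        counts = Counter(d)
        vectors.append([counts[t] * idf[t] for t in key_terms])
    return vectors, idf

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_documents(query_tokens, docs, key_terms):
    """Return (document index, score) pairs, highest cosine similarity first."""
    vectors, idf = tfidf_vectors(docs, key_terms)
    q_counts = Counter(query_tokens)
    q_vec = [q_counts[t] * idf[t] for t in key_terms]
    scores = [(i, cosine(q_vec, v)) for i, v in enumerate(vectors)]
    return sorted(scores, key=lambda s: s[1], reverse=True)
```

Here `key_terms` would be whatever term list is exported from the classifier; restricting the vectors to those terms is what ties the ranking back to the model.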

Answers

  • haddock
    Hi there George,

    And welcome! It sounds like you're in a position to indulge in Groovy scripting. In general you can pick apart most inputs using this Java scripting operator; there are some examples on the Wiki, and even I've managed a demo at

    http://www.myexperiment.org/workflows/1299.html

    Tip: download the source of the relevant model, so you know what is available.

  • GeorgeDittmar
    So you think that with Groovy scripting I should be able to pull the terms and their specific Bayes scores? I will look into it, thanks!
  • SebastianLoh
    Hi GeorgeDittmar,

    If you do not want to enter the dark side of Groovy scripting, you might consider a different learner that provides the word weights. For example, the Support Vector Machine does that. You can then transform the weights with the Weights to Data operator and process them further.

    Ciao Sebastian
  • GeorgeDittmar
    Hmm, I might have to suggest switching classifiers to the group, but we are trying to duplicate work I did last winter, because everything was switched over to the RapidMiner framework while I was gone. I am furiously trying to figure this framework out and get papers written for it. I can't seem to get the demo that haddock posted to work: I downloaded the file but I can't open it. Maybe I am just missing something.
  • SebastianLoh
    Hi George,

    Read my footer and then search for the process haddock posted ("Association rules as examples") in the myexperiment extension.

    Ciao Sebastian
  • GeorgeDittmar
    Here is a quick one, I think. Is the distribution table shown in the Simple Distribution tab of the Naive Bayes results made on the fly by RapidMiner, or do you think it's possible to pull it out with Groovy?
  • haddock
    Hi there,

    Non-examplesets have their own renderers, so the answer could actually be yes and yes  ;) Anyway, here's a pointer:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" expanded="true" name="Root">
        <description>Using a simple Naive Bayes classifier.</description>
        <process expanded="true" height="362" width="547">
          <operator activated="true" class="retrieve" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
            <parameter key="repository_entry" value="//Samples/data/Iris"/>
          </operator>
          <operator activated="true" class="naive_bayes" expanded="true" height="76" name="NaiveBayes" width="90" x="179" y="30"/>
          <operator activated="true" class="multiply" expanded="true" height="94" name="Multiply" width="90" x="313" y="30"/>
          <operator activated="true" class="execute_script" expanded="true" height="76" name="Execute Script" width="90" x="447" y="30">
            <parameter key="script" value="&#13;import com.rapidminer.tools.Ontology;&#13;&#10;Model m = input[0];&#13;&#13;&#10;&#10;&#13;Attribute[] attributes= new Attribute[1];&#10;attributes[0] = AttributeFactory.createAttribute(&quot;String description&quot;, Ontology.STRING);&#13;&#10;MemoryExampleTable table = new MemoryExampleTable(attributes);&#10;DataRowFactory ROW_FACTORY = new DataRowFactory(0);&#13;&#10;String[] strings= new String[1];&#13;&#10;strings[0]=m.getDistribution(0,0).toString();&#13;&#10;DataRow row = ROW_FACTORY.create(strings, attributes); &#13;&#10;table.addDataRow(row);&#9;&#10;ExampleSet exampleSet = table.createExampleSet();&#10;return exampleSet;&#13;&#10;"/>
          </operator>
          <connect from_op="Retrieve" from_port="output" to_op="NaiveBayes" to_port="training set"/>
          <connect from_op="NaiveBayes" from_port="model" to_op="Multiply" to_port="input"/>
          <connect from_op="Multiply" from_port="output 1" to_op="Execute Script" to_port="input 1"/>
          <connect from_op="Multiply" from_port="output 2" to_port="result 2"/>
          <connect from_op="Execute Script" from_port="output 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
    I couldn't bear the thought of leaving you nothing to do, so I've left the loops and labels for you to thrill over  ;D


  • GeorgeDittmar
    Thanks, I will mess around with that. I finally got your demo to run and was able to do a little bit of scripting to pull some info with Groovy, so I have somewhere to start at least.
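
For anyone picking this thread up later: the "loops and labels" left as an exercise above amount to visiting every class and every attribute and writing one row per pair. The sketch below shows that shape outside RapidMiner, as an assumption-laden illustration rather than the RapidMiner API. It assumes numeric attributes summarised by a mean and a standard deviation (a Gaussian Naive Bayes view), uses the sample (n - 1) variance, which may or may not match what RapidMiner computes, and the helper names `distribution_table` and `to_csv` are hypothetical.

```python
import csv
import io
import math
from collections import defaultdict

def distribution_table(rows, labels):
    """Return one (class, attribute_index, mean, std_dev) tuple per class/attribute pair."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    table = []
    for label in sorted(by_class):            # outer loop: classes
        members = by_class[label]
        for j in range(len(members[0])):      # inner loop: attributes
            values = [r[j] for r in members]
            mean = sum(values) / len(values)
            # Sample variance (n - 1); evaluates to 0 for a single example.
            var = sum((v - mean) ** 2 for v in values) / max(len(values) - 1, 1)
            table.append((label, j, mean, math.sqrt(var)))
    return table

def to_csv(table):
    """Serialise the table as CSV text that a spreadsheet or database loader can read."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["class", "attribute", "mean", "std_dev"])
    writer.writerows(table)
    return buf.getvalue()
```

The CSV text can be written to a file and opened in Excel or bulk-loaded into a database table, which is the kind of export the original question asked about.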