How to use Read RSS Feed

mendicott New Altair Community Member
edited November 5 in Community Q&A
> http://www.meta-guide.com/home/knowledgebase/best-rapidminer-videos

I've made a webpage of 87 "Best RapidMiner Videos", linked above. After several *hours* of googling, I could find no quickstart, tutorial, or examples for using "Read RSS Feed". I want to process web feeds, primarily Twitter feeds, but I also found no quickstart, tutorial, or examples for using RapidMiner with Twitter. The main thing I want to do is filter tweets by similarity, catching tweets that are similar but not identical, presumably via classification. I would also like to try using RapidMiner for link metrics. I have searched this forum for "rss" and "twitter" without finding anything helpful.

Also, it is not clear to me how to access RapidMiner results programmatically, as with an API. RapidMiner would be much more accessible if it were available in the cloud as a web service. Ultimately, I hope to use RapidMiner in conjunction with Yahoo! Pipes (and, again, I could find no quickstart, tutorial, or examples for that).

Answers

  • haddock
    haddock New Altair Community Member
    I Googled "rapidminer Read RSS Feed" and got http://www.myexperiment.org/workflows/1465.html as the first link. It says, "This workflow connects RapidMiner to Twitter and downloads the timeline". Are we using the same Google?
  • mendicott
    mendicott New Altair Community Member
    I also got that, but could not get it to work. There seems to be some kind of inconsistency or incompatibility between "Read RSS Feed" and "Process Documents from Data". Can you reproduce it in a working version? Can you post the steps you took to get it working?
  • tobyb
    tobyb New Altair Community Member
    I am having the same issue.  Has this been resolved?

    Thanks,
    Toby
  • Rene
    Rene New Altair Community Member
    This is the example that haddock cited. I just added my Twitter feed URL and my user agent to "Read RSS Feed", and it works fine (RapidMiner 5.3.000):

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.000">
     <context>
       <input/>
       <output/>
       <macros/>
     </context>
     <operator activated="true" class="process" compatibility="5.3.000" expanded="true" name="Process">
       <process expanded="true" height="394" width="547">
         <operator activated="true" class="web:read_rss" compatibility="5.3.000" expanded="true" height="60" name="Read RSS Feed" width="90" x="45" y="75">
           <parameter key="url" value="https://api.twitter.com/1/statuses/user_timeline.rss?screen_name=lukeanker"/>
           <parameter key="user_agent" value="Mozilla/5.0 (Windows NT 6.1; WOW64; rv:19.0) Gecko/20100101 Firefox/19.0"/>
         </operator>
         <operator activated="true" class="text:process_document_from_data" compatibility="5.3.000" expanded="true" height="76" name="Process Documents from Data" width="90" x="179" y="75">
           <parameter key="prune_method" value="percentual"/>
           <parameter key="prunde_below_percent" value="5.0"/>
           <parameter key="prune_above_percent" value="80.0"/>
           <parameter key="prune_below_rank" value="5.0"/>
           <parameter key="prune_above_rank" value="5.0"/>
           <list key="specify_weights"/>
           <process expanded="true" height="374" width="547">
             <operator activated="true" class="text:tokenize" compatibility="5.3.000" expanded="true" height="60" name="Tokenize" width="90" x="45" y="30"/>
             <operator activated="true" class="text:transform_cases" compatibility="5.3.000" expanded="true" height="60" name="Transform Cases" width="90" x="179" y="30"/>
             <operator activated="true" class="text:filter_by_length" compatibility="5.3.000" expanded="true" height="60" name="Filter Tokens (by Length)" width="90" x="313" y="30">
               <parameter key="min_chars" value="3"/>
             </operator>
             <operator activated="true" class="text:filter_stopwords_english" compatibility="5.3.000" expanded="true" height="60" name="Filter Stopwords (English)" width="90" x="447" y="30"/>
             <connect from_port="document" to_op="Tokenize" to_port="document"/>
             <connect from_op="Tokenize" from_port="document" to_op="Transform Cases" to_port="document"/>
             <connect from_op="Transform Cases" from_port="document" to_op="Filter Tokens (by Length)" to_port="document"/>
             <connect from_op="Filter Tokens (by Length)" from_port="document" to_op="Filter Stopwords (English)" to_port="document"/>
             <connect from_op="Filter Stopwords (English)" from_port="document" to_port="document 1"/>
             <portSpacing port="source_document" spacing="0"/>
             <portSpacing port="sink_document 1" spacing="0"/>
             <portSpacing port="sink_document 2" spacing="0"/>
           </process>
         </operator>
         <operator activated="true" class="text:wordlist_to_data" compatibility="5.3.000" expanded="true" height="76" name="WordList to Data" width="90" x="313" y="75"/>
         <connect from_op="Read RSS Feed" from_port="output" to_op="Process Documents from Data" to_port="example set"/>
         <connect from_op="Process Documents from Data" from_port="word list" to_op="WordList to Data" to_port="word list"/>
         <connect from_op="WordList to Data" from_port="word list" to_port="result 1"/>
         <connect from_op="WordList to Data" from_port="example set" to_port="result 2"/>
         <portSpacing port="source_input 1" spacing="0"/>
         <portSpacing port="sink_result 1" spacing="0"/>
         <portSpacing port="sink_result 2" spacing="0"/>
         <portSpacing port="sink_result 3" spacing="0"/>
       </process>
     </operator>
    </process>
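For anyone who wants to see what the process above does step by step, or who wants similar results programmatically outside RapidMiner, here is a rough Python sketch of the same pipeline. It is an approximation under stated assumptions, not RapidMiner's implementation. The helper names `fetch_feed`, `parse_rss`, `process_document` and `word_list` are hypothetical. The stopword list is an abbreviated stand-in. Tokenizing on runs of letters only approximates the Tokenize operator. The 25-character maximum token length is an assumed default. Only the `title` and `description` of RSS 2.0 `<item>` elements are read.

```python
import re
import urllib.request
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, abbreviated stopword list (an assumption); RapidMiner's
# built-in English list is much longer.
STOPWORDS = {"the", "and", "for", "are", "but", "not", "you", "with",
             "this", "that", "was", "from", "have"}


def fetch_feed(url, user_agent):
    """Download a feed, sending a User-Agent header (like the user_agent parameter)."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as response:
        return response.read()


def parse_rss(xml_bytes):
    """Return one text per RSS 2.0 <item>: its title and description joined."""
    root = ET.fromstring(xml_bytes)
    texts = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        description = item.findtext("description", default="")
        texts.append(f"{title} {description}".strip())
    return texts


def process_document(text, min_chars=3, max_chars=25):
    """Mirror the subprocess: Tokenize, Transform Cases, Filter by Length, Stopwords."""
    tokens = re.findall(r"[^\W\d_]+", text)           # runs of letters only
    tokens = [t.lower() for t in tokens]               # Transform Cases (lower)
    tokens = [t for t in tokens if min_chars <= len(t) <= max_chars]
    return [t for t in tokens if t not in STOPWORDS]   # Filter Stopwords


def word_list(texts, prune_below_percent=5.0, prune_above_percent=80.0):
    """Build (word, document occurrences, total occurrences) rows with percentual pruning."""
    if not texts:
        return []
    doc_counts, total_counts = Counter(), Counter()
    for text in texts:
        tokens = process_document(text)
        total_counts.update(tokens)
        doc_counts.update(set(tokens))  # count each word once per document
    rows = []
    for word in sorted(doc_counts):
        percent = 100.0 * doc_counts[word] / len(texts)
        # Keep words whose document share lies within [below, above] percent.
        if prune_below_percent <= percent <= prune_above_percent:
            rows.append((word, doc_counts[word], total_counts[word]))
    return rows


# Usage (network access required; FEED_URL and USER_AGENT are placeholders):
# rows = word_list(parse_rss(fetch_feed(FEED_URL, USER_AGENT)))
```

Pruning here is based on document frequency, as in the percentual settings above. With only a few tweets this is coarse. A word that appears in every item is always dropped by the 80% ceiling, and with 20 or fewer items no word is ever removed by the 5% floor.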