SOLVED: RSS feeds & MySQL - 100 Records Only!

dudester (New Altair Community Member)
edited November 2024 in Community Q&A
I'll try to be brief: basically I have an issue trying to scrape complete RSS feeds into a MySQL database. Largely it works OK, but for some reason I can't decipher, it will only read 100 entries into MySQL, and lately it has been freezing my computer, likely due to memory constraints. (I speculate that this may be due to recent extension additions: Image Processing, IDA?)
Anyway, according to the log, the RSS feed is pulled in less than 5 seconds, then it hangs while it tries to display the results. The system monitor shows available memory down to zip. I believe I have the MySQL settings correct; the example set in RapidMiner never holds more than 100 entries at a time, even though I've got the batch size at 10,000. I need another pair of eyes...

So, here is the XML for the process:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.003">
 <context>
   <input/>
   <output/>
   <macros/>
 </context>
 <operator activated="true" class="process" compatibility="5.2.003" expanded="true" name="Process">
   <parameter key="logverbosity" value="all"/>
   <process expanded="true" height="466" width="797">
     <operator activated="true" class="web:read_rss" compatibility="5.2.000" expanded="true" height="60" name="Read RSS Feed" width="90" x="45" y="30">
       <parameter key="url" value="http://some random feed=rss"/>
       <parameter key="random_user_agent" value="true"/>
       <parameter key="connection_timeout" value="100000"/>
       <parameter key="read_timeout" value="100000"/>
     </operator>
     <operator activated="true" class="write_database" compatibility="5.2.003" expanded="true" height="60" name="Write Database" width="90" x="246" y="75">
       <parameter key="connection" value="dbconnectionvalue"/>
       <parameter key="use_default_schema" value="false"/>
       <parameter key="schema_name" value="schema1"/>
       <parameter key="table_name" value="tablename1"/>
       <parameter key="overwrite_mode" value="append"/>
       <parameter key="batch_size" value="10000"/>
     </operator>
     <connect from_op="Read RSS Feed" from_port="output" to_op="Write Database" to_port="input"/>
     <connect from_op="Write Database" from_port="through" to_port="result 1"/>
     <portSpacing port="source_input 1" spacing="0"/>
     <portSpacing port="sink_result 1" spacing="0"/>
     <portSpacing port="sink_result 2" spacing="0"/>
   </process>
 </operator>
</process>

Why the magic number of only 100 items pulled? I don't see it set anywhere, either here or in the RapidMiner preferences.
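(A quick sanity check I'd suggest, purely as a sketch: count what the feed itself returns before it ever reaches RapidMiner or MySQL. The URL below is a placeholder, and this assumes the Python feedparser library is installed.)

import feedparser

# Placeholder URL -- substitute the actual pipe/feed address
FEED_URL = "http://example.com/feed.rss"

# Parse the feed and count the entries it actually delivers.
# If this already stops at 100, neither RapidMiner nor MySQL is the limiter.
feed = feedparser.parse(FEED_URL)
print(len(feed.entries), "items returned by the feed itself")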

Answers

  • dudester (New Altair Community Member)
    Oops, my bad... nothing to do with either RapidMiner or MySQL.

    Apparently Yahoo Pipes limits the amount of data you can scrape at a time to 100 items. There is a workaround of sorts, but it's probably best to either use another online mashup, or perhaps a desktop variety, for later input into DM.

    From http://pipes.yqlblog.net/.

    RSS pagination.
    "Initial RSS output is now limited to the first 100 items. Each paginated page is limited to 100 items as well. To access each subsequent page add parameter &page=2…etc. to the pipe.run url to retrieve more items." 
