"Read CSV then read RSS feed using each row in the csv file"
montaqi
New Altair Community Member
I am currently working on a project in which I want to read RSS feeds from a list of RSS URLs. I built the following process, but it fails with an error and does not run through. Please help; I think it should be a simple problem, but I just can't figure it out.
The CSV file contains only five rows:
News
http://feeds.bbci.co.uk/news/rss.xml
http://feeds.bbci.co.uk/news/world/rss.xml
http://feeds.bbci.co.uk/news/uk/rss.xml
http://feeds.bbci.co.uk/news/business/rss.xml
My process XML is below:
<process version="5.1.006">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.1.006" expanded="true" name="Process">
    <process expanded="true" height="449" width="614">
      <operator activated="true" class="read_csv" compatibility="5.1.006" expanded="true" height="60" name="Read CSV" width="90" x="45" y="120">
        <parameter key="csv_file" value="C:\Documents and Settings\TU001YU\Desktop\RSSLoop.csv"/>
        <parameter key="column_separators" value=","/>
        <parameter key="skip_comments" value="true"/>
        <parameter key="first_row_as_names" value="false"/>
        <list key="annotations"/>
        <list key="data_set_meta_data_information">
          <parameter key="0" value="att1.true.polynominal.attribute"/>
        </list>
      </operator>
      <operator activated="true" class="loop_values" compatibility="5.1.006" expanded="true" height="94" name="Loop Values" width="90" x="246" y="120">
        <parameter key="attribute" value="att1"/>
        <process expanded="true" height="524" width="806">
          <operator activated="true" class="web:read_rss" compatibility="5.1.000" expanded="true" height="60" name="Read RSS Feed" width="90" x="120" y="32">
            <parameter key="url" value="%{loop_value}"/>
          </operator>
          <connect from_op="Read RSS Feed" from_port="output" to_port="out 1"/>
          <portSpacing port="source_example set" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
          <portSpacing port="sink_out 3" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Read CSV" from_port="output" to_op="Loop Values" to_port="example set"/>
      <connect from_op="Loop Values" from_port="out 1" to_port="result 1"/>
      <connect from_op="Loop Values" from_port="out 2" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>
Answers
Hi montaqi,
You have to make sure that the example set after "Read CSV" contains only valid URLs. In your case, the title of the column (News) might be included in the data. If you use the import wizard of the "Read CSV" operator, you can mark this row as the title row.
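To illustrate the point about valid URLs outside of RapidMiner, here is a minimal Python sketch of the same check. It is not part of the process above: the helper names `is_valid_url` and `read_feed_urls` are hypothetical, and the sample text simply mirrors the five rows quoted in the question.

```python
import csv
import io
from urllib.parse import urlparse


def is_valid_url(value):
    """Return True if value looks like an absolute http(s) URL."""
    parts = urlparse(value.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def read_feed_urls(csv_text):
    """Return first-column values that are valid URLs (hypothetical helper).

    Header text such as 'News' is dropped because it has no scheme or host.
    """
    reader = csv.reader(io.StringIO(csv_text))
    return [row[0].strip() for row in reader if row and is_valid_url(row[0])]


# Sample data copied from the question: one title row plus four feed URLs.
sample = """News
http://feeds.bbci.co.uk/news/rss.xml
http://feeds.bbci.co.uk/news/world/rss.xml
http://feeds.bbci.co.uk/news/uk/rss.xml
http://feeds.bbci.co.uk/news/business/rss.xml
"""

urls = read_feed_urls(sample)
print(urls)
```

With the title row filtered out, only the four feed URLs would be passed on to whatever reads the feeds; in the RapidMiner process the equivalent step is making sure "News" never reaches the loop as a value.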
But even after changing this, the process did not run for me either. I have never used it before, but the "Read RSS Feed" operator does not seem to work. Even in a process with a single operator of this type, the following error message is generated:
Jun 14, 2011 9:12:47 AM SEVERE: Process failed: Could not initialize class com.sun.syndication.feed.synd.SyndFeedImpl
Regards
Matthias