Read whole posts from an RSS feed with RM
dranammari
Hi all,
I want to use the Web Mining operator "Read RSS Feed" to read ALL the posts in a forum (Search Engine Optimization forum: http://www.webproworld.com/webmaster-forum/forums/9-Search-Engine-Optimization-Forum). The site itself gives the feed URL of this forum as: http://www.webproworld.com/webmaster-forum/external.php?type=RSS2&forumids=9
However, when I use this URL in the "Read RSS Feed" operator to get the posts, I face two problems:
1) I do NOT get all the posts that exist in the forum. I get only 15 posts (threads), though the forum contains many more than that. How can I get all the posts? Does RapidMiner support feed pagination, for example, and if so, how do I use it?
2) I do NOT get the full text of the first post in the content attribute of the dataset generated by RapidMiner. I get only part of the text, not the whole textual content. How can I use the operator to get the whole text in the content attribute?
Many thanks in advance!
P.S. The problem is not specific to the feed URL I provided, as it occurs with all the feed URLs I have tried so far.
Ahmad
Answers
-
Hi,
I'm using the "Read RSS Feed" operator too, and I had the same problem. I get only 10 posts from Google News.
http://www.google.com.br/search?hl=en&gl=us&tbm=nws&btnmeta_news_search=1&q=aviation&oq=aviation&aq=f&aqi=d1g7d-o1&aql=&gs_sm=e&gs_upl=4108l5389l0l6342l8l6l0l1l1l0l343l562l2-1.1l2l0
When I use the RSS URL, only the posts on page 1 are read.
How can I get more than 10?
-
I've also got a similar problem using the same operator. I have an RSS feed with about 150-200 entries, but the operator only pulls the first 100, repeatedly. Also, when I increase the batch size parameter, it seems to use all the memory (~1.5 GB) on my machine and then hangs. I don't know if this has been answered elsewhere, but I may post again.
-
See the earlier SOLVED post. My issue, at least, is with the limitations of Yahoo Pipes, not the RSS Feed Read operator of DM, or database connectivity.
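One way to check whether a limit like this comes from the feed source rather than from the operator is to look at the feed document itself: a reader can only return the items the server actually put into the XML. The Python sketch below is only an illustration, not part of RapidMiner. It assumes a plain RSS 2.0 layout, and the embedded sample feed, the example.com links, and the helper name summarize_feed are all hypothetical. It counts the items in a feed document and reports, for each one, the link to the full page and the length of the description, which in many feeds is only an excerpt of the post.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed standing in for what a server returns.
# In practice you would paste in the XML saved from the real feed URL.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example forum</title>
    <item>
      <title>First thread</title>
      <link>http://example.com/threads/1</link>
      <description>Start of the first post...</description>
    </item>
    <item>
      <title>Second thread</title>
      <link>http://example.com/threads/2</link>
      <description>Start of the second post...</description>
    </item>
  </channel>
</rss>"""


def summarize_feed(xml_text):
    """Return one dict per <item>: title, link, and description length."""
    root = ET.fromstring(xml_text)
    rows = []
    for item in root.iter("item"):
        description = item.findtext("description", default="")
        rows.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "description_chars": len(description),
        })
    return rows


rows = summarize_feed(SAMPLE_FEED)
print(len(rows), "items in the feed document")
for row in rows:
    print(row["title"], row["link"], row["description_chars"])
```

If the item count printed for the real feed matches what the operator returns (for example 15), the cap is probably set by the site or service that generates the feed. Likewise, if the descriptions are short, the full text is probably only available on the page behind each link.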