I have a list of URLs and data should be crawled only from those URLs using XPath

Question

Dear Team,

I am very much confused and stuck.

I have 1000 URLs and I need to extract data from these 1000 URLs.

I have stored the 1000 URLs in a CSV file.

I have also seen the tutorials at http://vancouverdata.blogspot.com/2011/04/rapidminer-web-crawling-rapid-miner-web.html and http://vancouverdata.blogspot.com/2011/04/web-scraping-rapidminer-xpath-web.html. They are excellent, but I am not sure where I am getting lost.

I have enabled all extensions.

Is there a video tutorial that explains the process of importing URLs and getting the data?

I must learn this and I am very interested. Please guide me.

I have been trying this for the past 2 days, but I am missing something.

alphabeto · Answer

Hi,
Can RapidMiner run an automated, regular (say daily) search for a list of words across a list of URLs, and return the link of each page where they occur?
I have a list of words and I want to regularly get every web link where any of these words appears, on any of the URLs in my predefined URL list.

E.g. word list: qwe, rty
URL list: www.asd.com, www.zxc.com

What is the process path to get, daily and automatically, each web link where the words "qwe" and/or "rty" appear on www.asd.com and/or www.zxc.com?

Many thanks
Dan

MariusHelf · Answer

Hi,

I am not sure where exactly you got stuck, but if your problem is accessing the URLs stored in your file in the first place, the Get Pages operator is for you. Just load your CSV file containing the URLs, then pass that data to Get Pages and specify in the link_attribute parameter which column contains the URLs.
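
If it helps to see the idea outside RapidMiner, here is a rough Python sketch of the same flow: read the URL column from a CSV, then fetch each page. This is only an illustration, not how Get Pages is implemented. The function names (load_urls, http_get, get_pages) and the column name "link" are made up for this example, and the XPath extraction would be a separate step afterwards.

```python
import csv
import io
import urllib.request


def load_urls(csv_text, link_column):
    """Return the non-empty values of `link_column` from CSV text with a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    urls = []
    for row in reader:
        value = (row.get(link_column) or "").strip()
        if value:
            urls.append(value)
    return urls


def http_get(url, timeout=10):
    """Download one page and return its body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def get_pages(urls, fetch=http_get):
    """Fetch every URL; a page that fails is stored as None so the run continues."""
    pages = {}
    for url in urls:
        try:
            pages[url] = fetch(url)
        except Exception as error:  # broad on purpose: one bad URL should not stop the run
            pages[url] = None
            print(f"could not fetch {url}: {error}")
    return pages


if __name__ == "__main__":
    # Hypothetical CSV content; in practice you would read your own file instead.
    sample = "name,link\nA,http://example.com/a\nB,\nC,http://example.com/c\n"
    print(load_urls(sample, "link"))
```

The fetch argument is there so the download step can be swapped out, for example with a stub while testing, without touching the CSV part.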

Best regards,
Marius