
"Web mining - crawling rules"

User: "gingernissan"
New Altair Community Member
Updated by Jocelyn
Hi, I am new to RapidMiner. I have a site I want to crawl and extract/download pages from. The pages I am interested in share a common URL format (http://items.mywebsite.ie/for-sale/laptops/3254621). The starting URL I am using is the site search page, which contains the links to the relevant pages (http://items.mywebsite.ie/find/for-sale/laptops/).
My overall goal is to pull a list of, say, 20 pages in this format. The number at the end is the page ID, but it is not specific to the laptop section; it is site-wide.

I have tried several variations of the store_with_matching_url and follow_link_with_matching_url rules, in an attempt to follow links containing the word "laptops" and then store the ones that have a 7-digit number at the end:

"http://items.mywebsitel.ie\for-sale\laptops\.+[0-9]"
'http://items.mywebsite.ie\for-sale\laptops\.+[0-9]'
(^)http://items.mywebsite.ie\for-sale\laptops\.+[0-9]($)
.+[0-9]
.+laptops.+
.+laptops.+|.+[0-9]
.[0-9][0-9][0-9][0-9][0-9][0-9][0-9]
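
A quick way to sanity-check patterns like these outside RapidMiner is a short script. The sketch below uses Python's `re` module as a stand-in for RapidMiner's Java regular expressions, and it assumes the crawling rules must match the whole URL (hence `re.fullmatch`). `STORE_RULE`, `FOLLOW_RULE`, and `check` are hypothetical names made up for this illustration, and the two rule patterns are only one possible guess at a fix, not a confirmed answer.

```python
import re

# The two sample URLs from the post.
ITEM_URL = "http://items.mywebsite.ie/for-sale/laptops/3254621"
SEARCH_URL = "http://items.mywebsite.ie/find/for-sale/laptops/"

# Hypothetical candidate rules (assumptions, not part of the original attempts):
# forward slashes as in the real URLs, escaped dots, exactly 7 trailing digits.
STORE_RULE = r"http://items\.mywebsite\.ie/for-sale/laptops/[0-9]{7}"
FOLLOW_RULE = r".+laptops.*"

# A few of the attempted patterns, copied as written.
ATTEMPTS = [
    r"http://items.mywebsite.ie\for-sale\laptops\.+[0-9]",
    r".+[0-9]",
    r".+laptops.+",
    r".[0-9][0-9][0-9][0-9][0-9][0-9][0-9]",
]


def check(pattern, url):
    """Return True/False for a whole-URL match, or None if the pattern is invalid."""
    try:
        return re.fullmatch(pattern, url) is not None
    except re.error:
        return None  # the pattern itself does not compile


if __name__ == "__main__":
    for pattern in ATTEMPTS + [STORE_RULE, FOLLOW_RULE]:
        print(pattern)
        print("  item page  :", check(pattern, ITEM_URL))
        print("  search page:", check(pattern, SEARCH_URL))
```

Two things stand out under these assumptions. The patterns written with backslashes do not compile at all, because `\l` is not a valid escape and the real URLs use forward slashes anyway. The seven-bracket pattern fails as a whole-URL match because the single leading `.` allows only one character before the digits.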

Can anyone help me out or point me in the right direction?

Any help would be greatly appreciated, Thanks
