Hi, I am new to RapidMiner. I have a site I want to crawl to extract/download pages. The pages I am interested in share a common URL format (http://items.mywebsite.ie/for-sale/laptops/3254621). The starting URL I am using is the site's search page, which contains the links to the relevant pages (http://items.mywebsite.ie/find/for-sale/laptops/).
My overall goal is to pull a list of, say, 20 pages in this format. The number is the page ID, but it is not specific to the laptop section; it is site-wide.
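As a rough sketch of that goal (not RapidMiner's own code), the Java below separates the two decisions a crawler makes: a loose follow pattern that keeps it inside the laptop listings, and a strict store pattern that accepts only item pages. It assumes the patterns are Java java.util.regex expressions that must match the whole URL, as Matcher.matches() requires; that is an assumption about the crawler, not something confirmed here. Because the ID is site-wide, the store pattern anchors on the /for-sale/laptops/ path rather than on the number alone. The pattern strings are suggestions, and the phones URL is a hypothetical example of another section.

```java
import java.util.List;
import java.util.regex.Pattern;

public class CrawlRules {
    // Suggested follow pattern: any URL containing "laptops" (search page and item pages).
    // The leading/trailing .* are needed because the whole URL must match.
    static final Pattern FOLLOW = Pattern.compile(".*laptops.*");

    // Suggested store pattern: forward slashes, escaped dots, and exactly
    // seven digits at the end of the laptops path.
    static final Pattern STORE =
            Pattern.compile("http://items\\.mywebsite\\.ie/for-sale/laptops/[0-9]{7}");

    static boolean follow(String url) {
        return FOLLOW.matcher(url).matches();
    }

    static boolean store(String url) {
        return STORE.matcher(url).matches();
    }

    public static void main(String[] args) {
        List<String> urls = List.of(
                "http://items.mywebsite.ie/find/for-sale/laptops/",   // search page
                "http://items.mywebsite.ie/for-sale/laptops/3254621", // item page
                "http://items.mywebsite.ie/for-sale/phones/3254622"   // hypothetical other section
        );
        for (String url : urls) {
            System.out.printf("%s follow=%b store=%b%n", url, follow(url), store(url));
        }
    }
}
```

If some IDs turn out to have more or fewer than seven digits, [0-9]+ in place of [0-9]{7} would be the looser choice.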
I have tried several variations of the store_with_matching_url and Follow_link_with_matching_url rules, trying to follow links containing the word "laptop" and then store only the ones that end in a 7-digit number. Patterns I have tried include:
"http://items.mywebsitel.ie\for-sale\laptops\.+[0-9]"
'http://items.mywebsite.ie\for-sale\laptops\.+[0-9]'
(^)http://items.mywebsite.ie\for-sale\laptops\.+[0-9]($)
.+[0-9]
.+laptops.+
.+laptops.+|.+[0-9]
.[0-9][0-9][0-9][0-9][0-9][0-9][0-9]
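Under the same assumptions (Java regex, whole-URL matching), the sketch below runs some of the attempts above against the example item URL; the quoted and (^)...($) variants are left out. Written with backslashes, the path separators become regex escapes: \f is read as a form feed and \l is not a valid escape, so Pattern.compile rejects the pattern outright (and \.+ would mean one or more literal dots). The digits-only pattern compiles, but it describes an 8-character string, so it can never match a full URL.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class DiagnosePatterns {
    static final String ITEM = "http://items.mywebsite.ie/for-sale/laptops/3254621";

    // Returns "invalid" if the pattern does not compile, otherwise whether
    // it matches the entire example URL.
    static String check(String regex) {
        try {
            return Pattern.compile(regex).matcher(ITEM).matches() ? "match" : "no match";
        } catch (PatternSyntaxException e) {
            return "invalid";
        }
    }

    public static void main(String[] args) {
        String[] attempts = {
            "http://items.mywebsite.ie\\for-sale\\laptops\\.+[0-9]", // backslashes, not slashes
            ".+laptops.+",
            ".+laptops.+|.+[0-9]",
            ".[0-9][0-9][0-9][0-9][0-9][0-9][0-9]"                 // only 8 characters long
        };
        for (String regex : attempts) {
            System.out.println(regex + " -> " + check(regex));
        }
    }
}
```

.+laptops.+ does match, but it also matches the search page URL, so as a store rule it would save the search page along with the items; .+laptops.+|.+[0-9] additionally accepts any URL ending in a digit. If the operator instead searched for the pattern inside the URL (like Matcher.find()), the digits-only pattern would behave differently.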
Can anyone help me out or point me in the right direction?
Any help would be greatly appreciated. Thanks!