Hi,
I'm just getting started with data mining and am having trouble with the URLs.
I'm trying to scrape Google Shopping for prices. The URL I have is:
http://www.google.com/products/catalog?q=036725235182&hl=en&cid=6520112632679181641&os=sellers
In the crawl rules I have:
store_with_matching_url = .+seller.+
follow_links_with_matching_url = .+start.+
I have two problems: 1) the first page is not stored (I get an error saying the URL does not have the filter results in it), and 2) the crawler does not follow the link.
I'm not sure how to fix this.
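To narrow it down, the two patterns can be checked against the start URL outside the crawler. This is only a minimal sketch in Python, and it assumes the crawler treats each rule as a regular expression that must match the whole URL; I have not confirmed that this is how the rules are actually applied. The helper name rule_matches is made up for the example.

```python
import re

# Start URL from the post (the stray ";" after the first "&" is dropped here).
START_URL = (
    "http://www.google.com/products/catalog"
    "?q=036725235182&hl=en&cid=6520112632679181641&os=sellers"
)

# The two crawl rules, copied as written above.
STORE_RULE = r".+seller.+"
FOLLOW_RULE = r".+start.+"


def rule_matches(rule: str, url: str) -> bool:
    """Return True if the rule matches the entire URL.

    Whole-URL matching is an assumption about the crawler, not a known fact.
    """
    return re.fullmatch(rule, url) is not None


if __name__ == "__main__":
    print("store rule matches start URL: ", rule_matches(STORE_RULE, START_URL))
    print("follow rule matches start URL:", rule_matches(FOLLOW_RULE, START_URL))
```

Under that assumption the store rule does match the start URL ("sellers" contains "seller" plus one more character), while the follow rule does not, because "start" appears nowhere in it. The second result may be harmless if the follow rule is only applied to links found on the page, but it does suggest checking whether any of those links actually contain "start".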
Also, is there a way to pull the q=036725235182 value from my database? The number is the UPC of one of my products. Ultimately I would like to query 10,000+ records and crawl all the UPCs in the database one at a time. If anyone knows of some examples to get my project off the ground, I would much appreciate it.
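To make the question concrete, this is roughly what I have in mind, sketched in Python with an in-memory SQLite database standing in for my real one. The table name products, the column name upc, and the helper names are all hypothetical, and the second UPC is a placeholder. The sketch also leaves out the cid parameter, since it looks specific to one product; whether the catalog URL works without it is an assumption I have not verified.

```python
import sqlite3
from typing import Iterator
from urllib.parse import urlencode

# Base path taken from the URL in the post.
BASE_URL = "http://www.google.com/products/catalog"


def build_url(upc: str) -> str:
    """Build a start URL for one UPC (cid omitted; see the assumption above)."""
    params = {"q": upc, "hl": "en", "os": "sellers"}
    return f"{BASE_URL}?{urlencode(params)}"


def iter_start_urls(conn: sqlite3.Connection) -> Iterator[str]:
    """Yield one start URL per UPC, reading rows lazily from the cursor."""
    cursor = conn.execute("SELECT upc FROM products ORDER BY upc")
    for (upc,) in cursor:
        yield build_url(upc)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # UPCs are stored as TEXT so that leading zeros are preserved.
    conn.execute("CREATE TABLE products (upc TEXT PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO products (upc) VALUES (?)",
        [("036725235182",), ("012345678905",)],  # second value is a placeholder
    )
    for url in iter_start_urls(conn):
        print(url)
```

Each generated URL could then be handed to the crawler as a start URL, one at a time, instead of hard-coding a single UPC.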
Thanks in advance,
Chuck