RapidMiner Web Mining, Crawl Web: what do the crawling rules mean?
I used RapidMiner in my MBA program, but it's been almost three years since I last touched it. I just started a position where I'll be using it again, and I'm a bit rusty. I'm trying to scrape a site for some data (names, phone numbers, addresses, etc.) and put it into an Excel file, but I can't figure out the parameters. I think my main issue is understanding the crawling rules. What do they mean? Which ones should I be applying? I've Googled this and searched here, but I only find instructions specific to other users' questions. Can anyone explain what each rule is and what it does?