Crawl web operator does not return any results

mertcatar New Altair Community Member
edited November 5 in Community Q&A
Hi, I have a problem with the Crawl Web operator: it doesn't return any results. I tried the latest RapidMiner and then installed 7.1, but nothing changed and I still get an empty result page, even though the URL I tried is not an https one. Could you help me, please? What am I doing wrong here?




Best Answer

  • kayman
    kayman New Altair Community Member
    Answer ✓
    You didn't define any crawling rules, so your operator is effectively doing nothing, even if it reads the content. If you do not state which patterns to follow and which of the matching pages to store, the system just crawls aimlessly.

    Try the Get Page operator first and see whether you get any result. It loads the actual URL and returns the data, so you can validate the connection before you start crawling.

    Next, define which pages to crawl and store (patterns) with the Crawl Web operator, or use the Get Pages operator and provide a list of URLs.

    The list approach is typically a bit more reliable, as web structures can be pretty complex for 'blind' crawling, and quite a few sites will just kick you off their server if you crawl too obviously.
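
To make the idea of follow and store rules concrete, here is a minimal Python sketch. It is not how RapidMiner implements the Crawl Web operator; the `SITE` link map, the `links_of` helper, the `crawl` function, and the example URLs are all hypothetical, and the "site" is an in-memory dictionary so nothing is fetched over the network. It only illustrates why a crawl without rules comes back empty.

```python
import re
from collections import deque

# Hypothetical in-memory "site": each URL maps to the links found on that page.
SITE = {
    "http://example.com/": ["http://example.com/blog/", "http://example.com/about"],
    "http://example.com/blog/": [
        "http://example.com/blog/post-1",
        "http://example.com/blog/post-2",
        "http://other.example.org/",
    ],
    "http://example.com/blog/post-1": [],
    "http://example.com/blog/post-2": ["http://example.com/blog/post-1"],
    "http://example.com/about": [],
}


def links_of(url):
    """Stand-in for fetching a page and extracting its links."""
    return SITE.get(url, [])


def crawl(start_url, get_links, follow_rules, store_rules, max_pages=50):
    """Breadth-first crawl: only follow links matching a follow rule,
    and only keep pages whose URL matches a store rule."""
    seen = {start_url}
    queue = deque([start_url])
    stored = []
    visited = 0
    while queue and visited < max_pages:
        url = queue.popleft()
        visited += 1
        # Store the page only if some store rule matches the whole URL.
        if any(re.fullmatch(rule, url) for rule in store_rules):
            stored.append(url)
        # Queue a link only if it is new and some follow rule matches it.
        for link in get_links(url):
            if link not in seen and any(re.fullmatch(rule, link) for rule in follow_rules):
                seen.add(link)
                queue.append(link)
    return stored


FOLLOW = [r"http://example\.com/blog/.*"]
STORE = [r"http://example\.com/blog/post-\d+"]

no_rules = crawl("http://example.com/", links_of, [], [])
with_rules = crawl("http://example.com/", links_of, FOLLOW, STORE)
```

With empty rule lists the start page is read but nothing is followed or stored, so `no_rules` is an empty list, which mirrors the empty result page in the question. With the two patterns, `with_rules` contains only the two blog post URLs. If you already know the URLs you want, you can skip the crawl entirely and process that list directly, which is the Get Pages route mentioned above.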
