
"[Solved] Crawling rules"

User: "ArnoG"
New Altair Community Member
Updated by Jocelyn
I'm trying to crawl a hotel booking site to collect the reviews. An example URL is: http://www.tripadvisor.nl/Hotel_Review-g188590-d2333086-Reviews-EasyHotel_Amsterdam-Amsterdam_North_Holland_Province.html#REVIEWS

I use the Crawl Web operator, but I don't get any output.

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.008">
 <context>
   <input/>
   <output/>
   <macros/>
 </context>
 <operator activated="true" class="process" compatibility="5.3.008" expanded="true" name="Process">
   <process expanded="true">
     <operator activated="true" class="web:crawl_web" compatibility="5.3.000" expanded="true" height="60" name="Crawl Web" width="90" x="112" y="75">
       <parameter key="url" value="http://www.tripadvisor.nl/Hotel_Review-g188590-d2333086-Reviews-EasyHotel_Amsterdam-Amsterdam_North_Holland_Province.html#REVIEWS"/>
       <list key="crawling_rules">
         <parameter key="store_with_matching_url" value=".+Reviews-EasyHotel_Amsterdam-Amsterdam_North_Holland.+"/>
         <parameter key="follow_link_with_matching_url" value=".+Reviews-or10.+"/>
       </list>
       <parameter key="output_dir" value="C:\Improve Your Business\Qing\Pilot\test\crawl"/>
       <parameter key="extension" value="html"/>
     </operator>
     <connect from_op="Crawl Web" from_port="Example Set" to_port="result 1"/>
     <portSpacing port="source_input 1" spacing="0"/>
     <portSpacing port="sink_result 1" spacing="0"/>
     <portSpacing port="sink_result 2" spacing="0"/>
   </process>
 </operator>
</process>


Can anybody tell me what I'm doing wrong?

Thanks, Arno
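
One way to sanity-check the two crawling rules outside RapidMiner is to test the patterns against candidate URLs. The sketch below is only an illustration: it assumes the rules must match the whole URL (hence `re.fullmatch`), and `PAGE_URL` is a hypothetical second-page address made up for the test, not one taken from the site. The helper names `rule_matches` and `check_rules` are likewise invented here.

```python
import re

# Patterns copied from the crawling_rules list in the process above.
STORE_RULE = r".+Reviews-EasyHotel_Amsterdam-Amsterdam_North_Holland.+"
FOLLOW_RULE = r".+Reviews-or10.+"

# Start URL from the "url" parameter of the Crawl Web operator.
START_URL = (
    "http://www.tripadvisor.nl/Hotel_Review-g188590-d2333086-Reviews-"
    "EasyHotel_Amsterdam-Amsterdam_North_Holland_Province.html#REVIEWS"
)

# Hypothetical paginated URL, assumed for illustration only.
PAGE_URL = (
    "http://www.tripadvisor.nl/Hotel_Review-g188590-d2333086-Reviews-or10-"
    "EasyHotel_Amsterdam-Amsterdam_North_Holland_Province.html"
)


def rule_matches(pattern, url):
    """Return True if the pattern matches the entire URL (assumed semantics)."""
    return re.fullmatch(pattern, url) is not None


def check_rules(urls):
    """Report, per URL, whether the store rule and the follow rule match."""
    return {
        url: {
            "store": rule_matches(STORE_RULE, url),
            "follow": rule_matches(FOLLOW_RULE, url),
        }
        for url in urls
    }


if __name__ == "__main__":
    for url, result in check_rules([START_URL, PAGE_URL]).items():
        print(result, url)
```

Under these assumptions, a URL with `-or10-` inserted after `Reviews` satisfies the follow rule but not the store rule, so it may be worth checking whether the two patterns ever match the same page.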
