[SOLVED] Crawl Web not producing any results!

stringer_bell New Altair Community Member
edited November 5 in Community Q&A
I am trying to crawl and save every box score linked from http://www.pro-football-reference.com/years/2007/games.htm

The process produces no results. It starts and finishes in 0 s. Here is the process XML:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.008">
 <context>
   <input/>
   <output/>
   <macros/>
 </context>
 <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
   <process expanded="true" height="190" width="279">
     <operator activated="true" class="web:crawl_web" compatibility="5.2.003" expanded="true" height="60" name="Crawl Web" width="90" x="179" y="75">
       <parameter key="url" value="http://www.pro-football-reference.com/years/2007/games.htm"/>
       <list key="crawling_rules">
         <parameter key="follow_link_with_matching_url" value=".*boxscores/2007.*"/>
         <parameter key="store_with_matching_url" value=".*boxscores/2007.*"/>
       </list>
       <parameter key="output_dir" value="C:\Users\Stringer Bell\Desktop\scrape"/>
       <parameter key="extension" value="html"/>
       <parameter key="max_depth" value="3"/>
       <parameter key="obey_robot_exclusion" value="false"/>
       <parameter key="really_ignore_exclusion" value="true"/>
     </operator>
     <connect from_op="Crawl Web" from_port="Example Set" to_port="result 1"/>
     <portSpacing port="source_input 1" spacing="0"/>
     <portSpacing port="sink_result 1" spacing="0"/>
     <portSpacing port="sink_result 2" spacing="0"/>
   </process>
 </operator>
</process>

If anyone can help it would be appreciated. I have spent hours on this and cannot figure it out.

Answers

  • MariusHelf, New Altair Community Member
    Hi, you have to increase the max_page_size parameter of the Crawl Web operator.

    Best, Marius
  • stringer_bell, New Altair Community Member
    Thank you, Marius! :)
  • Soumitra, New Altair Community Member

    Hi Marius, I am facing the same issue. I tried running the process shared by stringer_bell, but had no luck.
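
For reference, Marius's suggestion could look roughly like the operator below. This is only a sketch: it is the Crawl Web operator from the original post with one added max_page_size parameter. The value 10000 is an illustrative assumption, not a figure given in this thread, and it assumes the parameter is measured in kilobytes. The likely reasoning is that pages larger than the limit are skipped by the crawler, so a start page above the limit would yield no links to follow and an empty result.

```xml
<operator activated="true" class="web:crawl_web" compatibility="5.2.003" expanded="true" height="60" name="Crawl Web" width="90" x="179" y="75">
  <parameter key="url" value="http://www.pro-football-reference.com/years/2007/games.htm"/>
  <list key="crawling_rules">
    <parameter key="follow_link_with_matching_url" value=".*boxscores/2007.*"/>
    <parameter key="store_with_matching_url" value=".*boxscores/2007.*"/>
  </list>
  <parameter key="output_dir" value="C:\Users\Stringer Bell\Desktop\scrape"/>
  <parameter key="extension" value="html"/>
  <parameter key="max_depth" value="3"/>
  <!-- Added line. The value 10000 is an assumed, illustrative limit; tune it to the size of the pages being crawled. -->
  <parameter key="max_page_size" value="10000"/>
  <parameter key="obey_robot_exclusion" value="false"/>
  <parameter key="really_ignore_exclusion" value="true"/>
</operator>
```

If the process still returns nothing after this change, it may be worth checking that the output directory exists and is writable, and that the start URL is still reachable, since neither is confirmed in this thread.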