[SOLVED] Crawl Web not producing any results!
stringer_bell
New Altair Community Member
I am trying to crawl and save every boxscore linked from http://www.pro-football-reference.com/years/2007/games.htm
The process produces no results: it starts and finishes in 0 s.
If anyone can help it would be appreciated. I have spent hours on this and cannot figure it out.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.008">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
    <process expanded="true" height="190" width="279">
      <operator activated="true" class="web:crawl_web" compatibility="5.2.003" expanded="true" height="60" name="Crawl Web" width="90" x="179" y="75">
        <parameter key="url" value="http://www.pro-football-reference.com/years/2007/games.htm"/>
        <list key="crawling_rules">
          <parameter key="follow_link_with_matching_url" value=".*boxscores/2007.*"/>
          <parameter key="store_with_matching_url" value=".*boxscores/2007.*"/>
        </list>
        <parameter key="output_dir" value="C:\Users\Stringer Bell\Desktop\scrape"/>
        <parameter key="extension" value="html"/>
        <parameter key="max_depth" value="3"/>
        <parameter key="obey_robot_exclusion" value="false"/>
        <parameter key="really_ignore_exclusion" value="true"/>
      </operator>
      <connect from_op="Crawl Web" from_port="Example Set" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
Answers
-
Hi, you have to increase the max_page_size.
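As a rough sketch, the parameter would sit alongside the other Crawl Web settings in the process XML. The value 10000 below is only an assumed placeholder, not a tested setting; check the operator's documentation for the unit and choose a limit large enough for the pages you want to store.

```xml
<operator activated="true" class="web:crawl_web" compatibility="5.2.003" expanded="true" height="60" name="Crawl Web" width="90" x="179" y="75">
  <parameter key="url" value="http://www.pro-football-reference.com/years/2007/games.htm"/>
  <!-- Illustrative value only (an assumption, not taken from the thread) -->
  <parameter key="max_page_size" value="10000"/>
  <!-- ... remaining parameters unchanged from the original process ... -->
</operator>
```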
Best, Marius
-
Thank you Marius!
-
Hi Marius, I am facing the same issue. I tried running the code shared by stringer_bell, but had no luck.