Web crawling on a Google page
Juju147
New Altair Community Member
Hi everyone,
I have a question about the Crawl Web operator.
I am trying to use it on a Google search results page, but unfortunately I cannot reach the links returned by the search.
For example, my Google page is: https://www.google.fr/search?q=F&oq=f&aqs=chrome.4.69i60l3j69i59l3.2352j0j8&sourceid=chrome&espv=210&es_sm=93&ie=UTF-8
This is my process:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.013">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.3.013" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="web:crawl_web" compatibility="5.3.001" expanded="true" height="60" name="Crawl Web" width="90" x="45" y="30">
        <parameter key="url" value="https://www.google.fr/search?q=F&amp;oq=f&amp;aqs=chrome.4.69i60l3j69i59l3.2352j0j8&amp;sourceid=chrome&amp;espv=210&amp;es_sm=93&amp;ie=UTF-8"/>
        <list key="crawling_rules">
          <parameter key="store_with_matching_url" value=".+facebook.+"/>
          <parameter key="follow_link_with_matching_url" value=".+facebook.+"/>
        </list>
        <parameter key="output_dir" value="C:\Users\Julien\Desktop\S5\WEBMINING"/>
        <parameter key="max_pages" value="100"/>
        <parameter key="max_depth" value="1"/>
        <parameter key="delay" value="500"/>
        <parameter key="max_threads" value="10"/>
      </operator>
      <connect from_op="Crawl Web" from_port="Example Set" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
I am trying to reach the Facebook link, but it doesn't work.
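To make the intent of the two crawling rules concrete, here is a small Python sketch of what the pattern .+facebook.+ accepts. It is only an illustration and not part of the RapidMiner process: it assumes the rule has to match the whole URL, the function name matches_rule is made up, and the example URLs are hypothetical.

```python
import re

# Pattern copied from the store/follow crawling rules in the process.
RULE = re.compile(r".+facebook.+")


def matches_rule(url: str) -> bool:
    """Return True if the whole URL matches the rule.

    Assumption: the crawler applies the pattern to the complete URL
    (full-match semantics), not as a substring search.
    """
    return RULE.fullmatch(url) is not None


# Hypothetical URLs, for illustration only.
candidates = [
    "https://www.facebook.com/",
    "https://www.google.fr/url?q=https://www.facebook.com/",
    "https://www.google.fr/search?q=F",
    "facebook.com",
]

for url in candidates:
    print(matches_rule(url), url)
```

Under that assumption, a URL only passes if at least one character comes before and after "facebook", so a bare "facebook.com" would not match, and the Google search page itself would not match either.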
Can you help me?
Sincerely,
Ju