Web crawling on a Google page

Juju147

Hi everyone,

I have a question about the Crawl Web operator.

I am trying to use it on a Google search results page, but unfortunately I cannot reach the links provided by the search.

For example, my Google page is: https://www.google.fr/search?q=F&oq=f&aqs=chrome.4.69i60l3j69i59l3.2352j0j8&sourceid=chrome&espv=210&es_sm=93&ie=UTF-8

This is my process:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.013">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.3.013" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="web:crawl_web" compatibility="5.3.001" expanded="true" height="60" name="Crawl Web" width="90" x="45" y="30">
        <parameter key="url" value="https://www.google.fr/search?q=F&amp;oq=f&amp;aqs=chrome.4.69i60l3j69i59l3.2352j0j8&amp;sourceid=chrome&amp;espv=210&amp;es_sm=93&amp;ie=UTF-8"/>
        <list key="crawling_rules">
          <parameter key="store_with_matching_url" value=".+facebook.+"/>
          <parameter key="follow_link_with_matching_url" value=".+facebook.+"/>
        </list>
        <parameter key="output_dir" value="C:\Users\Julien\Desktop\S5\WEBMINING"/>
        <parameter key="max_pages" value="100"/>
        <parameter key="max_depth" value="1"/>
        <parameter key="delay" value="500"/>
        <parameter key="max_threads" value="10"/>
      </operator>
      <connect from_op="Crawl Web" from_port="Example Set" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

I am trying to reach the Facebook link in the search results, but it doesn't work.

Can you help me?

Sincerely,
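
To show what I expect the two crawling rules to select, here is a minimal Python sketch. It rests on an assumption that I have not confirmed in the operator documentation: that each rule is a regular expression which must match the entire URL. The helper name matches_rule and the sample URLs are made up for illustration only; they are not taken from a real crawl.

```python
import re

# Same pattern as the store_with_matching_url and
# follow_link_with_matching_url rules in the process above.
RULE = re.compile(r".+facebook.+")


def matches_rule(url: str) -> bool:
    """Return True if the whole URL matches the rule.

    Full-match semantics are an assumption about how the
    crawler applies the rule, not confirmed behaviour.
    """
    return RULE.fullmatch(url) is not None


# Hypothetical sample links, for illustration only.
sample_urls = [
    "https://www.facebook.com/somepage",
    "https://www.google.fr/url?q=https://www.facebook.com/somepage",
    "https://www.google.fr/search?q=F",
    "facebook.com",
]

for url in sample_urls:
    print(url, "->", matches_rule(url))
```

Under that assumption, the pattern itself would accept any link that contains "facebook" with at least one character on each side, so the rule does not look like the reason the Facebook link is never reached.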

Ju