Problem with the extension operator "Get Pages"

User: "jhiller"
New Altair Community Member
Updated by Jocelyn

Hi,

I have a problem with the "Get Pages" operator from the Web Mining Extension.

It seems that the operator has an encoding problem with non-ASCII UTF-8 characters such as "Ü".

In Mozilla Firefox, calling the URL "https://itunes.apple.com/search?term="Google Übersetzer"&entity=software&country=de&media=software&limit=5" returns a JSON response with results.

Calling the same URL via the "Get Pages" operator also returns a JSON response, but without any search results.

This is my test process:

<?xml version="1.0" encoding="UTF-8"?><process version="7.5.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.5.001" expanded="true" name="Process">
<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="2001"/>
<parameter key="send_mail" value="never"/>
<parameter key="notification_email" value=""/>
<parameter key="process_duration_for_mail" value="30"/>
<parameter key="encoding" value="SYSTEM"/>
<process expanded="true">
<operator activated="true" class="generate_data" compatibility="7.5.001" expanded="true" height="68" name="Generate Data" width="90" x="45" y="34">
<parameter key="target_function" value="random"/>
<parameter key="number_examples" value="1"/>
<parameter key="number_of_attributes" value="1"/>
<parameter key="attributes_lower_bound" value="-10.0"/>
<parameter key="attributes_upper_bound" value="10.0"/>
<parameter key="gaussian_standard_deviation" value="10.0"/>
<parameter key="largest_radius" value="10.0"/>
<parameter key="use_local_random_seed" value="false"/>
<parameter key="local_random_seed" value="1992"/>
<parameter key="datamanagement" value="double_array"/>
<parameter key="data_management" value="auto"/>
</operator>
<operator activated="true" class="generate_attributes" compatibility="7.5.001" expanded="true" height="82" name="Generate Attributes" width="90" x="179" y="34">
<list key="function_descriptions">
<parameter key="att1" value="&quot;https://itunes.apple.com/search?term=\&quot;Google Übersetzer\&quot;&amp;entity=software&amp;country=de&amp;media=software&amp;limit=5&quot;"/>
</list>
<parameter key="keep_all" value="true"/>
</operator>
<operator activated="true" class="web:retrieve_webpages" compatibility="7.3.000" expanded="true" height="68" name="getPage" width="90" x="313" y="34">
<parameter key="link_attribute" value="att1"/>
<parameter key="page_attribute" value="html"/>
<parameter key="random_user_agent" value="false"/>
<parameter key="user_agent" value="Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0"/>
<parameter key="connection_timeout" value="2000"/>
<parameter key="read_timeout" value="2000"/>
<parameter key="follow_redirects" value="true"/>
<parameter key="accept_cookies" value="none"/>
<parameter key="cookie_scope" value="global"/>
<parameter key="request_method" value="POST"/>
<parameter key="delay" value="random"/>
<parameter key="delay_amount" value="5000"/>
<parameter key="min_delay_amount" value="2000"/>
<parameter key="max_delay_amount" value="5000"/>
</operator>
<connect from_op="Generate Data" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
<connect from_op="Generate Attributes" from_port="example set output" to_op="getPage" to_port="Example Set"/>
<connect from_op="getPage" from_port="Example Set" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>

Can you reproduce the issue? Do you think this is a bug in the operator, or do I have to escape the URL myself, and if so, how?
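To make clear what I mean by escaping: the sketch below percent-encodes the query values as UTF-8, so the quotes, the space and the "Ü" become plain ASCII. It is written in Python only for illustration and is not part of the RapidMiner process; the helper name `build_search_url` is my own invention. I have not confirmed that passing such a URL to "Get Pages" changes the result.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://itunes.apple.com/search"


def build_search_url(term: str) -> str:
    """Return the search URL with every query value percent-encoded as UTF-8.

    Hypothetical helper for illustration only.
    """
    params = {
        "term": f'"{term}"',  # the search term is wrapped in double quotes, as in the original URL
        "entity": "software",
        "country": "de",
        "media": "software",
        "limit": "5",
    }
    # quote_via=quote encodes spaces as %20 instead of '+'
    return f"{BASE_URL}?{urlencode(params, quote_via=quote)}"


url = build_search_url("Google Übersetzer")
print(url)
```

The resulting string contains only ASCII characters and could be pasted into the "att1" expression of the Generate Attributes operator in place of the raw URL.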

 

Regards

Johannes
