Enrich data from Web Service - Xpath Access

User: "kludikovsky"
New Altair Community Member
Updated by Jocelyn

Simple question:

What's wrong with this XPath?

Now a little more information:

I am trying to add information to already available data, so the 'Enrich Data by Webservice' operator seemed the proper tool.

But I can't get the data I am looking for.

As far as I have found out, the XPath does not work as expected. (This might have to do with my understanding of XPath ;-) )

Therefore I created a test process, which is attached below.

It contains four slightly different groups of test cases:

  • test_1..3
  • test_4..6
  • head_1..4
  • html

My question:

Why do only some cases deliver data while others do not, especially those that address elements directly?

 

Any solutions or hints are welcome.

 

<?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="2001"/>
<parameter key="send_mail" value="never"/>
<parameter key="notification_email" value=""/>
<parameter key="process_duration_for_mail" value="30"/>
<parameter key="encoding" value="SYSTEM"/>
<process expanded="true">
<operator activated="true" class="operator_toolbox:create_exampleset_from_doc" compatibility="0.5.000" expanded="true" height="68" name="Create ExampleSet" width="90" x="112" y="85">
<parameter key="Column Separator" value=","/>
<parameter key="Input Csv" value="a&#10;1"/>
</operator>
<operator activated="true" class="web:enrich_data_by_webservice" compatibility="7.3.000" expanded="true" height="68" name="Enrich Data by Webservice" width="90" x="380" y="85">
<parameter key="query_type" value="XPath"/>
<list key="string_machting_queries"/>
<parameter key="attribute_type" value="Nominal"/>
<list key="regular_expression_queries"/>
<list key="regular_region_queries"/>
<list key="xpath_queries">
<parameter key="test_1" value="//*[@id=&quot;main-container&quot;]//*[@class=&quot;result-content&quot;]//*[@class=&quot;address&quot;]"/>
<parameter key="test_2" value="//*[@id=&quot;main-container&quot;]//*[@class=&quot;result-content&quot;]/div"/>
<parameter key="test_3" value="//*[@id=&quot;main-container&quot;]//*[@class=&quot;result-content&quot;]/div[1]"/>
<parameter key="test_4" value="//*[@id=&quot;main-container&quot;]//*[@class=&quot;result-content&quot;]//*[@itemprop=&quot;url&quot;]"/>
<parameter key="test_5" value="//*[@id=&quot;main-container&quot;]//*[@class=&quot;result-content&quot;]/a"/>
<parameter key="test_6" value="//*[@id=&quot;main-container&quot;]//*[@class=&quot;result-content&quot;]/a[1]"/>
<parameter key="head_1" value="//html"/>
<parameter key="head_2" value="//head"/>
<parameter key="head_3" value="//*/head"/>
<parameter key="head_4" value="//*"/>
<parameter key="html" value="html"/>
</list>
<list key="namespaces"/>
<parameter key="ignore_CDATA" value="true"/>
<parameter key="assume_html" value="true"/>
<list key="index_queries"/>
<list key="jsonpath_queries"/>
<parameter key="request_method" value="GET"/>
<parameter key="url" value="http://www.firmenabc.at/result.aspx?what=haniger+benesch+versicherungs+makler+gmbh&amp;where=&amp;exact=false&amp;inTitleOnly=false&amp;l=&amp;si=0&amp;iid=&amp;sid=-1&amp;did=&amp;cc="/>
<parameter key="delay" value="500"/>
<list key="request_properties"/>
<parameter key="encoding" value="UTF-8"/>
<description align="center" color="transparent" colored="false" width="126">Get the data from FirmenABC</description>
</operator>
<connect from_op="Create ExampleSet" from_port="output" to_op="Enrich Data by Webservice" to_port="Example Set"/>
<connect from_op="Enrich Data by Webservice" from_port="ExampleSet" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>

    User: "kludikovsky"
    New Altair Community Member
    OP
    Accepted Answer

    After several days of experimentation and searching the web:

     

    There are two reasons why this does not work properly:

     

    1) HTTP 301: page moved

    RM does not follow moved pages. If the page you request responds with HTTP 301, a browser forwards you to the new location, but RM does not.

    Found that thanks to @sgenzer here: http://community.rapidminer.com/t5/RapidMiner-Studio-Forum/Open-File-not-returning-data-from-url/m-p/41351#M28008

     

     

    2) 'h:' namespace prefix required

    All HTML tags need to be prefixed with the 'h:' namespace prefix. Even though the HTML namespace is set by default and does not need to be specified in the namespace definition, the prefix still needs to be specified in the XPath queries.

    (It might be an improvement idea for this operator to apply the 'h:' namespace as a preset, so that XPaths copied from browsers can be used without any modification.)
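
    To illustrate point 2, here is a rough, untested sketch of how some of the queries from the test process might look with the prefix added. It assumes that 'h:' is bound to the XHTML namespace by default, as described above, and that comments are tolerated inside the query list. Only named element steps (div, a, html, head) get the prefix. Wildcard steps (*) and attribute tests (@id, @class) stay unchanged, because in XPath 1.0 a wildcard matches elements in any namespace and unprefixed attributes are in no namespace. That would also fit the observation that mainly the queries addressing elements directly came back empty.

    ```xml
    <list key="xpath_queries">
      <!-- wildcards and attribute tests only: no prefix needed -->
      <parameter key="test_1" value="//*[@id=&quot;main-container&quot;]//*[@class=&quot;result-content&quot;]//*[@class=&quot;address&quot;]"/>
      <!-- named element steps: add the h: prefix -->
      <parameter key="test_2" value="//*[@id=&quot;main-container&quot;]//*[@class=&quot;result-content&quot;]/h:div"/>
      <parameter key="test_5" value="//*[@id=&quot;main-container&quot;]//*[@class=&quot;result-content&quot;]/h:a"/>
      <parameter key="head_1" value="//h:html"/>
      <parameter key="head_2" value="//h:head"/>
      <parameter key="head_3" value="//*/h:head"/>
    </list>
    ```

    None of these queries can return data while the URL still answers with HTTP 301, so the redirect from point 1 has to be resolved first.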

    Found this thanks to a small note here