Enrich data from Web Service - Xpath Access

User: "kludikovsky"
New Altair Community Member
Updated by Jocelyn

Simple question:

What's wrong with this XPath?

Now a little more information:

I am trying to add information to data I already have, so the 'Enrich Data by Webservice' operator seemed like the right tool.

But I can't get the data I am looking for.

As far as I can tell, the XPath queries do not work as expected. (This might have to do with my understanding of XPath ;-) )

So I created a test process, which is attached below.

It contains four slightly different groups of test queries:

  • test_1..3
  • test_4..6
  • head_1..4
  • html

My question:

Why do only some of the queries return data while others do not? This is especially puzzling for the queries that address elements directly.

 

Any solutions or hints are welcome.

 

<?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="2001"/>
<parameter key="send_mail" value="never"/>
<parameter key="notification_email" value=""/>
<parameter key="process_duration_for_mail" value="30"/>
<parameter key="encoding" value="SYSTEM"/>
<process expanded="true">
<operator activated="true" class="operator_toolbox:create_exampleset_from_doc" compatibility="0.5.000" expanded="true" height="68" name="Create ExampleSet" width="90" x="112" y="85">
<parameter key="Column Separator" value=","/>
<parameter key="Input Csv" value="a&#10;1"/>
</operator>
<operator activated="true" class="web:enrich_data_by_webservice" compatibility="7.3.000" expanded="true" height="68" name="Enrich Data by Webservice" width="90" x="380" y="85">
<parameter key="query_type" value="XPath"/>
<list key="string_machting_queries"/>
<parameter key="attribute_type" value="Nominal"/>
<list key="regular_expression_queries"/>
<list key="regular_region_queries"/>
<list key="xpath_queries">
<parameter key="test_1" value="//*[@id=&quot;main-container&quot;]//*[@class=&quot;result-content&quot;]//*[@class=&quot;address&quot;]"/>
<parameter key="test_2" value="//*[@id=&quot;main-container&quot;]//*[@class=&quot;result-content&quot;]/div"/>
<parameter key="test_3" value="//*[@id=&quot;main-container&quot;]//*[@class=&quot;result-content&quot;]/div[1]"/>
<parameter key="test_4" value="//*[@id=&quot;main-container&quot;]//*[@class=&quot;result-content&quot;]//*[@itemprop=&quot;url&quot;]"/>
<parameter key="test_5" value="//*[@id=&quot;main-container&quot;]//*[@class=&quot;result-content&quot;]/a"/>
<parameter key="test_6" value="//*[@id=&quot;main-container&quot;]//*[@class=&quot;result-content&quot;]/a[1]"/>
<parameter key="head_1" value="//html"/>
<parameter key="head_2" value="//head"/>
<parameter key="head_3" value="//*/head"/>
<parameter key="head_4" value="//*"/>
<parameter key="html" value="html"/>
</list>
<list key="namespaces"/>
<parameter key="ignore_CDATA" value="true"/>
<parameter key="assume_html" value="true"/>
<list key="index_queries"/>
<list key="jsonpath_queries"/>
<parameter key="request_method" value="GET"/>
<parameter key="url" value="http://www.firmenabc.at/result.aspx?what=haniger+benesch+versicherungs+makler+gmbh&amp;where=&amp;exact=false&amp;inTitleOnly=false&amp;l=&amp;si=0&amp;iid=&amp;sid=-1&amp;did=&amp;cc="/>
<parameter key="delay" value="500"/>
<list key="request_properties"/>
<parameter key="encoding" value="UTF-8"/>
<description align="center" color="transparent" colored="false" width="126">Get the data from FirmenABC</description>
</operator>
<connect from_op="Create ExampleSet" from_port="output" to_op="Enrich Data by Webservice" to_port="Example Set"/>
<connect from_op="Enrich Data by Webservice" from_port="ExampleSet" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>

    User: "kludikovsky"
    New Altair Community Member
    OP
    Accepted Answer

    After several days of experimentation and searching the web:

     

    There are two reasons why this does not work properly:

     

    1) HTTP 301 (page moved)

    RM does not follow redirects. If the page you request responds with HTTP 301, a browser forwards you to the new location automatically, but RM does not.

    I found this thanks to @sgenzer here: http://community.rapidminer.com/t5/RapidMiner-Studio-Forum/Open-File-not-returning-data-from-url/m-p/41351#M28008

     

     

    2) The 'h:' namespace prefix is required

    All HTML tags need to be prefixed with the 'h:' namespace prefix. Even though the HTML namespace is set by default and does not need to be declared in the namespace definition, the prefix still has to be used in the XPath queries.

    (It might be an improvement idea for this operator to treat the 'h:' namespace as a preset, so that XPaths copied from a browser can be used without any modification.)

    Found this thanks to a small note here
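
Applied to the queries from the question, point 2 might look like the sketch below. It has not been tested against the live site. It assumes that only element names need the 'h:' prefix, while the `*` wildcard and attribute tests such as `@id` or `@class` stay unchanged. The selection of queries is only illustrative, and the redirect from point 1 still has to be dealt with separately.

```xml
<list key="xpath_queries">
  <!-- Sketch only: element names (div, a, html, head) carry the h: prefix. -->
  <!-- Assumption: wildcard steps (*) and attribute tests need no prefix. -->
  <parameter key="test_2" value="//*[@id=&quot;main-container&quot;]//*[@class=&quot;result-content&quot;]/h:div"/>
  <parameter key="test_5" value="//*[@id=&quot;main-container&quot;]//*[@class=&quot;result-content&quot;]/h:a"/>
  <parameter key="head_1" value="//h:html"/>
  <parameter key="head_2" value="//h:head"/>
  <parameter key="head_3" value="//*/h:head"/>
</list>
<!-- The namespaces list stays empty, since h: is available by default. -->
<list key="namespaces"/>
```

Queries built only from `*` steps, such as test_1, test_4 and head_4, contain no element names and would therefore not need the prefix under this assumption.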