"Read XML, reading Parent attributes"

francis_sathiak
francis_sathiak New Altair Community Member
edited November 5 in Community Q&A
Hey All,  

I have been using the Data Import Wizard. In step 4, where you define your XPaths, I have an attribute set up using "../" to get a parent attribute for all the entries. Step 4 actually shows the correct current value, but in step 5, in the preview of the 100 rows, that defined attribute is blank. It is also blank when I export the data.

My data looks like this 
<routeCondition>
      <spatialRuleCode>INCLUSION</spatialRuleCode>
      <segment>
        <persistentIdentifier>NSW5</persistentIdentifier>
        <segmentText>UNNAMED</segmentText>
      </segment>
      <segment>
        <persistentIdentifier>NSW5128</persistentIdentifier>
        <segmentText>FEDERAL</segmentText>
      </segment>
<routeCondition>
      <spatialRuleCode>EXCLUSION</spatialRuleCode>
      <segment>
        <persistentIdentifier>NSW5005</persistentIdentifier>
        <segmentText>FEDERAL</segmentText>
      </segment>
      <segment>
        <persistentIdentifier>NSW5025</persistentIdentifier>
        <segmentText>FEDERAL</segmentText>
      </segment>
      <segment>
        <persistentIdentifier>NSW5505</persistentIdentifier>
        <segmentText>FEDERAL</segmentText>
      </segment>
      <segment>
        <persistentIdentifier>NSW500517065</persistentIdentifier>
        <segmentText>FEDERAL</segmentText>
      </segment>
<routeCondition>
      <spatialRuleCode>INCLUSION</spatialRuleCode> 
      <segment>
        <persistentIdentifier>1706</persistentIdentifier>
        <segmentText>FEDERAL</segmentText>
      </segment>
      <segment>
        <persistentIdentifier>7030</persistentIdentifier>
        <segmentText>FEDERAL</segmentText>
      </segment>
Because the number of segments changes for each route condition, I selected //routeCondition/segment as my XPath for examples
and set up my attributes as follows:
../spatialRuleCode/text()

persistentIdentifier[1]/text()

segmentText[1]/text()

and was hoping for an output of 

INCLUSION NSW5 UNNAMED
INCLUSION NSW5128 FEDERAL
EXCLUSION NSW5005 FEDERAL
EXCLUSION NSW5025 FEDERAL
EXCLUSION NSW5505 FEDERAL
EXCLUSION NSW500517065 FEDERAL
INCLUSION 1706 FEDERAL
INCLUSION 7030 FEDERAL

But the attributes for spatialRuleCode are blank. Any help? Hopefully it's just the notation of '..' that is wrong.
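
For reference, here is a minimal sketch, outside RapidMiner, of what `../spatialRuleCode/text()` is meant to select when evaluated for each `segment`. It uses Python's standard `xml.etree.ElementTree`, whose elements carry no parent pointer, so a child-to-parent map stands in for `..`. The sample is a trimmed version of the data above; the `<root>` wrapper, the closing `</routeCondition>` tags, and the names `SAMPLE_XML` and `rows_via_parent` are assumptions made for illustration. It shows the intended result only and says nothing about why the wizard preview comes out blank.

```python
import xml.etree.ElementTree as ET

# Trimmed sample; the <root> wrapper and the closing </routeCondition> tags
# are assumptions added so that the snippet is well-formed.
SAMPLE_XML = """
<root>
  <routeCondition>
    <spatialRuleCode>INCLUSION</spatialRuleCode>
    <segment>
      <persistentIdentifier>NSW5</persistentIdentifier>
      <segmentText>UNNAMED</segmentText>
    </segment>
    <segment>
      <persistentIdentifier>NSW5128</persistentIdentifier>
      <segmentText>FEDERAL</segmentText>
    </segment>
  </routeCondition>
  <routeCondition>
    <spatialRuleCode>EXCLUSION</spatialRuleCode>
    <segment>
      <persistentIdentifier>NSW5005</persistentIdentifier>
      <segmentText>FEDERAL</segmentText>
    </segment>
  </routeCondition>
</root>
"""


def rows_via_parent(xml_text):
    """Return one (rule code, identifier, text) tuple per <segment>."""
    root = ET.fromstring(xml_text)
    # ElementTree has no parent axis on elements, so build the lookup once.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    rows = []
    for segment in root.iter("segment"):
        condition = parent_of[segment]  # plays the role of ".."
        rows.append((
            condition.findtext("spatialRuleCode"),
            segment.findtext("persistentIdentifier"),
            segment.findtext("segmentText"),
        ))
    return rows


for row in rows_via_parent(SAMPLE_XML):
    print(*row)
```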

Best Answer

  • kayman
    kayman New Altair Community Member
    Answer ✓
    When you have nested and repetitive XML I think it's better to use the XSLT operator (part of the text mining extension). 

    Find attached a working example based on your data (though the XML you provided is not properly built, so I modified it a bit).

    <process version="9.0.003">
      <operator activated="true" class="process" compatibility="9.0.003" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="text:create_document" compatibility="8.1.000" expanded="true" height="68" name="XML" width="90" x="112" y="34">
            <parameter key="text" value="&lt;root&gt;&#10;&#9;&lt;routeCondition&gt;&#10;&#9;&#9;&lt;spatialRuleCode&gt;EXCLUSION&lt;/spatialRuleCode&gt;&#10;&#9;&#9;&lt;segment&gt;&#10;&#9;&#9;&#9;&lt;persistentIdentifier&gt;NSW5005&lt;/persistentIdentifier&gt;&#10;&#9;&#9;&#9;&lt;segmentText&gt;FEDERAL&lt;/segmentText&gt;&#10;&#9;&#9;&lt;/segment&gt;&#10;&#9;&#9;&lt;segment&gt;&#10;&#9;&#9;&#9;&lt;persistentIdentifier&gt;NSW5025&lt;/persistentIdentifier&gt;&#10;&#9;&#9;&#9;&lt;segmentText&gt;FEDERAL&lt;/segmentText&gt;&#10;&#9;&#9;&lt;/segment&gt;&#10;&#9;&#9;&lt;segment&gt;&#10;&#9;&#9;&#9;&lt;persistentIdentifier&gt;NSW5505&lt;/persistentIdentifier&gt;&#10;&#9;&#9;&#9;&lt;segmentText&gt;FEDERAL&lt;/segmentText&gt;&#10;&#9;&#9;&lt;/segment&gt;&#10;&#9;&#9;&lt;segment&gt;&#10;&#9;&#9;&#9;&lt;persistentIdentifier&gt;NSW500517065&lt;/persistentIdentifier&gt;&#10;&#9;&#9;&#9;&lt;segmentText&gt;FEDERAL&lt;/segmentText&gt;&#10;&#9;&#9;&lt;/segment&gt;&#10;&#9;&lt;/routeCondition&gt;&#10;&#9;&lt;routeCondition&gt;&#10;&#9;&#9;&lt;spatialRuleCode&gt;INCLUSION&lt;/spatialRuleCode&gt;&#10;&#9;&#9;&lt;segment&gt;&#10;&#9;&#9;&#9;&lt;persistentIdentifier&gt;1706&lt;/persistentIdentifier&gt;&#10;&#9;&#9;&#9;&lt;segmentText&gt;FEDERAL&lt;/segmentText&gt;&#10;&#9;&#9;&lt;/segment&gt;&#10;&#9;&#9;&lt;segment&gt;&#10;&#9;&#9;&#9;&lt;persistentIdentifier&gt;7030&lt;/persistentIdentifier&gt;&#10;&#9;&#9;&#9;&lt;segmentText&gt;FEDERAL&lt;/segmentText&gt;&#10;&#9;&#9;&lt;/segment&gt;&#10;&#9;&lt;/routeCondition&gt;&#10;&lt;/root&gt;"/>
          </operator>
          <operator activated="true" class="text:create_document" compatibility="8.1.000" expanded="true" height="68" name="XSLT" width="90" x="112" y="136">
            <parameter key="text" value="&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;&#10;&lt;xsl:stylesheet version=&quot;1.0&quot; xmlns:xsl=&quot;http://www.w3.org/1999/XSL/Transform&quot;&gt;&#10;&#9;&lt;xsl:output method=&quot;xml&quot; version=&quot;1.0&quot; encoding=&quot;UTF-8&quot; indent=&quot;yes&quot;/&gt;&#10;&#9;&#9;&lt;xsl:template match=&quot;/&quot;&gt;&#10;&#9;&#9;&lt;root&gt;&#10;&#9;&#9;&lt;xsl:for-each select=&quot;//routeCondition&quot;&gt;&#10;&#9;&#9;&#9;&lt;xsl:variable name=&quot;spatialRuleCode&quot; select=&quot;spatialRuleCode&quot;/&gt;&#10;&#9;&#9;&#9;&lt;xsl:for-each select=&quot;segment&quot;&gt;&#10;&#9;&#9;&#9;&lt;row spatialRuleCode=&quot;{$spatialRuleCode}&quot; persistentIdentifier=&quot;{persistentIdentifier}&quot; segmentText=&quot;{segmentText}&quot;/&gt;&#10;&#9;&#9;&#9;&lt;/xsl:for-each&gt;&#10;&#9;&#9;&lt;/xsl:for-each&gt;&#10;&#9;&#9;&lt;/root&gt;&#10;&#9;&lt;/xsl:template&gt;&#10;&lt;/xsl:stylesheet&gt;"/>
          </operator>
          <operator activated="true" class="text:process_xslt" compatibility="8.1.000" expanded="true" height="82" name="Process XSLT" width="90" x="246" y="34"/>
          <operator activated="true" class="text:cut_document" compatibility="8.1.000" expanded="true" height="68" name="Cut Document" width="90" x="380" y="34">
            <parameter key="query_type" value="XPath"/>
            <list key="string_machting_queries"/>
            <list key="regular_expression_queries"/>
            <list key="regular_region_queries"/>
            <list key="xpath_queries">
              <parameter key="row" value="//row"/>
            </list>
            <list key="namespaces"/>
            <parameter key="ignore_CDATA" value="false"/>
            <parameter key="assume_html" value="false"/>
            <list key="index_queries"/>
            <list key="jsonpath_queries"/>
            <process expanded="true">
              <operator activated="true" class="text:extract_information" compatibility="8.1.000" expanded="true" height="68" name="Extract Information" width="90" x="179" y="34">
                <parameter key="query_type" value="XPath"/>
                <list key="string_machting_queries"/>
                <list key="regular_expression_queries"/>
                <list key="regular_region_queries"/>
                <list key="xpath_queries">
                  <parameter key="spatialRuleCode" value=".//@spatialRuleCode"/>
                  <parameter key="persistentIdentifier" value=".//@persistentIdentifier"/>
                  <parameter key="segmentText" value=".//@segmentText"/>
                </list>
                <list key="namespaces"/>
                <parameter key="ignore_CDATA" value="false"/>
                <parameter key="assume_html" value="false"/>
                <list key="index_queries"/>
                <list key="jsonpath_queries"/>
              </operator>
              <connect from_port="segment" to_op="Extract Information" to_port="document"/>
              <connect from_op="Extract Information" from_port="document" to_port="document 1"/>
              <portSpacing port="source_segment" spacing="0"/>
              <portSpacing port="sink_document 1" spacing="0"/>
              <portSpacing port="sink_document 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="text:documents_to_data" compatibility="8.1.000" expanded="true" height="82" name="Documents to Data" width="90" x="514" y="34">
            <parameter key="text_attribute" value="tmp"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="9.0.003" expanded="true" height="82" name="Select Attributes" width="90" x="648" y="34">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attributes" value="query_key|tmp"/>
            <parameter key="invert_selection" value="true"/>
          </operator>
          <connect from_op="XML" from_port="output" to_op="Process XSLT" to_port="document"/>
          <connect from_op="XSLT" from_port="output" to_op="Process XSLT" to_port="xslt document"/>
          <connect from_op="Process XSLT" from_port="document" to_op="Cut Document" to_port="document"/>
          <connect from_op="Cut Document" from_port="documents" to_op="Documents to Data" to_port="documents 1"/>
          <connect from_op="Documents to Data" from_port="example set" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
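
    For readers without RapidMiner at hand, the nested `for-each` logic of the stylesheet in this process can be sketched in plain Python: loop over every `routeCondition`, remember its `spatialRuleCode`, then emit one row per child `segment`. This is only an illustration under assumptions, not the extension's implementation; `MODIFIED_XML` is a trimmed copy of the modified sample, and `flatten_route_conditions` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the modified sample from the process above (assumption:
# fewer segments than the original, kept short for illustration).
MODIFIED_XML = """<root>
  <routeCondition>
    <spatialRuleCode>EXCLUSION</spatialRuleCode>
    <segment>
      <persistentIdentifier>NSW5005</persistentIdentifier>
      <segmentText>FEDERAL</segmentText>
    </segment>
    <segment>
      <persistentIdentifier>NSW5025</persistentIdentifier>
      <segmentText>FEDERAL</segmentText>
    </segment>
  </routeCondition>
  <routeCondition>
    <spatialRuleCode>INCLUSION</spatialRuleCode>
    <segment>
      <persistentIdentifier>1706</persistentIdentifier>
      <segmentText>FEDERAL</segmentText>
    </segment>
  </routeCondition>
</root>"""


def flatten_route_conditions(xml_text):
    """Return one dict per <segment>, carrying its parent's rule code."""
    root = ET.fromstring(xml_text)
    rows = []
    # Outer loop: every routeCondition, like select="//routeCondition".
    for condition in root.iter("routeCondition"):
        # Captured once per condition, like the xsl:variable.
        rule_code = condition.findtext("spatialRuleCode", default="")
        # Inner loop: direct child segments, like select="segment".
        for segment in condition.findall("segment"):
            rows.append({
                "spatialRuleCode": rule_code,
                "persistentIdentifier": segment.findtext("persistentIdentifier", default=""),
                "segmentText": segment.findtext("segmentText", default=""),
            })
    return rows


for row in flatten_route_conditions(MODIFIED_XML):
    print(row["spatialRuleCode"], row["persistentIdentifier"], row["segmentText"])
```

    Capturing the parent value before entering the inner loop avoids any parent lookup from the segment, which is the same design choice the stylesheet makes with its variable.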
    


Answers

  • francis_sathiak
    francis_sathiak New Altair Community Member
    Such an awesome and scalable solution, thanks so much. Is there any documentation around for the text mining extension? I would love to get more info.