"READ XML operator: how to handle repeating tags?"

Fran_ois-Paul_S
edited November 5 in Altair RapidMiner
Hi,

Consider the following XML file, in which each example (a "RECORD" element) may contain zero, one, or more "KEYWORD" child elements:

<?xml version="1.0" encoding="UTF-8"?>

<ROOT>
  <RECORD>
    <ID>1</ID> 
    <TEXT>blah blah etc</TEXT>
    <KEYWORD>kw1</KEYWORD>
    <KEYWORD>This is kw2</KEYWORD>
  </RECORD>
  <RECORD>
    <ID>2</ID> 
    <TEXT>other blah</TEXT>
    <KEYWORD>kw3</KEYWORD>
  </RECORD>
</ROOT>

How can I handle the repeating "KEYWORD" tag?

With:
<parameter key="xpath_for_examples" value="//RECORD"/>
and
<parameter key="xpath_for_attribute" value="KEYWORD/text()"/>

I get the following value for the keyword attribute of the first record: "kw1This is kw2" (the two keywords are run together with no separator).

I tried to add a separator using the XPath 2.0 expression:
string-join(KEYWORD, ";")
but this does not seem to be supported by the operator.

Is it possible to get a properly delimited "keywords" attribute (one that I could later handle with text processing tools, or with the "Split" operator)?
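To make the target concrete, here is a minimal Python sketch of the result I am after. It runs outside RapidMiner and uses only the standard library, so it is not a RapidMiner solution, just an illustration of the desired output. The function name `records_with_joined_keywords`, the output column name `KEYWORDS`, and the ";" separator are my own choices for this example.

```python
import xml.etree.ElementTree as ET

# Same sample data as in the XML file above.
SAMPLE = """<ROOT>
  <RECORD>
    <ID>1</ID>
    <TEXT>blah blah etc</TEXT>
    <KEYWORD>kw1</KEYWORD>
    <KEYWORD>This is kw2</KEYWORD>
  </RECORD>
  <RECORD>
    <ID>2</ID>
    <TEXT>other blah</TEXT>
    <KEYWORD>kw3</KEYWORD>
  </RECORD>
</ROOT>"""


def records_with_joined_keywords(xml_text, sep=";"):
    """Return one dict per RECORD, with all KEYWORD values joined by `sep`."""
    root = ET.fromstring(xml_text)
    rows = []
    for record in root.iter("RECORD"):
        # findall returns every KEYWORD child, so zero, one, or many all work.
        keywords = [(kw.text or "").strip() for kw in record.findall("KEYWORD")]
        rows.append({
            "ID": record.findtext("ID", default="").strip(),
            "TEXT": record.findtext("TEXT", default="").strip(),
            "KEYWORDS": sep.join(keywords),
        })
    return rows


rows = records_with_joined_keywords(SAMPLE)
for row in rows:
    print(row)
```

In other words, the first record should end up with "kw1;This is kw2" rather than "kw1This is kw2", so that the value can be split again on ";" later.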

TIA

fps

PS: Here's the complete process:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.013">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.3.013" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="read_xml" compatibility="5.3.013" expanded="true" height="60" name="Read XML" width="90" x="45" y="30">
        <parameter key="file" value="/Users/fps/_fps/Data/XMLTest.xml"/>
        <parameter key="xpath_for_examples" value="//RECORD"/>
        <enumeration key="xpaths_for_attributes">
          <parameter key="xpath_for_attribute" value="ID/text()"/>
          <parameter key="xpath_for_attribute" value="TEXT/text()"/>
          <parameter key="xpath_for_attribute" value="KEYWORD/text()"/>
        </enumeration>
        <list key="namespaces"/>
        <list key="annotations"/>
        <list key="data_set_meta_data_information"/>
      </operator>
      <connect from_op="Read XML" from_port="output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>


