"READ XML operator: how to handle repeating tags?"
Fran_ois-Paul_S
Hi,
consider the following XML file, where each "example" ("RECORD" tag) may contain zero, one or more "KEYWORD" sub-tags:
<?xml version="1.0" encoding="UTF-8"?>
<ROOT>
<RECORD>
<ID>1</ID>
<TEXT>blah blah etc</TEXT>
<KEYWORD>kw1</KEYWORD>
<KEYWORD>This is kw2</KEYWORD>
</RECORD>
<RECORD>
<ID>2</ID>
<TEXT>other blah</TEXT>
<KEYWORD>kw3</KEYWORD>
</RECORD>
</ROOT>
How can I handle the repeating "KEYWORD" tag?
With:
<parameter key="xpath_for_examples" value="//RECORD"/>
and
<parameter key="xpath_for_attribute" value="KEYWORD/text()"/>
I get "kw1This is kw2" for the keyword attribute of the first record, i.e. both values concatenated with no separator.
I tried to add a separator using the XPath 2.0 expression:
string-join(KEYWORD, ";")
but this doesn't seem to be supported.
Is it possible to get a properly delimited "keywords" attribute (that I could later handle with text processing tools, or with the "split" operator)?
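To make the desired result concrete, here is a minimal sketch of the joined output, written in Python outside RapidMiner. It is only an illustration of a possible pre-processing step, not a feature of the Read XML operator; the function name join_keywords and the KEYWORDS field are hypothetical, and the sample omits the XML declaration for brevity.

```python
import xml.etree.ElementTree as ET

# Sample data copied from the post (XML declaration omitted).
SAMPLE = """<ROOT>
<RECORD>
<ID>1</ID>
<TEXT>blah blah etc</TEXT>
<KEYWORD>kw1</KEYWORD>
<KEYWORD>This is kw2</KEYWORD>
</RECORD>
<RECORD>
<ID>2</ID>
<TEXT>other blah</TEXT>
<KEYWORD>kw3</KEYWORD>
</RECORD>
</ROOT>"""


def join_keywords(xml_text, sep=";"):
    """Return one dict per RECORD, with all KEYWORD values joined by sep.

    Hypothetical helper: it mimics what string-join(KEYWORD, ";") would
    produce for each record.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for record in root.iter("RECORD"):
        # findall returns an empty list when a record has no KEYWORD tags,
        # so such records get an empty string.
        keywords = [kw.text or "" for kw in record.findall("KEYWORD")]
        rows.append({
            "ID": record.findtext("ID", default=""),
            "TEXT": record.findtext("TEXT", default=""),
            "KEYWORDS": sep.join(keywords),
        })
    return rows


rows = join_keywords(SAMPLE)
for row in rows:
    print(row["ID"], row["KEYWORDS"])
```

With the sample above, the first record gets "kw1;This is kw2" and the second gets "kw3", which could then be split on ";" in a later step.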
TIA
fps
PS Here's the complete process:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.013">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.3.013" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="read_xml" compatibility="5.3.013" expanded="true" height="60" name="Read XML" width="90" x="45" y="30">
<parameter key="file" value="/Users/fps/_fps/Data/XMLTest.xml"/>
<parameter key="xpath_for_examples" value="//RECORD"/>
<enumeration key="xpaths_for_attributes">
<parameter key="xpath_for_attribute" value="ID/text()"/>
<parameter key="xpath_for_attribute" value="TEXT/text()"/>
<parameter key="xpath_for_attribute" value="KEYWORD/text()"/>
</enumeration>
<list key="namespaces"/>
<list key="annotations"/>
<list key="data_set_meta_data_information"/>
</operator>
<connect from_op="Read XML" from_port="output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>