Create XML Files From Master XML

neodjandre New Altair Community Member
edited November 5 in Community Q&A
Hey all, newbie alert! I just came across RapidMiner today, and I must say it's an excellent tool.

I have a large XML file containing 30,000 records. Each record has a <city> element.

I would like to extract all records where the city is London (<city>London</city>) and write them to a separate XML file.

Any ideas would be much appreciated!

Thanks,
Andy

Answers

  • Marco_Boeck New Altair Community Member
    Hi,

    You can use the process below. Just make sure to change the file paths in the Read XML and Write Document operators to match your file system.

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.007">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.007" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="read_xml" compatibility="5.3.007" expanded="true" height="60" name="Read XML" width="90" x="45" y="30">
            <parameter key="file" value="C:\Users\username\Desktop\test.xml"/>
            <parameter key="xpath_for_examples" value="//city"/>
            <enumeration key="xpaths_for_attributes">
              <parameter key="xpath_for_attribute" value="node()"/>
            </enumeration>
            <list key="namespaces"/>
            <parameter key="use_default_namespace" value="false"/>
            <list key="annotations"/>
            <list key="data_set_meta_data_information">
              <parameter key="0" value="att1.true.binominal.attribute"/>
            </list>
          </operator>
          <operator activated="true" class="set_macro" compatibility="5.3.007" expanded="true" height="76" name="Set Macro" width="90" x="179" y="30">
            <parameter key="macro" value="i"/>
            <parameter key="value" value="0"/>
          </operator>
          <operator activated="true" class="loop_values" compatibility="5.3.007" expanded="true" height="76" name="Loop Values" width="90" x="313" y="30">
            <parameter key="attribute" value="att1"/>
            <process expanded="true">
              <operator activated="true" class="text:create_document" compatibility="5.3.001" expanded="true" height="60" name="Create Document" width="90" x="45" y="30">
                <parameter key="text" value="%{loop_value}"/>
              </operator>
              <operator activated="true" class="text:write_document" compatibility="5.3.001" expanded="true" height="76" name="Write Document" width="90" x="179" y="30">
                <parameter key="file" value="C:\Users\username\Desktop\output%{i}.xml"/>
              </operator>
              <operator activated="true" class="set_macro" compatibility="5.3.007" expanded="true" height="76" name="Set Macro (2)" width="90" x="313" y="30">
                <parameter key="macro" value="i"/>
                <parameter key="value" value="%{a}"/>
              </operator>
              <connect from_op="Create Document" from_port="output" to_op="Write Document" to_port="document"/>
              <connect from_op="Write Document" from_port="document" to_op="Set Macro (2)" to_port="through 1"/>
              <connect from_op="Set Macro (2)" from_port="through 1" to_port="out 1"/>
              <portSpacing port="source_example set" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Read XML" from_port="output" to_op="Set Macro" to_port="through 1"/>
          <connect from_op="Set Macro" from_port="through 1" to_op="Loop Values" to_port="example set"/>
          <connect from_op="Loop Values" from_port="out 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    Regards,
    Marco
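
For comparison, the filtering described in the question (keep only the records whose <city> is London and write them to a new file) can also be sketched outside RapidMiner with Python's standard library. This is only a minimal sketch under assumptions that are not in the thread: the element names `records` and `record`, the `name` field, and the file names are hypothetical, and each record is assumed to be a direct child of the root with <city> as a direct child.

```python
import xml.etree.ElementTree as ET


def filter_by_city(root: ET.Element, city: str = "London") -> ET.Element:
    """Return a new root containing only the records whose <city> text equals `city`.

    Assumes every record is a direct child of `root` and holds <city> as a direct child.
    """
    filtered = ET.Element(root.tag, dict(root.attrib))
    for record in root:
        # findtext returns None when the record has no <city> child
        value = record.findtext("city")
        if value is not None and value.strip() == city:
            filtered.append(record)
    return filtered


def extract_to_file(src_path: str, dst_path: str, city: str = "London") -> int:
    """Read src_path, keep the matching records, write them to dst_path.

    Returns the number of records written. Both paths are placeholders.
    """
    root = ET.parse(src_path).getroot()
    filtered = filter_by_city(root, city)
    ET.ElementTree(filtered).write(dst_path, encoding="utf-8", xml_declaration=True)
    return len(filtered)


# Small in-memory sample; the element names here are hypothetical.
sample = """<records>
  <record><name>A</name><city>London</city></record>
  <record><name>B</name><city>Paris</city></record>
  <record><name>C</name><city> London </city></record>
</records>"""

london = filter_by_city(ET.fromstring(sample))
print(ET.tostring(london, encoding="unicode"))
```

For a real file, something like `extract_to_file("test.xml", "london.xml")` would apply the same filter. A 30,000-record file should normally fit in memory with this approach; for much larger inputs, `ET.iterparse` would be the usual alternative.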