Extract data from XML files

New Altair Community Member

Nov 13, 2021

Updated Nov 5, 2024 by Jocelyn

I have many XML files. They share a similar structure but differ in some details.

The XML structure looks like this:

<article>
<art-front>
<titlegrp>
<title>Integrated phytoremediation</title>
</titlegrp>
<abstract>

<p>Phytoremediation is green rehabilitation technology .</p>
</abstract>
</art-front>
<art-body>
<section>
<title>One thing</title>
<p>the main technologies 1...</p>
<p>the main technologies 2...</p>
</section>
<section>

<title>Others</title>
<subsect1>
<p>the main technologies 3...</p>
<p>the main technologies 4...</p>
<p>the main technologies 5...</p>
</subsect1>

</section>
</art-body>
<art-back>
<biblist title="References">
<citauth>

</citauth>

</biblist>

</art-back>

</article>

The files differ between <art-body> and </art-body>: some have four <section> elements, some have five, and so on, and the number of <p> elements within a <section> also varies. In addition, some files have no <subsect> content and contain only multiple <section> elements.

I want to extract the contents of <art-front> and <art-body>, but not <art-back>.

I know that the Read XML operator can extract content from an XML file, and that the Read Document operator can do it as well. Because my XML files are not exactly the same, I don't know how to handle them. Is there any way to do this?

Thanks


BalazsBaranyRM

New Altair Community Member

Accepted Answer

Nov 15, 2021

Hi!

In these cases I usually build the process with multiple Read XML operators.

One extracts the common information, e.g. from the constant header; another extracts the variable information, such as the repeating entries. I can then join the results, e.g. on the file name or some other common attribute.

Use the most specific XPath you can for selecting what you need in each Read XML operator, and figure out which join is best for the task.
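
To illustrate the idea outside AI Studio, here is a minimal Python sketch using only the standard library's `xml.etree.ElementTree`. It is not the Read XML operator itself, just the same pattern: one extraction for the constant header, one for the repeating paragraphs, then a join on the file name. The sample document, the file name `sample.xml`, and the helper names (`text_of`, `extract_header`, `extract_paragraphs`) are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample that mirrors the structure described in the question.
SAMPLE = """<article>
<art-front>
<titlegrp><title>Integrated phytoremediation</title></titlegrp>
<abstract><p>Phytoremediation is green rehabilitation technology.</p></abstract>
</art-front>
<art-body>
<section><title>One thing</title><p>the main technologies 1...</p></section>
<section><title>Others</title><subsect1><p>the main technologies 3...</p></subsect1></section>
</art-body>
<art-back><biblist title="References"><citauth/></biblist></art-back>
</article>"""


def text_of(elem):
    """Concatenate all text inside an element, or return '' if it is missing."""
    return "".join(elem.itertext()).strip() if elem is not None else ""


def extract_header(root, file_name):
    """Constant part: one row per file, taken from <art-front>."""
    return {
        "file": file_name,
        "title": text_of(root.find("./art-front//title")),
        "abstract": text_of(root.find("./art-front/abstract")),
    }


def extract_paragraphs(root, file_name):
    """Variable part: one row per <p>, however many sections exist."""
    rows = []
    for section in root.findall("./art-body/section"):
        heading = text_of(section.find("./title"))
        # ".//p" also matches <p> nested inside <subsect1>, when present.
        for p in section.findall(".//p"):
            rows.append({"file": file_name, "section": heading, "text": text_of(p)})
    return rows


root = ET.fromstring(SAMPLE)
header = extract_header(root, "sample.xml")
paragraphs = extract_paragraphs(root, "sample.xml")

# Join the two results on the shared "file" key.
joined = [{**header, **row} for row in paragraphs if row["file"] == header["file"]]
```

Because the paths only ever start at `art-front` or `art-body`, nothing under `<art-back>` is read, and the descendant step `.//p` copes with files that have subsections as well as files that do not.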

Regards,
Balázs
