I have many XML files. They share a similar structure but differ in some details.
The XML structure looks like this:
<article>
<art-front>
<titlegrp>
<title>Integrated phytoremediation</title>
</titlegrp>
<abstract>
<p>Phytoremediation is green rehabilitation technology.</p>
</abstract>
</art-front>
<art-body>
<section>
<title>One thing</title>
<p>the main technologies 1...</p>
<p>the main technologies 2...</p>
</section>
<section>
<title>Others</title>
<subsect1>
<p>the main technologies 3...</p>
<p>the main technologies 4...</p>
<p>the main technologies 5...</p>
</subsect1>
</section>
</art-body>
<art-back>
<biblist title="References">
<citauth>
<fname>H.</fname>
<surname>Ali</surname>
</citauth>
</biblist>
</art-back>
</article>
The files differ between <art-body> and </art-body>. Some files have four <section> elements, some have five, and so on; the number of <p> elements inside a <section> can also differ. In addition, some files have no <subsect1> elements and contain only multiple <section> elements.
I want to extract the contents of <art-front> and <art-body>, but not the contents of <art-back>.
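To make the goal concrete, here is a rough sketch in Python of the kind of extraction I mean. It is only an illustration under some assumptions: it uses the standard xml.etree.ElementTree module, the function name extract_front_and_body and the shortened SAMPLE string are made up for this example, and it assumes <art-front> and <art-body> are direct children of <article> as in the structure above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened sample that mirrors the structure shown above.
SAMPLE = """<article>
<art-front>
<titlegrp><title>Integrated phytoremediation</title></titlegrp>
<abstract><p>Phytoremediation is green rehabilitation technology.</p></abstract>
</art-front>
<art-body>
<section><title>One thing</title><p>the main technologies 1...</p></section>
<section><title>Others</title><subsect1><p>the main technologies 3...</p></subsect1></section>
</art-body>
<art-back>
<biblist title="References"><citauth><fname>H.</fname><surname>Ali</surname></citauth></biblist>
</art-back>
</article>"""


def extract_front_and_body(xml_text):
    """Return the text pieces of <art-front> and <art-body>, skipping <art-back>."""
    root = ET.fromstring(xml_text)
    result = {}
    for tag in ("art-front", "art-body"):
        element = root.find(tag)  # assumed to be a direct child of <article>
        if element is None:
            result[tag] = []
            continue
        # itertext() walks all descendants, so the number of <section>,
        # <subsect1> or <p> elements in a given file does not matter.
        result[tag] = [t.strip() for t in element.itertext() if t.strip()]
    return result


extracted = extract_front_and_body(SAMPLE)
for tag, pieces in extracted.items():
    print(tag, pieces)
```

Because only the two wanted elements are looked up by name, anything inside <art-back> is never visited, no matter how the body is nested.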
I know that the Read XML operator can be used to extract content from an XML file, and that the Read Document operator can do it as well. However, because my XML files are not all structured identically, I do not know how to handle them. Is there any way to do this?
Thanks