"XML parser seems to lack robustness"

aruberutou New Altair Community Member
edited November 2024 in Community Q&A
Hello,

RapidMiner is a lovely tool and has helped my work tremendously. One glaring weakness, however, is the limited capacity of the "Read XML" operator.

It chokes on even moderately sized XML files. As a workaround, I have taken to using BaseX to convert my giant XML files (1-2 GB or so) into lighter-weight, Excel-readable XML files. I then load those files into RapidMiner.

Obviously, this is not a deal-breaker, but it would be nicer if I could simply do everything within RapidMiner.

Thanks,

Answers

  • JEdward New Altair Community Member
    Which version of RapidMiner are you using?
    The old version 5.3 does have a problem reading XML files, and I know the library was updated for 6.4, so it should be better now.

    However, if you are still having problems with how fast it runs, try exploring some of the XML parsing features available in a Groovy script. They're pretty good.
    I had to read large XML files with 5.3 and solved the issue by writing a short Groovy script that parsed the files as needed and returned an example set to RM.

    Good luck!
  • aruberutou New Altair Community Member
    Hi,

    Thanks for the follow-up. I am actually not at all familiar with Groovy script. How would I go about setting that up? I am indeed using the most current version of RapidMiner, but I still get performance issues. Perhaps part of the problem is that I am using the wizard interface rather than something more programmatic.

    Thanks for the tip!
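
For anyone wondering what the scripted route JEdward mentions might look like, here is a minimal sketch. It is not JEdward's script, which is not shown in this thread. It uses the JDK's streaming StAX parser (`javax.xml.stream`) so the whole document is never held in memory at once, and it is written in Java, which Groovy accepts for the most part. The class name `StreamingXmlRows`, the method `readRows`, and the flat `<record>` layout are all assumptions made for illustration. Turning the rows into a RapidMiner example set is deliberately left out.

```java
import java.io.Reader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: streams an XML document and turns every <recordTag>
// element into one row, using its direct child elements as columns.
final class StreamingXmlRows {

    static List<Map<String, String>> readRows(Reader in, String recordTag)
            throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Do not process DTDs or external entities in files of unknown origin.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        // Report adjacent text and CDATA sections as a single CHARACTERS event.
        factory.setProperty(XMLInputFactory.IS_COALESCING, true);

        XMLStreamReader reader = factory.createXMLStreamReader(in);
        List<Map<String, String>> rows = new ArrayList<>();
        Map<String, String> current = null;   // row being built, or null
        String field = null;                  // column being read, or null
        int nested = 0;                       // depth of elements inside a column
        StringBuilder text = new StringBuilder();
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if (current == null) {
                        if (name.equals(recordTag)) {
                            current = new LinkedHashMap<>();
                        }
                    } else if (field == null) {
                        field = name;          // direct child of the record
                        text.setLength(0);
                    } else {
                        nested++;              // deeper markup: keep its text only
                    }
                } else if (event == XMLStreamConstants.CHARACTERS) {
                    if (field != null) {
                        text.append(reader.getText());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    if (field != null) {
                        if (nested > 0) {
                            nested--;
                        } else {
                            current.put(field, text.toString().trim());
                            field = null;
                        }
                    } else if (current != null
                            && reader.getLocalName().equals(recordTag)) {
                        rows.add(current);     // record finished
                        current = null;
                    }
                }
            }
        } finally {
            reader.close();
        }
        return rows;
    }
}
```

Calling `StreamingXmlRows.readRows(new java.io.FileReader("big.xml"), "record")` (the file name is a placeholder) returns one map per record. For files in the 1-2 GB range, collecting every row in a list may still be too much, so it is probably better to write each row out, for example to CSV, at the point where the sketch calls `rows.add(current)`.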