XPath empty results

Question

Hello. I'm trying to mine data from Google Scholar profile pages using XPath. I want the author's name, the h-index and the first 20 publications, and I am using the following queries:

substring-before(//title, " - Google Scholar Citations")
//*[contains(.,"h-index")]/../tr[3]//td[2]
//a[contains(@href,'citation_for_view')]

All of them work in Google Docs and in Java, but none of them works in RapidMiner. I can't figure out what's wrong...
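For comparison, here is a minimal Java sketch that evaluates the three queries above with the JDK's javax.xml.xpath API. The markup is a hypothetical, heavily simplified stand-in for a Scholar profile page (the names Jane Doe, Paper one and so on are made up), not the real page. It also has to be well-formed XML for the JDK parser; real Scholar HTML would first need an HTML-to-DOM step, which is outside this sketch.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ScholarXPath {
    // The three queries from the question, unchanged.
    static final String NAME_QUERY =
        "substring-before(//title, \" - Google Scholar Citations\")";
    static final String HINDEX_QUERY = "//*[contains(.,\"h-index\")]/../tr[3]//td[2]";
    static final String PUBS_QUERY = "//a[contains(@href,'citation_for_view')]";

    // Hypothetical, heavily simplified stand-in for a Scholar profile page.
    // It is well-formed XML and declares no namespace.
    static final String PAGE =
        "<html><head><title>Jane Doe - Google Scholar Citations</title></head><body>"
        + "<table>"
        + "<tr><th>Citation indices</th><th>All</th></tr>"
        + "<tr><td>Citations</td><td>1200</td></tr>"
        + "<tr><td>h-index</td><td>17</td></tr>"
        + "</table>"
        + "<a href='/citations?view_op=view_citation&amp;citation_for_view=abc:1'>Paper one</a>"
        + "<a href='/citations?view_op=view_citation&amp;citation_for_view=abc:2'>Paper two</a>"
        + "</body></html>";

    // Parse with the default factory, which is NOT namespace-aware.
    static Document parse(String markup) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(markup)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse markup", e);
        }
    }

    // Evaluate an expression and return its XPath string value.
    static String evalString(Document doc, String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expr, doc);
        } catch (Exception e) {
            throw new IllegalStateException("XPath failed: " + expr, e);
        }
    }

    // Evaluate a node-set expression and collect the text of at most `limit` nodes.
    static List<String> evalTexts(Document doc, String expr, int limit) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength() && i < limit; i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException("XPath failed: " + expr, e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(PAGE);
        System.out.println("name:    " + evalString(doc, NAME_QUERY));
        System.out.println("h-index: " + evalString(doc, HINDEX_QUERY));
        System.out.println("papers:  " + evalTexts(doc, PUBS_QUERY, 20));
    }
}
```

Run against this sample, main prints the name Jane Doe, the h-index 17 and the two paper titles. The unprefixed element names in the queries match here because the default DocumentBuilderFactory is not namespace-aware and the sample markup declares no namespace.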

fras · Answer

The implementation of XPath in RapidMiner works a little differently. The process I'd suggest instead combines "Cut Documents" with "Extract Information"; this approach seems better suited to your case. Please check it, and pay attention to its use of nested processes. Its extraction steps use the regular expressions (\d+) and (.+).
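The answer doesn't say in what way RapidMiner's XPath differs. One general cause of empty XPath results is namespaces: if a tool turns the page into namespace-aware XHTML, every element lives in the XHTML namespace, and in XPath 1.0 an unprefixed step such as //title only matches elements with no namespace. Whether this is the difference in RapidMiner is an assumption here, not something stated above. The following Java sketch shows the effect with the JDK's javax.xml.xpath API; the sample page and the prefix h are hypothetical choices.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class NamespaceXPath {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Hypothetical XHTML page: the default namespace puts every element in XHTML_NS.
    static final String PAGE =
        "<html xmlns='" + XHTML_NS + "'><head>"
        + "<title>Jane Doe - Google Scholar Citations</title>"
        + "</head><body/></html>";

    // Binds the prefix "h" (an arbitrary, hypothetical choice) to the XHTML namespace.
    static final NamespaceContext XHTML_PREFIX = new NamespaceContext() {
        @Override
        public String getNamespaceURI(String prefix) {
            return "h".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
        }

        @Override
        public String getPrefix(String namespaceURI) {
            return XHTML_NS.equals(namespaceURI) ? "h" : null;
        }

        @Override
        public Iterator<String> getPrefixes(String namespaceURI) {
            if (XHTML_NS.equals(namespaceURI)) {
                return Collections.singletonList("h").iterator();
            }
            return Collections.<String>emptyIterator();
        }
    };

    // Parse namespace-aware (assumed to mimic a tool that treats pages as XHTML)
    // and return the XPath string value of `expr`.
    static String eval(String expr) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(PAGE)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(XHTML_PREFIX);
            return xpath.evaluate(expr, doc);
        } catch (Exception e) {
            throw new IllegalStateException("XPath failed: " + expr, e);
        }
    }

    public static void main(String[] args) {
        // Unprefixed step: matches only no-namespace elements, so the result is empty.
        System.out.println("[" + eval("//title") + "]");
        // Prefixed step: matches the title element in the XHTML namespace.
        System.out.println("["
            + eval("substring-before(//h:title, ' - Google Scholar Citations')") + "]");
    }
}
```

The first line printed is [] and the second is [Jane Doe]: the same document yields nothing for the query as written in the question and the expected value once the step names the XHTML namespace.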