Syntax error: getting ? instead of values when extracting data using XPath

Shahzad New Altair Community Member
edited November 5 in Community Q&A
Hi, I am trying to extract some data from the donedeal.ie website, but I am getting ? instead of values. I am not sure whether my syntax is correct.

I extracted the XPath using Google Chrome: right-click the element, choose Inspect, and copy the XPath. For example, I extracted the following XPath:
/html/body/main/div/div[1]/div/div[2]/div[2]/div[3]/div[1]/div/div[1]/div/h1

I have tried adding the h: prefix before div and html, but that didn't help.

Can you please help?

Regards
/Shahzad

Answers

  • sgenzer
    Altair Employee
    hi @Shahzad can you please post your XML?

    Scott

  • Shahzad
    New Altair Community Member
    edited November 2018
    Hello Scott

    The XML is pasted below. I have two processes: an Adverts process and a Donedeal process. In the Adverts process I am not able to fetch "Year"; all the other attributes are OK.

    In the Donedeal process, I can't fetch any attribute from the web page. Any help would be appreciated.

    Regards
    /Shahzad 
  • sgenzer
    Altair Employee
    hi @Shahzad so for some weird reason your .txt file has no <> symbols in it - hence impossible to paste into RapidMiner. Can you please just insert the XML into this thread by using the ¶ and then choose "Code"?

    Thank you.

    Scott

  • Shahzad
    New Altair Community Member
    Hello Scott

    I tried to paste the code, but the web page would not let me post the comment. I have attached a file that includes the XML tags. Hope that helps.

    Regards
    /Shahzad
  • sgenzer
    Altair Employee
    hello @Shahzad so thank you for this. Some thoughts...

    - For Adverts, if you want the year of the car, why not just create a new attribute which is the prefix of your Vehicle Name or Description fields, which have that information? As years are always at the beginning and have four digits, you could simply do this:



    - For Donedeal, the issue is that your information is in JSON format, not XML. Just use the Json path option instead of XPath in your Extract Information operator:



    If you're not familiar with JSONPath, this is always my go-to resource: https://goessner.net/articles/JsonPath/
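
    As a rough illustration of why a path for JSON looks different from an XPath: there is no element tree to walk, so the path addresses keys and array indices instead. The sketch below uses only Python's standard library and a small hand-rolled resolver for simple dotted paths such as $.ad.year. It is a simplified stand-in, not a full JSONPath implementation, and the sample payload and its field names are invented for illustration, not taken from donedeal.ie.

```python
import json
import re

# Hypothetical payload; the field names are invented for illustration.
SAMPLE = (
    '{"ad": {"header": "Ford Focus", "year": 2014, '
    '"photos": [{"url": "a.jpg"}, {"url": "b.jpg"}]}}'
)

# One path step: either .name or [index].
_TOKEN = re.compile(r"\.([A-Za-z_]\w*)|\[(\d+)\]")


def resolve(doc, path):
    """Resolve a simple path like $.ad.photos[1].url; return None if missing."""
    if not path.startswith("$"):
        raise ValueError("path must start with $")
    rest = path[1:]
    pos = 0
    current = doc
    while pos < len(rest):
        m = _TOKEN.match(rest, pos)
        if m is None:
            raise ValueError(f"unsupported path syntax at: {rest[pos:]!r}")
        key, index = m.group(1), m.group(2)
        try:
            current = current[key] if key is not None else current[int(index)]
        except (KeyError, IndexError, TypeError):
            # Key absent, index out of range, or wrong container type.
            return None
        pos = m.end()
    return current


data = json.loads(SAMPLE)
print(resolve(data, "$.ad.year"))
print(resolve(data, "$.ad.photos[1].url"))
print(resolve(data, "$.ad.mileage"))
```

    A missing key comes back as None rather than raising, which mirrors the missing value (?) you would see for a path that matches nothing.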

    Scott

  • kayman
    New Altair Community Member
    http://jsonpath.com/ is an easy-to-use online tool for testing your JSON path.
    Combined with Scott's link, it has already saved me a lot of time.
  • Shahzad
    New Altair Community Member
    Thanks for the update, guys. In a few cases the year is not part of the vehicle name. Hence JSON won't work. I used the Cut operator to extract the year from the vehicle name, but as mentioned, if the year is not in the vehicle title then I am back to square one :(
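
    To make the problem concrete, here is a small sketch of the prefix idea extended to search several text fields in turn and return nothing when none of them contains a year. The sample adverts are made up, and the 1900 to 2099 range is an assumption.

```python
import re

# Four-digit year between 1900 and 2099 (assumed range),
# not embedded in a longer run of digits.
YEAR = re.compile(r"(?<!\d)(19\d{2}|20\d{2})(?!\d)")


def find_year(*fields):
    """Return the first plausible year found in the given text fields, else None."""
    for text in fields:
        if not text:
            continue  # skip missing or empty fields
        match = YEAR.search(text)
        if match:
            return int(match.group(1))
    return None


# Made-up adverts: (vehicle name, description)
print(find_year("2014 Ford Focus 1.6 TDCi", "One owner"))
print(find_year("Ford Focus Zetec", "Registered 2012, NCT until next year"))
print(find_year("Ford Focus Zetec", None))
```

    Searching anywhere in the text rather than only the first four characters covers titles where the year is not at the start, but it can also pick up other four-digit numbers in that range, so the result still needs checking.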

    I am not sure whether the website is badly designed or whether information in the grid cannot be accessed via XPath.

    Regards
    /Shahzad