Web crawling a difficult webpage (Airbnb)

21763289 New Altair Community Member
edited November 2024 in Community Q&A

Hello,

I need to scrape the Airbnb website. Specifically, I need to get all the ratings for all the accommodations in a city ("Veracidad":5,"Comunicacion":5, etc.). (Attachment: airbnb.jpg)

My first idea was to collect the URLs of all the accommodations in a city, for example . Then I would have the web crawler visit each of those links and extract the individual ratings.

But when I use a max crawl depth of 1 with the URL from the example link, I don't get the accommodations' URLs...
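The two-step plan described above (collect the listing links from one page, then visit each link) can be sketched roughly as follows. This is only an illustration on an inline HTML sample, not a working Airbnb crawler and not what the RapidMiner crawler does internally: the sample markup, the `example.com` host, the `/rooms/` path prefix, and the names `LinkCollector` and `collect_links` are all assumptions made for the example. It makes no network requests.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects <a href> targets whose path starts with a given prefix."""

    def __init__(self, base_url, path_prefix):
        super().__init__()
        self.base_url = base_url
        self.path_prefix = path_prefix
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.startswith(self.path_prefix):
            # Resolve the relative link against the page it was found on.
            url = urljoin(self.base_url, href)
            if url not in self.links:  # skip duplicates, keep first-seen order
                self.links.append(url)


def collect_links(html, base_url, path_prefix):
    """Depth-1 step: return the matching links found in one page's HTML."""
    collector = LinkCollector(base_url, path_prefix)
    collector.feed(html)
    return collector.links


# Hypothetical search-results page; real pages will look different.
SAMPLE_HTML = """
<html><body>
  <a href="/rooms/101">Flat A</a>
  <a href="/rooms/102">Flat B</a>
  <a href="/rooms/101">Flat A (again)</a>
  <a href="/help">Help</a>
</body></html>
"""

if __name__ == "__main__":
    for url in collect_links(SAMPLE_HTML, "https://example.com/s/city", "/rooms/"):
        print(url)
```

A parser like this only sees links that are present in the HTML it is given. If a page adds its links with scripts after loading (one possible cause of the empty result, not confirmed here), a crawl of depth 1 over the raw HTML would find none of them.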

Could you help me, please?

Answers

  • sgenzer
    Altair Employee

    Hello @21763289, please note that web scraping commercial websites is generally illegal and/or violates the Terms of Service of these companies. Here is the specific language from airbnb.com:

    14.1 You are solely responsible for compliance with any and all laws, rules, regulations, and Tax obligations that may apply to your use of the Airbnb Platform. In connection with your use of the Airbnb Platform, you will not and will not assist or enable others to:
    ...
    use any robots, spider, crawler, scraper or other automated means or processes to access, collect data or other content from or otherwise interact with the Airbnb Platform for any purpose;

    (source: https://www.airbnb.com/terms)

    I STRONGLY advise any RapidMiner users to please check the Terms of Service of any website when using our software or any other means of web scraping.

    Scott

  • 21763289
    21763289 New Altair Community Member

    Ok, thanks, I understand.

    So, if anyone would be willing to reply to me privately about how to do it, hypothetically... It is just for a research project at my university.

  • rfuentealba
    rfuentealba New Altair Community Member

    Hi @21763289,

    Have you checked if you can do it legally through the AirBnB API? Looks like they do have one:

    https://www.airbnb.com/partner?c=tumblr&af=746240

    I haven't worked with it, but this might be a good beginning.

    All the best,

    Rodrigo.

  • 21763289
    21763289 New Altair Community Member

    Nice idea. Thanks!!