Web crawling a difficult webpage (Airbnb)
Hello,
I need to scrape the Airbnb website. Specifically, I need to get all the ratings for all the accommodations in a city ("Veracidad":5,"Comunicacion":5, etc.).
First, I thought about getting the URLs of all the accommodations in a city, for example . Then I would have the web crawler scrape all those links and get the individual ratings.
But when I use a max crawl depth of 1 with the URL in the example link, I don't get the accommodations' URLs ...
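To make the two-step idea concrete (collect the accommodation links from a results page, then visit each one), here is a minimal Python sketch that runs on a static HTML snippet only. Everything in it is an assumption for illustration: the sample HTML, the `/rooms/` path prefix, the `example.com` base URL, and the `LinkCollector` and `collect_listing_links` names are hypothetical, and it does not fetch anything from Airbnb.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects absolute URLs from <a href="..."> tags in static HTML."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, value))


def collect_listing_links(html, base_url, path_prefix="/rooms/"):
    """Return unique links under path_prefix, in order of first appearance.

    path_prefix is a hypothetical pattern for accommodation pages.
    """
    parser = LinkCollector(base_url)
    parser.feed(html)
    prefix = urljoin(base_url, path_prefix)
    unique = []
    for link in parser.links:
        if link.startswith(prefix) and link not in unique:
            unique.append(link)
    return unique


# Hypothetical results page, hard-coded so the sketch needs no network access.
SAMPLE_HTML = """
<html><body>
  <a href="/rooms/101">Flat A</a>
  <a href="/rooms/102">Flat B</a>
  <a href="/help">Help</a>
  <a href="/rooms/101">Flat A (again)</a>
</body></html>
"""

if __name__ == "__main__":
    for url in collect_listing_links(SAMPLE_HTML, "https://example.com/s/some-city"):
        print(url)
```

A parser like this only sees links that are present in the HTML the server returns. If a page adds its links with JavaScript after loading, a depth-1 crawl of the raw HTML would find none of them, which is one possible reason for the behaviour described here.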
Could you help me, please?
Answers
-
Hello @21763289, please note that web scraping commercial websites is generally illegal and/or violates the Terms of Service of these companies. Here is the specific language from airbnb.com:
14.1 You are solely responsible for compliance with any and all laws, rules, regulations, and Tax obligations that may apply to your use of the Airbnb Platform. In connection with your use of the Airbnb Platform, you will not and will not assist or enable others to:
...
use any robots, spider, crawler, scraper or other automated means or processes to access, collect data or other content from or otherwise interact with the Airbnb Platform for any purpose;
(source: https://www.airbnb.com/terms)
I STRONGLY advise all RapidMiner users to check the Terms of Service of any website before using our software or any other means of web scraping.
Scott
-
Ok, thanks, I understand.
So, if someone would like to reply to me privately about how to do it hypothetically... It is just for research at my university.
-
Hi @21763289,
Have you checked whether you can do it legally through the Airbnb API? It looks like they do have one:
https://www.airbnb.com/partner?c=tumblr&af=746240
I haven't worked with it, but this might be a good beginning.
All the best,
Rodrigo.
-
Nice idea. Thanks!!