Web Crawling for contact directory

Cash · New Altair Community Member
edited November 5 in Community Q&A
I'm trying to crawl this site to create an Excel document containing the names, locations, phone numbers, and specialty type of the individual practitioners on https://www.psychologytoday.com/us/therapists.

The link above has links underneath for each state, and each state has roughly 50 pages of contacts. I'm just trying to get the HTML pulled so I can extract the contact data later, likely with Tableau Prep. The CSS selectors I have from SelectorGadget are span, h1, and .location-address-phone.

This is the operator I'm using, and it's returning absolutely nothing. Can someone please help me figure this out? Thanks!

<?xml version="1.0" encoding="UTF-8"?><process version="9.5.001">
  <operator activated="true" class="web:crawl_web_modern" compatibility="9.0.000" expanded="true" height="68" name="Crawl Web" width="90" x="45" y="34">
    <parameter key="url" value="https://www.psychologytoday.com/us/therapists"/>
    <list key="crawling_rules">
      <parameter key="follow_link_with_matching_url" value="https://www.psychologytoday.com/us/therapists/.*"/>
      <parameter key="store_with_matching_url" value="https://www.psychologytoday.com/us/therapists/.*"/>
    </list>
    <parameter key="max_crawl_depth" value="52"/>
    <parameter key="retrieve_as_html" value="true"/>
    <parameter key="enable_basic_auth" value="false"/>
    <parameter key="add_content_as_attribute" value="false"/>
    <parameter key="write_pages_to_disk" value="true"/>
    <parameter key="include_binary_content" value="false"/>
    <parameter key="output_dir" value="/Users/ME/Desktop/Web Crawls"/>
    <parameter key="output_file_extension" value="html"/>
    <parameter key="max_pages" value="2500"/>
    <parameter key="max_page_size" value="10000"/>
    <parameter key="delay" value="500"/>
    <parameter key="max_concurrent_connections" value="100"/>
    <parameter key="max_connections_per_host" value="50"/>
    <parameter key="user_agent" value="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.36"/>
    <parameter key="ignore_robot_exclusion" value="false"/>
  </operator>
</process>

Best Answer

  • Telcontar120 · New Altair Community Member
    Answer ✓
    Unfortunately, the Crawl Web operator doesn't work with HTTPS pages (and it has several other known problems besides). You can replicate its functionality with Get Pages by preparing a CSV file of the page links you want to store. Since the page links seem to follow a regular pattern, you can easily create such a list in Excel or even in RapidMiner. That should let you store the data you want (assuming it isn't in violation of the site's T&C of use).
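
    In case it helps anyone following along, here is a minimal Python sketch of the list-building step. It assumes the listing pages paginate with a ?page=N query parameter and uses a few made-up state slugs (both are assumptions, not confirmed details of the site); verify the real URL pattern in a browser, then import the resulting CSV with Read CSV and feed it to Get Pages:

    import csv

    BASE = "https://www.psychologytoday.com/us/therapists"

    # Hypothetical state slugs -- extend to cover all the states you need.
    states = ["alabama", "alaska", "arizona"]

    # Roughly 50 listing pages per state, per the original post.
    pages_per_state = 50

    with open("pages.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Link"])  # point Get Pages' link attribute parameter at this column
        for state in states:
            for page in range(1, pages_per_state + 1):
                writer.writerow([f"{BASE}/{state}?page={page}"])

    Whether the pattern is ?page=N or something else, the idea is the same: enumerate the listing URLs up front so Get Pages can fetch and store each one instead of relying on Crawl Web to discover them.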

Answers

  • Cash · New Altair Community Member
    Thank you, Brian. That's disappointing to hear. I don't think I'll be able to do this in RM, and I don't really know how to do the process you're describing. I did verify in the T&Cs that scraping is okay. I was able to find different software that let me scrape the site very easily, so I now have the information I was looking for. Thank you again for the response!