Hello community members,
I am looking for a way to do web crawling. I have read in the forums that HTTPS websites cannot easily be crawled with the "Web Crawl" operator. Instead, you would have to use a combination of "get pages" and "loop", as described by Telconstar, but I haven't found any details about this approach yet.
I will briefly explain what I want to crawl: the properties listed on a German real estate website (immowelt.de).
A search is typically addressed via a link that encodes the location, rooms from, rooms to, buy or rent, and the sort order:
immowelt.de/liste/muenchen/wohnungen/kaufen?roomi=2&rooma=2&sort=relevanz
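To make the link shape above concrete, here is a minimal Python sketch that assembles such a search link from its parts. The function name `build_list_url`, the `https://www.` prefix, and the reading of `roomi` as "rooms from" and `rooma` as "rooms to" are my assumptions, not something the site documents.

```python
from urllib.parse import urlencode

# Assumption: the site is reachable under this scheme and host prefix.
BASE = "https://www.immowelt.de/liste"


def build_list_url(city, deal, room_min, room_max, sort="relevanz"):
    """Build a search link of the shape shown above (hypothetical helper).

    deal is the path segment for buy or rent, e.g. "kaufen".
    roomi/rooma are assumed to mean rooms from / rooms to.
    """
    query = urlencode({"roomi": room_min, "rooma": room_max, "sort": sort})
    return f"{BASE}/{city}/wohnungen/{deal}?{query}"


print(build_list_url("muenchen", "kaufen", 2, 2))
```

Only the path and query part come from the link above; everything else is illustrative.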
The matching properties are then listed. Each property's link consists of the constant "expose" and the ID of the offer, for example:
immowelt.de/projekte/expose/k2rb332
With the "Web Crawl" operator it would be easy: one would simply pass "expose" as a parameter for the crawl.
How would this work with "get pages" and "loop"? The ID does not count up, so I cannot simply generate the links. I would be very grateful if you could help me.
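Since the IDs cannot be generated, my understanding is that the links would have to be read out of the result page itself and then looped over. Below is a rough Python sketch of that extraction step only, not of the RapidMiner process. It assumes the result page contains plain `href="..."` attributes with `/expose/<id>` in them, which I have not verified; the helper name `extract_expose_links` and the sample HTML are made up for illustration, and the actual page download is left out.

```python
import re
from urllib.parse import urljoin

# Assumption: expose links appear as href="...</expose/<alphanumeric id>".
EXPOSE_HREF = re.compile(r'href="([^"]*/expose/[A-Za-z0-9]+)"')


def extract_expose_links(html, base_url="https://www.immowelt.de/"):
    """Return unique absolute expose links found in html, in first-seen order."""
    links = []
    for href in EXPOSE_HREF.findall(html):
        url = urljoin(base_url, href)  # turn relative links into absolute ones
        if url not in links:           # the same offer may be linked twice
            links.append(url)
    return links


# Made-up sample markup; the real page structure may differ.
sample = """
<a href="/projekte/expose/k2rb332">Offer (title)</a>
<a href="/projekte/expose/k2rb332">Offer (image)</a>
<a href="/liste/muenchen/wohnungen/kaufen?roomi=2">Back to search</a>
"""

for link in extract_expose_links(sample):
    print(link)  # each link would then be fetched in the loop
```

If the site builds its result list with JavaScript, the links may not be present in the downloaded HTML at all, and this approach would not find them.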
I wish you and your families a nice weekend.
Regards
TB161