I can scrape in Python, but how do I download and store hyperlinked PDFs or other files in their native format using RapidMiner?
Is the "Open File" operator not doing what you want? It lets you get a file from any URL or file path as a file object, which can then be stored. If you have multiple files, you can use macros and put this in a loop.
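Since you mentioned you already scrape in Python, here is a rough Python analogue of the same idea (fetch each URL as raw bytes, store it unchanged, repeat in a loop). This is only a sketch and not RapidMiner code: the function names `filename_from_url`, `fetch_bytes` and `download_all` are made up for illustration, and it assumes the links are plain HTTP(S) URLs that need no login.

```python
import os
import urllib.request
from urllib.parse import urlparse, unquote


def filename_from_url(url, default="download.bin"):
    """Derive a local file name from the last path segment of the URL."""
    name = os.path.basename(unquote(urlparse(url).path))
    return name or default


def fetch_bytes(url):
    """Download the raw bytes at `url` (no decoding, so PDFs stay intact)."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def download_all(urls, dest_dir, fetch=fetch_bytes):
    """Loop over the URLs and store each file unchanged in `dest_dir`.

    `fetch` is a hypothetical hook so the download step can be swapped out.
    """
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for url in urls:
        path = os.path.join(dest_dir, filename_from_url(url))
        with open(path, "wb") as fh:  # binary mode keeps the native format
            fh.write(fetch(url))
        saved.append(path)
    return saved
```

You would call it with something like `download_all(["https://example.com/files/report.pdf"], "downloads")`, where the URL is just a placeholder. Note that two links ending in the same file name would overwrite each other in this sketch.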
If you want to scrape actual web pages, then use "Get Page" or "Get Pages" instead.
Hello @gary_molloy, if you use the "Crawl Web" operator (Web Mining extension), there is an option to "write pages to disk". This saves the PDFs to disk as ordinary files. I have done this many times.
Scott