"Web Crawl from POST Method form newspaper search engine that uses Javascript var"

thankyourapid-i New Altair Community Member
edited November 2024 in Altair RapidMiner
Dear Rapid-I,

I have succeeded in populating a database for subsequent analysis using your amazing data mining tool (RapidMiner!).

Now, for a scientific research project, I need to get earthquake-related Italian article data from a freely available newspaper article archive search engine:

http://sitesearch.corriere.it/siteSearchEngine?q=terremoto%20scosse


Searching for the words "terremoto scosse" returns 670 articles.

The pagination system uses a JavaScript script to generate the pageNumber variable.

The results form is submitted with the POST method and hidden input variables, not with the GET method that web crawling usually relies on.
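
To make the problem concrete: paging through the results seems to amount to re-sending the same POST form with a different pageNumber each time. Below is a minimal Python sketch of that idea, outside RapidMiner. It rests on several assumptions that are not confirmed anywhere in this post: that the form accepts the field names "q" (taken from the URL above) and "pageNumber" (the JavaScript variable), that pages are numbered from 1, and that each page lists 10 results. Any other hidden inputs would have to be copied from the page's HTML source. The function names are hypothetical.

```python
import math
import urllib.parse
import urllib.request

SEARCH_URL = "http://sitesearch.corriere.it/siteSearchEngine"  # from the post
RESULTS_PER_PAGE = 10  # assumption: the page size is not stated in the post


def page_count(total_results, per_page=RESULTS_PER_PAGE):
    """Number of result pages needed to cover total_results."""
    return math.ceil(total_results / per_page)


def build_form_data(query, page_number):
    """Encode the form fields a browser would send for one result page.

    Assumed field names: "q" and "pageNumber". Other hidden inputs are
    unknown here and would need to be added to this dictionary.
    """
    fields = {"q": query, "pageNumber": str(page_number)}
    return urllib.parse.urlencode(fields).encode("utf-8")


def fetch_page(query, page_number, opener=urllib.request.urlopen):
    """POST the form for one page and return the response body as text."""
    request = urllib.request.Request(
        SEARCH_URL, data=build_form_data(query, page_number), method="POST"
    )
    with opener(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(query, total_results, opener=urllib.request.urlopen):
    """Yield (page_number, html) for every result page, assuming 1-based pages."""
    for page_number in range(1, page_count(total_results) + 1):
        yield page_number, fetch_page(query, page_number, opener)
```

With these assumptions, looping over `crawl("terremoto scosse", 670)` would request each result page in turn, and the returned HTML could then be parsed for article links. The `opener` parameter is only there so the loop can be tried with a stand-in function before touching the real site.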

Maybe this is a simple question for you, but I am a newbie in the data mining field, so please explain how I can proceed.

Which RapidMiner operators do I have to use?

How can I set the JavaScript pageNumber variable to loop over the article extraction?

You could also write a new article about web crawling from online data archive search engines that use POST method forms and JavaScript, because it seems to be a non-trivial topic.

I await your kind answer and wish Rapid-I logarithmic success!

Have a good day,
Alex
