"Web crawling a newspaper search engine that uses a POST method form and a JavaScript variable"

thankyourapid-i New Altair Community Member
edited November 5 in Altair RapidMiner
Dear Rapid-I,

I succeeded in populating a database for subsequent analysis using your amazing data mining tool (RapidMiner)!

Now, for a scientific research project, I need to collect Italian earthquake-related articles from a freely available newspaper archive search engine:

http://sitesearch.corriere.it/siteSearchEngine?q=terremoto%20scosse


Searching for the words "terremoto scosse" returns 670 articles.

The pagination system uses JavaScript to set the pageNumber variable.

The form submits with the POST method and hidden input fields, rather than the GET method that web crawling of articles usually relies on.
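
To make the pagination problem concrete, here is a minimal sketch in plain Python (outside RapidMiner) of what looping over pageNumber with POST requests could look like. Only the URL, the q parameter, the pageNumber name, and the 670 results come from the description above. The page size of 10, numbering pages from 1, sending q and pageNumber as ordinary form fields, and the build_page_requests helper are all assumptions for illustration; the real hidden fields would have to be read from the form's HTML.

```python
import math
from urllib.parse import urlencode
from urllib.request import Request

SEARCH_URL = "http://sitesearch.corriere.it/siteSearchEngine"


def build_page_requests(query, total_results, page_size, hidden_fields=None):
    """Build one POST request per result page (hypothetical helper).

    hidden_fields stands in for the form's hidden inputs, whose real
    names and values are not known here and must be copied from the page.
    """
    page_count = math.ceil(total_results / page_size)
    page_requests = []
    for page_number in range(1, page_count + 1):  # assumes pages start at 1
        fields = dict(hidden_fields or {})
        fields["q"] = query
        fields["pageNumber"] = page_number  # value the JavaScript would set
        body = urlencode(fields).encode("utf-8")
        page_requests.append(Request(SEARCH_URL, data=body, method="POST"))
    return page_requests


# 670 results at an assumed 10 per page; nothing is sent over the network here.
page_requests = build_page_requests("terremoto scosse", total_results=670, page_size=10)
```

Each prepared request could then be sent with urllib.request.urlopen and the returned HTML parsed for article links, adding a pause between requests to avoid overloading the site.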

This may be a simple question for you, but I am a newbie in the data mining field, so please explain how I should proceed.

Which RapidMiner operators do I have to use?

How can I set the JavaScript pageNumber variable to loop over the article extraction?

You could also write a new article about web crawling online archive search engines that use POST method forms and JavaScript, because it seems to be a non-trivial topic.

I look forward to your kind answer and wish Rapid-I logarithmic success!

Have a good day,
Alex