Web page selection.
ratheesan
Hi,
How can I select the contents of a particular web page using RapidMiner (RM)? I tried it with the crawler, but I am getting more pages than I specified.
Thanks,
Ratheesan
Answers
Hi,
The question is unclear. What exactly do you mean by "contents"? Do you want to retrieve only a specific web page (or a list of pages), or do you want to extract information from the web page?
Please specify.
Cheers,
Simon
Hi Simon,
I want to extract information from a web page. If I can copy the contents of the web page into a text file, I can then apply text mining algorithms to it. So for now I need to copy the web page into a text file.
Thanks
Ratheesan
Hi,
I guess you could set the crawler's "max_depth" parameter to zero. The crawler should then not follow any links and should return only the page you specified.
RapidMiner 5 will soon have a web mining extension that makes this easier.
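For anyone who wants to see what a depth limit does outside of RapidMiner, here is a minimal Python sketch using only the standard library. It is not the RapidMiner crawler; the names `crawl`, `fetch_page`, and `LinkCollector` are made up for this illustration, and the sketch assumes pages are UTF-8 text and does no error handling.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href targets of all <a> tags in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_page(url):
    """Download a page and decode it as text (assumption: UTF-8)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, max_depth=0, fetch=fetch_page):
    """Breadth-first crawl returning {url: html}.

    With max_depth=0 only start_url itself is fetched.
    """
    pages = {}
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        if url in pages:
            continue  # already fetched
        pages[url] = fetch(url)
        if depth >= max_depth:
            continue  # depth limit reached: do not follow links from here
        collector = LinkCollector()
        collector.feed(pages[url])
        for href in collector.links:
            # Resolve relative links against the current page's URL.
            queue.append((urljoin(url, href), depth + 1))
    return pages
```

Calling `crawl(some_url, max_depth=0)` fetches just that one page; raising `max_depth` to 1 also fetches every page it links to, which is why a crawl with a larger depth returns more pages than expected.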
Greetings,
Sebastian
Hi,
I tried the above method and saved the result as a text file. The saved text contains HTML tags, image URLs, etc. Is there any way to save only the text (that is, the text a user sees when opening the web page)?
Thanks,
Ratheesan
Hi,
With 5.0 this would be easy. In 4.x, the only option is to set the content type of the TextInput operator to html, so that all tags are filtered out.
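If you end up doing the tag filtering outside of RapidMiner, the idea can be sketched in plain Python as follows. This is only an illustration, not what TextInput does internally; `VisibleTextExtractor` and `html_to_text` are hypothetical names, and the sketch simply drops all tags plus the contents of script and style blocks.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Keeps text nodes and skips the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        # convert_charrefs defaults to True, so "&amp;" arrives as "&".
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Record non-empty text that is not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Return the text content of an HTML string, one chunk per line."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

The result of `html_to_text(page_source)` can be written to a text file and used as input for text mining. Attribute values such as image URLs disappear because only text nodes are kept.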
Greetings,
Sebastian