Web page selection.
ratheesan
Hi,
How can I select the contents of a particular web page using RM? I tried it with the crawler, but I am getting more pages than I specified.
Thanks,
Ratheesan
fischer
Hi,
The question is unclear. What exactly do you mean by "contents"? Do you want only a specific web page (or list of web pages)? Do you want to extract information from the web page?
Please specify.
Cheers,
Simon
ratheesan
Hi Simon,
I want to extract information from a web page. If I can copy the contents of the web page into a text file, I can then apply text mining algorithms. So for now I need to copy the web page into a text file.
Thanks
Ratheesan.
land
Hi,
I guess you could set the "max_depth" parameter to zero. The crawler should then not follow any links.
With RapidMiner 5 there will soon be a web mining extension that makes this easier.
Greetings,
Sebastian
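To illustrate why a depth limit of zero should return only the page you asked for, here is a minimal Python sketch of a depth-limited crawl. It is not RapidMiner's crawler: the function `crawl`, its `fetch` argument, and the in-memory `SITE` dictionary are all hypothetical names made up for this example, and the exact meaning of "max_depth" in RapidMiner may differ from what is assumed here.

```python
import re
from collections import deque
from urllib.parse import urljoin

# Very rough link extraction; good enough for a sketch, not for real HTML.
HREF_RE = re.compile(r'href=["\'](.*?)["\']', re.IGNORECASE)

def crawl(start_url, max_depth, fetch):
    """Breadth-first crawl. Returns {url: html}.

    fetch is any callable mapping a URL to its HTML string (assumption:
    in real use this would wrap an HTTP request).
    """
    pages = {}
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        if url in pages:
            continue  # already visited
        html = fetch(url)
        pages[url] = html
        if depth >= max_depth:
            continue  # depth limit reached: do not follow this page's links
        for href in HREF_RE.findall(html):
            queue.append((urljoin(url, href), depth + 1))
    return pages

# Hypothetical in-memory "site" so the sketch runs without network access.
SITE = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a>',
    "http://example.test/b": "no links",
}

only_start = crawl("http://example.test/", 0, SITE.__getitem__)
one_hop = crawl("http://example.test/", 1, SITE.__getitem__)
```

With a depth of 0, `only_start` holds just the start page; with a depth of 1, `one_hop` also holds the pages it links to. Getting more pages than expected is what you would see if the depth limit is larger than zero.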
ratheesan
Hi,
I tried the method above and saved the result as a text file. The saved text contains HTML tags, image URLs, etc. Is there any way to save only the text (the text that a user sees when he opens the web page)?
Thanks,
Ratheesan
land
Hi,
With 5.0 this would be easy. In 4.x you can only set the TextInput to content type html, so that all tags are filtered out.
Greetings,
Sebastian
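For readers who want to see what "filtering out all tags" amounts to outside RapidMiner, here is a small Python sketch using only the standard library's `html.parser`. It is an illustration, not what TextInput does internally; the class `VisibleTextExtractor`, the function `html_to_text`, and the sample HTML are made up for this example. It keeps text nodes and drops tags, attributes (including image URLs), and the contents of `script` and `style` elements.

```python
from html.parser import HTMLParser

class VisibleTextExtractor(HTMLParser):
    """Collects text nodes, ignoring tags and script/style contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0   # > 0 while inside a script/style element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside script/style.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return "\n".join(self._chunks)

def html_to_text(html):
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()

# Hypothetical sample page.
sample = (
    '<html><head><style>p{color:red}</style></head>'
    '<body><h1>Hello</h1><img src="http://example.test/x.png">'
    '<p>Plain &amp; simple text.</p><script>var x = 1;</script></body></html>'
)
plain = html_to_text(sample)
```

The result in `plain` could then be written to a text file and passed on to the text mining step. Real pages often need more care (hidden elements, navigation menus, character encodings), so treat this as a starting point only.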