Hello,
I have been very impressed with RapidMiner and I can see how powerful it is.
The difficulty I have with it - and I can see this is not unusual - is that I do not have the technical knowledge to use it fully, nor do I have even a basic understanding of the terms you use to explain how to use it.
At the moment I am mainly interested in the Web Mining and Text Processing operators.
I have tried using Crawl Web, and my attempt was successful. But of course, if I allow the depth to be more than about 2, I begin to crawl all sorts of sites I am not interested in, so I need to restrict the crawl.
Unfortunately I do not understand how to apply the rules.
store_with_matching_url
store_with_matching_content
follow_link_with_matching_url
follow_link_with_matching_text
I have tried playing with each of them but when I do I get no results.
For example, I want to crawl the business listings site http://www.domainname.com. So I set that as the URL parameter, and I open the Crawling Rules.
I set follow_link_with_matching_url with the value http://www.domainname.com because I only want to follow onsite links. But when I do that and press 'Run', it goes to the http://www.domainname.com address and finishes.
So I tried using the 'set regular expression' dialog box, and added a variety of the constructs and shortcuts suggested there. But each time I got no results. I tried all kinds of different arrangements, including http://www.domainname.com* and http://www.domainname.com/*, and I tried the period and most of the other constructs in some arrangement or other, but never got any results from any of them.
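To show what I mean, here is a minimal sketch of how the values I tried would behave if the rule has to match the entire URL as a regular expression. That is only my assumption; I do not know whether Crawl Web actually works this way. It is written in Python rather than RapidMiner, the two test links are made up, and the last pattern is just a candidate I have not tried in the operator.

```python
import re

# Rule values to compare (domainname.com is a placeholder).
patterns = [
    r"http://www.domainname.com",
    r"http://www.domainname.com*",
    r"http://www.domainname.com/*",
    r"http://www\.domainname\.com/.*",  # candidate: ".*" means "any characters"
]

# Hypothetical onsite and offsite links, for illustration only.
urls = [
    "http://www.domainname.com/listings/page2.html",
    "http://www.othersite.com/index.html",
]

def whole_match(pattern, url):
    """Return True if the pattern matches the entire URL (assumed rule behaviour)."""
    return re.fullmatch(pattern, url) is not None

# Record, for every pattern and link, whether the link would be followed.
results = {(p, u): whole_match(p, u) for p in patterns for u in urls}

for p in patterns:
    print(p, [results[(p, u)] for u in urls])
```

If the rules do work like this, a trailing * on its own only repeats the character before it, so none of my three attempts would match a longer onsite link, while the last pattern would match onsite links and reject other sites.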
So I am left with using the Crawl Web operator at a depth of about 2 or 3, extracting the relevant URLs, and then crawling again from each of them to get to the necessary depth. This is proving very slow and laborious, and I am certain that all I need is for someone to say "use this rule = xx" or whatever, and I'll be able to use it properly.
I can see this software wasn't created for beginners. It is clear that a user really needs to understand the technical language you use (I've read your manual all the way through and watched the vids I can understand, but in everything there is a basic assumption of knowledge that I simply don't have).
A simple step-by-step guide for each of the operators, just to get them functioning, would be really useful here. But first of all, can someone please tell me what rule I need to type so I don't crawl the whole web?
Thanks,