"Problem with RapidMiner Crawler"
Hello,
I recently started using RapidMiner (RM) for crawling web sites. However, I have been running into problems with some web sites. I really like RapidMiner's performance and how easy it is to configure to suit your needs, and I want to stick with it, so any help would be appreciated.
Here is a snapshot from my log file:
P May 20, 2009 3:29:59 PM: Initialising process setup
P May 20, 2009 3:29:59 PM: [NOTE] No filename given for result file, using stdout for logging results!
P May 20, 2009 3:29:59 PM: Checking properties...
P May 20, 2009 3:29:59 PM: Properties are ok.
P May 20, 2009 3:29:59 PM: Checking process setup...
P May 20, 2009 3:29:59 PM: Inner operators are ok.
P May 20, 2009 3:29:59 PM: Checking i/o classes...
P May 20, 2009 3:29:59 PM: i/o classes are ok. Process output: ExampleSet, NumericalMatrix.
P May 20, 2009 3:29:59 PM: Process ok.
P May 20, 2009 3:29:59 PM: Process initialised
P May 20, 2009 3:29:59 PM: [NOTE] Process starts
P May 20, 2009 3:29:59 PM: Process:
Root[0] (Process)
+- Crawler[0] (Crawler)
G May 20, 2009 3:29:59 PM: [Fatal] ArrayIndexOutOfBoundsException occured in 1st application of Crawler (Crawler)
G May 20, 2009 3:29:59 PM: [Fatal] Process failed: operator cannot be executed (0). Check the log messages...
Root[1] (Process)
here ==> +- Crawler[1] (Crawler)