RapidMiner does not display any errors, but the process still fails

User: "Kausty88"
New Altair Community Member
Updated by Jocelyn
Hi All,

Though I am new here, I have used our web scraping software with ease, because I can pinpoint any issue using its logs. Here, I am finding it very difficult to get things working. I followed the video to do a simple scrape of this site:

http://www.altusinsite.com/index_en.php?page=searchengine&;attri_40_1641=4230&attri_20_11%5B%5D=920&attri_20_11%5B%5D=921&attri_20_11%5B%5D=922&location=Greater+Vancouver+%2F+Downtown+Vancouver&UpdateCompany2=&format=&contact=&attri_40_1740_1=1&attri_40_1740_2=100%2C000&searchbasicbtn1=Find+Space

I expected this to be a simple web crawl, but I am not getting any error message either. Can anyone please help me with this? I also don't see the green dot at the bottom light up.

Here is my XML process:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.008">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
    <parameter key="logverbosity" value="all"/>
    <parameter key="logfile" value="D:\Rapidminer\Scrape.txt"/>
    <parameter key="parallelize_main_process" value="true"/>
    <process expanded="true" height="145" width="145">
      <operator activated="true" class="web:crawl_web" compatibility="5.2.003" expanded="true" height="60" name="Crawl Web" width="90" x="45" y="75">
        <parameter key="url" value="http://www.altusinsite.com/index_en.php?page=searchengine&amp;amp;attri_40_1641=4230&amp;amp;attri_20_11[]=920&amp;amp;attri_20_11[]=921&amp;amp;attri_20_11[]=922&amp;amp;location=Greater+Vancouver+/+Downtown+Vancouver&amp;amp;UpdateCompany2=&amp;amp;format=&amp;amp;contact=&amp;amp;attri_40_1740_1=1&amp;amp;attri_40_1740_2=100,000&amp;amp;searchbasicbtn1=Find+Space"/>
        <list key="crawling_rules">
          <parameter key="store_with_matching_url" value=".+suiteid.+"/>
          <parameter key="follow_link_with_matching_url" value=".+pagenum.+|.+suiteid.+"/>
        </list>
        <parameter key="output_dir" value="D:\Rapidminer"/>
        <parameter key="extension" value="html"/>
        <parameter key="max_depth" value="1"/>
        <parameter key="delay" value="100"/>
        <parameter key="user_agent" value="Mozilla/5.0 (Windows NT 6.1; rv:17.0) Gecko/20100101 Firefox/17.0"/>
        <parameter key="really_ignore_exclusion" value="true"/>
      </operator>
      <connect from_op="Crawl Web" from_port="Example Set" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
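Two parts of this process can be checked outside RapidMiner. First, as posted, the `url` value separates query parameters with `&amp;amp;`. An XML parser decodes that only once, to a literal `&amp;`, so the server would likely see parameter names such as `amp;attri_40_1641` instead of `attri_40_1641` (this may only be an artifact of how the post was rendered). Second, the crawling rules only store or follow pages whose URL matches the given regex. The Java sketch below reads a `parameter` value the way a standard XML parser does and tests URLs against the two rules. It assumes the rules are applied as whole-URL Java regular expressions (`Pattern.matches` semantics), which is not confirmed against the Web Mining extension; the class name and the `suiteid` sample link are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class CrawlSettingsCheck {

    // Returns the value of <parameter key="..."> as an XML parser sees it,
    // i.e. after one round of entity decoding (&amp;amp; -> &amp;, &amp; -> &).
    static String parameterValue(String xml, String key) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList params = doc.getElementsByTagName("parameter");
            for (int i = 0; i < params.getLength(); i++) {
                Element p = (Element) params.item(i);
                if (key.equals(p.getAttribute("key"))) {
                    return p.getAttribute("value");
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse process XML", e);
        }
    }

    // Assumption: a rule applies only if the regex matches the WHOLE URL.
    static boolean ruleMatches(String regex, String url) {
        return Pattern.matches(regex, url);
    }

    public static void main(String[] args) {
        // Trimmed copy of the url parameter as posted (note the &amp;amp;).
        String xml = "<operator><parameter key=\"url\" value=\""
                + "http://www.altusinsite.com/index_en.php?page=searchengine&amp;amp;attri_40_1641=4230"
                + "\"/></operator>";
        String startUrl = parameterValue(xml, "url");
        System.out.println(startUrl);

        String storeRule = ".+suiteid.+";
        String followRule = ".+pagenum.+|.+suiteid.+";
        // Hypothetical result link; the site's real link format is not known.
        String sampleLink = "http://www.altusinsite.com/index_en.php?page=details&suiteid=123";

        System.out.println(ruleMatches(followRule, startUrl));  // the search page itself
        System.out.println(ruleMatches(storeRule, sampleLink)); // a result-style link
    }
}
```

Running `main` prints the start URL with a literal `&amp;` still in it, then `false` (the search page's own URL contains neither `pagenum` nor `suiteid`), then `true` for the hypothetical result link. Note also that `.+suiteid.+` requires at least one character after `suiteid`.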
Another issue, a Java warning in the log:

Dec 26, 2012 1:07:05 PM WARNING: Operator recommendations unavailable: Failed to access the WSDL at: http://recommender.rapid-i.com:80/OperatorRecommenderService/RecommenderService?wsdl. It failed with:
Network is unreachable: connect.
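This warning comes from RapidMiner's operator-recommendation service rather than from Crawl Web, but "Network is unreachable" suggests the Java process may not have direct internet access (for example, if the machine sits behind a proxy). If so, the crawler would likely fail to fetch the start page as well. A minimal standalone Java check, run on the same machine, is sketched below. The class and method names are hypothetical; it only tests whether *some* HTTP response comes back (any status code counts), and it prints the standard JVM `http.proxyHost`/`http.proxyPort` system properties. RapidMiner may manage its own proxy settings, so `null` here does not prove RapidMiner has none.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLConnection;

class ReachabilityCheck {

    // Sends an HTTP HEAD request; returns true if any HTTP status came back.
    // Malformed URLs, non-HTTP schemes, timeouts and "Network is unreachable"
    // all return false.
    static boolean canConnect(String address, int timeoutMs) {
        try {
            URLConnection c = URI.create(address).toURL().openConnection();
            if (!(c instanceof HttpURLConnection)) {
                return false; // e.g. file: URLs
            }
            HttpURLConnection conn = (HttpURLConnection) c;
            conn.setRequestMethod("HEAD");
            conn.setConnectTimeout(timeoutMs);
            conn.setReadTimeout(timeoutMs);
            int status = conn.getResponseCode(); // -1 if the reply is not valid HTTP
            conn.disconnect();
            return status > 0;
        } catch (IOException | IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Standard JVM networking properties; null means none was set for this JVM.
        System.out.println("http.proxyHost = " + System.getProperty("http.proxyHost"));
        System.out.println("http.proxyPort = " + System.getProperty("http.proxyPort"));
        System.out.println("site reachable: " + canConnect("http://www.altusinsite.com/", 5000));
    }
}
```

If this reports the site as unreachable while a browser on the same machine can open it, the browser is probably going through a proxy that Java is not configured to use.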
