"Getting CTA DB error message with web crawler - help!"

User: "michael_crowdes"
New Altair Community Member
Updated by Jocelyn

I'm trying to use the Crawl Web operator in Studio version 7.6.001. As soon as the crawler starts, I get the message "Not able to connect to the CTA DB" in the log, and after a while the operation just times out. I can't figure out what's going on. It seems to be related to these particular operators, because I can use others without a problem. Does anyone have any ideas?

    User: "sgenzer"
    Altair Employee

    Hello @michael_crowdes, welcome to the RapidMiner Community. Could you post your XML in this thread so we can see your process? Please use the </> tool.

    Thanks,

    Scott

    User: "michael_crowdes"
    New Altair Community Member
    OP
    <?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="web:crawl_web_modern" compatibility="7.3.000" expanded="true" height="68" name="Crawl Web" width="90" x="246" y="187">
    <parameter key="url" value="http://parker.com"/>
    <list key="crawling_rules">
    <parameter key="store_with_matching_url" value="*parker.com*"/>
    </list>
    <parameter key="max_crawl_depth" value="3"/>
    <parameter key="retrieve_as_html" value="true"/>
    <parameter key="write_pages_to_disk" value="true"/>
    <parameter key="include_binary_content" value="true"/>
    <parameter key="output_dir" value="C:\Users\533537\Desktop\webPages"/>
    <parameter key="ignore_robot_exclusion" value="true"/>
    </operator>
    <connect from_op="Crawl Web" from_port="example set" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>
    User: "FBT"
    New Altair Community Member

    Hi Michael,

    It seems that you have an error in the regex field of the crawling rules. "*parker.com*" is not a valid regular expression: in a regex, * is a quantifier that repeats the element before it, so a pattern cannot start with it. What exactly was your intention with that expression? Did you want to capture everything that contains "parker.com", no matter what comes before or after it? If so, try this expression (without quotation marks): ".*parker.com.*"
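    For reference, here is a sketch of the crawling_rules entry from the posted process with the suggested expression swapped in. It assumes the rest of the process stays exactly as posted. In a regex, ".*" plays the role that "*" plays in a file wildcard, matching any run of characters. The unescaped "." in "parker.com" matches any single character, including a literal dot. A stricter variant is .*parker\.com.* if you want it to match only "parker.com" itself.

```xml
<list key="crawling_rules">
<parameter key="store_with_matching_url" value=".*parker.com.*"/>
</list>
```

    Only the value attribute changes. The parameter key and the surrounding list element are copied from the original process.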