
"embedded crawler (websphinx) and RegEx"

User: "tsschmidt"
New Altair Community Member
Updated by Jocelyn
(How) can I use RegEx within that crawler? It did not work...

I tried this several times as follows (see also the attachment):
visit_content: ^water$
or
visit_content: \<water\>
or
visit_content: (?s)\<water\>
...

(I don't want waterfall...)

Please don't suggest HTTRACK. As far as I know, HTTRACK cannot filter the content of pages, only URLs.

[attachment deleted by admin]

    User: "land"
    New Altair Community Member
    Hi,
    the crawler does not support regular expressions. Only the following condition types are supported to specify which links to follow:
    follow_url A link is only followed, if the target URL contains all terms stated in this parameter.
    link_text A link is only followed, if the link text contains all terms stated in this parameter.

    The conditions that state whether to store a page or not allow for the following expressions:
    visit_url A page is only stored if its URL contains all terms stated in this parameter.
    visit_content A page is only stored if its content contains all terms stated in this parameter.
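
Since visit_content only checks whether the terms occur in the page, one possible workaround is to crawl with the plain term "water" and then filter the stored pages with a regular expression afterwards. The following is only a rough Python sketch of that idea, not part of the crawler: the function names and the sample pages are made up for illustration, and it assumes that "contains all terms" means a plain substring match (case handling in the crawler is not known here). Note that Python, like Java, spells the word boundary as \b rather than \< and \>.

```python
import re

def contains_all_terms(text, terms):
    # Assumed semantics of visit_content: every term occurs somewhere
    # in the text as a plain substring (hypothetical helper).
    return all(term in text for term in terms)

def contains_whole_word(text, word):
    # Post-crawl filter: match the word only between word boundaries,
    # so "water" matches but "waterfall" does not (hypothetical helper).
    return re.search(r"\b" + re.escape(word) + r"\b", text) is not None

# Made-up sample pages standing in for what the crawler stored.
pages = {
    "a.html": "The waterfall is tall.",
    "b.html": "Drink more water every day.",
}

# Step 1: what a substring-based visit_content would keep.
stored = [name for name, text in pages.items()
          if contains_all_terms(text, ["water"])]

# Step 2: narrow the stored pages down with a whole-word regex.
kept = [name for name in stored
        if contains_whole_word(pages[name], "water")]
```

With these sample pages, both are stored by the substring check, and only b.html survives the whole-word filter.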

    Further information can be found at http://nemoz.org/joomla/content/view/64/53/lang,de/

    Greetings,
      Sebastian