"embedded crawler (websphinx) and RegEx"
tsschmidt
New Altair Community Member
(How) can I use RegEx within that crawler? It did not work...
I tried this several times as follows (see also attachment):
visit_content: ^water$
or
visit_content: \<water\>
or
visit_content: (?s)\<water\>
...
(I don't want waterfall...)
Please don't suggest HTTRACK. As far as I know, HTTRACK cannot filter the content of pages, only URLs.
[attachment deleted by admin]
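For reference, here is how the three patterns above behave in a standard regex engine. Python's `re` module is used purely as an illustration (it is not what the crawler uses), and the sample text is made up. `^`/`$` anchor to the whole string unless multiline mode is on, and `\<`/`\>` are GNU grep word-boundary operators. In Python's `re` the word boundary is `\b`, while `\<` simply matches a literal `<`.

```python
import re

# Made-up page text for illustration only.
page = "The waterfall is fed by fresh water from the hills."

# ^ and $ anchor to the start and end of the whole string (no MULTILINE flag),
# so this only matches if the entire content is exactly "water".
anchored = re.search(r"^water$", page)

# In Python's re, \< and \> are not word boundaries; they match literal < and >.
# Prefixing (?s) only lets "." match newlines, so it would not change this.
gnu_style = re.search(r"\<water\>", page)

# \b is the word-boundary assertion: it matches "water" but not "waterfall".
word = re.findall(r"\bwater\b", page)

print(anchored)   # None
print(gnu_style)  # None
print(word)       # ['water']
```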
Answers
Hi,
the crawler does not support regular expressions. These are the only condition types supported for specifying which links to follow:
follow_url: A link is only followed if the target URL contains all terms stated in this parameter.
link_text: A link is only followed if the link text contains all terms stated in this parameter.
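The two link conditions can be read as plain "contains all terms" checks. The sketch below is an assumption about the semantics described above, not the crawler's actual code: the helper names `contains_all_terms` and `should_follow`, the whitespace-separated term list, the case-sensitive matching, and treating an empty condition as "no restriction" are all hypothetical.

```python
def contains_all_terms(text: str, terms: str) -> bool:
    """Hypothetical helper: True if every whitespace-separated term in `terms`
    occurs somewhere in `text` as a plain substring (no regex, case-sensitive)."""
    return all(term in text for term in terms.split())


def should_follow(target_url: str, link_text: str,
                  follow_url: str = "", link_text_terms: str = "") -> bool:
    """Hypothetical: a link is followed only if the target URL contains all
    follow_url terms AND the link text contains all link_text terms.
    An empty condition splits into no terms, so it imposes no restriction."""
    return (contains_all_terms(target_url, follow_url)
            and contains_all_terms(link_text, link_text_terms))


print(should_follow("http://example.com/rivers/a.html", "Water quality",
                    follow_url="rivers", link_text_terms="Water"))  # True
print(should_follow("http://example.com/lakes/", "Water quality",
                    follow_url="rivers"))  # False
```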
The conditions that decide whether a page is stored support the following parameters:
visit_url: A page is only stored if its URL contains all terms stated in this parameter.
visit_content: A page is only stored if its content contains all terms stated in this parameter.
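Because visit_content is a "contains all terms" check rather than a pattern match, the term water is also satisfied by a page that only mentions waterfall. A minimal sketch, assuming plain case-sensitive substring matching (the function names `should_store` and `has_whole_word` are hypothetical), followed by a regex word-boundary check that could be applied to the stored pages afterwards, outside the crawler, if only the whole word is wanted:

```python
import re


def should_store(page_url: str, page_content: str,
                 visit_url: str = "", visit_content: str = "") -> bool:
    """Hypothetical: a page is stored only if its URL contains all visit_url
    terms and its content contains all visit_content terms (plain substrings)."""
    return (all(t in page_url for t in visit_url.split())
            and all(t in page_content for t in visit_content.split()))


def has_whole_word(page_content: str, word: str) -> bool:
    """Post-filter sketch: True only if `word` occurs as a whole word."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, page_content) is not None


page = "Photos of the Niagara waterfall"
print(should_store("http://example.com/falls.html", page, visit_content="water"))  # True
print(has_whole_word(page, "water"))  # False
```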
Further information can be found at http://nemoz.org/joomla/content/view/64/53/lang,de/
Greetings,
Sebastian