"embedded crawler (websphinx) and RegEx"
tsschmidt
(How) can I use RegEx within that crawler? It did not work...
I tried this several times as follows (see also attachment):
visit_content: ^water$
or
visit_content: \<water\>
or
visit_content: (?s)\<water\>
...
(I don't want waterfall...)
Please don't suggest HTTRACK. As far as I know, HTTRACK cannot filter on the content of pages, only on URLs.
[attachment deleted by admin]
land
Hi,
the crawler does not support regular expressions. Only the following condition types are supported for specifying which links to follow:
follow_url: A link is followed only if the target URL contains all terms stated in this parameter.
link_text: A link is followed only if the link text contains all terms stated in this parameter.
The conditions that determine whether a page is stored allow the following expressions:
visit_url: A page is stored only if its URL contains all terms stated in this parameter.
visit_content: A page is stored only if its content contains all terms stated in this parameter.
Further information can be found at
http://nemoz.org/joomla/content/view/64/53/lang,de/
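Since the conditions above only check whether terms are contained, one possible workaround is to let the crawler store every page that contains "water" and then filter the stored pages for the whole word afterwards. The following is only a minimal sketch of that idea in Python, not the crawler's own code. It assumes that "contains" means a plain, case-sensitive substring check (the description above does not say), and the function names and sample pages are made up for illustration. Note that Python's re module writes the word boundary as \b rather than \< and \>.

```python
import re

def contains_all_terms(text, terms):
    """Assumed model of the crawler's term conditions: plain substring checks."""
    return all(term in text for term in terms)

def contains_whole_word(text, word):
    """Whole-word match using a regex word boundary (\\b)."""
    return re.search(r"\b" + re.escape(word) + r"\b", text) is not None

# Hypothetical stored pages, keyed by file name.
pages = {
    "a.html": "The waterfall is loud.",
    "b.html": "Drink more water every day.",
}

# Step 1: what a "contains all terms" condition would keep (under the assumption above).
term_hits = [name for name, text in pages.items()
             if contains_all_terms(text, ["water"])]

# Step 2: post-filter that keeps only pages with "water" as a whole word.
word_hits = [name for name, text in pages.items()
             if contains_whole_word(text, "water")]

print(term_hits)
print(word_hits)
```

Under that assumption, the term condition keeps both sample pages (including the "waterfall" one), while the post-filter keeps only the page where "water" appears as a separate word.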
Greetings,
Sebastian