I am trying to use a regular expression in the Generate Attributes operator to find a portion of text that contains a date. I want to use the index() function to find the start of the text block containing the date. The text block always looks something like this:
ad - 2013-03-21
and it always starts on a new line and the line ends right after the date.
index(text, "\nad ")
works to find the start of the text block.
However, I'd like to have something more robust that specifies the date format, to make sure I don't pick up any old line of text that starts with "ad ". So I tried:
index(text,"ad.{3}20[0-1][0-9]-[0-9]{2}-[0-9]{2}")
and it finds no match in RapidMiner. But if I use the same expression in Expresso, it does find a match in a text sample like:
Blah blah
Innovation Export
ad - 2013-03-21
pd - 2011-20-32
blah, blah
done
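For reference, here is a minimal sketch of the same check outside RapidMiner, written in Python rather than as a RapidMiner expression. Python's re module stands in for Expresso here, and the variable names are only for illustration. The last lookup treats the pattern as a plain literal string, which is only my guess at one way a non-regex search could behave, not something I have confirmed about index().

```python
import re

# The sample text from above, with explicit newlines.
sample = (
    "Blah blah\n"
    "Innovation Export\n"
    "ad - 2013-03-21\n"
    "pd - 2011-20-32\n"
    "blah, blah\n"
    "done"
)

# The same pattern I passed to index() in RapidMiner.
pattern = r"ad.{3}20[0-1][0-9]-[0-9]{2}-[0-9]{2}"

# Regex search: this is the behaviour I expected from index().
match = re.search(pattern, sample)
regex_start = match.start() if match else -1

# Plain substring search for the simple marker that does work for me.
literal_start = sample.find("\nad ")

# Assumption for comparison only: search for the pattern text itself
# as a literal substring, with no regex interpretation.
pattern_as_literal = sample.find(pattern)

print("regex match:", match.group(0) if match else None, "at", regex_start)
print("literal '\\nad ' at", literal_start)
print("pattern treated as literal text at", pattern_as_literal)
```

In this sketch the regex search finds the "ad - 2013-03-21" line, while searching for the pattern as literal text finds nothing, which looks a lot like the "no match" result I get in RapidMiner.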
We also tried the same sort of regular expression with the Generate Extract operator, and that did not find the matching text either.
What am I doing wrong?