"Issues with regular expressions"
IngoRM
Original message from SourceForge forum at http://sourceforge.net/forum/forum.php?thread_id=2034662&forum_id=390413
The following regular expression works in RapidMiner: '[A-Z][a-z]+'. When applied to any text, it extracts words that begin with an uppercase letter.
However, if I add any space definition, it stops working. For example, '[A-Z][a-z][ ][A-Z][a-z]+' is not recognized as a valid regular expression.
The same expressions work well in other regex text editors.
Any ideas why RapidMiner does not recognize the space definition?
Thanks,
FDR
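Outside RapidMiner, the patterns can be checked directly with java.util.regex. The sketch below is only an illustration: the class name, the helper firstMatch and the sample sentence are hypothetical, and it does not reproduce how RapidMiner reads its parameter, so it cannot explain why RapidMiner rejected the expression. It does show that '[ ]' compiles as an ordinary character class in Java. Note also that, as quoted, the pattern has no '+' after the first [a-z], so the first word may contain only one lowercase letter ("Al Gore" matches, "New York" does not). In a Java string literal, \s has to be written as "\\s".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SpaceClassCheck {
    // Returns the first substring of text matched by regex, or null if there is none.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "We flew from New York to Los Angeles.";
        // Compiles fine; but the first word may have only ONE lowercase letter, so nothing matches here.
        System.out.println(firstMatch("[A-Z][a-z][ ][A-Z][a-z]+", text));
        // With '+' after the first [a-z], the literal-space version matches a two-word name.
        System.out.println(firstMatch("[A-Z][a-z]+[ ][A-Z][a-z]+", text));
        // \s matches any whitespace character (space, tab, newline, ...), not only a space.
        System.out.println(firstMatch("[A-Z][a-z]+\\s[A-Z][a-z]+", text));
    }
}
```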
Edit by topic starter:
I found the answer shortly after posting this: spaces seem to be matched by \s, as in:
'\s[A-Z][a-z]+\s[A-Z][a-z]+'
The expression above works. However, it finds only the first occurrence of the match. Any ideas on how to get all occurrences?
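In plain Java, finding a pattern only once versus finding all occurrences is a matter of how the Matcher is used: each call to Matcher.find() resumes after the previous match, so looping until it returns false collects every non-overlapping occurrence. This is a minimal sketch with a hypothetical class name, helper and sample text; whether RapidMiner's extraction operator offers an equivalent "all matches" option is not confirmed here. Because the pattern starts with \s, the leading whitespace is part of each match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindAllMatches {
    // Collects every non-overlapping match of regex in text, from left to right.
    static List<String> findAll(String regex, String text) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        // Each find() continues searching after the end of the previous match.
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "Flights from New York to Los Angeles and San Diego.";
        // Same pattern as in the post; backslashes are doubled inside a Java string literal.
        System.out.println(findAll("\\s[A-Z][a-z]+\\s[A-Z][a-z]+", text));
    }
}
```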
Answer by Ingo Mierswa:
Hi,
the regular expressions should follow the same syntax that Java supports, as explained here:
http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html
I am not entirely sure, but capturing groups might help here:
http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html#cg
Cheers,
Ingo
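As a rough illustration of the capturing groups mentioned above, in plain Java (not RapidMiner's operator): parentheses mark parts of a match that can be read back with Matcher.group(n), numbered by the order of their opening parenthesis. Combined with a find() loop, this returns, for every occurrence, only the part of interest, for example the two-word name without the leading whitespace. The class name, helper captureAll and sample text are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CapturingGroups {
    // Returns the text captured by the given group number for every match, in order.
    static List<String> captureAll(String regex, String text, int group) {
        List<String> captured = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            captured.add(m.group(group));
        }
        return captured;
    }

    public static void main(String[] args) {
        String text = "Flights from New York to Los Angeles.";
        // Group 1 = whole name, group 2 = first word, group 3 = second word.
        String regex = "\\s(([A-Z][a-z]+)\\s([A-Z][a-z]+))";
        System.out.println(captureAll(regex, text, 1)); // names without the leading whitespace
        System.out.println(captureAll(regex, text, 2)); // first words only
    }
}
```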