How to exclude words from the matches of a regex in a "replace with dictionary" operator
EL75
New Altair Community Member
Hi everybody,
After many hours of searching online, I have to admit I can't find a way to exclude specific words from the matches of a regex.
I use regular expressions stored in a spreadsheet that is connected to a "replace with dictionary" operator.
Some of these regexes capture too many words.
One example:
- the regex: (?i)\b(([l|d]['])*ap+(l*i*e*o*|cation)*s*)\b
- returns all the words I need (app, applie, application, l'app, d'application, etc.)
- but also "apple", "appel", "l'appel", etc.
My various attempts with a "look behind" expression all failed...
It works if I split the problem and create two regexes (one for "app", another for "application", each with its variants):
(?i)\b(([l|d][' ]*)*ap+s*)\b |
(?i)\b(([l|d][' ])*ap+l+(i|ie|ic+ation|oc+ation)*s*)\b |
but the goal was to find a smarter way within a single regex.
See the example set and the regexes in this Google Sheet: https://docs.google.com/spreadsheets/d/14hyPlwrPLxDv-F4yAVOXH8wlN-RMtumnZYOZh1gOOPs/edit?usp=sharing
Thanks for your help!
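To reproduce the over-matching outside the spreadsheet, here is a minimal Python sketch. Python's `re` module is used only as a stand-in (the operator's own regex engine may differ in details), and the helper name `fully_matched` is made up for this illustration.

```python
import re

# The single regex from the question, copied unchanged.
BROAD = re.compile(r"(?i)\b(([l|d]['])*ap+(l*i*e*o*|cation)*s*)\b")

# Sample words taken from the question.
wanted = ["app", "applie", "application", "l'app", "d'application"]
unwanted = ["apple", "appel", "l'appel"]


def fully_matched(pattern, words):
    """Return the words that the pattern matches from start to end."""
    return [w for w in words if pattern.fullmatch(w)]


print("wanted and matched:  ", fully_matched(BROAD, wanted))
print("unwanted but matched:", fully_matched(BROAD, unwanted))
```

The unwanted words get through because `(l*i*e*o*|cation)*` accepts any run of the letters l, i, e and o after "ap". Note also that `[l|d]` matches a literal `|` as well as l and d; `[ld]` is probably what was intended.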
Best Answer
Yeah, you could also capture most of these with a few adaptations, like this:
(?i)\b([ld]')?ap+([lie]+)?(cation)?s?\b
but you'll again get unwanted matches such as "apple".
Anyway, it is always better to have a few simple regex replacements in your dictionary than one overly complex one: the complex one needs much more computation and will slow down your process.
The golden rule still applies here: just keep it simple.
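The "few simple replacements" idea can be sketched in Python as follows. This is only a rough stand-in for a dictionary of regex replacements, not a description of how the operator works internally. The two patterns are simplified variants of the split regexes from the question, and the replacement token "application" as well as the names `DICTIONARY` and `normalize` are assumptions made for this illustration.

```python
import re

# Hypothetical dictionary: (simple regex, replacement) pairs.
# The article (l' or d') is left out of the patterns on purpose: \b already
# sits between the apostrophe and the "a", so the article is kept as-is.
DICTIONARY = [
    # Short forms: ap, app, apps, ...
    (r"(?i)\bap+s?\b", "application"),
    # Long forms: appli, applie, aplie, application, aplication, ...
    (r"(?i)\bap+l+(?:ie?|[io]c+ation)s?\b", "application"),
]


def normalize(text, dictionary=DICTIONARY):
    """Apply each (pattern, replacement) pair in order."""
    for pattern, replacement in dictionary:
        text = re.sub(pattern, replacement, text)
    return text


print(normalize("l'app et l'aplie ne sont pas une apple ni un appel"))
```

Because the long-form pattern requires an "i" (or "ic"/"oc") after the "l", neither "apple" nor "appel" can match, so no explicit exclusion is needed.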
Answers
Try this:
(?i)\b([ld]')?(ap+([lie]+cations?)?)\b
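A quick way to see what this pattern covers is to run it over the sample words from the thread. The sketch below uses Python's `re` as a stand-in for the operator's regex engine, and the variable names are chosen just for this example.

```python
import re

# The suggested pattern, copied unchanged.
STRICT = re.compile(r"(?i)\b([ld]')?(ap+([lie]+cations?)?)\b")

# Sample words from the thread: wanted variants plus the unwanted ones.
samples = [
    "ap", "app", "apps", "appli", "applie", "aplie",
    "application", "l'app", "d'application",
    "apple", "appel", "l'appel",
]

matched = [w for w in samples if STRICT.fullmatch(w)]
missed = [w for w in samples if not STRICT.fullmatch(w)]

print("matched:", matched)
print("missed: ", missed)
```

The pattern rejects "apple" and "appel", but it also misses short and misspelled variants such as "apps", "appli", "applie" and "aplie", because the only allowed continuation after "ap" is a run of l, i, e followed by "cation".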
Hi kayman,
Thank you for your help, you're always on board!
Unfortunately, the solution doesn't cover all the cases I put in the Excel file.
I need to capture "ap", "aplie", "applie", "appli", "apps", etc.
People write this word (in French) with so many misspellings...
Splitting it into two regexes still looks better for now.
Best,
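If a single pattern is still preferred, one option that was not proposed in the thread is a negative lookahead, `(?!...)`, that rejects an explicit list of unwanted words before the broad pattern gets a chance to match. The sketch below is in Python; Java's regex engine supports the same lookahead syntax, but whether the operator accepts it should be verified. The exclusion list (apple, appel and their plurals) and the names `EXCLUDING` and `text` are assumptions for illustration.

```python
import re

# Broad pattern guarded by a negative lookahead.
# The lookahead repeats the optional article so that both "l'appel" and the
# bare "appel" after the apostrophe are rejected.
EXCLUDING = re.compile(
    r"(?i)\b(?!(?:[ld]')?(?:apple|appel)s?\b)"   # words to exclude
    r"(?:[ld]')?ap+(?:[lie]+)?(?:cation)?s?\b"   # broad "app..." pattern
)

text = "l'app, l'appel, une apple, d'applications, apps"

# All groups are non-capturing, so findall returns the full matches.
print(EXCLUDING.findall(text))
```

The trade-off is that every new unwanted word has to be added to the lookahead by hand, so the pattern grows over time, which is the kind of complexity the "keep it simple" advice warns about.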