How to exclude words from the matches of a regex in a "replace with dictionary" operator

EL75
EL75 New Altair Community Member
edited November 5 in Community Q&A
Hi everybody,
After many hours on the internet, I have to admit I can't find a way to exclude words from the results of a regex.
I use regular expressions in a spreadsheet connected to a "replace with dictionary" operator.
Some of the regexes capture too many words.
One example:
- the REGEX: (?i)\b(([l|d]['])*ap+(l*i*e*o*|cation)*s*)\b
- returns all words I need (app, applie, application, l'app, d'application, etc.)
- but also "apple", "appel", "l'appel", etc
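
For anyone who wants to reproduce the over-matching outside the operator, here is a minimal sketch in Python. It assumes Python's `re` module behaves like the operator's regex engine for this pattern, which is an assumption; the word lists are just the examples quoted above.

```python
import re

# Pattern copied unchanged from the post.
PATTERN = re.compile(r"(?i)\b(([l|d]['])*ap+(l*i*e*o*|cation)*s*)\b")

# Example words quoted in the post.
WANTED = ["app", "applie", "application", "l'app", "d'application"]
UNWANTED = ["apple", "appel", "l'appel"]


def matches_whole_word(word: str) -> bool:
    """Return True when the pattern matches the entire word."""
    return PATTERN.fullmatch(word) is not None


for word in WANTED + UNWANTED:
    label = "wanted" if word in WANTED else "unwanted"
    print(f"{word!r:18} {label:9} matched={matches_whole_word(word)}")
```

The unwanted words get through because `(l*i*e*o*|cation)*` accepts any run of the letters l, i, e and o, so "le" and "el" pass just like "lie". Note also that `[l|d]` is a character class, so it matches a literal `|` as well as `l` and `d`; `[ld]` is enough.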

My various attempts with a "look behind" expression failed...
It works when I split the problem into two regexes (one for "app", another for "application", each with its variants):
(?i)\b(([l|d][' ]*)*ap+s*)\b
(?i)\b(([l|d][' ])*ap+l+(i|ie|ic+ation|oc+ation)*s*)\b

but the goal was to find a smarter way with a single regex :)

See the example set and the regexes in this Google Sheet: https://docs.google.com/spreadsheets/d/14hyPlwrPLxDv-F4yAVOXH8wlN-RMtumnZYOZh1gOOPs/edit?usp=sharing

Thanks for your help!

Best Answer

  • kayman
    kayman New Altair Community Member
    Answer ✓
    Yeah, you could also capture most of these with a few adaptations, like this:

    (?i)\b([ld]')?ap+([lie]+)?(cation)?s?\b

    but you'll again get unwanted ones such as apple, etc.
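
One way to drop specific words while keeping a single regex is a negative lookahead. The sketch below is in Python and is not from the thread: the `FILTERED` pattern and its exclusion list (apple, appel, with an optional plural s) are assumptions built from the unwanted examples in the question, and it presumes the operator's regex engine supports `(?!...)`.

```python
import re

# Regex from the answer above, unchanged.
BROAD = re.compile(r"(?i)\b([ld]')?ap+([lie]+)?(cation)?s?\b")

# Hypothetical variant: the same regex plus a negative lookahead placed after
# the optional l'/d' prefix. The exclusion list is an assumption taken from
# the unwanted examples in the question.
FILTERED = re.compile(
    r"(?i)\b([ld]')?(?!(?:apple|appel)s?\b)ap+([lie]+)?(cation)?s?\b"
)

# Words mentioned in the thread.
WANTED = ["ap", "app", "apps", "appli", "applie", "aplie",
          "application", "l'app", "d'application"]
UNWANTED = ["apple", "appel", "l'appel"]

for word in WANTED + UNWANTED:
    print(word, bool(BROAD.fullmatch(word)), bool(FILTERED.fullmatch(word)))
```

The lookahead sits after the optional `l'`/`d'` prefix so that "l'appel" is rejected as well. Every newly discovered unwanted word has to be added to the exclusion list by hand, which is the price of keeping one regex.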

    Anyway, it is generally better to have a few simple regex replacements in your dictionary than one overly complex one: the complex one is more expensive to compute and will also slow down your process.

    Here too the golden rule applies: just keep it simple.
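
To illustrate the "few simple replacements" approach, here is a rough Python stand-in for a dictionary of regex replacements applied one after another. The two patterns are the split regexes from the question; the `DICTIONARY` name, the `replace_with_dictionary` helper and the replacement value "application" are hypothetical and only there for the example. It is not how the operator itself is implemented.

```python
import re

# Hypothetical dictionary: (regex, replacement) pairs applied in order.
# Both patterns are the split regexes from the question, unchanged.
DICTIONARY = [
    (r"(?i)\b(([l|d][' ]*)*ap+s*)\b", "application"),
    (r"(?i)\b(([l|d][' ])*ap+l+(i|ie|ic+ation|oc+ation)*s*)\b", "application"),
]


def replace_with_dictionary(text: str) -> str:
    """Apply every (regex, replacement) pair to the text, one after another."""
    for pattern, replacement in DICTIONARY:
        text = re.sub(pattern, replacement, text)
    return text


print(replace_with_dictionary("mon app et mon applie"))
print(replace_with_dictionary("une apple"))
```

Each entry stays short enough to read and test on its own, and "apple" is left alone because neither pattern can end on a trailing "e" after the "l".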

Answers

  • kayman
    kayman New Altair Community Member
    edited December 2020
    Try this:

    (?i)\b([ld]')?(ap+([lie]+cations?)?)\b
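
A candidate like this can be checked quickly against a word list before it goes into the dictionary. The following is a small Python sketch, assuming Python's `re` treats the pattern the same way the operator does; the `check` helper is hypothetical and the word lists are only the examples mentioned in the thread, not the full spreadsheet.

```python
import re

# Candidate regex from the reply above, unchanged.
CANDIDATE = re.compile(r"(?i)\b([ld]')?(ap+([lie]+cations?)?)\b")

# Words mentioned in the thread; the real test set lives in the spreadsheet.
WANTED = ["ap", "app", "apps", "appli", "applie", "aplie",
          "application", "l'app", "d'application"]
UNWANTED = ["apple", "appel", "l'appel"]


def check(pattern, wanted, unwanted):
    """Return (wanted words the pattern misses, unwanted words it matches)."""
    missed = [w for w in wanted if not pattern.fullmatch(w)]
    leaked = [w for w in unwanted if pattern.fullmatch(w)]
    return missed, leaked


missed, leaked = check(CANDIDATE, WANTED, UNWANTED)
print("missed:", missed)
print("leaked:", leaked)
```

On this word list the candidate lets none of the unwanted words through, but it misses the short and misspelled forms apps, appli, applie and aplie, because the `[lie]+` part is only allowed when "cation" follows.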
  • EL75
    EL75 New Altair Community Member
    Hi kayman,
    thank you for your help, you're always on board!
    Unfortunately, the solution doesn't cover all the cases I've put in the Excel file.
    I need to capture "ap", "aplie", "applie", "appli", "apps", etc.
    People write this word (in French) with so many misspellings...
    Splitting it into two regexes still looks better for now.
    Best,