[SOLVED] Splitting nominal attribute values by unparenthesized commas

tennenrishin New Altair Community Member
edited November 5 in Community Q&A
Hi

I would like to split a nominal attribute into multiple attributes. The nominal values need to be split at every internal comma, except for commas that are inside parentheses. This is the same way one would split a function argument list into its arguments (which may themselves contain function calls).

Does anyone have any ideas for what regex I could use to match those commas, or any other way to perform this split?

Answers

  • MariusHelf
    MariusHelf New Altair Community Member
    Hi, you can't model recursive structures with regular expressions. If the nesting has a fixed maximum depth, however, it is possible to create a regular expression for it.
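Where the split can be done in a scripting step rather than in a regex, arbitrary nesting depth only needs a counter. The following is a minimal Python sketch of that idea, not something proposed in this thread; the function name `split_by_depth` is hypothetical and the input is assumed to have balanced parentheses.

```python
def split_by_depth(text, sep=","):
    """Split text at every separator that is not inside parentheses."""
    parts = []
    depth = 0   # how many "(" are currently open
    start = 0   # index where the current piece begins
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == sep and depth == 0:
            # A top-level separator closes the current piece.
            parts.append(text[start:i])
            start = i + 1
    parts.append(text[start:])  # the last piece has no trailing separator
    return parts


print(split_by_depth("f(a,g(b,c)),d"))  # ['f(a,g(b,c))', 'd']
```

Because the counter is just an integer, there is no depth limit to choose in advance.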
  • tennenrishin
    tennenrishin New Altair Community Member
    Thank you.

    I'm thinking of...
    1. Replacing all commas and parentheses within substrings that match \([^\(\)]*\) with respective special tokens.
    2. Repeating step 1 until there are no more parentheses (or simply max_depth times)
    3. Splitting by commas
    4. Replacing the special tokens with their original characters.

    But step 1 requires a capability to search and replace within all substrings that match some given regex. Is there a way to do this?

    Any help appreciated.
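The four steps above can be sketched in Python, where `re.sub` accepts a function as the replacement and so covers the "search and replace within each match" part of step 1. This is only an illustration under assumptions: the placeholder characters are arbitrary control characters assumed never to occur in the data, and the names `split_by_masking` and `_mask` are hypothetical.

```python
import re

# Placeholder tokens for "(", ")" and ","; assumed absent from the real values.
OPEN, CLOSE, COMMA = "\x01", "\x02", "\x03"
MASK = str.maketrans({"(": OPEN, ")": CLOSE, ",": COMMA})
UNMASK = str.maketrans({OPEN: "(", CLOSE: ")", COMMA: ","})

# An innermost group: parentheses with no other parentheses inside.
INNERMOST = re.compile(r"\([^()]*\)")


def _mask(match):
    # Step 1: replace commas and parentheses inside one matched group.
    return match.group(0).translate(MASK)


def split_by_masking(text):
    # Step 2: repeat until no parenthesized group is left.
    while INNERMOST.search(text):
        text = INNERMOST.sub(_mask, text)
    # Step 3: only top-level commas remain as literal commas.
    parts = text.split(",")
    # Step 4: restore the original characters in each piece.
    return [part.translate(UNMASK) for part in parts]


print(split_by_masking("f(a,g(b,c)),d,h(e)"))  # ['f(a,g(b,c))', 'd', 'h(e)']
```

Each pass masks one nesting level, so the loop runs as many times as the deepest nesting in the value.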
  • tennenrishin
    tennenrishin New Altair Community Member
    Disregarding my last post, I'm now trying this regex:
    ,(?!([^\(\)]*\(([^\(\)]*\(([^\(\)]*\([^\(\)]*\))*[^\(\)]*\))*[^\(\)]*\))*[^\(\)]*\))
    with the assumption that nesting does not exceed a depth of 3 levels.

    It seems to be working but of course it is not easy to test comprehensively. Can you spot any obvious mistakes? Is it unnecessarily complicated?

    Here is a more readable version:

    ,(?!
      (
        [^\(\)]*
        \(
        (
          [^\(\)]*
          \(
          (
            [^\(\)]*
            \(
            [^\(\)]*
            \)
          )*
          [^\(\)]*
          \)
        )*
        [^\(\)]*
        \)
      )*
      [^\(\)]*
      \)
    )
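One way to probe the expression is to run it against a few hand-checked inputs. The sketch below does that in Python, which is an assumption about tooling rather than part of the original setup; the sample strings and the name `split_with_regex` are made up for illustration. It collects match positions with `finditer` instead of calling `re.split`, because Python's `re.split` also returns the contents of capturing groups, and this pattern has three.

```python
import re

# The one-line pattern from the post, copied verbatim.
PATTERN = re.compile(
    r",(?!([^\(\)]*\(([^\(\)]*\(([^\(\)]*\([^\(\)]*\))*[^\(\)]*\))*[^\(\)]*\))*[^\(\)]*\))"
)


def split_with_regex(text):
    """Cut text at every comma the pattern matches."""
    parts, start = [], 0
    for match in PATTERN.finditer(text):
        parts.append(text[start:match.start()])
        start = match.end()
    parts.append(text[start:])
    return parts


print(split_with_regex("f(a,b),c"))            # ['f(a,b)', 'c']
print(split_with_regex("f(g(h(x,y),z),w),v"))  # ['f(g(h(x,y),z),w)', 'v']

# A comma followed by a sibling nested four levels deep: the lookahead
# can no longer reach the closing parenthesis, so the comma is split.
deep = "f(a,g(h(i(j(k" + ")" * 5 + ",z"
print(split_with_regex(deep))
```

The last case shows where the depth assumption bites: the limit applies to the nesting of whatever follows the comma inside the same parentheses, and exceeding it produces a wrong split rather than an error.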
  • MariusHelf
    MariusHelf New Altair Community Member
    It is quite possible that it works like this; if it works, then it works ;) You may be able to simplify the expression itself if you add some process logic, such as loops and Branches, around it, as you proposed in your previous post.

    Best,
    Marius
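A related way to keep the expression manageable is to generate it with a loop for whatever depth is needed, instead of writing the nesting out by hand. The Python sketch below is one reading of that idea and not something stated in the thread; `build_comma_pattern` is a hypothetical helper, and it uses non-capturing groups `(?:...)` so that the result can be passed straight to `split`.

```python
import re

NON_PAREN = r"[^()]*"  # a run of characters that are not parentheses


def build_comma_pattern(max_depth):
    """Regex for commas not followed by an unmatched ')'.

    Parenthesized groups after the comma may nest up to max_depth levels.
    """
    # Depth 1: a group with no parentheses inside it.
    group = NON_PAREN + r"\(" + NON_PAREN + r"\)"
    # Each pass wraps the previous level in one more pair of parentheses.
    for _ in range(max_depth - 1):
        group = NON_PAREN + r"\((?:" + group + r")*" + NON_PAREN + r"\)"
    return re.compile(r",(?!(?:" + group + r")*" + NON_PAREN + r"\))")


deep = "f(a,g(h(i(j(k" + ")" * 5 + ",z"
print(build_comma_pattern(3).split(deep))  # three pieces: depth 3 is too shallow here
print(build_comma_pattern(4).split(deep))  # two pieces: the inner comma is kept
```

With `max_depth=3` the generated pattern has the same structure as the hand-written one, apart from the non-capturing groups.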