Add a new column based on an if-then condition on an existing column

syazwaan
syazwaan New Altair Community Member
edited November 5 in Community Q&A
I have a dataset something like this:

I want to add a new column that contains 'Complete' if the value in column_1 starts with a number and ends with a letter, and 'Incomplete' otherwise.

Expected output:

Thank you.

Best Answer

  • BalazsBarany
    BalazsBarany New Altair Community Member
    Answer ✓
    Hi!

    You could use Generate Attributes for this.
    Determining whether a text starts with a number and ends with a letter is the domain of regular expressions. These are usable in different places in RapidMiner; there's even a nice regular expression editor, for example in the Replace operator.

    In Generate Attributes, you would use the if() and matches() functions. Here's an example for your use case:

    if(matches(column_1, "^[0-9]+.*[a-zA-Z]$"), "Complete", "Incomplete")

    The regexp looks a bit terrifying at first, but it is simple to explain.

    ^    beginning of the string
    [0-9]  character class: digits from 0 to 9
    +    at least one of the preceding element (the character class)
    .*   any characters, including none (so a short value like 1a also matches)
    [a-zA-Z]   a letter (if you need additional characters, enter them after the Z)
    $    end of the string

    So the logic goes: If the value of column_1 matches the regular expression, the if function returns the second argument (Complete), else the third one.
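For readers who want to try the pattern outside RapidMiner, here is a minimal Python sketch of the same check. It is only an illustration, not part of the RapidMiner process: the function name `label` and the sample values are made up, and the pattern uses `.*` in the middle so that two-character values such as 1a also count as Complete.

```python
import re

# One or more digits, then any characters (possibly none), then a final letter.
# fullmatch() requires the whole value to match, so ^ and $ are not needed here.
PATTERN = re.compile(r"[0-9]+.*[a-zA-Z]")


def label(value: str) -> str:
    """Return 'Complete' if value starts with a digit and ends with a letter."""
    return "Complete" if PATTERN.fullmatch(value) else "Incomplete"


# Hypothetical sample values, not the data from the question.
samples = ["12AB", "1a", "A12", "123", ""]
for s in samples:
    print(repr(s), "->", label(s))
```

If other trailing characters should count as letters, they can be added to the last character class, just as described for the RapidMiner expression.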

    Regards,
    Balázs

Answers

  • Telcontar120
    Telcontar120 New Altair Community Member
    You could also use the Generate Copy operator and then Generate Attributes with the PREFIX and SUFFIX functions if you simply want to keep the first and last characters of the raw data as separate attributes. This may be helpful depending on what else you want to do with the data, or if those codes have some meaning.
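As a rough illustration of the first-and-last-character idea, the following Python sketch splits a value with plain string slicing. It does not use RapidMiner's functions; the name `first_and_last` and the sample value are hypothetical.

```python
from typing import Tuple


def first_and_last(value: str) -> Tuple[str, str]:
    """Return the first and last character of value (empty strings if value is empty)."""
    # Slicing instead of indexing avoids an IndexError on empty input.
    return value[:1], value[-1:]


# Hypothetical sample value, not the data from the question.
first, last = first_and_last("12AB")
print(first, last)  # prints: 1 B
```

The two results can then be checked separately, for example with `first.isdigit()` and `last.isalpha()`. Note that these methods also accept non-ASCII digits and letters, which is broader than the `[0-9]` and `[a-zA-Z]` classes in the regular expression.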