Remove all HTML tags in a message field

bea11005
bea11005 New Altair Community Member
edited November 5 in Community Q&A

Hi everyone!

I want to delete all HTML tags in a message field so that I can count the characters in the message, without the tags, using the length operator.

How can I do it?

 

Answers

  • Thomas_Ott
    Thomas_Ott New Altair Community Member

    You'll need the Web Mining extension for that. It has the ability to get rid of HTML tags. 

  • bea11005
    bea11005 New Altair Community Member

    I need to remove the HTML tags from an attribute of a dataset.

    Which operator should I use?

  • Thomas_Ott
    Thomas_Ott New Altair Community Member

    It depends on how your data is set up, but I would look at the Extract Content, Unescape HTML, or Unescape HTML Document operators.

  • bea11005
    bea11005 New Altair Community Member

    I will try. Can I do it with a regular expression that deletes everything between the <> symbols?

  • Thomas_Ott
    Thomas_Ott New Altair Community Member

    Yes, you can use a RegEx. Just use the Replace operator.

  • bea11005
    bea11005 New Altair Community Member

    What RegEx can I use?

  • Thomas_Ott
    Thomas_Ott New Altair Community Member

    Without seeing your data, I would guess something like this: \<.*\> and replace the match with a space or something else.

  • kayman
    kayman New Altair Community Member

    That's a greedy regex, so it would match everything from the first < to the last > in one go, eating the text between your tags and leaving you with not much content.

    Remove tags either with <.*?> (note the question mark, which makes the regex non-greedy) or <[^>]+>
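
The difference between the greedy and non-greedy patterns discussed above can be checked outside RapidMiner with a small sketch. This one uses Python's `re` module rather than the Replace operator, and the sample `message` string and the `strip_tags` helper are made up for illustration. It replaces each match with an empty string rather than a space, on the assumption that the goal is an accurate character count.

```python
import re

# Hypothetical sample value of the message field.
message = "<p>Hello <b>world</b>!</p>"

GREEDY = re.compile(r"<.*>")      # matches from the first "<" to the last ">"
LAZY = re.compile(r"<.*?>")       # matches from each "<" to the nearest ">"
NEGATED = re.compile(r"<[^>]+>")  # "<", then one or more non-">" characters, then ">"


def strip_tags(pattern, text):
    """Replace every match of pattern in text with an empty string."""
    return pattern.sub("", text)


greedy_result = strip_tags(GREEDY, message)
lazy_result = strip_tags(LAZY, message)
negated_result = strip_tags(NEGATED, message)

print(repr(greedy_result))   # '' (the whole message was one match)
print(repr(lazy_result))     # 'Hello world!'
print(repr(negated_result))  # 'Hello world!'
print(len(lazy_result))      # 12
```

On this input the two non-greedy variants give the same result. They can differ on tags that span several lines, because `.` does not match a newline by default while `[^>]` does, so the negated character class is often the more robust choice.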