Remove all HTML tags in a message field
Hi everyone!
I want to delete all HTML tags in a message field, so that I can count the characters of the message without them using the length operator.
How can I do it?
Answers
-
You'll need the Web Mining extension for that. It has the ability to get rid of HTML tags.
1 -
I need to remove the HTML tags from an attribute of a dataset.
Which operator should I use?
0 -
It depends on how your data is set up, but I would look at the Extract Content, Unescape HTML, or Unescape HTML Document operators.
1 -
I will try. Can I do it with a regular expression that deletes everything between the < and > symbols?
0 -
Yes, you can use a RegEx. Just use the Replace operator.
1 -
What RegEx can I use?
0 -
Without seeing your data, I would guess something like this: \<.*\>
and replace with a space or something else.
1 -
That's a greedy regex, so it would match everything from the first < to the last > in one go, eating all your tags together with the text between them and leaving you with not much content.
Remove tags either with <.*?> (note the question mark, which makes it a non-greedy regex) or <[^>]+>
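To see the difference outside RapidMiner, here is a small sketch in Python using the standard `re` module. The sample message and the variable names are made up for illustration, and the tags are replaced with an empty string (rather than a space) on the assumption that the goal is to count only the visible characters.

```python
import re

# Hypothetical sample message; any text with HTML tags behaves the same way.
message = "<p>Hello <b>world</b>!</p>"

# Greedy: .* runs from the first "<" to the last ">", swallowing the text too.
greedy = re.sub(r"<.*>", "", message)

# Non-greedy: .*? stops at the first ">" after each "<".
lazy = re.sub(r"<.*?>", "", message)

# Negated character class: match "<", then anything that is not ">", then ">".
negated = re.sub(r"<[^>]+>", "", message)

print(repr(greedy))   # ''
print(lazy)           # Hello world!
print(len(negated))   # 12
```

Both of the last two patterns give the same result on simple markup like this. Neither handles every edge case (for example a ">" inside an attribute value), so for messy HTML the Web Mining operators mentioned earlier are probably the safer choice.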
2