Regex to extract Twitter handles from post

User: "esboyles"
New Altair Community Member
Updated by Jocelyn

Searching high and low for a regex to use with a Generate Attributes operator to pull @handles from Twitter posts.

Does anyone have an efficient and accurate expression that can do that?

Note that a post may contain multiple @handles, and I want them all.

Many thanks.

    User: "Thomas_Ott"
    New Altair Community Member

    I do all that within Text Processing as part of the tokenization, but if you want to extract the handles at the tweet level, I would use one of the Replace-type operators with a capturing group.
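    To illustrate the capturing-group idea outside RapidMiner, here is a minimal plain-Java sketch using java.util.regex (RapidMiner runs on Java, but whether its Replace operators behave identically is an assumption). The class and method names are hypothetical, and the handle rule (1 to 15 letters, digits or underscores, with the "@" not preceded by a word character so e-mail addresses are skipped) is an assumption about Twitter's username format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HandleExtractor {
    // Hypothetical handle rule (assumption): '@' not preceded by a word character,
    // then 1-15 letters/digits/underscores, not followed by another such character.
    // Group 1 captures the name without the '@'.
    private static final Pattern HANDLE =
            Pattern.compile("(?<![A-Za-z0-9_])@([A-Za-z0-9_]{1,15})(?![A-Za-z0-9_])");

    // Returns every handle in the tweet, in order of appearance, without the '@'.
    static List<String> extractHandles(String tweet) {
        List<String> handles = new ArrayList<>();
        Matcher m = HANDLE.matcher(tweet);
        while (m.find()) {          // find() walks through all non-overlapping matches
            handles.add(m.group(1)); // keep only the capturing group
        }
        return handles;
    }

    public static void main(String[] args) {
        String tweet = "Thanks @Thomas_Ott and @sgenzer! Mail me at me@example.com";
        // Prints "Thomas_Ott,sgenzer": the e-mail address is not treated as a handle.
        System.out.println(String.join(",", extractHandles(tweet)));
    }
}
```

    Looping over find() collects every handle in a post, which covers the case where a tweet mentions several accounts.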

    User: "esboyles"
    New Altair Community Member
    OP

    The search continues... I'm trying this in a Generate Attributes operator: (?:\s|\A)[##]+([A-Za-z0-9-_]+), thinking I would start by writing the hashtags to a store for later analysis.
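    As a sanity check, essentially the same hashtag pattern can be run with java.util.regex. It is shown here with [##] written as # (a character class listing # twice matches only #) and the hyphen moved to the end of the class for clarity. The class and method names are hypothetical. That the pattern compiles in plain Java suggests, as an assumption not confirmed in this thread, that the token recognition error comes from the Generate Attributes expression parser rather than from the regex itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HashtagExtractor {
    // Adapted from the pattern in the post: a hashtag starts at the beginning of the
    // input (\A) or after whitespace (\s); group 1 captures the tag text.
    // Backslashes are doubled because this is a Java string literal.
    private static final Pattern HASHTAG =
            Pattern.compile("(?:\\s|\\A)#+([A-Za-z0-9_-]+)");

    // Returns all hashtags in the post, without the leading '#'.
    static List<String> extractHashtags(String post) {
        List<String> tags = new ArrayList<>();
        Matcher m = HASHTAG.matcher(post);
        while (m.find()) {
            tags.add(m.group(1));
        }
        return tags;
    }

    public static void main(String[] args) {
        // Prints [RapidMiner, text_mining]
        System.out.println(extractHashtags("Loving #RapidMiner and #text_mining today"));
    }
}
```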

    This checks out on http://www.regexplanet.com/, but when I use it in an expression, RM reports "Token recognition error at ?".

    Any tips on making this seemingly simple approach work? Does RM implement the Java regex specification or something else?

    Thanks

    User: "sgenzer"
    Altair Employee

    Hi @esboyles, a quick search here on the community for "regex not working" turned up a nice post by Ingo where he discusses the differences between the JavaScript and Java regex parsers: http://community.rapidminer.com/t5/RapidMiner-Studio-Forum/regex-not-working/m-p/35676
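    One concrete difference that matters for the pattern above (not necessarily one that Ingo's post covers): the \A anchor used in the original regex is supported by java.util.regex, but JavaScript's RegExp has no \A anchor, so a JavaScript-based tester will not interpret it the same way. The small sketch below, with a hypothetical class and method name, shows \A working in Java and the doubled backslash a Java string literal needs.

```java
import java.util.regex.Pattern;

class JavaRegexDemo {
    // In a Java string literal the regex \A must be written "\\A".
    // \A matches only at the very start of the input (Java supports it; JavaScript does not).
    private static final Pattern STARTS_WITH_HASH = Pattern.compile("\\A#");

    static boolean startsWithHash(String text) {
        return STARTS_WITH_HASH.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(startsWithHash("#tag first")); // true
        System.out.println(startsWithHash("text #tag"));  // false
    }
}
```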

    Scott