Problem with collecting specific information using RegEx

User: "lukei_11"
Updated by Jocelyn

Hey RapidMiner community,

 

I have a problem with the use of a RegEx:

 

I'd like to collect the addresses of different institutions and companies. To do this, I use the Crawl Web operator to collect the pages that contain the address information. This step works perfectly. In the next step I want to retrieve the street and the zip code + city. For that I use the following RegEx in the "Extract Information" operator:

 

(.+\s)((D|d|DE)?\-?[6-7][0-9]{4}\s[A-Z][a-z]{1,})

 

With this RegEx I'd like to collect the following:

For example from this site http://www.vfb.de/de/1893/club/service/formales/impressum/

I want "Mercedesstraße 109" and "70372 Stuttgart" as the result.

 

The part that matches the zip code (starting with either 6 or 7) and the name of the city works. Since the street is on the line above, I want to capture that line as well. But as soon as I add the first part (.+\s) to capture the line above the zip code and city, the result in the result section of my process is just a ? (question mark). Is there a mistake in my RegEx, or does RapidMiner require a special format? When I test my RegEx in a free online RegEx tester, it works properly...
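
For reference, here is a minimal sketch of how the pattern behaves outside RapidMiner, using Python's `re` module. The `sample` string is a hypothetical plain-text excerpt, not the actual crawled page: it assumes the street sits on the line directly above the zip code with only a single newline in between. The real document is HTML, so tags or extra whitespace between the two lines could change the outcome, and RapidMiner's regex engine may not behave identically to Python's.

```python
import re

# The pattern from the post, unchanged.
PATTERN = re.compile(r"(.+\s)((D|d|DE)?\-?[6-7][0-9]{4}\s[A-Z][a-z]{1,})")

# Hypothetical plain-text excerpt (assumption): street line directly
# above the zip code line, separated by one newline.
sample = "VfB Stuttgart 1893 e.V.\nMercedesstraße 109\n70372 Stuttgart\n"

match = PATTERN.search(sample)

# Group 1 is the line above (including the trailing newline matched by \s),
# group 2 is the zip code plus city.
street = match.group(1).strip()
zip_city = match.group(2)

print(street)    # Mercedesstraße 109
print(zip_city)  # 70372 Stuttgart
```

In this sketch `.` does not match a newline, so `(.+\s)` can only span one line plus a single whitespace character. If the crawled text has anything else between the street and the zip code, the first group would not line up the same way.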

 

Thank you!

lukei_11
