Problem with collecting specific information using RegEx
Hey RapidMiner community,
I have a problem with the use of a RegEx:
I'd like to collect the addresses of different institutions and companies. To do this, I use the Crawl Web operator to collect the pages that contain the address information. This step works perfectly. In the next step I want to retrieve the street and the ZIP code + city. For that I use the following RegEx in the "Extract Information" operator:
(.+\s)((D|d|DE)?\-?[6-7][0-9]{4}\s[A-Z][a-z]{1,})
With this RegEx I'd like to collect the following:
For example from this site http://www.vfb.de/de/1893/club/service/formales/impressum/
I want "Mercedesstraße 109" and "70372 Stuttgart" as the result.
The part that matches the ZIP code (starting with either 6 or 7) and the name of the city works. Because of that, I want to use it to find the line directly above it. But as soon as I add the first part, (.+\s), to capture the line above the ZIP code and city, the result in the result section of my process is just a ? (question mark). Is there a mistake in my RegEx, or does RapidMiner require a special format? When I test the RegEx in a free online RegEx tester, it works properly...
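The intended behaviour can be sketched outside RapidMiner as follows. This is only a minimal Python sketch, not the RapidMiner process itself: the sample text, the function name extract_address and the constant names are assumptions made for illustration, and Python's re engine may differ in details from the one RapidMiner uses. The sample assumes the page has already been reduced to plain text, with the street and the ZIP code + city on adjacent lines.

```python
import re

# The pattern from the post, unchanged: group 1 is the line above,
# group 2 is the ZIP code + city.
ADDRESS_PATTERN = re.compile(r"(.+\s)((D|d|DE)?\-?[6-7][0-9]{4}\s[A-Z][a-z]{1,})")

# Hypothetical input (an assumption, not the real page content): plain text
# with the street and the ZIP code + city on adjacent lines.
SAMPLE_TEXT = "Impressum\nMercedesstraße 109\n70372 Stuttgart\n"


def extract_address(text):
    """Return (street, zip_and_city), or None if the pattern does not match."""
    match = ADDRESS_PATTERN.search(text)
    if match is None:
        return None
    # Group 1 ends with the whitespace matched by \s (here the line break),
    # so it is stripped before returning.
    return match.group(1).strip(), match.group(2)


print(extract_address(SAMPLE_TEXT))
```

On this sample the sketch prints ('Mercedesstraße 109', '70372 Stuttgart'). Because . does not match a line break by default, .+ stays on one line and \s then consumes the line break. If the text passed to the operator still contains HTML tags between the two lines, group 1 would pick up that markup as well, so the sketch only illustrates the expected result and does not explain why RapidMiner shows a ?.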
Thank you!
lukei_11