Dear RapidMiners,
I am having an issue with the Get Page operator and UTF-8 encoding.
I am scraping the content of this web page:
According to the HTML source returned by Get Page, the page declares UTF-8:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
The problem is that non-ASCII characters come out garbled. For example:
FDA’s turns out as FDAâ
s.
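As far as I can tell, this is the pattern you get when UTF-8 bytes are decoded with a single-byte encoding. Below is a minimal Python sketch, run outside RapidMiner, that reproduces the symptom. The choice of cp1252 as the wrong encoding is my assumption, and none of this reflects how Get Page works internally.

```python
# Hypothetical reproduction outside RapidMiner; this is not the Get Page operator.
original = "FDA\u2019s"               # "FDA's" written with a right single quotation mark
raw_bytes = original.encode("utf-8")  # the bytes a UTF-8 page would deliver

# Assumption: the bytes get decoded with a single-byte encoding such as cp1252.
garbled = raw_bytes.decode("cp1252")

# Decoding with the charset declared in the meta tag gives the text back.
correct = raw_bytes.decode("utf-8")

# If only the garbled string is available, reversing the wrong decode repairs it.
repaired = garbled.encode("cp1252").decode("utf-8")

print(repr(garbled), repr(correct), repr(repaired))
```

If that assumption holds, the page bytes themselves are fine and only the decoding step is wrong, which is why I would rather fix the encoding setting than patch the characters afterwards.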
I tried enforcing the right encoding by checking the "override encoding" box in the Get Page operator, but if I do that, I get an error message:
"Encoding 'SYSTEM' is not supported"
Any idea how to solve this (without having to manually search and replace the unwanted characters, please)?
Many thanks in advance for any kind of input!
Snežana