"Crawl Web", "Get Page[s]", "Extract Content" and document encoding/charset

avk New Altair Community Member
edited November 5 in Altair RapidMiner
Hi all

I crawl web sites in Russian. Some of them return content in UTF-8, others use Windows-1251 encoding. Is there a way to convert the retrieved pages to a single encoding (preferably UTF-8) based on the Content-Type server header and the META tags in the document?
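
To make the idea concrete, here is a rough sketch in plain Python of the kind of conversion I mean. It is not a RapidMiner operator and does not use any RapidMiner API; the function names `detect_charset` and `to_utf8` are made up for illustration, and it assumes the raw page bytes and the Content-Type header value are both available. It checks the header first, then a META declaration near the top of the page, and otherwise assumes UTF-8.

```python
import re

# charset=... inside a Content-Type header value (a text string).
_HEADER_RE = re.compile(r"charset\s*=\s*[\"']?\s*([A-Za-z0-9._:-]+)", re.IGNORECASE)
# charset=... inside a <meta ...> tag of the raw page bytes. Covers both
# <meta charset="..."> and <meta http-equiv="Content-Type" content="...; charset=...">.
_META_RE = re.compile(
    rb"<meta[^>]*?charset\s*=\s*[\"']?\s*([A-Za-z0-9._:-]+)", re.IGNORECASE
)


def detect_charset(content_type, body, default="utf-8"):
    """Return the declared charset: header first, then META tag, then default."""
    if content_type:
        match = _HEADER_RE.search(content_type)
        if match:
            return match.group(1).lower()
    # Assumption: the META declaration sits in the first few KB of the page.
    match = _META_RE.search(body[:4096])
    if match:
        return match.group(1).decode("ascii").lower()
    return default


def to_utf8(content_type, body):
    """Decode the page bytes with the declared charset and re-encode as UTF-8."""
    charset = detect_charset(content_type, body)
    try:
        text = body.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown or wrong declaration: fall back to UTF-8, replacing bad bytes.
        text = body.decode("utf-8", errors="replace")
    return text.encode("utf-8")


if __name__ == "__main__":
    page = '<html><head><meta charset="windows-1251"></head><body>Привет</body></html>'
    raw = page.encode("cp1251")
    print(to_utf8(None, raw).decode("utf-8"))
```

The header is given priority over the META tag here, which is a design choice of this sketch; pages with a wrong or missing declaration would still need some byte-level guessing that the sketch does not attempt.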