"Crawl Web", "Get Page[s]", "Extract Content" and document encoding/charset
avk
New Altair Community Member
Hi all
I crawl web sites in Russian. Some of them return content in UTF-8, others use Windows-1251 encoding. Is there a way to convert the retrieved pages to a single encoding (preferably UTF-8) based on the Content-Type server headers and the META tags in the document?
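To make clear what kind of conversion I mean, here is a rough Python sketch of the logic, written outside of the crawling operators. It is only an illustration under my own assumptions: the function names `detect_charset` and `to_utf8` are made up, the charset lookup is a simple regular expression rather than a real HTML parser, and the fallback to Windows-1251 is just a guess that suits my Russian sites.

```python
import re
from typing import Optional

# Matches "charset=..." in a Content-Type header value or in a META tag.
# Simplification: this is a plain text search, not a real HTML parse.
_CHARSET_RE = re.compile(r"charset\s*=\s*[\"']?\s*([A-Za-z0-9_.:-]+)", re.IGNORECASE)


def detect_charset(raw: bytes, content_type: Optional[str] = None,
                   default: str = "windows-1251") -> str:
    """Pick a charset: HTTP header first, then META tag, then a fallback."""
    if content_type:
        match = _CHARSET_RE.search(content_type)
        if match:
            return match.group(1)
    # The markup itself is plain ASCII in both UTF-8 and Windows-1251,
    # so the start of the page can be scanned with non-ASCII bytes dropped.
    head = raw[:4096].decode("ascii", errors="ignore")
    match = _CHARSET_RE.search(head)
    if match:
        return match.group(1)
    # No declaration found: accept the page as UTF-8 if it decodes strictly,
    # otherwise use the assumed default.
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return default


def to_utf8(raw: bytes, content_type: Optional[str] = None) -> bytes:
    """Decode a fetched page with the detected charset and re-encode as UTF-8."""
    charset = detect_charset(raw, content_type)
    try:
        text = raw.decode(charset, errors="replace")
    except LookupError:
        # Unknown charset label: fall back to Windows-1251 (my assumption).
        text = raw.decode("windows-1251", errors="replace")
    return text.encode("utf-8")
```

For example, `to_utf8(page_bytes, "text/html; charset=windows-1251")` should give back the same page as UTF-8 bytes. Is there something equivalent to this that can be applied to the crawled pages?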