Why is UTF-8 not working?

User: "heron_oliveira"
New Altair Community Member
Updated by Jocelyn
Today I converted a PDF to txt, and I'm trying to analyse the frequency of some terms in the text. Although the txt file is in UTF-8, and I've already tried setting the program's encoding to the default (SYSTEM) and to 'UTF-8' before tokenizing and generating n-grams, it keeps showing incorrect words. For example, the word should have been 'abrangência' instead of 'abrangãºncia'.
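To illustrate the symptom, here is a minimal Python sketch of one common way this kind of garbling arises: the file's bytes are UTF-8, but some step reads them with a single-byte encoding such as Latin-1. This is only an assumption about the cause, not a confirmed diagnosis of the program's behaviour, and the helper names `misdecode` and `repair` are made up for the example. The exact garbled characters depend on which wrong encoding is applied, so the output may differ slightly from the word shown above.

```python
def misdecode(text: str, wrong_encoding: str = "latin-1") -> str:
    """Encode text as UTF-8, then decode those bytes with the wrong encoding."""
    return text.encode("utf-8").decode(wrong_encoding)


def repair(garbled: str, wrong_encoding: str = "latin-1") -> str:
    """Undo the mistake: recover the original bytes, then decode them as UTF-8."""
    return garbled.encode(wrong_encoding).decode("utf-8")


word = "abrangência"
garbled = misdecode(word)

print(garbled)          # the accented letter becomes two characters
print(garbled.lower())  # lowercasing, as a tokenizer might do, changes it further
print(repair(garbled))  # round-trips back to the original word
```

If the round trip in `repair` restores the expected word, that suggests the text was decoded with the wrong encoding somewhere between the PDF conversion and the tokenizing step, rather than the txt file itself being damaged.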
