Guide: Python ANSI to UTF-8
I have found this link but the
asked Jul 17, 2015 at 8:05
Why don't you read the file and write it as UTF-8? You can do that in Python.
answered Jul 17, 2015 at 8:13
I appreciate that this is an old question, but having just resolved a similar problem, I thought I would share my solution.
I had a file being prepared by one program that I needed to import into an sqlite3 database, but the text file was always 'ANSI' and sqlite3 requires UTF-8.
The ANSI encoding is recognised as 'mbcs' in Python (this codec is only available on Windows), so the code I used, adapted from something else I found, is:
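A minimal sketch of that kind of conversion, not the answerer's original code. The function name `convert_to_utf8`, its parameters, and the `cp1252` default are assumptions made for illustration; on Windows you could pass `"mbcs"` as the source encoding instead.

```python
def convert_to_utf8(src_path, dst_path, source_encoding="cp1252"):
    """Re-encode a legacy-encoded text file as UTF-8.

    source_encoding is an assumption: "cp1252" is a common Windows "ANSI"
    code page; on Windows, "mbcs" selects the machine's own ANSI code page.
    """
    # newline="" disables newline translation so line endings are preserved.
    with open(src_path, "r", encoding=source_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
```

For very large files it may be better to copy in chunks or line by line rather than reading everything into memory at once.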
The link below contains some information on the encoding types that I found during my research
answered Dec 19, 2018 at 17:27
MS Notepad gives the user a choice of 4 encodings, expressed in clumsy, confusing terminology:
"Unicode" is UTF-16, written little-endian. "Unicode big endian" is UTF-16, written big-endian. In both UTF-16 cases, this means that the appropriate BOM will be written. Use
"UTF-8" is UTF-8; Notepad explicitly writes a "UTF-8 BOM". Use
"ANSI" is a shocker. This is MS terminology for "whatever the default legacy encoding is on this computer".
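For the three Unicode choices, Python's standard codecs can deal with the BOM for you. The sketch below builds the byte sequences by hand to imitate what Notepad writes (that imitation is an assumption, and the sample text is made up); `utf-16` picks the byte order from the BOM, and `utf-8-sig` strips a UTF-8 BOM if one is present.

```python
import codecs

text = "caf\u00e9"  # hypothetical sample text

# Imitate Notepad's output: a BOM followed by the encoded text.
unicode_le = codecs.BOM_UTF16_LE + text.encode("utf-16-le")
unicode_be = codecs.BOM_UTF16_BE + text.encode("utf-16-be")
utf8_with_bom = codecs.BOM_UTF8 + text.encode("utf-8")

# "utf-16" reads the BOM to choose the byte order and removes it.
decoded_le = unicode_le.decode("utf-16")
decoded_be = unicode_be.decode("utf-16")

# "utf-8-sig" removes the BOM if present and also accepts input without one.
decoded_utf8 = utf8_with_bom.decode("utf-8-sig")
decoded_no_bom = text.encode("utf-8").decode("utf-8-sig")
```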
Here is a list of Windows encodings that I know of and the languages/scripts that they are used for:
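As a rough illustration (not the answerer's own list), here are some commonly cited Windows code pages under their Python codec names. The dictionary name `WINDOWS_CODE_PAGES` and the script descriptions are my own summary and should be checked against Microsoft's documentation. The last lines show why the choice matters: the same byte decodes to different characters under different code pages.

```python
import codecs

# Assumed, non-exhaustive mapping: Python codec name -> typical script/region.
WINDOWS_CODE_PAGES = {
    "cp874": "Thai",
    "cp932": "Japanese",
    "cp936": "Chinese (simplified)",
    "cp949": "Korean",
    "cp950": "Chinese (traditional)",
    "cp1250": "Central and Eastern European (Latin script)",
    "cp1251": "Cyrillic",
    "cp1252": "Western European",
    "cp1253": "Greek",
    "cp1254": "Turkish",
    "cp1255": "Hebrew",
    "cp1256": "Arabic",
    "cp1257": "Baltic",
    "cp1258": "Vietnamese",
}

# Confirm that each name is a codec Python knows about (raises LookupError if not).
for name in WINDOWS_CODE_PAGES:
    codecs.lookup(name)

# One byte, two meanings: the "ANSI" label alone does not say which is right.
as_western = b"\xe9".decode("cp1252")
as_cyrillic = b"\xe9".decode("cp1251")
```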
If the file has been created on the computer where it is being read, then you can obtain the "ANSI" encoding by
Be careful using
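One standard-library way to ask the current machine for its default legacy encoding is `locale.getpreferredencoding()`; whether this is the exact call the answer had in mind is an assumption. On Windows it typically reports the "ANSI" code page (for example `cp1252`); on other systems it reports the locale's encoding.

```python
import codecs
import locale

# Pass False so the call does not change the process-wide locale as a side effect.
ansi_name = locale.getpreferredencoding(False)

# Normalise the reported name to Python's canonical codec name.
codec_name = codecs.lookup(ansi_name).name

print(f"default encoding: {ansi_name} (codec: {codec_name})")
```

Note that when Python runs in UTF-8 mode, this call reports UTF-8 rather than the legacy code page, so it only helps for files created on the same machine under the same settings.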
Putting it all together:
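A sketch of how the pieces could fit together, not the answerer's original code: look for a BOM first, and fall back to a caller-supplied "ANSI" encoding otherwise. The names `pick_notepad_encoding` and `decode_notepad_bytes` and the `cp1252` default are hypothetical. One known limitation: UTF-8 text saved without a BOM would be treated as "ANSI" here.

```python
import codecs


def pick_notepad_encoding(raw, ansi_encoding="cp1252"):
    """Choose a codec name for bytes saved by Notepad, based on the BOM."""
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # strips the UTF-8 BOM while decoding
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"  # byte order is taken from the BOM
    # No BOM: assume the legacy encoding supplied by the caller.
    return ansi_encoding


def decode_notepad_bytes(raw, ansi_encoding="cp1252"):
    """Decode raw file bytes using the encoding chosen above."""
    return raw.decode(pick_notepad_encoding(raw, ansi_encoding))
```

To use it on a file, read the file in binary mode (`open(path, "rb").read()`) and pass the bytes to `decode_notepad_bytes`, supplying the appropriate `ansi_encoding` for the machine the file came from.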
Sample text file, saved with all 4 encoding choices, looks like this in Notepad:
Here is some demo code:
and here is the output when
run in a Windows "Command Prompt" window using the command
Things to be aware of:
(1) "mbcs" is a file-system pseudo-encoding which has no relevance at all to decoding the contents of files. On a system where the default encoding is