Guide: Python ANSI to UTF-8
I have found this link but the
asked Jul 17, 2015 at 8:05
Why don't you read the file and write it as UTF-8? You can do that in Python.
answered Jul 17, 2015 at 8:13
3Ducker

I appreciate that this is an old question, but having just resolved a similar problem recently I thought I would share my solution. I had a file being prepared by one program that I needed to import into an sqlite3 database, but the text file was always 'ANSI' and sqlite3 requires UTF-8. The ANSI encoding is recognised as 'mbcs' in Python, and therefore the code I have used, ripping off something else I found, is:
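A minimal sketch of that kind of conversion, not necessarily the code originally posted: read the file with the legacy encoding and write it back out as UTF-8. The function name convert_to_utf8 and the default of 'cp1252' are assumptions for illustration. 'mbcs' only exists on Windows, so pass source_encoding="mbcs" there if you want the behaviour described above.

```python
import os
import tempfile


def convert_to_utf8(src_path, dst_path, source_encoding="cp1252"):
    """Re-encode a text file from source_encoding to UTF-8 (hypothetical helper)."""
    # newline="" leaves the original line endings untouched on read and write.
    with open(src_path, "r", encoding=source_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)


# Small self-contained demonstration using a temporary "ANSI" (cp1252) file.
with tempfile.TemporaryDirectory() as tmp:
    ansi_path = os.path.join(tmp, "ansi.txt")
    utf8_path = os.path.join(tmp, "utf8.txt")
    with open(ansi_path, "wb") as f:
        # "café £5" followed by a Windows line ending, encoded as cp1252.
        f.write("caf\xe9 \xa35\r\n".encode("cp1252"))
    convert_to_utf8(ansi_path, utf8_path)
    with open(utf8_path, "rb") as f:
        converted_bytes = f.read()
```

The conversion only gives correct text if source_encoding really matches how the file was written, so it is worth checking a few non-ASCII characters in the result.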
The link below contains some information on the encoding types that I found during my research: https://docs.python.org/2.4/lib/standard-encodings.html
answered Dec 19, 2018 at 17:27
MS Notepad gives the user a choice of 4 encodings, expressed in clumsy, confusing terminology:

"Unicode" is UTF-16, written little-endian. "Unicode big endian" is UTF-16, written big-endian. In both UTF-16 cases, this means that the appropriate BOM will be written. Use 'utf-16' to decode such a file.

"UTF-8" is UTF-8; Notepad explicitly writes a "UTF-8 BOM". Use 'utf-8-sig' to decode such a file.

"ANSI" is a shocker. This is MS terminology for "whatever the default legacy encoding is on this computer".

Here is a list of Windows encodings that I know of and the languages/scripts that they are used for:
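As an illustration of such a list, the sketch below maps some well-known Windows code pages to the scripts they are commonly associated with. The selection, the descriptions, and the name WINDOWS_CODE_PAGES are assumptions made for this example, so treat it as a starting point rather than a complete or authoritative table.

```python
import codecs

# Hypothetical lookup table: Python codec name -> language/script it is used for.
WINDOWS_CODE_PAGES = {
    "cp874": "Thai",
    "cp932": "Japanese",
    "cp936": "Chinese (simplified)",
    "cp949": "Korean",
    "cp950": "Chinese (traditional)",
    "cp1250": "Central and Eastern European (Latin script)",
    "cp1251": "Cyrillic",
    "cp1252": "Western European",
    "cp1253": "Greek",
    "cp1254": "Turkish",
    "cp1255": "Hebrew",
    "cp1256": "Arabic",
    "cp1257": "Baltic",
    "cp1258": "Vietnamese",
}

# codecs.lookup raises LookupError for unknown names, so this confirms that
# Python ships a codec for every entry in the table.
resolved = {name: codecs.lookup(name).name for name in WINDOWS_CODE_PAGES}

# The same byte means different characters under different code pages,
# which is why "ANSI" cannot be decoded without knowing the code page.
as_western = b"\xe9".decode("cp1252")   # Latin small letter e with acute
as_cyrillic = b"\xe9".decode("cp1251")  # Cyrillic small letter short i
```

Decoding the single byte 0xE9 two ways shows the practical problem: the bytes of an "ANSI" file do not say which of these code pages produced them.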
If the file has been created on the computer where it is being read, then you can obtain the "ANSI" encoding from that computer's locale settings.

Be careful using

Putting it all together:

Sample text file, saved with all 4 encoding choices, looks like this in Notepad:
Here is some demo code:
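What follows is a stand-in sketch rather than the demo originally posted. It assumes the file was saved by Notepad with one of the 4 choices described above: a BOM, when present, picks the codec, and anything without a BOM is treated as "ANSI". The helper name read_notepad_text and its ansi_encoding parameter are hypothetical. When no encoding is passed, it falls back to locale.getpreferredencoding(False), which is only a sensible guess if the file was created on the machine that is reading it.

```python
import codecs
import locale
import os
import tempfile


def read_notepad_text(path, ansi_encoding=None):
    """Decode a Notepad-saved file, using the BOM when there is one."""
    with open(path, "rb") as f:
        raw = f.read()
    if raw.startswith(codecs.BOM_UTF8):
        # "utf-8-sig" strips the UTF-8 BOM while decoding.
        return raw.decode("utf-8-sig")
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # "utf-16" reads the BOM to pick the byte order, then drops it.
        return raw.decode("utf-16")
    # No BOM: assume "ANSI". The fallback is this machine's preferred encoding.
    if ansi_encoding is None:
        ansi_encoding = locale.getpreferredencoding(False)
    return raw.decode(ansi_encoding)


# Build the four variants by hand so the sketch runs anywhere, not just on Windows.
sample = "caf\xe9"
variants = {
    "ansi.txt": sample.encode("cp1252"),  # cp1252 stands in for "ANSI" here
    "unicode.txt": codecs.BOM_UTF16_LE + sample.encode("utf-16-le"),
    "unicode_be.txt": codecs.BOM_UTF16_BE + sample.encode("utf-16-be"),
    "utf8.txt": codecs.BOM_UTF8 + sample.encode("utf-8"),
}

results = {}
with tempfile.TemporaryDirectory() as tmp:
    for name, data in variants.items():
        path = os.path.join(tmp, name)
        with open(path, "wb") as f:
            f.write(data)
        results[name] = read_notepad_text(path, ansi_encoding="cp1252")
```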
and here is the output when
run in a Windows "Command Prompt" window using the command
Things to be aware of: (1) "mbcs" is a file-system pseudo-encoding which has no relevance at all to decoding the contents of files. On a system where the default encoding is