What encoding is this string in Python?
What do I have to do in Python to figure out which encoding a string has?
asked Feb 13, 2011 at 22:27
In Python 3, all strings are sequences of Unicode characters. There is a bytes type that holds raw bytes. In Python 2, a string may be of type str or of type unicode, and an isinstance check tells you which.
This does not distinguish "Unicode or ASCII"; it only distinguishes Python types. A Unicode string may consist of purely characters in the ASCII range, and a bytestring may contain ASCII, encoded Unicode, or even non-textual data.
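The original snippet for this answer is not preserved here. As a minimal sketch for Python 3, a type check along these lines reports which kind of object you have; the function name `describe_string_type` is made up for illustration.

```python
def describe_string_type(obj):
    """Report the Python type of obj; this says nothing about its contents."""
    if isinstance(obj, str):
        return "str"    # Unicode text in Python 3
    if isinstance(obj, (bytes, bytearray)):
        return "bytes"  # raw bytes; the encoding, if any, is unknown
    return "other"


print(describe_string_type("abc"))
print(describe_string_type(b"abc"))
print(describe_string_type("日本語".encode("utf-8")))
```

Note that the last call reports bytes even though the data is encoded text, which is exactly the limitation described above.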
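answered Feb 13, 2011 at 22:40
Greg Hewgill, user2357112

How to tell if an object is a unicode string or a byte string

You can use type or isinstance.

In Python 2, str is just a sequence of bytes, and Python does not know what its encoding is. The unicode type is the safer way to store text.

In Python 3, str is like Python 2's unicode and is used to store text. What was called str in Python 2 is called bytes in Python 3.

How to tell if a byte string is valid utf-8 or ascii

You can call decode. If it raises a UnicodeDecodeError exception, the bytes were not valid in that encoding.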
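A minimal sketch of the decode check for Python 3 might look like the following. The helper name `is_valid_encoding` is hypothetical, and a successful decode only shows that the bytes are valid in that encoding, not that the encoding is the one the author intended.

```python
def is_valid_encoding(data, encoding="utf-8"):
    """Return True if the byte string decodes cleanly with the given encoding."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


u_umlaut = b"\xc3\x9c"  # the UTF-8 encoding of U+00DC

print(is_valid_encoding(u_umlaut, "utf-8"))
print(is_valid_encoding(u_umlaut, "ascii"))
print(is_valid_encoding(b"\xff", "utf-8"))
```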
answered Feb 13, 2011 at 22:33
Mikel

In Python 3.x, all strings are sequences of Unicode characters, and an isinstance check for str (which means a Unicode string by default) should suffice.
With regard to Python 2.x, most people seem to use an if statement with two checks, one for str and one for unicode. If you want to check for a 'string-like' object in a single statement, though, you can check against basestring, the common base class of str and unicode in Python 2.
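As a sketch of the single-statement check: basestring exists only in Python 2, so the fallback below is an assumption added here to keep the example runnable on Python 3. The names `string_types` and `is_string_like` are illustrative.

```python
try:
    string_types = basestring  # Python 2: common base class of str and unicode
except NameError:
    string_types = str  # Python 3: basestring no longer exists


def is_string_like(obj):
    """One isinstance call covers every text-like type on either version."""
    return isinstance(obj, string_types)


print(is_string_like("abc"))
print(is_string_like(123))
```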
answered Sep 9, 2013 at 20:24
ThinkBonobo

Unicode is not an encoding, as Kumar McMillan explains.
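To illustrate the distinction (this example is not taken from the talk): a single Unicode code point is an abstract character, and each encoding turns it into a different byte sequence.

```python
text = "\u00e9"  # one code point: LATIN SMALL LETTER E WITH ACUTE

utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("latin-1")
utf16_bytes = text.encode("utf-16-le")

# Same text, three different byte representations.
print(len(text), utf8_bytes, latin1_bytes, utf16_bytes)

# Decoding with the matching encoding recovers the same text each time.
print(utf8_bytes.decode("utf-8") == latin1_bytes.decode("latin-1") == text)
```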
Have a read of McMillan's Unicode In Python, Completely Demystified talk from PyCon 2008; it explains things a lot better than most of the related answers on Stack Overflow.
answered May 21, 2012 at 14:12
Alex Dean

If your code needs to be compatible with both Python 2 and Python 3, you can't directly use things like isinstance(s, unicode), because the name unicode is undefined in Python 3.
There are some ugly workarounds. An extremely ugly one is to compare the name of the type instead of comparing the type itself.
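The answer's own example is not preserved here. A rough sketch of the type-name comparison, with a hypothetical helper `to_native_str_by_name` and an assumed default encoding of ASCII, could look like this:

```python
def to_native_str_by_name(s, encoding="ascii"):
    """Convert s to the native str type by inspecting the type's name only."""
    type_name = type(s).__name__
    if type_name == "bytes":
        # Only Python 3 reports this name; Python 2 byte strings are named "str".
        return s.decode(encoding)
    if type_name == "unicode":
        # Only Python 2 has a type with this name.
        return s.encode(encoding)
    return s


print(to_native_str_by_name(b"abc"))
print(to_native_str_by_name("abc"))
```

Because neither `bytes` nor `unicode` is referenced as a name, the function body runs unchanged on both versions.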
An arguably slightly less ugly workaround is to check the Python version number.
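A sketch of the version-number approach, again with a hypothetical helper name and an assumed ASCII default. The `unicode` name is only evaluated on Python 2, so the branch is harmless on Python 3.

```python
import sys


def to_native_str_by_version(s, encoding="ascii"):
    """Convert s to the native str type, branching on the interpreter version."""
    if sys.version_info >= (3, 0):
        if isinstance(s, bytes):
            return s.decode(encoding)
    else:
        if isinstance(s, unicode):  # noqa: F821 (name exists only on Python 2)
            return s.encode(encoding)
    return s


print(to_native_str_by_version(b"abc"))
print(to_native_str_by_version("abc"))
```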
Those are both unpythonic, and most of the time there's probably a better way.
answered Aug 14, 2012 at 12:33
Dave Burton

Use the text and string types that the six library exports, six.text_type and six.string_types. Inside six, string_types is defined as str on Python 3 and as basestring on Python 2.
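A minimal sketch of how this might be used. six is a third-party package, so the fallback branch is an assumption added here to keep the example runnable without it; that fallback is only correct on Python 3. The helper `is_text` is illustrative.

```python
try:
    import six
    text_type = six.text_type        # unicode on Python 2, str on Python 3
    string_types = six.string_types  # (basestring,) on Python 2, (str,) on Python 3
except ImportError:
    # Assumed fallback when six is not installed; valid for Python 3 only.
    text_type = str
    string_types = (str,)


def is_text(obj):
    """True for Unicode text on either Python version."""
    return isinstance(obj, text_type)


print(is_text("abc"))
print(is_text(b"abc"))
```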
answered Aug 8, 2016 at 8:50
madjardi

Note that on Python 3, it's not really fair to say any of:

- strs are UTFx for any x (e.g. UTF8)
- strs are Unicode
- strs are ordered collections of Unicode characters

Python's str type is (normally) a sequence of Unicode code points, some of which map to characters.

Even on Python 3, it's not as simple to answer this question as you might imagine. An obvious way to test for ASCII-compatible strings is by an attempted encode.
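A small sketch of the attempted-encode test for Python 3; the helper name `is_ascii_compatible` is made up for illustration.

```python
def is_ascii_compatible(text):
    """Return True if every code point in text can be encoded as ASCII."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


print(is_ascii_compatible("Hello there!"))
print(is_ascii_compatible("Hello there... \u2603!"))  # contains U+2603 SNOWMAN
```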
The error distinguishes the cases. In Python 3, there are even some strings that contain invalid Unicode code points.
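One way such strings arise, shown as a sketch: decoding undecodable bytes with the surrogateescape error handler produces lone surrogates, which a strict UTF-8 encode then rejects. The helper name `is_encodable_utf8` is hypothetical.

```python
def is_encodable_utf8(text):
    """Return True if text can be encoded as strict UTF-8."""
    try:
        text.encode("utf-8")
    except UnicodeEncodeError:
        return False
    return True


# The invalid byte 0xff is smuggled into the str as the lone surrogate U+DCFF.
smuggled = b"\xff".decode("utf-8", "surrogateescape")

print(is_encodable_utf8("Hello there!"))
print(is_encodable_utf8(smuggled))
```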
The same method is used to distinguish them.
answered Jul 9, 2014 at 2:35
Veedrac

This may help someone else. I started out testing for the string type of the variable s, but for my application it made more sense to simply return s as utf-8. The process calling return_utf then knows what it is dealing with and can handle the string appropriately. The code is not pristine, but I intend for it to be Python-version agnostic without a version test or importing six. Please comment with improvements to the sample code to help other people.
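The author's sample code is not preserved here. The following is only a simplified Python 3 sketch of the idea, not the original: the function name return_utf comes from the answer, everything else is an assumption, and it does not attempt the version-agnostic behavior the author aimed for.

```python
def return_utf(s):
    """Return s as UTF-8 encoded bytes, whatever type it arrives as."""
    if isinstance(s, bytes):
        return s  # assumption: byte strings are already UTF-8
    if isinstance(s, str):
        return s.encode("utf-8")
    # Numbers and other objects: fall back to their text representation.
    return str(s).encode("utf-8")


print(return_utf("\u00e9"))
print(return_utf(42))
```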
answered Dec 23, 2015 at 22:16
You could use Universal Encoding Detector, but be aware that it will only give you its best guess, not the actual encoding, because it's impossible to know the encoding of a string such as "abc". You will need to get the encoding information elsewhere; for example, the HTTP protocol uses the Content-Type header for that.
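As a sketch of the second suggestion, the declared charset can be read from a Content-Type header value with the standard library. The helper name `charset_from_content_type` and the sample header values are illustrative.

```python
from email.message import Message


def charset_from_content_type(header_value, default=None):
    """Extract the charset parameter from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() returns the charset in lower case, or `default`.
    return msg.get_content_charset(default)


print(charset_from_content_type("text/html; charset=UTF-8"))
print(charset_from_content_type("application/octet-stream"))
```

The declared charset is still only a claim by the sender, so decoding with it can fail and is worth wrapping in error handling.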
answered Feb 13, 2011 at 22:34
Seb, Tom Morris

In Python 3, I had to understand if a string is like
It worked for me; I hope it works for someone who needs it.
answered Apr 7, 2021 at 16:05
Ali Katkar

For py2/py3 compatibility simply use
answered May 28, 2018 at 11:56
Vishvajit Pathak

One simple approach is to check if
answered Sep 18, 2019 at 14:24
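duhaime

What encoding is a Python string?
By default, Python 3 uses UTF-8: it is the default source-file encoding and the default for str.encode() and bytes.decode().

How do I check if a string is Unicode?
You can call decode on a byte string; if it raises UnicodeDecodeError, the bytes are not valid in that encoding. Keep in mind the caveat from the answers above: it is not really fair to say that strs are UTFx for any x (e.g. UTF8), that strs are Unicode, or that strs are ordered collections of Unicode characters.

Are Python strings UTF-8?
In Python 3, a str is a sequence of Unicode code points, not UTF-8 bytes. UTF-8 only comes into play when text is encoded to bytes or decoded from them.

What is decode('UTF-8')?
Decoding UTF-8 strings in Python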
To decode data encoded in UTF-8, we can use the decode() method defined on bytes objects (on str in Python 2). This method accepts two arguments, encoding and errors. encoding names the encoding of the data to be decoded, and errors decides how errors that arise during decoding are handled.
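A short Python 3 sketch of the two arguments in use; the sample byte strings are made up for illustration.

```python
good = b"caf\xc3\xa9"  # valid UTF-8 for "café"
bad = b"caf\xe9"       # Latin-1 bytes, not valid UTF-8

decoded = good.decode("utf-8")

# errors="strict" is the default and raises on invalid input.
try:
    bad.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# errors="replace" substitutes U+FFFD; errors="ignore" drops the bad bytes.
replaced = bad.decode("utf-8", errors="replace")
ignored = bad.decode("utf-8", errors="ignore")

print(decoded, strict_failed, replaced, ignored)
```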
How do you get the UTF
UTF-8 is a variable-length encoding, so I'll assume you really meant "Unicode code point". Use chr() to convert the character code to a character, decode it, and use ord() to get the code point. In Python 2, chr() only handles single bytes, so only numbers in the range 0..255; unichr() covers other code points.
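In Python 3 the decode step is unnecessary, because chr() accepts any code point. A brief sketch of the difference between a code point and its UTF-8 bytes, using the euro sign as an arbitrary example:

```python
euro = "\u20ac"  # EURO SIGN

code_point = ord(euro)            # the single Unicode code point
round_trip = chr(code_point)      # back from the number to the character
utf8_bytes = euro.encode("utf-8") # three bytes, because UTF-8 is variable-length

print(hex(code_point), round_trip == euro, list(utf8_bytes))
```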