Python json remove double backslash

Why Is This Happening?

The problem is that str() is being used to convert a bytes object to a str. Calling str() on bytes does not decode them; it returns their printable representation (repr), escape sequences and all.

a = re.findall(r'<a[^>]* href="([^"]*)"',str(response))
#                                        ^^^

For example, if the response is the word "Tanıtım", it would be encoded in UTF-8 as b'Tan\xc4\xb1t\xc4\xb1m'. If you then call str() on that, you get its repr rather than the decoded text:

In [1]: response = b'Tan\xc4\xb1t\xc4\xb1m'

In [2]: str(response)
Out[2]: "b'Tan\\xc4\\xb1t\\xc4\\xb1m'"

If you convert this to JSON, you'll see double backslashes (which are really just ordinary backslashes, escaped the way JSON requires).

In [3]: import json

In [4]: print(json.dumps(str(response)))
"b'Tan\\xc4\\xb1t\\xc4\\xb1m'"
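To see that the doubled backslashes are only JSON's escaping, here is a small round trip with the standard json module. The example string is made up for illustration; it contains exactly one real backslash:

```python
import json

# The literal is written as "\\", but the string holds a single backslash,
# so it is only 3 characters long: 'a', '\', 'b'.
s = 'a\\b'
assert len(s) == 3

encoded = json.dumps(s)
# JSON escapes the backslash, so the serialized text contains two of them.
print(encoded)  # "a\\b"

# Parsing the JSON gives back the original single backslash.
assert json.loads(encoded) == s
```

So "removing" the double backslash is the wrong goal: the JSON is faithfully encoding whatever string it was given, and the real fix is to give it the decoded text in the first place.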

The correct way to convert a bytes object back to a str is by using the decode method, with the appropriate encoding:

In [5]: response.decode('UTF-8')
Out[5]: 'Tanıtım'
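"The appropriate encoding" can often be read from the Content-Type header of the HTTP response. Below is a minimal sketch, assuming the server declares a charset and falling back to UTF-8 when it does not; decode_body is a hypothetical helper name, not part of any library. It relies on the fact that urlopen(...).headers is an http.client.HTTPMessage, which subclasses email.message.Message and therefore has get_content_charset():

```python
from email.message import Message


def decode_body(body: bytes, headers: Message, default: str = 'UTF-8') -> str:
    """Hypothetical helper: decode an HTTP body using its declared charset.

    Uses the charset from the Content-Type header, or `default` when the
    header names none, and replaces undecodable bytes with U+FFFD instead
    of raising. An unrecognized charset name still raises LookupError.
    """
    charset = headers.get_content_charset() or default
    return body.decode(charset, 'replace')


# Simulated response headers; with urllib you would pass resp.headers.
headers = Message()
headers['Content-Type'] = 'text/html; charset=utf-8'
print(decode_body(b'Tan\xc4\xb1t\xc4\xb1m', headers))  # Tanıtım
```

Falling back to UTF-8 is a judgment call; a page may also declare its encoding in a meta tag, which this sketch ignores.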

Unfortunately, the actual response from this site is not valid UTF-8. The website operators appear to be serving corrupted data.

Quick Fix

Replace every call to str(response) with response.decode('UTF-8', 'replace'), which substitutes U+FFFD for any invalid bytes instead of raising an error, and update the regular expressions to match the decoded text.

import re
from urllib.request import Request, urlopen

a = re.findall(
    # "r" prefix to string is unnecessary
    '<a[^>]* href="([^"]*)"',
    response.decode('UTF-8', 'replace'))
sub_req = Request('//www.manga-tr.com/'+a[3],
                  headers=headers)
sub_response = urlopen(sub_req).read()
manga = {}
manga['manga'] = []
manga_subject = re.findall(
    # "r" prefix to string is unnecessary
    '…Tanıtım…([^…
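To see why the regular expressions must run on decoded text, here is a self-contained comparison. The HTML fragment is hypothetical, and the link pattern is an assumed one of the same shape as the href pattern in the snippet:

```python
import re

# Hypothetical page fragment, encoded the way a server would send it.
page = '<a class="x" href="/manga/tanıtım">Tanıtım</a>'.encode('UTF-8')

# Assumed link pattern: capture the value of an <a> tag's href attribute.
LINK_RE = r'<a[^>]* href="([^"]*)"'

# Running the regex over str(page) searches the repr, escape sequences included.
from_repr = re.findall(LINK_RE, str(page))
# Running it over the decoded text searches the real characters.
from_text = re.findall(LINK_RE, page.decode('UTF-8', 'replace'))

print(from_repr)  # ['/manga/tan\\xc4\\xb1t\\xc4\\xb1m']
print(from_text)  # ['/manga/tanıtım']
```

The first result contains literal backslashes, which is exactly what later shows up doubled in the JSON output.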