Why Is This Happening?
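The error is using str() to convert a bytes object to a str. Called with a single argument, str() does not decode the bytes; it returns their printable representation, b'...' with escape sequences, which is not the text you want.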
a = re.findall(r'<a[^>]* href="([^"]*)"', str(response))
# str(response) is the problem: it yields the repr of the bytes, not the decoded text
For example, if the response is the word "Tanıtım", it would be encoded in UTF-8 as b'Tan\xc4\xb1t\xc4\xb1m'. If you then call str() on that, you get:
In [1]: response = b'Tan\xc4\xb1t\xc4\xb1m'
In [2]: str(response)
Out[2]: "b'Tan\\xc4\\xb1t\\xc4\\xb1m'"
If you convert this to JSON, you'll see double backslashes (which are really just ordinary backslashes, escaped as JSON requires).
In [3]: import json
In [4]: print(json.dumps(str(response)))
"b'Tan\\xc4\\xb1t\\xc4\\xb1m'"
The correct way to convert a bytes object back to a str is to call its decode method with the appropriate encoding:
In [5]: response.decode('UTF-8')
Out[5]: 'Tanıtım'
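As a side note, str() can also decode bytes if you pass the encoding as a second argument; only the one-argument form returns the repr. Once the text is decoded, json.dumps produces an ordinary JSON string, escaping non-ASCII characters as \uXXXX by default. A minimal sketch reusing the sample bytes from above:

```python
import json

response = b'Tan\xc4\xb1t\xc4\xb1m'  # "Tanıtım" encoded as UTF-8

# One-argument str() returns the repr of the bytes object, not the text.
wrong = str(response)

# Both of these decode the bytes as UTF-8 and give the same text.
text = response.decode('UTF-8')
also_text = str(response, 'UTF-8')

print(wrong)              # b'Tan\xc4\xb1t\xc4\xb1m'
print(text == also_text)  # True
print(json.dumps(text))   # "Tan\u0131t\u0131m"
```

Passing ensure_ascii=False to json.dumps keeps the characters as-is instead of escaping them.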
Unfortunately, the response from this site is not entirely valid UTF-8; the website operators appear to be serving corrupted data.
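Because of that, a plain decode('UTF-8') will raise UnicodeDecodeError on this data, which is why the fix below passes 'replace' as the error handler: each invalid byte sequence becomes U+FFFD (the replacement character) instead of aborting the whole decode. The corrupted bytes here are a made-up example, not the site's actual data:

```python
# Hypothetical corrupted input: the first "ı" has lost its second byte (\xb1).
corrupted = b'Tan\xc4t\xc4\xb1m'

try:
    corrupted.decode('UTF-8')  # default errors='strict'
    strict_ok = True
except UnicodeDecodeError as exc:
    strict_ok = False
    print('strict decode failed:', exc.reason)

# 'replace' substitutes U+FFFD for the invalid byte and keeps going.
lenient = corrupted.decode('UTF-8', 'replace')
print(ascii(lenient))  # 'Tan\ufffdt\u0131m'
```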
Quick Fix
Replace every call to str(response) with response.decode('UTF-8', 'replace'), and update the regular expressions so they match the decoded text rather than the escaped repr.
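The same idea can be shown in a form that runs without network access: decode the bytes once, then run the regular expression on the resulting str. The HTML snippet and the extract_links helper are hypothetical stand-ins for urlopen(...).read() and the code that follows:

```python
import re
from typing import List

# Hypothetical page body, standing in for urlopen(req).read().
page = ('<a class="x" href="/manga-a.html">A</a>'
        '<a href="/tan\u0131t\u0131m.html">Tan\u0131t\u0131m</a>').encode('UTF-8')


def extract_links(raw: bytes) -> List[str]:
    """Decode raw HTML bytes and return the href value of each <a> tag."""
    # Decode once; 'replace' tolerates invalid UTF-8 instead of raising.
    text = raw.decode('UTF-8', 'replace')
    # The single capturing group makes findall return just the href values.
    return re.findall('<a[^>]* href="([^"]*)"', text)


print(ascii(extract_links(page)))
```

A regular expression is fine for a quick scrape like this; for anything more robust, an HTML parser is usually the safer choice.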
a = re.findall(
    # "r" prefix to string is unnecessary
    '<a[^>]* href="([^"]*)"',
    response.decode('UTF-8', 'replace'))
sub_req = Request('//www.manga-tr.com/'+a[3],
                  headers=headers)
sub_response = urlopen(sub_req).read()
manga = {}
manga['manga'] = []
manga_subject = re.findall(
    # "r" prefix to string is unnecessary
'Tanıtım
[[^