I am trying to create a duplicate CSV without a header. When I attempt this I get the following error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 1895: invalid start byte.
I've read the Python csv documentation on Unicode and UTF-8 encoding and have implemented it. However, my output file is being generated with no data in it. I'm not sure what I am doing wrong here.
import csv

path = '/Users/johndoe/file.csv'

with open(path, 'r') as infile, open(path + 'final.csv', 'w') as outfile:

    def unicode_csv(infile, outfile):
        inputs = csv.reader(utf_8_encoder(infile))
        output = csv.writer(outfile)

        for index, row in enumerate(inputs):
            yield [unicode(cell, 'utf-8') for cell in row]
            if index == 0:
                continue
            output.writerow(row)

    def utf_8_encoder(infile):
        for line in infile:
            yield line.encode('utf-8')

unicode_csv(infile, outfile)
asked Sep 4, 2015 at 17:04
user3062459
The solution was simply to pass two additional parameters to open() in the line
with open(path, 'r') as infile:
The two parameters are encoding='utf-8' and errors='ignore'. This allowed me to create a duplicate of the original CSV without the header row and without the UnicodeDecodeError. Below is the completed code.
import csv

path = '/Users/johndoe/file.csv'

with open(path, 'r', encoding='utf-8', errors='ignore') as infile, open(path + 'final.csv', 'w') as outfile:
    inputs = csv.reader(infile)
    output = csv.writer(outfile)

    for index, row in enumerate(inputs):
        # Create file with no header
        if index == 0:
            continue
        output.writerow(row)
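A caveat worth knowing about errors='ignore': it drops every byte that cannot be decoded, so characters can silently disappear from the copy. The byte 0xa0 from the error message happens to be a non-breaking space in Windows-1252 and Latin-1, so the file may simply be in one of those encodings. The sketch below compares the two behaviours on a made-up sample line; that the real file is cp1252 is an assumption, not something the question confirms.

```python
# Hypothetical sample: one CSV line containing the byte 0xa0 from the error message.
raw = b"price\xa0list,42\n"

# Strict UTF-8 decoding fails, as in the question.
try:
    raw.decode("utf-8")
    strict_utf8_ok = True
except UnicodeDecodeError:
    strict_utf8_ok = False

# errors='ignore' succeeds, but the undecodable byte is silently dropped.
ignored = raw.decode("utf-8", errors="ignore")

# Assumption: the file is really Windows-1252, where 0xa0 is a non-breaking space.
as_cp1252 = raw.decode("cp1252")

print(strict_utf8_ok)
print(repr(ignored))
print(repr(as_cp1252))
```

If the cp1252 result looks right for your data, opening the input with encoding='cp1252' instead of errors='ignore' keeps those characters in the copy.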
answered Sep 5, 2015 at 2:08
user3062459
Since the line
unicode_csv(infile, outfile)
isn't indented, it is outside the scope of the with statement, so by the time it is called, infile and outfile are both already closed.
The files should be opened when they are used, not when the functions are defined, so have:
with open(path, 'r') as infile, open(path + 'final.csv', 'w') as outfile:
    unicode_csv(infile, outfile)
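There is a second reason the original code writes nothing, independent of the indentation: unicode_csv contains a yield, which makes it a generator function. Calling it only creates a generator object; the body (including writerow) does not run until something iterates over the result. A minimal sketch of both effects, using an in-memory file and a made-up function name:

```python
import io

def log_then_yield(log):
    # Stand-in for unicode_csv: has a side effect, then yields.
    log.append("body ran")
    yield "row"

log = []
gen = log_then_yield(log)        # creates a generator; the body has not run yet
ran_after_call = list(log)       # snapshot of the log right after the call
rows = list(gen)                 # iterating finally runs the body
ran_after_iteration = list(log)  # snapshot after iteration

# A file object is closed as soon as its with block ends.
with io.StringIO("a,b\n1,2\n") as infile:
    open_inside = not infile.closed
closed_outside = infile.closed

try:
    infile.read()
    read_failed = False
except ValueError:               # reading a closed file raises ValueError
    read_failed = True

print(ran_after_call, ran_after_iteration, closed_outside, read_failed)
```

So besides moving the call inside the with block, the function either needs to drop the yield or be consumed by a loop.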
answered Sep 4, 2015 at 19:39
James K
If you are able to use pandas, and you know the exact encoding of your file, you could try this:
import pandas as pd
path = '/Users/johndoe/file.csv'
df = pd.read_csv(path, encoding='ISO-8859-1')
df.to_csv(path, encoding='utf-8', index=False)
answered May 18, 2020 at 10:37
This article concerns the conversion and handling of CSV file formats in combination with the UTF-8 encoding standard. 💡 The Unicode Transformation Format 8-Bit (UTF-8) is a variable-width character encoding used for electronic communication. UTF-8 can encode more than 1 million (more or less weird) characters using one to four bytes per character. Example UTF-8 characters: ☈,☇,★,☃,☄,☍ UTF-8 is the default text encoding on Linux and macOS. On Windows, Python’s open() has traditionally fallen back to the system’s legacy code page (often cp1252) when no encoding argument is given, so a file written with open() and file.write() is only guaranteed to be UTF-8 if you pass encoding='utf-8' explicitly. So if you came to this website searching for “CSV to UTF-8”, my guess is that you read a differently encoded CSV file format such as ASCII, ANSI, or UTF-16 with some “weird” characters. Say, you want to read this ANSI file:
Now, you can simply convert this to a UTF-8 CSV file via one of the following approaches:
- CSV to UTF-8 Conversion in Python
- CSV Reader/Writer – CSV to UTF-8 Conversion
- Pandas – CSV to UTF-8 Conversion
- ANSI to UTF-8
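To make the variable-width claim from the introduction concrete, the following sketch encodes a few characters and counts the resulting bytes. It also asks Python which encoding open() would typically fall back to on the current machine. The sample characters are arbitrary choices.

```python
import locale

# Arbitrary sample characters from different Unicode ranges:
# 'A', 'e' with acute accent, a black star, and an emoji.
samples = ["A", "\u00e9", "\u2605", "\U0001F600"]

# Number of bytes each character occupies once encoded as UTF-8.
utf8_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}

# Encoding that text-mode open() typically falls back to when no encoding
# argument is given; the value depends on the platform and Python settings.
default_encoding = locale.getpreferredencoding(False)

for ch, n in utf8_lengths.items():
    print(f"U+{ord(ch):04X} -> {n} byte(s)")
print("default text encoding here:", default_encoding)
```

If the last line does not report UTF-8 on your system, files written without an explicit encoding argument will not be UTF-8 either.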
CSV to UTF-8 Conversion in Python
The no-library approach to convert a CSV file to a UTF-8 CSV file is to open the first file in its non-UTF-8 encoding and write its contents straight back to a UTF-8 file. You can use the open() function’s encoding argument to set the encoding of the file to be read, and the same argument to make sure the output file is written as UTF-8.
with open('my_file.csv', 'r', encoding='ANSI', errors='ignore') as infile:
    # Pass encoding='utf-8' explicitly; otherwise open() uses the platform default
    with open('my_file_utf8.csv', 'w', encoding='utf-8') as outfile:
        outfile.write(infile.read())
After conversion from ANSI to UTF-8 using the given approach, the new CSV file is now UTF-8 formatted:
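One way to check that a conversion like this really produced UTF-8 is to read the result back as raw bytes. The sketch below is self-contained: it writes a small sample file itself, uses cp1252 as a stand-in for “ANSI” (an assumption; the actual code page depends on the Windows locale), and wraps the conversion in a hypothetical helper named convert_to_utf8.

```python
import os
import tempfile

def convert_to_utf8(src_path, dst_path, source_encoding="cp1252"):
    """Re-encode a text file to UTF-8 (hypothetical helper, not from the article)."""
    # newline='' on both sides passes line endings through unchanged.
    with open(src_path, "r", encoding=source_encoding, newline="") as infile:
        with open(dst_path, "w", encoding="utf-8", newline="") as outfile:
            outfile.write(infile.read())

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "my_file.csv")
    dst = os.path.join(tmp, "my_file_utf8.csv")

    # Sample data with two accented letters, each a single byte in cp1252.
    original = "name,city\nRen\u00e9e,Z\u00fcrich\n"
    with open(src, "wb") as f:
        f.write(original.encode("cp1252"))

    convert_to_utf8(src, dst)

    with open(dst, "rb") as f:
        converted = f.read()

# Strict decoding raises UnicodeDecodeError if the bytes are not valid UTF-8.
text = converted.decode("utf-8")
print(len(converted), "bytes, decodes cleanly as UTF-8")
```

Reading in binary mode matters here: opening the result in text mode would decode it with whatever encoding is passed (or the platform default), which can hide a wrong output encoding.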
CSV Reader/Writer – CSV to UTF-8 Conversion
You don’t need a CSV reader to convert a CSV to UTF-8 as shown in the previous example. However, if you wish to do so, make sure to pass the encoding argument when opening the file object used to create the CSV reader object.
import csv

with open('my_file.csv', 'r', encoding='ANSI', errors='ignore') as infile:
    with open('my_file_utf8.csv', 'w', encoding='utf-8', newline='') as outfile:
        reader = csv.reader(infile)
        writer = csv.writer(outfile)
        for row in reader:
            print(row)
            writer.writerow(row)
The extra newline='' argument is there to prevent Windows from adding an extra blank line after each row.
The output is the same UTF-8 encoded CSV:
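The same reader/writer pattern can also drop the header row during conversion, which is what the question at the top of this page was after. A minimal sketch, assuming the input is cp1252 and using a hypothetical helper name and made-up sample data:

```python
import csv
import os
import tempfile

def copy_without_header(src_path, dst_path, source_encoding="cp1252"):
    """Copy a CSV to UTF-8, skipping its first row (hypothetical helper)."""
    with open(src_path, "r", encoding=source_encoding, newline="") as infile, \
         open(dst_path, "w", encoding="utf-8", newline="") as outfile:
        reader = csv.reader(infile)
        writer = csv.writer(outfile)
        next(reader, None)        # discard the header; no error if the file is empty
        writer.writerows(reader)  # stream the remaining rows to the output

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "file.csv")
    dst = os.path.join(tmp, "file_final.csv")

    # Sample input containing byte 0xa0 (a non-breaking space in cp1252).
    with open(src, "wb") as f:
        f.write(b"id,label\r\n1,foo\xa0bar\r\n2,baz\r\n")

    copy_without_header(src, dst)

    # Read the result back as UTF-8 to confirm the header is gone.
    with open(dst, "r", encoding="utf-8", newline="") as f:
        rows = list(csv.reader(f))

print(rows)
```

Using next(reader, None) avoids the enumerate-and-continue bookkeeping and keeps the loop body free of index checks.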
Pandas – CSV to UTF-8 Conversion
You can use the pandas.read_csv() and to_csv() functions to read and write a CSV file using various encodings (e.g., UTF-8, ASCII, ANSI, ISO-8859-1) as defined in the encoding argument of both functions.
Here’s an example:
import pandas as pd

df = pd.read_csv('my_file.csv', encoding='ANSI')
df.to_csv('my_file_utf8.csv', encoding='utf-8', index=False)
ANSI to UTF-8
The no-library approach to convert an ANSI-encoded CSV file to a UTF-8-encoded CSV file is to open the first file in the ANSI format and write its contents back to a UTF-8 file. Use the open() function’s encoding argument to set the encoding of the file to be read.
Here’s an example:
with open('my_file.csv', 'r', encoding='ANSI', errors='ignore') as infile:
    # Pass encoding='utf-8' explicitly; otherwise open() uses the platform default
    with open('my_file_utf8.csv', 'w', encoding='utf-8') as outfile:
        outfile.write(infile.read())
This converts the following ANSI file to a UTF-8 file:
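A portability note on the examples above: as far as I know, Python only recognizes 'ANSI' as a codec name on Windows, where it is an alias for the mbcs codec (the system’s active code page); elsewhere, open() with encoding='ANSI' raises a LookupError. The sketch below picks a usable codec name either way. The cp1252 fallback and the helper name are assumptions; choose whatever code page the file was actually saved in.

```python
import codecs

def resolve_ansi_codec(fallback="cp1252"):
    """Return a codec name usable in place of 'ANSI' (hypothetical helper)."""
    try:
        # Expected to succeed on Windows, where 'ansi' aliases the 'mbcs' codec.
        return codecs.lookup("ansi").name
    except LookupError:
        # Other platforms: fall back to an explicit code page (an assumption).
        return fallback

source_encoding = resolve_ansi_codec()
print(source_encoding)

# The resolved name can then be passed to open(), for example:
# open('my_file.csv', 'r', encoding=source_encoding)
```

Naming the code page explicitly (for example cp1252) is usually the more reproducible choice, because the result then no longer depends on the machine that runs the script.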
While working as a researcher in distributed systems, Dr. Christian Mayer found his love for teaching computer science students.
To help students reach higher levels of Python success, he founded the programming education website Finxter.com. He’s the author of the popular programming book Python One-Liners (NoStarch 2020), coauthor of the Coffee Break Python series of self-published books, computer science enthusiast, freelancer, and owner of one of the top 10 largest Python blogs worldwide.
His passions are writing, reading, and coding. But his greatest passion is to serve aspiring coders through Finxter and help them to boost their skills.