How do I change the encoding of a CSV file in Python?

I am trying to create a duplicate CSV without a header. When I attempt this I get the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 1895: invalid start byte.

I've read the Python csv documentation on Unicode and UTF-8 encoding and have implemented it. However, my output file is being generated with no data in it. I'm not sure what I am doing wrong here.

import csv

path =  '/Users/johndoe/file.csv'

with open(path, 'r') as infile, open(path + 'final.csv', 'w') as outfile:

    def unicode_csv(infile, outfile):
        inputs = csv.reader(utf_8_encoder(infile))
        output = csv.writer(outfile)

        for index, row in enumerate(inputs):
            yield [unicode(cell, 'utf-8') for cell in row]
            if index == 0:
                 continue
        output.writerow(row)

    def utf_8_encoder(infile):
        for line in infile:
            yield line.encode('utf-8')

unicode_csv(infile, outfile)

asked Sep 4, 2015 at 17:04

user3062459

The solution was to pass two additional parameters to the open() call for the input file:

with open(path, 'r') as infile:

The two parameters are encoding='utf-8' and errors='ignore'. This let me create a copy of the original CSV without the header row and without the UnicodeDecodeError. Note that errors='ignore' silently drops any byte that cannot be decoded, so the copy may be missing characters. Below is the completed code.

import csv

path =  '/Users/johndoe/file.csv'

with open(path, 'r', encoding='utf-8', errors='ignore', newline='') as infile, \
     open(path + 'final.csv', 'w', encoding='utf-8', newline='') as outfile:
    inputs = csv.reader(infile)
    output = csv.writer(outfile)

    for index, row in enumerate(inputs):
        # Skip the first row so the copy has no header
        if index == 0:
            continue
        output.writerow(row)
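For context, 0xa0 is not a valid UTF-8 start byte, but in single-byte encodings such as cp1252 or ISO-8859-1 it is the non-breaking space, so the file may not be UTF-8 at all. The sketch below uses made-up sample bytes (an assumption, not data from the question) to show what each decoding choice does:

```python
# Hypothetical sample: "price" + 0xA0 + "10", as a cp1252/Latin-1 file would store it.
raw = b"price\xa010"

# Strict UTF-8 decoding fails on the lone 0xA0 byte.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# errors='ignore' succeeds, but silently drops the offending byte.
ignored = raw.decode("utf-8", errors="ignore")

# Decoding with the (assumed) real source encoding keeps the character.
decoded = raw.decode("cp1252")

print(strict_ok)      # False
print(repr(ignored))  # 'price10'
print(repr(decoded))  # 'price\xa010'
```

If the source encoding is known, passing it to open() instead of errors='ignore' avoids losing data.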

answered Sep 5, 2015 at 2:08

user3062459

Since the line

unicode_csv(infile,outfile)

isn't indented, it is outside the with block, so by the time it is called, infile and outfile have both been closed.

The files should be opened when they are used, not when the functions are defined, so use:

with open(path, 'r') as infile, open(path + 'final.csv', 'w') as outfile:
    unicode_csv(infile,outfile)
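One further point, offered as a sketch and not as part of the answer above: because unicode_csv contains a yield, Python treats it as a generator function, so calling it only returns a generator object and none of the body runs until something iterates over it. Under Python 3, where the unicode built-in no longer exists and text files already produce str, the header-skipping copy can be written as an ordinary function. The name copy_without_header is hypothetical, and in-memory buffers stand in for the real files:

```python
import csv
import io


def copy_without_header(infile, outfile):
    """Copy CSV rows from infile to outfile, skipping the first row."""
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    next(reader, None)  # discard the header; no error if the input is empty
    for row in reader:
        writer.writerow(row)


# In-memory stand-ins for the two open files.
source = io.StringIO("name,age\r\nAnn,30\r\nBob,25\r\n", newline="")
target = io.StringIO(newline="")
copy_without_header(source, target)
print(repr(target.getvalue()))  # 'Ann,30\r\nBob,25\r\n'
```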

answered Sep 4, 2015 at 19:39

James K

If you are able to use pandas, and you know the exact encoding of your file, you could try this:

import pandas as pd

path =  '/Users/johndoe/file.csv'

df = pd.read_csv(path, encoding='ISO-8859-1')
df.to_csv(path, encoding='utf-8', index=False)

answered May 18, 2020 at 10:37

This article concerns the conversion and handling of CSV file formats in combination with the UTF-8 encoding standard.

💡 The Unicode Transformation Format 8-Bit (UTF-8) is a variable-width character encoding used for electronic communication. UTF-8 can encode more than 1 million (more or less weird) characters, covering every valid Unicode code point, using one to four bytes for each. Example UTF-8 characters: ☈,☇,★,☃,☄,☍
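To make "variable-width" concrete, here is a small sketch that encodes four sample characters (chosen only for illustration) and counts the bytes each one needs:

```python
# Sample characters: Latin "A", e-acute, a black star, and an emoji.
samples = ["A", "\u00e9", "\u2605", "\U0001F600"]

# len() of the encoded bytes gives the width of each character in UTF-8.
lengths = [len(ch.encode("utf-8")) for ch in samples]
print(lengths)  # [1, 2, 3, 4]
```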

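UTF-8 is the default text encoding on most Linux and macOS systems. Windows has traditionally defaulted to a legacy code page such as cp1252.

If you write a CSV file using Python’s standard file handling operations such as open() and file.write() without an encoding argument, Python uses the platform's preferred encoding, which is not always UTF-8. Pass encoding='utf-8' to open() to create a UTF-8 file reliably.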
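To see which encoding open() falls back to on a particular machine when no encoding argument is passed, the standard locale module can be queried. A minimal sketch (the printed value depends on the system, so no output is shown):

```python
import locale

# Encoding that open() uses for text files when no encoding argument is given.
default_text_encoding = locale.getpreferredencoding(False)
print(default_text_encoding)
```

If this prints something other than a UTF-8 name, files written without an explicit encoding argument will not be UTF-8.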

So if you came to this website searching for “CSV to UTF-8”, my guess is that you have a CSV file in a different encoding, such as ANSI or UTF-16, containing some “weird” characters.

Say you want to read an ANSI-encoded CSV file.

You can convert it to a UTF-8 CSV file via one of the following approaches:

  • CSV to UTF-8 Conversion in Python
  • CSV Reader/Writer – CSV to UTF-8 Conversion
  • Pandas – CSV to UTF-8 Conversion
  • ANSI to UTF-8

CSV to UTF-8 Conversion in Python

The no-library approach to convert a CSV file to a UTF-8 CSV file is to open the first file with its original, non-UTF-8 encoding and write its contents straight back out to a UTF-8 file. You can use the open() function’s encoding argument to set the encoding of each file.

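# 'ANSI' is only accepted on Windows (an alias of the 'mbcs' codec); elsewhere, name the
# actual code page, e.g. 'cp1252'.
with open('my_file.csv', 'r', encoding='ANSI', errors='ignore') as infile:
    # Set the output encoding explicitly; the platform default is not always UTF-8.
    with open('my_file_utf8.csv', 'w', encoding='utf-8') as outfile:
        outfile.write(infile.read())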
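For large files, reading everything into memory at once may be undesirable. The following is a sketch of the same idea done in chunks. The function name convert_to_utf8, the chunk size, and the cp1252 source encoding are all assumptions made for illustration, and the demo runs on a throwaway file in a temporary directory:

```python
import os
import tempfile


def convert_to_utf8(src_path, dst_path, src_encoding, chunk_size=64 * 1024):
    """Re-encode a text file to UTF-8, reading chunk_size characters at a time."""
    # newline='' on both sides passes line endings through unchanged.
    with open(src_path, "r", encoding=src_encoding, newline="") as infile, \
         open(dst_path, "w", encoding="utf-8", newline="") as outfile:
        while True:
            chunk = infile.read(chunk_size)
            if not chunk:
                break
            outfile.write(chunk)


# Demo with a throwaway cp1252 file (cp1252 is an assumed source encoding).
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "my_file.csv")
dst = os.path.join(workdir, "my_file_utf8.csv")
with open(src, "wb") as f:
    f.write("name,city\r\nZo\u00eb,M\u00e1laga\r\n".encode("cp1252"))

convert_to_utf8(src, dst, "cp1252")
with open(dst, "rb") as f:
    converted = f.read()
print(converted.decode("utf-8") == "name,city\r\nZo\u00eb,M\u00e1laga\r\n")  # True
```

The sketch leaves errors at its default of 'strict', so a wrong source encoding raises an exception instead of silently dropping characters.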

After conversion from ANSI to UTF-8 using the given approach, the new CSV file is UTF-8 encoded.

CSV Reader/Writer – CSV to UTF-8 Conversion

You don’t need a CSV reader to convert a CSV to UTF-8, as shown in the previous example. However, if you wish to use one, make sure to pass the encoding argument when opening the file that you hand to csv.reader().

import csv


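# 'ANSI' is a Windows-only codec alias; elsewhere use the real code page, e.g. 'cp1252'.
with open('my_file.csv', 'r', encoding='ANSI', errors='ignore', newline='') as infile:
    with open('my_file_utf8.csv', 'w', encoding='utf-8', newline='') as outfile:
        reader = csv.reader(infile)
        writer = csv.writer(outfile)
        for row in reader:
            print(row)  # optional: show each row as it is copied
            writer.writerow(row)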

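The newline='' argument hands control of line endings to the csv module. Without it, Windows translates the '\n' inside each '\r\n' row terminator that csv.writer emits, producing '\r\r\n' and therefore an extra blank line after every row.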
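The effect can be reproduced on any platform with in-memory streams. In this sketch, a TextIOWrapper with newline='\r\n' stands in for a Windows text file opened without newline='' (that stand-in is an assumption made so the demo runs everywhere):

```python
import csv
import io

# csv.writer ends every row with "\r\n" by default.
plain = io.StringIO(newline="")
csv.writer(plain).writerow(["a", "b"])
print(repr(plain.getvalue()))  # 'a,b\r\n'

# Simulate a Windows text file opened WITHOUT newline='': the text layer
# rewrites each "\n" as "\r\n", so the row terminator becomes "\r\r\n".
raw = io.BytesIO()
translating = io.TextIOWrapper(raw, encoding="utf-8", newline="\r\n")
csv.writer(translating).writerow(["a", "b"])
translating.flush()
print(raw.getvalue())  # b'a,b\r\r\n'
```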

The output is the same UTF-8 encoded CSV.

Pandas – CSV to UTF-8 Conversion

You can use the pandas.read_csv() function and the DataFrame.to_csv() method to read and write a CSV file in various encodings (e.g., UTF-8, ASCII, ANSI, ISO-8859-1), as specified by the encoding argument of each.

Here’s an example:

import pandas as pd


# As with open(), 'ANSI' is a Windows-only codec name; elsewhere use e.g. 'cp1252'.
df = pd.read_csv('my_file.csv', encoding='ANSI')
df.to_csv('my_file_utf8.csv', encoding='utf-8', index=False)

ANSI to UTF-8

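“ANSI” is not a single encoding: it is the Windows name for the system's legacy code page, which is cp1252 on Western European and US systems. The no-library approach to convert an ANSI-encoded CSV file to a UTF-8-encoded CSV file is to open the first file with the ANSI encoding and write its contents back out to a UTF-8 file. Use the open() function’s encoding argument to set the encoding of each file.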

Here’s an example:

with open('my_file.csv', 'r', encoding='ANSI', errors='ignore') as infile:
    with open('my_file_utf8.csv', 'w', encoding='utf-8') as outfile:
        outfile.write(infile.read())

This converts the ANSI file to a UTF-8 file.


While working as a researcher in distributed systems, Dr. Christian Mayer found his love for teaching computer science students.

To help students reach higher levels of Python success, he founded the programming education website Finxter.com. He’s author of the popular programming book Python One-Liners (NoStarch 2020), coauthor of the Coffee Break Python series of self-published books, computer science enthusiast, freelancer, and owner of one of the top 10 largest Python blogs worldwide.

His passions are writing, reading, and coding. But his greatest passion is to serve aspiring coders through Finxter and help them to boost their skills.

How do I change the encoding of a CSV file?

UTF-8 Encoding in Microsoft Excel (Windows):
Open your CSV file in Microsoft Excel.
Click File in the top-left corner of your screen.
Select Save as...
Click the drop-down menu next to File format.
Select CSV UTF-8 (Comma delimited) (.csv) from the drop-down menu.
Click Save.
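When the file is produced by Python instead of Excel, a rough counterpart is the utf-8-sig codec, which writes UTF-8 preceded by a byte order mark (BOM); Excel generally relies on that marker to recognise a CSV as UTF-8. A minimal sketch, with a hypothetical file name and made-up rows:

```python
import csv
import os
import tempfile

# Hypothetical output path in a temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), "for_excel.csv")

# 'utf-8-sig' writes the UTF-8 byte order mark before the data.
with open(out_path, "w", encoding="utf-8-sig", newline="") as outfile:
    writer = csv.writer(outfile)
    writer.writerow(["name", "city"])
    writer.writerow(["Zo\u00eb", "M\u00e1laga"])

# The first three bytes of the file are the BOM.
with open(out_path, "rb") as f:
    first_bytes = f.read(3)
print(first_bytes)  # b'\xef\xbb\xbf'
```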

How do I check the encoding of a CSV file in Python?

In a text editor such as Notepad++, the detected encoding of the open file is displayed at the far right of the bottom status bar. The supported encodings are listed in the drop-down under Settings -> Preferences -> New Document/Default Directory.
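Within Python itself, the standard library has no built-in encoding detector, but one common heuristic is to try a short list of likely encodings and keep the first that decodes cleanly. The function name guess_encoding and the candidate list below are assumptions for illustration; a successful decode is evidence, not proof, that the guess is right:

```python
def guess_encoding(data, candidates=("utf-8", "cp1252")):
    """Return the first candidate that decodes data without error, else None."""
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


utf8_bytes = "caf\u00e9".encode("utf-8")
cp1252_bytes = "caf\u00e9".encode("cp1252")
print(guess_encoding(utf8_bytes))    # utf-8
print(guess_encoding(cp1252_bytes))  # cp1252
```

Order matters: UTF-8 is strict and rejects most non-UTF-8 data, so it goes first, while permissive single-byte encodings go last because they accept almost any byte sequence.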

How do I fix encoding in Python?

The best way to attack the problem, as with many things in Python, is to be explicit. That means that every string that your code handles needs to be clearly treated as either Unicode or a byte sequence. The most systematic way to accomplish this is to make your code into a Unicode-only clean room.
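In practice this usually means decoding bytes to str as soon as data enters the program, doing all processing on str, and encoding back to bytes only on the way out. A minimal sketch, where the function name normalise_record and the cp1252 input are hypothetical:

```python
def normalise_record(raw_line, source_encoding):
    """Decode at the input edge, process as str, encode to UTF-8 at the output edge."""
    text = raw_line.decode(source_encoding)  # bytes -> str at the input edge
    cleaned = text.strip().upper()           # all processing happens on str
    return cleaned.encode("utf-8")           # str -> bytes at the output edge


result = normalise_record("  zo\u00eb\r\n".encode("cp1252"), "cp1252")
print(result)  # b'ZO\xc3\x8b'
```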

What is UTF-8?

UTF-8, or "Unicode Transformation Format, 8 Bit" is a marketing operations pro's best friend when it comes to data imports and exports. It refers to how a file's character data is encoded when moving files between systems.