Need to import a CSV file into Python?
If so, you’ll see the complete steps to import a CSV file into Python using Pandas.
To start, here is a simple template that you may use to import a CSV file into Python:
import pandas as pd

df = pd.read_csv(r'Path where the CSV file is stored\File name.csv')
print(df)
Next, you’ll see an example with the steps needed to import your file.
Importing the Data into Python
So let’s begin with a simple example, where you have the following client list and some additional sales information stored in a CSV file (where the file name is ‘Clients’):
Person Name | Country | Product | Purchase Price |
Jon | Japan | Computer | $800 |
Bill | US | Tablet | $450 |
Maria | Canada | Printer | $150 |
Rita | Brazil | Laptop | $1,200 |
Jack | UK | Monitor | $300 |
Ron | Spain | Laptop | $1,200 |
Jeff | China | Laptop | $1,200 |
Carrie | Italy | Computer | $800 |
Marry | Peru | Computer | $800 |
Ben | Russia | Printer | $150 |
Step 1: Capture the File Path
Firstly, capture the full path where your CSV file is stored.
For example, let’s suppose that a CSV file is stored under the following path:
C:\Users\Ron\Desktop\Clients.csv
You’ll need to modify the Python code below to reflect the path where the CSV file is stored on your computer. Don’t forget to include the:
- File name (Clients in the example path above). You may choose a different file name, but make sure that the file name specified in the code matches the actual file name
- File extension (.csv in the example path above). The file extension should always be ‘.csv’ when importing CSV files
Step 2: Apply the Python code
Type/copy the following code into Python, while making the necessary changes to your path.
Here is the code for our example (you can find additional comments within the code itself):
import pandas as pd

# Read the CSV file. Put 'r' before the path string to address any special
# characters in the path, such as '\'. Don't forget to put the file name
# at the end of the path + ".csv".
df = pd.read_csv(r'C:\Users\Ron\Desktop\Clients.csv')
print(df)
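To see why the 'r' prefix matters, here is a small sketch that compares a plain string with a raw string. The path C:\new\table.csv is hypothetical, picked only because it contains the escape sequences \n and \t:

```python
# Hypothetical path, chosen because "\n" and "\t" are escape sequences.
plain = 'C:\new\table.csv'   # "\n" becomes a newline, "\t" becomes a tab
raw = r'C:\new\table.csv'    # backslashes are kept exactly as written

print(len(plain))            # shorter: two escapes collapsed to one character each
print(len(raw))              # every typed character is preserved
print('\n' in plain)         # the plain string really contains a newline
print('\\' in raw)           # the raw string still contains backslashes
```

A path such as 'C:\Users\...' written without the prefix does not even compile in Python 3, because \U starts a Unicode escape. Forward slashes are a common alternative that avoids the issue altogether.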
Step 3: Run the Code
Finally, run the Python code and you’ll get:
Person Name Country Product Purchase Price
0 Jon Japan Computer $800
1 Bill US Tablet $450
2 Maria Canada Printer $150
3 Rita Brazil Laptop $1,200
4 Jack UK Monitor $300
5 Ron Spain Laptop $1,200
6 Jeff China Laptop $1,200
7 Carrie Italy Computer $800
8 Marry Peru Computer $800
9 Ben Russia Printer $150
Optional Step: Select Subset of Columns
Now what if you want to select a subset of columns from the CSV file?
For example, what if you want to select only the Person Name and Country columns? If that’s the case, you can specify those column names as captured below:
import pandas as pd

data = pd.read_csv(r'C:\Users\Ron\Desktop\Clients.csv')
df = pd.DataFrame(data, columns=['Person Name', 'Country'])
print(df)
You’ll need to make sure that the column names specified in the code exactly match the column names within the CSV file. Otherwise, you’ll get NaN values.
Once you’re ready, run the code (after adjusting the file path), and you’ll get only the Person Name and Country columns:
Person Name Country
0 Jon Japan
1 Bill US
2 Maria Canada
3 Rita Brazil
4 Jack UK
5 Ron Spain
6 Jeff China
7 Carrie Italy
8 Marry Peru
9 Ben Russia
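As an alternative sketch, read_csv also accepts a usecols parameter that loads only the listed columns in the first place. To keep the example runnable without a file on disk, it reads a few rows of the client list from an in-memory string (the csv_text variable is an assumption made for illustration, not part of the original steps):

```python
import io

import pandas as pd

# Stand-in for Clients.csv: a few rows from the example, held in memory
# so that the sketch runs without a file on disk.
csv_text = (
    "Person Name,Country,Product,Purchase Price\n"
    "Jon,Japan,Computer,$800\n"
    "Bill,US,Tablet,$450\n"
    "Maria,Canada,Printer,$150\n"
)

# usecols tells read_csv to load only the listed columns.
df = pd.read_csv(io.StringIO(csv_text), usecols=['Person Name', 'Country'])
print(df)
```

With this approach, a column name that does not exist in the file raises an error instead of producing a column of NaN values, which can make typos easier to spot.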
Additional Resources
You just saw how to import a CSV file into Python using Pandas. At times, you may need to import Excel files into Python. If that’s the case, you can check the following tutorial that explains how to import an Excel file into Python.
Once you have imported your file into Python, you can start calculating some statistics using Pandas. Alternatively, you can easily export a Pandas DataFrame to a CSV file.
To find out more about using Pandas in order to import a CSV file, please visit the Pandas Documentation.
Summary: in this tutorial, you’ll learn how to read a CSV file in Python using the built-in csv module.
What is a CSV file
CSV stands for comma-separated values. A CSV file is a delimited text file that uses a comma to separate values.
A CSV file consists of one or more lines. Each line is a data record. And each data record consists of one or more values separated by commas. In addition, all the lines of a CSV file have the same number of values.
Typically, you use a CSV file to store tabular data in plain text. The CSV file format is quite popular and is supported by many software applications such as Microsoft Excel and Google Sheets.
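In practice, a value can itself contain a comma, in which case it is usually wrapped in double quotes. The following minimal sketch, which uses made-up in-memory data rather than a real file, suggests why splitting lines on commas by hand is fragile and why a dedicated parser such as the csv module helps:

```python
import csv
import io

# Made-up sample data; the price is quoted because it contains a comma.
text = 'name,price\nLaptop,"1,200"\nTablet,450\n'

# Naive splitting breaks the quoted value into two pieces.
naive = text.splitlines()[1].split(',')
print(naive)

# csv.reader understands the quotes and keeps the value intact.
rows = list(csv.reader(io.StringIO(text)))
print(rows[1])
```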
Reading a csv file in Python
To read a CSV file in Python, you follow these steps:
First, import the csv module:
import csv
Second, open the CSV file using the built-in open() function in read mode:
f = open('path/to/csv_file')
If the CSV file contains UTF-8 encoded characters, specify the encoding like this:
f = open('path/to/csv_file', encoding='UTF8')
Third, pass the file object (f) to the reader() function of the csv module. The reader() function returns a csv reader object:
csv_reader = csv.reader(f)
The csv_reader is an iterable object of lines from the CSV file. Therefore, you can iterate over the lines of the CSV file using a for loop:
for line in csv_reader:
    print(line)
Each line is a list of values. To access each value, you use the square bracket notation []. The first value has an index of 0. The second value has an index of 1, and so on.
For example, the following accesses the first value of a particular line:
line[0]
Finally, always close the file once you no longer need it by calling the close() method of the file object:
f.close()
It’s easier to use the with statement so that you don’t need to call the close() method explicitly.
The following illustrates all the steps for reading a CSV file:
import csv

with open('path/to/csv_file', 'r') as f:
    csv_reader = csv.reader(f)
    for line in csv_reader:
        # process each line
        print(line)
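The csv module documentation recommends opening the file with newline='' so that the module can handle line endings itself, including line breaks inside quoted values. Here is a sketch of the same steps with that option added. The temporary file and its contents are made up so that the example runs on its own:

```python
import csv
import os
import tempfile

# Create a small throwaway CSV file (made-up content) to read back.
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False,
                                 newline='', encoding='utf8') as tmp:
    tmp.write('name,area\nAlbania,28748.00\n')
    path = tmp.name

# newline='' lets the csv module handle line endings itself.
with open(path, newline='', encoding='utf8') as f:
    lines = list(csv.reader(f))

os.remove(path)  # clean up the throwaway file
print(lines)
```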
Reading a CSV file examples
We’ll use the country.csv file, which contains country information including name, area, 2-letter country code, and 3-letter country code.
The following shows how to read the country.csv file and display each line on the screen:
import csv

with open('country.csv', encoding="utf8") as f:
    csv_reader = csv.reader(f)
    for line in csv_reader:
        print(line)
Output:
['name', 'area', 'country_code2', 'country_code3']
['Afghanistan', '652090.00', 'AF', 'AFG']
['Albania', '28748.00', 'AL', 'ALB']
['Algeria', '2381741.00', 'DZ', 'DZA']
['American Samoa', '199.00', 'AS', 'ASM']
...
The country.csv file has the header as its first line. To separate the header from the data, you can use the enumerate() function to get the index of each line:
import csv

with open('country.csv', encoding="utf8") as f:
    csv_reader = csv.reader(f)
    for line_no, line in enumerate(csv_reader, 1):
        if line_no == 1:
            print('Header:')
            print(line)  # header
            print('Data:')
        else:
            print(line)  # data
In this example, we use the enumerate() function and specify the index of the first line as 1.
Inside the loop, if the line_no is 1, the line is the header. Otherwise, it’s a data line.
Another way to skip the header is to use the next() function. The next() function advances the reader to the next line. For example:
import csv

with open('country.csv', encoding="utf8") as f:
    csv_reader = csv.reader(f)
    # skip the first row
    next(csv_reader)
    # show the data
    for line in csv_reader:
        print(line)
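Since next() returns the line it consumes, you can also keep the header instead of discarding it. Here is a small sketch, with a few lines of country.csv held in memory through io.StringIO (an assumption, so that the example runs without the file):

```python
import csv
import io

# A few lines of country.csv, kept in memory for the example.
sample = io.StringIO(
    "name,area,country_code2,country_code3\n"
    "Afghanistan,652090.00,AF,AFG\n"
    "Albania,28748.00,AL,ALB\n"
)

csv_reader = csv.reader(sample)
header = next(csv_reader)   # first line: the column names
data = list(csv_reader)     # everything after the header

print(header)
print(len(data))
```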
The following reads the country.csv file and calculates the total area of all countries:
import csv

total_area = 0

# calculate the total area of all countries
with open('country.csv', encoding="utf8") as f:
    csv_reader = csv.reader(f)
    # skip the header
    next(csv_reader)
    # calculate total
    for line in csv_reader:
        total_area += float(line[1])

print(total_area)
Output:
148956306.9
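The same total can be written more compactly with sum() and a generator expression. This sketch runs over four sample rows held in memory (an assumption, to avoid needing the file), so its result differs from the full-file total above:

```python
import csv
import io

# Four sample rows of country.csv, kept in memory for the example.
sample = io.StringIO(
    "name,area,country_code2,country_code3\n"
    "Afghanistan,652090.00,AF,AFG\n"
    "Albania,28748.00,AL,ALB\n"
    "Algeria,2381741.00,DZ,DZA\n"
    "American Samoa,199.00,AS,ASM\n"
)

csv_reader = csv.reader(sample)
next(csv_reader)  # skip the header

# Convert the second column to float and add the values up.
total_area = sum(float(line[1]) for line in csv_reader)
print(total_area)
```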
Reading a CSV file using the DictReader class
When you use the csv.reader() function, you can access values of the CSV file using the bracket notation such as line[0], line[1], and so on. However, using the csv.reader() function has two main limitations:
- First, the way to access the values from the CSV file is not so obvious. For example, line[0] implicitly means the country name. It would be more expressive if you could access the country name like line['country_name'].
- Second, when the order of columns in the CSV file is changed or new columns are added, you need to modify the code to get the right data.
This is where the DictReader class comes into play. The DictReader class also comes from the csv module.
The DictReader class allows you to create an object like a regular CSV reader. But it maps the information of each line to a dictionary (dict) whose keys are specified by the values of the first line.
By using the DictReader class, you can access values in the country.csv file like line['name'], line['area'], line['country_code2'], and line['country_code3'].
The following example uses the DictReader class to read the country.csv file:
import csv

with open('country.csv', encoding="utf8") as f:
    csv_reader = csv.DictReader(f)
    # DictReader uses the first line as the keys, so there is no header to skip
    for line in csv_reader:
        print(f"The area of {line['name']} is {line['area']} km2")
Output:
The area of Afghanistan is 652090.00 km2
The area of Albania is 28748.00 km2
The area of Algeria is 2381741.00 km2
...
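DictReader reads the first line on its own to build the keys, so the first item it yields is already a data row. The sketch below inspects the fieldnames attribute to show this, using a few sample rows held in memory (an assumption, to avoid needing the file):

```python
import csv
import io

# A few lines of country.csv, kept in memory for the example.
sample = io.StringIO(
    "name,area,country_code2,country_code3\n"
    "Afghanistan,652090.00,AF,AFG\n"
    "Albania,28748.00,AL,ALB\n"
)

csv_reader = csv.DictReader(sample)
print(csv_reader.fieldnames)   # keys taken from the header line

first = next(csv_reader)       # first data row, as a dict
print(first['name'], first['area'])
```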
If you want to use field names other than the ones specified in the first line, you can explicitly specify them by passing a list of field names to the DictReader() constructor like this:
import csv

fieldnames = ['country_name', 'area', 'code2', 'code3']

with open('country.csv', encoding="utf8") as f:
    csv_reader = csv.DictReader(f, fieldnames)
    # skip the file's own header line
    next(csv_reader)
    for line in csv_reader:
        print(f"The area of {line['country_name']} is {line['area']} km2")
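In this example, instead of using values from the first line as the field names, we explicitly pass a list of field names to the DictReader constructor. The first line of the file is then read as an ordinary row, which is why the code calls next() to skip it.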
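To illustrate how the first line is handled when field names are passed explicitly, here is a sketch with in-memory sample data (an assumption, so that it runs without the file). The first call to next() returns the file's own header as if it were a record:

```python
import csv
import io

# Header plus one record of country.csv, kept in memory for the example.
sample = io.StringIO(
    "name,area,country_code2,country_code3\n"
    "Afghanistan,652090.00,AF,AFG\n"
)

fieldnames = ['country_name', 'area', 'code2', 'code3']
csv_reader = csv.DictReader(sample, fieldnames)

header_row = next(csv_reader)  # the file's own header, returned as data
first = next(csv_reader)       # first real record

print(header_row['country_name'])
print(first['country_name'])
```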
Summary
- Use the csv.reader() function or the csv.DictReader class to read data from a CSV file.