Need to import a CSV file into Python?
If so, this guide walks through the complete steps to import a CSV file into Python using Pandas.
To start, here is a simple template that you may use to import a CSV file into Python:
import pandas as pd

df = pd.read_csv(r'Path where the CSV file is stored\File name.csv')
print(df)
Next, you’ll see an example with the steps needed to import your file.
Importing the Data into Python
So let’s begin with a simple example, where you have the following client list and some additional sales information stored in a CSV file (where the file name is ‘Clients’):
Person Name | Country | Product | Purchase Price |
Jon | Japan | Computer | $800 |
Bill | US | Tablet | $450 |
Maria | Canada | Printer | $150 |
Rita | Brazil | Laptop | $1,200 |
Jack | UK | Monitor | $300 |
Ron | Spain | Laptop | $1,200 |
Jeff | China | Laptop | $1,200 |
Carrie | Italy | Computer | $800 |
Marry | Peru | Computer | $800 |
Ben | Russia | Printer | $150 |
Step 1: Capture the File Path
Firstly, capture the full path where your CSV file is stored.
For example, let’s suppose that a CSV file is stored under the following path:
C:\Users\Ron\Desktop\Clients.csv
You’ll need to modify the Python code below to reflect the path where the CSV file is stored on your computer. Don’t forget to include the:
- File name (‘Clients’ in this example). You may choose a different file name, but make sure that the file name specified in the code matches the actual file name
- File extension (‘.csv’ in this example). The file extension should always be ‘.csv’ when importing CSV files
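Before running the import, it can help to confirm that the path really points to a CSV file. The following is a minimal sketch using Python’s standard pathlib module. The helper name check_csv_path is hypothetical (it is not part of Pandas), and the path is the example path from this tutorial, so adjust it to your own machine.

```python
from pathlib import Path


def check_csv_path(path_str):
    """Return (exists, has_csv_extension) for the given path string."""
    path = Path(path_str)
    # is_file() is False if the path does not exist or points to a folder
    exists = path.is_file()
    # suffix holds the file extension, including the leading dot
    has_csv_extension = path.suffix.lower() == '.csv'
    return exists, has_csv_extension


# Example path from this tutorial; replace it with your own location
exists, has_csv_extension = check_csv_path(r'C:\Users\Ron\Desktop\Clients.csv')
print('File found:', exists)
print('Ends with .csv:', has_csv_extension)
```

If the first line prints False, the path or file name in your code most likely does not match the actual location of the file.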
Step 2: Apply the Python code
Type or copy the following code into Python, making the necessary changes to your path.
Here is the code for our example (you can find additional comments within the code itself):
import pandas as pd

# Read the CSV file. The 'r' before the path string makes it a raw string,
# so special characters in the path, such as '\', are taken literally.
# Don't forget to put the file name at the end of the path, followed by ".csv".
df = pd.read_csv(r'C:\Users\Ron\Desktop\Clients.csv')
print(df)
Step 3: Run the Code
Finally, run the Python code and you’ll get:
   Person Name  Country   Product  Purchase Price
0          Jon    Japan  Computer            $800
1         Bill       US    Tablet            $450
2        Maria   Canada   Printer            $150
3         Rita   Brazil    Laptop          $1,200
4         Jack       UK   Monitor            $300
5          Ron    Spain    Laptop          $1,200
6         Jeff    China    Laptop          $1,200
7       Carrie    Italy  Computer            $800
8        Marry     Peru  Computer            $800
9          Ben   Russia   Printer            $150
Optional Step: Select Subset of Columns
Now what if you want to select a subset of columns from the CSV file?
For example, what if you want to select only the Person Name and Country columns? If that’s the case, you can specify those column names as captured below:
import pandas as pd

data = pd.read_csv(r'C:\Users\Ron\Desktop\Clients.csv')
df = pd.DataFrame(data, columns=['Person Name', 'Country'])
print(df)
You’ll need to make sure that the column names specified in the code exactly match the column names within the CSV file. Otherwise, you’ll get NaN values.
Once you’re ready, run the code (after adjusting the file path), and you’ll get only the Person Name and Country columns:
   Person Name  Country
0          Jon    Japan
1         Bill       US
2        Maria   Canada
3         Rita   Brazil
4         Jack       UK
5          Ron    Spain
6         Jeff    China
7       Carrie    Italy
8        Marry     Peru
9          Ben   Russia
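To see why the column names must match exactly, here is a small sketch that reads two rows of the client list from an in-memory string instead of a file. Using io.StringIO in place of the CSV file is an assumption made so the example runs anywhere, and the misspelled name 'country' is deliberate.

```python
import io

import pandas as pd

# Two rows of the example client list, held in memory instead of a file
csv_text = """Person Name,Country,Product
Jon,Japan,Computer
Bill,US,Tablet
"""

data = pd.read_csv(io.StringIO(csv_text))

# Exact column names: the values come through as expected
matched = pd.DataFrame(data, columns=['Person Name', 'Country'])
print(matched)

# 'country' (lowercase) is not a column in the data, so it is filled with NaN
mismatched = pd.DataFrame(data, columns=['Person Name', 'country'])
print(mismatched)
```

Selecting with data[['Person Name', 'Country']] is a common alternative. It raises a KeyError for a missing column instead of quietly producing NaN values, which can make typos easier to spot.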
Additional Resources
You just saw how to import a CSV file into Python using Pandas. At times, you may need to import Excel files into Python. If that’s the case, you can check the following tutorial that explains how to import an Excel file into Python.
Once you have imported your file into Python, you can start calculating some statistics using Pandas. Alternatively, you can easily export a Pandas DataFrame to a CSV file.
To find out more about using Pandas in order to import a CSV file, please visit the Pandas Documentation.
In this post, we’ll go over how to import a CSV file into Python.
Short Answer
The easiest way to do this:
import pandas as pd

df = pd.read_csv('file_name.csv')
print(df)
If you want to import a subset of columns, simply add usecols=['column_name']:
pd.read_csv('file_name.csv', usecols=['column_name1', 'column_name2'])
If you want to use another separator, simply add sep='\t' (the default separator is ','):
pd.read_csv('file_name.csv', sep='\t')
Recap on Pandas DataFrame
A pandas DataFrame is an Excel-like data structure with labeled axes (rows and columns). Here is an example of a pandas DataFrame that we will use below:
Code to generate DataFrame:
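As a stand-in for the example, here is a minimal sketch that builds a small DataFrame. The column names (Name, Age, City) and all values are assumptions chosen for illustration, picked so that a 'Name' column exists for the options discussed in this post; they are not the original data.

```python
import pandas as pd

# Illustrative data only: these names and values are assumptions
df = pd.DataFrame({
    'Name': ['Anna', 'Ben', 'Carla'],
    'Age': [28, 35, 41],
    'City': ['Oslo', 'Lima', 'Rome'],
})
print(df)

# With no path argument, to_csv returns the CSV text instead of writing a file
csv_text = df.to_csv(index=False)
print(csv_text)
```

Passing a file name such as 'file_name.csv' as the first argument of to_csv would write the same text to disk, giving you a file to practice on.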
Importing a CSV file into the DataFrame
The pandas read_csv() function imports a CSV file into a DataFrame.
Here are some options:
filepath_or_buffer: This is the file name or file path.
pd.read_csv('file_name.csv')  # relative path
pd.read_csv('C:/Users/abc/Desktop/file_name.csv')  # absolute path
header: This allows you to specify which row will be used as the column names of your DataFrame. It expects an int value or a list of int values.
By default, the first row of the CSV file is treated as the column names, which is the same as setting header=0.
If your file doesn’t have a header, simply set header=None.
pd.read_csv('file_name.csv', header=None)  # no header
The output of no header:
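As a rough illustration, the sketch below reads a small headerless CSV from an in-memory string. The data and the use of io.StringIO are assumptions made so the example is self-contained. With header=None, pandas labels the columns 0, 1, 2 and treats every row as data.

```python
import io

import pandas as pd

# A CSV with no header row; the values are made up for illustration
csv_text = "Anna,28,Oslo\nBen,35,Lima\n"

df = pd.read_csv(io.StringIO(csv_text), header=None)
print(df)
print(list(df.columns))
```

Passing names=['Name', 'Age', 'City'] together with header=None assigns your own column labels instead of the integer ones.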
sep: Specifies a custom delimiter for the CSV input. The default is a comma.
pd.read_csv('file_name.csv', sep='\t')  # use a tab as the separator
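The effect of sep is easiest to see by reading the same tab-separated text twice. This is a minimal sketch with made-up data held in an in-memory string (an assumption so that no file is needed).

```python
import io

import pandas as pd

# Tab-separated text; the values are made up for illustration
tsv_text = "Name\tAge\nAnna\t28\nBen\t35\n"

# With sep='\t' the two fields are split into two columns
df = pd.read_csv(io.StringIO(tsv_text), sep='\t')
print(df)

# With the default comma separator each line ends up in a single column
wrong = pd.read_csv(io.StringIO(tsv_text))
print(wrong.shape)
```

A DataFrame with only one column after importing is often a sign that the separator does not match the file.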
index_col: This allows you to set which column is used as the index of the DataFrame. The default value is None, in which case pandas creates a default integer index starting from 0.
It can be set to a column name or a column position, and that column will then be used as the index.
pd.read_csv('file_name.csv', index_col='Name')  # use the 'Name' column as the index
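Here is a short sketch comparing the default index with index_col='Name'. The data is made up and read from an in-memory string, both of which are assumptions for illustration.

```python
import io

import pandas as pd

# Made-up data for illustration
csv_text = "Name,Age,City\nAnna,28,Oslo\nBen,35,Lima\n"

# Default: pandas creates an integer index 0, 1, ...
default_index = pd.read_csv(io.StringIO(csv_text))
print(default_index.index.tolist())

# index_col='Name': the 'Name' values become the row labels
by_name = pd.read_csv(io.StringIO(csv_text), index_col='Name')
print(by_name)

# Rows can now be looked up by name
print(by_name.loc['Ben', 'City'])
```

Note that 'Name' is no longer a regular column in by_name, since it has moved into the index.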
nrows: Reads only the given number of rows from the start of the file. It expects an int value.
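A minimal sketch of nrows, again with made-up data in an in-memory string (an assumption so the example needs no file):

```python
import io

import pandas as pd

# Four data rows; the values are made up for illustration
csv_text = "Name,Age\nAnna,28\nBen,35\nCarla,41\nDev,52\n"

# Read only the first two data rows (the header row is not counted)
first_two = pd.read_csv(io.StringIO(csv_text), nrows=2)
print(first_two)
```

This can be a convenient way to preview a large file before loading all of it.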
usecols: Specifies which columns to import into the DataFrame. It can be a list of int values (column positions) or a list of column names.
pd.read_csv('file_name.csv', usecols=[1, 2, 3])  # only reads col1, col2, col3; col0 is ignored
pd.read_csv('file_name.csv', usecols=['Name'])  # only reads the 'Name' column; other columns are ignored
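Both forms of usecols can be tried on a small in-memory example. The data below is made up, and io.StringIO stands in for the file, both as assumptions for illustration.

```python
import io

import pandas as pd

# Made-up data for illustration
csv_text = "Name,Age,City\nAnna,28,Oslo\nBen,35,Lima\n"

# By position: columns 1 and 2 are kept, column 0 ('Name') is skipped
by_position = pd.read_csv(io.StringIO(csv_text), usecols=[1, 2])
print(by_position)

# By name: only the 'Name' column is read
by_name = pd.read_csv(io.StringIO(csv_text), usecols=['Name'])
print(by_name)
```

Unlike selecting columns after the import, usecols skips the unwanted columns while the file is being parsed, which may save memory on wide files.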
converters: A dict that maps column names (or column positions) to functions. Each function is applied to the values of its column as they are read.
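As an example of converters, the sketch below turns price strings such as '$1,200' into integers while reading. The function name parse_price and the sample data are assumptions for illustration. Note that a value containing a comma has to be quoted in the CSV text so it is not split into two fields.

```python
import io

import pandas as pd


def parse_price(text):
    """Turn a string such as '$1,200' into the integer 1200."""
    return int(text.replace('$', '').replace(',', ''))


# Made-up data; "$1,200" is quoted because it contains a comma
csv_text = 'Product,Purchase Price\nComputer,$800\nLaptop,"$1,200"\n'

df = pd.read_csv(io.StringIO(csv_text), converters={'Purchase Price': parse_price})
print(df)
print(df['Purchase Price'].sum())
```

Without the converter, the Purchase Price column would be imported as text, so numeric operations such as sum() would not give a meaningful total.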
na_values: By default, missing values are imported as NaN. Use this option if you want other strings to be treated as NaN as well. The expected input is typically a list of strings.
pd.read_csv('file_name.csv', na_values=['a', 'b'])  # 'a' and 'b' values are treated as NaN after importing
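The sketch below compares an import with and without na_values. The placeholder string 'missing' and the sample data are assumptions for illustration.

```python
import io

import pandas as pd

# Made-up data: one score is the text 'missing', another is simply empty
csv_text = "Name,Score\nAnna,10\nBen,missing\nCarla,\n"

# Without na_values, 'missing' stays in the column as ordinary text
plain = pd.read_csv(io.StringIO(csv_text))
print(plain)

# With na_values, 'missing' is treated as NaN, just like the empty field
df = pd.read_csv(io.StringIO(csv_text), na_values=['missing'])
print(df)
print(df['Score'].isna().tolist())
```

Once the placeholder is recognized as NaN, the Score column can be read as numbers rather than text.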