In this post, we’ll go over how to import a CSV file into Python.
Short Answer
The easiest way to do this is with pandas:
import pandas as pd
df = pd.read_csv('file_name.csv')
print(df)
If you want to import a subset of columns, simply add usecols=['column_name']:
pd.read_csv('file_name.csv', usecols=['column_name1', 'column_name2'])
If you want to use another separator, simply add sep='\t' (the default separator is ','):
pd.read_csv('file_name.csv', sep='\t')
Recap on Pandas DataFrame
A pandas DataFrame is an Excel-like data structure with labeled axes (rows and columns). Here is an example of a pandas DataFrame that we will use below:
Code to generate DataFrame:
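As a minimal sketch, a small DataFrame can be built from a dictionary and written to disk so that the read_csv() calls have a file to load. The column names and values here are hypothetical (only the 'Name' column is taken from the examples in this post), and file_name.csv is the placeholder name used throughout.

```python
import pandas as pd

# Hypothetical sample data; only the 'Name' column is mentioned in this post.
data = {
    "Name": ["Alice", "Bob", "Carol"],
    "Age": [30, 25, 41],
    "City": ["Paris", "Lima", "Oslo"],
}
df = pd.DataFrame(data)

# Write the DataFrame to a CSV file for the read_csv() examples.
# index=False keeps the 0, 1, 2 row labels out of the file.
df.to_csv("file_name.csv", index=False)
print(df)
```

Leaving the index out of the file means reading it back does not produce an extra unnamed column.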
Importing a CSV file into the DataFrame
The pandas read_csv() function imports a CSV file into a DataFrame.
Here are some options:
filepath_or_buffer: the file name or file path.
pd.read_csv('file_name.csv')  # relative path
pd.read_csv('C:/Users/abc/Desktop/file_name.csv')  # absolute path
header: specifies which row is used as the column names of the DataFrame. It expects an int or a list of ints. The default behaves like header=0, which means the first row of the CSV file is treated as the column names. If your file doesn’t have a header, simply set header=None.
pd.read_csv('file_name.csv', header=None)  # no header
The output of no header:
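As a rough illustration, the sketch below reads a made-up two-line CSV held in memory (io.StringIO stands in for a file on disk) with and without header=None. The data is an assumption chosen only to show the difference.

```python
import io
import pandas as pd

# Made-up CSV text with no header row.
raw = "Alice,30\nBob,25\n"

# header=None: every line is data, and columns are labeled 0, 1, ...
no_header = pd.read_csv(io.StringIO(raw), header=None)
print(no_header.columns.tolist())  # [0, 1]
print(len(no_header))              # 2

# Default behavior: the first line is consumed as column names.
default = pd.read_csv(io.StringIO(raw))
print(default.columns.tolist())    # ['Alice', '30']
print(len(default))                # 1
```

Forgetting header=None on a headerless file silently turns the first record into column names, so the row count is one short.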
sep: specifies a custom delimiter for the CSV input; the default is a comma.
pd.read_csv('file_name.csv', sep='\t')  # tab-separated file
index_col: sets which column (or columns) to use as the index of the DataFrame. The default value is None, in which case pandas creates a default integer index starting from 0.
It can be set to a column name or a column position, and that column is then used as the index.
pd.read_csv('file_name.csv', index_col='Name')  # use the 'Name' column as the index
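The following sketch compares the default index with index_col given by name and by position. The in-memory CSV text is a made-up stand-in for file_name.csv.

```python
import io
import pandas as pd

# Made-up data standing in for a CSV file on disk.
raw = "Name,Age\nAlice,30\nBob,25\n"

by_default = pd.read_csv(io.StringIO(raw))                 # integer index 0, 1
by_name = pd.read_csv(io.StringIO(raw), index_col="Name")  # index from the 'Name' column
by_position = pd.read_csv(io.StringIO(raw), index_col=0)   # same column, given by position

print(by_default.index.tolist())   # [0, 1]
print(by_name.index.tolist())      # ['Alice', 'Bob']
print(by_name.loc["Bob", "Age"])   # 25
```

Once a column becomes the index, it is no longer a regular column, so rows can be looked up by label with .loc.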
nrows: the number of rows to read from the start of the file. Expects an int.
usecols: specifies which columns to import into the DataFrame. It can be a list of int positions or a list of column names.
pd.read_csv('file_name.csv', usecols=[1, 2, 3])  # Only reads col1, col2, col3. col0 will be ignored.
pd.read_csv('file_name.csv', usecols=['Name'])  # Only reads the 'Name' column. Other columns will be ignored.
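Here is a small sketch of usecols and nrows on made-up in-memory data. One detail worth knowing: the order of the names in usecols is ignored, so the resulting columns follow the order in the file.

```python
import io
import pandas as pd

# Made-up data standing in for a CSV file on disk.
raw = "Name,Age,City\nAlice,30,Paris\nBob,25,Lima\nCarol,41,Oslo\n"

by_name = pd.read_csv(io.StringIO(raw), usecols=["City", "Name"])
by_position = pd.read_csv(io.StringIO(raw), usecols=[0, 2])
first_two = pd.read_csv(io.StringIO(raw), nrows=2)

print(by_name.columns.tolist())      # ['Name', 'City'] (file order, not list order)
print(by_position.columns.tolist())  # ['Name', 'City']
print(first_two["Name"].tolist())    # ['Alice', 'Bob']
```

If a specific column order is needed, reorder after reading, for example with by_name[["City", "Name"]].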
converters: a dict that maps column names or positions to functions. Each function is applied to the values of its column while the file is read.
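As a sketch of converters, the example below strips whitespace from one column and parses a currency string in another. The data and the helper parse_price are hypothetical and exist only for illustration.

```python
import io
import pandas as pd

# Made-up data: names padded with spaces, prices with a leading '$'.
raw = "Name,Price\n widget ,$4.50\n gadget ,$12.00\n"

def parse_price(text):
    """Hypothetical helper: turn a string such as '$4.50' into a float."""
    return float(text.replace("$", ""))

df = pd.read_csv(
    io.StringIO(raw),
    converters={"Name": str.strip, "Price": parse_price},
)

print(df["Name"].tolist())   # ['widget', 'gadget']
print(df["Price"].tolist())  # [4.5, 12.0]
```

Each converter receives the raw cell text as a string, so it has to do its own type conversion.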
na_values: pandas already treats values such as an empty field, 'NA', and 'NaN' as missing (NaN). Use this option if you want additional strings to be treated as NaN. The expected input is typically a list of strings.
pd.read_csv('file_name.csv', na_values=['a', 'b'])  # 'a' and 'b' are treated as NaN after importing into the DataFrame.
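The sketch below shows the effect of na_values on made-up data in which 'missing' and '-' are used as placeholders for absent scores.

```python
import io
import pandas as pd

# Made-up data with two custom placeholders for missing scores.
raw = "Name,Score\nAlice,10\nBob,missing\nCarol,-\n"

# Without na_values, the placeholders keep the whole column as text.
plain = pd.read_csv(io.StringIO(raw))
print(plain["Score"].tolist())           # ['10', 'missing', '-']

# With na_values, the placeholders become NaN and the column is numeric.
cleaned = pd.read_csv(io.StringIO(raw), na_values=["missing", "-"])
print(cleaned["Score"].isna().tolist())  # [False, True, True]
print(cleaned["Score"].sum())            # 10.0
```

Marking placeholders as NaN at read time lets numeric operations such as sum() work without a separate cleaning step.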
To access data from the CSV file, we use the read_csv() function, which returns the data as a DataFrame.
Syntax of read_csv()
Syntax: pd.read_csv(filepath_or_buffer, sep=',', header='infer', index_col=None, usecols=None, engine=None, skiprows=None, nrows=None)
Parameters:
- filepath_or_buffer: The location of the file to read. It accepts a string path, a URL, or a file-like object.
- sep: The separator; the default is ',', as in CSV (comma-separated values).
- header: The row number(s) to use as the column names and the start of the data. It accepts an int or a list of ints. With header=None, no row is treated as a header, and the columns are labeled 0, 1, 2, and so on.
- usecols: Retrieves only the selected columns from the CSV file.
- nrows: The number of rows to read from the file.
- index_col: The column(s) to use as row labels. If None, pandas adds a default integer index starting from 0.
- skiprows: The line numbers to skip (0-indexed), or the number of lines to skip at the start of the file.
Read CSV using Pandas read_csv
Before using this function, we must import the pandas library. Then we load the CSV file.
Python3
import pandas as pd

pd.read_csv("example1.csv")
Output:
Example 1: Using sep in read_csv()
In this example, we take our existing CSV file and add some special characters as delimiters to see how the sep parameter works. Because sep is a regular expression here, we use engine='python'.
Python3
import pandas as pd

df = pd.read_csv('headbrain1.csv', sep='[:, |_]', engine='python')
df
Output:
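To try this without the headbrain1.csv file, here is a minimal sketch that applies the same regular expression to made-up in-memory text. Each of the characters ':', ',', ' ', '|', and '_' acts as a delimiter on its own; the column names and values are assumptions.

```python
import io
import pandas as pd

# Made-up text that mixes several delimiter characters.
raw = "id:name|score_grade\n1,ann 90_A\n2:bob|85,B\n"

# A character class matches any single one of the listed delimiters.
# Regular-expression separators need the Python parsing engine.
df = pd.read_csv(io.StringIO(raw), sep="[:, |_]", engine="python")

print(df.columns.tolist())  # ['id', 'name', 'score', 'grade']
print(df["score"].tolist()) # [90, 85]
```

A regular-expression separator splits on every match, so it only suits files where none of these characters appear inside the values themselves.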
Example 2: Using usecols in read_csv()
Here, we load only 3 columns, i.e. [“tip”, “sex”, “time”], and use row 0 as the header, which is the default.
Python3
df = pd.read_csv('example1.csv', header=0, usecols=["tip", "sex", "time"])
df
Output:
Example 3: Using index_col in read_csv()
Here, we use “sex” as the first index level and “tip” as the second. The index_col parameter lets us set the index directly while reading the file.
Python3
df = pd.read_csv('example1.csv', header=0, index_col=["sex", "tip"], usecols=["tip", "sex", "time"])
df
Output:
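As a rough sketch of what a list passed to index_col produces, the example below uses three made-up rows with the same column names as above. The values are assumptions and are not taken from example1.csv.

```python
import io
import pandas as pd

# Made-up rows using the column names from this example.
raw = "tip,sex,time\n1.01,Female,Dinner\n1.66,Male,Dinner\n3.50,Male,Lunch\n"

df = pd.read_csv(
    io.StringIO(raw),
    usecols=["tip", "sex", "time"],
    index_col=["sex", "tip"],  # two index levels, in this order
)

print(list(df.index.names))  # ['sex', 'tip']
print(df.columns.tolist())   # ['time']
print(df.index.tolist())     # [('Female', 1.01), ('Male', 1.66), ('Male', 3.5)]
```

The result has a two-level index, and only “time” remains as a regular column.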
Example 4: Using nrows in read_csv()
Here, we read only the first 5 rows using the nrows parameter.
Python3
df = pd.read_csv('example1.csv', header=0, index_col=["tip", "sex"], usecols=["tip", "sex", "time"], nrows=5)
df
Output:
Example 5: Using skiprows in read_csv()
The skiprows parameter skips the given lines of the CSV. Here, skiprows=[1, 12] skips lines 1 and 12 of the file, counting from 0 with the header as line 0, so the first data row and the last row of the original CSV data are left out.
Python3
pd.read_csv("example1.csv", skiprows=[1, 12])
Output:
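The difference between passing a list and passing an int to skiprows can be checked with a small sketch. The single-column data below is made up so that the skipped lines are easy to spot.

```python
import io
import pandas as pd

# Made-up file: line 0 is the header 'n', lines 1..5 hold the values 1..5.
raw = "n\n1\n2\n3\n4\n5\n"

# A list skips exactly those line numbers (0-indexed, header is line 0).
skipped_lines = pd.read_csv(io.StringIO(raw), skiprows=[1, 5])
print(skipped_lines["n"].tolist())    # [2, 3, 4]

# An int skips that many lines from the top, header included,
# so the next remaining line ('2') becomes the header.
skipped_count = pd.read_csv(io.StringIO(raw), skiprows=2)
print(skipped_count.columns.tolist()) # ['2']
print(skipped_count["2"].tolist())    # [3, 4, 5]
```

To keep the header while dropping the first few data rows, pass a list or range that starts at 1, such as skiprows=range(1, 3).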