Use the pandas DataFrame.astype(int) and DataFrame.apply() methods to convert a column to the int data type (float or string to integer, int64 or int32 dtype). A float can hold a fractional part that an int cannot, so converting a float column to int discards everything after the decimal point.
Note that converting a float to int does not round or floor the value; it simply truncates the fractional part (anything after the .). In this article, I will explain different ways to convert columns with string or float values to integer values.
If you are in a hurry, below are some quick examples of how to convert a column to integer dtype in a DataFrame.
# Below are quick examples
# Convert "Fee" from string to int
df = df.astype({'Fee':'int'})
# Convert all columns to int dtype.
# This raises an error on our DataFrame
#df = df.astype('int')
# Convert single column to int dtype.
df['Fee'] = df['Fee'].astype('int')
# Convert "Discount" from float to int
df = df.astype({'Discount':'int'})
# Convert multiple columns to int
df = pd.DataFrame(technologies)
df = df.astype({"Fee":"int","Discount":"int"})
# Convert "Fee" from float to int and replace NaN values
df['Fee'] = df['Fee'].fillna(0).astype(int)
print(df)
print(df.dtypes)
Now, let’s create a DataFrame with a few rows and columns, execute some examples, and validate the results. Our DataFrame contains the columns Courses, Fee, Duration, and Discount.
import pandas as pd
import numpy as np
technologies = {
    'Courses':["Spark","PySpark","Hadoop","Python","Pandas"],
    'Fee' :["22000","25000","23000","24000","26000"],
    'Duration':['30days','50days','35days','40days','55days'],
    'Discount':[1000.10,2300.15,1000.5,1200.22,2500.20]
}
df = pd.DataFrame(technologies)
print(df)
print(df.dtypes)
Yields below output. Note that the Fee column is of string/object dtype holding integer values, and Discount is of float64 dtype.
   Courses    Fee Duration  Discount
0    Spark  22000   30days   1000.10
1  PySpark  25000   50days   2300.15
2   Hadoop  23000   35days   1000.50
3   Python  24000   40days   1200.22
4   Pandas  26000   55days   2500.20
Courses      object
Fee          object
Duration     object
Discount    float64
dtype: object
2. Convert Column to int (Integer)
Use the pandas DataFrame.astype() function to convert a column to int (integer); you can apply it to a specific column or to an entire DataFrame. To cast to a 64-bit signed integer, pass numpy.int64 or 'int64' as the param. Plain int and numpy.int_ select the platform's default integer, which is 64-bit on most systems but 32-bit on Windows with NumPy versions before 2.0. To cast to a 32-bit signed integer, use numpy.int32 or 'int32'.
The below example converts the Fee column from string dtype to int64. You can also use numpy.int64 as a param to this method.
# Convert "Fee" from string to int
df = df.astype({'Fee':'int'})
print(df.dtypes)
Yields below output.
Courses      object
Fee           int64
Duration     object
Discount    float64
dtype: object
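The dtype spellings mentioned at the start of this section can be compared side by side. The following is a minimal sketch, assuming a small hypothetical Series of integer strings rather than the article's full DataFrame; the width produced by plain int depends on your platform and NumPy version, so it is printed rather than assumed.

```python
import numpy as np
import pandas as pd

# Hypothetical sample column: integers stored as strings.
fee = pd.Series(["22000", "25000", "23000"], name="Fee")

# Explicit 64-bit spellings: both name the same dtype.
as_np_int64 = fee.astype(np.int64)
as_str_int64 = fee.astype("int64")

# Explicit 32-bit spellings.
as_np_int32 = fee.astype(np.int32)
as_str_int32 = fee.astype("int32")

# Plain int follows the platform default integer, so its width can vary.
as_plain_int = fee.astype(int)

print(as_np_int64.dtype, as_str_int64.dtype)
print(as_np_int32.dtype, as_str_int32.dtype)
print(as_plain_int.dtype)
```

If the result must have the same width on every machine, spelling out 'int64' or 'int32' is the safer choice.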
If all columns of a DataFrame are strings holding integer values, you can convert the whole DataFrame to int dtype as shown below. If any column holds alphanumeric values, the conversion raises an error, so running this on our DataFrame fails.
# Convert all columns to int dtype.
# This raises an error on our DataFrame because of the text columns.
df = df.astype('int')
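To make the failure concrete, here is a small sketch that catches the error and then casts only the columns that can be converted. It assumes a shortened, hypothetical version of the sample data, and the list of numeric columns is picked by hand for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark"],
    "Fee": ["22000", "25000"],
    "Discount": [1000.10, 2300.15],
})

# Casting every column fails because "Courses" holds non-numeric text.
try:
    df.astype("int64")
    cast_failed = False
except ValueError as err:
    cast_failed = True
    print("Could not cast all columns:", err)

# Casting only the numeric-looking columns succeeds.
numeric_cols = ["Fee", "Discount"]  # chosen by hand for this sketch
converted = df.astype({col: "int64" for col in numeric_cols})
print(converted.dtypes)
```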
You can also use Series.astype() to convert a specific column. Since each column of a DataFrame is a pandas Series, I will get the column from the DataFrame as a Series and use the astype() function. In the below example, df.Fee or df['Fee'] returns a Series object.
# Convert single column to int dtype.
df['Fee'] = df['Fee'].astype('int')
3. Convert Float to Int dtype
Now, using the same astype() approaches, let’s convert the float column to int (integer) type in a pandas DataFrame. Note that converting a float to int does not round or floor the value; it truncates the fractional part (anything after the .), which moves the value toward zero.
The below example converts the Discount column holding float values to int using the DataFrame.astype() function.
# Convert "Discount" from float to int
df = df.astype({'Discount':'int'})
print(df.dtypes)
Yields below output.
Courses     object
Fee          int64
Duration    object
Discount     int64
dtype: object
Similarly, you can also cast all columns or a single column. Refer to the examples in the above section for details.
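The difference between truncating, rounding, and flooring is easiest to see with a negative value. This is a small sketch using made-up numbers (not the article's Discount column); rounding or flooring first is shown only as an option if truncation is not what you want.

```python
import numpy as np
import pandas as pd

discount = pd.Series([1000.10, 2300.95, -1.7])

truncated = discount.astype("int64")          # drops the fraction (toward zero)
rounded = discount.round().astype("int64")    # nearest integer first, then cast
floored = np.floor(discount).astype("int64")  # always rounds down, then cast

print(truncated.tolist())  # [1000, 2300, -1]
print(rounded.tolist())    # [1000, 2301, -2]
print(floored.tolist())    # [1000, 2300, -2]
```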
4. Casting Multiple Columns to Integer
You can also convert multiple columns to integer by passing a dict of column name -> data type to the astype() method. The below example converts the Fee column from string to int and the Discount column from float to int.
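# Convert multiple columns to int
df = pd.DataFrame(technologies)
df = df.astype({"Fee":"int","Discount":"int"})
print(df.dtypes)
Yields below output. The int32 shown here comes from a platform where int maps to a 32-bit integer (for example, Windows with NumPy versions before 2.0); on most other systems the result is int64.
Courses     object
Fee          int32
Duration    object
Discount     int32
dtype: object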
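If you want the same integer width regardless of platform, one option is to name the width in the dict. The sketch below assumes a shortened copy of the sample data; the choice of int64 for Fee and int32 for Discount is arbitrary and only there to show that each column can get its own dtype.

```python
import pandas as pd

technologies = {
    "Courses": ["Spark", "PySpark", "Hadoop"],
    "Fee": ["22000", "25000", "23000"],
    "Discount": [1000.10, 2300.15, 1000.5],
}
df = pd.DataFrame(technologies)

# Name the width explicitly so the result does not depend on the platform default.
df = df.astype({"Fee": "int64", "Discount": "int32"})
print(df.dtypes)
```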
5. Using apply(np.int64) to Cast to Integer
You can also use the apply() method to convert the Fee column from string to integer in pandas. Because a single column is selected, this is Series.apply(), which calls the function on each value. As you see in this example, we are using numpy.int64.
import numpy as np
# Convert "Fee" from string to int using apply(np.int64)
df["Fee"] = df["Fee"].apply(np.int64)
print(df.dtypes)
Yields below output.
Courses      object
Fee           int64
Duration     object
Discount    float64
dtype: object
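apply() exists on both a Series and a DataFrame, and the two behave differently: on a Series the function receives each value, while on a DataFrame it receives each column. The following sketch illustrates both, assuming a hypothetical second string column named Seats that is not part of the article's data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Fee": ["22000", "25000", "23000"],
    "Seats": ["30", "50", "35"],  # hypothetical second string column
})

# Series.apply calls np.int64 on each element of one column.
fee_int = df["Fee"].apply(np.int64)

# DataFrame.apply passes each selected column (a Series) to the function.
both_int = df[["Fee", "Seats"]].apply(lambda col: col.astype("int64"))

print(fee_int.dtype)
print(both_int.dtypes)
```

For a plain cast, astype() is usually the more direct tool; apply() becomes useful when each value needs extra handling before conversion.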
6. Convert a Column Containing NaNs with astype(int)
To demonstrate NaN/Null values, let’s create a DataFrame that contains NaN values. An integer column cannot hold NaN, so calling astype(int) directly on such a column raises an error. To convert a column that includes a mixture of float and NaN values to int, first replace the NaN values with zero and then use astype() to convert.
import pandas as pd
import numpy as np
technologies = {
    'Fee' :[22000.30,25000.40,np.nan,24000.50,26000.10,np.nan]
}
df = pd.DataFrame(technologies)
print(df)
print(df.dtypes)
Use fillna() to replace the NaN values with the integer value zero before casting.
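# Convert "Fee" from float to int and replace NaN values
df['Fee'] = df['Fee'].fillna(0).astype(int)
print(df)
print(df.dtypes)
Yields below output. The int32 shown here comes from a platform where int maps to a 32-bit integer (for example, Windows with NumPy versions before 2.0); on most other systems the result is int64.
     Fee
0  22000
1  25000
2      0
3  24000
4  26000
5      0
Fee    int32
dtype: object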
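The sketch below shows why the fillna() step is needed: the direct cast fails, while filling first succeeds. It assumes a shortened Fee column, and the -1 sentinel is only an illustration of choosing a fill value other than zero when zero could be mistaken for a real fee.

```python
import numpy as np
import pandas as pd

fee = pd.Series([22000.30, 25000.40, np.nan, 24000.50], name="Fee")

# Casting directly fails because NaN has no integer representation.
try:
    fee.astype("int64")
    nan_cast_failed = False
except ValueError as err:
    nan_cast_failed = True
    print("Direct cast failed:", err)

# Replace NaN first, then cast. The fill value is a choice: 0 or -1 here.
filled_zero = fee.fillna(0).astype("int64")
filled_minus_one = fee.fillna(-1).astype("int64")

print(filled_zero.tolist())       # [22000, 25000, 0, 24000]
print(filled_minus_one.tolist())  # [22000, 25000, -1, 24000]
```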
Conclusion
In this article, you have learned how to convert a column from string to int and from float to int using the DataFrame.astype() and apply() methods. Also, you have learned how to convert float and string values to integers when you have NaN/null values in a column.
Happy Learning !!