I have a large number of columns in a pandas DataFrame, and I need to pass groups of them through a function. The function is large, so I've created a simplified example below. I am not sure how to pass a reference to df.varName into the function without running into the variable not being defined. When I try a function such as:
    def bianco2(df, varX, varT):
        stdX = np.std(df.varX)
        stdT = np.std(df.varT)
        newVar = stdX + stdT
        return newVar
I get the error that varX isn't defined. So I wrote the function where I would pass the whole phrase:
    def bianco3(varX, varT):
        stdX = np.std(varX)
        stdT = np.std(varT)
        newVar = stdX + stdT
        return newVar
Where "varX = df.varX".
This worked, but it isn't practical for a large number of variables because I would still have to manually update each varX and varT. So I tried creating a list of variables in the format df.varX and then using a for loop to pass the list of variables. The issue is that Python sees each entry as a string and not a reference. I looked at using functools.partial but was unsuccessful.
Any ideas on how to write this in a simple format and be able to pass pandas columns to a function?
asked Feb 22, 2017 at 21:13
You may want to try this:
    def bianco2(df, varX, varT):
        # varX and varT are column names passed as strings
        stdX = np.std(df[varX])
        stdT = np.std(df[varT])
        newVar = stdX + stdT
        return newVar

    print(bianco2(df, 'Customer', 'Policy'))
Input:
       Policy  Customer  Employee  CoveredDate   LapseDate
    0     123      1234      1234   2011-06-01  2013-01-01
    1     124      1234      1234   2016-01-01  2013-01-01
    2     124      5678      5555   2014-01-01  2013-01-01
Output:
    2095.39309492
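For a large number of column pairs, one option is to keep the column names as plain strings and loop over them, since df[name] looks a column up by its string label. This is only a minimal sketch: it assumes the same sample data as above (dates omitted), and the `pairs` list and `results` dictionary are names made up for illustration.

```python
import numpy as np
import pandas as pd

def bianco2(df, varX, varT):
    # varX and varT are column names (strings); df[...] resolves them
    stdX = np.std(df[varX])
    stdT = np.std(df[varT])
    return stdX + stdT

# Sample data from the answer above, without the date columns
df = pd.DataFrame({
    "Policy": [123, 124, 124],
    "Customer": [1234, 1234, 5678],
    "Employee": [1234, 1234, 5555],
})

# Hypothetical list of column pairs to process
pairs = [("Customer", "Policy"), ("Employee", "Policy")]

# One result per pair, keyed by the pair of column names
results = {pair: bianco2(df, *pair) for pair in pairs}
for pair, value in results.items():
    print(pair, value)
```

Adding another pair then only means adding another tuple of strings to the list; the function itself does not change.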
answered Feb 22, 2017 at 21:20
Shijo
In this article, we will learn different ways to apply a function to a single column or row, or to selected columns or rows, of a DataFrame. We will use the DataFrame.apply() / Series.apply() method to apply a function.
Syntax: Series.apply(func, convert_dtype=True, args=())
Parameters: This method takes the following parameters:
func: The function to apply to every value of the pandas Series.
convert_dtype: If True, try to find a better dtype for the function's results.
args=(): Additional positional arguments to pass to the function after the Series value.
Return Type: pandas Series after the function/operation has been applied.
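To illustrate the args parameter described above, here is a small sketch using Series.apply(). The helper `scale_and_shift` and its `factor` and `offset` parameters are hypothetical names chosen for this example only.

```python
import pandas as pd

def scale_and_shift(value, factor, offset):
    # Hypothetical helper: called once for each element of the Series
    return value * factor + offset

s = pd.Series([1, 2, 3], index=list("abc"))

# args supplies the positional arguments that follow each Series value
result = s.apply(scale_and_shift, args=(10, 1))
print(result)
```

Each element is passed as the first argument, and the tuple given in args fills the remaining positional parameters.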
Method 1: Using DataFrame.apply() and a lambda function.
Example 1: For Column
    import pandas as pd
    import numpy as np

    matrix = [[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]]

    df = pd.DataFrame(matrix, columns=list('xyz'), index=list('abc'))

    # Square column 'z' only; every other column is returned unchanged
    new_df = df.apply(lambda x: np.square(x) if x.name == 'z' else x)
    new_df
Output:
       x  y   z
    a  1  2   9
    b  4  5  36
    c  7  8  81
Example 2: For Row.
    import pandas as pd
    import numpy as np

    matrix = [[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]]

    df = pd.DataFrame(matrix, columns=list('xyz'), index=list('abc'))

    # axis=1 passes each row to the lambda; square row 'b' only
    new_df = df.apply(lambda x: np.square(x) if x.name == 'b' else x,
                      axis=1)
    new_df
Output:
        x   y   z
    a   1   2   3
    b  16  25  36
    c   7   8   9
Method 2: Using DataFrame/Series.apply() and the [ ] operator.
Example 1: For Column.
    import pandas as pd
    import numpy as np

    matrix = [[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]]

    df = pd.DataFrame(matrix, columns=list('xyz'), index=list('abc'))

    # Select column 'z', square it, and assign it back in place
    df['z'] = df['z'].apply(np.square)
    df
Output:
       x  y   z
    a  1  2   9
    b  4  5  36
    c  7  8  81
Example 2: For Row.
    import pandas as pd
    import numpy as np

    matrix = [[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]]

    df = pd.DataFrame(matrix, columns=list('xyz'), index=list('abc'))

    # Select row 'b' by label, square it, and assign it back in place
    df.loc['b'] = df.loc['b'].apply(np.square)
    df
Output:
        x   y   z
    a   1   2   3
    b  16  25  36
    c   7   8   9
Method 3: Using the numpy.square() method and the [ ] operator.
Example 1: For Column
    import pandas as pd
    import numpy as np

    matrix = [[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]]

    df = pd.DataFrame(matrix, columns=list('xyz'), index=list('abc'))

    # Call np.square directly on the selected column
    df['z'] = np.square(df['z'])
    print(df)
Output:
       x  y   z
    a  1  2   9
    b  4  5  36
    c  7  8  81
Example 2: For Row.
    import pandas as pd
    import numpy as np

    matrix = [[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]]

    df = pd.DataFrame(matrix, columns=list('xyz'), index=list('abc'))

    # Call np.square directly on the selected row
    df.loc['b'] = np.square(df.loc['b'])
    df
Output:
        x   y   z
    a   1   2   3
    b  16  25  36
    c   7   8   9
We can also apply a function to more than one column or row in the dataframe.
Example 1: For Column
    import pandas as pd
    import numpy as np

    matrix = [[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]]

    df = pd.DataFrame(matrix, columns=list('xyz'), index=list('abc'))

    # Square columns 'x' and 'y'; column 'z' is returned unchanged
    new_df = df.apply(lambda x: np.square(x) if x.name in ['x', 'y'] else x)
    new_df
Output:
        x   y  z
    a   1   4  3
    b  16  25  6
    c  49  64  9
Example 2: For Row.
    import pandas as pd
    import numpy as np

    matrix = [[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]]

    df = pd.DataFrame(matrix, columns=list('xyz'), index=list('abc'))

    # axis=1 passes each row to the lambda; square rows 'b' and 'c' only
    new_df = df.apply(lambda x: np.square(x) if x.name in ['b', 'c'] else x,
                      axis=1)
    new_df
Output:
        x   y   z
    a   1   2   3
    b  16  25  36
    c  49  64  81
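As a possible alternative to checking x.name inside a lambda, the [ ] and .loc operators also accept a list of labels, so several columns or rows can be selected first and then transformed together. The sketch below assumes the same 3x3 frame as the examples above; `cols`, `rows`, `col_df` and `row_df` are illustrative names.

```python
import numpy as np
import pandas as pd

matrix = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]

df = pd.DataFrame(matrix, columns=list("xyz"), index=list("abc"))

# Several columns: select them with a list of labels, then assign back
cols = ["x", "y"]
col_df = df.copy()  # work on a copy so df itself stays unchanged
col_df[cols] = col_df[cols].apply(np.square)
print(col_df)

# Several rows: the same idea, using .loc with a list of index labels
rows = ["b", "c"]
row_df = df.copy()
row_df.loc[rows] = np.square(row_df.loc[rows])
print(row_df)
```

Keeping the labels in a list makes it easy to change which columns or rows are affected without touching the function being applied.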