I have an array of objects of this class:
class CancerDataEntity(Model):
    age = columns.Text(primary_key=True)
    gender = columns.Text(primary_key=True)
    cancer = columns.Text(primary_key=True)
    deaths = columns.Integer()
    ...
When printed, the array looks like this:
[CancerDataEntity(age=u'80-85+', gender=u'Female', cancer=u'All cancers [C00-97,B21]', deaths=15306), CancerDataEntity(...
I want to convert this to a data frame so I can work with it more conveniently: aggregate, count, sum, and so on. I would like the data frame to look something like this:
      age  gender cancer  deaths
0  80-85+  Female    ...   15306
1     ...
Is there a way to achieve this using numpy/pandas easily, without manually processing the input array?
asked Jan 25, 2016 at 16:15
A much cleaner way to do this is to define a to_dict
method on your class and then use pandas.DataFrame.from_records:
class Signal(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def to_dict(self):
        return {
            'x': self.x,
            'y': self.y,
        }
e.g.
In [87]: signals = [Signal(3, 9), Signal(4, 16)]
In [88]: pandas.DataFrame.from_records([s.to_dict() for s in signals])
Out[88]:
   x   y
0  3   9
1  4  16
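As a rough sketch of how this could carry over to the data in the question: the `CancerRow` class below is a hypothetical plain-Python stand-in for `CancerDataEntity` (it is not the real model class), and the rows are made-up values used only for illustration. Once the records are in a data frame, the usual `groupby` aggregation is available.

```python
import pandas as pd


class CancerRow:
    """Hypothetical stand-in for CancerDataEntity; not the real model class."""

    def __init__(self, age, gender, cancer, deaths):
        self.age = age
        self.gender = gender
        self.cancer = cancer
        self.deaths = deaths

    def to_dict(self):
        return {
            'age': self.age,
            'gender': self.gender,
            'cancer': self.cancer,
            'deaths': self.deaths,
        }


# Made-up sample rows, for illustration only
rows = [
    CancerRow('80-85+', 'Female', 'A', 10),
    CancerRow('80-85+', 'Male', 'A', 20),
    CancerRow('75-79', 'Female', 'B', 5),
]

# One row per object, one column per dictionary key
df = pd.DataFrame.from_records([r.to_dict() for r in rows])

# Sum the deaths column per gender
deaths_by_gender = df.groupby('gender')['deaths'].sum()
print(deaths_by_gender)
```

The same pattern should work for counting or other aggregations by swapping `sum()` for `count()`, `mean()`, and so on.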
answered Jan 20, 2017 at 11:10
OregonTrail
Just use:
DataFrame([o.__dict__ for o in my_objs])
Full example:
import pandas as pd

# define some class
class SomeThing:
    def __init__(self, x, y):
        self.x, self.y = x, y

# make an array of the class objects
things = [SomeThing(1, 2), SomeThing(3, 4), SomeThing(4, 5)]

# fill dataframe with one row per object, one attribute per column
df = pd.DataFrame([t.__dict__ for t in things])

print(df)
This prints:
   x  y
0  1  2
1  3  4
2  4  5
answered Mar 4, 2019 at 2:11
Shital Shah
I would like to emphasize Jim Hunziker's comment.
pandas.DataFrame([vars(s) for s in signals])
It is far easier to write, less error-prone, and you don't have to change the to_dict()
function every time you add a new attribute.
If you want the freedom to choose which attributes to keep, the columns parameter could be used.
pandas.DataFrame([vars(s) for s in signals], columns=['x', 'y'])
The downside is that it won't work for complex attributes, though that should rarely be the case.
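A small sketch of both points, under one possible reading of "complex attributes": an attribute that is itself an object simply ends up in the column as that object rather than being expanded. The `Point` class and the `origin` attribute are hypothetical additions for illustration and are not part of the original `Signal` example.

```python
import pandas as pd


class Point:
    """Hypothetical helper class used as a nested attribute."""

    def __init__(self, a, b):
        self.a = a
        self.b = b


class Signal:
    def __init__(self, x, y, origin):
        self.x = x
        self.y = y
        self.origin = origin  # a nested object, not a scalar


signals = [Signal(3, 9, Point(0, 0)), Signal(4, 16, Point(1, 1))]

# All attributes: the 'origin' column holds the Point objects as-is
df_all = pd.DataFrame([vars(s) for s in signals])

# Keep only the scalar attributes via the columns parameter
df_xy = pd.DataFrame([vars(s) for s in signals], columns=['x', 'y'])
print(df_xy)
```

Note also that `vars()` only sees instance attributes, so it would not pick up properties or class-level attributes.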
answered Aug 6, 2019 at 21:19
typhon04
Code that leads to the desired result:
variables = arr[0].keys()
df = pd.DataFrame([[getattr(i, j) for j in variables] for i in arr], columns=variables)
Thanks to @Serbitar for pointing me in the right direction.
answered Jan 25, 2016 at 20:57
ezamur
Try:
variables = list(array[0].keys())
dataframe = pandas.DataFrame([[getattr(i, j) for j in variables] for i in array], columns=variables)
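This relies on each object exposing a keys() method that lists its column names, which plain Python objects generally do not have. A minimal runnable sketch, assuming a hypothetical `Row` class that provides keys() in place of the model from the question (the sample values are made up):

```python
import pandas as pd


class Row:
    """Hypothetical stand-in that exposes keys(), as the objects above are assumed to."""

    def __init__(self, age, gender, deaths):
        self.age = age
        self.gender = gender
        self.deaths = deaths

    def keys(self):
        return ['age', 'gender', 'deaths']


# Made-up sample objects, for illustration only
array = [Row('80-85+', 'Female', 10), Row('80-85+', 'Male', 20)]

# Column names come from the first object; all objects are assumed to share them
variables = list(array[0].keys())
dataframe = pd.DataFrame(
    [[getattr(obj, name) for name in variables] for obj in array],
    columns=variables,
)
print(dataframe)
```

Wrapping keys() in list() keeps the column names as a concrete list even if keys() returns a view or an iterator.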
answered Jan 25, 2016 at 16:26
Serbitar