from itertools import dropwhile, takewhile

with open("test.txt") as f:
    # Advance to the first dashed separator line
    dp = dropwhile(lambda x: not x.startswith("-"), f)
    next(dp)  # skip ----
    names = next(dp).split()  # get header names
    next(f)  # skip -----
    out = []
    # Read data rows until the closing dashed line
    for line in takewhile(lambda x: not x.startswith("-"), f):
        # Split off the trailing Name value; "1.818 TB" stays together as Size
        a, b = line.rsplit(None, 1)
        out.append(dict(zip(names, a.split(None, 7) + [b])))
Output:
from pprint import pprint as pp
pp(out)
[{'Access': 'RW',
'Cache': 'RWTD',
'Consist': 'No',
'DG/VD': '0/0',
'Name': 'one',
'Size': '1.818 TB',
'State': 'Optl',
'TYPE': 'RAID1',
'sCC': '-'},
{'Access': 'RW',
'Cache': 'RWTD',
'Consist': 'No',
'DG/VD': '1/1',
'Name': 'two',
'Size': '1.818 TB',
'State': 'Optl',
'TYPE': 'RAID1',
'sCC': '-'},
{'Access': 'RW',
'Cache': 'RWTD',
'Consist': 'No',
'DG/VD': '2/2',
'Name': 'three',
'Size': '1.818 TB',
'State': 'Optl',
'TYPE': 'RAID1',
'sCC': '-'},
{'Access': 'RW',
'Cache': 'RWTD',
'Consist': 'No',
'DG/VD': '3/3',
'Name': 'four',
'Size': '1.818 TB',
'State': 'Optl',
'TYPE': 'RAID1',
'sCC': '-'}]
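If you want to maintain order, use an OrderedDict:
from collections import OrderedDict
# caveat: split() breaks a value such as "1.818 TB" into two fields
out = [OrderedDict(zip(names, line.split()))
       for line in takewhile(lambda x: not x.startswith("-"), f)]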
For missing Name values as per your edit:
from itertools import dropwhile, takewhile

with open("test.txt") as f:
    dp = dropwhile(lambda x: not x.startswith("-"), f)
    next(dp)  # skip ----
    names = next(dp).split()  # get header names
    next(f)  # skip -----
    out = []
    for line in takewhile(lambda x: not x.startswith("-"), f):
        # Split on the last single space so an empty Name yields ""
        a, b = line.rsplit(" ", 1)
        out.append(dict(zip(names, a.rstrip().split(None, 7) + [b.rstrip()])))
Output:
[{'Access': 'RW',
'Cache': 'RWTD',
'Consist': 'No',
'DG/VD': '0/0',
'Name': 'one',
'Size': '1.818 TB',
'State': 'Optl',
'TYPE': 'RAID1',
'sCC': '-'},
{'Access': 'RW',
'Cache': 'RWTD',
'Consist': 'No',
'DG/VD': '1/1',
'Name': 'two',
'Size': '1.818 TB',
'State': 'Optl',
'TYPE': 'RAID1',
'sCC': '-'},
{'Access': 'RW',
'Cache': 'RWTD',
'Consist': 'No',
'DG/VD': '2/2',
'Name': 'three',
'Size': '1.818 TB',
'State': 'Optl',
'TYPE': 'RAID1',
'sCC': '-'},
{'Access': 'RW',
'Cache': 'RWTD',
'Consist': 'No',
'DG/VD': '3/3',
'Name': 'four',
'Size': '1.818 TB',
'State': 'Optl',
'TYPE': 'RAID1',
'sCC': '-'},
{'Access': 'RW',
'Cache': 'RWTD',
'Consist': 'No',
'DG/VD': '4/4',
'Name': '',
'Size': '4.681 TB',
'State': 'Reblg',
'TYPE': 'RAID10',
'sCC': '-'}]
This also handles lines with multiple spaces between TB and the Name column value, such as "1.818 TB     one".
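To try the approach above without a file on disk, here is a minimal sketch that runs the same dropwhile/takewhile logic over an in-memory list of lines. The sample lines, the column order, and the helper name parse_rows are assumptions made for illustration; they are not taken from the original test.txt.

```python
from itertools import dropwhile, takewhile

# Hypothetical sample lines; the real test.txt layout is assumed, not quoted.
SAMPLE_LINES = [
    "Virtual Drives :\n",
    "-----\n",
    "DG/VD TYPE State Access Consist Cache sCC Size Name\n",
    "-----\n",
    "0/0 RAID1 Optl RW No RWTD - 1.818 TB     one\n",
    "4/4 RAID10 Reblg RW No RWTD - 4.681 TB \n",
    "-----\n",
]


def parse_rows(lines):
    """Parse the dashed table into a list of dicts (hypothetical helper)."""
    it = iter(lines)
    dp = dropwhile(lambda x: not x.startswith("-"), it)
    next(dp)                    # skip the first dashed line
    names = next(dp).split()    # header names
    next(it)                    # skip the dashed line under the header
    rows = []
    for line in takewhile(lambda x: not x.startswith("-"), it):
        # Split on the last single space so a missing Name becomes ""
        a, b = line.rsplit(" ", 1)
        rows.append(dict(zip(names, a.rstrip().split(None, 7) + [b.rstrip()])))
    return rows


rows = parse_rows(SAMPLE_LINES)
for row in rows:
    print(row["DG/VD"], repr(row["Size"]), repr(row["Name"]))
```

Note that the empty-Name case relies on the line keeping a trailing space after TB, as in the second sample row.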
Convert the frame into a dictionary of lists, by columns.
In Python 3.6+ the order of records in the dictionary will be the same as the order of columns in the frame.
Parameters
return: Dictionary with .ncols records. Each record represents a single column: the key is the column's name, and the value is the list with the column's data.
Examples
import datatable as dt
DT = dt.Frame(A=[1, 2, 3], B=["aye", "nay", "tain"])
DT.to_dict()
{"A": [1, 2, 3], "B": ["aye", "nay", "tain"]}
See also
.to_list(): convert the frame into a list of lists
.to_tuples(): convert the frame into a list of tuples by rows
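The three shapes mentioned above can be illustrated with plain Python, without datatable installed. This is only a sketch of the resulting structures, not the library's implementation; it assumes that the list-of-lists form groups values by column, and the variable names are made up for the example.

```python
# Hypothetical column data mirroring the example frame above
columns = {"A": [1, 2, 3], "B": ["aye", "nay", "tain"]}

# Dictionary of lists, one record per column (the shape to_dict() describes)
as_dict = {name: list(values) for name, values in columns.items()}

# List of lists, assumed here to be arranged by column
as_lists = [list(values) for values in columns.values()]

# List of tuples, one tuple per row
as_tuples = list(zip(*columns.values()))

print(as_dict)
print(as_lists)
print(as_tuples)
```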
Overview:
- A pandas DataFrame can be converted into a Python dictionary using the DataFrame instance method to_dict(). The layout of the output is selected with the parameter orient.
- In dictionary orientation, each column becomes a dictionary that maps row labels to the column's values. All these dictionaries are wrapped in another dictionary, which is keyed by the column labels. Dictionary orientation is specified with the string literal "dict" for the parameter orient. It is the default orientation for the conversion output.
- In list orientation, each column is made a list and the lists are added to a dictionary against the column labels. List orientation is specified with the string literal "list" for the parameter orient.
- In series orientation, each column is made a pandas Series, and the Series instances are stored against the column labels in the returned dictionary object. Series orientation is specified with the string literal "series" for the parameter orient.
- In split orientation, each row is made a list and the rows are wrapped in another list stored under the key "data" in the returned dictionary object. The row labels are stored in a list against the key "index". The column labels are stored in a list against the key "columns". Split orientation is specified with the string literal "split" for the parameter orient.
- In records orientation, each row is made a dictionary where the row's values are stored against the column names. All the dictionaries are returned as a list. Records orientation is specified with the string literal "records" for the parameter orient.
- In index orientation, each row is made a dictionary where the row's values are stored against the column names. All the dictionaries are returned in a dictionary, which is keyed by the row labels. Index orientation is specified with the string literal "index" for the parameter orient.
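The orientations listed above can be compared side by side on one small frame. The following is a minimal sketch; the two-by-two frame and the names by_orient and series_map are chosen only for illustration.

```python
import pandas as pds

# Small hypothetical frame with row labels R1, R2 and column labels C1, C2
df = pds.DataFrame({"C1": [1, 2], "C2": [3, 4]}, index=["R1", "R2"])

# Collect the result of each plain-Python orientation
by_orient = {orient: df.to_dict(orient=orient)
             for orient in ("dict", "list", "split", "records", "index")}

# "series" returns pandas Series objects keyed by column label
series_map = df.to_dict(orient="series")

for orient, result in by_orient.items():
    print(orient, "->", result)
print("series ->", {label: type(s).__name__ for label, s in series_map.items()})
```

Printing the results next to each other makes it easier to see which orientations are keyed by column labels ("dict", "list", "series") and which are organised by row ("records", "index").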
Example – DataFrame to dictionary conversion in dict mode:
# Example Python program that converts a pandas DataFrame into a Python dictionary
import pandas as pds

# Data
data = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]

# Create a DataFrame
dataFrame = pds.DataFrame(data, index=["R1", "R2", "R3"], columns=["C1", "C2", "C3"])
print("Contents of the DataFrame:")
print(dataFrame)

# Convert the DataFrame to a dictionary (default "dict" orientation)
dictionaryObject = dataFrame.to_dict()
print("DataFrame as a dictionary:")
print(dictionaryObject)
Output:
Contents of the DataFrame:
    C1  C2  C3
R1   1   2   3
R2   4   5   6
R3   7   8   9
DataFrame as a dictionary:
{'C1': {'R1': 1, 'R2': 4, 'R3': 7}, 'C2': {'R1': 2, 'R2': 5, 'R3': 8}, 'C3': {'R1': 3, 'R2': 6, 'R3': 9}}
Example – DataFrame to dictionary conversion in list mode:
# Example Python program that converts a pandas DataFrame into a
# Python dictionary in list mode
import pandas as pds

# Data
dailyTemperature = {"01/Nov/2019": [65, 62],
                    "02/Nov/2019": [62, 60],
                    "03/Nov/2019": [61, 60],
                    "04/Nov/2019": [62, 60],
                    "05/Nov/2019": [64, 62]}

# Create DataFrame
dataFrame = pds.DataFrame(dailyTemperature, index=["max", "min"])
print("Daily temperature from DataFrame:")
print(dataFrame)

# Convert the DataFrame to a dictionary of lists
dictionaryInstance = dataFrame.to_dict(orient="list")
print("DataFrame as a dictionary (List orientation):")
print(dictionaryInstance)
Output:
Daily temperature from DataFrame:
     01/Nov/2019  02/Nov/2019  03/Nov/2019  04/Nov/2019  05/Nov/2019
max           65           62           61           62           64
min           62           60           60           60           62
DataFrame as a dictionary (List orientation):
{'01/Nov/2019': [65, 62], '02/Nov/2019': [62, 60], '03/Nov/2019': [61, 60], '04/Nov/2019': [62, 60], '05/Nov/2019': [64, 62]}
Example - Making a dictionary of column label and Series entries from a pandas DataFrame:
# Example Python program that makes a Python dictionary
# containing key-value pairs of column label and pandas Series
import pandas as pds

# Example data
fruitCalories = [["Apple", 52, 0.2],
                 ["Orange", 47, 0.1],
                 ["Pineapple", 50, 0.1],
                 ["Avocado", 160, 15.0],
                 ["Kiwi", 61, 0.5]]
columnHeaders = ["Fruit", "Calories", "Fat content"]

# Create a DataFrame
fruitData = pds.DataFrame(data=fruitCalories, columns=columnHeaders)

# Obtain a dictionary with column label to Series entries
nutriValsAsDict = fruitData.to_dict(orient='series')
print("Retrieving individual series from the dictionary:")
for keys in nutriValsAsDict:
    print(nutriValsAsDict[keys])
    print(type(nutriValsAsDict[keys]))
Output:
Retrieving individual series from the dictionary:
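0        Apple
1       Orange
2    Pineapple
3      Avocado
4         Kiwi
Name: Fruit, dtype: object
<class 'pandas.core.series.Series'>
0     52
1     47
2     50
3    160
4     61
Name: Calories, dtype: int64
<class 'pandas.core.series.Series'>
0     0.2
1     0.1
2     0.1
3    15.0
4     0.5
Name: Fat content, dtype: float64
<class 'pandas.core.series.Series'>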
Example - Create a dictionary from a DataFrame that stores index, columns, data as separate entries:
# Example Python program that creates a dictionary
# from a DataFrame which will have index, column labels
# and data as separate entries
import pandas as pds

riverLengths = [["Nile", 6650, 4130],
                ["Amazon", 6400, 3976],
                ["Yangtze", 6300, 3917],
                ["Mississippi", 6275, 3902],
                ["Yenisei", 5539, 3445]]
columns = ["Name of the River", "Length(KMs)", "Length(Miles)"]

print("DataFrame:")
riverData = pds.DataFrame(data=riverLengths, columns=columns)
print(riverData)

print("DataFrame as a dictionary with separate entries for index, column labels and data:")
riverDataDict = riverData.to_dict(orient="split")
print(riverDataDict)
Output:
DataFrame:
   Name of the River  Length(KMs)  Length(Miles)
0               Nile         6650           4130
1             Amazon         6400           3976
2            Yangtze         6300           3917
3        Mississippi         6275           3902
4            Yenisei         5539           3445
DataFrame as a dictionary with separate entries for index, column labels and data:
{'index': [0, 1, 2, 3, 4], 'columns': ['Name of the River', 'Length(KMs)', 'Length(Miles)'], 'data': [['Nile', 6650, 4130], ['Amazon', 6400, 3976], ['Yangtze', 6300, 3917], ['Mississippi', 6275, 3902], ['Yenisei', 5539, 3445]]}
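Because the split orientation keeps index, column labels and data as separate entries, the dictionary can be passed back to the DataFrame constructor to rebuild the frame. The sketch below assumes a shortened, hypothetical version of the river data with made-up column labels; it is an illustration rather than part of the original example.

```python
import pandas as pds

# Shortened hypothetical data; column labels are made up for this sketch
riverData = pds.DataFrame(
    data=[["Nile", 6650, 4130], ["Amazon", 6400, 3976]],
    columns=["River", "Length_km", "Length_miles"],
)

# Split the frame into its three parts
parts = riverData.to_dict(orient="split")

# Rebuild a DataFrame from the "data", "index" and "columns" entries
rebuilt = pds.DataFrame(data=parts["data"],
                        index=parts["index"],
                        columns=parts["columns"])
print(rebuilt)
```

This round trip is one reason to prefer "split" when the row labels and column order both need to survive the conversion.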
Example - DataFrame rows stored as a dictionary of dictionaries (index orientation):
# Example Python program that creates a dictionary of
# dictionaries from a pandas DataFrame.
# The returned dictionary stores key-value pairs in the
# form of row label: {column label: value}
import pandas as pds

# Data
countryData = [["Russia", "Moscow", 6601670, 146171015],
               ["Canada", "Ottawa", 3855100, 38048738],
               ["China", "Beijing", 3705407, 1400050000],
               ["United States of America", "Washington, D.C.", 3796742, 331449281],
               ["Brazil", "Brasília", 3287956, 210147125]]
columnHeaders = ["Country", "Capital", "Area(Sq.Miles)", "Population"]

# Create a pandas DataFrame
df = pds.DataFrame(data=countryData, columns=columnHeaders)
print("DataFrame:")
print(df)

# Obtain the data in the form of a dictionary of dictionaries
print("DataFrame in index form:")
recs = df.to_dict(orient="index")
print(recs)
Output:
DataFrame:
                    Country           Capital  Area(Sq.Miles)  Population
0                    Russia            Moscow         6601670   146171015
1                    Canada            Ottawa         3855100    38048738
2                     China           Beijing         3705407  1400050000
3  United States of America  Washington, D.C.         3796742   331449281
4                    Brazil          Brasília         3287956   210147125
DataFrame in index form:
{0: {'Country': 'Russia', 'Capital': 'Moscow', 'Area(Sq.Miles)': 6601670, 'Population': 146171015}, 1: {'Country': 'Canada', 'Capital': 'Ottawa', 'Area(Sq.Miles)': 3855100, 'Population': 38048738}, 2: {'Country': 'China', 'Capital': 'Beijing', 'Area(Sq.Miles)': 3705407, 'Population': 1400050000}, 3: {'Country': 'United States of America', 'Capital': 'Washington, D.C.', 'Area(Sq.Miles)': 3796742, 'Population': 331449281}, 4: {'Country': 'Brazil', 'Capital': 'Brasília', 'Area(Sq.Miles)': 3287956, 'Population': 210147125}}
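The example above uses orient="index", which keys each row dictionary by its row label. For comparison, here is a minimal sketch of orient="records", which returns the same row dictionaries as a plain list. The two-row frame is a hypothetical subset chosen to keep the output short.

```python
import pandas as pds

# Hypothetical two-row subset of the country data
df = pds.DataFrame(
    data=[["Canada", "Ottawa"], ["Brazil", "Brasília"]],
    columns=["Country", "Capital"],
)

# One dictionary per row, returned as a list (row labels are dropped)
as_records = df.to_dict(orient="records")

# One dictionary per row, keyed by the row label
as_index = df.to_dict(orient="index")

print(as_records)
print(as_index)

# Discarding the row-label keys of the index form gives the records form
same_rows = list(as_index.values()) == as_records
print(same_rows)
```

The records form is convenient when the row labels carry no meaning, for instance when the list is going to be serialised as JSON.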