54_Pandas converts DataFrame, Series to dictionary (to_dict)

pandas.DataFrame and pandas.Series can be converted to a dictionary (dict object) with the to_dict() method.

For pandas.DataFrame, the orient parameter specifies how the row labels (index), column labels (columns), and values are assigned to the keys and values of the dictionary.

A pandas.Series is converted to a dictionary with its labels as keys.

This article covers the following topics.

  • pandas.DataFrame to_dict() method
    • Specify the format of the dictionary: Argument orient
    • Conversion to a type other than dict: Argument into
  • Generate a dictionary from any two columns of a pandas.DataFrame
  • pandas.Series to_dict method converts to dict
    • Conversion to a type other than dict: Argument into

Create the following pandas.DataFrame as an example.

import pandas as pd
import pprint
from collections import OrderedDict

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': ['a', 'x', '啊']},
                  index=['row1', 'row2', 'row3'])

print(df)
#       col1 col2
# row1     1    a
# row2     2    x
# row3     3    啊

pprint is imported to make the output easier to read, and OrderedDict is imported to demonstrate specifying the return type with the into parameter.

pandas.DataFrame to_dict() method

When to_dict() is called on a pandas.DataFrame, by default it is converted to a dictionary (dict object) as shown below.

d = df.to_dict()

pprint.pprint(d)
# {'col1': {'row1': 1, 'row2': 2, 'row3': 3},
#  'col2': {'row1': 'a', 'row2': 'x', 'row3': '啊'}}

print(type(d))
# <class 'dict'>

Specify the format of the dictionary: Argument orient

The orient parameter specifies how the pandas.DataFrame's row labels (index), column labels (columns), and values are assigned to the keys and values of the dictionary.

dict

If orient='dict', each key is a column label and each value is a dictionary mapping row labels to values. This is the default format when orient is omitted.

{column -> {index -> value}}

d_dict = df.to_dict(orient='dict')

pprint.pprint(d_dict)
# {'col1': {'row1': 1, 'row2': 2, 'row3': 3},
#  'col2': {'row1': 'a', 'row2': 'x', 'row3': '啊'}}

print(d_dict['col1'])
# {'row1': 1, 'row2': 2, 'row3': 3}

print(type(d_dict['col1']))
# <class 'dict'>

list

If orient='list', each key is a column label and each value is a list of that column's values. Row labels are lost.

{column -> [values]}

d_list = df.to_dict(orient='list')

pprint.pprint(d_list)
# {'col1': [1, 2, 3], 'col2': ['a', 'x', '啊']}

print(d_list['col1'])
# [1, 2, 3]

print(type(d_list['col1']))
# <class 'list'>
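
Because the row labels are dropped, a DataFrame rebuilt from this dictionary alone gets a default integer index. The following is a minimal sketch of that behavior, assuming the original labels are still available to pass back in; the names df_no_labels and df_restored are introduced here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': ['a', 'x', '啊']},
                  index=['row1', 'row2', 'row3'])

d_list = df.to_dict(orient='list')

# Rebuilding from the dictionary alone yields a default integer index
df_no_labels = pd.DataFrame(d_list)
print(df_no_labels.index.tolist())
# [0, 1, 2]

# Supplying the original labels separately restores them
df_restored = pd.DataFrame(d_list, index=df.index)
print(df_restored.index.tolist())
# ['row1', 'row2', 'row3']
```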

series

If orient='series', each key is a column label and each value is a pandas.Series indexed by the row labels.

{column -> Series(values)}

d_series = df.to_dict(orient='series')

pprint.pprint(d_series)
# {'col1': row1    1
# row2    2
# row3    3
# Name: col1, dtype: int64,
#  'col2': row1    a
# row2    x
# row3    啊
# Name: col2, dtype: object}

print(d_series['col1'])
# row1    1
# row2    2
# row3    3
# Name: col1, dtype: int64

print(type(d_series['col1']))
# <class 'pandas.core.series.Series'>
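
Since each value is still a pandas.Series, the usual Series methods and label-based access remain available on it. A short sketch of what that allows, with the variable name total introduced only for illustration:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': ['a', 'x', '啊']},
                  index=['row1', 'row2', 'row3'])

d_series = df.to_dict(orient='series')

# Series methods work directly on the dictionary values
total = d_series['col1'].sum()
print(total)
# 6

# Row labels are preserved, so label-based access works too
print(d_series['col2'].loc['row3'])
# 啊
```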

split

If orient='split', the keys are 'index', 'columns', and 'data', and the values are the list of row labels, the list of column labels, and the values as a list of rows (one inner list per row), respectively.

{index -> [index], columns -> [columns], data -> [values]}

d_split = df.to_dict(orient='split')

pprint.pprint(d_split)
# {'columns': ['col1', 'col2'],
#  'data': [[1, 'a'], [2, 'x'], [3, '啊']],
#  'index': ['row1', 'row2', 'row3']}

print(d_split['columns'])
# ['col1', 'col2']

print(type(d_split['columns']))
# <class 'list'>
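
The three keys happen to match the data, index, and columns parameters of the pandas.DataFrame constructor, so the dictionary can be unpacked straight back into a DataFrame. This is a sketch of one possible round trip rather than the only way to do it; df_restored is a name chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': ['a', 'x', '啊']},
                  index=['row1', 'row2', 'row3'])

d_split = df.to_dict(orient='split')

# Unpack 'index', 'columns' and 'data' as keyword arguments
df_restored = pd.DataFrame(**d_split)

print(df_restored.index.tolist())
# ['row1', 'row2', 'row3']

print(df_restored.columns.tolist())
# ['col1', 'col2']

print(df_restored['col1'].tolist())
# [1, 2, 3]
```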

records

If orient='records', the result is a list of dictionaries, one per row, in which each key is a column label and each value is the corresponding value. Row labels are lost.

[{column -> value}, ... , {column -> value}]

l_records = df.to_dict(orient='records')

pprint.pprint(l_records)
# [{'col1': 1, 'col2': 'a'}, {'col1': 2, 'col2': 'x'}, {'col1': 3, 'col2': '啊'}]

print(type(l_records))
# <class 'list'>

print(l_records[0])
# {'col1': 1, 'col2': 'a'}

print(type(l_records[0]))
# <class 'dict'>
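
If the row labels need to survive in this format, one option is to move the index into an ordinary column with reset_index() before converting. This is a sketch under the assumption that the index has no name, in which case pandas names the new column 'index'; l_with_labels is a name chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': ['a', 'x', '啊']},
                  index=['row1', 'row2', 'row3'])

# reset_index() turns the row labels into a regular column named 'index'
l_with_labels = df.reset_index().to_dict(orient='records')

print(l_with_labels[0])
# {'index': 'row1', 'col1': 1, 'col2': 'a'}

print(len(l_with_labels))
# 3
```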

index

If orient='index', each key is a row label and each value is a dictionary mapping column labels to values.

{index -> {column -> value}}

d_index = df.to_dict(orient='index')

pprint.pprint(d_index)
# {'row1': {'col1': 1, 'col2': 'a'},
#  'row2': {'col1': 2, 'col2': 'x'},
#  'row3': {'col1': 3, 'col2': '啊'}}

print(d_index['row1'])
# {'col1': 1, 'col2': 'a'}

print(type(d_index['row1']))
# <class 'dict'>
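
A dictionary in this format can be turned back into a DataFrame with pandas.DataFrame.from_dict() and orient='index'. The sketch below only checks labels and individual values; df_restored is a name chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': ['a', 'x', '啊']},
                  index=['row1', 'row2', 'row3'])

d_index = df.to_dict(orient='index')

# Outer keys become row labels, inner keys become column labels
df_restored = pd.DataFrame.from_dict(d_index, orient='index')

print(df_restored.loc['row2', 'col2'])
# x

print(sorted(df_restored.columns))
# ['col1', 'col2']
```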

Conversion to a type other than dict: Argument into

By passing a type to the into parameter, the result can be a dict subclass such as OrderedDict instead of a plain dictionary (dict).

Dictionaries stored as values inside the result are also created with the specified type.

od = df.to_dict(into=OrderedDict)

pprint.pprint(od)
# OrderedDict([('col1', OrderedDict([('row1', 1), ('row2', 2), ('row3', 3)])),
#              ('col2',
#               OrderedDict([('row1', 'a'), ('row2', 'x'), ('row3', '啊')]))])

print(type(od))
# <class 'collections.OrderedDict'>

print(od['col1'])
# OrderedDict([('row1', 1), ('row2', 2), ('row3', 3)])

print(type(od['col1']))
# <class 'collections.OrderedDict'>
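
The into parameter also accepts collections.defaultdict, but the pandas documentation notes that a defaultdict must be passed as an initialized instance rather than as the class itself. A minimal sketch, using orient='records' and list as the default factory purely for illustration:

```python
import pandas as pd
from collections import defaultdict

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': ['a', 'x', '啊']},
                  index=['row1', 'row2', 'row3'])

# Pass an initialized defaultdict, not the defaultdict class
dd = df.to_dict(orient='records', into=defaultdict(list))

print(type(dd[0]))
# <class 'collections.defaultdict'>

print(dd[0]['col1'])
# 1

# A missing key falls back to the default factory (here: an empty list)
print(dd[0]['missing'])
# []
```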

Generate a dictionary from any two columns of a pandas.DataFrame

A dictionary can also be built from any two columns, where the index counts as a column, by combining dict() and zip(). The first becomes the keys and the second the values.

print(df.index)
# Index(['row1', 'row2', 'row3'], dtype='object')

print(df['col1'])
# row1    1
# row2    2
# row3    3
# Name: col1, dtype: int64

d_col = dict(zip(df.index, df['col1']))

print(d_col)
# {'row1': 1, 'row2': 2, 'row3': 3}
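
The same idea works with two data columns. As an alternative sketch, the key column can be made the index with set_index() and the value column converted with to_dict(). Either way, if the key column contains duplicates, later rows overwrite earlier ones, since dictionary keys are unique.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': ['a', 'x', '啊']},
                  index=['row1', 'row2', 'row3'])

# Keys from col2, values from col1, using dict() and zip()
d_zip = dict(zip(df['col2'], df['col1']))

# Equivalent result via set_index() and Series.to_dict()
d_col = df.set_index('col2')['col1'].to_dict()

print(d_col)
# {'a': 1, 'x': 2, '啊': 3}

print(d_zip == d_col)
# True
```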

pandas.Series to_dict method converts to dict

Take the pandas.Series below as an example.

print(df)
#       col1 col2
# row1     1    a
# row2     2    x
# row3     3    啊

s = df['col1']
print(s)
# row1    1
# row2    2
# row3    3
# Name: col1, dtype: int64

print(type(s))
# <class 'pandas.core.series.Series'>

When to_dict() is called on a pandas.Series, it returns a dictionary whose keys are the labels and whose values are the Series values.

d = s.to_dict()
print(d)
# {'row1': 1, 'row2': 2, 'row3': 3}

print(type(d))
# <class 'dict'>

Conversion to a type other than dict: Argument into

The to_dict() method of pandas.Series also accepts the into parameter, so the result can be a dict subclass such as OrderedDict instead of a plain dictionary (dict).

od = df['col1'].to_dict(into=OrderedDict)
print(od)
# OrderedDict([('row1', 1), ('row2', 2), ('row3', 3)])

print(type(od))
# <class 'collections.OrderedDict'>

Origin blog.csdn.net/qq_18351157/article/details/128018119