Foreword
This post continues from the previous section. The content covered so far is as follows:
1. DataFrame
Rows can be selected by label with the special attribute loc, and columns can be referenced directly by their names.
1. Column selection
A column in a DataFrame can be retrieved as a Series, either with dictionary-like notation or as an attribute:
Example:
import pandas as pd
import numpy as np
data = {'state': ['Astrilia', 'Mexico', 'China', 'Japan'],
'years': [2000, 2001, 2002, 2003],
'pop': [1.5, 3.6, 2.4, 5.1]}
frame = pd.DataFrame(data, columns = ['years', 'state', 'pop'])
val = pd.Series([-1.2, -1.5, -1.7])
frame['debt'] = val
val_1 = pd.Series([100, 200, 300], index = [0, 1, 3])
frame['pofit'] = val_1
print(frame)
frame_1 = frame['state']
print(frame_1)
frame_2 = frame.state
print(frame_2)
# frame['state'] and frame.state are equivalent here
result:
years state pop debt pofit
0 2000 Astrilia 1.5 -1.2 100.0
1 2001 Mexico 3.6 -1.5 200.0
2 2002 China 2.4 -1.7 NaN
3 2003 Japan 5.1 NaN 300.0
0 Astrilia
1 Mexico
2 China
3 Japan
Name: state, dtype: object
0 Astrilia
1 Mexico
2 China
3 Japan
Name: state, dtype: object
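One caveat with attribute access, sketched below with the same sample data: it only works when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute. In this frame, 'pop' happens to also be the name of a DataFrame method, so frame.pop returns that method rather than the column. The variable names by_key, by_attr and pop_column are illustrative and not part of the original example.

```python
import pandas as pd

data = {'state': ['Astrilia', 'Mexico', 'China', 'Japan'],
        'years': [2000, 2001, 2002, 2003],
        'pop': [1.5, 3.6, 2.4, 5.1]}
frame = pd.DataFrame(data, columns=['years', 'state', 'pop'])

# Both notations return the same Series for an ordinary column name.
by_key = frame['state']
by_attr = frame.state
print(by_key.equals(by_attr))

# 'pop' clashes with the DataFrame.pop method, so attribute access
# returns the bound method, not the column.
print(callable(frame.pop))

# Dictionary-like notation always returns the column itself.
pop_column = frame['pop']
print(pop_column.tolist())
```

For this reason, dictionary-like notation is usually the safer choice in scripts, while attribute access is mostly a convenience for interactive work.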
2. Selection of rows
Rows are selected with the special attribute loc:
Example:
import pandas as pd
import numpy as np
data = {'state': ['Astrilia', 'Mexico', 'China', 'Japan'],
'years': [2000, 2001, 2002, 2003],
'pop': [1.5, 3.6, 2.4, 5.1]}
frame = pd.DataFrame(data, columns = ['years', 'state', 'pop'])
val = pd.Series([-1.2, -1.5, -1.7])
frame['debt'] = val
val_1 = pd.Series([100, 200, 300], index = [0, 1, 3])
frame['pofit'] = val_1
print(frame)
# When the rows use the default integer index labels
frame_1row = frame.loc[1]
print(frame_1row)
# When the rows have custom index labels
frame_label = pd.DataFrame(data, columns = ['years', 'state', 'pop'], index = ['one', 'two', 'three', 'four'])
print(frame_label)
frame_label_row = frame_label.loc['two']
print(frame_label_row)
result:
years state pop debt pofit
0 2000 Astrilia 1.5 -1.2 100.0
1 2001 Mexico 3.6 -1.5 200.0
2 2002 China 2.4 -1.7 NaN
3 2003 Japan 5.1 NaN 300.0
years 2001
state Mexico
pop 3.6
debt -1.5
pofit 200.0
Name: 1, dtype: object
years state pop
one 2000 Astrilia 1.5
two 2001 Mexico 3.6
three 2002 China 2.4
four 2003 Japan 5.1
years 2001
state Mexico
pop 3.6
Name: two, dtype: object
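As a small sketch of the difference between selecting by label and selecting by position: loc looks rows up by index label, while iloc (not used in the example above, but a standard pandas attribute) looks them up by 0-based integer position. The names by_label, by_position and subset are illustrative.

```python
import pandas as pd

data = {'state': ['Astrilia', 'Mexico', 'China', 'Japan'],
        'years': [2000, 2001, 2002, 2003],
        'pop': [1.5, 3.6, 2.4, 5.1]}
frame_label = pd.DataFrame(data, columns=['years', 'state', 'pop'],
                           index=['one', 'two', 'three', 'four'])

# Select the second row by its label and by its integer position.
by_label = frame_label.loc['two']
by_position = frame_label.iloc[1]
print(by_label.equals(by_position))

# Passing a list of labels returns a DataFrame instead of a Series.
subset = frame_label.loc[['one', 'four']]
print(subset)
```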
3. Column modification
Columns can be modified by assignment. For example, a new 'debt' column can be assigned a scalar value or an array of values.
Example:
import pandas as pd
import numpy as np
data = {'state': ['Astrilia', 'Mexico', 'China'],
'years': [2000, 2001, 2002],
'pop': [1.5, 3.6, 2.4]}
frame = pd.DataFrame(data, columns = ['years', 'state', 'pop'])
print(frame)
frame['debt'] = 16.2
print(frame)
frame['pofit'] = np.random.randint(100, 200, size = 3)
print(frame)
result:
years state pop
0 2000 Astrilia 1.5
1 2001 Mexico 3.6
2 2002 China 2.4
years state pop debt
0 2000 Astrilia 1.5 16.2
1 2001 Mexico 3.6 16.2
2 2002 China 2.4 16.2
years state pop debt pofit
0 2000 Astrilia 1.5 16.2 192
1 2001 Mexico 3.6 16.2 138
2 2002 China 2.4 16.2 140
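When a list or array is assigned to a column, its length must match the length of the DataFrame. When a Series is assigned, its labels are aligned to the DataFrame's index instead, and any index label missing from the Series is filled with NaN: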
Example:
import pandas as pd
import numpy as np
data = {'state': ['Astrilia', 'Mexico', 'China', 'Japan'],
'years': [2000, 2001, 2002, 2003],
'pop': [1.5, 3.6, 2.4, 5.1]}
frame = pd.DataFrame(data, columns = ['years', 'state', 'pop'])
print(frame)
val = pd.Series([-1.2, -1.5, -1.7])
frame['debt'] = val
print(frame)
val_1 = pd.Series([100, 200, 300], index = [0, 1, 3])
frame['pofit'] = val_1
print(frame)
result:
years state pop
0 2000 Astrilia 1.5
1 2001 Mexico 3.6
2 2002 China 2.4
3 2003 Japan 5.1
years state pop debt
0 2000 Astrilia 1.5 -1.2
1 2001 Mexico 3.6 -1.5
2 2002 China 2.4 -1.7
3 2003 Japan 5.1 NaN
years state pop debt pofit
0 2000 Astrilia 1.5 -1.2 100.0
1 2001 Mexico 3.6 -1.5 200.0
2 2002 China 2.4 -1.7 NaN
3 2003 Japan 5.1 NaN 300.0
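The two assignment rules can be contrasted in a minimal sketch, assuming a reduced version of the sample data. A plain list that is too short raises a ValueError, whereas a Series of the same length is accepted because it is aligned on index labels. The variable list_error is illustrative.

```python
import pandas as pd

frame = pd.DataFrame({'years': [2000, 2001, 2002, 2003],
                      'pop': [1.5, 3.6, 2.4, 5.1]})

# A plain list must supply exactly one value per row.
try:
    frame['debt'] = [-1.2, -1.5, -1.7]   # 3 values for 4 rows
    list_error = None
except ValueError as exc:
    list_error = exc
print(type(list_error).__name__)

# A Series is aligned on index labels; the unmatched row (label 3) becomes NaN.
frame['debt'] = pd.Series([-1.2, -1.5, -1.7])
print(frame['debt'].isna().tolist())
```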
If the column being assigned to does not exist, a new column is created:
Example:
import pandas as pd
import numpy as np
data = {'state': ['Astrilia', 'Mexico', 'China', 'Japan'],
'years': [2000, 2001, 2002, 2003],
'pop': [1.5, 3.6, 2.4, 5.1]}
frame = pd.DataFrame(data, columns = ['years', 'state', 'pop'])
val = pd.Series([-1.2, -1.5, -1.7])
frame['debt'] = val
val_1 = pd.Series([100, 200, 300], index = [0, 1, 3])
frame['pofit'] = val_1
print(frame)
# Assign values to a new column
frame['date'] = np.random.randint(1, 10, size = 4)
print(frame)
result:
years state pop debt pofit
0 2000 Astrilia 1.5 -1.2 100.0
1 2001 Mexico 3.6 -1.5 200.0
2 2002 China 2.4 -1.7 NaN
3 2003 Japan 5.1 NaN 300.0
years state pop debt pofit date
0 2000 Astrilia 1.5 -1.2 100.0 7
1 2001 Mexico 3.6 -1.5 200.0 1
2 2002 China 2.4 -1.7 NaN 8
3 2003 Japan 5.1 NaN 300.0 4
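Creating a column this way only works with dictionary-like notation. The sketch below, on a small assumed frame, shows that assigning through an attribute name that is not yet a column merely sets a Python attribute on the object (pandas emits a UserWarning, silenced here) and adds no column. The name flag is hypothetical.

```python
import warnings

import pandas as pd

frame = pd.DataFrame({'years': [2000, 2001], 'pop': [1.5, 3.6]})

# Dictionary-like notation creates the column.
frame['date'] = [7, 1]
print('date' in frame.columns)

# Attribute notation with a new name sets a plain attribute instead;
# no column called 'flag' is added to the DataFrame.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    frame.flag = [True, False]
print('flag' in frame.columns)
```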
4. Column deletion
Use the del keyword to delete a column.
Example: First add a column consisting of boolean values:
import pandas as pd
import numpy as np
data = {'state': ['Astrilia', 'Mexico', 'China', 'Mexico'],
'years': [2000, 2001, 2002, 2003],
'pop': [1.5, 3.6, 2.4, 5.1]}
frame = pd.DataFrame(data, columns = ['years', 'state', 'pop'])
val = pd.Series([-1.2, -1.5, -1.7])
frame['debt'] = val
val_1 = pd.Series([100, 200, 300], index = [0, 1, 3])
frame['pofit'] = val_1
print(frame)
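'''Now build an array of Boolean values: True in the TF column if state == Mexico, otherwise False'''
# Method 1: vectorized comparison
frame['TF'] = frame.state == 'Mexico'
print(frame)
print(frame.TF[0])
# Method 2: build the list row by row
Buer = []
for i in range(4):
    Buer.append(frame.state[i] == 'Mexico')
frame['tf'] = Buer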
print(frame)
result:
years state pop debt pofit
0 2000 Astrilia 1.5 -1.2 100.0
1 2001 Mexico 3.6 -1.5 200.0
2 2002 China 2.4 -1.7 NaN
3 2003 Mexico 5.1 NaN 300.0
years state pop debt pofit TF
0 2000 Astrilia 1.5 -1.2 100.0 False
1 2001 Mexico 3.6 -1.5 200.0 True
2 2002 China 2.4 -1.7 NaN False
3 2003 Mexico 5.1 NaN 300.0 True
False
years state pop debt pofit TF tf
0 2000 Astrilia 1.5 -1.2 100.0 False False
1 2001 Mexico 3.6 -1.5 200.0 True True
2 2002 China 2.4 -1.7 NaN False False
3 2003 Mexico 5.1 NaN 300.0 True True
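The row-by-row approach can also be written without a hard-coded row count. The following is a minimal sketch on a reduced frame; the column name in_set and the use of isin to test several values at once are additions for illustration.

```python
import pandas as pd

frame = pd.DataFrame({'state': ['Astrilia', 'Mexico', 'China', 'Mexico']})

# Vectorized comparison, as in the first method above.
frame['TF'] = frame['state'] == 'Mexico'

# Loop-style construction that adapts to any number of rows.
frame['tf'] = [state == 'Mexico' for state in frame['state']]

# isin generalizes the test to several values at once.
frame['in_set'] = frame['state'].isin(['Mexico', 'China'])

print(frame)
```

The vectorized form is generally preferable, since it avoids an explicit Python loop.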
Then delete the TF column:
import pandas as pd
import numpy as np
data = {'state': ['Astrilia', 'Mexico', 'China', 'Mexico'],
'years': [2000, 2001, 2002, 2003],
'pop': [1.5, 3.6, 2.4, 5.1]}
frame = pd.DataFrame(data, columns = ['years', 'state', 'pop'])
val = pd.Series([-1.2, -1.5, -1.7])
frame['debt'] = val
val_1 = pd.Series([100, 200, 300], index = [0, 1, 3])
frame['pofit'] = val_1
print(frame)
'''Now build an array of Boolean values: True in the TF column if state == Mexico, otherwise False'''
# Build a new column
frame['TF'] = frame.state == 'Mexico'
print(frame)
# Delete that column
del frame['TF']
print(frame)
result:
years state pop debt pofit
0 2000 Astrilia 1.5 -1.2 100.0
1 2001 Mexico 3.6 -1.5 200.0
2 2002 China 2.4 -1.7 NaN
3 2003 Mexico 5.1 NaN 300.0
years state pop debt pofit TF
0 2000 Astrilia 1.5 -1.2 100.0 False
1 2001 Mexico 3.6 -1.5 200.0 True
2 2002 China 2.4 -1.7 NaN False
3 2003 Mexico 5.1 NaN 300.0 True
years state pop debt pofit
0 2000 Astrilia 1.5 -1.2 100.0
1 2001 Mexico 3.6 -1.5 200.0
2 2002 China 2.4 -1.7 NaN
3 2003 Mexico 5.1 NaN 300.0
Note:
The deletion must be written as del frame['TF'], as in the code above, for it to run normally.
If it is written like this:
Both of these ways of writing will raise an error.
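The failing forms referred to in the note are not reproduced in the text, so the sketch below only shows one form that is known to fail, attribute-style deletion. Whether it is one of the forms the note had in mind is an assumption. It also shows drop, a standard pandas method that returns a new DataFrame instead of modifying the original. The names attr_error and dropped are illustrative.

```python
import pandas as pd

frame = pd.DataFrame({'state': ['Astrilia', 'Mexico'],
                      'TF': [False, True]})

# Attribute-style deletion is not supported for columns.
try:
    del frame.TF
    attr_error = None
except AttributeError as exc:
    attr_error = exc
print(type(attr_error).__name__)

# drop returns a new DataFrame and leaves the original untouched.
dropped = frame.drop(columns=['TF'])
print(list(dropped.columns), list(frame.columns))

# Key-style del removes the column in place.
del frame['TF']
print(list(frame.columns))
```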
5. Assign nested dictionaries to DataFrame
If a nested dictionary is passed to DataFrame, pandas treats the outer keys as column labels and the inner keys as row index labels. Inner keys missing from one of the dictionaries are filled with NaN:
Example:
import pandas as pd
import numpy as np
pop = {'MZY': {2001: 2.4, 2002: 2.9},
'DRX': {2000: 1.5, 2001: 1.7, 2002: 3.6}}
frame = pd.DataFrame(pop)
print(frame)
result:
MZY DRX
2001 2.4 1.7
2002 2.9 3.6
2000 NaN 1.5
The DataFrame can be transposed (rows and columns swapped) with NumPy-like syntax:
Example:
import pandas as pd
import numpy as np
pop = {'MZY': {2001: 2.4, 2002: 2.9},
'DRX': {2000: 1.5, 2001: 1.7, 2002: 3.6}}
frame = pd.DataFrame(pop)
print(frame)
# Transpose
print(frame.T)
result:
MZY DRX
2001 2.4 1.7
2002 2.9 3.6
2000 NaN 1.5
2001 2002 2000
MZY 2.4 2.9 NaN
DRX 1.7 3.6 1.5
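A brief sketch of what transposition does to the axes, using the same nested dictionary: the column labels become the index and the index labels become the columns. The second frame, mixed, is a hypothetical example added to show that when columns hold different types, the transposed result falls back to the generic object dtype.

```python
import pandas as pd

pop = {'MZY': {2001: 2.4, 2002: 2.9},
       'DRX': {2000: 1.5, 2001: 1.7, 2002: 3.6}}
frame = pd.DataFrame(pop)
transposed = frame.T

# The axes are swapped.
print(list(transposed.index) == list(frame.columns))
print(list(transposed.columns) == list(frame.index))

# With mixed column types, the transposed columns become object dtype.
mixed = pd.DataFrame({'state': ['Mexico', 'China'], 'years': [2001, 2002]})
print(mixed.T.dtypes.tolist())
```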
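If an index is specified explicitly, the rows follow that index in the given order rather than the order of the inner-dictionary keys, and labels with no data (2003 here) are filled with NaN: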
Example:
import pandas as pd
import numpy as np
pop = {'MZY': {2001: 2.4, 2002: 2.9},
'DRX': {2000: 1.5, 2001: 1.7, 2002: 3.6}}
frame = pd.DataFrame(pop)
print(frame)
frame1 = pd.DataFrame(pop, index=[2000, 2002, 2001, 2003])
print(frame1)
result:
MZY DRX
2001 2.4 1.7
2002 2.9 3.6
2000 NaN 1.5
MZY DRX
2000 NaN 1.5
2002 2.9 3.6
2001 2.4 1.7
2003 NaN NaN
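For comparison, a similar row order can be obtained after construction. The sketch below assumes the same nested dictionary and uses reindex and sort_index, two standard pandas methods that the example above does not use. The names order, frame2 and sorted_frame are illustrative.

```python
import pandas as pd

pop = {'MZY': {2001: 2.4, 2002: 2.9},
       'DRX': {2000: 1.5, 2001: 1.7, 2002: 3.6}}
order = [2000, 2002, 2001, 2003]

# Explicit index at construction time, as in the example above.
frame1 = pd.DataFrame(pop, index=order)

# Building first and reindexing afterwards yields the same rows.
frame2 = pd.DataFrame(pop).reindex(order)
print(frame2)

# sort_index can be used when sorted row labels are wanted.
sorted_frame = pd.DataFrame(pop).sort_index()
print(sorted_frame)
```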
Summary
Although Series and DataFrame cannot solve all problems, they provide an effective and easy-to-use foundation for most applications.