1, An explanation of the DataFrame data structure
index is the row index, columns is the column index, and values holds the data. Both the row index and the column index are Index objects. Viewed row by row, a DataFrame is a set of row Series stacked vertically, and each row Series is indexed by the column index [0,1,2,3]. Viewed column by column, a DataFrame is a set of column Series placed side by side, and each column Series is indexed by the row index [0,1,2].
The default way to understand a DataFrame is as a collection of column Series, each of which may hold a different data type. The DataFrame in the figure consists of four such Series, all of which share the row index [0,1,2].
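The two views described above can be checked with a short sketch. It assumes a 3x4 DataFrame with default integer labels like the one in the figure; the values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative 3x4 frame with default labels: rows 0..2, columns 0..3
df = pd.DataFrame(np.arange(12).reshape(3, 4))

row = df.iloc[0]  # one row, returned as a Series
col = df[0]       # one column, returned as a Series

# A row Series is indexed by the column labels
print(type(row).__name__, list(row.index))
# A column Series is indexed by the row labels
print(type(col).__name__, list(col.index))
```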
A DataFrame can be compared to a MySQL table:
a MySQL table has many columns (fields), and the data type of each column is generally different;
if each column of a MySQL table is viewed as a Series with its own data type, then the table is made up of several Series of different data types, which is much the same as what we described above.
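As a rough illustration of the table analogy, the sketch below builds a small DataFrame whose columns have different types. The field names id, name, and score are hypothetical and are not taken from the original post.

```python
import pandas as pd

# Hypothetical "table" with three fields of different types
df = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["a", "b", "c"],
    "score": [91.5, 88.0, 79.5],
})

print(df.dtypes)          # one dtype per column, like field types in a table
print(type(df["score"]))  # each column taken on its own is a Series
```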
2, The index and columns attributes of a DataFrame
1) Constructing a DataFrame
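import numpy as np
import pandas as pd
# display() is available by default in Jupyter; import it when running as a script
from IPython.display import display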
df = pd.DataFrame(np.random.randint(70,100,(3,5)),
index=["地区1", "地区2", "地区3"],
columns=["北京","天津", "上海","沈阳", "广州"])
display(df)
The results are as follows:
2) The index and columns attributes
df = pd.DataFrame(np.random.randint(70,100,(3,5)),
index=["地区1", "地区2", "地区3"],
columns=["北京","天津", "上海","沈阳", "广州"])
display(df)
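# Row index: an Index object, which can be converted to a list
x = df.index
display(x)
display(list(df.index))
# Column index: also an Index object
y = df.columns
display(y)
display(list(df.columns))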
The results are as follows:
① Modify the row index: df.index
df = pd.DataFrame(np.random.randint(70,100,(3,5)),
index=["地区1", "地区2", "地区3"],
columns=["北京","天津", "上海","沈阳", "广州"])
display(df)
df.index = ["a","b","c"]
display(df)
The results are as follows:
② Modify the column index: df.columns
df = pd.DataFrame(np.random.randint(70,100,(3,5)),
index=["地区1", "地区2", "地区3"],
columns=["北京","天津", "上海","沈阳", "广州"])
display(df)
# The new list must contain one label per column (5 here)
df.columns = ["a","b","c","d","e"]
display(df)
The results are as follows:
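When replacing df.index or df.columns, the new list needs exactly one label per row or column. The sketch below, which reuses the DataFrame from this section, shows pandas rejecting a list of the wrong length; the variable name rejected is only used for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(70, 100, (3, 5)),
                  index=["地区1", "地区2", "地区3"],
                  columns=["北京", "天津", "上海", "沈阳", "广州"])

rejected = False
try:
    df.columns = ["a", "b", "c"]  # 3 labels for 5 columns
except ValueError as err:
    rejected = True
    print("rejected:", err)

# One label per column is accepted
df.columns = ["a", "b", "c", "d", "e"]
print(list(df.columns))
```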
3) The Index object of a DataFrame
Looking at the "DataFrame data structure diagram", we can see that every df has a row index (index) and a column index (columns). Whether it is the row index or the column index, both are Index objects. The only difference is in naming: to tell the two apart when creating a df, the Index object for the rows is passed as the index parameter, and the Index object for the columns is passed as the columns parameter.
Remember: the elements of an Index object cannot be modified in place. Replacing the whole index by assignment, as in ① and ② above, is allowed; changing a single element is not.
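A quick way to confirm that both attributes are the same kind of object is an isinstance check. This is a minimal sketch that reuses the DataFrame from the earlier examples.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(70, 100, (3, 5)),
                  index=["地区1", "地区2", "地区3"],
                  columns=["北京", "天津", "上海", "沈阳", "广州"])

# Both the row index and the column index are pandas Index objects
print(isinstance(df.index, pd.Index))
print(isinstance(df.columns, pd.Index))
```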
# pd.Index() creates an Index object
x = pd.Index([1,2,3])
display(x)
display(type(x))
# Assigning to an element raises TypeError, because an Index is immutable
x[0] = 1
The results are as follows:
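To see the failure without stopping the script, the element assignment can be wrapped in a try/except block. This is a minimal sketch; the variable names failed and y are only used for illustration.

```python
import pandas as pd

x = pd.Index([1, 2, 3])

failed = False
try:
    x[0] = 10  # in-place element assignment is not supported
except TypeError as err:
    failed = True
    print("TypeError:", err)

# The original Index is unchanged; build a new Index to get different labels
y = pd.Index([10, 2, 3])
print(list(x), list(y))
```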
3, The name attribute
1) Understanding the name attribute in a DataFrame
We know that every row and every column taken out of a DataFrame is a Series. Each Series that makes up the DataFrame has a name, which is the row or column index label it corresponds to. In the figure above, the eight colors (orange, yellow, indigo, and so on) are numbered 1-8, and each number corresponds to one Series. The name of Series1 is "地区1", the name of Series2 is "地区2", ... and the name of Series8 is "广州".
Next, let's test this with code.
df = pd.DataFrame(np.random.randint(70,100,(3,5)),
index=["地区1", "地区2", "地区3"],
columns=["北京","天津", "上海","沈阳", "广州"])
display(df)
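# The name of a row Series is its row label
display(df.loc["地区1"].name)
display(df.loc["地区2"].name)
# ......
# The name of a column Series is its column label
display(df["广州"].name)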
The results are as follows:
2) Setting a name for the row index and the column index: df.index.name and df.columns.name
df = pd.DataFrame(np.random.randint(70,100,(3,5)),
index=["地区1", "地区2", "地区3"],
columns=["北京","天津", "上海","沈阳", "广州"])
display(df)
df.index.name = "index_name"
df.columns.name = "columns_name"
display(df)
The results are as follows:
To sum up: as shown above, not only does every row and every column of a DataFrame have a name, but we can also set a name for the DataFrame's row index and column index respectively.
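The summary can be verified in one pass with the sketch below. It uses fixed values instead of random ones so the result is reproducible; the helper variables row_names and col_names are only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(15).reshape(3, 5),
                  index=["地区1", "地区2", "地区3"],
                  columns=["北京", "天津", "上海", "沈阳", "广州"])

# The name of each row or column Series is its index label
row_names = [df.loc[label].name for label in df.index]
col_names = [df[label].name for label in df.columns]
print(row_names)
print(col_names)

# The row index and column index themselves can also be given a name
df.index.name = "index_name"
df.columns.name = "columns_name"
print(df.index.name, df.columns.name)
```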