A DataFrame is a two-dimensional data structure in which data is arranged in rows and columns.
The basic form of the DataFrame constructor is as follows:
df = pd.DataFrame(data=None, index=None, columns=None)
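Parameter description:
data: the data to store, for example a dict, a list, or an ndarray
index: row labels; if not specified and the data carries no labels, a RangeIndex (0, 1, 2, ..., n-1) is generated automatically
columns: column labels (header); if not specified and the data carries no labels, a RangeIndex (0, 1, 2, ..., n-1) is generated automatically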
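As a quick illustration of the defaults described above, the following sketch builds a DataFrame from a nested list without passing index or columns, then again with both supplied. The sample values and the labels 'r1', 'r2', 'a', 'b', 'c' are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: two rows, three columns
rows = [[1, 2, 3], [4, 5, 6]]

# No index or columns given: pandas falls back to a RangeIndex for both
df_default = pd.DataFrame(data=rows)
print(df_default.index)    # RangeIndex(start=0, stop=2, step=1)
print(df_default.columns)  # RangeIndex(start=0, stop=3, step=1)

# Explicit labels replace the defaults
df_labeled = pd.DataFrame(data=rows, index=['r1', 'r2'], columns=['a', 'b', 'c'])
print(df_labeled)
```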
Calling pd.DataFrame() with no arguments creates an empty DataFrame:
import pandas as pd
df = pd.DataFrame()
print(df)
'''
Empty DataFrame
Columns: []
Index: []
'''
The following are commonly used ways of constructing a DataFrame.
Method 1: Build a DataFrame from a dictionary (dict)
Each key in the dictionary becomes a column name, and each value, typically a list, tuple, or ndarray, holds that column's data.
import pandas as pd
import numpy as np
data = {'a': [1, 2, 3, 4],              # list
        'b': (4, 5, 6, 7),              # tuple
        'c': np.array([8, 9, 10, 11])   # ndarray
}
# Create the DataFrame
df1 = pd.DataFrame(data)
df1
A new DataFrame has been created. pandas generates a default row index, and the column labels are the keys of the dictionary. We can also specify the row index manually when creating the DataFrame by passing the index parameter:
import pandas as pd
import numpy as np
data = {'a': [1, 2, 3, 4],              # list
        'b': (4, 5, 6, 7),              # tuple
        'c': np.array([8, 9, 10, 11])   # ndarray
}
# Create the DataFrame with custom row labels
df1 = pd.DataFrame(data,index=['one','two','three','four'])
df1
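One thing to watch for: when the values are lists, tuples, or arrays, the index must contain as many labels as there are rows. A minimal sketch of what happens otherwise, using made-up data; the flag name mismatch_raised is only for illustration, and the exact error message may differ between pandas versions.

```python
import pandas as pd

data = {'a': [1, 2, 3, 4], 'b': (4, 5, 6, 7)}

try:
    # Four rows of data but only two index labels
    pd.DataFrame(data, index=['one', 'two'])
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print('ValueError:', err)
```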
We can also build a DataFrame from a dictionary of Series.
Each key-value pair in the dictionary is one column of data: the key is the column name and the value is a Series.
import pandas as pd
data = {"x": pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']),
"y": pd.Series([5, 6, 7, 8], index=['a', 'b', 'c', 'd'])}
# Create the DataFrame
df2 = pd.DataFrame(data)
df2
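In the example above both Series share the same index. When the indexes differ, pandas aligns the Series on their labels and fills the gaps with NaN. A small sketch with made-up Series whose indexes only partly overlap:

```python
import pandas as pd

# Hypothetical Series whose indexes only partly overlap
data = {"x": pd.Series([1, 2, 3], index=['a', 'b', 'c']),
        "y": pd.Series([5, 6, 7], index=['b', 'c', 'd'])}

df_aligned = pd.DataFrame(data)
print(df_aligned)
# The rows are the union of both indexes (a, b, c, d).
# 'y' has no value for 'a' and 'x' has none for 'd', so those cells are NaN.
```

Because NaN is a floating-point value, both columns end up with a float dtype even though the inputs were integers.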
Method 2: Build a DataFrame from a list
We can build a DataFrame from a list of dictionaries, where each dictionary is one row of data and its keys are the column names
import pandas as pd
# Define a list of dictionaries
data = [{'x': 1, 'y': 2, 'z': 3},
        {'x': 4, 'y': 5, 'z': 6}]
# Create the DataFrame
df3 = pd.DataFrame(data, index=['a','b'])
df3
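The dictionaries in the list do not need to have identical keys. As a minimal sketch with made-up records, a key that is missing from one dictionary shows up as NaN in that row:

```python
import pandas as pd

# Hypothetical records: the second dictionary has no 'z' key
records = [{'x': 1, 'y': 2, 'z': 3},
           {'x': 4, 'y': 5}]

df_missing = pd.DataFrame(records)
print(df_missing)
# The columns are x, y, z; the second row's 'z' is NaN
```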
We can also create a DataFrame from a two-dimensional list, where each inner list is one row of data
import pandas as pd
data = [['Alex',10],['Bob',12],['Clarke',13]]
df4 = pd.DataFrame(data,columns=['Name','Age'])
df4
Tips
In practice, we rarely need to generate the data ourselves. We usually already have a collected data set, which can be loaded directly into a DataFrame.
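As a rough sketch of loading existing data, pandas provides reader functions such as pd.read_csv. The CSV text below is made-up sample data held in memory with io.StringIO so that the snippet runs on its own; with a real data set you would pass a file path instead.

```python
import io
import pandas as pd

# Made-up CSV content standing in for a real file
csv_text = "Name,Age\nAlex,10\nBob,12\nClarke,13\n"

# read_csv accepts a file path or a file-like object
df_loaded = pd.read_csv(io.StringIO(csv_text))
print(df_loaded)
```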