[Pandas] Check the common properties of DataFrame

Import Data

import pandas as pd
 
df = pd.DataFrame([['L123','A',0,123],
                   ['L456','A',1,456],
                   ['L437','C',0,789],
                   ['L112','B',1,741],
                   ['L211','A',0,852],
                   ['L985','B',1,963]
                  ],columns=['Material','Level','Passing','LT'])

df


1.dtypes: View the data type of each column in the DataFrame

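df.dtypes returns a Series that lists the data type of each column. The trailing "dtype: object" line in the output is the dtype of that Series itself, not an overall type for the DataFrame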

# View the data type of each column
df.dtypes

The result is as follows

Material    object
Level       object
Passing      int64
LT           int64
dtype: object

It can be seen that the 'Material' and 'Level' columns have dtype object (here they hold strings), while 'Passing' and 'LT' are int64
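Because df.dtypes is an ordinary Series, it can be indexed by column name, and the dtype information can drive column selection or conversion. The following is a minimal sketch on a shortened copy of the sample data; the variable names (lt_dtype, numeric_cols, converted) are illustrative only and not part of the original article.

```python
import pandas as pd

# Shortened copy of the article's sample data.
df = pd.DataFrame(
    [["L123", "A", 0, 123], ["L456", "A", 1, 456], ["L437", "C", 0, 789]],
    columns=["Material", "Level", "Passing", "LT"],
)

# df.dtypes is a Series indexed by column name, so a single dtype can be looked up.
lt_dtype = df.dtypes["LT"]

# select_dtypes keeps only the columns whose dtype matches the filter.
numeric_cols = df.select_dtypes(include="number").columns.tolist()

# astype returns a new DataFrame with the requested dtype; df itself is unchanged.
converted = df.astype({"Passing": "bool"})

print(lt_dtype)
print(numeric_cols)                 # ['Passing', 'LT']
print(converted["Passing"].tolist())  # [False, True, False]
```

Converting 'Passing' to bool is only one possible choice; whether a 0/1 flag should stay an integer depends on how the column is used later.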

2.values: returns the values in the DataFrame as a NumPy array

# Returns array(<nested list of all values>)
df.values

The result is as follows

array([['L123', 'A', 0, 123],
          ['L456', 'A', 1, 456],
          ['L437', 'C', 0, 789],
          ['L112', 'B', 1, 741],
          ['L211', 'A', 0, 852],
          ['L985', 'B', 1, 963]], dtype=object)
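The dtype=object in the output appears because the columns mix strings and integers, so no single numeric dtype fits all of them. The pandas documentation recommends DataFrame.to_numpy() over .values for new code. A small sketch, assuming a two-row copy of the sample data and illustrative variable names:

```python
import pandas as pd

# Two-row copy of the article's sample data.
df = pd.DataFrame(
    [["L123", "A", 0, 123], ["L456", "A", 1, 456]],
    columns=["Material", "Level", "Passing", "LT"],
)

# Mixed string and integer columns fall back to an object array.
mixed = df.to_numpy()

# Restricting to the integer columns yields a homogeneous integer array.
numeric = df[["Passing", "LT"]].to_numpy()

print(mixed.dtype)
print(numeric.dtype)
print(numeric.tolist())  # [[0, 123], [1, 456]]
```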

3.size: returns the number of elements in the DataFrame

# Number of elements = number of rows * number of columns
df.size

The result is as follows

24 

4.shape: returns the number of rows and columns of the DataFrame

df.shape returns a tuple: the first element is the number of rows and the second element is the number of columns

# Returns a tuple (number of rows, number of columns)
df.shape

The result is as follows

(6, 4) 

As you can see, the above data frame df has 6 rows and 4 columns of data
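The relationship between size, shape and len() can be checked directly. This is a small sketch using the same sample data; n_rows and n_cols are just illustrative names.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["L123", "A", 0, 123],
        ["L456", "A", 1, 456],
        ["L437", "C", 0, 789],
        ["L112", "B", 1, 741],
        ["L211", "A", 0, 852],
        ["L985", "B", 1, 963],
    ],
    columns=["Material", "Level", "Passing", "LT"],
)

# shape is a plain tuple, so it can be unpacked into rows and columns.
n_rows, n_cols = df.shape

print(n_rows, n_cols)              # 6 4
print(df.size == n_rows * n_cols)  # True
print(len(df) == n_rows)           # True: len() counts rows only
```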

5.ndim: returns the number of dimensions of the DataFrame

df.ndim

The result is as follows

2

6.index: returns the row index in the DataFrame

df.index

The result is as follows

RangeIndex(start=0, stop=6, step=1) 

7.columns: returns the column index in the DataFrame

df.columns

The result is as follows

Index(['Material', 'Level', 'Passing', 'LT'], dtype='object') 
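Both index and columns are pandas Index objects. The sketch below shows one way to turn them into plain lists and how set_index changes what they contain; it uses a shortened copy of the sample data, and the choice of 'Material' as the new index is only an example.

```python
import pandas as pd

# Shortened copy of the article's sample data.
df = pd.DataFrame(
    [["L123", "A", 0, 123], ["L456", "A", 1, 456], ["L437", "C", 0, 789]],
    columns=["Material", "Level", "Passing", "LT"],
)

# tolist() converts an Index into a plain Python list.
row_labels = df.index.tolist()
col_labels = df.columns.tolist()

# set_index returns a new DataFrame whose row index is built from a column.
by_material = df.set_index("Material")

print(row_labels)                    # [0, 1, 2]
print(col_labels)                    # ['Material', 'Level', 'Passing', 'LT']
print(by_material.index.tolist())    # ['L123', 'L456', 'L437']
print(by_material.columns.tolist())  # ['Level', 'Passing', 'LT']
```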

8.info(): prints basic information about the DataFrame

df.info() is a method, so it must be called with parentheses. It prints the class of the object, the range of the row index, the number of columns, the non-null count and data type of each column, and the memory usage.

df.info()

The result is as follows

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 4 columns):
Material    6 non-null object
Level       6 non-null object
Passing     6 non-null int64
LT          6 non-null int64
dtypes: int64(2), object(2)
memory usage: 272.0+ bytes 
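Note that info() prints its report and returns None, so the text cannot be captured by assigning the return value. One possible way to get the report as a string is the buf parameter, sketched here with a shortened copy of the sample data; the names buffer and report are illustrative.

```python
import io

import pandas as pd

# Shortened copy of the article's sample data.
df = pd.DataFrame(
    [["L123", "A", 0, 123], ["L456", "A", 1, 456], ["L437", "C", 0, 789]],
    columns=["Material", "Level", "Passing", "LT"],
)

# Passing buf redirects the report into a text buffer instead of stdout.
buffer = io.StringIO()
result = df.info(buf=buffer)
report = buffer.getvalue()

print(result is None)  # True: info() has no return value
print(report)
```

The exact layout of the report depends on the installed pandas version, so it is safer to search it for keywords than to compare it character by character.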

9. View some sample information of DataFrame

df.head(): displays the first rows of the DataFrame; 5 rows by default, or pass the number of rows you want

# View the first 5 rows of the DataFrame (the default)
df.head()

The result is as follows

  Material Level  Passing   LT
0     L123     A        0  123
1     L456     A        1  456
2     L437     C        0  789
3     L112     B        1  741
4     L211     A        0  852

# View the first 3 rows of the DataFrame
df.head(3)

The result is as follows

  Material Level  Passing   LT
0     L123     A        0  123
1     L456     A        1  456
2     L437     C        0  789

df.tail(): displays the last rows of the DataFrame; 5 rows by default, or pass the number of rows you want

# View the last 5 rows of the DataFrame (the default)
df.tail()

The result is as follows

  Material Level  Passing   LT
1     L456     A        1  456
2     L437     C        0  789
3     L112     B        1  741
4     L211     A        0  852
5     L985     B        1  963

# View the last 3 rows of the DataFrame
df.tail(3)

The result is as follows

  Material Level  Passing   LT
3     L112     B        1  741
4     L211     A        0  852
5     L985     B        1  963

df.sample(): displays randomly selected rows of the DataFrame; 1 row by default, or pass the number of rows you want

# View one random row
df.sample()

The result is as follows (rows are picked at random, so your output will usually differ)

  Material Level  Passing   LT
4     L211     A        0  852

# View 3 random rows
df.sample(3)

The result is as follows (again, the selected rows vary from run to run)

  Material Level  Passing   LT
5     L985     B        1  963
1     L456     A        1  456
0     L123     A        0  123
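If the same random rows are needed on every run, for example in a test or a tutorial, sample() accepts a random_state seed; it can also draw a fraction of the rows through frac. This is a minimal sketch, and the seed values 42 and 0 are arbitrary choices.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["L123", "A", 0, 123],
        ["L456", "A", 1, 456],
        ["L437", "C", 0, 789],
        ["L112", "B", 1, 741],
        ["L211", "A", 0, 852],
        ["L985", "B", 1, 963],
    ],
    columns=["Material", "Level", "Passing", "LT"],
)

# The same seed gives the same rows each time.
picked = df.sample(n=3, random_state=42)
again = df.sample(n=3, random_state=42)

# frac draws a fraction of the rows instead of a fixed count.
half = df.sample(frac=0.5, random_state=0)

print(picked.equals(again))  # True
print(len(half))             # 3
```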

10. View the statistics of DataFrame

df.describe() displays summary statistics for the DataFrame (by default, only numeric columns are summarized when the DataFrame contains any)

'''
count: number of non-null values in the column
mean: arithmetic mean
std: standard deviation
min: minimum value
max: maximum value
25%, 50%, 75%: percentiles; 50% is the median
'''
df.describe()

Percentiles

If a set of data is sorted from smallest to largest and the cumulative percentage is calculated for each value, the value that corresponds to a given percentage is called the percentile for that percentage. Pk denotes the kth percentile.
Pk means that at least k% of the data is less than or equal to this value, and at least (100-k)% of the data is greater than or equal to it.

Using height as an example, the 5th percentile of the height distribution is the measurement that 5% of people are at or below and 95% of people are at or above

The result is as follows

        Passing          LT
count  6.000000    6.000000
mean   0.500000  654.000000
std    0.547723  310.368813
min    0.000000  123.000000
25%    0.000000  527.250000
50%    0.500000  765.000000
75%    1.000000  836.250000
max    1.000000  963.000000

df.describe() returns a table that covers all numeric columns, in which each row corresponds to one statistic
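The 25%, 50% and 75% rows for the 'LT' column can be reproduced with Series.quantile, which interpolates linearly between neighbouring sorted values by default. A minimal sketch using the 'LT' values from the sample data; the 5% and 95% cut points passed to describe() are only an example.

```python
import pandas as pd

# The 'LT' column from the article's sample data.
lt = pd.Series([123, 456, 789, 741, 852, 963], name="LT")

# Sorted values: 123, 456, 741, 789, 852, 963.
# The 25% point lies a quarter of the way from 456 to 741.
q25 = lt.quantile(0.25)
q50 = lt.quantile(0.50)
q75 = lt.quantile(0.75)

print(q25, q50, q75)  # 527.25 765.0 836.25

# describe() accepts a percentiles argument to request other cut points.
custom = lt.describe(percentiles=[0.05, 0.95])
print(custom)
```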

Tips: If the DataFrame has no numeric columns, describe() outputs statistics for the string columns instead

import pandas as pd
 
df = pd.DataFrame([['L123','L123','A'],
                   ['L456','L456','B'],
                   ['L437','L437','C'],
                   ['L112','L112','B'],
                   ['L1212','L985','B'],
                   ['L911','L985','B']
                  ],columns=['Material','New_Material','Level'])

df 

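'''
count: number of non-null values in the column
unique: number of distinct values
top: the most frequent value (when several values tie, which one is shown is not guaranteed)
freq: number of times the top value occurs
'''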
df.describe()

The result is as follows

       Material New_Material Level
count         6            6     6
unique        6            5     3
top        L911         L985     B
freq          1            2     4
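When a DataFrame mixes numeric and string columns, the include and exclude parameters of describe() control which columns are summarized. The sketch below assumes a shortened copy of the first sample DataFrame; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Shortened copy of the first sample DataFrame (mixed string and integer columns).
df = pd.DataFrame(
    [["L123", "A", 0, 123], ["L456", "A", 1, 456], ["L437", "C", 0, 789]],
    columns=["Material", "Level", "Passing", "LT"],
)

# Default: only the numeric columns are summarized.
numeric_only = df.describe()

# include="all" summarizes every column; statistics that do not apply are NaN.
everything = df.describe(include="all")

# Excluding numeric dtypes leaves the count/unique/top/freq summary of the text columns.
text_only = df.describe(exclude=[np.number])

print(numeric_only.columns.tolist())  # ['Passing', 'LT']
print(everything.columns.tolist())    # ['Material', 'Level', 'Passing', 'LT']
print(text_only)
```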

Origin blog.csdn.net/Hudas/article/details/130463469