Basic use of the pandas library for Python data analysis

Introduction to pandas: pandas is a NumPy-based tool created for data analysis tasks. It provides standard data models and the tools needed to manipulate large structured datasets efficiently.

1. Core data structures

1.1 Series object

A Series can be understood as a one-dimensional array whose index labels can be customized. It resembles a fixed-length, ordered dictionary that maps each index label to a value.

1.1.1 Creation of Series objects

import pandas as pd
import numpy as np
# 1. Create an empty Series object
s1 = pd.Series()
print(s1, type(s1), s1.dtype, s1.ndim)
# 2. Create a Series from an ndarray (any container works; for a dict, the keys become the index)
ary1 = np.array([23, 45, 12, 34, 56])
s2 = pd.Series(ary1)
print(s2)

# 3. Specify the row index labels with index when creating the Series
ary1 = np.array([23, 45, 12, 34, 56])
s3 = pd.Series(ary1, index=['zs', 'ls', 'ww', 'll', 'tq'])
print(s3)

# 4. Create a Series from a scalar
s5 = pd.Series(5, index=[0, 1, 2, 3])
print(s5)

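The dictionary case mentioned in the comments above is not shown, so here is a minimal sketch of it. The `scores` dictionary and the label `'xx'` are made-up values used only for illustration.

```python
import pandas as pd

# Hypothetical scores, used only for illustration
scores = {'zs': 23, 'ls': 45, 'ww': 12}

# The dictionary keys become the index labels
s = pd.Series(scores)
print(s)

# Passing index selects and reorders by label;
# a label that is not a key ('xx' here) gets NaN
s_subset = pd.Series(scores, index=['ww', 'zs', 'xx'])
print(s_subset)
```

This matches the description of a Series as an ordered dictionary with index and value: each key maps to exactly one value.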

1.1.2 Accessing Series elements

import numpy as np
import pandas as pd
s1 = pd.Series(np.array([78, 98, 67, 100, 76]), index=['lily', 'bob', 'jim', 'jack', 'mary'])
# Method 1: retrieve elements by position
print(s1[:3])      # returns a Series
print(s1.iloc[1])  # returns a single value (iloc makes the positional lookup explicit)

# Method 2: retrieve data by label (several labels can be selected at once)
print(s1['lily'])   # returns a single value
print(s1[['bob', 'jim', 'jack']])  # returns a Series

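Position-based and label-based access can be kept apart explicitly with `iloc` and `loc`. The following is a small sketch reusing the same data as above. The point to notice is that a position slice excludes its end, while a label slice includes it.

```python
import pandas as pd

s1 = pd.Series([78, 98, 67, 100, 76],
               index=['lily', 'bob', 'jim', 'jack', 'mary'])

by_position = s1.iloc[1]             # second element
by_label = s1.loc['bob']             # element labeled 'bob'

pos_slice = s1.iloc[0:2]             # positions 0 and 1 (end excluded)
label_slice = s1.loc['lily':'jim']   # 'lily' through 'jim' (end included)

print(by_position, by_label)
print(pos_slice)
print(label_slice)
```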

1.2 Date types

datetime64[ns]: date type
timedelta64[ns]: time offset type

1.2.1 Date processing

Date string formats recognized by pandas:

import pandas as pd
# Turn a list of date strings into a Series
dates = pd.Series(['2021', '2011-02', '2011-03-02', '2011/04/01', '2011/5/1 01:01:01', '01 Jun 2011'])
print(dates)

# to_datetime() converts the values to the datetime type
# (on pandas 2.0 and later, mixed-format input like this may require format='mixed')
dates = pd.to_datetime(dates)
print(dates, '\n', dates.dtype)

datetime data supports date arithmetic:

delta = dates - pd.to_datetime('1970-01-01')
print(delta, type(delta))

Note: the elements of this Series are now of timedelta type.
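As a small sketch of what can be done with the resulting timedelta values, the example below subtracts a reference date and reads the difference in whole days through the `.dt` accessor. The dates are made up for illustration and share one format, so the parsing does not depend on the pandas version.

```python
import pandas as pd

dates = pd.to_datetime(pd.Series(['2011-03-02', '2011-04-01']))

# Subtracting a timestamp from a datetime Series gives a timedelta Series
delta = dates - pd.to_datetime('2011-01-01')
print(delta.dtype)

# .dt.days extracts the whole-day component of each offset
days = delta.dt.days
print(days)
```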

1.2.2 Date-related operations

Series.dt exposes date-related operations. For the full API, see help(DatetimeProperties).

import pandas as pd
from pandas.core.indexes.accessors import DatetimeProperties

dates = pd.Series(['2021', '2011-02', '2011-03-02', '2011/04/01', '2011/5/1 01:01:01', '01 Jun 2011'])
dates = pd.to_datetime(dates)
print(dates)
print("*" * 45)
# Day of the month
print(dates.dt.day)
print("*" * 45)
# Day of the week (Monday=0)
print(dates.dt.dayofweek)
print("*" * 45)
# Seconds component of each date
print(dates.dt.second)
print(dates.dt.month)
# ISO week of the year (dt.weekofyear was removed in pandas 2.0)
print(dates.dt.isocalendar().week)

In addition to the above, Series.dt also provides many date-related operations

Series.dt.year	The year of the datetime.
Series.dt.month	The month as January=1, December=12.
Series.dt.day	The days of the datetime.
Series.dt.hour	The hours of the datetime.
Series.dt.minute	The minutes of the datetime.
Series.dt.second	The seconds of the datetime.
Series.dt.microsecond	The microseconds of the datetime.
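Series.dt.week	The week ordinal of the year (removed in pandas 2.0; use Series.dt.isocalendar().week).
Series.dt.weekofyear	The week ordinal of the year (removed in pandas 2.0; use Series.dt.isocalendar().week).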
Series.dt.dayofweek	The day of the week with Monday=0, Sunday=6.
Series.dt.weekday	The day of the week with Monday=0, Sunday=6.
Series.dt.dayofyear	The ordinal day of the year.
Series.dt.quarter	The quarter of the date.
Series.dt.is_month_start	Indicates whether the date is the first day of the month.
Series.dt.is_month_end	Indicates whether the date is the last day of the month.
Series.dt.is_quarter_start	Indicator for whether the date is the first day of a quarter.
Series.dt.is_quarter_end	Indicator for whether the date is the last day of a quarter.
Series.dt.is_year_start	Indicate whether the date is the first day of a year.
Series.dt.is_year_end	Indicate whether the date is the last day of the year.
Series.dt.is_leap_year	Boolean indicator if the date belongs to a leap year.
Series.dt.days_in_month	The number of days in the month.
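A few of the attributes listed above can be tried out as follows. This is only a sketch with made-up dates. It uses `dt.isocalendar()`, which returns a DataFrame with `year`, `week` and `day` columns, as the replacement for the removed `week` and `weekofyear` attributes.

```python
import pandas as pd

dates = pd.to_datetime(pd.Series(['2011-03-02', '2011-04-01', '2012-02-29']))

# ISO calendar fields: one row per date, columns year / week / day
iso = dates.dt.isocalendar()
print(iso)

# Boolean and integer properties from the table above
leap = dates.dt.is_leap_year
month_lengths = dates.dt.days_in_month
quarters = dates.dt.quarter
print(leap, month_lengths, quarters, sep='\n')
```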

1.3 DatetimeIndex

DatetimeIndex: use the date_range() function to create a sequence of dates by specifying a number of periods and a frequency. The default frequency is daily.

1.3.1 Detailed Explanation of date_range Parameters

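# date_range parameters in detail
def date_range(
    start=None,       # first date to generate
    end=None,         # last date to generate
    periods=None,     # number of dates in the generated sequence
    freq=None,        # interval or frequency between generated dates
    tz=None,          # time zone
    normalize=False,
    name=None,
    closed=None,      # replaced by inclusive in newer pandas versions
    **kwargs,
) -> DatetimeIndex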

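1.3.2 Create a DatetimeIndex

# freq="M" generates one date per month, starting from the last day of the start date's month
# (pandas 2.2 and later deprecate "M" in favor of "ME")
dates = pd.date_range('2023-5-17', periods=10, freq="M")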
print(dates, dates.dtype, type(dates))

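For comparison, here is a short sketch of two other common ways to call `date_range`: relying on the default daily frequency, and giving both `start` and `end` with a step. The dates are arbitrary.

```python
import pandas as pd

# Default frequency is daily: five consecutive days
days = pd.date_range('2023-05-17', periods=5)
print(days)

# start and end with a two-day step; both endpoints are included when they fall on the step
every_other = pd.date_range(start='2023-05-17', end='2023-05-25', freq='2D')
print(every_other)
```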

1.4 DataFrame

A DataFrame is a table-like data type that can be understood as a two-dimensional array with both a row index and a column index, each of which can be changed.
Features: columns can hold different types; the size is mutable; the axes are labeled; arithmetic operations can be performed on rows and columns

1.4.1 Creation of DataFrame objects

(1) Create an empty object

# DataFrame creation 1: empty object
df1 = pd.DataFrame()
print(df1, type(df1))

(2) Create a DataFrame object from a one-dimensional array

# DataFrame creation 2: from a one-dimensional array
data = [1, 2, 3, 4, 5]
df2 = pd.DataFrame(data)
print(df2)

(3) Create a DataFrame object from a two-dimensional array

# DataFrame creation 3: from a two-dimensional array
data1 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9]).reshape(3, 3)
df3 = pd.DataFrame(data1)
print(df3)

(4) Set the row labels (index) and column labels (columns)

# Set the row and column index labels
data2 = np.array([[87, 76], [67, 99], [99, 100]])
df4 = pd.DataFrame(data2, index=['zs', 'ls', 'ww'], columns=['Chinese', 'Math'])
print(df4)

(5) Create a DataFrame object from a dictionary

# Create DataFrame objects from dictionaries
data3 = [{'a': 1, 'b': 2}, {'a': 3, 'b': 4, 'c': 9}]   # list of dicts: one dict per row
print(pd.DataFrame(data3))
data4 = {'Name': ['tom', 'jack', 'jim', 'bob'], 'Age': [23, 24, 21, 22]}   # dict of lists: one list per column
print(pd.DataFrame(data4))

(6) Retrieve a row or a column of data directly by its index label or position

data4 = {'Name': ['tom', 'jack', 'jim', 'bob'], 'Age': [23, 24, 21, 22]}
df5 = pd.DataFrame(data4)
print(df5['Name'])     # get the 'Name' column by its column label
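The snippet above only retrieves a column. As a sketch of the row case, the example below uses `loc` with a row label. Since no index was given, the row labels are the default integers.

```python
import pandas as pd

df5 = pd.DataFrame({'Name': ['tom', 'jack', 'jim', 'bob'],
                    'Age': [23, 24, 21, 22]})

name_col = df5['Name']    # one column, returned as a Series
first_row = df5.loc[0]    # one row, returned as a Series indexed by column name

print(name_col)
print(first_row)
```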

2. Operations on core data structures

2.1 Column operations

2.1.1 Column access

import numpy as np
import pandas as pd

# The index must be an ordered container such as a list; a set has no defined order
df = pd.DataFrame({'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
                   'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])})
# Column access
print(df['one'], '--> access a single column')
print(df[['one', 'two']], '--> access multiple columns')


2.1.2 Column addition

import numpy as np
import pandas as pd

df = pd.DataFrame({'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
                   'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])})
# Add columns
df['three'] = pd.Series([2, 3, 5, 6], index=['a', 'b', 'c', 'd'])
# df['six'] = pd.Series([2, 3, 5, 6])  # without a matching index, the default labels 0, 1, 2, 3 do not match a, b, c, d and every value is NaN
df['seven'] = pd.Series([2, 3, 5, 6], index=df.index)
df['four'] = [12, 3, 4, 5]
df['five'] = np.array([1, 4, 6, 8])
print(df)

Note: when adding a column from a Series, specify an index that matches the DataFrame's index. Otherwise the default labels 0, 1, 2, 3 do not match a, b, c, d, and every value in the new column is NaN.
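The alignment rule in the note can be checked with a small sketch. The column names `misaligned`, `aligned` and `partial` are chosen here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2, 3, 4]}, index=['a', 'b', 'c', 'd'])

# Default labels 0..3 share nothing with a..d, so every value becomes NaN
df['misaligned'] = pd.Series([2, 3, 5, 6])

# Reusing the DataFrame's own index lines the values up row by row
df['aligned'] = pd.Series([2, 3, 5, 6], index=df.index)

# Values are matched by label, not by order; missing labels become NaN
df['partial'] = pd.Series([10, 20], index=['d', 'b'])

print(df)
```

Lists and NumPy arrays carry no labels, so they are assigned purely by position and must have the same length as the DataFrame.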

2.1.3 Column deletion

There are two common deletion methods:
Method 1: Use the pop method provided by the DataFrame class
Method 2: Use the del statement on the column

# Continues with the df built in the previous section
df.pop('seven')
print(df, '--> column seven deleted')
del df['five']
print(df, '--> column five deleted')

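The two methods differ in what they return, and `drop` offers a third, non-mutating option. The sketch below, on a small made-up frame, shows all three side by side.

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2], 'two': [3, 4], 'three': [5, 6]})

popped = df.pop('two')                # removes the column in place and returns it as a Series
del df['three']                       # removes the column in place, returns nothing
trimmed = df.drop(columns=['one'])    # returns a new DataFrame; df itself is unchanged

print(popped)
print(df)
print(trimmed)
```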

2.2 Row operations

2.2.1 Row access

(1) Access method 1: using slices

import pandas as pd

name = pd.Series(['zs', 'ls', 'ww', 'tq'], index=['s1', 's2', 's3', 's4'])
age = pd.Series([23, 24, 21, 10], index=['s1', 's2', 's3', 's4'])
df = pd.DataFrame({'Name1': name, 'Age': age})
print(df)
print('*' * 45)
# Row access using a slice
print(df[0:1])   # access row 0

(2) Access method 2: loc, which selects and slices rows by index label

import pandas as pd

name = pd.Series(['zs', 'ls', 'ww', 'tq'], index=['s1', 's2', 's3', 's4'])
age = pd.Series([23, 24, 21, 10], index=['s1', 's2', 's3', 's4'])
df = pd.DataFrame({'Name1': name, 'Age': age})

print(df.loc['s1'])
print('*' * 45)
print(df.loc[['s1', 's2']])

(3) Access method 3: iloc. Unlike loc, iloc accepts only integer positions for rows and columns.

import pandas as pd

name = pd.Series(['zs', 'ls', 'ww', 'tq'], index=['s1', 's2', 's3', 's4'])
age = pd.Series([23, 24, 21, 10], index=['s1', 's2', 's3', 's4'])
df = pd.DataFrame({'Name1': name, 'Age': age})
print(df.iloc[2], '--> row 2')
print(df.iloc[[2, 3]], '--> rows 2 and 3')
print(df.iloc[1, 1], '--> row 1, column 1')


2.2.2 Row addition

import numpy as np
import pandas as pd

age = np.array([23, 45, 67, 89])
name = np.array(['lily', 'bob', 'jack', 'jim'])
df = pd.DataFrame({'Age_info': age, 'Name_info': name})
print(df)
# Combine df1 with df when both DataFrames have the same column names
df1 = pd.DataFrame({'Age_info': pd.Series([34, 56]), 'Name_info': pd.Series(['kevin', 'Mary'])})
# print(df1)
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
print(pd.concat([df, df1]))
# Combine df2 with df when the column names differ (unmatched cells are filled with NaN)
df2 = pd.DataFrame({'sex_info': pd.Series(['W', 'M']), 'score_info': pd.Series([67.7, 89.5])})
# print(df2)
print(pd.concat([df, df2]))

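When rows are added this way, each frame keeps its own index labels, so labels can repeat. The following sketch, with shortened made-up data, shows how `ignore_index=True` renumbers the result instead.

```python
import pandas as pd

df = pd.DataFrame({'Age_info': [23, 45], 'Name_info': ['lily', 'bob']})
extra = pd.DataFrame({'Age_info': [34], 'Name_info': ['kevin']})

# Original labels are kept, so label 0 appears twice
kept = pd.concat([df, extra])

# ignore_index=True discards the old labels and numbers the rows 0..n-1
renumbered = pd.concat([df, extra], ignore_index=True)

print(kept)
print(renumbered)
```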

2.2.3 Row deletion

Use drop to delete rows from a DataFrame by index label (or by the default integer index when no labels are set). If a label is repeated, every matching row is deleted.
Note: drop returns a new object and leaves the original object unchanged.

import numpy as np
import pandas as pd

age = np.array([23, 45, 67, 89])
name = np.array(['lily', 'bob', 'jack', 'jim'])
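df = pd.DataFrame({'Age_info': age, 'Name_info': name}, index=['s1', 's2', 's3', 's4'])
print(df)
# Delete rows by index label (or by the default integer index when no labels are set). If a label is repeated, every matching row is deleted
df1 = df.drop('s1')
print(df1, '--> row s1 deleted')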

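Both points made above, repeated labels and the unchanged original, can be seen in a short sketch. The frame below deliberately repeats the label `s1`.

```python
import pandas as pd

df = pd.DataFrame({'score': [1, 2, 3]}, index=['s1', 's1', 's2'])

# Every row labeled 's1' is removed from the returned copy
dropped = df.drop('s1')
rows_before = len(df)          # the original still has all three rows

# inplace=True modifies the original object and returns None
df.drop('s2', inplace=True)

print(dropped)
print(df)
```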

2.3 Value modification

(1) Method 1: Use loc to find the element to be modified

import numpy as np
import pandas as pd

age = np.array([23, 45, 67, 89])
name = np.array(['lily', 'bob', 'jack', 'jim'])
df = pd.DataFrame({'Age_info': age, 'Name_info': name})
print(df)
df.loc[0, 'Age_info'] = 444
df.iloc[1, 0] = 555     # iloc takes integer positions, not index labels
print(df)

(2) SettingWithCopyWarning: "A value is trying to be set on a copy of a slice from a DataFrame": cause and solution
[1] Cause: the assignment targets a copy of part of the DataFrame (for example, the result of chained indexing), so the original may not change
[2] Solution: use loc to select and assign in a single step, so that the assignment applies to the original object rather than to a copy
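A minimal sketch of the recommended pattern follows. The chained form is shown only as a comment, since whether it warns or silently leaves the original untouched depends on the pandas version. The threshold of 40 is arbitrary.

```python
import pandas as pd

df = pd.DataFrame({'Age_info': [23, 45, 67],
                   'Name_info': ['lily', 'bob', 'jack']})

# Chained indexing (avoid): the first [] may return a copy,
# so the assignment may never reach df
# df[df['Age_info'] > 40]['Age_info'] = 0

# Single loc call: row selection and column assignment happen on df itself
df.loc[df['Age_info'] > 40, 'Age_info'] = 0

print(df)
```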

2.4 Case

Boolean masks also work with a DataFrame.
Example: change every 0 in the score column to np.nan

import numpy as np
import pandas as pd

s1 = pd.Series(['ll', 'ww', 'zz', 'qq'])
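s2 = pd.Series([78, 0, 45, 0], dtype=float)   # float dtype so the column can hold NaN
df = pd.DataFrame({'name': s1, 'score': s2})
print(df)
mask = df[df['score'] == 0].index   # use a boolean mask to find the index labels of the rows where score is 0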
print(mask)
df.loc[mask, 'score'] = np.nan
print(df)

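The same result can be reached without first collecting index labels. The sketch below is an alternative, not the approach used above: it relies on `Series.mask`, which replaces the values where the condition is True with NaN and returns a new Series.

```python
import pandas as pd

df = pd.DataFrame({'name': ['ll', 'ww', 'zz', 'qq'],
                   'score': [78, 0, 45, 0]})

# Boolean mask: True where score equals 0
is_zero = df['score'] == 0

# mask() swaps the flagged values for NaN; the column becomes float
df['score'] = df['score'].mask(is_zero)

print(df)
```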

2.5 Common properties of DataFrame

Example code:

import pandas as pd

data = {'Name': ['Tom', 'Jack', 'Steve', 'Ricky'], 'Age': [28, 34, 29, 42]}
df = pd.DataFrame(data, index=['s1','s2','s3','s4'])
df['score']=pd.Series([90, 80, 70, 60], index=['s1','s2','s3','s4'])
print(df)
print(df.axes)
print(df['Age'].dtype)
print(df.empty)
print(df.ndim)
print(df.size)
print(df.values)
print(df.head(3)) # first three rows of df
print(df.tail(3)) # last three rows of df

Result demo:

     Name  Age  score
s1    Tom   28     90
s2   Jack   34     80
s3  Steve   29     70
s4  Ricky   42     60
[Index(['s1', 's2', 's3', 's4'], dtype='object'), Index(['Name', 'Age', 'score'], dtype='object')]
int64
False
2
12
[['Tom' 28 90]
 ['Jack' 34 80]
 ['Steve' 29 70]
 ['Ricky' 42 60]]
     Name  Age  score
s1    Tom   28     90
s2   Jack   34     80
s3  Steve   29     70
     Name  Age  score
s2   Jack   34     80
s3  Steve   29     70
s4  Ricky   42     60


3. Descriptive statistics

Descriptive statistics for numerical data mainly cover data completeness (the count of non-missing values), minimum, maximum, median, mean, quartiles, range, standard deviation, variance, and covariance. Common statistical functions from the NumPy library can also be applied to data frames.

3.1 Common APIs

Example code:

import pandas as pd

# Create a dictionary of Series
d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Minsu', 'Jack',
                        'Lee', 'David', 'Gasper', 'Betina', 'Andres', 'Andres']),
     'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46, 46]),
     'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65, 3.65]),
     'Score': pd.Series([3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.80, 3.65, 4.23, 3.24, 3.98, 2.56, 2.56])}
s = pd.DataFrame({'a': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 25, 26, 25, 23]),
                  'b': pd.Series([3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65])})
# Create a DataFrame
df = pd.DataFrame(d)
print(df)
# numeric_only=True skips the non-numeric Name column (needed on pandas 2.0 and later)
print(df.mean(axis=0, numeric_only=True))  # mean; axis selects the direction
print(df.max())
print(df.prod(numeric_only=True))
print(df.median(numeric_only=True))  # median
print(df.count())  # count of non-missing values
print(df.value_counts())  # number of times each distinct row occurs
# print(df.cumprod(), "cumulative product")  # remove non-numeric columns manually before use
print(df.std(numeric_only=True), '------------------------------standard deviation')
print(df.cov(numeric_only=True), '----covariance')  # covariance
print(df.var(numeric_only=True), '--------variance')
print(df.corr(numeric_only=True), '-------corr')  # correlation coefficient between every pair of columns
print(df.corrwith(s['a'], numeric_only=True), ' - -----------------corrwith')  # correlation of each column with the given object; returns a Series
# print(df.describe())
# print(df.describe(include=['object']))
# print(df.describe(include=['number']))

Output result:

      Name  Age  Rating  Score
0      Tom   25    4.23   3.20
1    James   26    3.24   4.60
2    Ricky   25    3.98   3.80
3      Vin   23    2.56   3.78
4    Steve   30    3.20   2.98
5    Minsu   29    4.60   4.80
6     Jack   23    3.80   4.80
7      Lee   34    3.78   3.65
8    David   40    2.98   4.23
9   Gasper   30    4.80   3.24
10  Betina   51    4.10   3.98
11  Andres   46    3.65   2.56
12  Andres   46    3.65   2.56
Age       32.923077
Rating     3.736154
Score      3.706154
dtype: float64
Name      Vin
Age        51
Rating    4.8
Score     4.8
dtype: object
Age      -3.964810e+18
Rating    2.306847e+07
Score     1.894188e+07
dtype: float64
Age       30.00
Rating     3.78
Score      3.78
dtype: float64
Name      13
Age       13
Rating    13
Score     13
dtype: int64
Name    Age  Rating  Score
Andres  46   3.65    2.56     2
Betina  51   4.10    3.98     1
David   40   2.98    4.23     1
Gasper  30   4.80    3.24     1
Jack    23   3.80    4.80     1
James   26   3.24    4.60     1
Lee     34   3.78    3.65     1
Minsu   29   4.60    4.80     1
Ricky   25   3.98    3.80     1
Steve   30   3.20    2.98     1
Tom     25   4.23    3.20     1
Vin     23   2.56    3.78     1
dtype: int64
Age       9.673517
Rating    0.633989
Score     0.773892
dtype: float64 ------------------------------standard deviation
              Age    Rating     Score
Age     93.576923  0.226346 -3.057821
Rating   0.226346  0.401942  0.004109
Score   -3.057821  0.004109  0.598909 ----covariance
Age       93.576923
Rating     0.401942
Score      0.598909
dtype: float64 --------variance
             Age    Rating     Score
Age     1.000000  0.036907 -0.408458
Rating  0.036907  1.000000  0.008375
Score  -0.408458  0.008375  1.000000 -------corr
Age       0.775174
Rating    0.211911
Score    -0.275430
dtype: float64  - -----------------corrwith

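The `describe()` calls are commented out in the example above. As a brief sketch on a four-row subset of the same data, `describe()` gathers several of the listed statistics (count, mean, standard deviation, minimum, quartiles, maximum) into one table. The range is not part of that table, so it is computed separately here.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'James', 'Ricky', 'Vin'],
                   'Age': [25, 26, 25, 23],
                   'Rating': [4.23, 3.24, 3.98, 2.56]})

# By default only numeric columns are summarized
summary = df.describe()
print(summary)

# Individual statistics can be read from the table by row and column label
mean_age = summary.loc['mean', 'Age']
median_age = summary.loc['50%', 'Age']

# Range = maximum - minimum
age_range = df['Age'].max() - df['Age'].min()
print(mean_age, median_age, age_range)
```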

3.2 Data deduplication

(1) DataFrame uses the drop_duplicates function to remove duplicates. Its parameters are as follows:
[1] subset: by default, all columns are compared together to identify duplicates; pass subset=[...] to compare only the listed columns
[2] keep: one of {'first', 'last', False}. The default, 'first', keeps the first occurrence in index order among the identified duplicates and deletes the rest; 'last' keeps the last occurrence; False deletes all duplicates
[3] inplace: when False, the original object is not modified and the result is returned as a new object; when True, the original object is modified
Code example:

import pandas as pd

# Create a DataFrame from a list of dictionaries
data = [{'name': 'lily', 'age': 24, 'sex': 'M', 'score': 89.7},
        {'name': 'jack', 'age': 22, 'sex': 'M', 'score': 76.6},
        {'name': 'mary', 'age': 24, 'sex': 'W', 'score': 69.7},
        {'name': 'bob', 'age': 22, 'sex': 'M', 'score': 99.7},
        {'name': 'james', 'age': 25, 'sex': 'W', 'score': 91},
        {'name': 'lily', 'age': 24, 'sex': 'M', 'score': 89.7}]
df = pd.DataFrame(data)
print(df)
# Remove duplicate rows
# By default all columns are compared, the first occurrence in index order is kept and the rest are deleted; the original data is not modified and the result is assigned to a new variable
df1 = df.drop_duplicates()  # does not modify the original data
print(df1)

df.drop_duplicates(subset=['age', 'sex'], inplace=True)  # modifies the original object; duplicates are identified on the 'age' and 'sex' columns
print(df)
print(df)

Output result:

    name  age sex  score
0   lily   24   M   89.7
1   jack   22   M   76.6
2   mary   24   W   69.7
3    bob   22   M   99.7
4  james   25   W   91.0
5   lily   24   M   89.7
    name  age sex  score
0   lily   24   M   89.7
1   jack   22   M   76.6
2   mary   24   W   69.7
3    bob   22   M   99.7
4  james   25   W   91.0
    name  age sex  score
0   lily   24   M   89.7
1   jack   22   M   76.6
2   mary   24   W   69.7
4  james   25   W   91.0

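The example above only uses the default keep='first'. The sketch below, on a small made-up frame, compares the three keep values and also shows `duplicated()`, which flags the rows that `drop_duplicates()` would remove.

```python
import pandas as pd

df = pd.DataFrame({'name': ['lily', 'jack', 'lily'],
                   'age': [24, 22, 24]})

first = df.drop_duplicates()              # keep the first occurrence (default)
last = df.drop_duplicates(keep='last')    # keep the last occurrence
none = df.drop_duplicates(keep=False)     # drop every row that has a duplicate

# True for each row that repeats an earlier one
flags = df.duplicated()

print(first, last, none, flags, sep='\n')
```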

3.3 Sorting

pandas supports two kinds of sorting: by label and by actual value

3.3.1 Sorting by label

With the sort_index() method, the rows of a DataFrame can be sorted by passing the axis parameter and the sort order. By default, row labels are sorted in ascending order.
(1) Important parameters of sort_index()

axis: the default value is 0, which sorts by row label (vertically); 1 sorts by column label (horizontally)
ascending: the default value is True, which sorts in ascending order; False sorts in descending order
inplace: the default value is False, which leaves the original object unchanged, so a new variable is needed to receive the result; when True, the original object is modified

(2) Code example

import numpy as np
import pandas as pd

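# np.random.randn(10, 2) generates a two-dimensional array with 10 rows and 2 columns
df = pd.DataFrame(np.random.randn(10, 2), index=[8, 2, 4, 6, 1, 7, 0, 5, 3, 9], columns=['col1', 'col2'])
print(df)
# inplace defaults to False (the original object is not modified); True modifies the original object
df.sort_index(inplace=True, ascending=False)  # ascending=False sorts in descending order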
print(df)

Output result:

       col1      col2
8 -0.670793 -0.037655
2  0.994857 -2.152398
4  1.304834 -0.292244
6  1.360664  1.097519
1 -0.336153 -0.289120
7 -1.964574  1.090914
0 -1.339923 -1.153182
5 -0.552900  0.279713
3  0.015910 -0.582301
9 -1.666869  0.146527
       col1      col2
9 -1.666869  0.146527
8 -0.670793 -0.037655
7 -1.964574  1.090914
6  1.360664  1.097519
5 -0.552900  0.279713
4  1.304834 -0.292244
3  0.015910 -0.582301
2  0.994857 -2.152398
1 -0.336153 -0.289120
0 -1.339923 -1.153182

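The example above sorts rows only. The following sketch adds the axis=1 case on a small made-up frame, and shows that without inplace=True the original object keeps its order.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]],
                  index=[2, 1], columns=['b', 'c', 'a'])

by_rows = df.sort_index()          # axis=0: sort by row label
by_cols = df.sort_index(axis=1)    # axis=1: sort by column label

print(by_rows)
print(by_cols)
print(df)                          # unchanged, because inplace defaults to False
```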

3.3.2 Sorting by actual value

With the sort_values() method, when sorting by multiple columns, you can specify the sort order for each column separately.
Code example:

import pandas as pd

# Create a Dictionary of series
d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Minsu', 'Jack',
                        'Lee', 'David', 'Gasper', 'Betina', 'Andres', 'Andres']),
     'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46, 46]),
     'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65, 3.65]),
     'Score': pd.Series([3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.80, 3.65, 4.23, 3.24, 3.98, 2.56, 2.56])}
df = pd.DataFrame(d)

print(df)
# Sort by Age first, then break ties by Rating: Age ascending, Rating descending
df.sort_values(by=['Age', 'Rating'], ascending=[True, False], inplace=True)
print(df)

Output result:

      Name  Age  Rating  Score
0      Tom   25    4.23   3.20
1    James   26    3.24   4.60
2    Ricky   25    3.98   3.80
3      Vin   23    2.56   3.78
4    Steve   30    3.20   2.98
5    Minsu   29    4.60   4.80
6     Jack   23    3.80   4.80
7      Lee   34    3.78   3.65
8    David   40    2.98   4.23
9   Gasper   30    4.80   3.24
10  Betina   51    4.10   3.98
11  Andres   46    3.65   2.56
12  Andres   46    3.65   2.56
      Name  Age  Rating  Score
6     Jack   23    3.80   4.80
3      Vin   23    2.56   3.78
0      Tom   25    4.23   3.20
2    Ricky   25    3.98   3.80
1    James   26    3.24   4.60
5    Minsu   29    4.60   4.80
9   Gasper   30    4.80   3.24
4    Steve   30    3.20   2.98
7      Lee   34    3.78   3.65
8    David   40    2.98   4.23
11  Andres   46    3.65   2.56
12  Andres   46    3.65   2.56
10  Betina   51    4.10   3.98

3.4 Grouping


Origin blog.csdn.net/m0_51489557/article/details/130731976