pandas学习笔记(第二弹)

注:本教程为系列教程此章节接前面第一弹
跳转到专题地址

9.选取多个DataFrame

9.1 用列表选取多列

# 选取不同演员的FaceBook的粉丝数
movies[["actor_1_facebook_likes","actor_2_facebook_likes","actor_3_facebook_likes"]]

# 错误示例:
# movies["actor_1_facebook_likes","actor_2_facebook_likes","actor_3_facebook_likes"]
actor_1_facebook_likes actor_2_facebook_likes actor_3_facebook_likes
0 1000.0 936.0 855.0
1 40000.0 5000.0 1000.0
2 11000.0 393.0 161.0
3 27000.0 23000.0 23000.0
... ... ... ...
4912 841.0 593.0 319.0
4913 0.0 0.0 0.0
4914 946.0 719.0 489.0
4915 86.0 23.0 16.0

4916 rows × 3 columns

9.2 选取单列

movies[["actor_1_facebook_likes"]]
actor_1_facebook_likes
0 1000.0
1 40000.0
2 11000.0
3 27000.0
... ...
4912 841.0
4913 0.0
4914 946.0
4915 86.0

4916 rows × 1 columns

!!!注意在这里要区别下面的这种取法!!!

print(type(movies["actor_1_facebook_likes"]))
print(type(movies[["actor_1_facebook_likes"]]))
# 一种取出的是DataFrame 一种取出的是Series
<class 'pandas.core.series.Series'>
<class 'pandas.core.frame.DataFrame'>

9.3 通过类型选择DataFrame

# 这样就会筛选出字段的类型为整形和浮点型的列(“number”表示选取所有数值列)
movies.select_dtypes(include=["int","float"])
num_critic_for_reviews duration director_facebook_likes actor_3_facebook_likes ... actor_2_facebook_likes imdb_score aspect_ratio movie_facebook_likes
0 723.0 178.0 0.0 855.0 ... 936.0 7.9 1.78 33000
1 302.0 169.0 563.0 1000.0 ... 5000.0 7.1 2.35 0
2 602.0 148.0 0.0 161.0 ... 393.0 6.8 2.35 85000
3 813.0 164.0 22000.0 23000.0 ... 23000.0 8.5 2.35 164000
... ... ... ... ... ... ... ... ... ...
4912 43.0 43.0 NaN 319.0 ... 593.0 7.5 16.00 32000
4913 13.0 76.0 0.0 0.0 ... 0.0 6.3 NaN 16
4914 14.0 100.0 0.0 489.0 ... 719.0 6.3 2.35 660
4915 43.0 90.0 16.0 16.0 ... 23.0 6.6 1.85 456

4916 rows × 16 columns

9.4 通过过滤器选取DataFrame

9.4.1 like 过滤法

movies.filter(like="actor_1")
actor_1_facebook_likes actor_1_name
0 1000.0 CCH Pounder
1 40000.0 Johnny Depp
2 11000.0 Christoph Waltz
3 27000.0 Tom Hardy
... ... ...
4912 841.0 Natalie Zea
4913 0.0 Eva Boehnke
4914 946.0 Alan Ruck
4915 86.0 John August

4916 rows × 2 columns

9.4.2 items 选取

movies.filter(items=["actor_1_name","actor_1_facebook_likes"])
actor_1_name actor_1_facebook_likes
0 CCH Pounder 1000.0
1 Johnny Depp 40000.0
2 Christoph Waltz 11000.0
3 Tom Hardy 27000.0
... ... ...
4912 Natalie Zea 841.0
4913 Eva Boehnke 0.0
4914 Alan Ruck 946.0
4915 John August 86.0

4916 rows × 2 columns

9.4.3 正则选取

# 选取不是以actor开头的字段
movies.filter(regex="^(?!actor)")
color director_name num_critic_for_reviews duration ... title_year imdb_score aspect_ratio movie_facebook_likes
0 Color James Cameron 723.0 178.0 ... 2009.0 7.9 1.78 33000
1 Color Gore Verbinski 302.0 169.0 ... 2007.0 7.1 2.35 0
2 Color Sam Mendes 602.0 148.0 ... 2015.0 6.8 2.35 85000
3 Color Christopher Nolan 813.0 164.0 ... 2012.0 8.5 2.35 164000
... ... ... ... ... ... ... ... ... ...
4912 Color NaN 43.0 43.0 ... NaN 7.5 16.00 32000
4913 Color Benjamin Roberds 13.0 76.0 ... 2013.0 6.3 NaN 16
4914 Color Daniel Hsia 14.0 100.0 ... 2012.0 6.3 2.35 660
4915 Color Jon Gunn 43.0 90.0 ... 2004.0 6.6 1.85 456

4916 rows × 22 columns

10.按照给定的列顺序排序

disc_core = ['movie_title','title_year', 'content_rating','genres']
disc_people = ['director_name','actor_1_name', 'actor_2_name','actor_3_name']
disc_other = ['color','country','language','plot_keywords','movie_imdb_link']
cont_fb = ['director_facebook_likes','actor_1_facebook_likes','actor_2_facebook_likes','actor_3_facebook_likes', 'cast_total_facebook_likes', 'movie_facebook_likes']
cont_finance = ['budget','gross']
cont_num_reviews = ['num_voted_users','num_user_for_reviews', 'num_critic_for_reviews']
cont_other = ['imdb_score','duration', 'aspect_ratio','facenumber_in_poster']

new_col_order = disc_core + disc_people + disc_other + cont_fb + cont_finance + cont_num_reviews + cont_other

# 我们把所有的列名按照自己的顺序进行排列,最后查看和原来的所有列是否一致
print(set(new_col_order) == set(movies.columns))

# 可以看到列显示的顺序已经按照我们的要求进行显示了
movies[new_col_order]
True
movie_title title_year content_rating genres ... imdb_score duration aspect_ratio facenumber_in_poster
0 Avatar 2009.0 PG-13 Action|Adventure|Fantasy|Sci-Fi ... 7.9 178.0 1.78 0.0
1 Pirates of the Caribbean: At World's End 2007.0 PG-13 Action|Adventure|Fantasy ... 7.1 169.0 2.35 0.0
2 Spectre 2015.0 PG-13 Action|Adventure|Thriller ... 6.8 148.0 2.35 1.0
3 The Dark Knight Rises 2012.0 PG-13 Action|Thriller ... 8.5 164.0 2.35 0.0
... ... ... ... ... ... ... ... ... ...
4912 The Following NaN TV-14 Crime|Drama|Mystery|Thriller ... 7.5 43.0 16.00 1.0
4913 A Plague So Pleasant 2013.0 NaN Drama|Horror|Thriller ... 6.3 76.0 NaN 0.0
4914 Shanghai Calling 2012.0 PG-13 Comedy|Drama|Romance ... 6.3 100.0 2.35 5.0
4915 My Date with Drew 2004.0 PG Documentary ... 6.6 90.0 1.85 0.0

4916 rows × 28 columns

11 操作整个DataFrame

11.1 获取DataFrame的行和列数

movies.shape
(4916, 28)

11.2 获取DataFrame的个数

movies.size
137648

11.3 获取DataFrame的长度

len(movies)
4916

11.4 获取DataFrame的维度

movies.ndim
2

11.5 统计DataFrame中各种类型的个数

movies.count()
color                     4897
director_name             4814
num_critic_for_reviews    4867
duration                  4901
                          ... 
actor_2_facebook_likes    4903
imdb_score                4916
aspect_ratio              4590
movie_facebook_likes      4916
Length: 28, dtype: int64

11.6 取DataFrame中各列的最小值

# 都是对数值类型的列进行统计
movies.min()
num_critic_for_reviews     1.00
duration                   7.00
director_facebook_likes    0.00
actor_3_facebook_likes     0.00
                           ... 
actor_2_facebook_likes     0.00
imdb_score                 1.60
aspect_ratio               1.18
movie_facebook_likes       0.00
Length: 16, dtype: float64

11.7 取DataFrame中各列的最大值

movies.max()
num_critic_for_reviews        813.0
duration                      511.0
director_facebook_likes     23000.0
actor_3_facebook_likes      23000.0
                             ...   
actor_2_facebook_likes     137000.0
imdb_score                      9.5
aspect_ratio                   16.0
movie_facebook_likes       349000.0
Length: 16, dtype: float64

11.8 取DataFrame中各列的平均值

movies.mean()
num_critic_for_reviews      137.988905
duration                    107.090798
director_facebook_likes     691.014541
actor_3_facebook_likes      631.276313
                              ...     
actor_2_facebook_likes     1621.923516
imdb_score                    6.437429
aspect_ratio                  2.222349
movie_facebook_likes       7348.294142
Length: 16, dtype: float64

11.9 取DataFrame中各列的中位数

movies.median()
num_critic_for_reviews     108.00
duration                   103.00
director_facebook_likes     48.00
actor_3_facebook_likes     366.00
                            ...  
actor_2_facebook_likes     593.00
imdb_score                   6.60
aspect_ratio                 2.35
movie_facebook_likes       159.00
Length: 16, dtype: float64

11.10 取DataFrame中各列的分位数

movies.quantile([0.5,0.3])
num_critic_for_reviews duration director_facebook_likes actor_3_facebook_likes ... actor_2_facebook_likes imdb_score aspect_ratio movie_facebook_likes
0.5 108.0 103.0 48.0 366.0 ... 593.0 6.6 2.35 159.0
0.3 60.0 95.0 11.0 176.0 ... 345.0 6.0 1.85 0.0

2 rows × 16 columns

11.11 取DataFrame中各列的标准差

movies.std()
num_critic_for_reviews       120.239379
duration                      25.286015
director_facebook_likes     2832.954125
actor_3_facebook_likes      1625.874802
                               ...     
actor_2_facebook_likes      4011.299523
imdb_score                     1.127802
aspect_ratio                   1.402940
movie_facebook_likes       19206.016458
Length: 16, dtype: float64

11.12 取DataFrame中各列的和

movies.sum()
num_critic_for_reviews       671592.00
duration                     524852.00
director_facebook_likes     3326544.00
actor_3_facebook_likes      3088835.00
                              ...     
actor_2_facebook_likes      7952291.00
imdb_score                    31646.40
aspect_ratio                  10200.58
movie_facebook_likes       36124214.00
Length: 16, dtype: float64

补充说明:上面的统计函数可以传入skip = bool 选择是否跳过空值

11.13 获取DataFrame中各列的统计信息

movies.describe()
num_critic_for_reviews duration director_facebook_likes actor_3_facebook_likes ... actor_2_facebook_likes imdb_score aspect_ratio movie_facebook_likes
count 4867.000000 4901.000000 4814.000000 4893.000000 ... 4903.000000 4916.000000 4590.000000 4916.000000
mean 137.988905 107.090798 691.014541 631.276313 ... 1621.923516 6.437429 2.222349 7348.294142
std 120.239379 25.286015 2832.954125 1625.874802 ... 4011.299523 1.127802 1.402940 19206.016458
min 1.000000 7.000000 0.000000 0.000000 ... 0.000000 1.600000 1.180000 0.000000
25% 49.000000 93.000000 7.000000 132.000000 ... 277.000000 5.800000 1.850000 0.000000
50% 108.000000 103.000000 48.000000 366.000000 ... 593.000000 6.600000 2.350000 159.000000
75% 191.000000 118.000000 189.750000 633.000000 ... 912.000000 7.200000 2.350000 2000.000000
max 813.000000 511.000000 23000.000000 23000.000000 ... 137000.000000 9.500000 16.000000 349000.000000

8 rows × 16 columns

11.14 判断DataFrame中的空值

movies.isnull()
color director_name num_critic_for_reviews duration ... actor_2_facebook_likes imdb_score aspect_ratio movie_facebook_likes
0 False False False False ... False False False False
1 False False False False ... False False False False
2 False False False False ... False False False False
3 False False False False ... False False False False
... ... ... ... ... ... ... ... ... ...
4912 False True False False ... False False False False
4913 False False False False ... False False True False
4914 False False False False ... False False False False
4915 False False False False ... False False False False

4916 rows × 28 columns

11.15 判断DataFrame中的非空值

movies.notnull()
color director_name num_critic_for_reviews duration ... actor_2_facebook_likes imdb_score aspect_ratio movie_facebook_likes
0 True True True True ... True True True True
1 True True True True ... True True True True
2 True True True True ... True True True True
3 True True True True ... True True True True
... ... ... ... ... ... ... ... ... ...
4912 True False True True ... True True True True
4913 True True True True ... True True False True
4914 True True True True ... True True True True
4915 True True True True ... True True True True

4916 rows × 28 columns

11.16 判断DataFrame中的值是否全为真

# 这样便可以查看那些是没有空值的列
movies.notnull().all()
color                     False
director_name             False
num_critic_for_reviews    False
duration                  False
                          ...  
actor_2_facebook_likes    False
imdb_score                 True
aspect_ratio              False
movie_facebook_likes       True
Length: 28, dtype: bool

11.17 在进行对有空值的列进行统计时,填充空值后才能统计

# 必须将空值进行补充后才能统计,不然返回会为空
movies[["director_name"]].fillna("").max()
director_name    Étienne Faure
dtype: object

12.DataFrame算数运算

12.1准备数据

matrix = pd.DataFrame([np.random.randint(0,10,3) for i in range(3)])
matrix
0 1 2
0 2 0 5
1 8 0 6
2 6 5 9

12.2 加法运算

12.2.1 操作符方式

matrix + 5
0 1 2
0 7 5 10
1 13 5 11
2 11 10 14

12.2.2 函数方式

matrix.add(5)
0 1 2
0 7 5 10
1 13 5 11
2 11 10 14

12.3 减法运算

12.3.1 操作符方法

matrix - 5
0 1 2
0 -3 -5 0
1 3 -5 1
2 1 0 4

12.3.2 函数方法

matrix.sub(5)
0 1 2
0 -3 -5 0
1 3 -5 1
2 1 0 4

12.4 乘法运算

12.4.1 操作符方法

matrix * 5
0 1 2
0 10 0 25
1 40 0 30
2 30 25 45

12.4.2 函数方法

matrix.mul(5)
0 1 2
0 10 0 25
1 40 0 30
2 30 25 45

12.5 除法运算

12.5.1 操作符方法

matrix / 5
0 1 2
0 0.4 0.0 1.0
1 1.6 0.0 1.2
2 1.2 1.0 1.8

12.5.2 函数方法

matrix.div(5)
0 1 2
0 0.4 0.0 1.0
1 1.6 0.0 1.2
2 1.2 1.0 1.8

12.6 整除运算

12.6.1 操作符方法

matrix // 5
0 1 2
0 0 0 1
1 1 0 1
2 1 1 1

12.6.2 函数方法

matrix.floordiv(5)
0 1 2
0 0 0 1
1 1 0 1
2 1 1 1

12.7 取模运算

12.7.1 操作符方法

matrix % 5
0 1 2
0 2 0 0
1 3 0 1
2 1 0 4

12.7.2 函数方法

matrix.mod(5)
0 1 2
0 2 0 0
1 3 0 1
2 1 0 4

13.DataFrame比较运算

13.1 大于

13.1.1 算数方法

matrix > 5
0 1 2
0 False False False
1 True False True
2 True False True

13.1.2 函数方法

matrix.gt(5)
0 1 2
0 False False False
1 True False True
2 True False True

13.2 大于等于

13.2.1 算数方法

matrix >= 5
0 1 2
0 False False True
1 True False True
2 True True True

13.2.2 函数方法

matrix.ge(5)
0 1 2
0 False False True
1 True False True
2 True True True

13.3 小于

13.3.1 算数方法

matrix < 5
0 1 2
0 True True False
1 False True False
2 False False False

13.3.2 函数方法

matrix.lt(5)
0 1 2
0 True True False
1 False True False
2 False False False

13.4 小于等于

13.4.1 算数方法

matrix <= 5
0 1 2
0 True True True
1 False True False
2 False True False

13.4.2 函数方法

matrix.le(5)
0 1 2
0 True True True
1 False True False
2 False True False

13.5 等于

13.5.1 算数方法

matrix == 5
0 1 2
0 False False True
1 False False False
2 False True False

13.5.2 函数方法

matrix.eq(5)
0 1 2
0 False False True
1 False False False
2 False True False

13.6 不等于

13.6.1 算数方法

matrix != 5
0 1 2
0 True True False
1 True True True
2 True False True

13.6.2 函数方法

matrix.ne(5)
0 1 2
0 True True False
1 True True True
2 True False True
发布了35 篇原创文章 · 获赞 78 · 访问量 1万+

猜你喜欢

转载自blog.csdn.net/qq_42359956/article/details/105472345