An error you may run into when filtering data with pandas

df[df['director'].str.contains('|', regex=False)]
## output
ValueError: cannot index with vector containing NA / NaN values

## take a closer look at the data
df.director.str.contains('|', regex=False).unique()
## output: the error is caused by NaN, so the null rows have to be filtered out first
array([False, True, nan], dtype=object)

## compare with the regex=True case
df.director.str.contains('|').unique()
## output: every non-null row is reported as containing '|', which is clearly wrong!
array([True, nan], dtype=object)
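
Why does every non-null row match? As far as I can tell, an unescaped `|` in a regular expression is the alternation operator, so the pattern `'|'` reads as "empty string OR empty string", and the empty string can be found in any text. Here is a minimal sketch with the standard `re` module; the sample name is only illustrative:

```python
import re

name = "Chris Buck"  # illustrative value, contains no '|' character

# As a regex, '|' means "empty OR empty", so it matches at position 0
match = re.search("|", name)
print(match is not None)    # True
print(repr(match.group()))  # ''  (the match is the empty string)

# A plain substring test gives the answer we actually want
has_pipe = "|" in name
print(has_pipe)             # False
```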

# filtering out the null rows first fixes it
df = df[df['director'].notnull()]
df[df['director'].str.contains('|', regex=False)].director
## output
8                                 Kyle Balda|Pierre Coffin
11                          Lana Wachowski|Lilly Wachowski
64                                Glenn Ficarra|John Requa
85                John Francis Daley|Jonathan M. Goldstein
100                                Chris Buck|Jennifer Lee
···
10794                        Richard A. Colla|Alan J. Levi
10797                             Warren Beatty|Buck Henry
10815                                  Eric Idle|Gary Weis
10819                               Chuck Jones|Ben Washam
10842                         Basil Dearden|Eliot Elisofon
Name: director, Length: 754, dtype: object
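
An alternative that avoids dropping rows from `df` is the `na` parameter of `str.contains`, which sets the value used for missing entries in the mask. The sketch below uses a tiny made-up DataFrame (not the original dataset), so treat it as an illustration rather than the exact fix used above:

```python
import pandas as pd

# Hypothetical toy data standing in for the real movie dataset
df = pd.DataFrame(
    {"director": ["Kyle Balda|Pierre Coffin", "Chris Buck", None]}
)

# na=False makes missing values count as "does not contain",
# so the mask is purely boolean and safe to index with
mask = df["director"].str.contains("|", regex=False, na=False)
result = df[mask]

print(mask.tolist())                # [True, False, False]
print(result["director"].tolist())  # ['Kyle Balda|Pierre Coffin']
```

Unlike `df = df[df['director'].notnull()]`, this leaves the rows with a missing director in `df`, which matters if they are still needed later.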

.contains('|', regex=False): regex defaults to True; regex=False means the first argument is treated as a plain literal string rather than a regular expression.
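
If you would rather keep regex matching switched on, escaping the pattern should give the same result as `regex=False`. A small sketch, assuming a made-up Series of director names; `re.escape` turns `'|'` into `'\|'`:

```python
import re
import pandas as pd

# Hypothetical sample values; dtype=object mirrors the output shown above
directors = pd.Series(
    ["Kyle Balda|Pierre Coffin", "Chris Buck", None], dtype=object
)

# Literal match: the pattern is treated as a plain string
literal = directors.str.contains("|", regex=False, na=False)

# Regex match with the special character escaped
escaped = directors.str.contains(re.escape("|"), regex=True, na=False)

print(literal.tolist())         # [True, False, False]
print(literal.equals(escaped))  # True
```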

Reposted from blog.csdn.net/guo_ya_nan/article/details/81021158