Table of contents
Pandas discards missing data values
pandas filter the value of a specific date
Pandas discards missing data values
In pandas, you can use the dropna() function to remove missing values. By default, it drops every row that contains at least one missing value. Here are some examples:
- Remove all rows with missing values
import pandas as pd
# Create a DataFrame that contains missing values
df = pd.DataFrame({'A': [1, 2, None, 4], 'B': [None, 6, 7, 8]})
# Drop every row that contains a missing value
df = df.dropna()
print(df)
output:
     A    B
1  2.0  6.0
3  4.0  8.0
- Remove only the rows with missing values in a specific column
import pandas as pd
# Create a DataFrame that contains missing values
df = pd.DataFrame({'A': [1, 2, None, 4], 'B': [5, None, 7, 8]})
# Drop the rows where column B has a missing value
df = df.dropna(subset=['B'])
print(df)
output:
     A    B
0  1.0  5.0
2  NaN  7.0
3  4.0  8.0
It should be noted that dropna() has the following defaults: axis=0, which drops rows rather than columns; how='any', which drops a row as soon as it contains at least one missing value; and thresh=None, which means no minimum count of non-missing values is enforced (when thresh is set to an integer, a row is kept only if it has at least that many non-missing values). You can also pass inplace=True to apply the deletion to the original DataFrame; in that case the method returns None instead of a new DataFrame object. For example: df.dropna(inplace=True).
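The parameters described above can be compared side by side. The following is a minimal sketch, not a definitive recipe; the column names and values are made up for illustration, and the variable names (dropped_all, kept_thresh, kept_cols) are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (names and values are illustrative only)
df = pd.DataFrame({
    'A': [1, np.nan, 3, 4],
    'B': [np.nan, np.nan, 7, 8],
    'C': [np.nan, np.nan, np.nan, 10],
})

# how='all': drop a row only when every value in it is missing (row 1 here)
dropped_all = df.dropna(how='all')

# thresh=2: keep a row only if it has at least 2 non-missing values
kept_thresh = df.dropna(thresh=2)

# axis=1 with thresh=3: keep a column only if it has at least 3 non-missing values
kept_cols = df.dropna(axis=1, thresh=3)

# inplace=True modifies df itself (default how='any') and returns None
result = df.dropna(inplace=True)

print(dropped_all.index.tolist())
print(kept_thresh.index.tolist())
print(kept_cols.columns.tolist())
print(result, df.index.tolist())
```

Note that recent pandas versions do not allow how and thresh to be passed in the same call, so the sketch uses one or the other.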
pandas filter the value of a specific date
In pandas, a datetime index can be used to filter data for a specific date. First, convert the date column of the DataFrame to a datetime type and set it as the index; you can then select rows by date label. Here are some examples:
- Convert the date column to a datetime type and set it as the index
import pandas as pd
# Create a DataFrame with a date column and a value column
df = pd.DataFrame({'date': ['2022-01-01', '2022-01-02', '2022-01-03'],
                   'value': [10, 20, 30]})
# Convert the date column to a datetime type
df['date'] = pd.to_datetime(df['date'])
# Set the date column as the index
df = df.set_index('date')
print(df)
output:
            value
date
2022-01-01     10
2022-01-02     20
2022-01-03     30
- Filter data for a specific date using a datetime index
# Select the data for January 2, 2022
df_filtered = df.loc['2022-01-02']
print(df_filtered)
output:
value    20
Name: 2022-01-02 00:00:00, dtype: int64
It should be noted that this approach only works when the index of the DataFrame is a datetime type. You can also use loc[] to filter data within a time range, for example: df_filtered = df.loc['2022-01-01':'2022-01-02'], which returns all rows from January 1, 2022 through January 2, 2022. Unlike ordinary Python slices, label slices with loc[] include both endpoints.
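The range filter described above, along with two related selections, might look like the sketch below. The sample data and variable names (one_day, date_range, january) are hypothetical. The boolean-mask variant is an alternative that assumes you prefer to keep the date as an ordinary column instead of an index.

```python
import pandas as pd

# Hypothetical sample data spanning two months
df = pd.DataFrame({
    'date': ['2022-01-01', '2022-01-02', '2022-01-03', '2022-02-01'],
    'value': [10, 20, 30, 40],
})
df['date'] = pd.to_datetime(df['date'])

# Alternative without an index: boolean mask on the datetime column
one_day = df[df['date'] == pd.Timestamp('2022-01-02')]

# Datetime index, sorted so that label slicing behaves predictably
indexed = df.set_index('date').sort_index()

# Label slice with loc[]: both endpoints are included
date_range = indexed.loc['2022-01-01':'2022-01-02']

# A year-month string selects every row in that month
january = indexed.loc['2022-01']

print(one_day)
print(date_range)
print(january)
```

Sorting the index first is a defensive choice: slicing an unsorted datetime index by label can raise an error or give surprising results.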