[Python] Dropping missing values and filtering values for a specific date in pandas

Table of contents

Pandas discards missing data values

Pandas filters values for a specific date


Pandas discards missing data values

In Pandas, you can use the dropna() function to remove missing values. By default, it drops every row that contains at least one missing value. Here are some examples:

  • Remove all rows with missing values
import pandas as pd

# Create a DataFrame that contains missing values
df = pd.DataFrame({'A': [1, 2, None, 4], 'B': [None, 6, 7, 8]})

# Drop every row that contains a missing value
df = df.dropna()

print(df)

output:

     A    B
1  2.0  6.0
3  4.0  8.0
  • Remove only rows with missing values in a specific column
import pandas as pd

# Create a DataFrame that contains missing values
df = pd.DataFrame({'A': [1, 2, None, 4], 'B': [5, None, 7, 8]})

# Drop only the rows where column B is missing
df = df.dropna(subset=['B'])

print(df)

output:

     A    B
0  1.0  5.0
2  NaN  7.0
3  4.0  8.0

Note the default behavior of dropna(): axis=0 drops rows rather than columns, and how='any' drops a row as soon as it contains any missing value (how='all' drops it only when every value is missing). The thresh parameter, unset by default, gives the minimum number of non-missing values a row must have to be kept; it does not limit how many rows are dropped. You can also pass inplace=True to modify the original DataFrame instead of returning a new one, for example: df.dropna(inplace=True). In that case the call returns None, so do not assign its result back to df.
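The parameters described above can be compared side by side. This is a minimal sketch on a made-up DataFrame (the column C and the variable names are illustrative assumptions, not part of the original examples):

```python
import pandas as pd

# Illustrative data: row 1 is entirely missing, column C is entirely missing
df = pd.DataFrame({
    'A': [1, None, None, 4],
    'B': [None, None, 7, 8],
    'C': [None, None, None, None],
})

# how='all': drop a row only when every value in it is missing
rows_not_empty = df.dropna(how='all')

# thresh=2: keep only rows that have at least 2 non-missing values
rows_thresh = df.dropna(thresh=2)

# axis=1 with how='all': drop columns (not rows) that are entirely missing
cols_not_empty = df.dropna(axis=1, how='all')

print(rows_not_empty.index.tolist())   # [0, 2, 3]
print(rows_thresh.index.tolist())      # [3]
print(cols_not_empty.columns.tolist()) # ['A', 'B']
```

Each call returns a new DataFrame, so df itself is unchanged here.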

Pandas filters values for a specific date

In Pandas, datetime indexes can be used to filter data for a specific date. First, you need to convert the date column in the DataFrame to a datetime type and set it as an index, and then you can use the datetime index to filter the data. Here are some examples:

  • Convert date column to datetime type
import pandas as pd

# Create a DataFrame with a date column and a value column
df = pd.DataFrame({'date': ['2022-01-01', '2022-01-02', '2022-01-03'],
                   'value': [10, 20, 30]})

# Convert the date column to a datetime type
df['date'] = pd.to_datetime(df['date'])

# Set the date column as the index
df = df.set_index('date')

print(df)

output:

            value
date             
2022-01-01     10
2022-01-02     20
2022-01-03     30
  • Filter data for a specific date using a datetime index
# Select the data for January 2, 2022
df_filtered = df.loc['2022-01-02']

print(df_filtered)

output:

value    20
Name: 2022-01-02 00:00:00, dtype: int64
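The lookup above returns a single row (a Series) because each date appears exactly once, at midnight. As a rough sketch with made-up intraday timestamps (the data and the name df_times are assumptions for illustration), the same kind of lookup selects every row that falls on that day when the index also carries times:

```python
import pandas as pd

# Illustrative data: two of the four timestamps fall on 2022-01-02
df_times = pd.DataFrame(
    {'value': [10, 20, 25, 30]},
    index=pd.to_datetime(['2022-01-01 09:00', '2022-01-02 09:00',
                          '2022-01-02 15:30', '2022-01-03 09:00']),
)

# A date-only string is less precise than the index, so pandas treats it
# as "the whole day" and returns a DataFrame with all matching rows
day_rows = df_times.loc['2022-01-02']

print(day_rows['value'].tolist())  # [20, 25]
```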

Note that this kind of lookup only works when the index of the DataFrame is a datetime type. You can also use loc[] to filter data within a time range, for example: df_filtered = df.loc['2022-01-01':'2022-01-02'], which returns all data from January 1, 2022 through January 2, 2022, with both endpoints included.
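As a small sketch of the range filter, the snippet below rebuilds the example DataFrame and selects the same two days in two ways. The boolean-mask variant is an alternative not used in the original text; it works directly on the date column without setting an index. The names mask, by_mask and by_slice are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'date': ['2022-01-01', '2022-01-02', '2022-01-03'],
                   'value': [10, 20, 30]})
df['date'] = pd.to_datetime(df['date'])

start = pd.Timestamp('2022-01-01')
end = pd.Timestamp('2022-01-02')

# Boolean mask on the date column: no datetime index required
mask = (df['date'] >= start) & (df['date'] <= end)
by_mask = df[mask]

# Label slice on a datetime index: both endpoints are included
by_slice = df.set_index('date').loc['2022-01-01':'2022-01-02']

print(by_mask['value'].tolist())   # [10, 20]
print(by_slice['value'].tolist())  # [10, 20]
```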

Origin blog.csdn.net/fanjufei123456/article/details/130889722