Filtering out-of-range rows from Excel tables with Python

  This article introduces a Python method for reading an Excel table file, filtering its data according to rules we specify, removing the data that falls outside the specified ranges, and retaining the data that meets our needs.

  First, let's clarify the specific requirements. We have an Excel table file (this article uses a file in .csv format as an example), as shown in the figure below.

  The file contains a large amount of data; each column represents an attribute and each row represents a sample. We need to filter the data on several of these attributes. For example, for the first column we want to select the values that are greater than 2 or less than -1 and delete the entire row containing each selected cell. We also want to filter other attributes, each with its own condition, and in every case the entire row containing a cell that fails the condition is deleted. The data that remains is the data that meets our needs, and we then save it as a new Excel table file.
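
  As a minimal sketch of that first rule, assume a small in-memory table in which a hypothetical column named value stands in for the first column (the data and both column names are made up for illustration). Deleting the rows whose value is greater than 2 or less than -1 is the same as keeping the rows whose value lies in the range from -1 to 2.

```python
import pandas as pd

# Hypothetical toy data; "value" stands in for the first column of the real table.
toy = pd.DataFrame({
    "value": [-3.0, -1.0, 0.5, 2.0, 4.2],
    "label": ["a", "b", "c", "d", "e"],
})

# Rows to delete: value greater than 2 or less than -1.
out_of_range = (toy["value"] > 2) | (toy["value"] < -1)

# Keep everything else; the whole row is removed, not just the offending cell.
kept = toy[~out_of_range]
print(kept)
```

  The boundary values -1 and 2 themselves are kept here, because the rule only selects values strictly outside that range.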

  With the requirements clear, we can start writing the code; the code used in this article is as follows.

# -*- coding: utf-8 -*-
"""
Created on Wed Jun  7 15:40:50 2023

@author: fkxxgis
"""

import pandas as pd

# Path of the original file and path of the filtered result file.
original_file = "E:/01_Reflectivity/99_Model_Training/00_Data/02_Extract_Data/23_Train_model_NoH/Train_Model_1_NoH.csv"
result_file = "E:/01_Reflectivity/99_Model_Training/00_Data/02_Extract_Data/23_Train_model_NoH/Train_Model_1_NoH_New.csv"

# Read the original table into a DataFrame.
df = pd.read_csv(original_file)

# Keep only the rows whose values fall inside each column's valid range;
# a row is dropped entirely as soon as one of its cells fails a condition.
df = df[(df["inf"] >= -0.2) & (df["inf"] <= 18)]
df = df[(df["NDVI"] >= -1) & (df["NDVI"] <= 1)]
df = df[(df["inf_dif"] >= -0.2) & (df["inf_dif"] <= 18)]
df = df[(df["NDVI_dif"] >= -2) & (df["NDVI_dif"] <= 2)]
df = df[(df["soil"] >= 0)]
df = df[(df["inf_h"] >= -0.2) & (df["inf_h"] <= 18)]
df = df[(df["ndvi_h"] >= -1) & (df["ndvi_h"] <= 1)]
df = df[(df["inf_h_dif"] >= -0.2) & (df["inf_h_dif"] <= 18)]
df = df[(df["ndvi_h_dif"] >= -1) & (df["ndvi_h_dif"] <= 1)]

# Save the remaining rows without writing the index as an extra column.
df.to_csv(result_file, index=False)

  Here is an explanation of each step in the above code:

  1. Import the necessary library: pandas is imported for data processing and manipulation.
  2. Define the file paths: define the original file path original_file and the result file path result_file.
  3. Read the raw data: use the pd.read_csv() function to read the original file and store its data in a DataFrame object df.
  4. Filter the data: apply several conditional filters to the DataFrame object df, combining conditions with the logical operator & and comparison operators. For example, the first filter line combines df["inf"] >= -0.2 and df["inf"] <= 18, keeping the rows whose "inf" column value is between -0.2 and 18; the second line combines df["NDVI"] >= -1 and df["NDVI"] <= 1, keeping the rows whose "NDVI" column value is between -1 and 1, and so on.
  5. Save the result data: use the to_csv() function to save the filtered DataFrame object df as a new .csv file at the path result_file, setting index=False to avoid saving the index column.
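
  The five steps above can be tried end to end on a throwaway file. The sketch below is only an illustration: the column names match the article's code, but the sample values and the temporary file names are made up. It also uses Series.between(), whose bounds are inclusive by default, as a shorter equivalent of the >= and <= pair.

```python
import os
import tempfile

import pandas as pd

# Made-up sample rows; only the column names come from the article's code.
sample = pd.DataFrame({
    "inf": [0.5, 20.0, 3.0, -0.5],
    "NDVI": [0.3, 0.4, 1.5, 0.2],
    "soil": [1.0, 2.0, 3.0, 4.0],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file names inside a temporary folder.
    original_file = os.path.join(tmp_dir, "sample.csv")
    result_file = os.path.join(tmp_dir, "sample_new.csv")
    sample.to_csv(original_file, index=False)

    # Read, filter column by column, then save without the index column.
    df = pd.read_csv(original_file)
    df = df[df["inf"].between(-0.2, 18)]
    df = df[df["NDVI"].between(-1, 1)]
    df = df[df["soil"] >= 0]
    df.to_csv(result_file, index=False)

    # Read the result back to inspect what was written.
    saved = pd.read_csv(result_file)

print(saved)
```

  Because index=False was used when saving, the file read back contains only the three data columns and no extra index column.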

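  Of course, if we need to filter on multiple attributes (that is, multiple columns), besides the method above we can also combine all the conditions into a single expression, as in the code below, which is more convenient. Here result_df is a DataFrame that has already been read, and this example uses its own set of columns and thresholds.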

# result_df is a DataFrame that has already been loaded, e.g. with pd.read_csv().
# All conditions are combined with & so the rows are filtered in one step.
result_df = result_df[(result_df["blue"] > 0) & (result_df["blue"] <= 1) &
                      (result_df["green"] > 0) & (result_df["green"] <= 1) &
                      (result_df["red"] > 0) & (result_df["red"] <= 1) &
                      (result_df["inf"] > 0) & (result_df["inf"] <= 1) &
                      (result_df["NDVI"] > -1) & (result_df["NDVI"] < 1) &
                      (result_df["inf_dif"] > -1) & (result_df["inf_dif"] < 1) &
                      (result_df["NDVI_dif"] > -2) & (result_df["NDVI_dif"] < 2) &
                      (result_df["soil"] >= 0) &
                      (result_df["inf_h_dif"] > -1) & (result_df["inf_h_dif"] < 1) &
                      (result_df["ndvi_h_dif"] > -1) & (result_df["ndvi_h_dif"] < 1)]

  The above code filters the DataFrame object in a single step, instead of reassigning it after each individual filter.
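
  When the list of columns grows, one possible variation is to keep the ranges in a dictionary and build the combined condition in a loop. This is a sketch under stated assumptions, not the article's method: the helper filter_by_bounds, the bounds dictionary and the demo data are all hypothetical, and the bounds are treated as inclusive, as in the first code listing.

```python
import pandas as pd

# Hypothetical bounds table: column -> (lower, upper); None means no limit.
bounds = {
    "inf": (-0.2, 18),
    "NDVI": (-1, 1),
    "soil": (0, None),
}


def filter_by_bounds(df, bounds):
    """Return the rows of df whose values lie within the inclusive bounds."""
    # Start with every row selected, then narrow the mask column by column.
    mask = pd.Series(True, index=df.index)
    for column, (lower, upper) in bounds.items():
        if lower is not None:
            mask &= df[column] >= lower
        if upper is not None:
            mask &= df[column] <= upper
    return df[mask]


# Made-up demo rows: only the first row satisfies every condition.
demo = pd.DataFrame({
    "inf": [1.0, 19.0, 2.0, 3.0],
    "NDVI": [0.1, 0.2, -1.5, 0.9],
    "soil": [5.0, 5.0, 5.0, -1.0],
})

filtered = filter_by_bounds(demo, bounds)
print(filtered)
```

  Changing a threshold or adding a column then only means editing the dictionary, while the filtering logic stays the same.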

  Running the code in this article produces the filtered file in the specified result folder.

  That completes the task.

Origin blog.csdn.net/zhebushibiaoshifu/article/details/131115193