"Python data analysis" learning record 001: in section 2.2.1, filtering rows that meet a condition with pandas, the '.astype(float)' call raises an error!

Subsection 2.2.1 in Chapter 2 of the book uses the pandas module to select every row whose Supplier Name contains Z or whose Cost is greater than 600.0.
In the csv file, the Cost column is in U.S. dollars, with a dollar sign '$' in front of each value. In Excel the column is in currency format, so values of one thousand or more are written with ',' as the thousands separator. The text in the Cost column therefore has to be converted to floating-point values before it can be compared with 600.0. This is where things go wrong, as shown below:

Problem display:

Typed out in full, the code given in the book looks like this:

import pandas as pd

file1 = 'supplier_data.csv'
file2 = 'output_file.csv'
data_frame = pd.read_csv(file1)
data_frame['Cost'] = data_frame['Cost'].str.strip('$').astype(float)
pandas3 = data_frame.loc[(data_frame['Supplier Name'].str.contains('Z')) | (data_frame['Cost'] > 600.0), :]
pandas3.to_csv(file2, index = False)

Running result:

After a long traceback:
ValueError: could not convert string to float: '6,015.00 '

Error analysis:
Running the code as is raises a ValueError: the string '6,015.00 ' cannot be converted to a floating-point number.
The likely cause is that this column was formatted as 'currency' in Excel, so the values carry a ',' thousands separator. str.strip('$') only removes the dollar sign from the ends of each string; it does not remove the ',' inside the number.
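The failure can be reproduced without the csv file. The following is a minimal sketch, assuming a hypothetical two-value Series (the name cost and its values are made up for illustration) standing in for the Cost column:

```python
import pandas as pd

# Hypothetical stand-in for the Cost column as read from the csv (strings, not numbers).
cost = pd.Series(['$500.00 ', '$6,015.00 '], dtype=object)

# str.strip('$') removes '$' only from the two ends of each string.
stripped = cost.str.strip('$')
print(stripped.tolist())

# The ',' is still inside the second value, so the conversion fails.
try:
    stripped.astype(float)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

The first value converts fine on its own; it is the value with the thousands separator that triggers the error.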

1) First, try the replace() method to remove the ',' from the strings

Modify the code, add a print, and see what is printed:

data_frame['Cost'] = data_frame['Cost'].str.strip('$').replace(',', '')
print(data_frame['Cost'])

Running result:

0           500.00 
1           500.00 
2           750.00 
3           750.00 
4           250.00 
5           250.00 
6           125.00 
7           125.00 
8           615.00 
9           615.00 
10        6,015.00 
11    1,006,015.00 
Name: Cost, dtype: object
+ a pile of unrelated errors, ignored for now

The dollar sign is removed, but the commas are still there. replace() ran without complaint, but when called directly on the column it is Series.replace(), which only replaces values that are exactly ','; it does not look inside each string. To replace a substring, the method has to be called through the .str accessor, the same way strip() was.
Try calling replace() through .str.
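One way to see why the first attempt left the commas in place is to run both calls side by side. This is a small sketch on a hypothetical Series (the third value, a lone ',', is added only to make the difference visible):

```python
import pandas as pd

# Hypothetical values; the lone ',' is there to show what Series.replace matches.
cost = pd.Series(['500.00', '6,015.00', ','], dtype=object)

# Series.replace compares whole cell values: only the cell equal to ',' changes.
whole_value = cost.replace(',', '')

# Series.str.replace works inside each string: every ',' is removed.
substring = cost.str.replace(',', '', regex=False)

print(whole_value.tolist())  # ['500.00', '6,015.00', '']
print(substring.tolist())    # ['500.00', '6015.00', '']
```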

Modify the code for the second time:

data_frame['Cost'] = data_frame['Cost'].str.strip('$').str.replace(',', '')
print(data_frame['Cost'])

The ',' is removed successfully:

0         500.00 
1         500.00 
2         750.00 
3         750.00 
4         250.00 
5         250.00 
6         125.00 
7         125.00 
8         615.00 
9         615.00 
10       6015.00 
11    1006015.00 
Name: Cost, dtype: object
+ a pile of unrelated errors, ignored for now

See if you can use the astype() method directly at this time:

data_frame['Cost'] = data_frame['Cost'].str.strip('$').str.replace(',', '').astype(float)
print(data_frame['Cost'])

Running result:

0         500.0
1         500.0
2         750.0
3         750.0
4         250.0
5         250.0
6         125.0
7         125.0
8         615.0
9         615.0
10       6015.0
11    1006015.0
Name: Cost, dtype: float64

Success!
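As a side note, the same cleanup can be written in other ways. The sketch below is an alternative, not the book's method: it removes '$' and ',' with one regular expression and parses the result with pd.to_numeric. The Series and its values are hypothetical.

```python
import pandas as pd

# Hypothetical values in the same shape as the Cost column.
cost = pd.Series(['$500.00 ', '$6,015.00 ', '$1,006,015.00 '], dtype=object)

# Inside a character class, '$' is a literal dollar sign, so [$,] matches '$' or ','.
cleaned = cost.str.replace(r'[$,]', '', regex=True)

# Drop the trailing space and let pandas parse the numbers.
numeric = pd.to_numeric(cleaned.str.strip())
print(numeric.tolist())  # [500.0, 6015.0, 1006015.0]
```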

2) Let’s summarize:

  1. String methods such as strip() and replace() do not act on the text in a column when called directly on it. They have to be called through the column's .str accessor, in the form column.str.method().
  2. The simplest way to inspect an intermediate result is print(). You can print the result itself or its type, depending on what you need.
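Both points can be checked in a few lines. This is a minimal sketch on a hypothetical Series, printing the type before and after the conversion:

```python
import pandas as pd

# Hypothetical stand-in for the Cost column.
cost = pd.Series(['$500.00 ', '$6,015.00 '], dtype=object)

print(cost.dtype)           # object
print(type(cost.iloc[0]))   # <class 'str'>

# String methods go through the .str accessor, then the column is converted.
converted = cost.str.strip('$').str.replace(',', '').astype(float)
print(converted.dtype)      # float64
```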

Modified code and running result

Through the above analysis, the code in the book can be modified to:

import pandas as pd

file1 = 'supplier_data.csv'
file2 = 'output_file.csv'
data_frame = pd.read_csv(file1)
data_frame['Cost'] = data_frame['Cost'].str.strip('$').str.replace(',', '').astype(float)
pandas3 = data_frame.loc[(data_frame['Supplier Name'].str.contains('Z')) | (data_frame['Cost'] > 600.0), :]
pandas3.to_csv(file2, index = False)

Running result:
[Screenshot: modified code and run result]
The script prints nothing when it runs.
The filtered result is written to file2 (output_file.csv):
[Screenshot: output result]
The output contains the rows where Supplier Name contains Z or Cost is greater than 600.0.
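Since the screenshots are not reproduced here, the filter can also be tried on a small in-memory sample. This is only a sketch: the three rows below are hypothetical and are not the contents of supplier_data.csv, and the variable names csv_text, name_has_z, cost_over_600 and selected are introduced for illustration.

```python
import io
import pandas as pd

# Hypothetical sample rows; values containing ',' must be quoted in csv.
csv_text = '''Supplier Name,Cost
Supplier X,$500.00
Supplier Y,"$6,015.00"
Supplier Z,$250.00
'''

data_frame = pd.read_csv(io.StringIO(csv_text))
data_frame['Cost'] = (
    data_frame['Cost'].str.strip('$').str.replace(',', '').astype(float)
)

# Keep a row if either condition holds.
name_has_z = data_frame['Supplier Name'].str.contains('Z')
cost_over_600 = data_frame['Cost'] > 600.0
selected = data_frame.loc[name_has_z | cost_over_600, :]
print(selected)
```

With these sample rows, Supplier Y is kept because of its cost and Supplier Z because of its name, while Supplier X is dropped.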


Origin blog.csdn.net/Haoyu_xie/article/details/106562665