Efficient processing of test data in Python using Pandas

Reposted from: https://www.cnblogs.com/keyou1/p/10948796.html

First, some questions to think about

1. What is Pandas?

  • An extremely powerful data analysis library
  • It can work efficiently with many kinds of data sources
    • CSV files
    • Excel files
    • HTML files
    • XML files
    • JSON files
    • Databases
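
As a rough illustration of the list above, each format has its own reader function, and each reader returns the same DataFrame type. The sketch below uses small in-memory strings (hypothetical sample data, not files from this article) so that it runs without any files on disk.

```python
import io

import pandas as pd

# Hypothetical in-memory samples standing in for real files.
csv_text = "TestID,TestTime,Success\n0,149,0\n1,69,0\n"
json_text = '[{"TestID": 0, "TestTime": 149}, {"TestID": 1, "TestTime": 69}]'

# Each reader returns a DataFrame, whatever the source format is.
csv_df = pd.read_csv(io.StringIO(csv_text))
json_df = pd.read_json(io.StringIO(json_text))

print(csv_df.shape)
print(json_df.shape)
```

Excel, HTML, and database sources follow the same pattern with pd.read_excel, pd.read_html, and pd.read_sql, though those readers need extra packages installed.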

2. A classic interview question

An interview question serves as the lead-in to this topic. Consider how you would answer it if you were asked.

Second, using pandas to work with Excel files

1. Install

a. Install from PyPI

pip install pandas

b. Install from source

git clone https://github.com/pandas-dev/pandas.git
cd pandas
pip install .
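
After either installation route, a quick import confirms that Python can find the package. A minimal check:

```python
import pandas as pd

# Importing without an error confirms the package is installed;
# __version__ reports which release was picked up.
print(pd.__version__)
```

Reading .xlsx files also depends on an Excel engine such as openpyxl, which is installed separately.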

2. Read data by column

The examples use the test case file lemon_cases.xlsx. Its multiply sheet includes, among others, the columns title, l_data, r_data, actual, and result.

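import pandas as pd

# Read the Excel file.
# Returns a DataFrame, a two-dimensional data structure.
df = pd.read_excel('lemon_cases.xlsx', sheet_name='multiply')
print(df)

# 1. Read one column
# df["title"] returns a Series holding the data of the title column
print(df["title"])

# A Series can be converted to any sequence type or to a dict
print(list(df['title']))   # to a list
# title is also available as an attribute of the DataFrame
print(list(df.title))      # to a list
print(tuple(df['title']))  # to a tuple
print(dict(df['title']))   # to a dict keyed by the integer index

# 2. Read a single cell
# The header is excluded; give the column name and the row index
print(df['title'][0])  # first data cell of the title column

# 3. Read several columns
print(df[["title", "actual"]])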
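
To try column selection without the Excel file, the same operations can be run on a DataFrame built in memory. The rows below are hypothetical; only the column names are taken from this article.

```python
import pandas as pd

# Hypothetical rows; the column names mirror those used in this article.
df = pd.DataFrame({
    "title": ["positive numbers", "negative numbers", "zero"],
    "l_data": [2, -3, 0],
    "r_data": [4, -5, 7],
    "actual": [8, 15, 0],
})

titles = df["title"]              # one column name -> Series
first_title = df["title"][0]      # label 0 of the default integer index
subset = df[["title", "actual"]]  # list of column names -> DataFrame

print(type(titles).__name__)
print(first_title)
print(subset.shape)
```

A single name in the brackets gives a Series, while a list of names gives a DataFrame, even when the list holds only one name.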

3. Read data by row

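import pandas as pd

# Read the Excel file; returns a DataFrame
df = pd.read_excel('lemon_cases.xlsx', sheet_name='multiply')
print(df)

# 1. Read one row
# The header is excluded; the first data row has index 0
# The row can be converted to a list, tuple, or dict
print(list(df.iloc[0]))    # to a list
print(tuple(df.iloc[0]))   # to a tuple
print(dict(df.iloc[0]))    # to a dict
print(dict(df.iloc[-1]))   # negative indexes are supported too

# 2. Read a single cell
# The header is excluded; give the row index and the column index (or column name)
print(df.iloc[0]["l_data"])  # row index and column name
print(df.iloc[0, 2])         # row index and column index

# 3. Read several rows
print(df.iloc[0:3])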
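
For single cells, pandas also offers the scalar accessors at (by label) and iat (by position). The sketch below uses a small in-memory DataFrame with hypothetical values in place of the Excel file.

```python
import pandas as pd

# Hypothetical rows standing in for the Excel sheet.
df = pd.DataFrame({
    "title": ["positive numbers", "negative numbers"],
    "l_data": [2, -3],
    "r_data": [4, -5],
})

row = df.iloc[0].to_dict()     # first data row as {column name: value}
by_label = df.at[0, "l_data"]  # scalar by row label and column name
by_position = df.iat[0, 2]     # scalar by row position and column position

print(row)
print(by_label, by_position)
```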

4. The iloc and loc methods

import pandas as pd

# Read the Excel file; returns a DataFrame
df = pd.read_excel('lemon_cases.xlsx', sheet_name='multiply')
print(df)

# 1. iloc
# iloc selects rows and columns by integer position
# It can also read a single column
print(df.iloc[:, 0])
print(df.iloc[:, 1])
print(df.iloc[:, -1])

# Read several columns
print(df.iloc[:, 0:3])

# Read several rows and columns
print(df.iloc[2:4, 1:4])
print(df.iloc[[1, 3], [2, 4]])

# 2. loc
# loc selects by label (index label or column name)
print(df.loc[1:2, "title"])           # several rows, one column
print(df.loc[1:2, "title":"r_data"])  # several rows, several columns

# Select with a boolean condition
print(df["r_data"] > 5)          # True where r_data is greater than 5, otherwise False
print(df.loc[df["r_data"] > 5])  # the rows where r_data is greater than 5
print(df.loc[df["r_data"] > 5, "r_data":"actual"])  # of those rows, the columns r_data through actual
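
One difference between the two is easy to miss: an iloc slice excludes its end position, as Python slices do, while a loc slice includes its end label. A small sketch on hypothetical in-memory data:

```python
import pandas as pd

# Hypothetical data with the default integer index 0..3.
df = pd.DataFrame({
    "title": ["a", "b", "c", "d"],
    "r_data": [3, 6, 9, 2],
})

by_position = df.iloc[1:2]  # position 1 up to, but not including, 2
by_label = df.loc[1:2]      # labels 1 through 2, end label included

mask = df["r_data"] > 5     # boolean Series, one entry per row
selected = df.loc[mask, "title"]

print(len(by_position), len(by_label))
print(list(selected))
```

With the default integer index the labels and positions look identical, which is why the same 1:2 slice returns a different number of rows in each case.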

5. Read all the data

import pandas as pd

# Read the Excel file; returns a DataFrame
df = pd.read_excel('lemon_cases.xlsx', sheet_name='multiply')
print(df)

# df.values is a NumPy array with one inner array per row; not recommended
print(df.values)

# A list of dicts, one dict per row
datas_list = []
for r_index in df.index:
    datas_list.append(df.iloc[r_index].to_dict())
print(datas_list)
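
The row-by-row loop can also be written as a single call: DataFrame.to_dict with orient="records" returns one dict per row. A sketch on hypothetical in-memory data:

```python
import pandas as pd

# Hypothetical rows standing in for the Excel sheet.
df = pd.DataFrame({
    "title": ["a", "b"],
    "l_data": [2, -3],
    "r_data": [4, -5],
})

# One dict per row, keyed by column name.
records = df.to_dict(orient="records")
print(records)
```

A list of dicts like this is a convenient shape to hand to a data-driven test runner.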

6. Write data

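import pandas as pd

# Read the Excel file; returns a DataFrame
df = pd.read_excel('lemon_cases.xlsx', sheet_name='multiply')
print(df)

# Set the result cell of the first data row.
# A single .loc call is used because chained indexing may modify a temporary copy.
df.loc[0, 'result'] = 1000
print(df)

# Write the DataFrame to a new Excel file
with pd.ExcelWriter('lemon_cases_new.xlsx') as writer:
    df.to_excel(writer, sheet_name="New", index=False)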
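
When updating cells before writing, a single .loc call is the form the pandas documentation recommends for setting values. Chained indexing such as df['result'][0] = ... can end up modifying a temporary copy. The sketch below uses hypothetical in-memory data and leaves out the Excel write.

```python
import pandas as pd

# Hypothetical rows standing in for the Excel sheet.
df = pd.DataFrame({
    "title": ["a", "b"],
    "actual": [8, 15],
    "result": [0, 0],
})

# One indexing call: row label 0, column "result".
df.loc[0, "result"] = 1000

# A boolean mask updates every matching row at once.
df.loc[df["actual"] > 10, "result"] = -1

print(df)
```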

Third, using pandas to work with CSV files

1. Read a CSV file

The file data.log contains the following:

TestID,TestTime,Success
0,149,0
1,69,0
2,45,0
3,18,1
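4,18,1

import pandas as pd

# Read the CSV file
# Option 1 (recommended): read_csv, which separates columns on commas by default
# a. The first line holds the column names
csvframe = pd.read_csv('data.log')
# b. The first line is data, with no column names
csvframe = pd.read_csv('data.log', header=None)
# c. The first line is data, and column names are supplied explicitly
csvframe = pd.read_csv('data.log', header=None, names=["Col1", "Col2", "Col3"])

# Option 2: read_table, which needs the column separator set to a comma
csvframe = pd.read_table('data.log', sep=",")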
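
The effect of the header and names parameters is easiest to see by comparing shapes. The sketch below feeds the data.log contents shown above through io.StringIO, so no file is needed. Note that for this particular file, which does have a header line, header=None turns that line into a data row.

```python
import io

import pandas as pd

# The same contents as data.log, held in memory.
data = "TestID,TestTime,Success\n0,149,0\n1,69,0\n2,45,0\n3,18,1\n4,18,1\n"
new_names = ["Col1", "Col2", "Col3"]

with_header = pd.read_csv(io.StringIO(data))               # first line -> column names
no_header = pd.read_csv(io.StringIO(data), header=None)    # first line -> data row
named = pd.read_csv(io.StringIO(data), header=None, names=new_names)
renamed = pd.read_csv(io.StringIO(data), header=0, names=new_names)  # replace the header line

print(with_header.shape, no_header.shape, named.shape, renamed.shape)
print(list(no_header.columns))
```

Passing header=0 together with names discards the existing header line and uses the supplied names instead.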

2. Answer the interview question

import pandas as pd

# 1. Read the CSV file
csvframe = pd.read_csv('data.log')

# 2. Select the rows where Success is 0
new_csvframe = csvframe.loc[csvframe["Success"] == 0]
result_csvframe = new_csvframe["TestTime"]
avg_result = round(sum(result_csvframe) / len(result_csvframe), 2)
print("Minimum TestTime: {}\nMaximum TestTime: {}\nAverage TestTime: {}".
      format(min(result_csvframe), max(result_csvframe), avg_result))
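
As an alternative sketch, the same three statistics can be computed with pandas' own aggregation instead of the built-in min, max, and sum. It reads the data.log contents shown earlier from memory. The variable names failed_times and stats are illustrative and not from the original post.

```python
import io

import pandas as pd

# The same contents as data.log, held in memory.
data = "TestID,TestTime,Success\n0,149,0\n1,69,0\n2,45,0\n3,18,1\n4,18,1\n"
csvframe = pd.read_csv(io.StringIO(data))

# Filter the rows and pick the column in one .loc call.
failed_times = csvframe.loc[csvframe["Success"] == 0, "TestTime"]

# agg with a list of names returns a Series indexed by those names.
stats = failed_times.agg(["min", "max", "mean"]).round(2)
print(stats)
```

For the sample data, the rows with Success equal to 0 have TestTime values 149, 69, and 45, so the minimum is 45, the maximum is 149, and the mean rounds to 87.67.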

Fourth, summary

  • Pandas is extremely versatile in data analysis and data visualization, and it handles large volumes of data of many types efficiently
  • It is also used in software testing. But if Excel is only being used to store test data, Pandas is overkill (as the proverb goes, why use an ox cleaver to kill a chicken?), and a dedicated module such as openpyxl is recommended instead

Origin www.cnblogs.com/songzhenhua/p/11481317.html