An article teaches you how to parse excel files for automated testing?

foreword

In automated testing, we store data by using files or databases. The files can be csv, xlsx, xml, or even txt files. Usually, excel files are our first choice. Whether it is writing test cases or storing test data, excel is Very convenient. So today we will summarize the methods of processing excel files by different modules, and directly package them, so that we can use them directly in the future and increase work efficiency.

1. openpyxl

openpyxl is a third-party library, first we use the command pip install openpyxl to install it directly

Note: When openpyxl operates excel, the row number and column number are calculated from 1

package code

"""
from openpyxl import load_workbook
from openpyxl.styles import Font
from openpyxl.styles.colors import BLACK
from collections import namedtuple
class ParseExcel(object):
    """解析excel文件"""
    def __init__(self, filename, sheet_name=None):
        try:
            self.filename = filename
            self.sheet_name = sheet_name
            self.wb = load_workbook(self.filename)
            if self.sheet_name is None:
                self.work_sheet = self.wb.active
            else:
                self.work_sheet = self.wb[self.sheet_name]
        except FileNotFoundError as e:
            raise e
    def get_max_row_num(self):
        """获取最大行号"""
        max_row_num = self.work_sheet.max_row
        return max_row_num
    def get_max_column_num(self):
        """获取最大列号"""
        max_column = self.work_sheet.max_column
        return max_column
    def get_cell_value(self, coordinate=None, row=None, column=None):
        """获取指定单元格的数据"""
        if coordinate is not None:
            try:
                return self.work_sheet[coordinate].value
            except Exception as e:
                raise e
        elif coordinate is None and row is not None and column is not None:
            if isinstance(row, int) and isinstance(column, int):
                return self.work_sheet.cell(row=row, column=column).value
            else:
                raise TypeError('row and column must be type int')
        else:
            raise Exception("Insufficient Coordinate of cell!")
    def get_row_value(self, row):
        """获取某一行的数据"""
        column_num = self.get_max_column_num()
        row_value = []
        if isinstance(row, int):
            for column in range(1, column_num + 1):
                values_row = self.work_sheet.cell(row, column).value
                row_value.append(values_row)
            return row_value
        else:
            raise TypeError('row must be type int')
    def get_column_value(self, column):
        """获取某一列数据"""
        row_num = self.get_max_column_num()
        column_value = []
        if isinstance(column, int):
            for row in range(1, row_num + 1):
                values_column = self.work_sheet.cell(row, column).value
                column_value.append(values_column)
            return column_value
        else:
            raise TypeError('column must be type int')
    def get_all_value_1(self):
        """获取指定表单的所有数据(除去表头)"""
        max_row_num = self.get_max_row_num()
        max_column = self.get_max_column_num()
        values = []
        for row in range(2, max_row_num + 1):
            value_list = []
            for column in range(1, max_column + 1):
                value = self.work_sheet.cell(row, column).value
                value_list.append(value)
            values.append(value_list)
        return values
    def get_all_value_2(self):
        """获取指定表单的所有数据(除去表头)"""
        rows_obj = self.work_sheet.iter_rows(min_row=2, max_row=self.work_sheet.max_row,
                                             values_only=True)  # 指定values_only 会直接提取数据不需要再使用cell().value
        values = []
        for row_tuple in rows_obj:
            value_list = []
            for value in row_tuple:
                value_list.append(value)
            values.append(value_list)
        return values
    def get_excel_title(self):
        """获取sheet表头"""
        title_key = tuple(self.work_sheet.iter_rows(max_row=1, values_only=True))[0]
        return title_key
    def get_listdict_all_value(self):
        """获取所有数据,返回嵌套字典的列表"""
        sheet_title = self.get_excel_title()
        all_values = self.get_all_value_2()
        value_list = []
        for value in all_values:
            value_list.append(dict(zip(sheet_title, value)))
        return value_list
    def get_list_nametuple_all_value(self):
        """获取所有数据,返回嵌套命名元组的列表"""
        sheet_title = self.get_excel_title()
        values = self.get_all_value_2()
        excel = namedtuple('excel', sheet_title)
        value_list = []
        for value in values:
            e = excel(*value)
            value_list.append(e)
        return value_list
    def write_cell(self, row, column, value=None, bold=True, color=BLACK):
        """
        指定单元格写入数据
        :param work_sheet:
        :param row: 行号
        :param column: 列号
        :param value: 待写入数据
        :param bold: 加粗, 默认加粗
        :param color: 字体颜色,默认黑色
        :return:
        """
        try:
            if isinstance(row, int) and isinstance(column, int):
                cell_obj = self.work_sheet.cell(row, column)
                cell_obj.font = Font(color=color, bold=bold)
                cell_obj.value = value
                self.wb.save(self.filename)
            else:
                raise TypeError('row and column must be type int')
        except Exception as e:
            raise e
if __name__ == '__main__':
    pe = ParseExcel('testdata.xlsx')
    # sheet = pe.get_sheet_object('testcase')
    column_row = pe.get_max_column_num()
    print('最大列号:', column_row)
    max_row = pe.get_max_row_num()
    print('最大行号:', max_row)
    #
    cell_value_1 = pe.get_cell_value(row=2, column=3)
    print('第%d行, 第%d列的数据为: %s' % (2, 3, cell_value_1))
    cell_value_2 = pe.get_cell_value(coordinate='A5')
    print('A5单元格的数据为: {}'.format(cell_value_2))
    value_row = pe.get_row_value(3)
    print('第{}行的数据为:{}'.format(3, value_row))
    value_column = pe.get_column_value(2)
    print('第{}列的数据为:{}'.format(2, value_column))
    #
    values_1 = pe.get_all_value_1()
    print('第一种方式获取所有数据\n', values_1)
    values_2 = pe.get_all_value_2()
    print('第二种方式获取所有数据\n', values_2)
    title = pe.get_excel_title()
    print('表头为\n{}'.format(title))
    dict_value = pe.get_listdict_all_value()
    print('所有数据组成的嵌套字典的列表:\n', dict_value)
    #
    namedtuple_value = pe.get_list_nametuple_all_value()
    print('所有数据组成的嵌套命名元组的列表:\n', namedtuple_value)
    pe.write_cell(1, 2, 'Tc_title')

If the above package is used to simultaneously operate two sheets of the same excel file and write data, there will be a small bug (after writing, you will find that one of the two sheets has no data)

In fact, the reason is very simple: different objects have their own independent attributes. When you write operations, each object only saves its own form, so after the last object writes data, it only saves its own form, and the others The form of the object is not actually saved. In response to this problem, a slight change was made to the code encapsulated above

"""
from openpyxl import load_workbook
from openpyxl.styles import Font
from openpyxl.styles.colors import BLACK
from collections import namedtuple
 
 
class ParseExcel(object):
    """解析excel文件"""
 
    def __init__(self, filename):
        try:
            self.filename = filename
            self.__wb = load_workbook(self.filename)
        except FileNotFoundError as e:
            raise e
 
    def get_max_row_num(self, sheet_name):
        """获取最大行号"""
        max_row_num = self.__wb[sheet_name].max_row
        return max_row_num
 
    def get_max_column_num(self, sheet_name):
        """获取最大列号"""
        max_column = self.__wb[sheet_name].max_column
        return max_column
 
    def get_cell_value(self, sheet_name, coordinate=None, row=None, column=None):
        """获取指定单元格的数据"""
        if coordinate is not None:
            try:
                return self.__wb[sheet_name][coordinate].value
            except Exception as e:
                raise e
        elif coordinate is None and row is not None and column is not None:
            if isinstance(row, int) and isinstance(column, int):
                return self.__wb[sheet_name].cell(row=row, column=column).value
            else:
                raise TypeError('row and column must be type int')
        else:
            raise Exception("Insufficient Coordinate of cell!")
 
    def get_row_value(self, sheet_name, row):
        """获取某一行的数据"""
        column_num = self.get_max_column_num(sheet_name)
        row_value = []
        if isinstance(row, int):
            for column in range(1, column_num + 1):
                values_row = self.__wb[sheet_name].cell(row, column).value
                row_value.append(values_row)
            return row_value
        else:
            raise TypeError('row must be type int')
 
    def get_column_value(self, sheet_name, column):
        """获取某一列数据"""
        row_num = self.get_max_column_num(sheet_name)
        column_value = []
        if isinstance(column, int):
            for row in range(1, row_num + 1):
                values_column = self.__wb[sheet_name].cell(row, column).value
                column_value.append(values_column)
            return column_value
        else:
            raise TypeError('column must be type int')
 
    def get_all_value_1(self, sheet_name):
        """获取指定表单的所有数据(除去表头)"""
        max_row_num = self.get_max_row_num(sheet_name)
        max_column = self.get_max_column_num(sheet_name)
        values = []
        for row in range(2, max_row_num + 1):
            value_list = []
            for column in range(1, max_column + 1):
                value = self.__wb[sheet_name].cell(row, column).value
                value_list.append(value)
            values.append(value_list)
        return values
 
    def get_all_value_2(self, sheet_name):
        """获取指定表单的所有数据(除去表头)"""
        rows_obj = self.__wb[sheet_name].iter_rows(min_row=2, max_row=self.__wb[sheet_name].max_row, values_only=True)
        values = []
        for row_tuple in rows_obj:
            value_list = []
            for value in row_tuple:
                value_list.append(value)
            values.append(value_list)
        return values
 
    def get_excel_title(self, sheet_name):
        """获取sheet表头"""
        title_key = tuple(self.__wb[sheet_name].iter_rows(max_row=1, values_only=True))[0]
        return title_key
 
    def get_listdict_all_value(self, sheet_name):
        """获取所有数据,返回嵌套字典的列表"""
        sheet_title = self.get_excel_title(sheet_name)
        all_values = self.get_all_value_2(sheet_name)
        value_list = []
        for value in all_values:
            value_list.append(dict(zip(sheet_title, value)))
        return value_list
 
    def get_list_nametuple_all_value(self, sheet_name):
        """获取所有数据,返回嵌套命名元组的列表"""
        sheet_title = self.get_excel_title(sheet_name)
        values = self.get_all_value_2(sheet_name)
        excel = namedtuple('excel', sheet_title)
        value_list = []
        for value in values:
            e = excel(*value)
            value_list.append(e)
        return value_list
 
    def write_cell(self, sheet_name, row, column, value=None, bold=True, color=BLACK):
        if isinstance(row, int) and isinstance(column, int):
            try:
                cell_obj = self.__wb[sheet_name].cell(row, column)
                cell_obj.font = Font(color=color, bold=bold)
                cell_obj.value = value
                self.__wb.save(self.filename)
            except Exception as e:
                raise e
        else:
            raise TypeError('row and column must be type int')
 
 
if __name__ == '__main__':
    pe = ParseExcel('testdata.xlsx')
    print(pe.get_all_value_2('division'))
    print(pe.get_list_nametuple_all_value('division'))
    column_row = pe.get_max_column_num('division')
    print('最大列号:', column_row)
    max_row = pe.get_max_row_num('division')
    print('最大行号:', max_row)
    cell_value_1 = pe.get_cell_value('division', row=2, column=3)
    print('第%d行, 第%d列的数据为: %s' % (2, 3, cell_value_1))
    cell_value_2 = pe.get_cell_value('division', coordinate='A5')
    print('A5单元格的数据为: {}'.format(cell_value_2))
    value_row = pe.get_row_value('division', 3)
    print('第{}行的数据为:{}'.format(3, value_row))
    value_column = pe.get_column_value('division', 2)
    print('第{}列的数据为:{}'.format(2, value_column))
    values_1 = pe.get_all_value_1('division')
    print('第一种方式获取所有数据\n', values_1)
    values_2 = pe.get_all_value_2('division')
    print('第二种方式获取所有数据\n', values_2)
    title = pe.get_excel_title('division')
    print('表头为\n{}'.format(title))
    dict_value = pe.get_listdict_all_value('division')
    print('所有数据组成的嵌套字典的列表:\n', dict_value)
    namedtuple_value = pe.get_list_nametuple_all_value('division')
    print('所有数据组成的嵌套命名元组的列表:\n', namedtuple_value)
    pe.write_cell('division', 1, 2, 'Tc_title')

Two, xlrd

Install xlrd, this module only supports read operations, if you want to write, you need to use xlwt or use xlutils to cooperate with xlrd, but using xlwt can only write to new excel files, and cannot write to original files, so choose to use xlutils here

But there is another problem, if you use xlutils, then our excel file needs to be suffixed with .xls. Because it is impossible to write with xlsx as the suffix, an error will be reported (pro-test, because the formatting_info parameter is not yet compatible with the format of the new version of xlsx)

Note: When xlrd operates excel, the row number and column number are calculated from 0

package code

"""
import xlrd
from xlutils import copy
from collections import namedtuple
 
 
class ParseExcel(object):
    # xlrd 解析excel, 行号和列号都是从0开始的
    def __init__(self, filename, sheet):
        try:
            self.filename = filename
            self.sheet = sheet
            self.wb = xlrd.open_workbook(self.filename, formatting_info=True)
            if isinstance(sheet, str):
                self.sheet = self.wb.sheet_by_name(sheet)
            elif isinstance(sheet, int):
                self.sheet = self.wb.sheet_by_index(sheet)
            else:
                raise TypeError('sheet must be int or str')
        except Exception as e:
            raise e
 
    def get_max_row(self):
        """获取表单的最大行号"""
        max_row_num = self.sheet.nrows
        return max_row_num
 
    def get_max_column(self):
        """获取表单的最大列号"""
        min_row_num = self.sheet.ncols
        return min_row_num
 
    def get_cell_value(self, row, column):
        """获取某个单元格的数据"""
        if isinstance(row, int) and isinstance(column, int):
            values = self.sheet.cell(row-1, column-1).value
            return values
        else:
            raise TypeError('row and column must be type int')
 
    def get_row_values(self, row):
        """获取某一行的数据"""
        if isinstance(row, int):
            values = self.sheet.row_values(row-1)
            return values
        else:
            raise TypeError('row must be type int')
 
    def get_column_values(self, column):
        """获取某一列的数据"""
 
        if isinstance(column, int):
            values = self.sheet.col_values(column-1)
            return values
        else:
            raise TypeError('column must be type int')
 
    def get_table_title(self):
        """获取表头"""
        table_title = self.get_row_values(1)
        return table_title
 
    def get_all_values_dict(self):
        """获取所有的数据,不包括表头,返回一个嵌套字典的列表"""
        max_row = self.get_max_row()
        table_title = self.get_table_title()
        value_list = []
        for row in range(2, max_row):
            values = self.get_row_values(row)
            value_list.append(dict(zip(table_title, values)))
        return value_list
 
    def get_all_values_nametuple(self):
        """获取所有的数据,不包括表头,返回一个嵌套命名元组的列表"""
        table_title = self.get_table_title()
        max_row = self.get_max_row()
        excel = namedtuple('excel', table_title)
        value_list = []
        for row in range(2, max_row):
            values = self.get_row_values(row)
            e = excel(*values)
            value_list.append(e)
        return value_list
 
    def write_value(self, sheet_index, row, column, value):
        """写入某个单元格数据"""
        if isinstance(row, int) and isinstance(column, int):
            if isinstance(sheet_index, int):
                wb = copy.copy(self.wb)
                worksheet = wb.get_sheet(sheet_index)
                worksheet.write(row-1, column-1, value)
                wb.save(self.filename)
            else:
                raise TypeError('{} must be int'.format(sheet_index))
        else:
            raise TypeError('{} and {} must be int'.format(row, column))
 
if __name__ == '__main__':
    pe = ParseExcel('testdata.xls', 'testcase')
    print('最大行号:', pe.get_max_row())
    print('最大列号:', pe.get_max_column())
    print('第2行第3列数据:', pe.get_cell_value(2, 3))
    print('第2行数据', pe.get_row_values(2))
    print('第3列数据', pe.get_column_values(3))
    print('表头:', pe.get_table_title())
    print('所有的数据返回嵌套字典的列表:', pe.get_all_values_dict())
    print('所有的数据返回嵌套命名元组的列表:', pe.get_all_values_nametuple())
   pe.write_value(0, 1, 3, 'test')

3. pandas

Pandas is a library for data analysis. I always feel that using pandas to parse excel files to read data in automated testing is a bit overkill. No matter what, let’s write pandas to parse excel files.

I only encapsulate reading here, and I have a little problem with writing. I will modify it later and add code.

Please pip install pandas first to install pandas

package code

"""
import pandas as pd
class ParseExcel(object):
    def __init__(self, filename, sheet_name=None):
        try:
            self.filename = filename
            self.sheet_name = sheet_name
            self.df = pd.read_excel(self.filename, self.sheet_name)
        except Exception as e:
            raise e
    def get_row_num(self):
        """获取行号组成的列表, 从0开始的"""
        row_num_list = self.df.index.values
        return row_num_list
    def get_cell_value(self, row, column):
        """获取某一个单元格的数据"""
        try:
            if isinstance(row, int) and isinstance(column, int):
                cell_value = self.df.ix[row-2, column-1] # ix的行参数是按照有效数据行,且从0开始
                return cell_value
            else:
                raise TypeError('row and column must be type int')
        except Exception as e:
            raise e
    def get_table_title(self):
        """获取表头, 返回列表"""
        table_title = self.df.columns.values
        return table_title
    def get_row_value(self, row):
        """获取某一行的数据, 行号从1开始"""
        try:
            if isinstance(row, int):
                row_data = self.df.ix[row-2].values
                return row_data
            else:
                raise TypeError('row must be type int')
        except Exception as e:
            raise e
    def get_column_value(self, col_name):
        """获取某一列数据"""
        try:
            if isinstance(col_name, str):
                col_data = self.df[col_name].values
                return col_data
            else:
                raise TypeError('col_name must be type str')
        except Exception as e:
            raise e
        
    def get_all_value(self):
        """获取所有的数据,不包括表头, 返回嵌套字典的列表"""
        rows_num = self.get_row_num()
        table_title = self.get_table_title()
        values_list = []
        for i in rows_num:
            row_data = self.df.ix[i, table_title].to_dict()
            values_list.append(row_data)
        return values_list
if __name__ == '__main__':
    pe = ParseExcel('testdata.xlsx', 'testcase')
    print(pe.get_row_num())
    print(pe.get_table_title())
    print(pe.get_all_value())
    print(pe.get_row_value(2))
    print(pe.get_cell_value(2, 3))
    print(pe.get_column_value('Tc_title'))

Summarize

I used 3 methods, 4 libraries xlrd, openpyxl, xlwt, and pandas to operate excel files. I personally feel that using openpyxl is more suitable for automation. Of course, different people have different choices, and the difference between them is not very big.

The above 3 methods can be used directly, no need to do packaging!

Finally, I would like to thank everyone who has read my article carefully. Reciprocity is always necessary. Although it is not a very valuable thing, you can take it away if you need it:

insert image description here

Software testing interview applet

The software test question bank maxed out by millions of people! ! ! Who is who knows! ! ! The most comprehensive quiz mini program on the whole network, you can use your mobile phone to do the quizzes, on the subway or on the bus, roll it up!

The following interview question sections are covered:

1. Basic theory of software testing, 2. web, app, interface function testing, 3. network, 4. database, 5. linux

6. web, app, interface automation, 7. performance testing, 8. programming basics, 9. hr interview questions, 10. open test questions, 11. security testing, 12. computer basics

These materials should be the most comprehensive and complete preparation warehouse for [software testing] friends. This warehouse has also accompanied tens of thousands of test engineers through the most difficult journey. I hope it can help you too!  

Guess you like

Origin blog.csdn.net/2301_78276982/article/details/132428979