Python's Tablib: processing tabular data

Table of contents

1. Introduction to installation and basic knowledge

2. Import data from different data sources

3. Process the data

4. Export data to different formats


Tablib is a format-agnostic Python library for working with tabular data. It can import data from, and export data to, formats such as CSV, JSON, YAML and Excel, which also makes it a convenient tool for converting between formats. This article walks through the main uses of Tablib with examples: importing data from different sources, processing the data, and exporting it to different formats.

1. Introduction to installation and basic knowledge

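Tablib is installed with pip. Run the following command in a terminal:

pip install tablib

The core package handles formats such as CSV, TSV and JSON. Formats that rely on third-party packages are installed as extras, for example tablib[xlsx] (openpyxl), tablib[xls] (xlrd and xlwt) and tablib[yaml] (PyYAML); pip install "tablib[all]" installs all of them.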

The core data type in Tablib is the Dataset. It consists of rows and columns, much like a two-dimensional array, and each cell can hold any Python object. A Dataset can be given column headers (and an optional title), and the headers are carried over when the data is exported.

2. Import data from different data sources

1. Import data from a CSV file

Importing a CSV file takes a single call to the Dataset's load() method. The following code reads a file named "example.csv" into a Tablib Dataset; by default, the first row of the file is used as the column headers.

import tablib
with open('example.csv', newline='') as f:
    data = tablib.Dataset().load(f, format='csv')  # first row becomes the headers

2. Import data from an Excel file

Tablib can also import Excel files. Modern .xlsx files are read through the openpyxl package (installed with tablib[xlsx]); the older .xls format needs xlrd (tablib[xls]). Excel files are binary, so they must be opened in 'rb' mode. The following code imports a file named "example.xlsx" into a Tablib Dataset.

import tablib
with open('example.xlsx', 'rb') as f:
    data = tablib.Dataset().load(f, format='xlsx')

3. Import data from a JSON file

JSON files are loaded the same way. load() expects the raw JSON text or an open file, not an already-parsed Python object, so there is no need to call json.load() first. The file should contain an array of objects; their keys become the column headers.

import tablib
with open('example.json') as f:
    data = tablib.Dataset().load(f, format='json')

3. Process the data

1. Add columns and rows

Adding rows and columns in Tablib is straightforward. Rows are added with the Dataset's append() method, which takes a list or tuple with one value per column. Columns are added with append_col(), which takes a list with one value per row; when the Dataset has headers, the header of the new column must be passed as well.

import tablib
data = tablib.Dataset(headers=['name'])
# append() adds one row (one value per column)
data.append(['Tom'])
data.append(['Ali'])
data.append(['Mike'])
# append_col() adds one column (one value per row) together with its header
data.append_col([27, 25, 28], header='age')

2. Delete columns and rows

Deleting columns and rows is just as easy with Python's del statement on the Dataset: a header name selects a column and an integer index selects a row. The code below deletes the "age" column and then the second row (row indices start at 0). To remove and return the last row instead, use pop(), which takes no index.

import tablib
data = tablib.Dataset(('Tom', 27), ('Ali', 25), ('Mike', 28), headers=['name', 'age'])
del data['age']  # delete the column whose header is 'age'
del data[1]      # delete the second row ('Ali')

3. Rename columns

Dataset does not provide a rename() method; a column is renamed by assigning a new list to the Dataset's headers attribute. The new list must contain one entry per column.

import tablib
data = tablib.Dataset(('Tom', 27), ('Ali', 25), ('Mike', 28), headers=['name', 'age'])
# Replace 'name' with 'username' and keep the other headers unchanged
data.headers = ['username' if h == 'name' else h for h in data.headers]

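4. Convert the data type of a column

Tablib does not attach a data type to a column: each cell simply keeps the Python object it was given. When data comes from a CSV file, every value is therefore a string, because that is what the csv module produces. To do arithmetic on such a column, convert the values yourself, for example by replacing the column with a converted copy.

import tablib
data = tablib.Dataset().load('name,age\nTom,27\nAli,25\nMike,28\n', format='csv')
ages = [int(v) for v in data['age']]  # '27' -> 27, and so on
del data['age']
data.append_col(ages, header='age')   # the converted column is re-added at the end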

4. Export data to different formats

1. Export as a CSV file

Exporting to CSV is just as convenient: the Dataset's csv attribute returns the data, headers included, as CSV text. The following code writes it to a file named "example.csv".

import tablib
data = tablib.Dataset(('Tom', 27), ('Ali', 25), ('Mike', 28), headers=['name', 'age'])
# newline='' writes the CSV line endings unchanged (avoids blank lines on Windows)
with open('example.csv', 'w', newline='') as f:
    f.write(data.csv)

2. Export as an Excel file

Exporting to Excel does not require driving an Excel library by hand. The Dataset's xlsx attribute (backed by openpyxl) returns the workbook as bytes, headers included, so the file must be opened in binary mode. The legacy .xls format is available in the same way through the xls attribute, which uses xlwt.

import tablib
data = tablib.Dataset(('Tom', 27), ('Ali', 25), ('Mike', 28), headers=['name', 'age'])
# xlsx export returns bytes, so open the file in binary mode
with open('example.xlsx', 'wb') as f:
    f.write(data.xlsx)

3. Export as a JSON file

Exporting to JSON works the same way. The json attribute already returns a JSON string (an array with one object per row), so it can be written to the file directly; passing it through json.dumps() again would encode it a second time and produce a single quoted string.

import tablib
data = tablib.Dataset(('Tom', 27), ('Ali', 25), ('Mike', 28), headers=['name', 'age'])
with open('example.json', 'w') as f:
    f.write(data.json)

4. Export to other formats

Tablib can also export to other formats, including YAML, HTML, reStructuredText and LaTeX. Some of them depend on third-party packages that must be installed first, for example PyYAML for YAML (tablib[yaml]). There is no dedicated Markdown format, but the cli format, which uses the tabulate package (tablib[cli]), can render a Markdown-style pipe table with tablefmt='github'.

import tablib
data = tablib.Dataset(('Tom', 27), ('Ali', 25), ('Mike', 28), headers=['name', 'age'])

# Export as YAML (data.yaml is already a YAML string)
with open('example.yml', 'w') as f:
    f.write(data.yaml)

# Export as HTML (data.html already contains the <table> element)
with open('example.html', 'w') as f:
    f.write(data.html)

# Export as a Markdown-style table through the cli format
with open('example.md', 'w') as f:
    f.write(data.export('cli', tablefmt='github'))

Summary

This article introduced the main uses of the Tablib library with examples: importing data from different sources, processing it, and exporting it to files in different formats. Because a single Dataset can be loaded from and exported to many formats through the same load() and export() calls, Tablib offers an easy and flexible way to handle tabular data in Python.


Origin blog.csdn.net/naer_chongya/article/details/131422513