Reading and writing excel and other data files in Python

Python offers many ways to process data files. The sources it can work with include text files (csv, txt, json, etc.), excel files, databases, APIs and other data files.

Here are some of the ways Python can read and write data files.

1. read, readline, readlines

  • read(): read the entire file content at once. For large files it is better to call read(size) repeatedly; the larger the size, the more memory and time each call takes

  • readline(): read one line at a time. Useful when there is not enough memory to hold the whole file, otherwise rarely used

  • readlines(): read the entire file content at once and return it as a list of lines, which is convenient to iterate over
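
The three methods above can be compared side by side. This is a minimal sketch that first writes a small sample file (the file name and its contents are made up for illustration) and then reads it back in each style.

```python
import os
import tempfile

# Create a small sample file so the example is self-contained
# (the path and contents are illustrative, not from the article)
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("first line\nsecond line\nthird line\n")

# read(): the whole file as one string
with open(path, encoding="utf-8") as f:
    whole_text = f.read()

# read(size): at most `size` characters per call, useful for chunked reading
with open(path, encoding="utf-8") as f:
    first_chunk = f.read(5)

# readline(): one line per call, including the trailing newline
with open(path, encoding="utf-8") as f:
    first_line = f.readline()

# readlines(): all lines at once, as a list of strings
with open(path, encoding="utf-8") as f:
    all_lines = f.readlines()

# Iterating over the file object reads lazily, one line at a time
with open(path, encoding="utf-8") as f:
    stripped = [line.rstrip("\n") for line in f]

print(first_chunk, repr(first_line), len(all_lines), stripped)
```

In practice, iterating over the file object directly is often the simplest way to process a large file line by line without loading it all into memory.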

2. Built-in module csv

Python has a built-in csv module for reading and writing csv files. CSV is a comma-delimited text format and one of the most common data storage formats in data science. The csv module handles reading and writing for data of various sizes, although large volumes of data call for optimization at the code level.

  • Reading a file with the csv module

# Read a csv file
import csv
# newline='' lets the csv module handle line endings itself
with open('test.csv', 'r', newline='') as myFile:
    lines = csv.reader(myFile)
    for line in lines:
        print(line)
  • Writing a file with the csv module

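import csv
# newline='' prevents blank lines between rows on Windows
with open('test.csv', 'w', newline='') as myFile:
    myWriter = csv.writer(myFile)
    # writerow writes one row at a time
    myWriter.writerow([7, 8, 9])
    myWriter.writerow([8, 'h', 'f'])
    # writerows writes several rows at once
    myList = [[1, 2, 3], [4, 5, 6]]
    myWriter.writerows(myList)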
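
When a csv file has a header row, the same module also provides csv.DictWriter and csv.DictReader, which the text above does not cover. The following is a minimal round-trip sketch; the file name, column names and sample rows are assumptions made for illustration.

```python
import csv
import os
import tempfile

# Illustrative path and data, not taken from the article
path = os.path.join(tempfile.mkdtemp(), "people.csv")
rows = [{"name": "Ann", "age": 31}, {"name": "Bob", "age": 27}]

# Write a header row followed by one line per dictionary
with open(path, "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "age"])
    writer.writeheader()
    writer.writerows(rows)

# Read the rows back as dictionaries keyed by the header names
with open(path, newline="", encoding="utf-8") as f:
    # Every value comes back as a string, so convert types explicitly
    loaded = [{"name": r["name"], "age": int(r["age"])} for r in csv.DictReader(f)]

print(loaded)
```

Because the reader yields one row at a time, the same loop can also aggregate a large file without keeping every row in memory.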

 

3. numpy library

  • loadtxt method

loadtxt reads text files (txt, csv, etc.) as well as files compressed in .gz or .bz2 format, provided that every line of the file contains the same number of values.

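import numpy as np
# The dtype parameter of loadtxt() defaults to float;
# it is set to str here for easier display
np.loadtxt('test.csv', dtype=str)
# out:array(['1,2,3', '4,5,6', '7,8,9'], dtype='<U5')
# The default delimiter is whitespace, so each line is read as one string;
# pass delimiter=',' to split the values into columns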
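
For a comma-separated file, passing delimiter=',' makes loadtxt return a two-dimensional numeric array. The sketch below assumes a small file with three rows of three integers, written first so that the example runs on its own.

```python
import os
import tempfile

import numpy as np

# Illustrative file with the same number of values on every line
path = os.path.join(tempfile.mkdtemp(), "numbers.csv")
with open(path, "w", encoding="utf-8") as f:
    f.write("1,2,3\n4,5,6\n7,8,9\n")

# delimiter=',' splits each line into columns; dtype defaults to float
data = np.loadtxt(path, delimiter=",")
print(data.shape)  # (3, 3)
print(data.sum())  # 45.0
```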
  • load method

The load method is numpy's dedicated reader for .npy and .npz files and for pickled persistent files.

import numpy as np
# First create an npy file
np.save('test.npy', np.array([[1, 2, 3], [4, 5, 6]]))
# Load the npy file with load
np.load('test.npy')
'''
out:array([[1, 2, 3],
       [4, 5, 6]])
'''
  • fromfile method

The fromfile method reads simple text data or binary data, typically binary data saved by the tofile method. Because tofile stores neither the element type nor the shape, the user must specify the dtype when reading and reshape the array as needed.

import numpy as np
x = np.arange(9).reshape(3, 3)
x.tofile('test.bin')
# np.int was removed in NumPy 1.24; reuse the dtype of the saved array instead
np.fromfile('test.bin', dtype=x.dtype)
# out:array([0, 1, 2, 3, 4, 5, 6, 7, 8])

 

4. The pandas library

Pandas is one of the most commonly used libraries for data processing and analysis. It can read data from many sources, such as txt, csv, excel, json, the clipboard, databases, html, hdf, parquet, pickled files, sas and stata, and it generally returns the result as a DataFrame.

  • read_csv method

Read a csv file and return a DataFrame

import pandas as pd
pd.read_csv('test.csv')
  • read_excel method

Read excel files in xlsx, xls and xlsm formats (an engine such as openpyxl must be installed)

import pandas as pd
pd.read_excel('test.xlsx')
  • read_table method

Read any delimited text file by setting the sep parameter (the separator, tab by default)
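
As a minimal sketch of the sep parameter, the example below parses pipe-separated text. The sample data is made up, and an in-memory StringIO buffer stands in for a file on disk; a file path can be passed in the same way.

```python
from io import StringIO

import pandas as pd

# In-memory text standing in for a file on disk (illustrative data)
text = "name|score\nAnn|90\nBob|85\n"

# read_table defaults to sep='\t'; override it for other separators
df = pd.read_table(StringIO(text), sep="|")
print(df)
```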

  • read_json method

Read json format file

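from io import StringIO
import pandas as pd
df = pd.DataFrame([['a', 'b'], ['c', 'd']], index=['row 1', 'row 2'], columns=['col 1', 'col 2'])
j = df.to_json(orient='split')
# Wrap the json string in StringIO; recent pandas versions deprecate passing literal json
pd.read_json(StringIO(j), orient='split')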
  • read_html method

Read html table

  • read_clipboard method

Read clipboard content

  • read_pickle method

Read pickled persistent files
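
A minimal round-trip sketch: a DataFrame is saved with to_pickle and restored with read_pickle. The file name and the sample data are assumptions for illustration. Pickle files can execute arbitrary code when loaded, so only files from a trusted source should be read this way.

```python
import os
import tempfile

import pandas as pd

# Illustrative path and data
path = os.path.join(tempfile.mkdtemp(), "frame.pkl")
df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# Save the DataFrame, then load it back
df.to_pickle(path)
restored = pd.read_pickle(path)

# The restored frame has the same values, dtypes and index
print(restored.equals(df))
```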

  • read_sql method

Read data from a database. After connecting to the database, you can pass in an sql statement

  • read_hdf method
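
The sketch below shows read_sql against an in-memory sqlite database, chosen here only because it needs no server. The table name, columns and rows are hypothetical. Other databases generally require a suitable driver, and pandas documents SQLAlchemy connections for them.

```python
import sqlite3

import pandas as pd

# An in-memory sqlite database with a hypothetical `scores` table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("Ann", 90), ("Bob", 85), ("Cid", 70)],
)
conn.commit()

# Pass the sql statement and the open connection; params fills the placeholder
df = pd.read_sql(
    "SELECT name, score FROM scores WHERE score >= ? ORDER BY name",
    conn,
    params=(80,),
)
conn.close()
print(df)
```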

Read hdf5 files, suitable for reading large files

  • read_parquet method

Read parquet file

  • read_sas method

Read sas file

  • read_stata method

Read stata file

  • read_gbq method

Read google bigquery data

Pandas learning website: https://pandas.pydata.org/

5. Read and write excel files

There are many Python libraries for reading and writing excel files. Besides pandas, mentioned above, there are xlrd, xlwt, openpyxl, xlwings and others.

Main modules:

  • xlrd library

Read data from excel. Versions before 2.0 support xls and xlsx; since 2.0 only xls is supported

  • xlwt library

Write excel files in xls format; xlsx is not supported

  • xlutils library

Used together with xlwt and xlrd to modify an existing file

  • openpyxl

Mainly read and edit excel files in xlsx format

  • xlwings

Read, write and modify files in formats such as xlsx, xls and xlsm

  • xlsxwriter

Generate excel files, insert data, insert charts and perform other sheet operations; does not support reading

  • Microsoft Excel API

Requires pywin32. Communicates directly with the Excel process and can do anything that can be done in Excel, but it is slow
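
As one illustration of the libraries listed above, here is a minimal openpyxl sketch that writes a small xlsx file and reads it back. It assumes openpyxl is installed; the file name, sheet name and sample rows are made up for the example.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

# Illustrative output path
path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")

# Write: create a workbook, rename the active sheet, append rows, save
wb = Workbook()
ws = wb.active
ws.title = "Scores"
ws.append(["name", "score"])
ws.append(["Ann", 90])
ws.append(["Bob", 85])
wb.save(path)

# Read: load the workbook and collect each row as a tuple of cell values
wb2 = load_workbook(path)
rows = list(wb2["Scores"].iter_rows(values_only=True))
print(rows)
```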

6. Working with databases

Python can interact with almost every database. After connecting to a database, you can use SQL statements to insert, delete, update and query data.

Main modules:

  • pymysql

Used to interact with the mysql database

  • sqlalchemy

SQL toolkit and object-relational mapper that works with many databases, including mysql

  • cx_Oracle

Used to interact with oracle database

  • sqlite3

Built-in library for interaction with sqlite database

  • pymssql

Used to interact with the sql server database

  • pymongo

Used to interact with mongodb non-relational database

  • redis, pyredis

Used to interact with redis non-relational database
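
The insert, delete, update and query cycle can be sketched with the built-in sqlite3 module, which needs no separate server. The `users` table and its rows are hypothetical. Other drivers expose a similar connection and cursor interface, though details such as the placeholder syntax vary between them.

```python
import sqlite3

# Connect to an in-memory database and create a hypothetical table
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")

# Insert: parameter placeholders keep values out of the sql text
cur.executemany(
    "INSERT INTO users (name, age) VALUES (?, ?)",
    [("Ann", 31), ("Bob", 27), ("Cid", 45)],
)

# Update and delete
cur.execute("UPDATE users SET age = ? WHERE name = ?", (28, "Bob"))
cur.execute("DELETE FROM users WHERE name = ?", ("Cid",))
conn.commit()

# Query the remaining rows
cur.execute("SELECT name, age FROM users ORDER BY id")
result = cur.fetchall()
conn.close()
print(result)
```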

Origin blog.csdn.net/zsd0819qwq/article/details/105321881