Python Cookbook Study Notes ch6_01

Copyright notice: this is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/FANGLICHAOLIUJIE/article/details/82460940

Chapter 6 already, keep going! This chapter mainly covers reading and writing text data and handling encodings.

6.1 Reading and Writing CSV Files

  • Problem: how to read and write data in CSV format
  • Solution: use the csv module
  • Functions: reader(), DictReader(), writer(), DictWriter(), writerow(), writerows()
import csv
with open('data_file/stocks.csv','r') as f:
    f_csv = csv.reader(f)
    headers = next(f_csv)
    for row in f_csv:
        print(row)
['AA', '39.48', '6/11/2007', '9:36am', '-0.18', '181800']
['AIG', '71.38', '6/11/2007', '9:36am', '-0.15', '195500']
['AXP', '62.58', '6/11/2007', '9:36am', '-0.46', '935000']
  • A named tuple makes field access clearer
from collections import namedtuple
with open('data_file/stocks.csv') as f:
    f_csv = csv.reader(f)
    headings = next(f_csv)
    Row = namedtuple('Row',headings)
    for r in f_csv:
        row = Row(*r)
        print(row)
Row(Symbol='AA', Price='39.48', Date='6/11/2007', Time='9:36am', Change='-0.18', Volume='181800')
Row(Symbol='AIG', Price='71.38', Date='6/11/2007', Time='9:36am', Change='-0.15', Volume='195500')
Row(Symbol='AXP', Price='62.58', Date='6/11/2007', Time='9:36am', Change='-0.46', Volume='935000')
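The named-tuple approach only works when every column header is a valid Python identifier. Below is a minimal sketch for the case where headers may contain spaces or hyphens: it replaces each non-identifier character with an underscore before building the namedtuple. The sample CSV text and the helper name `sanitize` are hypothetical.

```python
import csv
import io
import re
from collections import namedtuple

# Hypothetical CSV text whose headers are not valid Python identifiers
text = 'Stock Symbol,Last-Price\nAA,39.48\nAIG,71.38\n'

def sanitize(header):
    # Replace every non-identifier character (regex \W) with an underscore
    return re.sub(r'\W', '_', header)

f_csv = csv.reader(io.StringIO(text))
headers = [sanitize(h) for h in next(f_csv)]
Row = namedtuple('Row', headers)
rows = [Row(*r) for r in f_csv]
print(rows[0])  # Row(Stock_Symbol='AA', Last_Price='39.48')
```

This sketch still rejects a header that begins with a digit. A blunter alternative is `namedtuple(..., rename=True)`, which replaces invalid names with positional ones such as `_0`.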
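  • Alternatively, read each row into a dictionary with DictReader (on Python 3.6/3.7 the rows are OrderedDicts, as in the output below; from Python 3.8 on they are plain dicts)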
with open('data_file/stocks.csv') as f:
    f_csv = csv.DictReader(f)
    for row in f_csv:
        print(row)
OrderedDict([('Symbol', 'AA'), ('Price', '39.48'), ('Date', '6/11/2007'), ('Time', '9:36am'), ('Change', '-0.18'), ('Volume', '181800')])
OrderedDict([('Symbol', 'AIG'), ('Price', '71.38'), ('Date', '6/11/2007'), ('Time', '9:36am'), ('Change', '-0.15'), ('Volume', '195500')])
OrderedDict([('Symbol', 'AXP'), ('Price', '62.58'), ('Date', '6/11/2007'), ('Time', '9:36am'), ('Change', '-0.46'), ('Volume', '935000')])
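The sketch below runs DictReader self-contained, with an in-memory copy of the first lines of stocks.csv (assumed to match the file's layout shown above). It shows that the header row is exposed as `fieldnames` and that each field is accessed by column name.

```python
import csv
import io

# Hypothetical in-memory copy of the first lines of stocks.csv
text = ('Symbol,Price,Date,Time,Change,Volume\n'
        '"AA",39.48,"6/11/2007","9:36am",-0.18,181800\n'
        '"AIG",71.38,"6/11/2007","9:36am",-0.15,195500\n')

reader = csv.DictReader(io.StringIO(text))
fieldnames = reader.fieldnames   # taken from the first line
rows = list(reader)              # one mapping per data line
for row in rows:
    print(row['Symbol'], row['Price'])   # values are still strings
```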
  • Writing to a CSV file
headers = ['Symbol','Price','Date','Time','Change','Volume']
rows = [('AA', 39.48, '6/11/2007', '9:36am', -0.18, 181800),
        ('AIG', 71.38, '6/11/2007', '9:36am', -0.15, 195500),
        ('AXP', 62.58, '6/11/2007', '9:36am', -0.46, 935000),
       ]
# newline='' prevents an extra blank line between rows on Windows
with open('data_file/stocks2.csv','w',newline='') as f:
    f_csv = csv.writer(f)
    f_csv.writerow(headers)   # a single row: the header
    f_csv.writerows(rows)     # all data rows at once
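To see the difference between `writerow()` (one row) and `writerows()` (an iterable of rows), here is a sketch that writes to an in-memory buffer and reads it back. The buffer is an assumption standing in for a real file opened with `newline=''`. Non-string values are written with `str()`, so they come back as strings.

```python
import csv
import io

headers = ['Symbol', 'Price', 'Volume']
rows = [('AA', 39.48, 181800), ('AIG', 71.38, 195500)]

buf = io.StringIO(newline='')   # in-memory stand-in for open(..., 'w', newline='')
writer = csv.writer(buf)
writer.writerow(headers)        # exactly one row
writer.writerows(rows)          # many rows in one call

buf.seek(0)
read_back = list(csv.reader(buf))
print(read_back)  # [['Symbol', 'Price', 'Volume'], ['AA', '39.48', '181800'], ['AIG', '71.38', '195500']]
```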
  • If the data is a sequence of dictionaries, write it with DictWriter
headers = ['Symbol', 'Price', 'Date', 'Time', 'Change', 'Volume']
rows = [{'Symbol':'AA', 'Price':39.48, 'Date':'6/11/2007',
         'Time':'9:36am', 'Change':-0.18, 'Volume':181800},
        {'Symbol':'AIG', 'Price': 71.38, 'Date':'6/11/2007',
         'Time':'9:36am', 'Change':-0.15, 'Volume': 195500},
        {'Symbol':'AXP', 'Price': 62.58, 'Date':'6/11/2007',
         'Time':'9:36am', 'Change':-0.46, 'Volume': 935000},
       ]
with open('data_file/stocks3.csv','w',newline='') as f:
    f_csv = csv.DictWriter(f,headers)
    f_csv.writeheader()
    f_csv.writerows(rows)
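DictWriter's handling of mismatched keys can also be checked in memory. By default a key that is missing from a row is written as an empty string (the `restval` parameter). A key that is not listed in `fieldnames` raises ValueError (`extrasaction='raise'`), and in that case nothing is written. The rows here are hypothetical.

```python
import csv
import io

headers = ['Symbol', 'Price', 'Volume']
rows = [{'Symbol': 'AA', 'Price': 39.48, 'Volume': 181800},
        {'Symbol': 'AIG', 'Price': 71.38}]            # 'Volume' missing

buf = io.StringIO(newline='')
writer = csv.DictWriter(buf, fieldnames=headers)     # restval='' by default
writer.writeheader()
writer.writerows(rows)
output = buf.getvalue()
print(repr(output))

# A key that is not in fieldnames is rejected before anything is written
try:
    writer.writerow({'Symbol': 'AXP', 'Price': 62.58, 'Volume': 935000, 'Note': 'x'})
    extra_key_error = None
except ValueError as exc:
    extra_key_error = exc
```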
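  • Prefer the csv module for CSV files: splitting each line on commas by hand, as below, leaves the quote characters and the trailing newline in the fields, and it would also break any quoted field that itself contains a comma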
with open('data_file/stocks.csv','r') as f:
    for line in f:
        r = line.split(',')
        print(r)
['Symbol', 'Price', 'Date', 'Time', 'Change', 'Volume\n']
['"AA"', '39.48', '"6/11/2007"', '"9:36am"', '-0.18', '181800\n']
['"AIG"', '71.38', '"6/11/2007"', '"9:36am"', '-0.15', '195500\n']
['"AXP"', '62.58', '"6/11/2007"', '"9:36am"', '-0.46', '935000']
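The comma problem can be seen on a single hypothetical line whose first field is quoted and contains a comma. Naive splitting cuts that field in two, while csv.reader honours the quotes.

```python
import csv
import io

# Hypothetical line: the company name contains a comma inside quotes
line = '"ACME, Inc.",39.48,"6/11/2007"\n'

naive = line.split(',')
parsed = next(csv.reader(io.StringIO(line)))
print(naive)   # ['"ACME', ' Inc."', '39.48', '"6/11/2007"\n']
print(parsed)  # ['ACME, Inc.', '39.48', '6/11/2007']
```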
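  • Reading tab-separated data: pass delimiter='\t'. The example below applies it to the comma-separated stocks.csv, so no tab is found and each whole line comes back as a single field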
with open('data_file/stocks.csv','r') as f:
    f_csv = csv.reader(f,delimiter='\t')
    for row in f_csv:
        print(row)
['Symbol,Price,Date,Time,Change,Volume']
['AA,39.48,"6/11/2007","9:36am",-0.18,181800']
['AIG,71.38,"6/11/2007","9:36am",-0.15,195500']
['AXP,62.58,"6/11/2007","9:36am",-0.46,935000']
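For contrast, here is a hypothetical, genuinely tab-separated input. Both `delimiter='\t'` and the built-in `'excel-tab'` dialect split it into columns.

```python
import csv
import io

# Hypothetical tab-separated text
tsv = 'Symbol\tPrice\tVolume\nAA\t39.48\t181800\n'

rows = list(csv.reader(io.StringIO(tsv), delimiter='\t'))
rows_dialect = list(csv.reader(io.StringIO(tsv), dialect='excel-tab'))
print(rows)  # [['Symbol', 'Price', 'Volume'], ['AA', '39.48', '181800']]
```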
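  • The csv module does not interpret values: everything it produces is a string, so any type conversion must be done manually, for example by pairing each column with a conversion function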
col_types = [str, float, str, str, float, int]
with open('data_file/stocks.csv') as f:
    f_csv = csv.reader(f)
    headers = next(f_csv)
    for row in f_csv:
        row = tuple(convert(value) for convert,value in zip(col_types,row))
        print(row)
('AA', 39.48, '6/11/2007', '9:36am', -0.18, 181800)
('AIG', 71.38, '6/11/2007', '9:36am', -0.15, 195500)
('AXP', 62.58, '6/11/2007', '9:36am', -0.46, 935000)
  • Converting specific fields of dictionary rows
field_types = [('Price',float),
               ('Change',float),
               ('Volume',int)
               ]
with open('data_file/stocks.csv','r') as f:
    for row in csv.DictReader(f):
        row.update((key,conversion(row[key])) for key, conversion in field_types)
        print(row)
OrderedDict([('Symbol', 'AA'), ('Price', 39.48), ('Date', '6/11/2007'), ('Time', '9:36am'), ('Change', -0.18), ('Volume', 181800)])
OrderedDict([('Symbol', 'AIG'), ('Price', 71.38), ('Date', '6/11/2007'), ('Time', '9:36am'), ('Change', -0.15), ('Volume', 195500)])
OrderedDict([('Symbol', 'AXP'), ('Price', 62.58), ('Date', '6/11/2007'), ('Time', '9:36am'), ('Change', -0.46), ('Volume', 935000)])
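Real data often has blank or malformed fields, and the conversions above stop with ValueError on the first one (for example, `int('')` raises ValueError). The following is a minimal sketch, assuming that recording such values as None is acceptable for your data. The helper `convert_fields` and the sample text are hypothetical.

```python
import csv
import io

# Hypothetical data: the second row has an empty Volume field
text = 'Symbol,Price,Volume\nAA,39.48,181800\nAIG,71.38,\n'
field_types = [('Price', float), ('Volume', int)]

def convert_fields(row, field_types):
    """Return a copy of row with the listed fields converted; unparsable values become None."""
    out = dict(row)
    for key, conversion in field_types:
        try:
            out[key] = conversion(row[key])
        except ValueError:
            out[key] = None
    return out

rows = [convert_fields(r, field_types) for r in csv.DictReader(io.StringIO(text))]
print(rows)  # [{'Symbol': 'AA', 'Price': 39.48, 'Volume': 181800}, {'Symbol': 'AIG', 'Price': 71.38, 'Volume': None}]
```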

6.2 Reading and Writing JSON Data

  • Problem: how to read and write data encoded as JSON (JavaScript Object Notation)
  • Solution: use the json module
  • Functions: json.dumps(), json.loads()

  • Converting a Python data structure to JSON

import json
data = {'name':'ACME',
       'shares':100,
       'price':345.24
       }
json_str = json.dumps(data)
json_str
'{"name": "ACME", "shares": 100, "price": 345.24}'
  • Converting a JSON-encoded string back into a Python data structure
data = json.loads(json_str)
data
{'name': 'ACME', 'shares': 100, 'price': 345.24}
  • When working with files instead of strings, use json.dump() and json.load() to encode and decode JSON data
with open('data_file/data.json','w') as f:
    json.dump(data,f)
with open('data_file/data.json','r') as f:
    data2 = json.load(f)
    print(data2)
{'name': 'ACME', 'shares': 100, 'price': 345.24}
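  • Note: JSON encoding supports None, bool, int, float and str, plus lists, tuples and dictionaries containing these types. Dictionary keys in JSON are always strings: int, float, bool and None keys are converted to str during encoding, and any other key type raises TypeError unless skipkeys=True is passed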
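A round trip shows that encoding loses some type information. In this sketch (hypothetical data), an int key comes back as a string and a tuple comes back as a list. A tuple key cannot be encoded at all unless `skipkeys=True`, which drops it.

```python
import json

data = {1: 'one', 'nums': (1, 2, 3), 'flag': None}
s = json.dumps(data)
back = json.loads(s)
print(s)            # {"1": "one", "nums": [1, 2, 3], "flag": null}
print(back)         # {'1': 'one', 'nums': [1, 2, 3], 'flag': None}
print(back == data) # False: key 1 became '1' and the tuple became a list

try:
    json.dumps({(1, 2): 'pair'})
    tuple_key_error = None
except TypeError as exc:
    tuple_key_error = exc
skipped = json.dumps({(1, 2): 'pair'}, skipkeys=True)  # '{}'
```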
  • Mapping between Python and JSON values: True -> true, False -> false, None -> null
json.dumps(False)
'false'
json.dumps(True)
'true'
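The mapping also works in the decoding direction. The sketch below encodes a list of the three special values and then decodes a small hypothetical JSON object back into Python.

```python
import json

encoded = json.dumps([True, False, None])
decoded = json.loads('{"active": true, "deleted": false, "owner": null}')
print(encoded)  # [true, false, null]
print(decoded)  # {'active': True, 'deleted': False, 'owner': None}
```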
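  • JSON decoding normally creates dicts or lists from the input. To create other kinds of objects, pass an object_pairs_hook or object_hook argument to json.loads(); below, object_pairs_hook=OrderedDict keeps the keys in document order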
s = '{"name":"ACME","shares":"90","price":"23.23"}'
from collections import OrderedDict
data = json.loads(s,object_pairs_hook=OrderedDict)
data
OrderedDict([('name', 'ACME'), ('shares', '90'), ('price', '23.23')])
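object_pairs_hook receives the raw list of (key, value) pairs, in document order, before any dict is built. One use for this is to reject duplicate keys. A plain json.loads() accepts them silently and keeps only the last value. The sketch below assumes duplicates should be treated as an error. The function name `reject_duplicates` is hypothetical.

```python
import json

def reject_duplicates(pairs):
    # pairs is a list of (key, value) tuples as they appear in the document
    keys = [k for k, _ in pairs]
    dupes = {k for k in keys if keys.count(k) > 1}
    if dupes:
        raise ValueError('duplicate keys: %s' % sorted(dupes))
    return dict(pairs)

s = '{"name": "ACME", "shares": 90, "shares": 100}'
plain = json.loads(s)           # {'name': 'ACME', 'shares': 100}: last value wins
try:
    json.loads(s, object_pairs_hook=reject_duplicates)
    error = None
except ValueError as exc:
    error = str(exc)
print(plain, error)
```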
  • Converting a JSON dictionary into a Python object
s = '{"name":"ACME","shares":"90","price":"23.23"}'
class JSONObject:
    def __init__(self,d):
        self.__dict__ = d
data = json.loads(s,object_hook = JSONObject)
data.name
'ACME'
data.shares
'90'
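The standard library's `types.SimpleNamespace` can serve as a ready-made alternative to a custom class like JSONObject. object_hook is called for every decoded JSON object, including nested ones, so attribute access works at every level. The nested "location" data here is hypothetical.

```python
import json
from types import SimpleNamespace

s = '{"name": "ACME", "shares": 90, "location": {"city": "Chicago"}}'
data = json.loads(s, object_hook=lambda d: SimpleNamespace(**d))
print(data.name)           # ACME
print(data.location.city)  # Chicago (the hook also ran on the nested object)
```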
  • To get nicely formatted (pretty-printed) output, pass the indent argument to json.dumps()
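Here is a short example using the same data dictionary as above. With `indent`, each key goes on its own line. `sort_keys=True`, also shown here, orders the keys alphabetically.

```python
import json

data = {'name': 'ACME', 'shares': 100, 'price': 345.24}
pretty = json.dumps(data, indent=4)
print(pretty)
# {
#     "name": "ACME",
#     "shares": 100,
#     "price": 345.24
# }
ordered = json.dumps(data, sort_keys=True)
print(ordered)  # {"name": "ACME", "price": 345.24, "shares": 100}
```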
  • By default, instances of user-defined classes cannot be serialized to JSON
class Point:
    def __init__(self,x,y):
        self.x = x
        self.y = y
p = Point(2,3)
# The following raises an error
json.dumps(p)  # TypeError: Object of type 'Point' is not JSON serializable
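  • To serialize instances, define a function that takes an instance and returns a serializable dictionary, then pass it as the default argument of json.dumps()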
def serialize_instance(obj):
    d = {'__classname__':type(obj).__name__}
    d.update(vars(obj))
    return d
p = Point(2,3)
s = json.dumps(p,default=serialize_instance)
s
'{"__classname__": "Point", "x": 2, "y": 3}'
  • To go the other way and rebuild the instance, use an object_hook that looks the class up by name:
classes = {'Point':Point}
def unserialize_object(d):
    clsname = d.pop('__classname__',None)
    if clsname:
        cls = classes[clsname]
        obj = cls.__new__(cls)
        for key, value in d.items():
            setattr(obj,key,value)
        return obj
    else:
        return d
a = json.loads(s,object_hook=unserialize_object)
a
<__main__.Point at 0x762130>
a.x
2
a.y
3
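The two functions above also handle nested objects. json.dumps() calls `default` again for each non-serializable object inside the returned dict, and json.loads() calls `object_hook` on the innermost objects first. The sketch below is a self-contained version of the same functions, exercised with a hypothetical `Line` class that holds two Points. Only classes listed in the `classes` registry can be rebuilt.

```python
import json

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

class Line:  # hypothetical class holding two Point instances
    def __init__(self, start, end):
        self.start = start
        self.end = end

classes = {'Point': Point, 'Line': Line}  # only these classes may be rebuilt

def serialize_instance(obj):
    # Called by json.dumps() for every object it cannot encode natively,
    # including the Points inside the dict returned for a Line
    d = {'__classname__': type(obj).__name__}
    d.update(vars(obj))
    return d

def unserialize_object(d):
    # Called by json.loads() for every decoded JSON object, innermost first
    clsname = d.pop('__classname__', None)
    if clsname is None:
        return d
    cls = classes[clsname]
    obj = cls.__new__(cls)        # create the instance without calling __init__
    for key, value in d.items():
        setattr(obj, key, value)
    return obj

line = Line(Point(0, 0), Point(3, 4))
s = json.dumps(line, default=serialize_instance)
restored = json.loads(s, object_hook=unserialize_object)
print(s)
print(type(restored).__name__, restored.end.x, restored.end.y)  # Line 3 4
```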

6.3 Parsing Simple XML Data

  • Problem: extract data from a simple XML document
  • Solution: use xml.etree.ElementTree
from urllib.request import urlopen
from xml.etree.ElementTree import parse
u = urlopen('https://planetpython.org/rss20.xml')
doc = parse(u)
for item in doc.iterfind('channel/item'):
    title = item.findtext('title')
    date = item.findtext('pubDate')
    link = item.findtext('link')
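# To keep the output short, the three print calls below were moved out of the for loop,
# so only the last item is printed. Move them back inside the loop when trying this yourself.
print(title)
print(date)
print(link)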
Real Python: Structuring Python Programs
Mon, 03 Sep 2018 14:00:00 +0000
https://realpython.com/python-program-structure/
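Because the example above depends on a live feed, its output changes over time. The sketch below runs the same iterfind/findtext loop offline on a small hypothetical RSS document, with every print inside the loop.

```python
import io
from xml.etree.ElementTree import parse

# Hypothetical, trimmed-down RSS document standing in for the downloaded feed
rss = '''<rss version="2.0">
  <channel>
    <title>Planet Python</title>
    <item>
      <title>First post</title>
      <pubDate>Mon, 03 Sep 2018 14:00:00 +0000</pubDate>
      <link>https://example.com/first</link>
    </item>
    <item>
      <title>Second post</title>
      <pubDate>Tue, 04 Sep 2018 09:00:00 +0000</pubDate>
      <link>https://example.com/second</link>
    </item>
  </channel>
</rss>'''

doc = parse(io.StringIO(rss))
items = []
for item in doc.iterfind('channel/item'):
    title = item.findtext('title')
    date = item.findtext('pubDate')
    link = item.findtext('link')
    items.append((title, date, link))
    print(title, date, link, sep='\n')
```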
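  • Element objects, such as the one returned by find(), have a few important attributes and methods: tag holds the tag name, text holds the enclosed text, and get() retrieves an attribute value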
doc
<xml.etree.ElementTree.ElementTree at 0x631c750>
e = doc.find('channel/title')
e
<Element 'title' at 0x06577D80>
e.tag
'title'
e.text
'Planet Python'
e.get('some_attribute')  # the element has no such attribute, so this returns None and nothing is displayed
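For comparison, here is an element that does carry attributes. It is a hypothetical `<link>` element parsed with fromstring(). get() returns the attribute value, or None (or a supplied default) when the attribute is absent, and `attrib` holds all attributes as a dict.

```python
from xml.etree.ElementTree import fromstring

# Hypothetical element with attributes
e = fromstring('<link rel="alternate" href="https://example.com/">Example</link>')
print(e.tag)                        # link
print(e.text)                       # Example
print(e.get('href'))                # https://example.com/
print(e.get('some_attribute'))      # None
print(e.get('some_attribute', ''))  # a default can be supplied instead
print(e.attrib)                     # all attributes as a dict
```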
