Learning XML and JSON in Python

Python notes from teacher Dana

Structured file storage

  • XML, JSON
  • Both formats exist to solve the problem of exchanging information between different devices

XML

  • XML file

  • Reference

    • https://docs.python.org/3/library/xml.etree.elementtree.html
    • https://www.runoob.com/python/python-xml.html
    • https://blog.csdn.net/seetheworld518/article/details/49535285
  • XML: Extensible Markup Language

    • Markup language: information is marked up with text strings (tags) enclosed in angle brackets

    • Extensible: users can define their own tags

    • For example:

      with a custom tag <Teacher>...</Teacher>, anything between the opening and closing tags should relate to Teacher

    • XML is a standard developed by the W3C

    • XML describes the data itself, that is, the structure and semantics of the data

    • HTML, by contrast, focuses on how to display the data of a web page

  • The composition of XML documents

    • Processing instruction (the XML declaration)

      • A file has at most one; it occupies a single line and must be the first line of the file
      • Its content is declarations or instructions related to the processing of the XML itself, starting with the xml keyword
      • It is generally used to declare the XML version and the encoding in use, for example <?xml version="1.0" encoding="utf-8"?>
      • The version attribute is required
      • The encoding attribute tells the XML parser which encoding the file uses; if it is omitted, the default (UTF-8) is assumed

    • Root element

      • The document as a whole forms a tree structure, with the root element at the top
      • There is only one root element in a file
    • Child element

    • Attributes

    • Content

      • The information stored by the tag
    • Comment

      • Explanatory notes for human readers
      • Comments cannot be placed inside a tag
      • Double dashes (--) may only appear at the start and end of a comment
      • A single dash may appear inside a comment
      • Three dashes may appear at the start of a comment (<!---) but not at the end (--->)
    • Handling of reserved characters

      • Symbols used by XML syntax may conflict with symbols in the actual data, typically the left and right angle brackets
      • Use entity references (escaping) to represent reserved characters
        • For example, write score &gt; 80 instead of score > 80
      • Alternatively, put the part containing reserved characters inside a CDATA block; the content of a CDATA block is treated as literal text and needs no escaping
        • <![CDATA[ select name, age from Student where score>80 ]]> (the content here is an SQL statement)
      • Common reserved characters and the entity references used to escape them:
        • &: &amp;
        • <: &lt;
        • >: &gt;
        • ': &apos;
        • ": &quot;
        • There are five in total; each entity reference starts with & and ends with ;

    • XML tag naming rules

      • Pascal case (PascalCase) naming
      • Names are made of words, with the first letter of each word capitalized
      • Names are strictly case-sensitive
      • The opening and closing tags of a pair must be identical
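<?xml version="1.0" ?>  <!-- This is the processing instruction -->
<Student type="online" loc="beijing"> <!-- There can be only one top-level element; attributes can be written inside its opening tag -->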
	<name sex="lalala">haha</name>
	<age>18</age>
</Student>
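The handling of reserved characters described above can be tried from Python. The following is a minimal sketch, assuming the standard-library helpers `xml.sax.saxutils.escape` and `unescape` and an illustrative `<Query>` element that is not part of the original notes. It shows that an escaped string and a CDATA block both parse back to the same plain text.

```python
from xml.sax.saxutils import escape, unescape
import xml.etree.ElementTree as ET

raw = "score>80 & age<20"  # illustrative text containing reserved characters

# escape() replaces &, < and > with their entity references.
escaped = escape(raw)
print(escaped)  # score&gt;80 &amp; age&lt;20

# unescape() reverses the substitution.
restored = unescape(escaped)

# The parser resolves entity references and CDATA blocks to the same text.
with_entities = ET.fromstring("<Query>" + escaped + "</Query>")
with_cdata = ET.fromstring("<Query><![CDATA[" + raw + "]]></Query>")
print(with_entities.text == with_cdata.text == raw)  # True
```

Note that `escape()` handles only `&`, `<` and `>` by default; quotes matter mainly inside attribute values.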
  • Namespaces
    • Namespaces exist to prevent naming conflicts
    • To avoid a conflict, add a namespace prefix to the elements that may clash
    • xmlns: short for XML namespace
<!-- The namespaces student and room are added here; otherwise the two Name elements would conflict -->
<Schooler xmlns:student="http://my_student" xmlns:room="http://my_room">
	<student:Name>LiuYing</student:Name>
	<Age>23</Age>
	<room:Name>2014</room:Name>
	<Location>1-23-1</Location>
</Schooler>
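To see how the two Name elements are kept apart in practice, here is a small sketch that parses a well-formed version of the namespace example with `xml.etree.ElementTree`. The `ns` dictionary is an assumption of this sketch: it maps prefixes chosen for the lookup to the namespace URIs, and the prefixes do not have to match those in the document.

```python
import xml.etree.ElementTree as ET

xml_text = """
<Schooler xmlns:student="http://my_student" xmlns:room="http://my_room">
    <student:Name>LiuYing</student:Name>
    <Age>23</Age>
    <room:Name>2014</room:Name>
    <Location>1-23-1</Location>
</Schooler>
""".strip()

root = ET.fromstring(xml_text)

# Prefix-to-URI mapping used only for the lookups below.
ns = {"student": "http://my_student", "room": "http://my_room"}

student_name = root.find("student:Name", ns)
room_name = root.find("room:Name", ns)

print(student_name.text)  # LiuYing
print(room_name.text)     # 2014
# ElementTree stores the expanded name in the form {URI}local-name.
print(student_name.tag)   # {http://my_student}Name
```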
  • XML access
    • Read
      • There are two main technologies for reading XML: SAX and DOM
      • SAX (Simple API for XML):
        • Event-driven API
        • Parsing a document with SAX involves two parts: the parser and the event handlers
        • Features
          • Fast
          • Streaming reading
      • DOM (Document Object Model):
        • A programming interface for XML defined by the W3C
        • When read, the XML file is held in memory as a tree structure
        • Any node in the XML can be located and browsed
        • Content can be added and deleted
        • minidom
          • minidom.parse(filename): load and parse an XML file; filename can also be an open file object (use minidom.parseString for a string of XML code)
          • doc.documentElement: get the root element of the document object; an XML file has exactly one document object
          • node.getAttribute(attr_name): get the value of an attribute of the node
          • node.getElementsByTagName(tag_name): get a collection of matching node objects
          • node.childNodes: get all child nodes
          • node.childNodes[index].nodeValue: get the value of a single node
          • node.firstChild: get the first child node, equivalent to node.childNodes[0]
          • node.attributes[attr_name]: get the attribute node with the given name
        • etree
          • Represents XML as a tree structure
          • root.getiterator: get an iterable collection of nodes (deprecated, and removed in Python 3.9; use root.iter instead)
          • root.iter: same as above
          • root.find(node_name): find the first child node with the specified node_name and return that one node
          • root.findall(node_name): return all child nodes named node_name
          • node.tag: the tag name of the node
          • node.text: the text value of the node
          • node.attrib: a dictionary of the node's attributes
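Several of the minidom calls listed above can be tried without any file on disk. The sketch below assumes a short inline XML string (a cut-down Teacher record, chosen for illustration) and uses `parseString` in place of `parse`.

```python
from xml.dom.minidom import parseString

xml_text = (
    "<School><Teacher><Name>LiuDana</Name>"
    '<Age detail="Age for year 2010">18</Age></Teacher></School>'
)

doc = parseString(xml_text)   # the Document object
root = doc.documentElement    # the root element <School>

# getElementsByTagName searches all descendants, not just direct children.
age = root.getElementsByTagName("Age")[0]

print(root.nodeName)                   # School
print(age.getAttribute("detail"))      # Age for year 2010
print(age.firstChild.nodeValue)        # 18 (firstChild is the text node)
print(age.attributes["detail"].value)  # Age for year 2010
```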

The tree structure is as follows

<?xml version="1.0" encoding="utf-8"?>
<School>
	<Teacher>
		<Name>LiuDana</Name>
		<Age detail="Age for year 2010">18</Age>
		<Mobile>13260446055</Mobile>
	</Teacher>
	<Student>
		<Name Other="He is the class monitor">ZhangSan</Name>
		<Age Detail="The youngest boy in class">14</Age>
	</Student>
	<Student>
		<Name>LiSi</Name>
		<Age>19</Age>
		<Mobile>15578875040</Mobile>
	</Student>
</School>

Minidom case

# minidom parses an XML file into a DOM tree
import xml.dom.minidom

# Open and parse the XML file with minidom
DOMTree = xml.dom.minidom.parse("student.xml")
# Get the root element of the document
doc = DOMTree.documentElement

# Walk the child elements
for ele in doc.childNodes:
	if ele.nodeName == "Teacher":
		print("----Node:{}----".format(ele.nodeName))
		childs = ele.childNodes
		for child in childs:
			if child.nodeName == "Name":
				# data is an attribute of the text node and holds its value
				print("Name:{}".format(child.childNodes[0].data))
			if child.nodeName == "Mobile":
				print("Mobile:{}".format(child.childNodes[0].data))
			if child.nodeName == "Age":
				# getAttribute reads an attribute value from the element
				print("Age-detail:{}".format(child.getAttribute("detail")))


etree case

import xml.etree.ElementTree

tree = xml.etree.ElementTree.parse("student.xml")
root = tree.getroot()
# getiterator was removed in Python 3.9, so iter is used here
print("Accessing nodes with iter:")
nodes = root.iter()
for node in nodes:
	print("{}---{}".format(node.tag, node.text))


print("Using the find and findall methods")
ele_teacher = root.find("Teacher")
print("{}---{}".format(ele_teacher.tag, ele_teacher.text))


ele_stus = root.findall("Student")
for ele in ele_stus:
	print("{}---{}".format(ele.tag, ele.text))
	for sub in ele.iter():
		if sub.tag == "Name":
			if "Other" in sub.attrib.keys():
				print(sub.attrib['Other'])
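The notes describe SAX as an event-driven, streaming API but give no case for it. The following is a minimal sketch of the parser-plus-handler split, assuming an inline XML string instead of student.xml; the handler class name `NameHandler` is hypothetical.

```python
import xml.sax


class NameHandler(xml.sax.ContentHandler):
    """Collects the text of every <Name> element as the parser streams by."""

    def __init__(self):
        super().__init__()
        self.in_name = False
        self.buffer = []
        self.names = []

    def startElement(self, name, attrs):
        # Called by the parser for each opening tag.
        if name == "Name":
            self.in_name = True
            self.buffer = []

    def characters(self, content):
        # May be called several times for one run of text, so accumulate.
        if self.in_name:
            self.buffer.append(content)

    def endElement(self, name):
        # Called by the parser for each closing tag.
        if name == "Name":
            self.names.append("".join(self.buffer))
            self.in_name = False


xml_text = (
    b"<School><Teacher><Name>LiuDana</Name></Teacher>"
    b"<Student><Name>ZhangSan</Name></Student></School>"
)

handler = NameHandler()
xml.sax.parseString(xml_text, handler)
print(handler.names)  # ['LiuDana', 'ZhangSan']
```

Unlike the DOM cases, nothing is kept in memory except what the handler chooses to store, which is why SAX suits large files.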
  • XML file writing
    • Modify
      • ele.set: set or modify an attribute
      • ele.append: append a child element
      • ele.remove: remove a child element
    • Create

xml file modification case

import xml.etree.ElementTree as et

tree = et.parse(r'to_edit.xml')

root = tree.getroot()

for e in root.iter("Name"):
	print(e.text)


for stu in root.iter("Student"):
	name = stu.find('Name') # look up the Name child node

	if name is not None:
		name.set('text', name.text*2) # store the text, repeated twice, in an attribute named text

stu = root.find("Student")


# Create a new element
e = et.Element('ADDer')
e.attrib = {'a': 'b'}
e.text = "Added by me"

stu.append(e)

# The changes must be written back to the file, otherwise they are lost
tree.write('to_edit.xml')
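The case above exercises set and append but not remove. A short in-memory sketch of all three follows, assuming a made-up two-student document rather than to_edit.xml; the `checked` attribute and `Note` element are illustrative names.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<School>"
    "<Student><Name>ZhangSan</Name><Mobile>000</Mobile></Student>"
    "<Student><Name>LiSi</Name></Student>"
    "</School>"
)

# findall() returns a list, so the tree can be modified inside the loop.
for stu in root.findall("Student"):
    mobile = stu.find("Mobile")
    if mobile is not None:
        stu.remove(mobile)         # remove() only accepts a direct child
    stu.set("checked", "yes")      # set() adds or overwrites an attribute
    note = ET.Element("Note")
    note.text = "reviewed"
    stu.append(note)               # append() adds the element as last child

print(ET.tostring(root, encoding="unicode"))
```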
  • XML file generation/creation
    • SubElement: create a new element as a child of an existing one

XML file generation case

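import xml.etree.ElementTree as et

stu = et.Element("Student")

name = et.SubElement(stu, "Name") # create a child of stu named Name
name.attrib = {"lang": "en"} # attributes
name.text = "emmmm" # text

age = et.SubElement(stu, 'Age')
age.text = "18" # text must be a string, otherwise serialization fails


et.dump(stu) # print the element tree to standard output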

etree write case

import xml.etree.ElementTree as et


# Create an empty document in memory

etree = et.ElementTree()

e = et.Element("Student")

etree._setroot(e) # make this element the root

e_name = et.SubElement(e, 'Name')
e_name.text = "hahaa"

etree.write('v06.xml')
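A variation on the write case, shown as a sketch: passing the root element to `ElementTree()` directly, and asking `write()` to emit the XML declaration with an explicit encoding. The temporary directory and the file name `student.xml` are assumptions made here so the example does not touch real files.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

root = ET.Element("Student")
name = ET.SubElement(root, "Name")
name.text = "hahaa"

# Passing the root to the constructor avoids a separate _setroot() call.
tree = ET.ElementTree(root)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "student.xml")  # hypothetical file name
    # xml_declaration=True writes the <?xml ...?> line at the top of the file.
    tree.write(path, encoding="utf-8", xml_declaration=True)

    with open(path, encoding="utf-8") as f:
        content = f.read()

    # Read the file back to confirm the round trip.
    reloaded = ET.parse(path).getroot()

print(content)
print(reloaded.find("Name").text)  # hahaa
```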

JSON

  • Reference

    • https://www.sojson.com/
    • https://www.w3school.com.cn/json/
    • http://www.runoob.com/json/json-tutorial.html
  • JSON (JavaScript Object Notation)

  • A lightweight data exchange format based on ECMAScript

  • A JSON object is a collection of data in the form of key-value pairs

    • key: a string
    • value: a string, number, array (list), boolean, null, or another JSON object
    • A JSON object is wrapped in braces
    • Key-value pairs are separated by commas
student={
 "name" : "wangdapeng",
 "age" : 18,
 "mobile" : "13260446055"
}
  • Correspondence between JSON and Python types

    • string: str
    • number: int or float
    • array: list
    • object: dict
    • boolean: bool (different case: true/false in JSON, True/False in Python)
    • null: None
  • Using JSON from Python

    • The json package
    • Conversion between JSON strings and Python objects
      • json.dumps(): encode data, converting a Python object to a JSON string
      • json.loads(): decode data, converting a JSON string to a Python object
    • Reading and writing JSON files
      • json.dump(): write a Python object to a file as JSON
      • json.load(): read the content of a JSON file into a Python object
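The type correspondence can be checked directly with `json.loads`. This is a small sketch; the JSON text is made up for illustration and extends the student example with a float, an array, a boolean and a null.

```python
import json

text = (
    '{"name": "wangdapeng", "age": 18, "score": 92.5, '
    '"tags": ["a", "b"], "online": true, "mobile": null}'
)

data = json.loads(text)

# Print the Python type that each JSON value was decoded to.
for key, value in data.items():
    print(key, type(value).__name__)

# Going the other way, True/None become true/null and tuples become arrays.
back = json.dumps({"ok": True, "missing": None, "pair": (1, 2)})
print(back)  # {"ok": true, "missing": null, "pair": [1, 2]}
```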

json case

import json


# At this point student is a dict, not JSON
student = {
	"name" : "liudana",
	"age" : 18,
	"mobile" : "15512321"
	}
print(type(student))

stu_json = json.dumps(student) # convert to a JSON string
print(type(stu_json))
print("JSON object:{}".format(stu_json))

stu_dict = json.loads(stu_json) # convert back to a Python object
print(type(stu_dict))
print(stu_dict)

# Second example: writing and reading a JSON file
data = {'name':'hahaha','age':12}

with open('t.json','w') as f:
	json.dump(data,f) # write the content to the file

with open('t.json','r') as f:
	d = json.load(f) # read the JSON file content into Python
	print(d)
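When the data contains non-ASCII text, the default output of `json.dumps` and `json.dump` uses \uXXXX escapes. The sketch below shows the `ensure_ascii`, `indent` and `sort_keys` options, which both functions accept; the sample value is illustrative only.

```python
import json

data = {"name": "刘大拿", "age": 12}  # illustrative non-ASCII value

# Default: non-ASCII characters are written as \uXXXX escapes.
default_text = json.dumps(data)
print(default_text)

# ensure_ascii=False keeps the characters readable,
# indent pretty-prints, and sort_keys orders the keys.
pretty = json.dumps(data, ensure_ascii=False, indent=2, sort_keys=True)
print(pretty)

# Both forms decode back to the same Python object.
print(json.loads(default_text) == json.loads(pretty) == data)  # True
```

If the readable form is written to a file with `json.dump`, opening the file with `encoding="utf-8"` avoids depending on the platform default encoding.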

Origin blog.csdn.net/qq_45911278/article/details/112650795