Python Crawler (13): The json Module and JsonPath

Extracting Data with JSON and JsonPath

JSON (JavaScript Object Notation) is a lightweight data-interchange format that is easy for people to read and write, and easy for machines to parse and generate. It suits data-exchange scenarios, such as passing data between a website's front end and back end.

JSON is often compared with XML, and the two are roughly on a par.

Python 2.7 ships with a json module; just import json and it is ready to use.
Official documentation: http://docs.python.org/library/json.html

Online JSON parser: http://www.json.cn/#

JSON

Put simply, JSON is made of JavaScript objects and arrays. These two structures, object and array, can be combined to represent arbitrarily complex data.

  1. Object: in JavaScript an object is the content enclosed in {}, with the key-value structure {key: value, key: value, ...}. In object-oriented terms, the key is a property of the object and the value is the corresponding property value, so a value is obtained with object.key. A value may be a number, a string, an array, or an object.
  2. Array: in JavaScript an array is the content enclosed in square brackets [], with the structure ["Python", "javascript", "C++", ...]. As in every language, elements are obtained by index. An element may be a number, a string, an array, or an object.
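
As a rough illustration of the two structures, the sketch below parses a small made-up JSON document and reads values by key and by index. The field names are hypothetical, and the code is written in Python 3 syntax rather than the Python 2.7 used elsewhere in this article.

```python
import json

# A made-up JSON document: an object whose "languages" member is an array.
text = '{"name": "demo", "languages": ["Python", "javascript", "C++"], "stars": 3}'

data = json.loads(text)

# A JSON object becomes a dict, so members are read by key.
name = data["name"]

# A JSON array becomes a list, so elements are read by index.
first_language = data["languages"][0]

print(type(data).__name__, type(data["languages"]).__name__)
print(name, first_language)
```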

import json

The json module provides four functions, dumps, dump, loads, and load, for converting between JSON strings and Python data types.

1. json.loads()

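Decodes a JSON-formatted string into a Python object. The JSON-to-Python type conversion is: a JSON object becomes a dict, an array becomes a list, a string becomes a unicode string, a number becomes an int, long, or float, true and false become True and False, and null becomes None.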

# -*- coding: utf-8 -*-
# json_loads.py

import json

strList = '[1, 2, 3, 4]'

strDict = '{"city": "北京", "name": "大猫"}'

print(json.loads(strList))
# [1, 2, 3, 4]

print(json.loads(strDict))   # JSON strings are decoded as Unicode
# {u'city': u'\u5317\u4eac', u'name': u'\u5927\u732b'}
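
The type conversion performed by json.loads() can be checked directly. The following is a minimal sketch in Python 3 syntax (so strings decode to str rather than unicode); the sample document and its key names are made up for illustration.

```python
import json

# One made-up document that contains every JSON value type.
text = '{"obj": {}, "arr": [1, 2], "s": "hi", "i": 7, "f": 1.5, "t": true, "n": null}'

decoded = json.loads(text)

# Record the Python type that each JSON value was decoded into.
type_names = {key: type(value).__name__ for key, value in decoded.items()}
print(type_names)
```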

2. json.dumps()

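Encodes a Python object as a JSON string and returns a str object. The Python-to-JSON type conversion is: a dict becomes an object, a list or tuple becomes an array, a str or unicode becomes a string, an int, long, or float becomes a number, True and False become true and false, and None becomes null.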

# -*- coding: utf-8 -*-
# json_dumps.py

import json
import chardet

listStr = [1, 2, 3, 4]
tupleStr = (1, 2, 3, 4)
dictStr = {"city": "北京", "name": "大猫"}

print(json.dumps(listStr))
# [1, 2, 3, 4]

print(json.dumps(tupleStr))
# [1, 2, 3, 4]

# Note: by default json.dumps() escapes non-ASCII characters during serialization.
# Pass ensure_ascii=False to disable the escaping and keep the UTF-8 text as it is.
# chardet.detect() returns a dict whose confidence field is the detection confidence.

print(json.dumps(dictStr))
# {"city": "\u5317\u4eac", "name": "\u5927\u732b"}

print(chardet.detect(json.dumps(dictStr)))

print(json.dumps(dictStr, ensure_ascii=False))

print(chardet.detect(json.dumps(dictStr, ensure_ascii=False)))

 

chardet is a very good encoding-detection module and can be installed with pip.
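
The effect of ensure_ascii can be seen without chardet. The sketch below is written in Python 3 syntax, which is an assumption on top of the Python 2.7 code in this article, and the sample dictionary is only illustrative.

```python
import json

record = {"city": "北京", "name": "大猫"}

# Default: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(record)

# ensure_ascii=False keeps the characters themselves in the output string.
readable = json.dumps(record, ensure_ascii=False)

print(escaped)
print(readable)

# Both forms decode back to the same Python object.
assert json.loads(escaped) == json.loads(readable) == record
```

The escaped form is safe to send over channels that only handle ASCII, while the readable form is usually nicer for files that people will open.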

3. json.dump()

Serializes a built-in Python object to JSON and writes it to a file.

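# -*- coding: utf-8 -*-
# json_dump.py

import json

listStr = [{"city": "北京"}, {"name": "大刘"}]

# Serialize the list and write it to listStr.json
with open("listStr.json", "w") as fp:
    json.dump(listStr, fp, ensure_ascii=False)

dictStr = {"city": "北京", "name": "大刘"}

# Serialize the dict and write it to dictStr.json
with open("dictStr.json", "w") as fp:
    json.dump(dictStr, fp, ensure_ascii=False)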

 

4. json.load()

Reads JSON-formatted text from a file and converts it into Python objects.

#-*- coding:utf-8 -*-


import json

strList = json.load(open("listStr.json"))
print strList
# [{u'city': u'\u5317\u4eac'}, {u'name': u'\u5927\u5218'}]

strDict = json.load(open("dictStr.json"))
print strDict
# {u'city': u'\u5317\u4eac', u'name': u'\u5927\u5218'}
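
A round trip through json.dump() and json.load() might look like the sketch below. It is written in Python 3 syntax and uses a temporary directory so that no files are left behind; both choices are assumptions made for the illustration, not part of the original scripts.

```python
import json
import os
import tempfile

records = [{"city": "北京"}, {"name": "大刘"}]

# Write to a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "listStr.json")

    # Opening the file with an explicit encoding avoids platform defaults.
    with open(path, "w", encoding="utf-8") as fp:
        json.dump(records, fp, ensure_ascii=False)

    with open(path, encoding="utf-8") as fp:
        loaded = json.load(fp)

print(loaded == records)
```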

 

JsonPath

JsonPath is an information-extraction library: a tool for pulling specific pieces of information out of a JSON document. Implementations are available in several languages, including JavaScript, Python, PHP, and Java.

JsonPath is to JSON what XPath is to XML.

Download: https://pypi.python.org/pypi/jsonpath
Installation: follow the download link to download jsonpath, unpack it, and run python setup.py install
Official documentation: http://goessner.net/articles/JsonPath

JsonPath syntax compared with XPath:

JSON has a clear structure, high readability, and low complexity, which makes matching easy. The table below maps XPath usage to JsonPath.

XPath   JSONPath    Description
/       $           Root node
.       @           Current node
/       . or []     Child node
..      n/a         Parent node; not supported by JsonPath
//      ..          Recursive descent: select all matching nodes regardless of position
*       *           Wildcard: matches all element nodes
[]      []          Subscript operator (supports simple iteration, such as array indexes or selecting values by content)
|       [,]         Union: select multiple items inside a subscript
[]      ?()         Filter expression
n/a     ()          Expression evaluation
()      n/a         Grouping; not supported by JsonPath
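
To make the recursive-descent operator concrete, here is a hand-written sketch of roughly what an expression such as $..name selects. It does not use the jsonpath package: find_all is a hypothetical helper, the sample data is made up, and the code is in Python 3 syntax.

```python
def find_all(node, key):
    """Collect every value stored under `key`, at any depth, in document order."""
    found = []
    if isinstance(node, dict):
        for k, value in node.items():
            if k == key:
                found.append(value)
            # Keep descending: nested containers may hold the key as well.
            found.extend(find_all(value, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_all(item, key))
    return found


# Made-up data shaped loosely like a nested city listing.
sample = {
    "content": {
        "data": [
            {"name": "Beijing", "id": 1},
            {"name": "Shanghai", "id": 2, "districts": [{"name": "Pudong"}]},
        ]
    }
}

names = find_all(sample, "name")
print(names)
```

The helper only covers the recursive-descent case; the real library also handles subscripts, unions, and filters from the table.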

Example:

Take the city JSON file from Lagou, http://www.lagou.com/lbs/getAllCitySearchLabels.json, as an example and extract all of the city names.

# -*- coding: utf-8 -*-

import urllib2
import json
import jsonpath

url = "http://www.lagou.com/lbs/getAllCitySearchLabels.json"
request = urllib2.Request(url)

response = urllib2.urlopen(request)

html = response.read()

# Convert the JSON-formatted string into a Python object
jsonobj = json.loads(html)

# Starting from the root node, match every node named "name"
citylist = jsonpath.jsonpath(jsonobj, '$..name')

print citylist
print(type(citylist))

fp = open('city.json', 'w')

# Serialize the list without ASCII escaping, then write it as UTF-8
content = json.dumps(citylist, ensure_ascii=False)
print content
fp.write(content.encode('utf-8'))

fp.close()

 

Notes:

json.loads() decodes a JSON-formatted string into a Python object. If json.loads() raises an error, check the character encoding of the JSON string.

If the incoming string is not UTF-8 encoded, its encoding must be given with the encoding parameter (this applies to Python 2):

dataDict = json.loads(jsonStrGBK)
  • If the JSON string jsonStrGBK is encoded as GBK rather than UTF-8, the call above raises an error. It should be changed to:
dataDict = json.loads(jsonStrGBK, encoding="GBK")
  • If the encoding of dataJsonStr has been specified with encoding but the string also contains characters in other encodings, first convert dataJsonStr to Unicode, then call json.loads() with the encoding specified:
dataJsonStrUni = dataJsonStr.decode("GB2312")
dataDict = json.loads(dataJsonStrUni, encoding="GB2312")
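
In Python 3 the encoding parameter of json.loads() is no longer available (it was removed in Python 3.9), so the usual approach is to decode the bytes yourself before parsing. The sketch below assumes Python 3 and simulates a GBK payload in memory instead of fetching one.

```python
import json

# Simulate a JSON payload that arrived as GBK-encoded bytes.
json_bytes_gbk = '{"city": "北京"}'.encode("gbk")

# Passing the raw bytes fails, because json.loads() only auto-detects
# UTF-8, UTF-16 and UTF-32 for bytes input.
try:
    json.loads(json_bytes_gbk)
    direct_ok = True
except ValueError:  # UnicodeDecodeError is a subclass of ValueError
    direct_ok = False

# Decode to str with the real encoding first, then parse.
data = json.loads(json_bytes_gbk.decode("gbk"))

print(direct_ok, data)
```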

String transcoding

This is one of the most painful areas for programmers working with Chinese text; almost every garbled-character problem comes from it.
In fact, encoding problems are easy to handle once you remember one thing:

Any encoding, on any platform, can be converted to and from Unicode.

To convert between UTF-8 and GBK, first decode the UTF-8 string to Unicode, then encode the Unicode string as GBK. The reverse direction works the same way.

# -*- coding: utf-8 -*-

# This is a UTF-8 encoded string
utf8Str = "你好地球"

# 1. Decode the UTF-8 encoded string into Unicode
unicodeStr = utf8Str.decode("UTF-8")

# 2. Encode the Unicode string as GBK
gbkData = unicodeStr.encode("GBK")

# 1. Decode the GBK encoded string back into Unicode
unicodeStr = gbkData.decode("GBK")

# 2. Encode the Unicode string as UTF-8
utf8Str = unicodeStr.encode("UTF-8")

decode converts a string in some other encoding into Unicode.
encode converts a Unicode string into a string in some other encoding.
In a word: UTF-8 is one way of encoding the Unicode character set.
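
The same two-step conversion can be sketched in Python 3, where text is str (Unicode) and encoded data is bytes, so decode and encode each go in only one direction. The sample text is just an illustration.

```python
text = "你好地球"

# encode: str (Unicode) -> bytes in a chosen encoding
utf8_bytes = text.encode("utf-8")

# UTF-8 -> GBK goes through Unicode: decode first, then encode again.
gbk_bytes = utf8_bytes.decode("utf-8").encode("gbk")

# GBK -> UTF-8 is the same two steps in the other direction.
round_trip = gbk_bytes.decode("gbk").encode("utf-8")

# The byte lengths differ because the two encodings store each character differently.
print(len(utf8_bytes), len(gbk_bytes))
```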

Reference Links: https://www.cnblogs.com/miqi1992/category/1105419.html

Origin www.cnblogs.com/moying-wq/p/11570013.html