Extracting data with JSON and JsonPath
JSON (JavaScript Object Notation) is a lightweight data-interchange format that is easy for people to read and write and easy for machines to parse and generate. It is well suited to data exchange scenarios, such as the exchange between a website's front end and back end.
JSON is often compared with XML, and the two are roughly on a par.
Python 2.7 ships with a json module; a plain import json is all that is needed to use it.
Official documentation: http://docs.python.org/library/json.html
Online JSON parser: http://www.json.cn/#
JSON
Put simply, JSON is JavaScript's objects and arrays. These two structures, objects and arrays, can be combined to represent data of arbitrary complexity.
- Object: in JavaScript an object is whatever is enclosed in {}. Its structure is a set of key-value pairs, {key: value, key: value, ...}. In object-oriented terms, the key is a property of the object and the value is that property's value, so a value is naturally read as object.key. A value can be a number, a string, an array, or another object.
- Array: in JavaScript an array is whatever is enclosed in []. Its structure is ["Python", "javascript", "C++", ...]. As in most languages, elements are read by index. An element can be a number, a string, an array, or an object.
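As a small illustration of the two structures, the sketch below parses a made-up JSON document and reads one value by key and one by index. The field names `name` and `languages` are invented for this example. It is written for Python 3, although the same json calls exist in Python 2.7.

```python
import json

# Hypothetical sample document: an object holding a string and an array
text = '{"name": "demo", "languages": ["Python", "javascript", "C++"]}'

data = json.loads(text)

# A JSON object becomes a dict, so values are read by key
name = data["name"]

# A JSON array becomes a list, so values are read by index
first_language = data["languages"][0]

print(name, first_language)
```

Python dicts use data["name"] rather than the JavaScript-style data.name, but the idea is the same.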
import json
The json module provides four functions, dumps, dump, loads, and load, for converting between strings and Python data types.
1. json.loads()
json.loads() decodes a JSON-formatted string into a Python object, converting each JSON type to the corresponding Python type:
    # json_loads.py
    # -*- coding: utf-8 -*-
    import json

    strList = '[1, 2, 3, 4]'
    strDict = '{"city": "北京", "name": "大猫"}'

    print(json.loads(strList))
    # [1, 2, 3, 4]

    # JSON strings are stored as Unicode automatically
    print(json.loads(strDict))
    # {u'city': u'\u5317\u4eac', u'name': u'\u5927\u732b'}
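To make the JSON-to-Python type conversion concrete, here is a minimal sketch that parses one value of each JSON kind and records the Python type the decoder returned. The key names are arbitrary. It assumes Python 3, where JSON strings come back as str; in Python 2.7 they come back as unicode, as the output above shows.

```python
import json

# One JSON value of every kind, parsed in a single call
parsed = json.loads(
    '{"obj": {}, "arr": [], "s": "x", "i": 1, "f": 1.5, "t": true, "n": null}'
)

# Map each key to the name of the Python type the decoder produced
type_names = {key: type(value).__name__ for key, value in parsed.items()}

for key in sorted(type_names):
    print(key, "->", type_names[key])
```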
2. json.dumps()
json.dumps() encodes a Python object as a JSON string and returns a str object, converting each Python type to the corresponding JSON type:
    # json_dumps.py
    # -*- coding: utf-8 -*-
    import json
    import chardet

    listStr = [1, 2, 3, 4]
    tupleStr = (1, 2, 3, 4)
    dictStr = {"city": "北京", "name": "大刘"}

    print(json.dumps(listStr))
    # [1, 2, 3, 4]
    print(json.dumps(tupleStr))
    # [1, 2, 3, 4]

    # Note: json.dumps() escapes to ASCII by default when serializing.
    # Pass ensure_ascii=False to disable ASCII escaping and keep UTF-8.
    # chardet.detect() returns a dict in which confidence is the detection accuracy.
    print(json.dumps(dictStr))
    # {"city": "\u5317\u4eac", "name": "\u5927\u5218"}
    print(chardet.detect(json.dumps(dictStr)))

    print(json.dumps(dictStr, ensure_ascii=False))
    print(chardet.detect(json.dumps(dictStr, ensure_ascii=False)))
chardet is a very good encoding detection module and can be installed with pip.
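The effect of ensure_ascii can also be checked without chardet. The following sketch, which assumes Python 3 and uses an invented record, serializes the same dict both ways and shows that a tuple comes out as a JSON array.

```python
import json

# Hypothetical record mixing a non-ASCII string, a tuple, a bool and None
record = {"city": "北京", "tags": ("a", "b"), "ok": True, "missing": None}

# Default: every non-ASCII character is written as a \uXXXX escape
escaped = json.dumps(record)

# ensure_ascii=False keeps the original characters in the output string
readable = json.dumps(record, ensure_ascii=False)

print(escaped)
# The UTF-8 bytes that would end up in a file or an HTTP body
print(readable.encode("utf-8"))
```

Both strings decode back to the same data, so the choice only affects how the text looks and how large it is.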
3. json.dump()
json.dump() serializes a built-in Python object to JSON and writes the result to a file.
    # json_dump.py
    # -*- coding: utf-8 -*-
    import json

    listStr = [{"city": "北京"}, {"name": "大刘"}]
    json.dump(listStr, open("listStr.json", "w"), ensure_ascii=False)

    dictStr = {"city": "北京", "name": "大刘"}
    # The object and the file are two separate arguments
    json.dump(dictStr, open("dictStr.json", "w"), ensure_ascii=False)
4. json.load()
json.load() reads a JSON-formatted string from a file and converts it into the corresponding Python type.
    # -*- coding: utf-8 -*-
    import json

    strList = json.load(open("listStr.json"))
    print strList
    # [{u'city': u'\u5317\u4eac'}, {u'name': u'\u5927\u5218'}]

    strDict = json.load(open("dictStr.json"))
    print strDict
    # {u'city': u'\u5317\u4eac', u'name': u'\u5927\u5218'}
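The dump and load calls can be combined into a round trip. The sketch below is one possible Python 3 variant, not the article's original code: it writes to a temporary directory (the path is made up for the example), opens the file with an explicit UTF-8 encoding, and uses with blocks so the file is closed before it is read back.

```python
import json
import os
import tempfile

cities = [{"city": "北京"}, {"name": "大刘"}]

# Hypothetical location: a file inside a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "listStr.json")

# Leaving the "with" block closes the file, which flushes the data to disk
with open(path, "w", encoding="utf-8") as fp:
    json.dump(cities, fp, ensure_ascii=False)

with open(path, encoding="utf-8") as fp:
    restored = json.load(fp)

print(restored == cities)
```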
JsonPath
JsonPath is an information extraction library: a tool for pulling specified information out of a JSON document. Implementations are available in several languages, including JavaScript, Python, PHP, and Java.
JsonPath is to JSON what XPath is to XML.
Download: https://pypi.python.org/pypi/jsonpath
Installation: open the download link, download jsonpath, unzip it, and run python setup.py install
Official documentation: http://goessner.net/articles/JsonPath
JsonPath syntax compared with XPath:
JSON has a clear structure, high readability, and low complexity, which makes it very easy to match. The table below maps XPath usage to JsonPath.
XPath | JSONPath | Description |
---|---|---|
/ | $ | Root node |
. | @ | Current node |
/ | . or [] | Child node |
.. | n/a | Parent node; not supported by JsonPath |
// | .. | Recursive descent: select all matching nodes regardless of position |
* | * | Wildcard: matches all element nodes |
[] | [] | Subscript operator (supports simple operations such as selecting by array index or by content) |
\| | [,] | Union: select several alternatives within a subscript |
[] | ?() | Filter expression |
n/a | () | Script expression evaluation |
() | n/a | Grouping; not supported by JsonPath |
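To get a feel for what the recursive descent operator does, here is a small pure-Python sketch that mimics an expression such as $..name. It is only an illustration of the idea and is not how the jsonpath package is implemented; the helper name `find_all` and the sample data are invented for this example.

```python
def find_all(node, key):
    """Collect every value stored under `key`, at any depth of the document."""
    found = []
    if isinstance(node, dict):
        for k, value in node.items():
            if k == key:
                found.append(value)
            # Keep descending: the same key may appear further down
            found.extend(find_all(value, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_all(item, key))
    return found


# Made-up document with "name" keys at three different depths
doc = {
    "content": {"data": [{"name": "A", "cities": [{"name": "B"}, {"name": "C"}]}]},
    "name": "root",
}

names = find_all(doc, "name")
print(sorted(names))
```

The caller does not need to know where in the document a key lives, which is what makes `..` convenient for deeply nested API responses.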
Example:
Take the city JSON file from Lagou (拉勾网), http://www.lagou.com/lbs/getAllCitySearchLabels.json, as an example, and extract all the city names.
    # -*- coding: utf-8 -*-
    import urllib2
    import json
    import jsonpath

    url = "http://www.lagou.com/lbs/getAllCitySearchLabels.json"
    request = urllib2.Request(url)
    response = urllib2.urlopen(request)
    html = response.read()

    # Convert the JSON-formatted string into a Python object
    jsonobj = json.loads(html)

    # Starting from the root node, match every node called "name"
    citylist = jsonpath.jsonpath(jsonobj, '$..name')

    print citylist
    print type(citylist)

    # Serialize the result and save it as UTF-8
    fp = open('city.json', 'w')
    content = json.dumps(citylist, ensure_ascii=False)
    print content
    fp.write(content.encode('utf-8'))
    fp.close()
Notes:
json.loads() decodes a JSON-formatted string into a Python object. If json.loads() raises an error, check the encoding of the characters in the JSON string.
If the incoming string is not UTF-8 encoded, specify its character encoding with the encoding parameter. Consider:
    dataDict = json.loads(jsonStrGBK)
- If jsonStrGBK is a JSON string encoded in GBK rather than UTF-8, the call above raises an error. Change it to:
    dataDict = json.loads(jsonStrGBK, encoding="GBK")
- If the encoding of dataJsonStr is specified through encoding but the string also contains characters in other encodings, first convert dataJsonStr to Unicode, and then call json.loads() with the specified encoding:
    dataJsonStrUni = dataJsonStr.decode("GB2312")
    dataDict = json.loads(dataJsonStrUni, encoding="GB2312")
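The calls above are Python 2.7 style. In Python 3 the encoding keyword of json.loads() was removed in version 3.9, so the usual approach is to decode the bytes yourself and pass text to the parser. A minimal sketch, assuming Python 3.6 or later and a made-up GBK payload:

```python
import json

# Simulate a GBK-encoded JSON payload, e.g. bytes read from a file or socket
gbk_bytes = '{"city": "北京"}'.encode("gbk")

# Passing the raw bytes fails: json.loads() assumes UTF-8, UTF-16 or UTF-32
try:
    json.loads(gbk_bytes)
    direct_parse_failed = False
except UnicodeDecodeError:
    direct_parse_failed = True

# Decode to Unicode text first, then parse
data = json.loads(gbk_bytes.decode("gbk"))

print(direct_parse_failed, data == {"city": "北京"})
```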
String transcoding
This is where programmers suffer most: almost all garbled Chinese text is caused by encoding problems.
In fact, handling encodings well only requires remembering one thing:
Any encoding on any platform can be converted to and from Unicode.
To convert a UTF-8 string to GBK, first decode it from UTF-8 to Unicode and then encode it from Unicode to GBK. The reverse direction works the same way.
    # This is a UTF-8 encoded string
    utf8Str = "Hello Earth"

    # 1. Decode the UTF-8 encoded string into Unicode
    unicodeStr = utf8Str.decode("UTF-8")

    # 2. Encode the Unicode string as GBK
    gbkData = unicodeStr.encode("GBK")

    # 1. Decode the GBK encoded string back into Unicode
    unicodeStr = gbkData.decode("GBK")

    # 2. Encode the Unicode string as UTF-8
    utf8Str = unicodeStr.encode("UTF-8")
decode: converts a string in some other encoding into Unicode.
encode: converts a Unicode string into some other encoding.
In a word: UTF-8 is one storage encoding format for the Unicode character set.
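The same round trip can be written in Python 3, where the two roles are separate types: str holds Unicode text and bytes holds encoded data. The sketch below uses a two-character sample string chosen for this example and compares the byte lengths of the two encodings.

```python
text = "你好"  # str: Unicode text, 2 characters

# encode: Unicode text -> bytes in a specific encoding
utf8_bytes = text.encode("utf-8")

# UTF-8 -> GBK always goes through Unicode in the middle
gbk_bytes = utf8_bytes.decode("utf-8").encode("gbk")

# decode: bytes in a specific encoding -> Unicode text, then back to UTF-8
round_trip = gbk_bytes.decode("gbk").encode("utf-8")

print(len(text), len(utf8_bytes), len(gbk_bytes))
```

These two characters take three bytes each in UTF-8 and two bytes each in GBK, which is why bytes in one encoding cannot be read as if they were in the other.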
Reference Links: https://www.cnblogs.com/miqi1992/category/1105419.html