I. What is JSON?
Put simply, JSON is built from JavaScript objects and arrays. It has only two structures, the object and the array, and by nesting them it can represent a wide variety of complex data.
- Object: in JS, an object is the content enclosed in `{ }`, with the data structure `{ key: value, key: value, ... }`, i.e. a set of key-value pairs. In object-oriented languages the key is the object's attribute and the value is that attribute's value, so it is easy to understand: you get an attribute value with `object.key`. A value can be a number, a string, an array, or an object.
- Array: in JS, an array is the content enclosed in square brackets `[ ]`, with the data structure `["Python", "javascript", "C++", ...]`. As in every language, values are retrieved by index, and a value can be a number, a string, an array, or an object.
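As a quick illustration (a sketch in Python 3 syntax, whereas the examples below use Python 2), a JSON object is decoded into a Python `dict`, whose values are looked up by key with `doc["key"]` rather than JS-style `.key`, and a JSON array becomes a `list` indexed by position. The sample document and the variable names are made up for the example.

```python
import json

# A made-up JSON document that nests an array inside an object
text = '{"city": "Beijing", "languages": ["Python", "javascript", "C++"]}'

doc = json.loads(text)        # JSON object -> Python dict
city = doc["city"]            # object values are looked up by key
languages = doc["languages"]  # JSON array -> Python list
first = languages[0]          # array elements are looked up by index

print(city, first)
```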
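II. Basic functions of the json module
Python's json module provides four functions, `dumps`, `dump`, `loads` and `load`, for converting between JSON strings and Python data types. `dumps` and `loads` work on strings, while `dump` and `load` write to and read from files.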
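The split between the string pair and the file pair can be sketched as follows (Python 3 syntax; the sample `data` is made up, and `io.StringIO` stands in for a real file opened in text mode):

```python
import io
import json

data = {"name": "Big Cat", "tags": [1, 2, 3]}

# dumps/loads: Python object <-> JSON string
text = json.dumps(data)
restored = json.loads(text)

# dump/load: the same conversion, but writing to / reading from a file object
buffer = io.StringIO()
json.dump(data, buffer)
buffer.seek(0)  # rewind so load() reads from the beginning
restored_from_file = json.load(buffer)

print(text)
```

Both pairs produce the same JSON text; the file variants simply save you from calling `write()` or `read()` yourself.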
1. json.loads()
Decodes a JSON-formatted string into a Python object. The JSON-to-Python type conversion looks like this:
```python
# -*- coding: utf-8 -*-
# json_loads.py
import json

strList = '[1, 2, 3, 4]'
strDict = '{"city": "北京", "name": "大猫"}'

json.loads(strList)
# [1, 2, 3, 4]

json.loads(strDict)  # JSON strings are automatically decoded as unicode
# {u'city': u'\u5317\u4eac', u'name': u'\u5927\u732b'}
```
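For the remaining JSON types, here is a small Python 3 sketch that decodes one value of each kind. In Python 3, JSON strings become `str` rather than `unicode`; the other mappings (object → dict, array → list, integer → int, real → float, true/false → True/False, null → None) follow the json module documentation. The sample document is made up.

```python
import json

# One JSON value of each basic type
text = ('{"obj": {}, "arr": [], "str": "x", "int": 1, "real": 1.5, '
        '"t": true, "f": false, "nil": null}')
value = json.loads(text)

# Show which Python type each JSON value was decoded into
for key, item in value.items():
    print(key, type(item).__name__)
```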
2. json.dumps()
Encodes a Python object into a JSON-formatted string and returns it as a `str`. The Python-to-JSON type conversion looks like this:
```python
# -*- coding: utf-8 -*-
# json_dumps.py
import json
import chardet

listStr = [1, 2, 3, 4]
tupleStr = (1, 2, 3, 4)
dictStr = {"city": "北京", "name": "大刘"}

json.dumps(listStr)
# '[1, 2, 3, 4]'
json.dumps(tupleStr)
# '[1, 2, 3, 4]'

# Note: by default json.dumps() serializes using ASCII, escaping non-ASCII characters;
# pass ensure_ascii=False to disable this and get UTF-8 output.
# chardet.detect() returns a dict whose 'confidence' field is the detection accuracy.

json.dumps(dictStr)
# '{"city": "\\u5317\\u4eac", "name": "\\u5927\\u5218"}'

chardet.detect(json.dumps(dictStr))
# {'confidence': 1.0, 'encoding': 'ascii'}

print json.dumps(dictStr, ensure_ascii=False)
# {"city": "北京", "name": "大刘"}

chardet.detect(json.dumps(dictStr, ensure_ascii=False))
# {'confidence': 0.99, 'encoding': 'utf-8'}
```
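In Python 3, `json.dumps()` always returns a `str`, so the effect of `ensure_ascii` can be checked directly without chardet; you only get bytes when you encode explicitly. A minimal sketch with made-up data:

```python
import json

data = {"city": "北京"}

as_list = json.dumps((1, 2, 3, 4))               # tuples are serialized as JSON arrays
escaped = json.dumps(data)                       # default ensure_ascii=True: non-ASCII becomes \uXXXX escapes
readable = json.dumps(data, ensure_ascii=False)  # keeps the characters as they are
payload = readable.encode("utf-8")               # str -> UTF-8 bytes, e.g. for a binary file or socket

print(escaped)
print(readable)
```

Both strings decode back to the same object; `ensure_ascii=False` only changes how the text looks and how many bytes it takes.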
3. json.dump()
Serializes a built-in Python object to JSON and writes it to a file.
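```python
# -*- coding: utf-8 -*-
# json_dump.py
import json

listStr = [{"city": "北京"}, {"name": "大刘"}]
# "with" makes sure the file is flushed and closed after writing
with open("listStr.json", "w") as f:
    json.dump(listStr, f, ensure_ascii=False)

dictStr = {"city": "北京", "name": "大刘"}
with open("dictStr.json", "w") as f:
    json.dump(dictStr, f, ensure_ascii=False)
```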
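In Python 3, a text-mode file has its own encoding, so when writing with `ensure_ascii=False` it is safest to pass `encoding="utf-8"` to `open()` instead of relying on the platform default. A sketch, assuming a temporary directory is an acceptable place for the output file:

```python
import json
import os
import tempfile

dictStr = {"city": "北京", "name": "大刘"}

# Write into a fresh temporary directory so the sketch does not litter the working directory
path = os.path.join(tempfile.mkdtemp(), "dictStr.json")

# With ensure_ascii=False the Chinese characters are written as-is,
# so the file's encoding must be able to represent them
with open(path, "w", encoding="utf-8") as f:
    json.dump(dictStr, f, ensure_ascii=False)

# Read the raw text back to see what was written
with open(path, encoding="utf-8") as f:
    written = f.read()
print(written)
```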
4. json.load()
Reads a JSON-formatted string from a file and converts it into Python objects.
```python
# json_load.py
import json

with open("listStr.json") as f:
    strList = json.load(f)
print strList
# [{u'city': u'\u5317\u4eac'}, {u'name': u'\u5927\u5218'}]

with open("dictStr.json") as f:
    strDict = json.load(f)
print strDict
# {u'city': u'\u5317\u4eac', u'name': u'\u5927\u5218'}
```
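`json.load()` accepts any object with a `read()` method, and it rejects text that is not valid JSON. The Python 3 sketch below uses `io.StringIO` in place of a real file and shows that single-quoted keys (valid in a Python literal, not in JSON) raise `json.JSONDecodeError`, a subclass of `ValueError` available since Python 3.5.

```python
import io
import json

# An in-memory text stream stands in for open("listStr.json", encoding="utf-8")
source = io.StringIO('[{"city": "北京"}, {"name": "大刘"}]')
strList = json.load(source)

# JSON requires double quotes, so this input is rejected
try:
    json.load(io.StringIO("{'city': 'Beijing'}"))
except json.JSONDecodeError as err:
    error = err
else:
    error = None

print(strList)
print(error)
```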
III. JsonPath
1. JsonPath syntax (compared with XPath)
XPath | JSONPath | Description |
---|---|---|
`/` | `$` | Root node |
`.` | `@` | Current node |
`/` | `.` or `[]` | Child node |
`..` | n/a | Parent node; not supported by JsonPath |
`//` | `..` | Recursive descent: selects all matching nodes regardless of their location |
`*` | `*` | Wildcard: matches all element nodes |
`@` | n/a | Attribute access; not supported in JSON, because JSON is a recursive key-value structure with no attributes |
`[]` | `[]` | Subscript operator (simple iteration can be done inside it, such as array indexing or selecting values by content) |
`\|` | `[,]` | Union: lets the subscript make multiple selections |
`[]` | `?()` | Filter expression |
n/a | `()` | Script expression evaluation |
`()` | n/a | Grouping; not supported by JsonPath |
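To make the `..` (recursive descent) and `?()` (filter) rows concrete without depending on a JsonPath library, here is a plain-Python approximation in Python 3 syntax. The helper `descendants` and the sample `doc` are hypothetical, and real JsonPath implementations may differ in result ordering and edge cases.

```python
def descendants(node, key):
    """Collect every value stored under `key` at any depth (like `$..key`)."""
    found = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                found.append(v)
            found.extend(descendants(v, key))  # keep searching below this node
    elif isinstance(node, list):
        for item in node:
            found.extend(descendants(item, key))
    return found


# Hypothetical sample document
doc = {
    "store": {
        "name": "Corner Books",
        "book": [
            {"name": "Book A", "price": 8.95},
            {"name": "Book B", "price": 12.99},
        ],
    }
}

names = descendants(doc, "name")                              # roughly $..name
cheap = [b for b in doc["store"]["book"] if b["price"] < 10]  # roughly $.store.book[?(@.price < 10)]

print(names)
print(cheap)
```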
2. Examples
```python
# jsonpath_lagou.py
import urllib2
import json
import jsonpath

url = 'http://www.lagou.com/lbs/getAllCitySearchLabels.json'
request = urllib2.Request(url)
response = urllib2.urlopen(request)
html = response.read()

# Convert the JSON-formatted string into a Python object
jsonobj = json.loads(html)

# Starting from the root node, match every "name" node
citylist = jsonpath.jsonpath(jsonobj, '$..name')

print citylist
print type(citylist)

content = json.dumps(citylist, ensure_ascii=False)
print content
with open('city.json', 'w') as fp:
    fp.write(content.encode('utf-8'))
```
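Notes:
## String encoding conversion
This is where Chinese programmers suffer the most: garbled text and similar problems are almost always caused by Chinese characters. In fact, encoding problems are easy to handle as long as you remember one thing:
#### Any encoding on any platform can be converted to and from Unicode
To convert between UTF-8 and GBK, first convert UTF-8 to Unicode, then convert Unicode to GBK, and vice versa.
```python
# -*- coding: utf-8 -*-
# This is a UTF-8 encoded string
utf8Str = "你好地球"

# 1. Convert the UTF-8 encoded string to Unicode
unicodeStr = utf8Str.decode("UTF-8")

# 2. Convert the Unicode string to GBK
gbkData = unicodeStr.encode("GBK")

# 3. Convert the GBK encoded string back to Unicode
unicodeStr = gbkData.decode("gbk")

# 4. Convert the Unicode string to UTF-8
utf8Str = unicodeStr.encode("UTF-8")
```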
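The same rule carries over to Python 3, where `str` is already Unicode text and encodings only apply to `bytes`: you `encode()` text into bytes and `decode()` bytes back into text. A minimal sketch of the UTF-8 to GBK round trip above:

```python
text = "你好地球"  # Python 3 str: Unicode text

utf8_bytes = text.encode("utf-8")                     # Unicode -> UTF-8 bytes
gbk_bytes = utf8_bytes.decode("utf-8").encode("gbk")  # UTF-8 -> Unicode -> GBK
back_to_text = gbk_bytes.decode("gbk")                # GBK -> Unicode
back_to_utf8 = back_to_text.encode("utf-8")           # Unicode -> UTF-8

print(len(utf8_bytes), len(gbk_bytes))
```

Each of these common Chinese characters takes three bytes in UTF-8 and two in GBK, which is why decoding bytes with the wrong codec produces garbled text.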