Data cleaning (add provinces)

Use Python to match place names to provinces.

Design idea: use the Baidu API to convert each place name to latitude and longitude, then use the latitude and longitude to look up the province.

 

1. Read the name of the place from the text

# Extract the place names
import codecs

list_diqu = []  # collected place names (the original post does not show where its list is defined)

def diqu():
    f = codecs.open('kjcg.txt', mode='r', encoding='utf-8')  # open the txt file and read it as UTF-8
    line = f.readline()  # read the file line by line
    while line:
        a = line.split()
        b = a[0:1]  # select the field to read (the first one); empty for a blank line
        list_diqu.extend(b)  # add it to the list
        line = f.readline()
    f.close()
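
The reading step can also be written with a context manager, so the file is closed even if an error occurs partway through. This is only a minimal sketch, not the original code: the function name read_place_names is hypothetical, and it assumes, as the code above does, that the place name is the first whitespace-separated field on each line.

```python
def read_place_names(path, encoding="utf-8"):
    """Return the first whitespace-separated field of every non-empty line."""
    names = []
    with open(path, mode="r", encoding=encoding) as f:
        for line in f:
            fields = line.split()
            if fields:  # skip blank lines
                names.append(fields[0])
    return names
```

It would be called as read_place_names('kjcg.txt') and returns a flat list of strings, which can be passed to the geocoding step one name at a time.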

 

2. Call the Baidu API to get the latitude and longitude of each place name

# Extract latitude and longitude
import requests

list_lag = []  # latitudes
list_lng = []  # longitudes

def geocodeB(address):
    """
    @address: place name string
    @return: latitude, longitude
    """
    base_url = "http://api.map.baidu.com/geocoder?address={address}&output=json&key=your key".format(address=address)

    response = requests.get(base_url)
    answer = response.json()
    latitude = answer['result']['location']['lat']   # latitude
    longitude = answer['result']['location']['lng']  # longitude

    list_lag.append(latitude)
    list_lng.append(longitude)
    return latitude, longitude
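
Place names often contain non-ASCII characters and spaces, so it may be safer to percent-encode the query string instead of formatting it by hand, and to check that the response actually contains a location before indexing into it. The sketch below illustrates both ideas without sending a request. The helper names build_geocode_url and extract_lat_lng are hypothetical, and the sample response is invented, shaped only after the fields the code above reads ('result', 'location', 'lat', 'lng').

```python
from urllib.parse import urlencode

BASE_URL = "http://api.map.baidu.com/geocoder"  # endpoint used in the code above


def build_geocode_url(address, key):
    """Build the request URL with the query string percent-encoded."""
    query = urlencode({"address": address, "output": "json", "key": key})
    return BASE_URL + "?" + query


def extract_lat_lng(answer):
    """Return (latitude, longitude) from a parsed response, or None if absent."""
    result = answer.get("result")
    if not isinstance(result, dict):
        return None
    location = result.get("location")
    if not isinstance(location, dict):
        return None
    if "lat" not in location or "lng" not in location:
        return None
    return location["lat"], location["lng"]


# Hypothetical response, for illustration only (approximate values).
sample = {"result": {"location": {"lng": 117.2, "lat": 39.1}}}
print(extract_lat_lng(sample))
```

Returning None for an incomplete response lets the caller log the failed place name and continue, instead of stopping the whole run with a KeyError.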

 

3. Use the Baidu API again to convert latitude and longitude to provinces

# Extract the province, city, and district
# Use the Baidu Maps API to resolve location information from latitude and longitude

def getLocation(lat, lng):

    # str() allows lat and lng to be passed as numbers as well as strings
    url = requests.get('http://api.map.baidu.com/geocoder?location=' + str(lat) + ',' + str(lng) + '&output=json&key=your key')
    result = url.json()
    # print(result)
    city = result['result']['addressComponent']['city']
    province = result['result']['addressComponent']['province']
    district = result['result']['addressComponent']['district']
    # print(city, province, district)
    list_all = ['\n' + city + ',' + province + ',' + district]

    print(list_all)
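
If a response is missing one of these fields, the chained indexing above raises an exception. A more forgiving way to pull out the three values might look like the sketch below. The function name parse_address is hypothetical and the sample dictionary is invented for illustration; it only mirrors the keys that the code above reads.

```python
def parse_address(result):
    """Return (province, city, district); missing fields become empty strings."""
    inner = result.get("result")
    component = inner.get("addressComponent") if isinstance(inner, dict) else None
    if not isinstance(component, dict):
        component = {}
    return (
        component.get("province", ""),
        component.get("city", ""),
        component.get("district", ""),
    )


# Hypothetical parsed response, for illustration only.
sample_result = {
    "result": {
        "addressComponent": {
            "province": "Tianjin",
            "city": "Tianjin",
            "district": "Hexi District",
        }
    }
}
print(parse_address(sample_result))
```

Empty strings keep every output row at three columns, which makes incomplete rows easy to spot afterwards.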

 

4. Store the data (convert the list to a string and save it to a text file)

    # (continues inside getLocation, where list_all is defined)
    res_all = ','.join(list_all)
    with open('scientific and technological achievements Territory.txt', 'a+', encoding="UTF-8") as f:
        f.write(res_all)
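
Joining the fields by hand works as long as no field contains a comma. As an alternative, the standard csv module handles quoting automatically. The following is a sketch only, not the original storage code: the function name save_rows and the output file name are hypothetical, and the column order (city, province, district) follows the order used in the code above.

```python
import csv


def save_rows(path, rows):
    """Append (city, province, district) rows to a CSV file."""
    # newline="" lets the csv module control line endings itself
    with open(path, mode="a", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerows(rows)
```

For example, save_rows('regions.csv', [('Tianjin', 'Tianjin', 'Hexi District')]) appends one line to the file, and repeated calls keep adding rows without overwriting earlier ones.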

 

Results (data to be converted ------ converted data). For reasons I have not figured out, some of the data that comes out of the two-step API conversion does not match the input.

For example, Tianjin University of Science and Technology is located at 1038 Dagu South Road, Hexi District, Tianjin,

but the result obtained through the Baidu API is Beijing.
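
One way to start narrowing down such mismatches is to inspect the intermediate coordinates before making the second API call. The sketch below checks whether a pair lies inside a loose bounding box around China and whether it only fits with the two values exchanged. It is a debugging aid under stated assumptions, not an explanation of the Beijing result: the function names are hypothetical, and the box limits (latitude 3 to 54, longitude 73 to 136) are a rough approximation chosen for illustration.

```python
def plausible_china_coord(lat, lng):
    """Rough check that (lat, lng) falls in a loose bounding box around China."""
    return 3.0 <= lat <= 54.0 and 73.0 <= lng <= 136.0


def check_pair(lat, lng):
    """Classify a coordinate pair as 'ok', 'swapped', or 'suspect'."""
    if plausible_china_coord(lat, lng):
        return "ok"
    if plausible_china_coord(lng, lat):
        return "swapped"
    return "suspect"


# Approximate coordinates for Tianjin, used only as an illustration.
print(check_pair(39.1, 117.2))
print(check_pair(117.2, 39.1))
```

Pairs reported as 'swapped' or 'suspect' can be written out together with the place name, so that the first API response for those names can be examined by hand.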

 

 

 


Origin www.cnblogs.com/birdmmxx/p/12483789.html