Crawling Company Details from the Drug Administration Site

  • Requirement
    • Crawl the detail information of the companies listed on the Drug Administration site: http://125.35.6.84:81/xk/
  • Requirement analysis
    • Is the company data on the home page loaded dynamically?
      • Yes, the company information is loaded dynamically
    • Use the global search in the packet capture tool to locate the packet that carries the dynamically loaded data
    • Every company detail page URL has the same domain; only the value of the id request parameter differs
      • Combining a company's id value with the domain name yields the complete URL of that company's detail page
      • Is the data on the company detail page loaded dynamically?
        • Inspecting the traffic with the packet capture tool shows that the company detail data is also loaded dynamically
        • Again, the global search in the capture tool locates the packet that carries this data
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36'
}

# Step 1: get the ID of every company from the paginated list endpoint
list_url = 'http://125.35.6.84:81/xk/itownet/portalAction.do?method=getXkzsList'
# Step 2: this endpoint returns the detail record for a single company ID
detail_url = 'http://125.35.6.84:81/xk/itownet/portalAction.do?method=getXkzsById'

for page in range(1, 6):
    print('Crawling page {} ...'.format(page))
    # Form parameters copied from the captured list request
    data = {
        'on': 'true',
        'page': str(page),
        'pageSize': '15',
        'productName': '',
        'conditionType': '1',
        'applyname': '',
        'applysn': '',
    }
    company_data = requests.post(list_url, headers=headers, data=data).json()
    for dic in company_data['list']:
        _id = dic['ID']
        # POST the ID to fetch this company's detail data
        detail_data = requests.post(detail_url, headers=headers, data={'id': _id}).json()
        # Print the company name and its legal representative
        print(detail_data['epsName'], detail_data['legalPerson'])
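
The analysis notes that all detail page URLs share one base and differ only in the id parameter. As a minimal sketch of that idea, the helper below joins a base URL with an id. The path in DETAIL_PAGE_BASE is a placeholder, not the site's real detail page path, and the sample ids are made up; copy the real base from the browser's address bar.

```python
from urllib.parse import urlencode

# Assumption: placeholder path, NOT the site's real detail page.
# Replace it with the base URL shown in the browser's address bar.
DETAIL_PAGE_BASE = "http://125.35.6.84:81/xk/detail-page"


def build_detail_page_url(company_id: str, base: str = DETAIL_PAGE_BASE) -> str:
    """Join the shared base URL with a company's id as the only query parameter."""
    return "{}?{}".format(base, urlencode({"id": company_id}))


# Hypothetical id values, standing in for dic['ID'] from the list response
for company_id in ["abc123", "def456"]:
    print(build_detail_page_url(company_id))
```

Using urlencode rather than plain string concatenation keeps the URL valid even if an id contains characters that need escaping.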
  • How to detect whether a page contains dynamically loaded data?
    • Use a packet capture tool
      • Capture all packets generated when the site is requested
      • Find the packet that corresponds to the address-bar URL and run a local search for a piece of the page's content in its Response tab
        • Found: the data is not dynamically loaded
        • Not found: the data is dynamically loaded
      • How to find which packet carries the dynamically loaded data?
        • Use the global search
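
The two checks above are done by hand in the capture tool, but the logic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the example.com URLs, and the captured dictionary are all hypothetical stand-ins for what the capture tool shows.

```python
from typing import Dict, List


def is_dynamically_loaded(address_bar_response: str, visible_text: str) -> bool:
    """Local search: text that is visible in the browser but absent from the
    address-bar response must have been loaded dynamically."""
    return visible_text not in address_bar_response


def global_search(captured: Dict[str, str], visible_text: str) -> List[str]:
    """Global search: return the URLs of all captured responses whose body
    contains the text."""
    return [url for url, body in captured.items() if visible_text in body]


# Hypothetical capture: request URL -> response body
captured = {
    "http://example.com/": "<html><div id='list'></div></html>",
    "http://example.com/api/list": '{"list": [{"ID": "abc123", "EPS_NAME": "Example Co."}]}',
}

print(is_dynamically_loaded(captured["http://example.com/"], "Example Co."))  # True
print(global_search(captured, "Example Co."))  # ['http://example.com/api/list']
```

For a real page, the first argument could be the text of the response to the address-bar URL, for example the .text attribute of a requests response.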

Origin www.cnblogs.com/5kuishoua666/p/12021683.html