[2020-09-22] Packet capture of a drug information app

Disclaimer: This article is for study and research only, and it is forbidden to be used for illegal purposes. Otherwise, you will be at your own risk. If there is any infringement, please notify and delete it, thank you!

Project background:


Address: aHR0cDovL2FwcDEubm1wYS5nb3YuY24vZGF0YV9ubXBhL2ZhY2UzL2Jhc2UuanNwP3RhYmxlSWQ9MjUmdGFibGVOYW1lPVRBQkxFMjUmdGl0bGU9JUU1JTlCJUJEJUU0JUJBJUE3JUU4JThEJUFGJUU1JTkzJTgxJmJjSWQ9MTUyOTA0NzEzNzYxMjEzMjk2MzIyNzk1ODA2NjA0
  1. The data interface carries an encrypted parameter, 6SQk6G2z, and the details page carries another, c1SoYK0a. Once you know how these two parameters are generated, you can collect data from the whole site. However, when I recently debugged the front end of the page, I found that the data is fetched through POST requests to the interface, and nothing appears in clear text while the parameters are being encrypted. The variable and function names in the JS code are completely obfuscated and cannot be recovered. Maybe I am just not clever enough to find it. If anyone knows how, I hope you can tell me~

  2. Scraping the data with selenium raises several problems: the page crashes after crawling for a long time, proxy IPs are required, crawling is slow, and so on.

Problem Description:

Given all this, scraping the website is not a good approach, so we turn to its app instead and capture the links the app uses to request data. Prepare the following tools, and then we can start:
  1. Fiddler packet capture tool.
  2. MuMu emulator or a mobile phone (the blogger uses the emulator, which can be inconvenient to operate; a phone is recommended~).
  3. Install the app on the emulator or mobile phone: link .
  4. Can't capture the phone's packets? Fine~ here is a link: https://www.jianshu.com/p/724097741bdf .

Solution:


1. Once the tools are configured, you can start capturing packets. First clear the requests Fiddler has already captured, then tap the drugs column in the app.


2. Very good, the data has been captured. Here is the link. First look at the upper right of the area circled in the screenshot: there are several parameters worth introducing.
  1. tableId: drug type ID (that is, the unique ID of each column)
  2. pageIndex: the current page number
  3. pageSize: the number of records per page
Then format the JSON data you get, and you can see the title of each record.
# Link -- note that requests to this link need Android request headers~
http://mobile.nmpa.gov.cn/datasearch/QueryList?tableId=25&searchF=Quick%20SearchK&pageIndex=1&pageSize=15
[
    {
        "COUNT":162127,
        "CONTENT":"龟龙中风丸 (86901343001160 国药准字Z20020147 沈阳红药集团股份有限公司)",
        "ID":"109228"
    },
    {
        "COUNT":162127,
        "CONTENT":"龟龄集 (86902884000629 国药准字Z14020687 山西广誉远国药有限公司)",
        "ID":"73590"
    },
    {
        "COUNT":162127,
        "CONTENT":"龟黄补酒 (86901890000661;86901890000678;86901890000654 国药准字Z20026072 远大医药黄石飞云制药有限公司)",
        "ID":"102841"
    },
    {
        "COUNT":162127,
        "CONTENT":"龟鹿补肾胶囊 (86905098000638 国药准字Z20123109 广西华天宝药业有限公司)",
        "ID":"120532"
    },
    {
        "COUNT":162127,
        "CONTENT":"龟鹿补肾片 (86900427000075 国药准字Z20080217 广东心宝药业科技有限公司)",
        "ID":"41884"
    },
    {
        "COUNT":162127,
        "CONTENT":"龟鹿补肾片 (86903249000087 国药准字Z20090420 郑州福瑞堂制药有限公司)",
        "ID":"133086"
    },
    {
        "COUNT":162127,
        "CONTENT":"龟鹿补肾口服液 (86900291000263 国药准字Z44023432 广东华天宝药业集团有限公司(药品上市许可持有人))",
        "ID":"108891"
    },
    {
        "COUNT":162127,
        "CONTENT":"龟鹿补肾丸 (86900415000308 国药准字Z44020148 广州花城药业有限公司)",
        "ID":"142930"
    },
    {
        "COUNT":162127,
        "CONTENT":"龟鹿补肾丸 (86900291000294 国药准字Z44022779 广东华天宝药业集团有限公司)",
        "ID":"66724"
    },
    {
        "COUNT":162127,
        "CONTENT":"龟鹿补肾丸 (86900291000270 国药准字Z44022778 广东华天宝药业集团有限公司(药品上市许可持有人))",
        "ID":"114636"
    },
    {
        "COUNT":162127,
        "CONTENT":"龟鹿益肾胶囊 (86905004000028 国药准字B20020196 湖南康寿制药有限公司)",
        "ID":"155866"
    },
    {
        "COUNT":162127,
        "CONTENT":"龟鹿滋肾丸 (86900008000272 国药准字Z11020387 北京宝树堂科技药业有限公司)",
        "ID":"161886"
    },
    {
        "COUNT":162127,
        "CONTENT":"龟鹿滋肾丸 (86905156002598 国药准字Z45020433 广西梧州制药(集团)股份有限公司)",
        "ID":"158752"
    },
    {
        "COUNT":162127,
        "CONTENT":"龟鹿滋肾丸 (86905156002604 国药准字Z45020432 广西梧州制药(集团)股份有限公司)",
        "ID":"117237"
    },
    {
        "COUNT":162127,
        "CONTENT":"龟鹿滋肾丸 (86900256000413 国药准字Z44023076 国药集团冯了性(佛山)药业有限公司)",
        "ID":"73911"
    }
]
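
As a rough sketch of how the list interface could be paged through, the snippet below reuses the captured list URL as a template and only swaps pageIndex and pageSize. It assumes that COUNT is the total number of records (the article does not say so explicitly) and that CONTENT always has the shape "name (code approval-number manufacturer)", as in the sample. The helper names with_params, page_count and split_content are made up for illustration.

```python
import math
from urllib.parse import parse_qsl, quote, urlencode, urlsplit, urlunsplit

# List URL copied verbatim from the capture; only pageIndex/pageSize are changed.
CAPTURED_LIST_URL = (
    "http://mobile.nmpa.gov.cn/datasearch/QueryList"
    "?tableId=25&searchF=Quick%20SearchK&pageIndex=1&pageSize=15"
)


def with_params(url, **overrides):
    """Return url with the given query parameters replaced; the rest are kept."""
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    params.update({key: str(value) for key, value in overrides.items()})
    # quote_via=quote keeps spaces as %20, matching the captured URL.
    return urlunsplit(parts._replace(query=urlencode(params, quote_via=quote)))


def page_count(total, page_size):
    """Number of pages needed for `total` records (assumes COUNT is the total)."""
    return math.ceil(total / page_size)


def split_content(content):
    """Split 'name (code approval_no manufacturer)' into (name, [fields])."""
    name, _, rest = content.partition(" (")
    if rest.endswith(")"):
        rest = rest[:-1]  # drop only the closing parenthesis of the outer group
    return name, rest.split(" ", 2)


# Two entries taken from the sample response.
sample = [
    {"COUNT": 162127,
     "CONTENT": "龟龙中风丸 (86901343001160 国药准字Z20020147 沈阳红药集团股份有限公司)",
     "ID": "109228"},
    {"COUNT": 162127,
     "CONTENT": "龟龄集 (86902884000629 国药准字Z14020687 山西广誉远国药有限公司)",
     "ID": "73590"},
]

pages = page_count(sample[0]["COUNT"], 15)
second_page = with_params(CAPTURED_LIST_URL, pageIndex=2)
ids = [item["ID"] for item in sample]
```

The IDs collected this way are what the details interface expects as searchK.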

3. OK, next we capture the packets for the details page. Here, too, pay attention to the following parameters:
  1. tableId: drug type ID (that is, the unique ID of each column)
  2. searchK: the ID of a record obtained from the list page
# Link
http://mobile.nmpa.gov.cn/datasearch/QueryRecord?tableId=25&searchF=ID&searchK=109228
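
A minimal sketch of building the details URL from a list-page ID and requesting it with an Android-style header. It uses the standard library's urllib rather than requests so that it has no dependencies. The User-Agent string is an assumption (the article only says Android request headers are needed, not which ones), as is UTF-8 decoding of the response. detail_url and fetch_json are illustrative names.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

DETAIL_ENDPOINT = "http://mobile.nmpa.gov.cn/datasearch/QueryRecord"

# Assumption: a generic Android browser User-Agent; the app's real header is not shown.
ANDROID_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Linux; Android 10; Pixel 3) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/86.0.4240.99 Mobile Safari/537.36"
    )
}


def detail_url(record_id, table_id=25):
    """Build the details URL for one record ID taken from the list page."""
    query = urlencode({"tableId": table_id, "searchF": "ID", "searchK": record_id})
    return f"{DETAIL_ENDPOINT}?{query}"


def fetch_json(url, timeout=10):
    """GET a URL with the Android-style headers and parse the body as JSON."""
    request = Request(url, headers=ANDROID_HEADERS)
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))  # assumes UTF-8
```

For example, detail_url("109228") reproduces the captured link, and fetch_json(detail_url("109228")) would then perform the request, provided the interface still responds the way it did at capture time.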


4. Look at the returned data: it is the same as on the website. So you only need to construct the list-page request URLs to get the corresponding data! (I trust that you, being smart, can get the data in the other columns too~)
[
    {
        "NAME":"批准文号",
        "CONTENT":"国药准字Z20020147"
    },
    {
        "NAME":"产品名称",
        "CONTENT":"龟龙中风丸"
    },
    {
        "NAME":"英文名称",
        "CONTENT":""
    },
    {
        "NAME":"商品名",
        "CONTENT":""
    },
    {
        "NAME":"剂型",
        "CONTENT":"丸剂(水丸)"
    },
    {
        "NAME":"规格",
        "CONTENT":"每30丸重5g"
    },
    {
        "NAME":"上市许可持有人",
        "CONTENT":""
    },
    {
        "NAME":"生产单位",
        "CONTENT":"沈阳红药集团股份有限公司"
    },
    {
        "NAME":"生产地址",
        "CONTENT":"沈阳市大东区北大营西路2号"
    },
    {
        "NAME":"产品类别",
        "CONTENT":"中药"
    },
    {
        "NAME":"批准日期",
        "CONTENT":"2015-07-30"
    },
    {
        "NAME":"原批准文号",
        "CONTENT":""
    },
    {
        "NAME":"药品本位码",
        "CONTENT":"86901343001160"
    },
    {
        "NAME":"药品本位码备注",
        "CONTENT":""
    },
    {
        "NAME":"注",
        "CONTENT":"企业用户如对药品数据信息有疑问,请及时与我局信息中心数据整理组联系,来电前请备好相应的批件证明材料以备工作人员查询。电话:88331520(工作日);企业用户也可通过发邮件与我们联系:邮件地址[email protected],邮件主题请注明“药品批件问题”,邮件正文中请准确填写以下全部信息:1.药品批准文号/注册证号;2.药品批件号;3.药品批件类型(注册批件、补充批件、包材注册证、药品标准颁布件、再注册批件、其他);4.问题描述(500字以内);5.企业名称(全称);6.统一社会信用代码;7.联系人姓名;8.联系电话(手机和座机);9.电子邮件。以上内容请勿直接以电子邮件附件形式发送。",
        "DESCRIPTION":"企业用户如对药品数据信息有疑问,请及时与我局信息中心数据整理组联系,来电前请备好相应的批件证明材料以备工作人员查询。电话:88331520(工作日);企业用户也可通过发邮件与我们联系:邮件地址[email protected],邮件主题请注明“药品批件问题”,邮件正文中请准确填写以下全部信息:1.药品批准文号/注册证号;2.药品批件号;3.药品批件类型(注册批件、补充批件、包材注册证、药品标准颁布件、再注册批件、其他);4.问题描述(500字以内);5.企业名称(全称);6.统一社会信用代码;7.联系人姓名;8.联系电话(手机和座机);9.电子邮件。以上内容请勿直接以电子邮件附件形式发送。"
    }
]
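
Since the details response is a list of NAME/CONTENT pairs, one convenient way to work with it is to fold it into a single dictionary per record. The sketch below does this for a few fields trimmed from the response above. record_to_dict is an illustrative name, and it assumes NAME values do not repeat within one record.

```python
def record_to_dict(fields):
    """Fold a list of {"NAME": ..., "CONTENT": ...} items into one dict."""
    return {field["NAME"]: field["CONTENT"] for field in fields}


# A few fields trimmed from the details response.
detail_fields = [
    {"NAME": "批准文号", "CONTENT": "国药准字Z20020147"},
    {"NAME": "产品名称", "CONTENT": "龟龙中风丸"},
    {"NAME": "英文名称", "CONTENT": ""},
    {"NAME": "剂型", "CONTENT": "丸剂(水丸)"},
    {"NAME": "批准日期", "CONTENT": "2015-07-30"},
]

record = record_to_dict(detail_fields)
product_name = record["产品名称"]
```

Empty CONTENT values are kept as empty strings, so every record ends up with the same keys.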

Note: You cannot get the data by requesting directly with requests, because no cookies are sent. The main cookies are neCYtZEjo8GmS and neCYtZEjo8GmT. It is recommended to use selenium to obtain the cookie information, which expires after a few minutes.
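
One possible way to follow this suggestion, sketched under assumptions: open the page in selenium, read the cookies with driver.get_cookies(), and turn the two named cookies into a Cookie header for later requests. The use of Chrome, the page_url argument and the helper names are all illustrative. Selenium is imported lazily so that the pure helper can be used without it.

```python
COOKIE_NAMES = ("neCYtZEjo8GmS", "neCYtZEjo8GmT")


def cookie_header(selenium_cookies, wanted=COOKIE_NAMES):
    """Build a Cookie header value from selenium's list of cookie dicts."""
    pairs = [
        f"{cookie['name']}={cookie['value']}"
        for cookie in selenium_cookies
        if cookie["name"] in wanted
    ]
    return "; ".join(pairs)


def fresh_cookie_header(page_url):
    """Open page_url in a browser and return a Cookie header (needs selenium)."""
    from selenium import webdriver  # third-party; only needed when called

    driver = webdriver.Chrome()  # assumption: Chrome and its driver are installed
    try:
        driver.get(page_url)
        return cookie_header(driver.get_cookies())
    finally:
        driver.quit()
```

Because the cookies expire after a few minutes, fresh_cookie_header would need to be called again periodically during a long crawl.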


Origin blog.csdn.net/qq_26079939/article/details/108732969