Web Scraper: China Trademark Office Website (中国商标网)

Category: Other. Posted 2019-03-12 18:01:24

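```python
import json
import random
import time

from lxml import etree
from win32com import client  # pywin32: Windows with Internet Explorer only

# Launch a new Internet Explorer instance through COM and make it visible.
driver = client.DispatchEx("InternetExplorer.Application")
driver.Visible = 1
driver.Navigate('http://sbj.saic.gov.cn/sbcx/')
time.sleep(random.randint(2, 8))  # random pauses make the traffic look less scripted

# Click the first child element of the 4th <p> on the landing page
# (presumably the link into the search system).
driver.Document.body.getElementsByTagName("p")[3].firstElementChild.click()
time.sleep(random.randint(8, 12))

# Click the 2nd <tbody> on the next page to open the query form.
driver.Document.body.getElementsByTagName("tbody")[1].click()
time.sleep(random.randint(10, 20))

# Type the trademark to look up into the name field (name="request:hnc").
for el in driver.Document.body.getElementsByTagName("input"):
    if el.name == 'request:hnc':
        el.value = '百度'  # "Baidu"

# Click the search button.
time.sleep(3)
for el in driver.Document.body.getElementsByTagName("input"):
    if el.id == '_searchButton':
        el.click()

# Wait for the results, then take the inner HTML of the 3rd <form>.
time.sleep(20)
form_str = driver.Document.body.getElementsByTagName("form")[2].innerHTML
print(form_str)


def first_text(node, path):
    """Return the first text node matched by an XPath, stripped, or '' if none."""
    texts = node.xpath(path)
    return texts[0].strip() if texts else ''


html = etree.HTML(form_str)
tr_list = html.xpath('//tr[@class="ng-repeat"]')

# Open the output file once so every result row is kept (one JSON object per line).
with open('item.txt', 'w', encoding='utf-8') as f:
    for tr in tr_list:
        item = {
            'registration_no': first_text(tr, './/td[2]/a/text()'),  # 注册号
            'intl_class': first_text(tr, './/td[3]/text()'),         # 国际分类
            'filing_date': first_text(tr, './/td[4]/text()'),        # 申请日期
            'trademark_name': first_text(tr, './/td[5]/a/text()'),   # 商标名称
            'applicant_name': first_text(tr, './/td[6]/a/text()'),   # 申请人名称
        }
        print(item)
        f.write(json.dumps(item, ensure_ascii=False) + '\n')
```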
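The script waits with fixed `time.sleep` calls (up to 20 s per step), which is wasteful on a fast connection and too short on a slow one. A minimal sketch of a polling alternative follows. `wait_for`, `page_loaded` and their parameters are hypothetical helpers, not part of pywin32; `page_loaded` assumes the `Busy` and `ReadyState` properties of the InternetExplorer COM object, where 4 is `READYSTATE_COMPLETE`. The helper treats any exception from the condition as "not ready yet", which is convenient for COM calls made mid-load but will also hide genuine bugs in the condition.

```python
import time


def wait_for(condition, timeout=30.0, poll=0.5):
    """Poll condition() until it returns a truthy value or `timeout` seconds pass.

    Returns that truthy value; raises TimeoutError once the deadline is reached.
    Any exception raised by condition() is treated as "not ready yet".
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except Exception:
            pass  # e.g. the DOM is not available yet; retry on the next poll
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


READYSTATE_COMPLETE = 4  # Internet Explorer's "document fully loaded" state


def page_loaded(browser):
    """True once an InternetExplorer.Application object has finished loading."""
    return not browser.Busy and browser.ReadyState == READYSTATE_COMPLETE
```

For example, `wait_for(lambda: page_loaded(driver))` could follow `driver.Navigate(...)` in place of the first random sleep. The `ng-repeat` rows suggest the result table is rendered client-side, so `ReadyState` may report complete before the rows exist; for the final step a content check (for instance, that the results `<form>` already contains at least one result row) is likely more reliable. If the random pauses are also meant to throttle requests, a short `time.sleep` can still be kept on top of the wait.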

Reposted from www.cnblogs.com/sea-stream/p/10518276.html