Calling the TianAPI (天行数据) API to fetch international news and save it to a database

First, register an account on the TianAPI website (https://www.tianapi.com/) and obtain an APIKEY.

Scroll down the page and click the international news entry:

Find the endpoint we need (http://api.tianapi.com/world/):

We also need to append the APIKEY as a query parameter to the endpoint, i.e. (http://api.tianapi.com/world/?key=APIKEY)
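As a small sketch, the query string can also be assembled in code rather than typed by hand. The helper name `build_world_url` is an assumption made for illustration, and "APIKEY" remains a placeholder; `urlencode` is from Python's standard library.

```python
from urllib.parse import urlencode

# Endpoint taken from the text above.
BASE_URL = "http://api.tianapi.com/world/"


def build_world_url(api_key: str) -> str:
    """Return the endpoint with the key appended as a query parameter."""
    # urlencode escapes any characters that are not URL-safe.
    return BASE_URL + "?" + urlencode({"key": api_key})


if __name__ == "__main__":
    # "APIKEY" is a placeholder; substitute the key from your own account.
    print(build_world_url("APIKEY"))
```

Building the URL this way avoids stray characters in the query string, such as a leading `&` before the first parameter.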

Open the address in a browser and you will see a string of JSON data, though it is not yet formatted:

{"code":200,"msg":"success","newslist":[{"ctime":"2019-06-25 06:35","title":"意大利米兰和科尔蒂纳丹佩佐赢得2026冬奥会举办权","description":"中华国际","picUrl":"https:\/\/img3.utuku.china.com\/200x112\/news\/20190625\/18aecd6e-858c-4029-ae1e-6e0c2ab5faf8.jpg","url":"https:\/\/news.china.com\/international\/1000\/20190625\/36468892.html"},{"ctime":"2019-06-25 05:58","title":"2架台风战机在德国上空相撞后坠毁 两名飞行员弹射","description":"中华国际","picUrl":"https:\/\/img2.utuku.china.com\/200x112\/news\/20190625\/4d734999-9769-4de4-a5a5-0e9996c0f468.png","url":"https:\/\/news.china.com\/international\/1000\/20190625\/36468636.html"},{"ctime":"2019-06-25 05:48","title":"章莹颖案被告绑架和谋杀罪名成立 将面临死刑或无期","description":"中华国际","picUrl":"https:\/\/img0.utuku.china.com\/200x112\/news\/20190625\/1e55246e-614a-4bb6-a128-8be803f0e84f.jpg","url":"https:\/\/news.china.com\/international\/1000\/20190625\/36468624.html"},{"ctime":"2019-06-24 17:51","title":"巴基斯坦一辆载有21人的旅游车坠入河谷 9人死亡","description":"中华国际","picUrl":"https:\/\/img1.utuku.china.com\/200x112\/news\/20190624\/26bdb4b7-8734-4c9a-b132-c780a8cce41f.jpg","url":"https:\/\/news.china.com\/international\/1000\/20190624\/36467610.html"},{"ctime":"2019-06-24 17:19","title":"美国禁运让中国下代超算遭致命打击?告诉你实情","description":"中华国际","picUrl":"https:\/\/img2.utuku.china.com\/200x112\/news\/20190624\/f744f006-6495-47b2-8435-69ca0512f36f.jpg","url":"https:\/\/news.china.com\/international\/1000\/20190624\/36466907.html"},{"ctime":"2019-06-24 17:00","title":"66人在纽约时报门口抗议被捕:要求媒体报道换词","description":"中华国际","picUrl":"https:\/\/img2.utuku.china.com\/200x112\/news\/20190624\/50f0b988-f344-4829-9e6b-e068bea466a4.jpg","url":"https:\/\/news.china.com\/international\/1000\/20190624\/36466787.html"},{"ctime":"2019-06-24 16:32","title":"沙特民用机场再遭空袭 胡塞武装:还有更多报复","description":"中华国际","picUrl":"https:\/\/img0.utuku.china.com\/200x112\/news\/20190624\/787f3d6b-876f-446c-8898-648f46b9acd8.jpg","url":"https:\/\/news.china.com\/international\/1000\/20190624\/36466522.html"},{"ctime":"2019-06-24 14:22","title":"美企将为152人举行太空葬礼:每克骨灰4995美元起","description":"中华国际","picUrl":"https:\/\/img0.utuku.china.com\/200x112\/news\/20190624\/8bbf1991-4d7b-484d-b49e-84a235ca805c.jpg","url":"https:\/\/news.china.com\/international\/1000\/20190624\/36465699.html"},{"ctime":"2019-06-24 13:22","title":"普京:一旦美国准备好了 俄罗斯乐于发展俄美关系","description":"中华国际","picUrl":"https:\/\/img3.utuku.china.com\/200x112\/news\/20190624\/75995b53-a271-4524-bf63-e6ee8ba3c728.jpg","url":"https:\/\/news.china.com\/international\/1000\/20190624\/36464857.html"},{"ctime":"2019-06-24 11:26","title":"伊朗外长称还有架美国无人机入侵 蓬佩奥:幼稚","description":"中华国际","picUrl":"https:\/\/img0.utuku.china.com\/200x112\/news\/20190624\/adb4343a-8ba6-49b8-b8fb-1aeb49c2c5bb.jpg","url":"https:\/\/news.china.com\/international\/1000\/20190624\/36464150.html"}]}

You can format it with a JSON formatting site such as bejson (http://www.bejson.com/):

{
	"code": 200,
	"msg": "success",
	"newslist": [{
		"ctime": "2019-06-25 06:35",
		"title": "意大利米兰和科尔蒂纳丹佩佐赢得2026冬奥会举办权",
		"description": "中华国际",
		"picUrl": "https:\/\/img3.utuku.china.com\/200x112\/news\/20190625\/18aecd6e-858c-4029-ae1e-6e0c2ab5faf8.jpg",
		"url": "https:\/\/news.china.com\/international\/1000\/20190625\/36468892.html"
	}, {
		"ctime": "2019-06-25 05:58",
		"title": "2架台风战机在德国上空相撞后坠毁 两名飞行员弹射",
		"description": "中华国际",
		"picUrl": "https:\/\/img2.utuku.china.com\/200x112\/news\/20190625\/4d734999-9769-4de4-a5a5-0e9996c0f468.png",
		"url": "https:\/\/news.china.com\/international\/1000\/20190625\/36468636.html"
	}, {
		"ctime": "2019-06-25 05:48",
		"title": "章莹颖案被告绑架和谋杀罪名成立 将面临死刑或无期",
		"description": "中华国际",
		"picUrl": "https:\/\/img0.utuku.china.com\/200x112\/news\/20190625\/1e55246e-614a-4bb6-a128-8be803f0e84f.jpg",
		"url": "https:\/\/news.china.com\/international\/1000\/20190625\/36468624.html"
	}, {
		"ctime": "2019-06-24 17:51",
		"title": "巴基斯坦一辆载有21人的旅游车坠入河谷 9人死亡",
		"description": "中华国际",
		"picUrl": "https:\/\/img1.utuku.china.com\/200x112\/news\/20190624\/26bdb4b7-8734-4c9a-b132-c780a8cce41f.jpg",
		"url": "https:\/\/news.china.com\/international\/1000\/20190624\/36467610.html"
	}, {
		"ctime": "2019-06-24 17:19",
		"title": "美国禁运让中国下代超算遭致命打击?告诉你实情",
		"description": "中华国际",
		"picUrl": "https:\/\/img2.utuku.china.com\/200x112\/news\/20190624\/f744f006-6495-47b2-8435-69ca0512f36f.jpg",
		"url": "https:\/\/news.china.com\/international\/1000\/20190624\/36466907.html"
	}, {
		"ctime": "2019-06-24 17:00",
		"title": "66人在纽约时报门口抗议被捕:要求媒体报道换词",
		"description": "中华国际",
		"picUrl": "https:\/\/img2.utuku.china.com\/200x112\/news\/20190624\/50f0b988-f344-4829-9e6b-e068bea466a4.jpg",
		"url": "https:\/\/news.china.com\/international\/1000\/20190624\/36466787.html"
	}, {
		"ctime": "2019-06-24 16:32",
		"title": "沙特民用机场再遭空袭 胡塞武装:还有更多报复",
		"description": "中华国际",
		"picUrl": "https:\/\/img0.utuku.china.com\/200x112\/news\/20190624\/787f3d6b-876f-446c-8898-648f46b9acd8.jpg",
		"url": "https:\/\/news.china.com\/international\/1000\/20190624\/36466522.html"
	}, {
		"ctime": "2019-06-24 14:22",
		"title": "美企将为152人举行太空葬礼:每克骨灰4995美元起",
		"description": "中华国际",
		"picUrl": "https:\/\/img0.utuku.china.com\/200x112\/news\/20190624\/8bbf1991-4d7b-484d-b49e-84a235ca805c.jpg",
		"url": "https:\/\/news.china.com\/international\/1000\/20190624\/36465699.html"
	}, {
		"ctime": "2019-06-24 13:22",
		"title": "普京:一旦美国准备好了 俄罗斯乐于发展俄美关系",
		"description": "中华国际",
		"picUrl": "https:\/\/img3.utuku.china.com\/200x112\/news\/20190624\/75995b53-a271-4524-bf63-e6ee8ba3c728.jpg",
		"url": "https:\/\/news.china.com\/international\/1000\/20190624\/36464857.html"
	}, {
		"ctime": "2019-06-24 11:26",
		"title": "伊朗外长称还有架美国无人机入侵 蓬佩奥:幼稚",
		"description": "中华国际",
		"picUrl": "https:\/\/img0.utuku.china.com\/200x112\/news\/20190624\/adb4343a-8ba6-49b8-b8fb-1aeb49c2c5bb.jpg",
		"url": "https:\/\/news.china.com\/international\/1000\/20190624\/36464150.html"
	}]
}

Now it is much easier to read.
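The same formatting can also be done locally. The following is a minimal sketch using Python's standard `json` module; the function name `pretty` and the shortened sample string are assumptions for illustration, not part of the original walkthrough.

```python
import json


def pretty(raw: str) -> str:
    """Re-indent a compact JSON string so it is easier to read."""
    data = json.loads(raw)
    # ensure_ascii=False keeps Chinese titles readable instead of \uXXXX escapes.
    return json.dumps(data, indent=4, ensure_ascii=False)


if __name__ == "__main__":
    # Shortened sample in the same shape as the response above.
    raw = '{"code":200,"msg":"success","newslist":[{"title":"example","url":"https:\\/\\/news.china.com\\/"}]}'
    print(pretty(raw))
```

Note that decoding turns the `\/` escapes into plain `/`, so the URLs in the output look slightly different from the raw response while meaning the same thing.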

Looking at this data, what we want is the title, picUrl and url of each entry in newslist.

We can first get the newslist array from the JSON, loop over it to pull out the data we need, and finally save it to a MySQL database.
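The extraction step on its own can be sketched as follows, without any network or database access. The function name `extract_rows` and the sample values are hypothetical, and treating `code` 200 as the only success value is an assumption based on the sample response above.

```python
from typing import Any, Dict, List, Tuple


def extract_rows(payload: Dict[str, Any]) -> List[Tuple[str, str, str]]:
    """Return (title, picUrl, url) tuples from a decoded API response."""
    # Assumption: code 200 means success, as in the sample response.
    if payload.get("code") != 200:
        raise ValueError("API error: {}".format(payload.get("msg")))
    return [(n["title"], n["picUrl"], n["url"]) for n in payload["newslist"]]


if __name__ == "__main__":
    # Made-up entry with the same keys as the real response.
    sample = {
        "code": 200,
        "msg": "success",
        "newslist": [{
            "ctime": "2019-06-25 06:35",
            "title": "example title",
            "description": "example source",
            "picUrl": "https://example.com/a.jpg",
            "url": "https://example.com/a.html",
        }],
    }
    for row in extract_rows(sample):
        print(row)
```

Returning plain tuples keeps the values in the same order as the table columns, which makes them easy to hand to a database insert later.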

Here is the complete code:

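import requests
import mysql.connector as mysql

# Connect to the local MySQL server
cnx = mysql.connect(user='root', password='123456', database='meinv')
# Get a cursor
cursor = cnx.cursor()

# Call the API and take the list of news entries from the JSON response
rs = requests.get("http://api.tianapi.com/world/?key=68c42f9baf78d084cfc3cc29f8c4e569")
jsonobj = rs.json()['newslist']
# Iterate over the list
for i in jsonobj:
    print(i['title'], i['picUrl'], i['url'])
    # Persist to MySQL; %s placeholders let the driver escape quotes in the values
    cursor.execute("insert into picture values (%s, %s, %s)",
                   (i['title'], i['picUrl'], i['url']))
cnx.commit()
# Release the cursor and the connection
cursor.close()
cnx.close()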






The CREATE TABLE statement for the table used above is:

CREATE TABLE `picture` (
  `title` varchar(255) DEFAULT NULL,
  `picUrl` varchar(255) DEFAULT NULL,
  `url` varchar(255) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8

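Note that the inserted values must correspond one to one with the table's columns, in the same order, because the insert statement does not name the columns.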
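One way to make that correspondence explicit is to name the columns in the insert statement. The sketch below is only an illustration: it uses Python's built-in `sqlite3` with an in-memory database as a stand-in so it runs without a MySQL server, and the function name `save_rows` and the sample row are assumptions. With `mysql.connector` the placeholders would be `%s` rather than `?`.

```python
import sqlite3


def save_rows(conn, rows):
    """Insert (title, picUrl, url) tuples, naming each target column."""
    # sqlite3 uses "?" placeholders; mysql.connector uses "%s" instead.
    conn.executemany(
        "INSERT INTO picture (title, picUrl, url) VALUES (?, ?, ?)", rows
    )
    conn.commit()


if __name__ == "__main__":
    # In-memory stand-in for the MySQL table defined above.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE picture (title TEXT, picUrl TEXT, url TEXT)")
    save_rows(conn, [("it's a title", "https://example.com/a.jpg",
                      "https://example.com/a.html")])
    print(conn.execute("SELECT title, url FROM picture").fetchall())
```

With the columns named, the statement keeps working even if the table later gains a column or the column order changes.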

Run the code; the console prints:

Check the database:

All done!


Reposted from blog.csdn.net/Javaxiaobaismc/article/details/93594214