Step by Step: Using a Python Web Crawler to Scrape Reviews of Hotels Near Famous Universities

/1 Introduction/

  Summary: This article describes how to use Python to crawl the hotels near famous universities and their reviews, and analyzes the results to find out what the hotels near famous universities are like.

/2 Implementation/

  The implementation is divided into three main steps, as follows.

First, fetch information about the hotels near each university

  Because the hotel listings on Meituan's desktop site do not include reviews, I started from the mobile web page instead. Its address is: https://i.meituan.com/awp/h5/hotel/search/search.html

  After searching for hotels near Peking University and capturing the network traffic, I found the URL that returns the hotel information as JSON.

  In this URL, limit is the maximum number of hotels returned (testing showed that the largest accepted value is 50), and offset is the starting position of each batch of hotels. cityId identifies the city and can be found in the page information. The time parameters can be modified. sort sets the order of the returned hotels, where sort=distance sorts them by distance. Both q and keyword hold the university name.
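The query parameters described above can be sketched in Python as follows. This is a minimal sketch under assumptions: the real JSON endpoint was only visible in the author's packet capture, so SEARCH_ENDPOINT is a hypothetical placeholder, the time parameters are left out, and build_search_url and page_offsets are illustrative helper names, not part of any Meituan API.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: substitute the JSON URL you capture in the
# browser's network panel while searching on the mobile page.
SEARCH_ENDPOINT = "https://example.invalid/hotel/search"

MAX_LIMIT = 50  # largest page size the author found the search URL to accept


def build_search_url(city_id: int, university: str, offset: int = 0,
                     limit: int = MAX_LIMIT) -> str:
    """Build the URL for one page of hotels near a university, sorted by distance."""
    params = {
        "cityId": city_id,
        "offset": offset,                # index of the first hotel in this page
        "limit": min(limit, MAX_LIMIT),  # clamp: values above 50 are not honoured
        "sort": "distance",              # nearest hotels first
        "q": university,
        "keyword": university,
    }
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


def page_offsets(total: int, limit: int = MAX_LIMIT) -> list[int]:
    """Offsets needed to walk through `total` hotels, `limit` at a time."""
    return list(range(0, total, limit))
```

For example, page_offsets(120) gives the offsets 0, 50 and 100, i.e. three requests of at most 50 hotels each.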

  The returned data is shown below:

  The information includes the hotel name, location, rating, realPoiId (effectively the hotel's ID number, used later to crawl its reviews), the distance between the hotel and the university, and more.
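A small parser that turns one page of results into typed records might look like this. Only realPoiId is named in the article. The other JSON key names in KEYS (name, addr, avgScore, distance) are hypothetical and should be checked against the response you actually capture. The sketch also assumes that distance is a plain number of meters.

```python
from dataclasses import dataclass

# Hypothetical key names: only realPoiId is confirmed by the article.
KEYS = {"name": "name", "address": "addr", "rating": "avgScore",
        "poi_id": "realPoiId", "distance": "distance"}


@dataclass
class Hotel:
    name: str
    address: str
    rating: float
    poi_id: int      # realPoiId, used later to fetch the hotel's reviews
    distance: float  # meters between the hotel and the university


def parse_hotels(records: list[dict]) -> list[Hotel]:
    """Convert the raw hotel dicts of one search page into Hotel objects."""
    return [
        Hotel(
            name=r[KEYS["name"]],
            address=r[KEYS["address"]],
            rating=float(r[KEYS["rating"]]),
            poi_id=int(r[KEYS["poi_id"]]),
            distance=float(r[KEYS["distance"]]),
        )
        for r in records
    ]
```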

  Here we crawl the hotels near the top 10 universities in one ranking (don't read too much into the ranking; I picked one at random, just for practice):

(Image from the Internet)

  Part of the code is shown below:

  Here cityId and the university name are the control variables, and the returned hotels are limited to those within 2,000 meters. The output is:

  Let's look at how many hotels lie within 2,000 m of each of these 10 universities:

  Nanjing University has the most hotels nearby, with 453, while the Minhang campus of Shanghai Jiao Tong University has the fewest, with 75.
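The results are requested with sort=distance, so a crawler can stop paging as soon as it sees the first hotel beyond 2,000 m. It can then count the distinct hotels for each university. The sketch below assumes a hypothetical fetch_page(offset, limit) callable that wraps the search request and returns a list of dicts with realPoiId and a numeric distance in meters. Deduplicating by realPoiId is a precaution added here in case pages overlap. The article does not mention it.

```python
from typing import Callable, Iterator

MAX_DISTANCE_M = 2000  # keep only hotels within 2,000 m of the university
PAGE_SIZE = 50         # largest limit accepted by the search URL


def hotels_within(fetch_page: Callable[[int, int], list[dict]],
                  max_distance: float = MAX_DISTANCE_M,
                  page_size: int = PAGE_SIZE) -> Iterator[dict]:
    """Yield hotels no farther than max_distance, paging with offset/limit.

    fetch_page(offset, limit) is a hypothetical wrapper around the search
    request. It must return hotels sorted by distance (sort=distance).
    """
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        for hotel in page:
            if float(hotel["distance"]) > max_distance:
                return  # sorted by distance, so every later hotel is farther
            yield hotel
        if len(page) < page_size:
            return  # a short page means there is nothing left to fetch
        offset += page_size


def count_hotels(fetchers: dict[str, Callable[[int, int], list[dict]]]) -> dict[str, int]:
    """Count distinct hotels (by realPoiId) near each university."""
    return {
        university: len({h["realPoiId"] for h in hotels_within(fetch)})
        for university, fetch in fetchers.items()
    }
```

With counts = count_hotels(...), the expressions max(counts, key=counts.get) and min(counts, key=counts.get) give the universities with the most and the fewest nearby hotels.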

Second, fetch the reviews of each hotel

  This URL returns the number of reviews for each hotel, where poiId is the hotel's "ID number".

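  This URL returns all of a hotel's reviews. Its limit parameter sets how many reviews are returned, so you can pass in the review count from the previous URL directly and get every review back at once as JSON, which is very convenient. The result is shown below: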
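As a sketch of this two-request flow: the review count comes from the dedicated count URL, because the count in the step-one hotel list is inaccurate. That count is then used as limit. The review endpoints appear only in the author's screenshots, so get_count and get_reviews are hypothetical callables you would write around the captured URLs, and fetch_all_reviews is an illustrative name.

```python
from typing import Callable


def fetch_all_reviews(poi_id: int,
                      get_count: Callable[[int], int],
                      get_reviews: Callable[[int, int], list[dict]]) -> list[dict]:
    """Download every review of one hotel with a single request.

    get_count(poi_id) and get_reviews(poi_id, limit) are hypothetical wrappers
    around the review-count URL and the review URL, respectively.
    """
    # Take the count from the count URL, not from the step-one hotel list.
    total = get_count(poi_id)
    if total <= 0:
        return []  # no reviews, so skip the second request entirely
    # Setting limit to the full count returns all reviews in one JSON response.
    reviews = get_reviews(poi_id, total)
    if len(reviews) < total:
        # The count and the actual list disagree; worth logging for a re-crawl.
        print(f"poiId {poi_id}: expected {total} reviews, got {len(reviews)}")
    return reviews
```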

Third, pitfalls I ran into

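  1. At first, each request for reviews returned only 15. Later I found that limit can be set to the total number of reviews. However, the review count included in the hotel information from step one is inaccurate, so use the count from the method in step two instead;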

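  2. The assorted emoji and symbols in the reviews are another big pitfall. I spent a long time trying to remove them and still couldn't get rid of them all;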

  3. It's best to crawl through proxy IP addresses. Otherwise, with so many reviews to fetch, you will get blocked.
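For pitfall 2, one common approach is a whitelist rather than a blacklist. Instead of trying to enumerate every emoji and symbol, keep only the characters you want and drop the rest. This is a sketch of that swapped-in technique, not the author's code. The allowed set (CJK ideographs, ASCII letters and digits, whitespace, and common Chinese and English punctuation) is an assumption to tune for your data.

```python
import re

# Keep CJK ideographs, ASCII letters/digits, whitespace and a small set of
# Chinese and English punctuation; everything else (emoji, symbols) is dropped.
_DISALLOWED = re.compile(r"[^\u4e00-\u9fffA-Za-z0-9\s，。！？、；：“”‘’（）,.!?;:'\"()]")
_SPACES = re.compile(r"\s+")


def clean_review(text: str) -> str:
    """Strip disallowed characters, then collapse runs of whitespace."""
    text = _DISALLOWED.sub("", text)
    return _SPACES.sub(" ", text).strip()
```

For example, clean_review("房间很干净😀👍！！ 推荐~~") returns "房间很干净！！ 推荐".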
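For pitfall 3, requests can be spread across a pool of proxies, with a random pause between them. The sketch below uses only the standard library (urllib.request.ProxyHandler and build_opener). The proxy addresses are hypothetical placeholders from a documentation-only IP range, and RotatingFetcher is an illustrative name.

```python
import itertools
import random
import time
import urllib.request

# Hypothetical proxy pool: replace with proxies you are entitled to use.
PROXIES = ["http://192.0.2.10:8080", "http://192.0.2.11:8080"]


class RotatingFetcher:
    """Send each request through the next proxy in the pool, with a pause."""

    def __init__(self, proxies, min_delay=1.0, max_delay=3.0):
        self._cycle = itertools.cycle(proxies)  # round-robin over the pool
        self.min_delay = min_delay
        self.max_delay = max_delay

    def next_opener(self):
        """Return (proxy, opener) for the next proxy in the rotation."""
        proxy = next(self._cycle)
        handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
        return proxy, urllib.request.build_opener(handler)

    def get(self, url, timeout=10):
        """Fetch url through the next proxy after a random delay."""
        _, opener = self.next_opener()
        time.sleep(random.uniform(self.min_delay, self.max_delay))  # throttle
        with opener.open(url, timeout=timeout) as resp:
            return resp.read()
```

Usage would be fetcher = RotatingFetcher(PROXIES) followed by fetcher.get(url) for each review URL. Rotating proxies and adding delays lowers the request rate seen from any single IP, though it does not guarantee you will never be blocked.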

/3 Conclusion/

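  Using a Python web crawler, this article collected the number of hotels near universities and the number of reviews for each. If you want to crawl other information about other places, that works too, and the approach can be extended further.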

  

-END-

Source: blog.csdn.net/u013486414/article/details/104528713