Python crawler plus a homemade news website, so much fun! Netizens: you could play with it all day

We crawl and crawl, but is a word cloud really all we can do with the data we scrape?

Of course not! This time I'll use Flask to serve up a little side dish for everyone.

Flask is a lightweight Python web framework, simpler than most other web frameworks and well suited to beginners. Using Flask plus a crawler, I'll show you how to display your crawled data on a web page in real time.

Let me show you this ugly web page first ↓

(Be nice, don't laugh)

Demonstration of the three functions

The whole process is three simple steps:

  • Crawl the data
  • Generate a word cloud from the freshly crawled data
  • Recommend news based on hot topics

Crawler part:

This crawler uses multiple threads to crawl all the news on Sina News and Netease News.

There are 14 columns in total. Both sites load their page content through Ajax. After requesting a column's link, the returned string looks like this. Look carefully and you will see that the news content we want is wrapped in data_callback.

figure 2

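It is in list form.

At this point we can strip the data_callback( ... ) wrapper and turn the string into a list. eval would do it, but ast.literal_eval parses the same literal data without executing anything the server sends back.

import ast
import requests

def get_wy_teach():
    url = 'https://tech.163.com/special/00097UHL/tech_datalist.js?callback=data_callback'
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36'
    }

    res = requests.get(url=url, headers=headers)
    data = res.text
    # Keep only what sits between the first '(' and the last ')', so
    # parentheses inside the news text itself are left untouched
    payload = data[data.index('(') + 1:data.rindex(')')]
    data = ast.literal_eval(payload)
    return data  # hand the parsed list to the extraction loop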

The news content can then be extracted in a loop, and the last step is to store it in our MySQL database.
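A minimal sketch of that extract-and-store step. It assumes each item in the parsed list is a dict with `title` and `docurl` keys (an assumption about the feed, so check the real response), and it uses the standard-library `sqlite3` as a stand-in for MySQL so that it runs anywhere. The table name `news` and the helper `save_news` are made up for illustration.

```python
import sqlite3

def save_news(conn, items):
    """Store (title, url) pairs from parsed feed items; return how many rows were added."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS news (title TEXT, url TEXT PRIMARY KEY)"
    )
    before = conn.total_changes
    for item in items:
        # 'title' and 'docurl' are assumed field names, not confirmed by the article
        title = item.get("title")
        url = item.get("docurl")
        if not title or not url:
            continue  # skip malformed entries
        # INSERT OR IGNORE keeps repeat crawls from duplicating a story
        conn.execute(
            "INSERT OR IGNORE INTO news (title, url) VALUES (?, ?)", (title, url)
        )
    conn.commit()
    return conn.total_changes - before

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    sample = [
        {"title": "Story A", "docurl": "https://example.com/a"},
        {"title": "Story A", "docurl": "https://example.com/a"},  # duplicate
        {"title": "", "docurl": "https://example.com/b"},  # missing title
    ]
    print(save_news(conn, sample))
```

With a real MySQL driver the loop stays the same; only the connection and the placeholder style in the SQL would change.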

Once the crawlers for all 14 columns are built, we write a main file, main, and use simple multi-threading to start the 14 crawlers and crawl the 14 columns of news in parallel.

def multi_thread():
    t1 = threading.Thread(target=xzg)
    t2 = threading.Thread(target=xz)

    #......
    
    t13 = threading.Thread(target=wy_hua)
    t14 = threading.Thread(target=wy_chn)

    t1.start()
    t2.start()
    
    #......
    
    t13.start()
    t14.start()
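The same launch can also be written as a loop that waits for every thread to finish before returning. This is only a sketch: `run_all` and the two stand-in crawler functions are hypothetical names, not part of the original project.

```python
import threading

def run_all(crawlers):
    """Start one thread per crawler function and wait for all of them."""
    threads = [threading.Thread(target=fn, name=fn.__name__) for fn in crawlers]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # block until this crawler has finished

if __name__ == "__main__":
    results = []

    # Stand-ins for the 14 real column crawlers
    def crawl_a():
        results.append("a")

    def crawl_b():
        results.append("b")

    run_all([crawl_a, crawl_b])
    print(sorted(results))
```

Joining the threads means the caller knows the crawl is really done, instead of guessing with a fixed sleep.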

By the way, after crawling we still made a word cloud, hahaha

Click to generate today's hot-news word cloud, then wait a moment

Today's hot words
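A word cloud is drawn from a table of word frequencies. Here is a rough sketch of building that table from news titles with only the standard library. It assumes space-separated text; Chinese titles would need a word segmenter first. `hot_words` and the tiny stop-word list are illustrative, not the code behind the page above.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real one would be much longer
STOP_WORDS = {"the", "a", "of", "to", "in", "and", "for", "on"}

def hot_words(titles, top_n=10):
    """Return the top_n most frequent words across the titles as (word, count) pairs."""
    counts = Counter()
    for title in titles:
        # Lowercase, then pull out runs of letters and digits
        for word in re.findall(r"[a-z0-9]+", title.lower()):
            if word not in STOP_WORDS:
                counts[word] += 1
    return counts.most_common(top_n)

if __name__ == "__main__":
    titles = ["Python crawler news", "Python and Flask", "Flask news for Python"]
    print(hot_words(titles, top_n=3))
```

The resulting (word, count) pairs are the kind of input a word cloud renderer takes to size each word.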

Flask part:

The side dishes are prepped, so now let's cook the main course.

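import time

from flask import Flask, render_template, request
from flask_cors import CORS

import main  # the multi-threaded crawler launcher written above

# Create the app; __name__ is a predefined Python variable
app = Flask(__name__)

# Enable cross-origin requests (CORS) on every route
CORS(app, resources=r'/*')

# Page that starts the crawlers
@app.route('/test', methods=['GET'])
def mytest():
    main.multi_thread()
    time.sleep(10)
    return 'Crawl finished~'

if __name__ == '__main__':
    app.run(debug=True, port=5000)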

 

  • render_template renders our h5 (HTML) page
  • app = Flask(__name__) is required by Flask: the app has to be created with the module name before any route can be registered on it
  • CORS handles cross-origin requests, typically Ajax requests; CORS(app, resources=r'/*') allows cross-origin requests on every route in the app
  • @app.route('/test') sets /test as the access path for the mytest function. Example: http://49.233.23.230:5000/test
  • app.run(debug=True,port=5000) sets the listening port to 5000; debug=True turns on debug mode and should be changed to False in production.
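To make the @app.route bullet concrete, here is a toy model of what a route decorator does: it records which function answers which path. `TinyApp` is a made-up class for illustration only and is not how Flask is actually implemented.

```python
class TinyApp:
    """Toy stand-in for a routing table (not Flask's real implementation)."""

    def __init__(self):
        self.routes = {}  # maps a URL path to the function that handles it

    def route(self, path):
        # route() returns a decorator, which is why it is used as @app.route('/test')
        def decorator(func):
            self.routes[path] = func
            return func
        return decorator

    def handle(self, path):
        """Call the function registered for path, or report that none exists."""
        func = self.routes.get(path)
        if func is None:
            return "404 Not Found"
        return func()

app = TinyApp()

@app.route('/test')
def mytest():
    return 'Crawl finished~'

if __name__ == "__main__":
    print(app.handle('/test'))
    print(app.handle('/missing'))
```

Flask does far more (HTTP methods, URL variables, request context), but the path-to-function lookup is the part the decorator sets up.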

That completes a small Flask endpoint.

Now that the endpoint is written, let's make an h5 page. First create a simple HTML file (for example, a news recommendation page)

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Title</title>
</head>
<body>
<div align="" class="img">
    <h1>Today's News Recommendations</h1>
    <div class="img">
        <ul>
          <li> <a href="{{data[0][1]}}">{{data[0][0]}}</a></li>
          <li> <a href="{{data[1][1]}}">{{data[1][0]}}</a></li>
          <li> <a href="{{data[2][1]}}">{{data[2][0]}}</a></li>
          <li> <a href="{{data[3][1]}}">{{data[3][0]}}</a></li>
          <li> <a href="{{data[4][1]}}">{{data[4][0]}}</a></li>
          <li> <a href="{{data[5][1]}}">{{data[5][0]}}</a></li>
          <li> <a href="{{data[6][1]}}">{{data[6][0]}}</a></li>
          <li> <a href="{{data[7][1]}}">{{data[7][0]}}</a></li>
          <li> <a href="{{data[8][1]}}">{{data[8][0]}}</a></li>
          <li> <a href="{{data[9][1]}}">{{data[9][0]}}</a></li>
          <li> <a href="{{data[10][1]}}">{{data[10][0]}}</a></li>
          <li> <a href="{{data[11][1]}}">{{data[11][0]}}</a></li>
          <li> <a href="{{data[12][1]}}">{{data[12][0]}}</a></li>
          <li> <a href="{{data[13][1]}}">{{data[13][0]}}</a></li>
          <li> <a href="{{data[14][1]}}">{{data[14][0]}}</a></li>
          <li> <a href="{{data[15][1]}}">{{data[15][0]}}</a></li>
          <li> <a href="{{data[16][1]}}">{{data[16][0]}}</a></li>
          <li> <a href="{{data[17][1]}}">{{data[17][0]}}</a></li>
          <li> <a href="{{data[18][1]}}">{{data[18][0]}}</a></li>
          <li> <a href="{{data[19][1]}}">{{data[19][0]}}</a></li>

        </ul>
    </div>
    <div class="logo-img">

    </div>
</div>
</body>
</html>

We pass the data fetched from the database to the h5 template

# News recommendations
@app.route('/news')
def news_list():
    data = get_mysql()
    
    return render_template('index4.html', data=data)

I copied and pasted 20 li elements to make this easier for everyone to follow, since I set today's number of recommended news items to 20.

You can also pick the 20 news items from the database with whatever algorithm you like
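One possible algorithm, sketched under assumptions: rank each (title, url) row by how many of today's hot words appear in its title and keep the top 20. The function name `recommend` and the sample data are hypothetical; the row shape matches what the template reads as data[i][0] and data[i][1].

```python
def recommend(rows, hot_words, limit=20):
    """Rank (title, url) rows by how many hot words each title contains."""
    def score(row):
        title = row[0].lower()
        return sum(1 for word in hot_words if word.lower() in title)

    # sorted() is stable, so rows with equal scores keep their database order
    ranked = sorted(rows, key=score, reverse=True)
    return ranked[:limit]

if __name__ == "__main__":
    rows = [
        ("Weather today", "https://example.com/1"),
        ("Python Flask tips", "https://example.com/2"),
        ("Flask news", "https://example.com/3"),
    ]
    for title, url in recommend(rows, ["python", "flask"], limit=2):
        print(title, url)
```

The returned list can go straight into render_template as data, as long as it holds at least as many rows as the template indexes.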

Refresh news and view news

At this point a simple Flask website is complete. Easy, isn't it?

Flask is a small, flexible web framework that lets you decide which features and components to add, which makes it a good fit for small websites.

Conclusion: (if you want a beautiful website you still have to learn h5; don't learn from me)

Origin blog.csdn.net/m0_55479420/article/details/115026380