~ Project to supplement

 

[TOC]

---

Review yesterday:

Incremental: monitoring a site, the site as long as there is data update, data updates crawling
to weight:
based on url
based on data refers
redis database: a collection of name-value Sadd
judge = sadd set of name-value: 1) If the value in the set already exist, Judge, indicates that the data has passed crawler
2). If the value is not set, Judge 1, the representative data is not climb

scrapy response data frame: acquiring its binary stream ---> response.body
response data module requests: acquiring its binary stream -> response.content

#### 1.Scrapy & Django project

`` Python `
# demand: Writing Project and reptiles Detailed Django project and, crawling to the data on the front page to show
` ``

`` Python `
# reptile write:
` ``

```Python
# spider编写:
import scrapy
from dl.items import DlItem
class PSpider(scrapy.Spider):
name = 'p'
# allowed_domains = ['www.baidu.com']
start_urls = ['https://www.kuaidaili.com/free/']

def parse(self, response):
# print(response)
tr_list = response.xpath('//*[@id="list"]/table/tbody/tr')
# print(tr_list)
for tr in tr_list:
ip = tr.xpath('./td[1]/text()').extract_first()
port = tr.xpath('./td[2]/text()').extract_first()
typ = tr.xpath('./td[3]/text()').extract_first()
protocal = tr.xpath('./td[4]/text()').extract_first()
position = tr.xpath('./td[5]/text()').extract_first()
# print(ip, port, protocal, position)
item = DlItem()
item['ip'] = ip
item['port'] = port
item['typ'] = typ
item['protocal'] = protocal
item['position'] = position
print(item)
yield item
```

```Python
# items编码
import scrapy
class DlItem(scrapy.Item):
ip = scrapy.Field()
port = scrapy.Field()
typ = scrapy.Field()
protocal = scrapy.Field()
position = scrapy.Field()
```

`` Python `
# Django project is created with all the configuration:
1.models creation:
from django.db Import Models

# Create your models here.

class Proxy(models.Model):
ip = models.CharField(max_length=50)
port = models.CharField(max_length=50)
typ = models.CharField(max_length=50)
protocal = models.CharField(max_length=50)
position = models.CharField(max_length=50)

2.在scrapy框架项目中嵌入django
import os
import sys
sys.path.append(os.path.dirname(os.path.abspath('.')))
os.environ['DJANGO_SETTINGS_MODULE'] = 'proxyscan.settings'
# 手动初始化Django:
import django
django.setup()

3. Modify the crawler Item:
Import Scrapy
from scrapy_djangoitem Import DjangoItem
from Proxy Import Models
class DlItem (DjangoItem):
django_model = models.Proxy

4.pipeline encoding:
class DlPipeline (Object):
DEF process_item (Self, Item, Spider):
Print ( 'open the database, data storage')
item.save ()
Print ( 'off database')
return item

5.Django project migration background configuration admin database and
the Python manage.py makemigrations
Python the migrate manage.py

from proxy.models import Proxy
admin.site.register(Proxy)

# Create a super user:
Python manage.py createsuperuser

```

```Python
# 路由:
from django.conf.urls import url
from django.contrib import admin
from proxy.views import index

urlpatterns = [
url(r'^admin/', admin.site.urls),
url(r'^index/', index),
]

# 视图函数:
from django.shortcuts import render
from proxy.models import Proxy
def index(requests):
p = Proxy.objects.all()
return render(requests, 'index.html', {"p":p})

```

```Python
# 前端代码:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Title</title>
<script src="https://cdn.bootcss.com/jquery/3.4.1/jquery.min.js"></script>
<link href="https://cdn.bootcss.com/twitter-bootstrap/4.3.1/css/bootstrap.min.css" rel="stylesheet">
</head>
<body>
<div class="container">
<div class="row" >
<div class="col-md-10 col-md-offset-2" style="margin:0 auto">
<div class="panel panel-primary">
<div class="panel-heading" style="margin-top:50px">
<h3 class="panel-title">代理IP一览表</h3>
</div>
<div class="panel-body">
<table class="table table-striped">
<thead>
<tr>
<th>IP</th>
<th>Port</th>
<th>Type</th>
<th>Protocal</th>
<th>Positon</th>
</tr>
</thead>
<tbody>
{% for i in p %}
<tr>
<th>{{ i.ip }}</th>
<td>{{ i.port }}</td>
<td>{{ i.typ }}</td>
<td>{{ i.protocal }}</td>
<td>{{ i.position }}</td>
</tr>
{% endfor %}
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>

</body>
</html>
```

 

Guess you like

Origin www.cnblogs.com/hanghaiwang/p/11531417.html