Python Crawler (5) Deployment on Raspberry Pi


Check the Python version
>python -V
Python 2.7.13

Install pip on the Raspberry Pi

>sudo apt-get install python-pip
>pip -V
pip 9.0.1 from /usr/lib/python2.7/dist-packages (python 2.7)

This worked before, but today pip -V hangs.

Reinstall pip from the official bootstrap script instead:
>curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
>python get-pip.py

>pip -V
pip 9.0.1 from /usr/local/lib/python2.7/dist-packages (python 2.7)

Install the Scrapy environment
>sudo pip install scrapy

Exception
No package 'libffi' found
Could not find function xmlCheckVersion in library libxml2. Is libxml2 installed?

Solution:
>sudo apt-get install libxml2-dev libxslt1-dev
>sudo pip install lxml
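To confirm that lxml actually built against libxml2, a quick sanity check can help (a sketch; lxml exposes the compiled libxml2 version as a tuple):

```python
# Sanity check: can we import lxml, and which libxml2 was it built against?
try:
    from lxml import etree
    libxml2_version = etree.LIBXML_VERSION  # tuple such as (2, 9, 4)
    print("lxml OK, libxml2 version:", libxml2_version)
except ImportError as exc:
    libxml2_version = None
    print("lxml not importable:", exc)
```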

Exceptions:
Could not import setuptools which is required to install from a source distribution.
Please install setuptools.
src/lxml/etree.c:91:20: fatal error: Python.h: No such file or directory
Running setup.py install for cffi ... error
Running setup.py install for cryptography ... error

Solution:
>sudo apt-get install python-dev
>sudo pip install -U setuptools
>sudo apt-get install python-cffi
>sudo apt-get install gcc libffi-dev libssl-dev python-dev
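The "Python.h: No such file or directory" error means the C extension build could not find the interpreter's development headers, which python-dev provides. This small stdlib sketch shows where those headers are expected to live:

```python
import sysconfig

# C extensions such as lxml, cffi and cryptography compile against
# Python.h, which lives in the interpreter's include directory.
include_dir = sysconfig.get_paths()["include"]
print("Python.h should be under:", include_dir)
```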

Compiling and installing from source still does not work, so install the pre-built Debian packages instead:
>sudo apt-get install python-cryptography
>sudo apt-get install python-crypto
>sudo apt-get install -y python-lxml
>sudo pip install scrapy

That still does not work on the Raspberry Pi 1 and 2, so on those I only install scrapyd:
>sudo pip install scrapyd
>scrapyd --version
twistd (the Twisted daemon) 17.5.0
Copyright (c) 2001-2016 Twisted Matrix Laboratories.
See LICENSE for details.

Even on my Raspberry Pi 1, I hit issues when running this command:
>scrapy shell 'http://quotes.toscrape.com/page/1'

Exceptions:
'module' object has no attribute 'OP_NO_TLSv1_1'

Solution:
https://github.com/scrapy/scrapy/issues/2473
>sudo pip install --upgrade scrapy
>sudo pip install --upgrade twisted
>sudo pip install --upgrade pyopenssl

Install the client tools
>sudo pip install scrapyd-client

Install the deploy tool
The scrapyd-deploy command ships with scrapyd-client installed above, so no separate package is needed.

Install selenium Support
>sudo pip install selenium

Start the Server
>scrapyd

It seems to be a bind-address issue: I can reach port 6800 on the server itself via localhost:6800, but it is not reachable from remote machines. Add a config file under /opt/scrapyd:
cat scrapyd.conf
[scrapyd]
eggs_dir    = eggs
logs_dir    = logs
items_dir   =
jobs_to_keep = 100
dbs_dir     = dbs
max_proc    = 0
max_proc_per_cpu = 20
finished_to_keep = 100
poll_interval = 5.0
bind_address = 0.0.0.0
http_port   = 6800
debug       = off
runner      = scrapyd.runner
application = scrapyd.app.application
launcher    = scrapyd.launcher.Launcher
webroot     = scrapyd.website.Root

[services]
schedule.json     = scrapyd.webservice.Schedule
cancel.json       = scrapyd.webservice.Cancel
addversion.json   = scrapyd.webservice.AddVersion
listprojects.json = scrapyd.webservice.ListProjects
listversions.json = scrapyd.webservice.ListVersions
listspiders.json  = scrapyd.webservice.ListSpiders
delproject.json   = scrapyd.webservice.DeleteProject
delversion.json   = scrapyd.webservice.DeleteVersion
listjobs.json     = scrapyd.webservice.ListJobs
daemonstatus.json = scrapyd.webservice.DaemonStatus

>nohup scrapyd &
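With the daemon running, jobs are scheduled through the JSON endpoints listed in the [services] section above. A stdlib-only sketch of building such a request; the project and spider names here are hypothetical placeholders:

```python
# Build a schedule.json request body for the running scrapyd daemon.
# "quotes" / "quotes_spider" are placeholder names for illustration.
try:
    from urllib.parse import urlencode   # Python 3
except ImportError:
    from urllib import urlencode         # Python 2

body = urlencode({"project": "quotes", "spider": "quotes_spider"})
url = "http://localhost:6800/schedule.json"
print("POST", url, "with body:", body)
# To actually submit, POST the body to the daemon,
# e.g. urllib.request.urlopen(url, body.encode())
```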

Command to install all dependencies
>pip install -r requirements.txt
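A requirements.txt covering the packages used in this post might look like the following (unpinned; any version pins would be an assumption on my part):

```
scrapy
scrapyd
scrapyd-client
selenium
```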

The Resin base images are listed here:
https://docs.resin.io/runtime/resin-base-images/?ref=dockerhub

I also dockerize the application.
The start.sh script simply starts the service:
#!/bin/sh -ex


#start the service
cd /tool/scrapyd/
scrapyd

The Makefile opens port 6800:
IMAGE=sillycat/public
TAG=raspberrypi-scrapyd
NAME=raspberrypi-scrapyd


docker-context:

build: docker-context
	docker build -t $(IMAGE):$(TAG) .

run:
	docker run -d -p 6800:6800 --name $(NAME) $(IMAGE):$(TAG)

debug:
	docker run -ti -p 6800:6800 --name $(NAME) $(IMAGE):$(TAG) /bin/bash

clean:
	docker stop $(NAME)
	docker rm $(NAME)

logs:
	docker logs $(NAME)

publish:
	docker push $(IMAGE):$(TAG)

fetch:
	docker pull $(IMAGE):$(TAG)

The Dockerfile has all the installation steps
#Set up Scrapyd in Docker

#Prepare the OS
FROM resin/raspberrypi3-python
MAINTAINER Carl Luo <[email protected]>

ENV DEBIAN_FRONTEND noninteractive
RUN apt-get -y update
RUN apt-get -y dist-upgrade

#install the software
RUN pip install scrapyd

#copy the config
RUN mkdir -p /tool/scrapyd/
ADD conf/scrapyd.conf /tool/scrapyd/

#set up the app
EXPOSE  6800
RUN     mkdir -p /app/
ADD     start.sh /app/
WORKDIR /app/
CMD [ "./start.sh" ]

conf/scrapyd.conf will have the configurations
(identical to the scrapyd.conf shown above)

References:
Scrapy series:
http://sillycat.iteye.com/blog/2391523
http://sillycat.iteye.com/blog/2391524
http://sillycat.iteye.com/blog/2391685
http://sillycat.iteye.com/blog/2391926

https://stackoverflow.com/questions/33785755/getting-could-not-find-function-xmlcheckversion-in-library-libxml2-is-libxml2
https://github.com/fredley/play-pi/issues/22
https://stackoverflow.com/questions/21530577/fatal-error-python-h-no-such-file-or-directory


Reposted from sillycat.iteye.com/blog/2394767