Anti-Crawler Technology Based on Spark and Kafka [Day 2]

3. System Architecture

3.1 Technology Selection

Technology Selection | Component Name | Version | Description
Presentation layer framework | SpringMVC (Pivotal) | 4.0.7 | Efficient and stable. An MVC implementation built on Servlet and JSP technology that helps developers keep changes to web projects under control and reduces the time spent developing web applications with the MVC design pattern.
Logic layer control framework | Spring (Pivotal) | 4.0.7 | Provides transaction management and logic control. One of the framework's main advantages is its layered architecture, which lets users choose which components to use while providing an integrated framework for J2EE application development.
Persistence framework | Hibernate (RedHat) | 4.2.12 | A very lightweight object wrapper around JDBC that lets programmers manipulate the database using Java object-oriented programming.

4. Main Functions

4.1 Data Management

4.2 Real-Time Monitoring

4.3 Data Visualization

4.4 Policy Management

4.5 Process Management

4.6 Rules Management

5. Anti-Crawler Rules

5.1 Data Sources

Itinerary-related
origin, destination, departure time, booking time
Passenger-related
identity information (passenger name, passenger type, passenger ID document type, passenger ID document number), number of trips
Booking-related
ticket buyer information (login ID, login type, operating IP, browser UA, mobile device information)
contact information (contact name, contact phone number, contact email)
additional information (sales unit)
Key fields for flight anti-crawling
bookuser: ticket buyer ID, i.e. the login ID, such as a Pearl membership number, or a phone number for non-members
bookip: ticket buyer's IP
psgname: passenger name (sensitive user information)
psgtype: passenger type, such as adult, child, infant
idtype: ID document type, such as identity card, passport, or other
idcard: passenger's ID document number (sensitive user information)
ContractName: contact name (sensitive user information)
contractphone: contact phone number (sensitive user information)
bookagent: sales unit
depcity / depairport: origin
arrcity / arrairport: destination
flightdate / deptime: departure time
cabin: cabin class

5.2 Anti-Crawler Rules

Per single request: the UA contains non-browser keywords
Aggregated by IP address: more than Y queries within any X minutes
Aggregated by IP address: X consecutive queries with time intervals of less than Y seconds
Aggregated by IP address: within any X minutes, the variance of the query time intervals is less than Y
Aggregated by IP address: within any X minutes, the variance of the per-minute query frequency is less than Y
Aggregated by IP address: within any X minutes, the number of different departure locations queried exceeds Y
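The rules in 5.2 can be sketched in plain Python as a set of small checks over one IP's recent queries. This is only an illustration, not the project's Spark implementation: the `Query` record, the function names, and the keyword list `NON_BROWSER_UA_WORDS` are all assumptions, and the thresholds X and Y are passed in by the caller.

```python
from dataclasses import dataclass
from statistics import pvariance

# Hypothetical keyword list; the source does not enumerate the non-browser UA words.
NON_BROWSER_UA_WORDS = ("python", "curl", "java", "spider", "bot")


@dataclass
class Query:
    ts: float      # request time in seconds
    ip: str        # bookip
    ua: str        # browser UA
    depcity: str   # origin


def ua_is_suspicious(ua):
    """Single-request rule: the UA contains a non-browser keyword."""
    lowered = ua.lower()
    return any(word in lowered for word in NON_BROWSER_UA_WORDS)


def window(queries, now, minutes):
    """One IP's queries within the last `minutes` minutes, oldest first."""
    start = now - minutes * 60
    return sorted((q for q in queries if start <= q.ts <= now), key=lambda q: q.ts)


def intervals(qs):
    """Gaps in seconds between consecutive queries."""
    return [b.ts - a.ts for a, b in zip(qs, qs[1:])]


def too_many_queries(qs, y):
    """More than Y queries in the window."""
    return len(qs) > y


def rapid_run(qs, x, y):
    """X consecutive queries whose gaps are all below Y seconds."""
    run = 1
    for gap in intervals(qs):
        run = run + 1 if gap < y else 1
        if run >= x:
            return True
    return False


def interval_variance_low(qs, y):
    """Variance of the query intervals is below Y (machine-like regularity)."""
    gaps = intervals(qs)
    return len(gaps) >= 2 and pvariance(gaps) < y


def per_minute_variance_low(qs, now, minutes, y):
    """Variance of the per-minute query counts in the window is below Y."""
    counts = [0] * minutes
    start = now - minutes * 60
    for q in qs:
        idx = min(int((q.ts - start) // 60), minutes - 1)
        counts[idx] += 1
    return sum(counts) > 0 and pvariance(counts) < y


def many_departures(qs, y):
    """More than Y different departure locations in the window."""
    return len({q.depcity for q in qs}) > y


# Example: a script querying every 2 seconds for 5 minutes from one IP.
cities = ["CAN", "PEK", "SHA", "CTU", "SZX", "XIY"]
bot = [Query(ts=2.0 * i, ip="10.0.0.1", ua="python-requests/2.31",
             depcity=cities[i % len(cities)]) for i in range(150)]
recent = window(bot, now=300.0, minutes=5)
```

In a streaming job the same checks would be applied per IP key over a sliding window; the list-based version here trades efficiency for readability.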

5.3 Patterns

Holidays, membership days, popular routes

5.4 Crawler Characteristics

Crawls data repeatedly over a long period
Crawls data through multiple proxies (automatically switching UA and IP every few minutes)
Each IP shows high traffic bursts within a short time
Each IP's request counts are balanced between day and night
Popular routes are refreshed more frequently
The browsing path is incomplete
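As one illustration of these characteristics, the day/night balance of a single IP can be expressed as a ratio between 0 and 1. This is a rough sketch: the function name `day_night_balance` and the choice of 08:00 to 20:00 as "day" are assumptions, not values from the source.

```python
from datetime import datetime

# Assumption: hours 8..19 count as day, the other 12 hours as night.
DAY_HOURS = range(8, 20)


def day_night_balance(times):
    """Return min(day, night) / max(day, night) for one IP's request times.

    A value near 1.0 means requests are spread evenly across day and night,
    which is unusual for a human user.
    """
    day = sum(1 for t in times if t.hour in DAY_HOURS)
    night = len(times) - day
    if max(day, night) == 0:
        return 0.0
    return min(day, night) / max(day, night)


# A crawler sending one request every hour around the clock.
crawler = [datetime(2020, 2, 25, hour) for hour in range(24)]
# A human user active mostly during the day.
human = [datetime(2020, 2, 25, hour) for hour in (9, 10, 12, 14, 19, 21)]

crawler_balance = day_night_balance(crawler)
human_balance = day_night_balance(human)
```

A ratio like this is only one signal; it would normally be combined with the other characteristics above before flagging an IP.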

9. OpenResty Overview

9.1 Setting Up the OpenResty Development Environment

1. Download OpenResty
It can be downloaded from the official site (https://openresty.org/cn/).
Here we download the Linux version, openresty-1.13.6.1.tar.gz, and upload the archive to /opt/software on node slave1.
2. Install OpenResty on Linux (CentOS 6.5)
Note:
OpenResty depends on the following libraries: perl 5.6.1+, libreadline, libpcre, libssl. These dependencies need to be installed first:
yum install -y readline-devel pcre-devel openssl-devel perl gcc
Step 1: extract
tar -xzvf /opt/software/openresty-1.13.6.1.tar.gz -C /opt/apps/
Step 2: configure
Enter the openresty directory ([hadoop@slave1 openresty]$) and run the configure command:
./configure --prefix=/opt/apps/openresty --with-http_stub_status_module
Step 3: compile and install
make && make install

9.2 Installing Lua on CentOS

tar -zxvf /opt/software/lua-5.3.4.tar.gz -C /opt/apps/
[hadoop@slave1 apps]$ cd ./lua-5.3.4/
[hadoop@slave1 lua-5.3.4]$ make linux test
[hadoop@slave1 lua-5.3.4]$ make install
Test whether the installation succeeded:
[hadoop@slave1 lua-5.3.4]$ lua -v
Lua 5.3.4 Copyright (C) 1994-2017 Lua.org, PUC-Rio


Origin blog.csdn.net/weixin_45617201/article/details/104504770