The most detailed big data learning roadmap [personally compiled]

 

I. Preparation
1. Linux operation basics

1) Linux introduction and installation: installing the VMware Workstation virtualization software, installing a CentOS virtual machine
2) Common Linux commands, usage and practice: file operations, user management and permissions, passwordless login, common network configuration and management commands
3) Linux process management: basic principles and related tools such as ps, pkill, top, and htop
4) Linux boot process, run levels in detail, chkconfig in detail
5) VI and VIM editors: introduction, basic usage, and common shortcut keys
6) Linux disk management, LVM logical volumes, NFS in detail
7) Linux file permission management: introduction to file permissions, operating on file permissions
8) Linux RPM package management: introduction to RPM packages, installing, uninstalling, and other operations
9) The yum command, building a yum repository
10) Linux networking: introduction to Linux networking, network configuration and maintenance, firewall configuration
11) Shell programming: introduction to the shell, writing shell scripts
12) Installing common software on Linux: installing the JDK, Tomcat, and MySQL, deploying a web project
13) Advanced Linux text-processing commands: cut, sed, awk
14) Scheduled tasks with crontab

 


 

2. High-concurrency handling for large websites

Layer 4 load balancing

a) LVS load balancing: i. load-balancing algorithms, NAT mode, direct routing mode (DR), tunnel mode (TUN)
b) Introduction to the F5 load balancer

Layer 7 load balancing
a) Nginx b) Apache

Tomcat and JVM tuning to improve concurrency

Cache optimization
a) Java caching frameworks: i. OSCache, Ehcache
b) Caching databases: i. Redis, Memcached

LVS + Nginx + Tomcat + Redis/Memcached: building a two-tier load-balanced architecture for ten-million-level concurrency

HAProxy

FastDFS: standalone storage and management of small files

Redis caching system: a) basic Redis usage b) Redis Sentinel high availability c) friend recommendation algorithm with Redis

3. Lucene basics

Introduction to Lucene

Principles of the Lucene inverted index

Building an index with IndexWriter

Searching with IndexSearcher

Query

Sorting and filtering (Filter)

Highlighting and index tuning
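
To make the inverted index idea from this section concrete, here is a minimal sketch in plain Python. It does not use Lucene and is not how Lucene is implemented. The function names (tokenize, build_inverted_index, search_all) and the sample documents are made up for illustration, and the whitespace tokenizer is a stand-in for a real analyzer.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for a real analyzer)."""
    return text.lower().split()


def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search_all(index, query):
    """Return ids of documents that contain every query term (AND query)."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Hypothetical sample documents, keyed by document id.
docs = {
    1: "Lucene builds an inverted index",
    2: "Solr is built on Lucene",
    3: "An index maps terms to documents",
}
index = build_inverted_index(docs)
print(search_all(index, "lucene index"))
```

The key point is that a query looks up terms first and reaches documents through the postings lists, instead of scanning every document.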

4. Solr basics

What Solr is
Why a project would use Solr
How Solr works
How to run Solr in Tomcat
How to index and search with Solr
Solr query types
Solr filters
Solr sorting
Solr highlighting
Solr statistics on a field
Solr range statistics
SolrCloud cluster setup

5. Distributed coordination service: ZooKeeper

Introduction to ZooKeeper and its application scenarios
ZooKeeper cluster installation and deployment
ZooKeeper data nodes and command-line operations
ZooKeeper Java client: basic operations and event listeners
ZooKeeper core mechanisms and data nodes
ZooKeeper application: distributed lock for shared resources
ZooKeeper application: dynamic detection of servers going online and offline
ZooKeeper data consistency principles and leader election mechanism

6. Advanced Java features

Java multithreading basics
The Java synchronized keyword in detail
Thread pools in the Java concurrency package and their use in open-source software
Message queues in the Java concurrency package and their use in open-source software
Java JMS
Java dynamic proxies and reflection

II. Offline computing system
1. Hadoop quick start
Hadoop background
Overview of distributed systems
Introduction to the offline data analysis workflow
Cluster setup
Basic cluster usage

2. HDFS in depth
HDFS concepts and features
HDFS shell (command-line client) operations
How HDFS works
How the NameNode works
Java API operations
Case 1: developing a shell script for data collection

3. MapReduce in detail
Custom Hadoop RPC framework
MapReduce programming conventions and sample programs
MapReduce run modes and debugging methods
Internal execution mechanism of a MapReduce program
Main workflow of the MapReduce framework
Serialization of custom objects
MapReduce programming cases

4. MapReduce in depth
MapReduce sorting
Custom partitioner
MapReduce combiner
MapReduce working mechanism in detail

5. MapReduce in practice
MapTask parallelism mechanism: file splits
Setting MapTask parallelism
Inverted index
Mutual friends
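
The "mutual friends" exercise above is a common MapReduce practice problem. Below is one possible sketch in plain Python that imitates the map, shuffle, and reduce steps in memory. It is not Hadoop code. It assumes friendships are symmetric (if A lists B, then B lists A), and the function names and sample graph are hypothetical.

```python
from collections import defaultdict


def map_phase(person, friends):
    """For each friend, emit ((person, friend) sorted, person's friend set)."""
    for friend in friends:
        key = tuple(sorted((person, friend)))
        yield key, set(friends)


def shuffle(pairs):
    """Group emitted values by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Intersect the friend sets emitted for one pair of people."""
    return key, sorted(set.intersection(*values))


def mutual_friends(graph):
    emitted = [kv for person, friends in graph.items()
               for kv in map_phase(person, friends)]
    return dict(reduce_phase(k, vs) for k, vs in shuffle(emitted).items())


# Hypothetical symmetric friendship graph.
graph = {
    "A": ["B", "C", "D"],
    "B": ["A", "C", "D"],
    "C": ["A", "B"],
    "D": ["A", "B"],
}
result = mutual_friends(graph)
print(result[("A", "B")])
```

This variant only reports pairs who are already direct friends. Finding common friends for arbitrary pairs usually takes a different key design, often split across two jobs.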

6. Federation introduction and using Hive
Hadoop HA mechanism
HA cluster installation and deployment
Cluster operations test: taking DataNodes online and offline dynamically
Cluster operations test: NameNode state switching management
Cluster operations test: block balancing
HDFS API changes under HA
Introduction to Hive
Hive architecture
Hive installation and deployment
Getting started with Hive

7. Hive in depth and introduction to Flume
HQL DDL basic syntax
HQL DML basic syntax
Hive joins
Hive parameter configuration
Hive user-defined functions and TRANSFORM
Hive HQL execution analysis examples
Hive best practices and precautions
Hive optimization strategies
Hive practical cases
Introduction to Flume
Flume installation and deployment
Case: collecting a directory into HDFS
Case: collecting files into HDFS

III. Data migration tool: Sqoop

Sqoop configuration
Using the Sqoop shell
Sqoop import: a) DBMS to HDFS b) DBMS to Hive c) DBMS to HBase
Sqoop export

IV. Flume distributed logging framework

Flume introduction: basics
Flume installation and testing
Flume deployment modes
Flume source configuration and testing
Flume sink configuration and testing
Flume selector configuration and case studies
Flume sink processors: configuration and case studies
Flume interceptors: configuration and case studies
Flume Avro client development
Integrating Flume with Kafka

V. In-memory database: Redis
Redis characteristics and comparison with other databases
How to install Redis
How to use the command-line client
Redis string type
Redis hash type
Redis list type
Redis set type
How to access Redis from Java (also: accessing Redis from Python and Scala)
Redis transactions
Redis pipelining
Redis persistence (AOF + RDB)
Redis optimization
Redis master-slave replication
Redis Sentinel high availability
twemproxy and Codis in practice
Redis 3.x cluster installation and configuration

VI. Storm upstream and downstream integration architecture

What Kafka is

Kafka architecture

Kafka configuration in detail

Kafka installation

Kafka storage strategy

Kafka partition features

Kafka publish and subscribe

ZooKeeper coordination and management

Programming Kafka in Java

Programming Kafka in Scala

Integrating Flume with Kafka

Integrating Kafka with Storm

VII. Storm from beginner to master

Basic concepts of Storm

Storm application scenarios

Storm compared with Hadoop

Storm installation environment: preparing the Linux cluster

ZooKeeper cluster setup

Storm cluster setup

Storm configuration file items explained

Solving common cluster setup problems

Common Storm components and programming API: Topology, Spout, Bolt

Storm grouping strategies (stream groupings)

Developing a WordCount example with Storm

Debugging Storm programs in local mode and remotely

Storm transaction processing

Storm message reliability and fault-tolerance principles

Storm with the Kafka message queue: basic message queue concepts (Producer, Consumer, Topic, Broker, etc.), Kafka usage scenarios, the Storm programming API for Kafka

Storm Trident concepts

Trident state principles

Trident development example

Introduction to Storm DRPC (distributed remote procedure call)

Storm DRPC in practice

Storm and Hadoop 2.x integration: Storm on YARN

VIII. Scala programming

Scala interpreter, variables, and common data types
Scala conditional expressions, input and output, loops and other control structures
Scala functions, default parameters, variable-length parameters
Scala arrays, variable-length arrays, multidimensional arrays
Scala maps, tuples, and related operations
Scala classes, including bean properties, auxiliary constructors, and primary constructors
Scala objects: singleton objects, companion objects, extending classes, the apply method
Scala packages, imports, and inheritance
Scala traits
Scala operators
Scala higher-order functions
Scala collections
Scala database connections

IX. In-memory computing system: Spark

Introduction to Spark
Spark application scenarios
Spark compared with Hadoop MR and Storm, and its advantages
RDD
Transformation
Action
Computing PageRank with Spark
Lineage
Overview of Spark models
Spark caching strategies and fault tolerance
Wide and narrow dependencies
Spark configuration explained
Spark cluster setup
Solving common cluster setup problems
Spark core components and RDD principles
Data locality
Task scheduling
DAGScheduler
TaskScheduler
Reading the Spark source code
Performance tuning
Spark and Hadoop 2.x integration: how Spark on YARN works
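
The PageRank exercise listed in this section is usually written with RDD transformations. The same iteration can be sketched in plain Python to show the arithmetic before moving to Spark. This is an illustrative sketch, not Spark code. The function name, the sample link graph, and the damping factor of 0.85 are assumptions, and every page is assumed to have at least one outgoing link to a page that is itself a key in the graph.

```python
def pagerank(links, iterations=20, damping=0.85):
    """Iteratively update ranks: each page splits its rank among its out-links."""
    ranks = {page: 1.0 for page in links}
    for _ in range(iterations):
        contribs = {page: 0.0 for page in links}
        for page, neighbors in links.items():
            share = ranks[page] / len(neighbors)
            for neighbor in neighbors:
                contribs[neighbor] += share
        # New rank = constant term + damped sum of received contributions.
        ranks = {page: (1 - damping) + damping * total
                 for page, total in contribs.items()}
    return ranks


# Hypothetical three-page link graph: page -> pages it links to.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(links)
for page in sorted(ranks):
    print(page, round(ranks[page], 3))
```

In a Spark version, the inner loops would typically become a join between links and ranks followed by a reduce by key, but the update rule stays the same.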

X. Spark Streaming in practice
Introduction to Spark Streaming
Spark Streaming programming
Hands-on: stateful WordCount
Combining Flume with Spark Streaming
Combining Kafka with Spark Streaming
Window functions
Introduction to the ELK technology stack
Elasticsearch installation and usage
Storm architecture analysis
Storm programming model, Tuple source code, concurrency analysis
Storm WordCount case analysis and common APIs

XI. Machine learning algorithms
1. Python and the NumPy library
Introduction to machine learning
Machine learning and Python
Python language: quick start
Python language: data types in detail
Python language: flow control statements
Python language: using functions
Python language: modules and packages
Python language: object-oriented programming
Python machine learning library: NumPy
Essential mathematics for machine learning: probability

2. Common algorithms
kNN classification algorithm: principles
kNN classification algorithm: code implementation
kNN classification algorithm: handwriting recognition case
Linear regression classification algorithm: theory
Linear regression classification algorithm: implementation and demo
Naive Bayes classification algorithm: theory
Naive Bayes classification algorithm: implementation
Naive Bayes classification algorithm: spam recognition case
k-means clustering algorithm: principles
k-means clustering algorithm: implementation
k-means clustering algorithm: geographic clustering application
Decision tree classification algorithm: principles
Decision tree classification algorithm: implementation
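
As a taste of the first algorithm in this list, here is a minimal kNN classifier in plain Python (Python 3.8+ for math.dist). It is a sketch under simple assumptions: Euclidean distance, majority vote, and a tiny made-up training set. The function name knn_predict and the sample data are hypothetical, and a course implementation would more likely use NumPy.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy training set: two points per class.
train_points = [(1.0, 1.1), (1.0, 1.0), (0.0, 0.0), (0.0, 0.1)]
train_labels = ["A", "A", "B", "B"]

print(knn_predict(train_points, train_labels, (0.1, 0.2)))
print(knn_predict(train_points, train_labels, (0.9, 0.9)))
```

kNN has no training step. All the work happens at query time, which is why distance computation dominates its cost on larger datasets such as the handwriting recognition case.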

 


Origin www.cnblogs.com/wuxiaoxia888/p/11015662.html