Hadoop Big Data Platform Development and Case Study

About the "Hadoop Big Data Platform Development and Case Analysis" senior engineer course

I. Course Introduction

1. Understanding the requirements

Hadoop was designed from the outset for high reliability, scalability, fault tolerance, and efficiency. These inherent design strengths have made it popular with many large companies and have drawn wide attention from the research community.

The Internet access logs held by telecom operators contain a large amount of information about individual users' preferences; analyzing and mining them helps operators understand customer needs better. The traditional business analysis architecture of minicomputers plus relational databases cannot meet the demand for processing massive unstructured data. Building a Hadoop platform on x86 hardware and introducing big data processing technology, to obtain an efficient, low-cost, easily scalable hybrid business analysis architecture, has become the preferred option for operators. This course gives a thorough introduction to developing, operating, and maintaining a Hadoop platform, and is of high practical value to students who work with this technology.

2. Course structure and design approach

 

(1) Structure:

The course is divided into three main sections:

Part I focuses on applications of big data technology, so that students gain a clear understanding of how widely it is used. This part emphasizes the important position of Hadoop within big data technology as a whole and where it is applied.

Part II breaks Hadoop technology down into modules. Starting from big data file storage systems, distributed file systems, and application platforms, it introduces the main Hadoop tools and methods, along with mainstream operation and maintenance practice, so that students fully understand and grasp the essentials of Hadoop.

Part III focuses on big data analysis application cases, giving participants a deeper, hands-on impression of the technology through the case studies.

(2) Design approach:

The course is taught in modules, with case studies as the main thread, and progresses step by step from theory to hands-on practice.

(3) Fit with enterprise needs:

This course ties big data to enterprise transformation and development strategy. It is built around enterprises' goals of developing big data services and industry application markets, and concentrates on Hadoop application technology to strengthen the skills of enterprise development staff and IT operation and maintenance staff, so it fits enterprise needs closely.

II. Audience

Stakeholders in the big data industry at enterprises and institutions, operators' IT operation and maintenance engineers and IT staff, finance-sector information staff, and anyone else interested in big data.

III. Objectives

Master the technical architecture of big data processing platforms (Hadoop, Spark, Storm), including installation and deployment, operation and maintenance configuration, and application development; master the technical architecture and practical application of the mainstream Hadoop platform and the Spark real-time processing platform; use Hadoop + Spark for storage management and mining analysis of industry big data; and understand the Hadoop ecosystem components, including Storm, HDFS, MapReduce, Hive, HBase, Spark, GraphX, MLlib, Shark, and Elasticsearch, covering big data storage management, distributed databases, large-scale data warehousing, big data query and search, and big data mining and distributed processing.

IV. Outline

(1) Course framework

Time / Training content / Teaching method

Day 1, morning

Part I: Introduction to mobile Internet, big data, and cloud computing technologies

Part II: Challenges and development directions of big data

Teaching method: theory lectures + case studies

Day 1, afternoon

Part III: Big data file storage systems, distributed file system platforms, and their applications

Part IV: Hadoop HDFS file system best practices

Teaching method: theory lectures + case studies + group discussion

Day 2, morning

Part V: Hadoop operation and maintenance management and performance tuning

Part VI: NoSQL databases HBase and Redis

Teaching method: theory lectures + case studies + practical exercises

Day 2, afternoon

Part VII: Hive, the SQL-like query tool

Part VIII: Introduction to data mining and modeling with Spark

Teaching method: theory lectures + case studies + practical exercises

Day 3, morning

Part IX: Kafka basics

Part X: Typical big data applications and development case study: Internet data operations

Teaching method: theory lectures + case studies

Day 3, afternoon

Part XI: Transformation and conversion of current data centers, with domestic and foreign operators and Internet companies as examples

Part XII: Course summary and Q&A

Training assessment

Teaching method: theory lectures + case studies + group discussion

Day 4

Student exchange and industry exam

Detailed introduction

Module / Topic / Main content and cases

Module I

Introduction to mobile Internet, big data, and cloud computing technologies

1. Data centers and cloud computing technology

2. Smart cities and cloud computing technology

3. Technologies related to mobile Internet, big data, and cloud computing

4. The mobile and cloud computing ecosystem and industry chain

5. Applications of big data technology in telecom operators, finance, banking, e-commerce, retail, manufacturing, e-government, the Internet, education informatization, and other industries

6. Introduction to mainstream big data solutions at home and abroad

7. Comparison of current big data solutions with traditional database solutions

8. The Cloudera Hadoop big data analysis platform solution

9. Analysis of the open-source big data platform ecosystem

Module II

Challenges and development directions of big data

1. Challenges of the big data era

Ø Strategic decision-making capability

Ø Technology development and data processing capability

Ø Organizational and operational capability

2. Development directions in the big data era

Ø Cloud computing is the infrastructure

Ø Big data is the core asset

Ø Analysis and mining are the means

Ø Discovery and prediction are the ultimate goal

3. Big data mining applications across industries

Ø Telecommunications industry applications and case studies

Ø Internet industry applications and case studies

Ø Financial industry applications and case studies

Ø Sales industry applications and case studies

Module III

Big data file storage systems, distributed file systems, and application platforms

1. Hadoop's development history

Ø Hadoop big data platform architecture

Ø How the Hadoop platform works for storing, managing, and analyzing PB-scale big data

Ø Analysis of Hadoop core components

2. The HDFS distributed file system

Ø Overview, features, functions, advantages

Ø Scope of application and current adoption

Ø Development trends

3. HDFS distributed file system architecture and principles

Ø Key technologies

Ø Design essentials

Ø Basic working principles

Ø System architecture

Ø File storage model

Ø Working mechanisms

Ø Storage expansion and throughput scaling

4. HDFS distributed file system operations

Ø Shell command operations

Ø I/O stream operations

Ø Reading, writing, appending, and deleting file data

Ø Querying file status

Ø Data block distribution mechanism

Ø Data synchronization and consistency

Ø Metadata management

Ø How master nodes and worker nodes operate

Ø Big data load balancing

Ø HDFS big data storage cluster management

5. Hadoop ecosystem components

Ø Storm

Ø HDFS

Ø MapReduce

Ø HIVE

Ø HBase

Ø Spark

Ø GraphX

Ø MLlib

Ø Shark
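
As a rough illustration of the file storage model and block distribution topics in Module III, the following Python sketch splits a file into fixed-size blocks and picks three replica locations per block. It is a toy model, not Hadoop code: the cluster layout, node names, and the `split_into_blocks` and `place_replicas` helpers are hypothetical, and the placement rule is a simplified version of the default HDFS policy (first replica on the writer's node, the other two on different nodes of one remote rack).

```python
"""Toy model of how HDFS splits a file into blocks and places replicas."""
import random

BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the default block size in Hadoop 2.x and later


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the size of each block; only the last block may be smaller."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(writer, topology, rng):
    """Pick three nodes: the writer's node, then two distinct nodes of one remote rack.

    Simplified: assumes at least one other rack with two or more nodes exists.
    """
    writer_rack = next(rack for rack, nodes in topology.items() if writer in nodes)
    remote_racks = [r for r in topology if r != writer_rack and len(topology[r]) >= 2]
    remote_rack = rng.choice(remote_racks)
    second, third = rng.sample(topology[remote_rack], 2)
    return [writer, second, third]


# Hypothetical two-rack cluster; rack and node names are made up.
topology = {
    "/rack1": ["dn1", "dn2", "dn3"],
    "/rack2": ["dn4", "dn5", "dn6"],
}

blocks = split_into_blocks(300 * 1024 * 1024)  # a 300 MB file -> 128 MB, 128 MB, 44 MB
rng = random.Random(0)
for index, size in enumerate(blocks):
    replicas = place_replicas("dn1", topology, rng)
    print(f"block {index}: {size // (1024 * 1024)} MB -> {replicas}")
```

Keeping one replica local and two on a single remote rack trades a little rack diversity for less cross-rack write traffic, which is the kind of design trade-off the module discusses.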

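Module IV

Hadoop file system HDFS best practices

1. The design of HDFS

2. HDFS concepts

Ø Blocks

Ø Namenodes and datanodes

Ø HDFS federation

Ø HDFS high availability

3. The command-line interface

4. Hadoop file systems

5. The Java interface

Ø Reading data from a Hadoop URL

Ø Reading data using the FileSystem API

Ø Writing data

Ø Directories

Ø Querying the file system

Ø Deleting data

6. Data flow

Ø Anatomy of a file read

Ø Anatomy of a file write

Ø Coherency model

7. Importing data with Flume and Sqoop

8. Parallel copying with distcp

9. Hadoop archives

Ø Using the Hadoop archive tool

Ø Limitations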

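Module V

Hadoop operation and maintenance management and performance tuning

1. The second-generation big data processing framework

Ø How YARN works

Ø DAG parallel execution mechanism

Ø Case study: big data analysis and processing on YARN

Ø Hands-on: parallel applications on the YARN framework

2. Cluster configuration management

Ø Hadoop cluster configuration

Ø Hadoop performance tuning and parameter configuration

Ø Hadoop rack awareness strategy and configuration

Ø Hadoop compression

Ø Hadoop task load balancing

Ø Hadoop cluster maintenance

Ø Hadoop monitoring and management

3. HDFS static tuning techniques

Ø Tuning HDFS for high-throughput I/O

Ø Tuning MapReduce/YARN parallel processing performance

Ø Analysis of Hadoop cluster runtime failures, with solutions

Ø Analysis of performance bottlenecks in Hadoop-based big data applications, and how to improve them

Ø Installing, deploying, and configuring the HUE platform for Hadoop operations monitoring and management

Ø Installing, deploying, and configuring the Ambari platform for Hadoop operations management and monitoring

Ø Installing, deploying, and configuring Ganglia and Nagios for Hadoop cluster operations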
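
For the rack awareness topic in Module V: Hadoop can call an external script, named in the `net.topology.script.file.name` property, passing node addresses as arguments and reading one rack path per address from standard output. A minimal sketch of such a script follows. The subnet-to-rack table and the `resolve` helper are made up for illustration; a real cluster would use its own addressing plan.

```python
#!/usr/bin/env python3
"""Sketch of a rack-awareness topology script for Hadoop."""
import sys

# Hypothetical mapping from address prefix to rack path.
RACK_BY_PREFIX = {
    "10.0.1.": "/dc1/rack1",
    "10.0.2.": "/dc1/rack2",
}
DEFAULT_RACK = "/default-rack"  # the rack Hadoop assigns when nothing else is known


def resolve(address):
    """Return the rack path for one host address, or the default rack."""
    for prefix, rack in RACK_BY_PREFIX.items():
        if address.startswith(prefix):
            return rack
    return DEFAULT_RACK


if __name__ == "__main__":
    # Hadoop may pass several addresses per call; print one rack per argument.
    for address in sys.argv[1:]:
        print(resolve(address))
```

Run by hand as `python3 topology.py 10.0.1.15 192.168.0.9` (the file name is arbitrary), it prints `/dc1/rack1` and `/default-rack` on separate lines.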

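Module VI

NoSQL databases HBase and Redis

1. NoSQL fundamentals

Ø CAP theorem

Ø BASE versus ACID

Ø NoSQL storage types: key-value, column-family, document, and graph stores

2. HBase distributed data fundamentals

3. Installing HBase

4. Using HBase

Ø HBase's logical data model: tables, rows, column families, columns, cells, versions, and row key ordering

Ø HBase's physical model, namespaces (tablespaces), and table schema design rules

Ø How the HBase master node (HMaster) works, HMaster high-availability configuration, and performance tuning

Ø How the HBase worker node (RegionServer) works, table regions and high-concurrency storage I/O configuration, and performance tuning

Ø How the HBase storage engine works, the key-value storage structure of table data, and the HFile storage format

Ø HBase table design, data operations, and database administration

Ø HBase cluster installation and deployment, parameter configuration, and performance optimization

5. HBase distributed database: introduction, history, use cases, working principles, strengths and weaknesses

Ø Master-worker platform architecture of an HBase cluster and analysis of key technologies

Ø Control and runtime configuration of HBase in pseudo-distributed and fully distributed (physical cluster) modes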

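Ø How the ZooKeeper distributed coordination service works, its platform architecture, and hands-on cluster deployment

Ø ZooKeeper cluster architecture and principles, and application configuration

6. Introduction to the Redis in-memory database, with industry use cases

Ø Redis cluster architecture and analysis of core technologies

Ø Hands-on: installing and deploying a Redis cluster and developing applications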
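
The row key ordering topic in Module VI can be previewed without an HBase cluster. HBase keeps rows sorted by the raw bytes of the row key, and Python compares `bytes` values the same way, so a sorted list is a fair stand-in. The sketch below is only an illustration under that assumption: the `user...` keys and the `scan` helper are hypothetical, and nothing here calls the HBase API.

```python
"""Toy illustration of byte-lexicographic row key ordering, as used by HBase."""
import bisect


def scan(sorted_keys, start, stop):
    """Return keys in [start, stop), like a range scan over sorted row keys."""
    lo = bisect.bisect_left(sorted_keys, start)
    hi = bisect.bisect_left(sorted_keys, stop)
    return sorted_keys[lo:hi]


# Hypothetical row keys built from a numeric user id.
naive = sorted(f"user{n}".encode() for n in (1, 2, 10, 20))
padded = sorted(f"user{n:04d}".encode() for n in (1, 2, 10, 20))

print(naive)   # [b'user1', b'user10', b'user2', b'user20']
print(padded)  # [b'user0001', b'user0002', b'user0010', b'user0020']
print(scan(padded, b"user0002", b"user0011"))  # [b'user0002', b'user0010']
```

Unpadded numeric ids sort as text, so user 10 lands before user 2; zero-padding restores numeric order and makes range scans behave as expected. This is one of the schema design rules the module covers.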

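Module VII

Hive, the SQL-like query tool

1. Installing Hive

2. An example

3. Running Hive

Ø Configuring Hive

Ø Hive services

Ø The metastore

4. Comparison with traditional databases

Ø Schema on read versus schema on write

Ø Updates, transactions, and indexes

5. HiveQL

Ø Data types

Ø Operators and functions

6. Tables

Ø Managed tables and external tables

Ø Partitions and buckets

Ø Storage formats

Ø Importing data

Ø Altering tables

Ø Dropping tables

7. Querying data

Ø Sorting and aggregating

Ø MapReduce scripts

Ø Joins

Ø Subqueries

Ø Views

8. User-defined functions

Ø Writing a UDF

Ø Writing a UDAF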
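
The "schema on read versus schema on write" item in Module VII is easier to grasp with a small example. The sketch below is plain Python, not Hive: the three-column table, the sample rows, and the `parse` helper are hypothetical. It mimics the two behaviors: a traditional database rejects a malformed row when it is loaded, while a schema-on-read system keeps the raw text and yields NULL (here `None`) for a field that fails to parse at query time.

```python
"""Toy contrast between schema-on-write and schema-on-read."""

# Hypothetical table definition: column name and Python type.
SCHEMA = [("year", int), ("temperature", int), ("quality", int)]

# Tab-delimited sample rows; the last one has a malformed temperature.
RAW_LINES = ["1950\t0\t1", "1950\t22\t1", "1949\tn/a\t4"]


def parse(line, strict):
    """Apply SCHEMA to one raw line.

    strict=True  mimics schema-on-write: a bad field rejects the whole row.
    strict=False mimics schema-on-read: a bad field becomes None.
    """
    row = {}
    for (name, cast), text in zip(SCHEMA, line.split("\t")):
        try:
            row[name] = cast(text)
        except ValueError:
            if strict:
                raise
            row[name] = None
    return row


# Schema-on-read: loading just keeps the raw lines; parsing happens per query.
rows = [parse(line, strict=False) for line in RAW_LINES]
print(rows[2])  # {'year': 1949, 'temperature': None, 'quality': 4}
```

The trade-off is fast loading and flexible reinterpretation of the data on one side, against earlier error detection and faster queries on the other.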

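Module VIII

Introduction to data mining and modeling with Spark

1. Introduction to Spark

Ø What Spark is

Ø The Spark ecosystem, BDAS

2. Spark architecture

Ø Similarities and differences between Spark's distributed architecture and a single-machine multicore architecture

3. Installing and deploying a Spark cluster

Ø Installing and deploying Spark

Ø First steps with a Spark cluster

4. Spark hardware configuration

Ø Spark hardware

Ø Spark hardware configuration process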

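Module IX

Kafka basics

1. Introduction to Kafka

2. Kafka architecture

3. Overview of Kafka's design philosophy

4. Kafka communication protocol

5. Kafka pseudo-distributed installation and cluster installation

6. Kafka shell operations and Java operations

7. Kafka's design philosophy*

8. Developing Kafka producers and consumers

9. Kafka distributed publish-subscribe messaging system: applications, platform architecture, and hands-on cluster deployment and configuration

10. Flume-NG data collection system: data flow model, platform architecture, and hands-on cluster deployment and configuration

11. Sqoop, the tool for exchanging data between Hadoop and a DBMS, in practice

12. Importing and exporting data with Sqoop, and Sqoop cluster deployment and configuration

13. Kettle cluster: platform architecture, core technologies, deployment and configuration, and hands-on application

14. Using Sqoop between MySQL and a Hadoop cluster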
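
One design idea behind the Kafka producer topics in Module IX is that records with the same key go to the same partition, which preserves their relative order. The sketch below models that idea in plain Python and does not use a Kafka client. The three-partition topic, the keys, and the `partition_for` and `send` helpers are hypothetical, and CRC32 is only a stand-in hash: Kafka's default partitioner for keyed records uses murmur2.

```python
"""Toy model of key-based partitioning in a Kafka-style topic."""
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # hypothetical topic with three partitions


def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Map a key to a partition using a stable hash (CRC32 as a stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Each partition is an append-only list; a record's offset is its index in it.
log = defaultdict(list)


def send(key, value):
    """Append a record to its partition and return (partition, offset)."""
    partition = partition_for(key)
    log[partition].append((key, value))
    return partition, len(log[partition]) - 1


events = [("user-a", "login"), ("user-b", "login"), ("user-a", "click"), ("user-a", "logout")]
for key, value in events:
    print(key, value, send(key, value))
```

Because ordering is guaranteed only within a partition, choosing the record key is effectively choosing which events must stay in order relative to each other.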

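Module X

Typical big data applications and development case study: Internet data operations

1. Case 1: Guizhou data trading center

Ø Form of trading at the exchange: electronic trading

Ø Exchange services: big data trading; big data cleansing, modeling, and analysis; targeted big data procurement; big data platform development

Ø Discussion of big data trading security

Ø Discussion of the data trading center's business model

2. Case 2: Big data application: intelligent planning of public transit routes

Ø Urban Insights: subscription-based big data tools and big data consulting services for transit operators

Ø Urban Insights data sources, data collection, data warehouse, and data analysis, used to design operating routes

Ø Urban Insights' operations based on Internet data

3. Discussion: directions for big data application and development at Zhejiang Mobile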

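Module XI

Transformation and conversion of current data centers, with domestic and foreign operators and Internet companies as examples

1. Comparison of mainstream commercial big data solutions

2. Comparison of mainstream open-source cloud computing systems

3. Comparison of representative big data platforms at home and abroad

4. Vendors' latest big data products

5. Case studies

Ø Facebook's SNS platform

Ø Google's search engine

Ø Rackspace's log processing

Ø Verizon's establishment of a precision marketing division

Ø The "Smart Steps" commercial service launched by Telefonica Dynamic Insights

Ø China Unicom's "centralized query and analysis support system for mobile users' Internet access records"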

 

Origin blog.csdn.net/spark798/article/details/93491267