Big Data learning framework for beginners

The core of a big data development curriculum is the Hadoop framework; big data development can almost be equated with Hadoop development. Hadoop plays a role similar to that of the SSH / SSM frameworks in Java application development: it is an open-source Java framework contributed by skilled developers at the Apache Foundation and in other open-source Java communities for everyone to use. Big data has many branches, and this article walks through them.

The saying that Java is king holds true. Java's core code is open source, the result of skilled developers around the world studying, developing, and testing it together, which makes Java one of the most battle-tested languages. Anyone can study Java's core technology and build on it, as both the Android system and the Hadoop framework do. If the programming world is compared to a tree, Java is the root, and frameworks such as SSH and Hadoop are the branches and leaves that grow from it.

Big data development engineer is currently the most popular specialty in the IT training industry. These engineers are the trend-setters leading the big data revolution and the most direct beneficiaries of the intelligent era, so Codo owes everyone a detailed and thorough explanation of such an important specialty. Built around the Hadoop ecosystem, the course introduces the technologies that big data application development engineers currently use at work, at every level. Before starting the big data development engineer track, you should have some experience with Java syntax and basic frameworks.

The beginner curriculum has two parts: Java and big data development. The advanced curriculum, aimed at those who already have Java development experience, contains only the big data part. As the description above makes clear, learning big data requires some Java fundamentals.

Hadoop, the open-source big data development platform

Hadoop is a software framework for the distributed processing of large amounts of data, and it processes that data in a reliable, efficient, and scalable way. Users can easily develop and run applications that process huge volumes of data on Hadoop because it offers high reliability, scalability, efficiency, and fault tolerance.

The Hadoop big data ecosystem:

Distributed file system - HDFS

When the Hadoop file system is mentioned, the first thing that comes to mind is HDFS (Hadoop Distributed File System). HDFS is Hadoop's main file system: it stores the data held on the Hadoop platform as a distributed storage system built on the network. Hadoop also integrates other file systems; the Hadoop file system is an abstract concept, and HDFS is just one implementation of it.
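The idea that the file system is an abstraction with HDFS as one implementation can be illustrated with a small Python sketch. Everything here is hypothetical and for illustration only: `FileSystem`, `InMemoryFileSystem`, `ToyDistributedFileSystem`, and `copy_file` are made-up names, not Hadoop classes, and the toy distributed version only spreads chunks across in-process dictionaries. Real HDFS also replicates blocks and tracks them on a NameNode, which this sketch leaves out.

```python
from abc import ABC, abstractmethod


class FileSystem(ABC):
    """Hypothetical stand-in for the abstract file system concept."""

    @abstractmethod
    def write(self, path, data):
        """Store the bytes in `data` under `path`."""

    @abstractmethod
    def read(self, path):
        """Return the bytes stored under `path`."""


class InMemoryFileSystem(FileSystem):
    """Keeps every file whole in one dict, like a single local disk."""

    def __init__(self):
        self._files = {}

    def write(self, path, data):
        self._files[path] = data

    def read(self, path):
        return self._files[path]


class ToyDistributedFileSystem(FileSystem):
    """Spreads fixed-size chunks of each file over several 'nodes' (dicts)."""

    def __init__(self, node_count=3, chunk_size=4):
        self._nodes = [{} for _ in range(node_count)]
        self._chunk_size = chunk_size
        self._index = {}  # path -> ordered list of (node_id, chunk_key)

    def write(self, path, data):
        locations = []
        for offset in range(0, len(data), self._chunk_size):
            chunk_no = offset // self._chunk_size
            node_id = chunk_no % len(self._nodes)  # round-robin placement
            key = (path, chunk_no)
            self._nodes[node_id][key] = data[offset:offset + self._chunk_size]
            locations.append((node_id, key))
        self._index[path] = locations

    def read(self, path):
        # Reassemble the file by fetching each chunk from the node that holds it.
        return b"".join(self._nodes[n][k] for n, k in self._index[path])


def copy_file(src_fs, dst_fs, path):
    """Works with any FileSystem implementation, which is the point of the abstraction."""
    dst_fs.write(path, src_fs.read(path))
```

Code such as `copy_file` depends only on the abstract interface, so the same call works whether the data sits on one machine or is spread across several.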

Distributed computing framework - MapReduce

MapReduce is a programming model and the data processing layer of the Hadoop platform, used for parallel computation over large data sets (larger than 1 TB). Its two concepts, "Map" and "Reduce", and the main ideas behind them are borrowed from functional programming languages, along with some features borrowed from vector programming languages. It makes it easy for programmers to run their own programs on a distributed system without knowing distributed parallel programming.
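The Map and Reduce steps can be sketched with the classic word count, here in plain single-process Python. This is a minimal illustration of the programming model, not the Hadoop API: the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are made up for this example, and a real cluster would run the map and reduce calls in parallel on different machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map: turn one input line into (key, value) pairs."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all values that share a key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

The programmer writes only `map_phase` and `reduce_phase`; splitting the input, grouping by key, and distributing the work are what the framework takes care of.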

Distributed open-source database - HBase

HBase (Hadoop Database) is a distributed, column-oriented open-source database. It is suited to storing unstructured data and can keep multiple versions of the same piece of data. HBase greatly extends what Hadoop can do for data processing and applications.
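Keeping multiple versions of a value in a column-oriented store can be pictured with a toy table in Python. This is only a sketch of the data model under simplifying assumptions: `ToyVersionedTable` and its methods are hypothetical names, not the HBase client API, and everything lives in one in-memory dict rather than on a cluster.

```python
class ToyVersionedTable:
    """Row key -> column name -> timestamped versions, newest first."""

    def __init__(self, max_versions=3):
        self._max_versions = max_versions
        self._rows = {}

    def put(self, row_key, column, value, timestamp):
        cells = self._rows.setdefault(row_key, {}).setdefault(column, [])
        cells.append((timestamp, value))
        cells.sort(key=lambda cell: cell[0], reverse=True)
        # Drop versions beyond the retention limit, oldest first.
        del cells[self._max_versions:]

    def get(self, row_key, column, versions=1):
        """Return up to `versions` values for the cell, newest first."""
        cells = self._rows.get(row_key, {}).get(column, [])
        return [value for _, value in cells[:versions]]
```

For example, after three `put` calls on the same row and column with `max_versions=2`, a read returns the newest value by default and at most the two newest when more versions are requested.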

Big data ecosystem development platform modules

Hive

Hive is a Hadoop-based data warehouse tool that provides structured SQL query processing. It can map structured data files to database tables and offers simple SQL queries, converting SQL statements into MapReduce jobs that are submitted to the cluster for execution. Its advantage is a low learning cost: you can quickly produce simple MapReduce statistics with SQL-like statements, without developing dedicated MapReduce applications or programming in Java, which makes it well suited to statistical analysis in a data warehouse.
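The workflow of mapping a structured file to a table and then answering a statistics question with one SQL statement can be sketched in Python. Note the loud assumption: the example uses Python's built-in `sqlite3` as a stand-in so that it runs anywhere; it is not Hive and does not launch MapReduce jobs. The table name `page_views`, the sample data, and the helper names are all made up for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical sample of a structured data file.
CSV_DATA = """visitor,page
alice,home
bob,home
alice,cart
"""


def load_table(conn, csv_text):
    """Map the rows of a delimited file onto a table (sqlite3 stands in for Hive)."""
    conn.execute("CREATE TABLE page_views (visitor TEXT, page TEXT)")
    rows = csv.DictReader(io.StringIO(csv_text))
    conn.executemany(
        "INSERT INTO page_views VALUES (?, ?)",
        [(row["visitor"], row["page"]) for row in rows],
    )


def views_per_page(conn):
    """One GROUP BY replaces a hand-written map (emit page, 1) and reduce (sum)."""
    cursor = conn.execute(
        "SELECT page, COUNT(*) FROM page_views GROUP BY page ORDER BY page"
    )
    return cursor.fetchall()
```

The `GROUP BY` query expresses the same computation as a word-count style MapReduce job, which is why SQL-like statements are enough for simple statistics.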

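When learning Hive, the DDL and DML of HiveQL are fundamentals you must master; table definitions, data export, and the common query statements are the basis for big data statistical analysis. Learn to program against Hive: use the Java API to operate Hive and develop Hive UDFs. Mastering some of Hive's advanced features can greatly improve its execution efficiency. During optimization, the execution plan is a good aid for analysis. Keep in mind that Hive performance tuning is the most important part of production work, and resolving data skew is the key. Sorting out how the Hive metastore tables relate to one another will also deepen your command of Hive.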

Zookeeper coordinates the modules of the Hadoop ecosystem

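Going by the English names, Hadoop is the little elephant, Hive is the bee, Pig is the pig, and Zookeeper is the animal keeper. So Zookeeper's role is clear: it is a coordination service for distributed applications that provides consistency services to the other modules.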

Data import and export framework - Sqoop

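Sqoop is an open-source tool whose name is short for "SQL-to-Hadoop". It is mainly used to transfer data between Hadoop (Hive) and traditional databases (MySQL, PostgreSQL, ...): it can import data from a relational database into Hadoop's HDFS, and it can export data from HDFS into a relational database.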

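Learning objectives:

1. Understand what Sqoop is, what it can do, and its architecture;

2. Be able to deploy a Sqoop environment;

3. Master the use of Sqoop in production;

4. Be able to use Sqoop for ETL operations.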
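To make the import direction concrete, here is a rough Python sketch of what moving a relational table into delimited text files involves. It is a conceptual illustration under stated assumptions, not Sqoop itself: `sqlite3` stands in for the relational database, the function `import_table` is hypothetical, the output is returned as a dict instead of being written to HDFS, and rows are dealt out round-robin, whereas Sqoop divides the work by ranges of a split column. The `part-m-NNNNN` names follow the naming Hadoop uses for map-only job output.

```python
import sqlite3


def import_table(conn, table, num_splits=2):
    """Return {part_file_name: text} with one comma-delimited line per row.

    `table` is interpolated into the SQL, so it must come from trusted code.
    """
    rows = conn.execute(f"SELECT * FROM {table} ORDER BY 1").fetchall()
    parts = {}
    for split in range(num_splits):
        # Each split plays the role of one parallel import task.
        chunk = rows[split::num_splits]
        text = "".join(",".join(str(value) for value in row) + "\n" for row in chunk)
        parts[f"part-m-{split:05d}"] = text
    return parts
```

An export is the reverse: read the delimited lines back, split them on the delimiter, and insert them into the target table.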

Scala programming

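Scala is a functional, object-oriented language, similar to Ruby and Groovy. It seamlessly combines features of both styles into a multi-paradigm language, its high-level concurrency model is well suited to big data development, and it runs on the Java Virtual Machine.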

Spark

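Spark is one of the most popular big data processing frameworks today, known for being simple, easy to use, and fast. Its rich APIs and libraries have also made Spark an essential industry tool for fast data processing and distributed machine learning.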

*Extended skills:

Python development basics, data analysis, and data mining

Learn the Sklearn (scikit-learn) data mining toolkit, become familiar with the naive Bayes and SVM classification algorithms used in data mining, and finally implement both algorithms with Sklearn.

Storm: distributed real-time computation for big data

Storm is a distributed framework for real-time data processing. With Storm, complex real-time computations can be written and scaled easily across a cluster of computers. Storm is to real-time processing what Hadoop is to batch processing: just as MapReduce reduces the complexity of parallel batch processing, Storm reduces the complexity of real-time processing.
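The difference from batch processing is that results are updated as each item arrives instead of after the whole data set has been read. A minimal Python sketch of that idea follows, borrowing Storm's terms "spout" (a source of tuples) and "bolt" (a processing step). The function names are made up for this example, and chained generators in one process only imitate the flow of data; they are not the Storm API and involve no cluster.

```python
from collections import Counter


def sentence_spout(sentences):
    """Spout: emits input items one at a time, as a live source would."""
    for sentence in sentences:
        yield sentence


def split_bolt(stream):
    """Bolt: splits each incoming sentence into lowercase words."""
    for sentence in stream:
        for word in sentence.split():
            yield word.lower()


def count_bolt(stream):
    """Bolt: emits an updated running count as soon as each word arrives."""
    counts = Counter()
    for word in stream:
        counts[word] += 1
        yield word, counts[word]
```

Chaining them as `count_bolt(split_bolt(sentence_spout(source)))` yields an updated count after every word, without waiting for the input to end.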

Origin blog.51cto.com/14296550/2403089