Quickly master Alibaba Cloud E-MapReduce


Alibaba Cloud Elastic MapReduce (E-MapReduce) is a system solution for big data processing. It is built on Alibaba Cloud ECS and based on open source Apache Hadoop and Apache Spark, so users can easily use other systems in the Hadoop and Spark ecosystems (such as Apache Hive, Apache Pig, and HBase) to analyze and process their own data. Through E-MapReduce, users can also easily import data from and export data to other Alibaba Cloud storage and database systems, such as Alibaba Cloud OSS and Alibaba Cloud RDS.

Purpose of E-MapReduce

When users want to use distributed processing systems such as Hadoop and Spark, they usually need to go through the following steps:

1. Evaluate business characteristics

2. Select the machine type

3. Purchase the machines

4. Prepare the hardware environment

5. Install the operating system

6. Deploy applications such as Hadoop and Spark

7. Start the cluster

8. Write the application

9. Run the job

10. Obtain the data, and so on

Of these steps, only those from step 8 onward relate to the user's application logic. Steps 1-7 are preparatory work, which is usually tedious and cumbersome. E-MapReduce provides an integrated set of cluster management tools covering host selection, environment deployment, cluster construction, cluster configuration, cluster operation, job configuration, job execution, cluster management, and performance monitoring.

By using E-MapReduce, users are freed from the tedious procurement, preparation, operation, and maintenance involved in building a cluster, and only need to care about the processing logic of their own applications. In addition, E-MapReduce lets users combine services flexibly, so they can choose different cluster services according to their own business characteristics. For example, a user who only needs daily statistics and simple batch operations on data can choose to run only the Hadoop service in E-MapReduce; a user who also needs streaming and real-time computing can add the Spark service on top of the Hadoop service.
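The choice described above can be sketched as a small decision function. This is only an illustration of the reasoning, not an E-MapReduce API: the function name `choose_services` and its `needs_streaming` parameter are hypothetical, and the service names are plain strings.

```python
def choose_services(needs_streaming: bool = False) -> list[str]:
    """Return the cluster services to enable (hypothetical helper)."""
    # Daily statistics and simple batch jobs need only the Hadoop service.
    services = ["Hadoop"]
    if needs_streaming:
        # Streaming / real-time computing adds Spark on top of Hadoop.
        services.append("Spark")
    return services


print(choose_services())                      # ['Hadoop']
print(choose_services(needs_streaming=True))  # ['Hadoop', 'Spark']
```

In practice the services are selected when the cluster is created; the sketch only makes the selection rule explicit.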

Composition of E-MapReduce

The core of E-MapReduce, and the component users interact with directly, is the cluster. An E-MapReduce cluster is a Hadoop and Spark cluster composed of one or more Alibaba Cloud ECS instances. Taking Hadoop as an example, each ECS instance usually runs some daemon processes (such as namenode, datanode, resourcemanager, and nodemanager), and together these daemon processes form a Hadoop cluster. The node running the namenode and resourcemanager is called the master node, and the nodes running the datanode and nodemanager are called slave nodes.
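The mapping from daemons to node roles can be illustrated with a minimal sketch. The function `node_role` and the example daemon lists are hypothetical and only restate the rule in the paragraph above; they do not query a real cluster. Treating a node that runs both kinds of daemons as a master is an assumption made here for the single-instance case.

```python
# Daemons that identify each role, as described in the text.
MASTER_DAEMONS = {"namenode", "resourcemanager"}
SLAVE_DAEMONS = {"datanode", "nodemanager"}


def node_role(daemons):
    """Classify a node by the Hadoop daemons running on it (hypothetical helper)."""
    running = {name.lower() for name in daemons}
    # Assumption: a node that runs any master daemon counts as the master,
    # even if it also runs slave daemons (e.g. a one-instance cluster).
    if running & MASTER_DAEMONS:
        return "master"
    if running & SLAVE_DAEMONS:
        return "slave"
    return "unknown"


print(node_role(["NameNode", "ResourceManager"]))  # master
print(node_role(["DataNode", "NodeManager"]))      # slave
```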

Teaching Course: Alibaba Cloud E-MapReduce Learning

(The course mainly introduces how to use Alibaba Cloud E-MapReduce)

Syllabus (with teaching hours)

Lesson 1: Basic Introduction to E-MapReduce (13:52)

Lesson 2: Basic Introduction to E-MapReduce (PPT)

Lesson 3: E-MapReduce Data Synchronization (13:12)

Lesson 4: E-MapReduce Data Synchronization (PPT)

Lesson 5: E-MapReduce Offline Processing (15:47)

Lesson 6: E-MapReduce Offline Processing (PPT)

Lesson 7: E-MapReduce Stream Processing (15:38)

Lesson 8: E-MapReduce Stream Processing (PPT)

Course objectives

Master the usage of E-MapReduce

Target audience

Big data engineers
