Alibaba Cloud Elastic MapReduce (E-MapReduce) is a big data processing solution. It is built on Alibaba Cloud ECS and based on open source Apache Hadoop and Apache Spark, so users can easily use other systems in the Hadoop and Spark ecosystems (such as Apache Hive, Apache Pig, and HBase) to analyze and process their own data. Through E-MapReduce, users can also easily import data from and export data to other Alibaba Cloud storage and database systems, such as Alibaba Cloud OSS and Alibaba Cloud RDS.
Purpose of E-MapReduce
When users want to use distributed processing systems such as Hadoop and Spark, they usually need to go through the following steps:
1. Evaluate business characteristics
2. Select the machine type
3. Purchase machines
4. Prepare the hardware environment
5. Install the operating system
6. Deploy applications such as Hadoop and Spark
7. Start the cluster
8. Write the application
9. Run the job
10. Obtain the data, and other follow-up steps
Of these steps, only those from step 8 onward relate to the user's application logic. Steps 1-7 are preparatory work, which is usually tedious. E-MapReduce provides an integrated set of cluster management tools covering host selection, environment deployment, cluster construction, cluster configuration, cluster operation, job configuration, job operation, cluster management, and performance monitoring.
By using E-MapReduce, users are freed from the tedious procurement, preparation, operation, and maintenance involved in building a cluster, and can focus on the processing logic of their own applications. E-MapReduce also lets users combine services flexibly, choosing different cluster services according to their business characteristics. For example, if a user only needs daily statistics and simple batch operations on data, they can choose to run only the Hadoop service in E-MapReduce; if they also need streaming and real-time computing, they can add the Spark service on top of the Hadoop service.
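The service choice described above can be sketched as a small decision function. This is only an illustration of the reasoning: the function name `choose_services` and its parameters are hypothetical and are not part of any E-MapReduce API.

```python
def choose_services(needs_batch, needs_streaming):
    """Return the cluster services suggested by the workload.

    Hypothetical helper: it mirrors the example in the text, where
    batch-only workloads use Hadoop and streaming or real-time
    workloads add Spark on top of Hadoop.
    """
    services = []
    if needs_batch or needs_streaming:
        # Hadoop is the base service in both cases.
        services.append("Hadoop")
    if needs_streaming:
        # Spark is added on top of Hadoop for streaming and real-time computing.
        services.append("Spark")
    return services


batch_only = choose_services(needs_batch=True, needs_streaming=False)
batch_and_streaming = choose_services(needs_batch=True, needs_streaming=True)
```

Here `batch_only` contains just the Hadoop service, while `batch_and_streaming` contains both Hadoop and Spark.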
Composition of E-MapReduce
The core of E-MapReduce, and the component that users interact with directly, is the cluster. An E-MapReduce cluster is a Hadoop and Spark cluster composed of one or more Alibaba Cloud ECS instances. Taking Hadoop as an example, some daemon processes (such as namenode, datanode, resourcemanager, and nodemanager) usually run on each ECS instance, and together these daemons form a Hadoop cluster. A node running the namenode and resourcemanager is called a master node, and a node running the datanode and nodemanager is called a slave node.
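As a minimal sketch of the node roles described above, the following models a cluster as a mapping from instance name to the daemons running on it. The instance names (`ecs-1` and so on) and the `node_role` helper are made up for illustration; only the daemon names and the master/slave distinction come from the text.

```python
# Daemon-to-role mapping as described in the text.
MASTER_DAEMONS = {"namenode", "resourcemanager"}
SLAVE_DAEMONS = {"datanode", "nodemanager"}


def node_role(daemons):
    """Classify an ECS instance by the Hadoop daemons running on it."""
    running = {d.lower() for d in daemons}
    if running & MASTER_DAEMONS:
        return "master"
    if running & SLAVE_DAEMONS:
        return "slave"
    return "unknown"


# Hypothetical three-instance cluster: one master node and two slave nodes.
cluster = {
    "ecs-1": ["namenode", "resourcemanager"],
    "ecs-2": ["datanode", "nodemanager"],
    "ecs-3": ["datanode", "nodemanager"],
}

roles = {name: node_role(daemons) for name, daemons in cluster.items()}
```

In this sketch `ecs-1` is classified as the master node and the other two instances as slave nodes.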
Teaching Course: Alibaba Cloud E-MapReduce Learning
(The course mainly introduces how to use Alibaba Cloud E-MapReduce)
Syllabus and teaching hours
Lesson 1: Basic Introduction to E-MapReduce 13:52
Lesson 2: Basic Introduction to E-MapReduce (PPT)
Lesson 3: E-MapReduce Data Synchronization 13:12
Lesson 4: E-MapReduce Data Synchronization (PPT)
Lesson 5: E-MapReduce Offline Processing 15:47
Lesson 6: E-MapReduce Offline Processing (PPT)
Lesson 7: E-MapReduce Stream Processing 15:38
Lesson 8: E-MapReduce Stream Processing (PPT)
Course objectives
Master the usage of E-MapReduce
Target audience
Big data engineers