Exploring Alibaba Cloud E-MapReduce: Quickly Building a Scalable, High-Performance Big Data Platform

This article was shared by Xia (nickname Lei Biao), a senior product specialist for EMR on the Alibaba computing platform.
Xia started working with big data in 2014, then worked on big data development inside Alibaba, and is currently responsible for EMR, Alibaba Cloud's open-source big data platform product, and for building the open-source ecosystem on the cloud.

Product Description

The overall architecture of Alibaba Cloud EMR is as follows:

Operations and management capabilities

  • Cluster management, job management, and scheduling
  • Web console operations, SDK, and API

Fully compatible with the open-source stack, with enhancements on top

  • Performance optimizations for Hadoop and Spark
  • Enhanced monitoring capabilities that can be integrated with other systems

Evolving with the open-source community and ecosystem

  • Components follow the open-source community for ongoing version upgrades
  • The open-source platform is connected with Alibaba Cloud to make full use of the cloud ecosystem
  • Integration with cloud products (OSS, SLS, MaxCompute, etc.)
  • Integration with cloud capabilities such as elasticity (strict spreading of local-disk instances; auto scaling with support for preemptible (spot) instances)

Global deployment (15 regions worldwide)

  • Quickly replicate diverse, enterprise-grade open-source big data solutions

A complete, enterprise-grade integrated platform

  • Packaged computing platform capabilities
  • Out-of-the-box experience

Commonly used combinations:

[Figure: commonly used component combinations]

The components used in the big data platform include:

General Hadoop

  • Open-source big data scenarios: offline processing, real-time processing, and ad hoc queries
  • Built on the open-source Hadoop ecosystem, it uses YARN for cluster resource management. It provides Hive and Spark for offline, large-scale distributed data storage and computing; Spark Streaming, Flink, and Storm for stream computing; Presto and Impala for interactive queries; and other Hadoop ecosystem components such as Oozie and Pig. It supports OSS storage, Kerberos authentication, and data encryption.

Kafka

  • An open, high-throughput, scalable messaging system
  • E-MapReduce Kafka provides a complete service monitoring system and metadata management. It is widely used for log collection and for aggregating monitoring data, and it supports offline or streaming data processing and real-time data analysis.

DataScience

  • Big data + AI scenarios
  • For big data + AI scenarios, it provides Hive and Spark for offline big data ETL and TensorFlow for model training. Users can choose a heterogeneous CPU + GPU computing framework, using NVIDIA GPUs to run some deep learning algorithms with high performance.

Druid

  • Real-time interactive analytics scenarios
  • Druid provides big data queries with millisecond-level latency and supports multiple data ingestion methods. It can be combined with services such as E-MapReduce Hadoop, E-MapReduce Spark, Alibaba Cloud OSS, and Alibaba Cloud RDS to build flexible, robust real-time query solutions.

ZooKeeper

  • Distributed locks
  • Provides a standalone distributed lock and consistency service for large-scale Hadoop, HBase, and Kafka clusters.
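
The distributed lock use case is commonly built on ZooKeeper's lock recipe: each client creates an ephemeral sequential node under a lock path, and the client holding the lowest sequence number owns the lock. The sketch below only simulates that ordering rule in memory to show the idea. It does not talk to a ZooKeeper server or use any client library, and the class `FakeLockPath`, its methods, and the client names are hypothetical.

```python
class FakeLockPath:
    """In-memory stand-in for a ZooKeeper lock path (assumption: not a real client)."""

    def __init__(self):
        self._next_seq = 0
        self._children = {}  # sequence number -> client id

    def enqueue(self, client_id):
        """Mimic creating an ephemeral sequential node; return its sequence number."""
        seq = self._next_seq
        self._next_seq += 1
        self._children[seq] = client_id
        return seq

    def holder(self):
        """The client that owns the lowest sequence number holds the lock."""
        if not self._children:
            return None
        return self._children[min(self._children)]

    def release(self, seq):
        """Mimic node deletion on unlock or session expiry."""
        self._children.pop(seq, None)


if __name__ == "__main__":
    lock = FakeLockPath()
    first = lock.enqueue("client-a")
    second = lock.enqueue("client-b")
    print(lock.holder())   # client-a queued first, so it holds the lock
    lock.release(first)
    print(lock.holder())   # the lock passes to client-b
```

In a real deployment the waiting client would watch the node just ahead of it rather than poll, which is what keeps the recipe cheap on large clusters.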

Product Features

Visual Cluster Management Console

[Figures: cluster management console]

Built-in scheduling system

  • Project-level permission management
  • DAG support
  • Better integration with elastic resources
  • Convenient management of many kinds of jobs
  • Comprehensive alerting and monitoring
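
To make the DAG support concrete, here is a minimal sketch of how a scheduler can derive an execution order from job dependencies. It is not EMR's scheduler: the job names and the `run_order` helper are hypothetical, and it relies only on `graphlib` from the Python standard library (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each job maps to the set of jobs it depends on.
WORKFLOW = {
    "hive_etl": set(),
    "spark_features": {"hive_etl"},
    "tf_training": {"spark_features"},
    "presto_report": {"hive_etl"},
    "notify": {"tf_training", "presto_report"},
}


def run_order(workflow):
    """Return batches of jobs; jobs within one batch have all dependencies met
    and could run in parallel."""
    sorter = TopologicalSorter(workflow)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


if __name__ == "__main__":
    for stage, batch in enumerate(run_order(WORKFLOW), start=1):
        print(f"stage {stage}: {', '.join(batch)}")
```

Grouping ready jobs into batches, rather than producing one flat order, shows where independent jobs such as the feature job and the report job could share cluster resources at the same time.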

Machine Learning Support

Deep learning and AI have become buzzwords. The EMR learning cluster combines deep learning with open-source big data technologies to provide an integrated big data + deep learning service. With a single cluster, you can build an enterprise data lake and run machine learning and deep learning at the same time:

  • Supports ECS GPU instance types, with GPU cluster resources scheduled through Hadoop YARN for Spark ML
  • Supports TensorFlow, Horovod, and other computing frameworks
  • Uses data communication modes such as PS (parameter server) and MPI
  • Supports Docker and Standalone run modes
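
The PS and MPI communication modes differ mainly in where gradients are aggregated. The sketch below illustrates that difference in plain Python. It is not TensorFlow or Horovod code, and the function names and the toy numbers are hypothetical. In PS mode a central server averages the gradients pushed by workers and updates one shared copy of the weights. In MPI-style all-reduce, every worker receives the same averaged gradient and updates its own replica.

```python
def average_gradients(worker_grads):
    """Element-wise mean of the gradient vectors reported by all workers."""
    n = len(worker_grads)
    dim = len(worker_grads[0])
    return [sum(g[i] for g in worker_grads) / n for i in range(dim)]


def parameter_server_step(weights, worker_grads, lr=0.1):
    """PS mode: the server averages pushed gradients and updates the shared weights."""
    avg = average_gradients(worker_grads)
    return [w - lr * a for w, a in zip(weights, avg)]


def allreduce_step(worker_weights, worker_grads, lr=0.1):
    """All-reduce mode: each worker applies the same averaged gradient to its replica."""
    avg = average_gradients(worker_grads)  # stands in for the all-reduce result
    return [[w - lr * a for w, a in zip(weights, avg)] for weights in worker_weights]


if __name__ == "__main__":
    grads = [[1.0, 2.0], [3.0, 4.0]]          # one gradient vector per worker
    print(parameter_server_step([1.0, 2.0], grads, lr=0.5))
    print(allreduce_step([[1.0, 2.0], [1.0, 2.0]], grads, lr=0.5))
```

Both modes produce the same update in this toy setting; in practice they differ in network traffic patterns, since PS concentrates traffic on the servers while all-reduce spreads it across workers.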

Origin www.cnblogs.com/importbigdata/p/11816910.html