Getting Started with MaxCompute

MaxCompute is a big data computing service from Alibaba Cloud that offers efficient computing and storage capacity. As I understand it, it is a data warehouse in the cloud, similar in role to HBase or Hive.
This post draws on the official documentation series:

https://yq.aliyun.com/articles/85595?spm=a2c4e.11153940.blogcont78108.17.46c53af60mplZf

What is MaxCompute
The big data computing service MaxCompute (formerly ODPS, the Open Data Processing Service) is a fast, fully managed data warehousing solution for data at the GB, TB, and PB scale. MaxCompute provides a complete set of data import options and several classic distributed computing models, so you can solve massive-data computing problems more quickly, reduce business costs, and keep your data secure.

DataWorks is closely tied to MaxCompute: it gives MaxCompute one-stop data synchronization, task development, data workflow development, data management, and data operations and maintenance. For details, see DataWorks (formerly the Big Data Development Kit).

MaxCompute mainly serves batch storage and computation of structured data. It can provide massive-scale data warehouse solutions as well as analysis and modeling services for big data. As society develops and data collection methods multiply, more and more industry data accumulates. Data volumes have grown to a scale (hundreds of GB, TB, or even PB) that traditional software can no longer handle.

When analyzing massive data, the processing capacity of a single server is limited, so analysis usually relies on a distributed computing model. Distributed computing models, however, demand more of data analysts and are hard to maintain: analysts need to understand not only the business requirements but also the underlying computing model. MaxCompute aims to give you a convenient way to analyze and process massive data, so that you can analyze big data without caring about the details of distributed computing.

MaxCompute learning path
The MaxCompute learning path lets you quickly pick up MaxCompute concepts, basic operations, and advanced operations.

Advantages
Large-scale computing and storage
MaxCompute suits storage and computing needs from 100 GB upward, scaling up to the EB level.

Multiple computing models
MaxCompute supports SQL, MapReduce, Graph, and other computing types, as well as MPI-style iterative algorithms.

Strong data security
MaxCompute has reliably supported all of Alibaba's offline analysis workloads for more than seven years, and provides multi-layer sandbox protection and monitoring.

Low cost
Compared with a self-built private cloud, MaxCompute computes and stores data more efficiently and can cut procurement costs by 20% to 30%.

Feature Overview
Data channels
Batch and historical data channel

Tunnel is the data transfer service that MaxCompute provides, offering highly concurrent offline data upload and download. It supports importing and exporting TB- to PB-level data per day, and is particularly suited to bulk imports of full data sets or historical data. Tunnel provides a Java programming interface, and the MaxCompute client tool includes corresponding commands for moving data between local files and the service.

Real-time and incremental data channel

For real-time data upload scenarios, MaxCompute provides the low-latency, easy-to-use DataHub service, which is particularly suited to importing incremental data. DataHub also supports a variety of data transfer plug-ins, such as Logstash, Flume, Fluentd, and Sqoop. In addition, it supports delivering logs from Log Service to MaxCompute, where they can then be analyzed and mined with DataWorks.

Computing and analysis tasks
MaxCompute supports multiple computing models, described below.
SQL: MaxCompute stores data only in the form of tables and exposes SQL query functionality. You can work with MaxCompute as you would with traditional database software, except that it can process data at the TB and PB scale.
Notes
MaxCompute SQL does not support transactions, indexes, or Update/Delete operations.
MaxCompute SQL syntax differs in places from Oracle and MySQL, so SQL statements written for other databases cannot be migrated to MaxCompute seamlessly. For details, see the differences from other SQL syntax.
In practice, MaxCompute SQL completes queries in minutes at best, or in seconds in some cases. It cannot return results in milliseconds.
The advantage of MaxCompute SQL is its low learning cost: you do not need to understand complex distributed computing concepts. If you have database experience, you can pick up MaxCompute SQL quickly.
UDF: user-defined function.
MaxCompute provides many built-in functions to meet your computing needs, and you can also create custom functions for needs the built-ins do not cover.

MapReduce: MaxCompute MapReduce is the Java MapReduce programming model that MaxCompute provides. It simplifies the development process and makes it more efficient. To use MaxCompute MapReduce, you need a basic understanding of distributed computing concepts and corresponding programming experience. MaxCompute MapReduce provides a Java programming interface.
Graph: The Graph feature of MaxCompute is a processing framework for iterative graph computation. A graph computing job models its problem as a graph made up of vertices (Vertex) and edges (Edge), both of which carry values (Value). The graph is edited and evolved over successive iterations until the final result is obtained. Typical applications include PageRank, single-source shortest path, and k-means clustering.
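
To make the MapReduce and Graph models above more concrete, here is a minimal single-process Python sketch of the ideas behind them. It is not MaxCompute code and does not use the MaxCompute SDK (whose MapReduce interface is Java). The function names `word_count` and `pagerank`, the damping factor of 0.85, and the number of supersteps are all my own illustrative assumptions.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group emitted values by key, as a framework would do."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


def pagerank(edges, damping=0.85, supersteps=30):
    """Vertex-centric PageRank (illustrative, not the MaxCompute Graph API).

    edges maps each vertex to the list of vertices it links to. In every
    superstep each vertex sends rank / out_degree along its outgoing edges,
    then recomputes its own rank from the messages it received.
    """
    vertices = set(edges)
    for targets in edges.values():
        vertices.update(targets)
    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}

    for _ in range(supersteps):
        inbox = defaultdict(float)
        dangling = 0.0  # rank held by vertices with no outgoing edges
        for v in vertices:
            targets = edges.get(v, [])
            if targets:
                share = rank[v] / len(targets)
                for t in targets:
                    inbox[t] += share
            else:
                dangling += rank[v]
        # Dangling rank is spread evenly so the ranks keep summing to 1.
        rank = {
            v: (1 - damping) / n + damping * (inbox[v] + dangling / n)
            for v in vertices
        }
    return rank
```

For example, `word_count(["big data", "Big compute"])` returns `{"big": 2, "data": 1, "compute": 1}`. In a real distributed job the map and reduce calls run on many machines and the framework performs the shuffle, which is exactly the detail the SQL interface hides from you.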
SDK
The SDK is a toolkit that MaxCompute provides for developers. For details, see the SDK introduction.

Security
MaxCompute provides powerful security services to protect your data. For details, see the security guide.

Development history
Alibaba Cloud was founded in September 2009 with the vision of building the first platform for sharing computing and data. In April 2010, ODPS officially went into production alongside the launch of Alibaba Finance's loan business. A unified data platform was established in 2012, and in 2013 the platform gained the capacity to process massive data at very large scale. The big data platform matured over 2014 and 2015, and MaxCompute 2.0 was born in 2016, as the founding vision began to be realized.

Key milestones
2010.04 ODPS officially goes into production; Alibaba Finance's loan business launches and runs stably.
2013.05 ODPS enters beta.
2013.07 ODPS officially offers commercial service, with a single-cluster size of 5K and multi-cluster capability.
2016.09 ODPS is officially renamed MaxCompute, and MaxCompute 2.0 launches with high performance, new features, and a rich ecosystem.

Origin blog.csdn.net/programmerDingl/article/details/104292510