Big Data Distributed Computing

Distributed computing is a method of computation that contrasts with centralized computing. As computing technology has developed, some applications have come to require enormous computing power, and running them on a single centralized machine would take a very long time. Distributed computing breaks such an application into many smaller parts and assigns them to multiple computers for processing. This shortens the overall computation time and greatly improves computational efficiency.

More details about distributed computing: Distributed Computing Starter

(This course explains the technologies behind distributed computing for big data, focusing on stream computing and in-memory computing. It describes how Alibaba Cloud uses these technologies and explains in detail how Alibaba has optimized them. It is intended to help big data developers and enthusiasts learn distributed computing.)

Definition of distributed computing

Distributed computing is a branch of computer science whose main object of study is the distributed system.

A distributed system [1] is a system of hardware and software made up of multiple computers interconnected by a network, which cooperate with one another to accomplish a common goal (often called a "project").
Distributed computing means computation carried out on a distributed system. A large computing task is divided into many small parts that are handed to other computers for processing, and the results are then merged into a solution to the original problem.
Note: this differs from parallel computing, in which a single computer uses multiple processors to execute a computation in parallel. Distributed computing emphasizes the distribution of tasks, while parallel computing emphasizes their concurrent execution.
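The divide, process, and merge pattern described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the function names are hypothetical, and local worker threads stand in for the networked computers of a real distributed system.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, parts):
    """Divide data into at most `parts` chunks of roughly equal size."""
    size = max(1, -(-len(data) // parts))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]


def partial_sum_of_squares(chunk):
    """The work done by one worker on its own part of the task."""
    return sum(x * x for x in chunk)


def distributed_sum_of_squares(data, workers=4):
    """Split the task, process the parts separately, then merge the results."""
    chunks = split(data, workers)
    # Assumption: threads stand in for separate machines on a network.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum_of_squares, chunks))
    return sum(partials)  # merge step


if __name__ == "__main__":
    print(distributed_sum_of_squares(list(range(10))))
```

In a real deployment the chunks would travel over the network and the merge step would also have to cope with slow or failed workers, which this sketch does not attempt.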

Advantages and disadvantages

  • Pros: very large scale, virtualization, high reliability, versatility, high scalability, on-demand service, extremely cheap, fault tolerance

  • Cons: multiple points of failure (the failure of one or more computers, or of one or more network links, can cause problems for the whole distributed system); security (a distributed system gives unauthorized users more opportunities)

Significance and landscape

  • Distributed computing and humanity

Modern science spans a wide range of disciplines, each finely subdivided, and today nearly every one of them requires a great deal of computation. Astronomical research organizations need computers to analyze pulses from space and the movement of stars; biologists need computers to simulate protein folding; pharmaceutical scientists want to develop drugs against HIV (AIDS) or atypical pneumonia (SARS); mathematicians want to find the largest prime number and compute a more accurate value of pi; economists want to use computers to analyze the tens of thousands of factors that shape the development of an enterprise, city, or country in order to guide macro-level control. The future of science is therefore inseparable from computation, and distributed computing is drawing more and more attention for its distinctive advantages: it is cheap and efficient.

  • Distributed computing landscape

At present there are roughly one hundred distributed computing projects around the world. Most have no contact with one another, are managed independently, and use their own separate software. This fragmented landscape is poorly suited to the needs of development. For example, suppose a biological research institution wants to use volunteers' computers around the world to simulate protein folding. The institution has no distributed computing specialists of its own, and neither the community nor any company offers such a service, so it has to spend a great deal of effort developing its own distributed computing server and client. As a result, time that could have been spent on biological research goes elsewhere. The institution in question is the Pande team at Stanford University.

  • BOINC unifies the landscape

To change this fragmented and chaotic situation, the University of California, Berkeley (UC Berkeley) first proposed building BOINC. BOINC stands for Berkeley Open Infrastructure for Network Computing. It brings many different distributed computing projects together under unified management and allocates computer resources in a unified way (for example, if you are interested in both AIDS drug research and the search for extraterrestrial civilizations, you can choose to run both projects and set priorities between them). The credit system is also managed centrally: whichever project you work on, the more CPU time you contribute, the higher your score. This unified management is a real convenience for research groups such as the Pande team.
BOINC is now mature, and several projects run successfully on the platform, such as SETI@home and LHC@home.
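The two ideas in this passage, dividing one computer's time among several projects by priority and scoring volunteers by contributed CPU time, can be sketched as follows. This is a hypothetical illustration only: the function names, the proportional split, and the one-point-per-hour rate are assumptions made for the example, not BOINC's actual API or credit formula.

```python
def allocate_cpu_hours(total_hours, shares):
    """Split available CPU hours among projects in proportion to their priority.

    `shares` maps a project name to a user-chosen priority weight
    (a hypothetical setting for this sketch).
    """
    total_share = sum(shares.values())
    if total_share <= 0:
        raise ValueError("at least one project needs a positive share")
    return {name: total_hours * share / total_share
            for name, share in shares.items()}


def credit(cpu_hours_by_project, points_per_hour=1.0):
    """Score a volunteer by total CPU time, regardless of project.

    Assumption: credit grows linearly with contributed CPU hours.
    """
    return sum(cpu_hours_by_project.values()) * points_per_hour


if __name__ == "__main__":
    # Hypothetical project names and priorities.
    hours = allocate_cpu_hours(10, {"aids_drug_research": 3, "seti": 1})
    print(hours)
    print(credit(hours))
```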

Security

On the user side, before joining any project you must be sure you can trust the project's developers. Two aspects matter most:

  • Private data on your computer

The computation program you download from the project runs on your machine and can access the network, so only a trustworthy project can guarantee that the private data on your computer will not be maliciously taken or modified.

  • Computer lifespan

Although distributed computing programs generally run at the lowest priority and will not affect your daily use, running computations at full load still puts some stress on the computer's components. To learn more, see "The effects of distributed computing on computer hardware and software".

On the project side, the volunteers of a distributed computing project are, after all, not the project's own staff and cannot all be trusted, so a degree of redundant computation must be introduced to guard against calculation errors, malicious cheating, and the like.
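One common way to realize such redundancy is to send the same work unit to several volunteers and accept a result only when enough of them agree. The sketch below assumes a simple quorum rule; the function name and the default quorum of 2 are hypothetical and are not taken from any particular project.

```python
from collections import Counter


def validate(results, quorum=2):
    """Return the result reported by at least `quorum` volunteers, else None.

    `results` holds the answers that different volunteers returned
    for the same work unit.
    """
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    # Accept the most frequent answer only if enough volunteers agree on it.
    return value if count >= quorum else None


if __name__ == "__main__":
    print(validate([42, 42, 7]))  # two volunteers agree on 42
    print(validate([1, 2, 3]))    # no agreement, so the unit must be reissued
```

When no result reaches the quorum, a project would typically reissue the work unit to additional volunteers rather than trust any single answer.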

Course outline

  • Overview of stream computing

  • Differences between stream computing and batch computing

  • Analysis of typical stream computing systems

  • Overview of Alibaba's core computing technologies

  • Implementing stateful computation

  • StreamSQL

  • Combining big data and databases

  • Analytic Database Service (ADS)

  • Unified computing framework


Origin blog.51cto.com/14377691/2409106