The challenges of technological development

Amid the rapid development of science and technology in the IT field, enterprises face two problems.

First, how to make a website highly available, easily scalable, extensible, and secure. Solving this series of problems forces a site's infrastructure to keep evolving, from a single-server architecture toward a high-availability architecture, and distributed systems are central to that process.

Second, as the user base grows, the resulting data grows exponentially, commonly called the data explosion. Scenarios that require processing massive data are also increasing. How should this be handled technically?

1. Distributed Systems

1.1. Overview

A distributed system is one whose hardware or software components are located on different networked computers and communicate and coordinate with one another only by passing messages. Put simply, it is a group of independent computers that together provide a service, yet to the system's users it appears to be a single computer providing that service.

Distributed means that a cluster of many ordinary computers (as opposed to expensive mainframes) can provide the service externally. The more computers there are, the more CPU, memory, storage, and other resources are available, and the more concurrent access the system can handle.

The first generation of website architectures was often relatively simple: the application, database, and files all resided on a single server.


Figure: Site architecture in the early days of the Internet


Figure: Site architecture commonly used on the Internet today

From the concept of a distributed system we know that the hosts communicate and coordinate mainly over the network, so the computers in a distributed system are subject to almost no spatial limits: they may sit in different cabinets, be deployed in different machine rooms, or be in different cities, and for large sites they may even be spread across different countries and regions.

1.2. Features

Distribution: the computers in a distributed system can be placed anywhere in space, and there is no master/slave distinction among them. No host controls the whole system, and no machine is a controlled slave.

Transparency: system resources are shared by all computers. Each computer can use not only its own resources but also those of the other computers in the distributed system (including CPU, files, printers, etc.).

Unity: several computers in the system can collaborate to accomplish a common task; in other words, a program can be distributed across several computers and run in parallel.

Communication: any two computers in the system can exchange information by communicating.

1.3. Common distributed solutions

Distributed applications and services

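Applications and services are layered and split, and the resulting application and service modules are deployed in a distributed manner. This not only improves concurrent access capacity and reduces database connections and resource consumption, it also lets different applications reuse common services, which makes the business easy to extend. Example: the distributed service framework Dubbo.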

Distributed static resources

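Deploying a site's static resources, such as JS, CSS, and images, in a distributed manner reduces the load on application servers and speeds up access. Example: CDN.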

Distributed data and storage

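Large sites often need to process massive amounts of data, and a single computer often cannot provide enough memory or storage space, so the data can be stored in a distributed manner. Example: Apache Hadoop HDFS.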
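The idea of spreading one large file over several machines can be sketched as follows. This is only an illustration, not how HDFS is implemented: the function names, the node names, and the simple rotating placement policy are assumptions made for the example (real HDFS placement is rack-aware), and the sizes are tiny so the result is easy to read.

```python
import math


def split_into_blocks(file_size, block_size):
    """Return the (offset, length) pairs that cover a file of file_size bytes."""
    count = math.ceil(file_size / block_size)
    return [
        (i * block_size, min(block_size, file_size - i * block_size))
        for i in range(count)
    ]


def place_blocks(blocks, nodes, replicas):
    """Assign each block to `replicas` distinct nodes, rotating through the node list.

    Hypothetical policy chosen for illustration only.
    """
    if replicas > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        index: [nodes[(index + r) % len(nodes)] for r in range(replicas)]
        for index in range(len(blocks))
    }


# A 300-byte "file" cut into 128-byte blocks and stored on four hypothetical nodes.
blocks = split_into_blocks(file_size=300, block_size=128)
placement = place_blocks(blocks, nodes=["node-a", "node-b", "node-c", "node-d"], replicas=3)
```

Because every block lives on several nodes, the file remains readable if one node fails, and its total size is no longer bounded by the capacity of any single machine.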

Distributed computing

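As computing technology has developed, some applications need enormous computing power, and with centralized computing they would take a very long time to finish. Distributed computing breaks such an application into many small parts and assigns them to multiple computers for processing. This shortens the overall computation time and greatly improves efficiency. Example: Apache Hadoop MapReduce.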
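The split-then-combine pattern described here can be sketched in a few lines. This is not the Hadoop MapReduce API; it is a minimal single-machine illustration in which worker threads stand in for the separate computers of a cluster, and the function names and sample input are made up for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_count(chunk):
    """Map step: count the words in one chunk of the input."""
    return Counter(chunk.split())


def merge(left, right):
    """Reduce step: combine two partial counts into one."""
    return left + right


def word_count(chunks, workers=4):
    # Each chunk is processed independently; threads stand in for cluster nodes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_count, chunks))
    return reduce(merge, partials, Counter())


chunks = ["big data big", "data big cluster", "cluster data"]
totals = word_count(chunks)
```

The map step never needs to see more than its own chunk, which is what allows the work to be handed to many machines; only the small partial results have to be brought together in the reduce step.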

1.4. Distributed and cluster

Distributed means deploying different service modules on multiple different servers, which cooperate through remote calls to provide a service externally.

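A cluster means deploying the same application or service module on multiple different servers, which together form a cluster and provide a service externally through a load-balancing device.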
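To make the cluster case concrete, here is a minimal sketch of one common load-balancing strategy, round-robin. The class name, server names, and request labels are hypothetical; real load balancers usually also track server health and weights, which this sketch leaves out.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand each incoming request to the next server in a fixed rotation."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = cycle(servers)

    def route(self, request):
        # Every server runs the same application, so any of them can serve the request.
        server = next(self._servers)
        return server, request


balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
assigned = [balancer.route(f"req-{i}")[0] for i in range(5)]
```

Five requests are spread over the three identical servers in turn, wrapping back to the first server after the third. In a distributed deployment, by contrast, each server would host a different module and a request would be sent to the one module able to handle it.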


2. Massive data processing

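According to public figures, the Internet search giant Baidu held close to an EB of data in 2013. Alibaba and Tencent have both stated that the total data they store exceeds 100 PB. In addition, the data held in sectors such as telecommunications, healthcare, finance, public security, transportation, and meteorology has also reached tens or hundreds of PB. The global data volume is doubling every two years; it formally entered the ZB era in 2010, and the global total is projected to reach 44 ZB by 2020.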
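The growth claim can be checked with a little arithmetic. The sketch below assumes, purely for illustration, a starting volume of exactly 1 ZB in 2010 (the text only says the ZB era began that year) and a strict doubling every two years; the function name is likewise made up.

```python
def projected_volume(start_volume, start_year, end_year, doubling_period=2):
    """Volume after doubling once every `doubling_period` years."""
    doublings = (end_year - start_year) / doubling_period
    return start_volume * 2 ** doublings


# Assumed starting point: 1 ZB in 2010 (an illustrative figure, not from the source).
estimate_zb = projected_volume(start_volume=1.0, start_year=2010, end_year=2020)
```

Ten years means five doublings, so the estimate is 32 ZB, the same order of magnitude as the 44 ZB figure quoted for 2020.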


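Data analysis presupposes data, and the purpose of data storage is to support data analysis. How to store such a huge volume of data is a problem that enterprises doing data analysis face today. Traditional storage models are limited in capacity or space, so designing a storage scheme that can hold large amounts of data is the first prerequisite for data analysis.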

Once the massive data storage problem is solved, the next problem, computing over massive data, is an even bigger headache, because enterprises want not only to be able to compute but also to compute quickly and efficiently.

At the data volumes the Internet industry now generates, processing this data requires better and more convenient ways to analyze and compute. Traditional approaches are clearly inadequate and would be very inefficient. This is the other challenge facing traditional data analysis: how to analyze and compute.


Origin blog.51cto.com/14473726/2432855