It's not just "big"! An analysis of the technical details behind big data

We all know that when people talk about big data, its most striking feature is the "big" itself. That single word is what forced businesses confronting big data to adopt distributed computing models, along with a range of techniques for simplifying the computation.

When dealing with information at this scale, many big data applications must also plan for resilience, which means copying the data to a number of different locations. The volume of information therefore keeps growing; with HDFS's default replication factor of 3, for example, every dataset occupies three times its raw size.

The most important attribute of big data is not its size, but its divisibility: the ability to split a big job into many small jobs, and to process a task in parallel across resources in multiple locations. So when we face big data applications and distributed application architectures, what issues deserve attention? That is the question this article examines.
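
To make that divide-and-conquer idea concrete, here is a minimal Python sketch (my illustration, not from the article) of a big job split into small tasks that run in parallel and are then merged. The word-count workload and worker count are stand-ins for a real distributed job.

```python
# A minimal sketch of the divide-and-conquer pattern described above:
# one large job is split into many small tasks that run in parallel,
# and the partial results are merged at the end.
from collections import Counter
from concurrent.futures import ProcessPoolExecutor

def count_words(chunk):
    """The 'small job': count words in one slice of the data."""
    return Counter(chunk.split())

def run_big_job(text, workers=4):
    # Split the big input into roughly equal chunks, one per worker.
    lines = text.splitlines()
    step = max(1, len(lines) // workers)
    chunks = ["\n".join(lines[i:i + step]) for i in range(0, len(lines), step)]

    # Process the chunks in parallel, then merge the partial counts.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(count_words, chunks)
    return sum(partials, Counter())

if __name__ == "__main__":
    sample = "big data is not just big\nbig jobs split into small jobs\n" * 1000
    print(run_big_job(sample).most_common(3))
```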


We know that when a set of distributed computing resources has to communicate and coordinate over the network, application availability becomes critical: the moment any part of that network communication fails, the consequences for the data can be disastrous.


In fact, for many of today's big data applications, the security and stability of the network infrastructure are already very high. Even so, failures among network and data resources are inevitable; high network availability matters greatly, but designing for perfect availability is impossible.


For enterprise architects, network resiliency is one very effective answer, and it rests on two pillars: path diversity and failover. Beyond traditional metrics such as mean time between failures (MTBF), the design criteria for a genuinely big-data-ready network must include these characteristics.
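
As a rough illustration of the failover half of that equation, the sketch below tries redundant paths in order and falls back when one fails. The replica endpoints, timeout, and error handling are hypothetical placeholders, not anything the article specifies.

```python
# Minimal failover sketch: try redundant paths in order and fall back
# on failure. The endpoints are hypothetical placeholders; a real
# design would also health-check paths and fail back when they recover.
import urllib.request

REPLICA_ENDPOINTS = [
    "http://replica-a.example.com/data",
    "http://replica-b.example.com/data",
    "http://replica-c.example.com/data",
]

def fetch_with_failover(endpoints, timeout=2.0):
    last_error = None
    for url in endpoints:
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.read()
        except OSError as err:  # connection refused, timeout, DNS failure...
            last_error = err    # note the failure and try the next path
    raise RuntimeError(f"all paths failed, last error: {last_error}")
```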

"Congestion" problem of big data

As everyone knows, for a technology to be called big data, a huge volume of data is a given. For big data applications, however, the problem is not only scale: traffic bursts also give many companies headaches.


During high-traffic periods, congestion is a serious problem, bringing longer queuing delays and higher packet loss rates. Worse, congestion can trigger retransmissions, which pile extra load onto a network that is already heavily loaded.
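
The usual defense against that retransmission spiral is to back off between retries. Below is a minimal sketch, assuming a generic `send` callable that raises `TimeoutError` under congestion; the retry counts and delays are illustrative.

```python
# Sketch of the usual mitigation for retransmission storms:
# exponential backoff with jitter, so retries spread out instead of
# hammering an already loaded network. `send` is a stand-in for any
# transmission that can fail under congestion.
import random
import time

def send_with_backoff(send, payload, retries=5, base_delay=0.1):
    for attempt in range(retries):
        try:
            return send(payload)
        except TimeoutError:
            # Double the wait each attempt, with jitter so many senders
            # don't all retry at the same instant.
            delay = base_delay * (2 ** attempt) * random.uniform(0.5, 1.5)
            time.sleep(delay)
    raise TimeoutError(f"gave up after {retries} attempts")
```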


The network architecture should be designed with as few congestion points as possible. In line with the availability criteria above, reducing network congestion requires greater path diversity, so that traffic can be dispersed across a number of different paths.
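
One common way networks achieve that dispersion is ECMP-style flow hashing: each flow's 5-tuple is hashed onto one of several equal-cost paths. The toy sketch below is my illustration of the idea, not the article's; the addresses and path count are made up.

```python
# Rough sketch of ECMP-style path diversity: hash a flow's 5-tuple
# onto one of N equal-cost paths. Keeping a whole flow on one path
# avoids packet reordering, while different flows land on different
# paths, dispersing the load.
import hashlib

def pick_path(src_ip, dst_ip, src_port, dst_port, proto, num_paths):
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % num_paths

# Two different flows will usually hash to different paths:
print(pick_path("10.0.0.1", "10.0.1.9", 40001, 50010, "tcp", 4))
print(pick_path("10.0.0.2", "10.0.1.9", 40002, 50010, "tcp", 4))
```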

In the view of one senior industry expert, network latency is actually not a big deal for most big data applications: if the computation takes on the order of seconds or minutes, even a fairly large network delay is insignificant.


However, big data applications typically require tight data synchronization, a property that matters greatly for the service experience: jobs execute in parallel, and large performance differences between individual jobs can cause the whole application to fail.
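
A small worked example makes the synchronization point clear: since a parallel stage finishes only when its slowest task does, one straggler dominates the job time. The task timings below are invented for illustration.

```python
# Why uneven task performance hurts parallel jobs: a stage completes
# only when its slowest task does, so job time is the max of the task
# times, not the mean. The timings below are made-up illustrations.
task_seconds = [10, 11, 9, 10, 12, 48]  # one straggler: 48 s

mean_time = sum(task_seconds) / len(task_seconds)
job_time = max(task_seconds)

print(f"average task time: {mean_time:.1f} s")   # ~16.7 s
print(f"actual stage time: {job_time} s")        # 48 s - the straggler wins
```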

Planning ahead: the future scalability of data

Let's look at a set of numbers. Many people know that Yahoo runs more than 42,000 nodes in its big data environment, yet according to Hadoop Wizard, the average big data cluster in 2013 had only about 100 nodes.


In other words, even if every server in such a cluster is configured with dual redundant links, supporting the entire cluster requires only about four access switches. Scalability is therefore not about how big the cluster is now, but about how the design can stretch to support the scale of future deployments.
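
The back-of-the-envelope arithmetic behind that claim looks something like the sketch below. The article does not state a port density, so the 64 usable ports per switch is my assumption; with it, 100 dual-homed nodes need exactly four access switches, and the same function shows what a much larger cluster would require.

```python
# Back-of-the-envelope sizing behind the "four access switches" claim.
# The usable port count per switch is an assumption (the article does
# not say); with 64 usable ports the arithmetic works out to four.
import math

def access_switches_needed(nodes, links_per_node, usable_ports_per_switch):
    total_ports = nodes * links_per_node
    return math.ceil(total_ports / usable_ports_per_switch)

# 100 nodes, dual-homed (2 links each), 64 usable ports per switch:
print(access_switches_needed(100, 2, 64))   # -> 4
# Scale the same design to a 1,000-node cluster:
print(access_switches_needed(1000, 2, 64))  # -> 32
```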


If the infrastructure is designed only for today's small-scale deployment, how will that architecture evolve as the number of nodes grows? Scalability is not about absolute size; what matters is having a credible path to whatever scale the solution will eventually need.

Network segmentation is an important ingredient of big data environments. Simply put, segmentation means separating the large volumes of big data traffic from the rest of the network; the benefit is that sudden bursts of big data traffic cannot disrupt the normal operation of key services.
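
As a toy illustration of that logical separation, the sketch below maps traffic classes onto separate VLANs so bulk big data flows and key services never share a segment. The labels and VLAN IDs are hypothetical, not from any real deployment.

```python
# Toy illustration of logical segmentation: classify traffic into
# separate VLANs so a burst of big data replication traffic can't
# crowd out critical services. The labels and VLAN IDs here are
# hypothetical placeholders.
SEGMENTS = {
    "hadoop-shuffle": 110,   # bulk big data traffic
    "hadoop-mgmt":    120,   # cluster management / heartbeats
    "business-apps":  200,   # latency-sensitive key services
}

def vlan_for(app_label):
    return SEGMENTS.get(app_label, 999)  # 999: default/quarantine segment

print(vlan_for("hadoop-shuffle"))  # -> 110
print(vlan_for("unknown-app"))     # -> 999
```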


In addition, users often need to run multiple jobs for multiple tenants while meeting performance, audit, or compliance requirements. This calls for logical separation of workloads on the network in some cases, and for physical separation in others.

Everything depends on application awareness

Big data has become one of the defining terms of the cluster environment. Different applications bring different needs, and their sensitivity to how the network handles their data keeps rising. That means a network supporting multiple applications and multiple tenants must be able to distinguish between their workloads and handle each one appropriately, and doing so is genuinely difficult.
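
One hedged sketch of what "distinguishing workloads" can mean in practice: classify each tenant's traffic and mark it with a DSCP code point so switches can queue it appropriately. The DSCP values are the standard ones, but the tenant-to-class policy here is purely illustrative.

```python
# Sketch of application-aware traffic handling: classify each tenant's
# workload and mark it with a DSCP class so switches can queue it
# appropriately. The DSCP code points are standard; which workload
# gets which class is an illustrative policy, not a standard.
DSCP = {"EF": 46, "AF21": 18, "BE": 0}

POLICY = {
    ("tenant-a", "interactive-query"): "EF",    # latency-sensitive
    ("tenant-a", "batch-etl"):         "BE",    # best-effort bulk
    ("tenant-b", "streaming-ingest"):  "AF21",  # assured forwarding
}

def mark(tenant, workload):
    cls = POLICY.get((tenant, workload), "BE")  # unknown -> best effort
    return DSCP[cls]

print(mark("tenant-a", "interactive-query"))  # -> 46
print(mark("tenant-b", "batch-etl"))          # -> 0
```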


A good application experience is shaped by many factors: network congestion, network scalability, the capabilities of the big data applications themselves, and more. User demand for these applications will keep pushing those capabilities forward, and they remain the key indicators for improving the overall experience.


Origin blog.csdn.net/mnbvxiaoxin/article/details/104909273