Six key considerations for big data networks

From large government agencies to small businesses, a big data strategy seems to be an inescapable topic. Yet when it comes to actually deploying big data, many of us remain in a vague state.
 
In fact, big data applications require large-scale information processing, and because data is often replicated to multiple locations for greater flexibility, the scale of that information keeps growing. However, the most important attribute of big data is not its size but its divisibility: a big job can be split into many smaller ones, with the work distributed across resources in multiple locations and processed in parallel.
  
When large scale meets a distributed architecture, the resulting big data network has a special set of requirements. Here are six things to consider:
  
1. Network resilience and big data applications
  
If a set of distributed resources must coordinate over the network, availability becomes critical. When the network fails, the result is disconnected compute resources and incomplete data sets.
  
Admittedly, uptime is already the main focus of most network architects and engineers. But the root causes of network downtime are varied: they can stem from equipment failure (hardware and software), maintenance, and human error. Failure is inevitable. High availability matters, but designing for perfect availability is impossible.
  
Network architects cannot escape faults by aiming at availability alone; instead, they should design resilience into the network so it adapts to failures. A resilient network depends on path diversity (multiple paths between resources) and failover (quickly detecting problems and shifting traffic to other paths). A true big data network design must include these characteristics alongside traditional mean-time-between-failures (MTBF) methods.
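
The value of path diversity can be sketched with a simple probability calculation (an illustrative model, not from the article, assuming independent path failures):

```python
# Illustrative: how path diversity improves availability when the
# paths fail independently of one another.
def parallel_availability(single_path_availability: float, num_paths: int) -> float:
    """Probability that at least one of num_paths independent paths is up."""
    return 1 - (1 - single_path_availability) ** num_paths

a = 0.99  # assume each individual path is up 99% of the time
for n in (1, 2, 3):
    print(f"{n} path(s): {parallel_availability(a, n):.6f}")
```

With two independent 99%-available paths, the pair is up 99.99% of the time; a third path adds another two nines. Real paths often share fate (same switch, same fiber conduit), so this is an upper bound.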
  
2. Solving network congestion in big data applications
  
Big data applications are not just large in scale; they also have a characteristic burstiness. When a job starts, data begins to flow, and during high-traffic periods congestion becomes a serious problem. Congestion leads to longer queuing delays and higher packet loss, and loss in turn triggers retransmissions, which can overwhelm an already heavily loaded network. The network architecture should therefore be designed with as few congestion points as possible. As with the resilience criteria above, reducing congestion requires greater path diversity, so that traffic can be spread across many different paths.
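
Why bursts hurt so much can be seen in a textbook queuing model (the M/M/1 model is my illustration here, not something the article cites): delay grows non-linearly as a link approaches saturation.

```python
# Sketch: in a simple M/M/1 queue model, mean time in the system,
# relative to the bare service time, is 1 / (1 - rho), where rho is
# link utilization. Delay explodes as utilization nears 100%.
def mm1_delay_factor(utilization: float) -> float:
    """Mean delay as a multiple of service time for utilization rho in [0, 1)."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return 1 / (1 - utilization)

for rho in (0.5, 0.8, 0.95):
    print(f"utilization {rho:.0%}: delay x{mm1_delay_factor(rho):.1f}")
```

Going from 50% to 95% utilization multiplies queuing delay tenfold in this model, which is why a bursty big data job landing on a busy link degrades everything sharing it.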
  
3. For big data, network consistency matters more than latency
  
In reality, most big data applications are not sensitive to network latency. When compute time is on the order of seconds or minutes, even a substantial network delay, on the order of a few thousand milliseconds, hardly matters. However, big data applications are typically highly synchronized: jobs execute in parallel, and a large performance gap between parallel tasks can cause the application to fail. The network therefore needs not just adequate performance, but performance that is consistent across space and time.
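
The synchronization point can be made concrete (a minimal sketch with made-up task times): a barrier-synchronized job finishes only when its slowest task does, so one inconsistent network path delays the whole job.

```python
# Sketch: for a job that waits on all of its parallel tasks,
# completion time is the maximum task time, not the average.
def job_completion_time(task_times: list[float]) -> float:
    """A barrier-synchronized job finishes when its slowest task finishes."""
    return max(task_times)

uniform = [10.0] * 8                  # consistent network: every task takes 10 s
one_straggler = [10.0] * 7 + [25.0]   # one slow path; averages barely move

print(job_completion_time(uniform))        # 10.0
print(job_completion_time(one_straggler))  # 25.0
```

A single straggler more than doubles the job time here even though seven of eight tasks are unaffected, which is why consistency beats raw speed for this class of workload.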
  
4. Prepare now for future big data scalability
  
It may be a little surprising that most big data clusters are actually not that large. Yahoo famously runs more than 42,000 nodes in its big data environment, but according to HadoopWizard, the average big data cluster in 2013 had only about 100 nodes. In other words, even with dual redundant connections per server, the entire cluster needs only four access switches (assuming 72-port Broadcom 10GbE access switches).
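
The article's sizing example works out as simple arithmetic (the rounding to an even switch count, so each server's two links can land on distinct switches, is an assumption added here, not stated in the article):

```python
import math

# Back-of-envelope check of the article's example: a 100-node cluster,
# dual-homed servers, 72-port 10GbE access switches.
nodes = 100
links_per_node = 2        # dual redundant connections per server
ports_per_switch = 72

raw = math.ceil(nodes * links_per_node / ports_per_switch)  # 200 / 72 -> 3
switches = raw + raw % 2  # assumed: round up to an even count for switch pairs
print(switches)           # 4
```

Uplink ports would reduce the usable port count per switch and push this estimate up, so treat it as a floor, not a design.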
  
Scalability is not about how big the cluster is now, but about how to grow smoothly toward future deployment scales. If the infrastructure is designed only for today's small deployment, how will the architecture evolve as the number of nodes increases? At some point in the future, will it require a complete redesign? Will it come to depend on data proximity and placement information? The key is to remember that scalability is not about absolute size; it is about having a path to a solution of sufficient scale.
  
5. Handling big data through network segmentation
  
Network segmentation is an essential condition for creating a big data environment. In its simplest form, it means separating big data traffic from other network traffic, so that the bursty traffic these applications generate does not affect other mission-critical workloads. Beyond that, multiple tenants running multiple jobs must be handled so as to meet performance, compliance, and/or auditing requirements. This work requires logically separating network workloads in some cases, and physically separating them in others. Architects need to plan for both, but should unify the initial requirements wherever possible.
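
A logical-separation policy can be as simple as a lookup from workload to segment. The table below is entirely hypothetical (the VLAN IDs, class names, and priority labels are invented for illustration; the article prescribes no specific scheme):

```python
# Hypothetical policy table mapping workloads to network segments,
# illustrating the logical separation the section describes.
SEGMENTS = {
    "hadoop-shuffle":   {"vlan": 110, "priority": "low"},     # bursty bulk traffic
    "tenant-a-jobs":    {"vlan": 120, "priority": "medium"},  # per-tenant isolation
    "mission-critical": {"vlan": 200, "priority": "high"},    # protected workloads
}

def segment_for(workload: str) -> dict:
    """Return the segment for a workload, defaulting to best-effort."""
    return SEGMENTS.get(workload, {"vlan": 1, "priority": "best-effort"})

print(segment_for("hadoop-shuffle")["vlan"])  # 110
```

In practice the same idea is expressed in switch configuration (VLANs, VRFs, QoS classes) rather than application code; the point is that the mapping is planned up front, per tenant and per traffic class.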
  
6. Application awareness in big data networks
  
Although the concept of big data is closely tied to Hadoop deployments, it has become synonymous with cluster environments in general. And depending on application characteristics, the demands these cluster environments place on the network are not the same: some applications may be latency-sensitive, while others require high bandwidth. In short, a network supporting multiple applications and multiple tenants must be able to distinguish its workloads and handle each of them appropriately.
  
The key to big data network design is understanding one thing: the requirement is not just to offer adequate bandwidth. Ultimately, the application experience depends on many factors, including network congestion and segmentation. Building a network that meets all these requirements takes foresight: consider not only the scale the infrastructure can grow to support, but also how different types of applications will coexist in a common environment.

Origin blog.csdn.net/sdddddddddddg/article/details/91605686