Three major technical problems to solve in big data storage

Big Data

In the IT industry, big data is the other hot topic keeping pace with cloud computing. "Big data" refers to data sets so large that they are difficult to collect, process, and analyze, and which readily cause storage problems. This article describes several of the major problems that tend to arise.

"Big data" generally refers to the number of those huge, difficult to collect, process, analyze data sets also means that data on long-term preservation of traditional infrastructure. Here the "big" has several meanings, it can describe the size of the organization, but more importantly, it defines the size of the enterprise IT infrastructure. The industry for large data applications has placed an infinite expectation of the more value the greater the accumulation of business information but we need a way to dig out those values.

Why big data now?

We not only have the ability to store more data than ever before, but we also face more types of data. These data sources include online transactions, online social networking, automatic sensors, mobile devices, scientific instruments, and so on. Beyond those fixed sources, the transactions themselves can accelerate the accumulation of data; for example, the explosive growth of social multimedia data derives from new kinds of online transactions and recorded behavior. Data keeps growing, but the ability to store large amounts of it is not enough on its own, because storage alone does not guarantee that we can successfully extract commercial value from it.

Data is an important factor of production

In the information age, data has become an important factor of production, much like capital, labor, and raw materials, and the demand for it is universal; it is no longer limited to a few specific industries. Companies in every industry are gathering and analyzing large amounts of data to reduce costs, improve product quality, increase productivity, and create new products wherever possible. For example, analyzing data collected from field testing of a product can directly help a company improve its design. A company can also gain an edge over its competitors by deeply analyzing customer behavior and comparing it against large volumes of market data.

Storage technology must keep up

The explosive growth of big data applications has spawned its own unique architectures and has directly driven the development of storage, networking, and computing technology. After all, the specialized processing that big data requires is a new challenge. Hardware and software development is ultimately demand-driven, and in this case we can clearly see big data analytics applications shaping the evolution of data storage infrastructure.

On the other hand, this change is an opportunity for storage vendors and other IT infrastructure vendors. As the volume of structured and unstructured data keeps growing, and as the sources of data to be analyzed become more diverse, earlier storage system designs can no longer meet the needs of big data applications. Storage vendors have realized this and have begun to modify their block- and file-based architectures to accommodate the new requirements. Below, we discuss the properties of storage infrastructure associated with big data and see how they meet its challenges.

Latency

"Big data" applications there are real-time problems. Particularly in relation to the online trading or financial related applications. Take, for example, network clothing sales industry online advertising services require real-time customer history for analysis and accurate advertising. This requires storage system must be able to support the above characteristics while maintaining a high response speed, because the response is the result of the delay will push "expired" advertising content to customers. In this scenario, Scale-out storage system architecture can play an advantage, because it's every node has a processing and interconnection components, can also be simultaneously increased capacity while increasing processing power. And object-based storage system is capable of supporting concurrent data streams to further improve data throughput.

Many big data applications also demand high IOPS performance, for example high-performance computing (HPC). The popularity of server virtualization has likewise driven up IOPS requirements, just as it reshaped the traditional IT environment. To meet these challenges, solid-state storage devices have appeared in every form, from small caches inside servers to large all-flash scalable storage systems, and all of them are thriving.
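
To get a feel for what random-read IOPS means, the following rough sketch issues random 4 KiB reads against an assumed test file and counts how many complete per second. It is only an illustration; page-cache hits will inflate the number, and a proper tool such as fio with direct I/O is needed for real measurements.

```python
import os
import random
import time

BLOCK = 4096           # 4 KiB reads, a common size in IOPS benchmarks
DURATION = 3.0         # seconds to sample
PATH = "testfile.bin"  # assumed pre-created test file


def rough_read_iops(path: str) -> float:
    """Issue random 4 KiB reads for a few seconds and count them."""
    size = os.path.getsize(path)
    ops = 0
    deadline = time.monotonic() + DURATION
    with open(path, "rb") as f:
        while time.monotonic() < deadline:
            f.seek(random.randrange(0, max(size - BLOCK, 1)))
            f.read(BLOCK)
            ops += 1
    return ops / DURATION


if __name__ == "__main__":
    print(f"approx. {rough_read_iops(PATH):.0f} read IOPS")
```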

Once companies recognize the potential value of concurrent access for big data analysis, they put more data sets into the system for comparison and let more people share and use the data. To create more business value, companies often analyze data objects drawn from many different platforms together. Storage infrastructure that includes a global file system can help solve the access problem: a global file system allows many users on many hosts to access file data concurrently, even when that data is stored in multiple locations on multiple different types of storage devices.

Security

Some industries, such as finance, healthcare, and government intelligence, have their own security standards and confidentiality requirements. For IT managers these are nothing new, and all of them must be complied with; however, big data analysis often requires cross-referencing many kinds of data that in the past would never have been mixed or made mutually accessible. Big data applications therefore also raise a number of new security issues to consider.

Capacity

Here "massive" usually means data at the petabyte scale, so the storage system must offer a corresponding level of scalability. At the same time, expansion must be simple: capacity should be added by installing disk modules or cabinets, ideally without downtime. Because of this requirement, more and more customers now favor scale-out storage architectures. In a scale-out cluster, each node contains not only a certain amount of storage capacity but also internal processing power and networking equipment. Completely unlike the traditional chimney-style (siloed) storage architecture, a scale-out architecture can expand smoothly and seamlessly, avoiding islands of storage.
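
The claim that capacity and processing power grow together in a scale-out cluster can be made concrete with a tiny model. The per-node numbers below are purely illustrative assumptions, not vendor figures.

```python
from dataclasses import dataclass


@dataclass
class Node:
    capacity_tb: float      # usable capacity contributed by this node
    throughput_gbps: float  # bandwidth contributed by this node


def cluster_totals(nodes):
    """In a scale-out cluster, adding a node adds both capacity and
    throughput, unlike a scale-up array where a fixed controller
    becomes the bottleneck as more disks are added."""
    cap = sum(n.capacity_tb for n in nodes)
    bw = sum(n.throughput_gbps for n in nodes)
    return cap, bw


if __name__ == "__main__":
    # Illustrative numbers only: 60 TB and 2 Gbps per node.
    cluster = [Node(60, 2.0) for _ in range(16)]
    cap, bw = cluster_totals(cluster)
    print(f"{cap / 1000:.2f} PB usable, {bw:.0f} Gbps aggregate")
```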

"Big data" applications in addition to the huge size of the data, but also means that has a large number of files. So how to manage the file system metadata layer accumulation is a problem, if not handled properly will affect the scalability and performance of the system, and there is a traditional NAS system bottleneck. Fortunately, there is not an object-based storage architecture of this issue, it can manage the number of files one billion level in a system, but also will not encounter the same problems as traditional storage metadata management. Object storage system also has wide expansion capability, and can be deployed in large-scale storage infrastructure consisting of a cross-regional basis in several different locations.

Costs

"Great," also could mean costly. For those companies are using big data environment, the cost control is the key issue. Want to control costs, it means that we have to let each device achieve greater "efficiency", while also reducing those expensive components. Currently, like deduplication technology has entered into the main storage market, but can now handle more data types, which can be a large data storage applications that bring more value, improve storage efficiency. The amount of data growing environment by reducing the back-end storage consumption, even if only a few percentage points lower, are able to achieve significant return on investment. In addition, thin provisioning, snapshot and cloning techniques used may also enhance storage efficiency.

Many big data storage systems include an archiving component, and for organizations that need to analyze historical data or preserve data for the long term, archiving equipment is essential. Measured by cost per unit of capacity, tape is still the most economical storage medium, and in many companies archiving systems based on high-capacity, terabyte-class tape remain the de facto standard practice.

The biggest factor in cost control is the use of commodity hardware. Many first-time users, as well as the users running the largest-scale deployments, therefore build their own customized "hardware platforms" rather than buying existing commercial products, a strategy they can use to balance cost control against business expansion. To meet this demand, more and more storage products are offered as pure software that can be installed directly on the user's existing, generic, or off-the-shelf hardware.

 


Source: blog.csdn.net/sdddddddddddg/article/details/90952271