New options for enterprise big data Hadoop deployments

By now, enterprises have recognized that big data analytics can deliver real value to the business, but traditional data management and security problems have hampered big data deployments.

Which deployment suits a given enterprise depends, in any case, on the company's own situation and on where it decides its data should live.

Many vendors offer big data services and are certainly competing for enterprise business. After all, big data is not about the sheer size of a data set; it is about taking advantage of as much data as possible through good data management. If you are looking for a precise definition of a big data deployment, there isn't one yet. What you do need is data center infrastructure that can grow to match all of that data growth.

The big data boom really began with the Apache Hadoop Distributed File System (HDFS), which opened an era of cost-effective analysis at massive scale, using clusters of servers with relatively inexpensive local disks. However fast the business grows, Hadoop and its related big data solutions can keep analyzing the raw data (that is, data not fully structured for a database).

The problem is that once you get started with big data, you will find that all the familiar problems of traditional enterprise data projects emerge again: data security, reliability, performance, and how to protect the data.

Although Hadoop HDFS has matured, there are still many gaps to fill before it meets business needs. It turns out that when big data products gather data onto storage clusters built from direct-attached storage (DAS), those clusters may not actually deliver the lowest cost once everything is accounted for.

In fact, the most crucial point is how the enterprise takes stock of its big data. We certainly do not want to simply copy, move, and back up multiple replicas of a big data set; copying big data is a big job. Big data needs to be managed at least as securely and carefully as a smaller database, if not more so, and without giving up detail. And if we base key business processes on a new big data store, we will need all of its operational flexibility and high performance.

Choosing where the new big data belongs

DAS is still the favored physical storage medium for Hadoop, a choice backed by the research and practice of the highly specialized companies behind it. But this HDFS-on-DAS approach to storage has some serious problems.

First, the default scheme is to replicate all data: HDFS keeps three copies of every block by default, and the data still gets copied, moved, and backed up on top of that. HDFS is optimized for large-block I/O, at the expense of interactive access times. Using the data elsewhere usually means copying it out. And although HDFS has local snapshots, they are neither exact point-in-time copies nor fully recoverable backups.
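To make the replication and snapshot behavior concrete, here is a minimal sketch against the standard Hadoop FileSystem API. The NameNode address and paths are placeholders, and an administrator must have enabled snapshots on the directory beforehand (hdfs dfsadmin -allowSnapshot):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsCopiesDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder NameNode address -- point this at your own cluster.
        conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");
        FileSystem fs = FileSystem.get(conf);

        // HDFS keeps three copies of every block by default, so a file
        // occupies roughly 3x its logical size in raw DAS capacity.
        fs.setReplication(new Path("/data/rawlogs/part-00000"), (short) 3);

        // Snapshots are read-only, directory-level copies. They protect
        // against accidental deletion, but they are not application-
        // consistent backups, which is the gap discussed above.
        Path snap = fs.createSnapshot(new Path("/data/rawlogs"), "s20190613");
        System.out.println("Created " + snap);

        fs.close();
    }
}
```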

For these and other reasons, enterprising storage vendors have begun to modify HDFS, while some maverick Hadoop big data experts run their computation against external storage instead. For many businesses, external storage offers a good compromise: there is no high-maintenance DAS farm to run and no new way of protecting storage to adopt, although it comes at a certain cost.

Several vendors, such as EMC with Isilon, provide a remote HDFS interface for Hadoop clusters, and this is a preferred option among larger enterprises. Once the data sits on Isilon, big data protection and related concerns, security included, are handled in one place. Another benefit is that data on external storage is typically also accessible through other protocols (such as NFS, the Network File System), which supports existing data workflows and limits the copies of data the enterprise needs to keep. NetApp, working from similar principles, has published a big data reference architecture that connects a portfolio of its storage solutions directly to Hadoop clusters.
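The dual-protocol point is worth a small illustration. In the sketch below (a sketch only; the host name and mount point are assumptions, not any vendor's documented endpoints), a Hadoop job reads a file over the HDFS protocol while any ordinary application reads the same file over an NFS mount, so no copy-out step is needed:

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DualProtocolDemo {
    public static void main(String[] args) throws Exception {
        // Hadoop side: talk to the external array over the HDFS protocol.
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://storage.example.com:8020"); // assumed endpoint
        FileSystem hdfs = FileSystem.get(conf);
        long hdfsLen = hdfs.getFileStatus(new Path("/data/events.csv")).getLen();

        // Everything else: read the very same file through an NFS mount
        // (assumed to be mounted at /mnt/bigdata) with plain Java I/O.
        long nfsLen = Files.size(Paths.get("/mnt/bigdata/data/events.csv"));

        System.out.println("HDFS sees " + hdfsLen + " bytes, NFS sees " + nfsLen);
        hdfs.close();
    }
}
```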

Also worth mentioning is virtualized big data analytics. In theory, all the compute nodes and storage can be virtualized, and both VMware and Red Hat/OpenStack offer Hadoop virtualization solutions. However, almost none of them solve the enterprise storage problem of HDFS living on the host nodes. BlueData, an innovative young company, has put forward a new choice: it lets companies run Hadoop computation against their existing data sets on SAN/NAS, accelerating access and presenting the data to Hadoop as HDFS under the covers. This way, big data analysis can run on data that never leaves the data center, with no change to the storage architecture and no new data streams or data management to set up.

Most recent Apache Hadoop distributions start from open source HDFS (currently the de facto software-defined storage for big data); where they differ is in the approach they take on top of it. Essentially, enterprise Hadoop storage means building an HDFS-compatible storage layer of one's own. The MapR distribution is a fully capable example: it handles read/write I/O, snapshots, and replication, and it natively supports additional protocols such as NFS. It has also proved effective at helping major providers of enterprise business intelligence applications run decision-support solutions that rely on both large historical data sets and real-time information. In a similar vein, IBM has released a Hadoop API for its high-performance computing storage system (GPFS) as an alternative to HDFS in its distribution.
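What makes these HDFS-compatible layers practical is that Hadoop application code is written against the generic org.apache.hadoop.fs.FileSystem API, so the storage backend can be swapped by configuration. A small sketch (the URIs are illustrative; each vendor documents its own scheme):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PortableFsDemo {
    // Works unchanged whether fs.defaultFS points at stock HDFS or at an
    // HDFS-compatible layer such as MapR-FS or IBM's GPFS connector.
    static long totalBytes(FileSystem fs, Path dir) throws Exception {
        long total = 0;
        for (FileStatus s : fs.listStatus(dir)) {
            total += s.isDirectory() ? totalBytes(fs, s.getPath()) : s.getLen();
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Pass e.g. "hdfs://namenode.example.com:8020" or a vendor URI.
        conf.set("fs.defaultFS", args.length > 0 ? args[0]
                : "hdfs://namenode.example.com:8020");
        FileSystem fs = FileSystem.get(conf);
        System.out.println(totalBytes(fs, new Path("/data")) + " bytes under /data");
        fs.close();
    }
}
```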

Other interesting solutions tackle the data itself. One is Dataguise, a data security startup that can protect the unique intellectual property in Hadoop big data sets: it automatically identifies sensitive information across a large cluster and masks or encrypts it globally. Waterline Data is an emerging player in this field: connect it to your Hadoop cluster and it automatically discovers and catalogs your data files wherever they live, even deep inside HDFS. The resulting catalog helps the business quickly find the data sources and statistics it needs to build applications.
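To show the idea behind masking (a toy illustration only, not Dataguise's actual product or algorithms), here is a sketch that redacts values shaped like US Social Security numbers before a record is written into the cluster:

```java
import java.util.regex.Pattern;

public class MaskingDemo {
    // Values shaped like US Social Security numbers: 123-45-6789.
    private static final Pattern SSN =
            Pattern.compile("\\b\\d{3}-\\d{2}-(\\d{4})\\b");

    // Keep only the last four digits, so the field stays useful for
    // matching while the sensitive part is hidden.
    static String maskSsns(String record) {
        return SSN.matcher(record).replaceAll("***-**-$1");
    }

    public static void main(String[] args) {
        System.out.println(maskSsns("alice,123-45-6789,premium"));
        // -> alice,***-**-6789,premium
    }
}
```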

If you have an ongoing interest in Hadoop management or enterprise data center storage, this is a good time to refresh your understanding of big data. If you want to keep up with its pace, you should not reject these new technologies.
