What Is Big Data, and Why Study It?

Hadoop and big data have been among the hottest buzzwords of the past two years, and more and more companies are interested in them. Yet most of the people I deal with at these companies, whether technical staff or bosses, do not know how to use these technologies to improve their business. While answering their questions, I pulled out a few key points and recorded them here.
Are big data and the cloud the same thing?

This is one of the most commonly confused concepts. Personally, I think they are two different things. A cloud service, whether cloud hosting, cloud storage, or some other cloud application, presents an interface to the user, and behind that interface sits virtualization, distributed storage, or some other distributed computing technology. In short, the idea of a cloud service is that I provide the service to you, and you do not need to care how complex the architecture or technology behind it is. To use an analogy: before the cloud era, if you wanted electricity you had to build your own power plant, manufacture generators, and build a substation before you could use it. A cloud service is like someone else building the power plant and running a wire straight into your home. When you want power, you just plug in, without caring how the electricity is produced; producing it and maintaining the equipment is the national grid's job. Applied to computing: we used to buy our own servers, install our own systems, rack them ourselves, set up our own load balancing, and maintain our own hardware and software environment. With the cloud, all of this is handled by the cloud provider's virtual machine technology. The provider takes care of data security and network security, so you do not need to hire people specifically to maintain a pile of equipment.

Big data, by contrast, may or may not be built on the cloud. The technologies for processing big data and for delivering cloud services are not the same, although they overlap. One could say that cloud services are the infrastructure, the municipal engineering of a city, while big data is the city's high-rise buildings.

From a technical point of view, what most cloud providers in China mainly offer is virtual machines, which is a concept of division: one physical server is split into several virtual servers so that physical resources are used as fully as possible and nothing is wasted. Big data is the opposite idea, combination: many servers are consolidated into one giant virtual server, and computing resources are allocated so that data can quickly be turned into useful services. An old Chinese saying describes big data and Hadoop well: three ordinary cobblers together are a match for Zhuge Liang. Pooled over a network, the computing resources of many ordinary machines can exceed the power of a minicomputer or mid-range system. Of course, the cloud idea is present here too: you do not need to care how the data is ultimately stored and computed; you just use it.
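
To make the idea of combining many servers into one large virtual machine more concrete, here is a minimal sketch of the MapReduce pattern that Hadoop is built around, using word counting as the example. It is plain Python, not Hadoop's API: each thread in a `ThreadPoolExecutor` stands in for one server, and the names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple


def map_phase(chunk: str) -> List[Tuple[str, int]]:
    """Map step: emit (word, 1) for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped: Iterable[List[Tuple[str, int]]]) -> Dict[str, List[int]]:
    """Shuffle step: group every emitted value by its key across all workers."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce step: combine the grouped values for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(chunks: List[str], workers: int = 4) -> Dict[str, int]:
    # Hypothetical helper: each thread plays the role of one cluster node.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, chunks))
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    docs = ["big data big cluster", "compute moves to data", "Big ideas"]
    print(word_count(docs))
```

In a real Hadoop job the chunks would be blocks of a file stored in HDFS across many machines, and the framework would move the shuffled data between nodes, but the three phases follow the same shape.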

Is big data technology only useful when the data volume is huge?

Usually I think so, but not absolutely. Computations over many dimensions, or otherwise complex computations, can also count as big data. In other words, if you cannot compute the results you need within the time you need them, you may need big data technology.

On one hand, if your storage needs exceed what a database or data warehouse can handle, you may need big data technology. On the other hand, if your computations exceed the time window that traditional data processing can deliver, you may need it too. A typical computing challenge is data mining and multi-dimensional analysis: the data volume may not be large, but the algorithms are complex, and big data technology may still be required. Examples include user recommendations and precisely targeted advertising based on user segments; in traditional industries, weather forecasting, or processing geological data for oil and mineral exploration; and in finance, building mathematical models on historical data to estimate the risk of securities, futures, and loans. Alibaba's forecasts of China's economy and exports are more accurate than those of the Ministry of Commerce and the National Bureau of Statistics; besides the team of mathematicians and statisticians Alibaba employs, big data technology is an indispensable reason for this.

Is big data technology just Hadoop?

Obviously not. Many vendors and application areas are involved, some open source and some commercial. There are non-Hadoop big data products such as EMC's Greenplum and Splunk's Splunk. These are not based on Hadoop, and they share a common drawback: they are very expensive. Therefore, most companies use open-source software to handle their big data workloads, and the most mature open-source option is Hadoop. As a result, Hadoop has basically become synonymous with big data processing. Many commercial companies have grown up around Hadoop, because the Apache license permits commercial use. Well-known examples such as Cloudera and MapR sell commercial products built on Hadoop and its surrounding ecosystem.

How can big data drive business growth?

That depends on imagination. You now have huge storage capacity and huge computing power; how to use them is up to you. Whatever you used data for before, you can keep doing. Besides the well-known stories of beer and diapers, and of chewing gum and condoms, there is an even more vivid case: a company in the United States placed a sensor every mile across grain-producing regions to collect data such as air humidity and the amount of nitrogen in the soil. It then processed the collected data with big data tools and algorithms to predict how the harvest in each region was likely to turn out, and sold those forecasts to US agricultural insurance companies.
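
The beer-and-diapers story is usually told as an example of market basket analysis: finding items that tend to be bought together. A minimal sketch of the counting behind it, assuming each shopping basket is a set of item names, computes support (the share of all baskets containing a pair) and confidence (how often the second item appears when the first one does). The function `pair_rules` and the default 0.3 support threshold are hypothetical choices for illustration, not any particular retailer's or product's algorithm.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple


def pair_rules(
    baskets: List[Set[str]], min_support: float = 0.3
) -> Dict[Tuple[str, str], Tuple[float, float]]:
    """Return {(a, b): (support, confidence of a -> b)} for frequent item pairs.

    Hypothetical helper for illustration only.
    """
    n = len(baskets)
    # How many baskets contain each single item.
    item_counts = Counter(item for basket in baskets for item in basket)
    # How many baskets contain each unordered pair (sorted so (a, b) == (b, a)).
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules: Dict[Tuple[str, str], Tuple[float, float]] = {}
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # pair is too rare to be interesting
        # Emit the rule in both directions; confidence = P(b in basket | a in basket).
        rules[(a, b)] = (support, count / item_counts[a])
        rules[(b, a)] = (support, count / item_counts[b])
    return rules
```

With a handful of made-up baskets such as `{"beer", "diapers"}` and `{"milk", "diapers"}`, the output would show, for example, what fraction of diaper buyers also bought beer; on real data the same counting simply runs over billions of transactions, which is where a cluster becomes necessary.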

What are Hadoop's advantages and disadvantages?

Hadoop's advantages are a significant improvement in storage capacity, in computing power, and in data safety through replicated backups. Hadoop 1.0 supports parallel computation and storage across roughly 4,000 servers, and Hadoop 2.0 supports around 6,000. However, 2.0 is not yet mature, so 1.0 is still recommended for production. I think a 4,000-node cluster can rival an IBM mainframe in capacity and computing power. Consider the mainframe outage at Bank of China on December 15 last year: however reliable a mainframe is, it is still a single point of failure, and when it really did go down, nobody dared to make the call to switch over to the backup mainframe. Hadoop 1.0 already has many workarounds for its own single point of failure, and 2.0 supports failover for it. As development continues, Hadoop may eventually surpass the mainframe entirely. In fact, IBM has already launched its own Hadoop distribution.
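
The data-safety advantage comes from HDFS storing each block on several machines; its default replication factor is 3. The sketch below models that idea under a simplifying assumption of plain round-robin placement (real HDFS uses rack-aware placement), and `place_replicas` and `readable_blocks` are hypothetical helpers. It shows why losing one or two nodes loses no data when every block has three copies on distinct nodes, in contrast to a single mainframe.

```python
from typing import Dict, List, Set


def place_replicas(
    num_blocks: int, nodes: List[str], replication: int = 3
) -> Dict[int, List[str]]:
    """Assign each block to `replication` distinct nodes, round-robin.

    Hypothetical, simplified model: not HDFS's actual rack-aware policy.
    """
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    placement: Dict[int, List[str]] = {}
    for block in range(num_blocks):
        # Consecutive indices modulo len(nodes) are distinct while replication <= len(nodes).
        placement[block] = [nodes[(block + i) % len(nodes)] for i in range(replication)]
    return placement


def readable_blocks(placement: Dict[int, List[str]], failed: Set[str]) -> List[int]:
    """Blocks that still have at least one replica on a live node."""
    return [
        block
        for block, holders in placement.items()
        if any(node not in failed for node in holders)
    ]


if __name__ == "__main__":
    cluster = ["n1", "n2", "n3", "n4", "n5"]
    layout = place_replicas(10, cluster)
    print(len(readable_blocks(layout, {"n1", "n4"})), "of 10 blocks readable")
```

With three replicas, any two nodes can fail without losing a block; with a single copy, every node is a single point of failure for the blocks it holds.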

As for disadvantages, Hadoop 1.0 still has a single point of failure. It can be compensated for with other techniques to achieve hot standby, but that demands a higher level of maintenance skill. Another drawback is that computations take a long time, so Hadoop cannot deliver real-time query responses or support fast decision-making. Many other projects make up for this, however, such as Apache Drill, built to compete with Google's Dremel, and Cloudera's Impala. For real-time computation there is Twitter's open-source Storm, whose cluster design follows the same philosophy as Hadoop but computes over real-time data streams and produces results immediately, ready to use as soon as they are computed.
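
The difference between Hadoop's batch model and a stream processor like Storm is that a batch job recomputes over all stored data, while a stream processor updates its result as each event arrives. A minimal sketch of that incremental style, assuming events arrive in timestamp order, keeps per-key counts over a sliding time window. This is not Storm's API (Storm applications are topologies built from spouts and bolts, usually in Java); `SlidingWindowCounter` is a hypothetical class for illustration.

```python
from collections import Counter, deque
from typing import Deque, Tuple


class SlidingWindowCounter:
    """Per-key event counts over the last `window` seconds, updated per event.

    Hypothetical illustration of stream processing; assumes in-order timestamps
    and a positive window.
    """

    def __init__(self, window: float) -> None:
        self.window = window
        self.events: Deque[Tuple[float, str]] = deque()
        self.counts: Counter = Counter()

    def add(self, timestamp: float, key: str) -> int:
        """Record one event and return that key's up-to-date count in the window."""
        self.events.append((timestamp, key))
        self.counts[key] += 1
        self._evict(timestamp)
        return self.counts[key]

    def _evict(self, now: float) -> None:
        # Drop events that have fallen out of the window, oldest first.
        while self.events and self.events[0][0] <= now - self.window:
            _, old_key = self.events.popleft()
            self.counts[old_key] -= 1
            if self.counts[old_key] == 0:
                del self.counts[old_key]


if __name__ == "__main__":
    counter = SlidingWindowCounter(window=10)
    for ts, key in [(0, "click"), (3, "click"), (5, "view"), (12, "click")]:
        print(ts, key, counter.add(ts, key))
```

Each call does a small amount of work and returns a fresh answer immediately, which is the property that makes real-time dashboards and alerts possible; a batch job would instead rescan every stored event to answer the same question.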

With the support of open-source communities and the joint efforts of programmers around the world, the ability to process big data is developing rapidly, and programmers are using their own ingenuity to transform the world.

Source: blog.51cto.com/14296550/2418869