The future of big data points toward super artificial intelligence

At Baidu's Big Data Open Conference, President Huai Jinpeng delivered an academic lecture on computing that left most of the audience lost in a fog; only a small minority in the room could truly follow it, and at times he seemed like an alien on stage speaking to himself. But as someone who was once interested in artificial-intelligence research yet did not graduate in computer science, I heard things that excited me: somewhere in that talk I glimpsed a possible path by which artificial intelligence might eventually arrive. So let me try to translate Professor Huai's speech into language the rest of us can understand.

First, understanding big data

1. Four features of today's big data: large scale, rapid change, complex variety, and low value density.

This is actually easy to understand. Consider Sina Weibo's big data: why is it so hard to monetize? Weibo holds an enormous amount of user data, yet that behavioral data converts into cash only with difficulty. The reason is that the data Weibo generates is not vertical enough; it spans too wide a range, and the business value that could be tied to it is even harder to dig out.

2. What the industry has achieved

President Huai cited three examples. Baidu and Google are familiar with users' browsing behavior and can therefore provide personalized search. Amazon and Taobao are familiar with users' shopping habits and can recommend items that accurately match each user's preferences. Weibo and Twitter understand users' habits of thought, social cognition, and emotions, and can supply that data to governments and businesses.

Second, from big data to big data computing

1. As big data expands, how should algorithms cope with the data? The applications above are all achieved by letting the data shape the algorithm, but in real processing the data still cannot be handled with full efficiency; the machine's CPU is, after all, a bottleneck. In essence, what an algorithm engineer does is design the best solution obtainable under today's operating constraints, in order to get the best possible result.

2. As big data expands, how do we solve the search problem? Traditional algorithms had no trouble searching data when volumes were small, but as data grows massive the problem becomes acute, and the original algorithms simply cannot keep up. At today's fastest hard-disk read speeds (about 6 GB/s), a linear scan of 1 PB (10^15 bytes) of data takes roughly 1.9 days. When data expands to this scale, the processing algorithms must be restructured. Baidu currently processes around 10 PB of web-page data per day, including both writes and reads, which is considered close to the best an algorithm can do.
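As a quick sanity check on those numbers, here is a minimal sketch of the arithmetic. The ~6 GB/s sequential-read throughput is an assumed figure (the exact number in the talk was garbled), chosen because it is consistent with the 1.9-day result:

```python
# Rough scan-time estimate: how long does a linear scan of 1 PB take?
# The 6 GB/s throughput is an assumption consistent with the talk's 1.9-day figure.

PETABYTE = 10**15          # bytes, i.e. 10 to the 15th power, as in the talk
THROUGHPUT = 6 * 10**9     # bytes per second, assumed sequential read speed

seconds = PETABYTE / THROUGHPUT
days = seconds / 86400     # 86,400 seconds in a day
print(f"Linear scan of 1 PB at 6 GB/s: {days:.1f} days")  # ~1.9 days
```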

The challenge President Huai described is this: once big data expands past a certain point, not only must the original algorithm be replaced by an approximation algorithm, the data itself must also be replaced by approximate data. Only these two changes make it possible to reach near-optimal results with the computing power of existing machines.

That is easier said than done. When switching to approximation algorithms and approximate data, just how close must the approximation be for the result to stay near what the original algorithm would have produced? In the world of computing, a hair's-breadth difference at the start can put you a thousand miles off at the end: the change may look small, but one wrong correction can produce enormous errors in the result. Anyone who understands a little programming knows that a few lines of bad code can crash a computer no matter how powerful its CPU, and a search engine is an even larger exercise in trial and error.
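The talk doesn't name a concrete technique for "approximate data," but a simple illustration of the trade-off is estimating an aggregate from a small uniform sample instead of scanning every record. A minimal sketch, assuming sampling as the approximation:

```python
import random

# Approximate data: estimate the mean of a large dataset from a small
# uniform random sample instead of scanning every record.
random.seed(42)
full_data = [random.gauss(100, 15) for _ in range(1_000_000)]  # stand-in for "big data"

sample = random.sample(full_data, 1_000)  # read only 0.1% of the records

exact_mean = sum(full_data) / len(full_data)
approx_mean = sum(sample) / len(sample)

print(f"exact:  {exact_mean:.2f}")
print(f"approx: {approx_mean:.2f}")  # close to exact, at ~1/1000 of the scan cost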

Finally, President Huai pointed to two frontiers of academic work. First, solving the class of problems that are easy to define: finding this type of search problem in real-world applications, classifying it, and applying the result to other practice. Second, processing big data as small data: finding a conversion whose accuracy can be measured, which is the search for approximate data he described earlier.

Still on big data computing, President Huai also discussed three basic operations on big data: representation, measurement, and understanding. These are technical enough that each term would need its own article to explain, and even then might not be clear to us, so I will skip them here.

Third, the paradigm shift in practice

Big data brings a paradigm shift in our approach to research and practice.

1. From exact to inexact. This is easy to understand if we look back at the traditional search era. Back then, when we went looking for some piece of information, what we needed was all of the matching data. Search engines completely changed that understanding: a search engine returns only the top few results, and those few items fully satisfy our information needs.

What a search engine really runs is a set of fuzzy algorithms: a series of computations that bring the best results to the top for the user. This overturns the traditional notion of a precisely defined target. In the big data era, we no longer pursue an absolute answer; instead we work down from a macro trend toward a somewhat vague, imprecise, unknown target.
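As a toy illustration of returning only "the best few" rather than everything, here is a minimal sketch; the term-overlap scoring function is an invented stand-in, not how any real search engine ranks:

```python
import heapq

# Toy "fuzzy" retrieval: score documents by term overlap with the query
# and return only the top few, rather than every matching record.
docs = [
    "big data needs approximation algorithms",
    "approximate data and approximate algorithms",
    "cake sales rise on rainy days",
    "distributed systems for big data processing",
]

def score(query, doc):
    q, d = set(query.split()), set(doc.split())
    return len(q & d) / len(q)  # fraction of query terms found in the doc

query = "big data algorithms"
top2 = heapq.nlargest(2, docs, key=lambda d: score(query, d))
print(top2)  # the "best few" results, not the full list
```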

2. From sampling to full data. "Big" is the defining feature of big data. In traditional industries, statistical sampling was our main method: systematic sampling, stratified sampling, quota sampling, and so on. In the big data era, these statistical methods will increasingly cease to exist. With big data you can run statistics directly on everything you want to measure, eliminating the sampling methods of the industrial age.

3. From causality to correlation. This shift has already produced an astonishing claim in the West, "theory is dead," another bold pronouncement in the line of "God is dead," "the death of man," "the death of the author," "the end of history," and "philosophy is dead." In the past, a decision-maker facing a problem had to consult various theories and establish cause and effect before reaching a judgment. The big data era makes decisions simpler: a supermarket's data may show, in a clear chart, that whenever it rains the store sells more cake. The decision-maker does not need to know any theory or any causal mechanism; if tomorrow's forecast says rain, it is enough to stock up on cake in advance.
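To make the supermarket example concrete, here is a minimal sketch that checks whether rain and cake sales move together, with no causal model anywhere; the daily figures are invented for illustration:

```python
# Invented daily records: (rained?, cakes sold). No causal theory involved;
# we only check whether the two quantities move together.
rained = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
cakes  = [42, 25, 39, 45, 22, 27, 41, 24, 44, 26]

n = len(rained)
mean_r = sum(rained) / n
mean_c = sum(cakes) / n

cov = sum((r - mean_r) * (c - mean_c) for r, c in zip(rained, cakes)) / n
var_r = sum((r - mean_r) ** 2 for r in rained) / n
var_c = sum((c - mean_c) ** 2 for c in cakes) / n

corr = cov / (var_r ** 0.5 * var_c ** 0.5)  # Pearson correlation
print(f"rain/cake correlation: {corr:.2f}")  # strongly positive: stock cake when rain is forecast
```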

Fourth, big data and software engineering

1. How do we solve the problem of computing support for big data? Put simply, big data processing is not a small job that one or a few servers can handle. It requires massive hardware support, and that hardware is bound to be a distributed design. How, then, should the top-level system architecture be designed to process big data with high performance? And how do we satisfy the "3I" properties: Inexact, Incremental, and Inductive?

How should hardware and software cooperate in a distributed big data setting? How do we avoid losses from failures as the system scales, and keep energy consumption from spiraling out of control? These are major, thorny challenges in system design.
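The talk doesn't specify how the "Incremental" property would be met, but a common illustration is folding each new record into a running aggregate instead of recomputing from scratch. A minimal sketch under that assumption:

```python
# Incremental aggregation: fold each new record into a running mean in O(1),
# rather than rescanning all the data whenever a record arrives.
class RunningMean:
    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, x):
        self.count += 1
        # standard incremental mean update: shift the mean toward x
        self.mean += (x - self.mean) / self.count
        return self.mean

rm = RunningMean()
for value in [10, 20, 30, 40]:
    print(rm.update(value))  # 10.0, 15.0, 20.0, 25.0
```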

2. Can big data crowdsource software development? This is actually a rather crazy idea. My reading of President Huai is this: suppose that one day all of big data could be packaged for software development. The picture might look like this: machines crawl and read Sina Weibo data, Baidu index data, Baidu Tieba data, and Taobao transaction data, and from them discover users' demand curves and their various emotions; software developers then build software models from these data and hand them to operators to deploy in the cloud; users then interact with the various pieces of cloud-generated software, producing all kinds of behavior; and the machines, in turn, use that behavioral data to remodel and re-plan the software.

This would be an extremely cutting-edge interactive data-mining technology; provided the algorithmic and storage problems are solved, anything is possible. Future big data software would have no fixed form; it would be a super-ecology that reshapes itself automatically as the data changes. Requirements might no longer be driven by product managers but by algorithm engineers, letting users' needs surface naturally and then building the functionality to meet them.

At some future stage, big data might be defined like this: a true reduction of the human world, continuously fulfilling our every wish. We used to rely on it to decide a few things; eventually we will rely on it to directly accomplish the things we want done, and our every action will become part of its decision-making.

The author strongly recommends reading these articles:

A summary of the open-source tools every big data engineer must master

A senior big data expert's guide to studying core big data technologies

The skills a top big data engineer needs to master

Eight factors in the future development of big data, machine learning, and artificial intelligence

 

Source: blog.csdn.net/sdddddddddddg/article/details/91348169