What are the five key technologies of big data processing?

Big data technology is the set of techniques for quickly extracting valuable information from many types of data. The big data field has produced a large number of new technologies, which have become powerful tools for collecting, storing, processing, and presenting big data. The key technologies of big data processing generally include: big data acquisition, big data preprocessing, big data storage and management, big data analysis and mining, and big data presentation and application (big data retrieval, big data visualization, big data application, big data security, and so on).

First, big data acquisition technology

Data here means the massive structured, semi-structured (also called weakly structured), and unstructured data obtained from RFID radio-frequency data, sensor data, social network interaction data, mobile Internet data, and similar sources. It is the foundation of the big data knowledge service model. The priorities are to break through big data collection technologies such as distributed, high-speed, highly reliable data crawling or acquisition and high-speed full data imaging; to break through big data integration technologies such as high-speed data parsing, conversion, and loading; and to design quality assessment models and develop data quality technology.
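
As a rough illustration of what a very simple quality assessment model could look like, the sketch below scores a batch of collected records by field completeness. The field names, the sample records, and the choice of completeness as the metric are all assumptions made for this example; a real model would weigh many more dimensions.

```python
def completeness(records, required_fields):
    """Return, per field, the fraction of records with a non-empty value."""
    if not records:
        return {field: 0.0 for field in required_fields}
    return {
        field: sum(1 for r in records if r.get(field) not in (None, "")) / len(records)
        for field in required_fields
    }

# Hypothetical records collected from different sources.
records = [
    {"id": 1, "source": "rfid", "value": 3.2},
    {"id": 2, "source": "", "value": 1.1},      # empty source
    {"id": 3, "source": "sensor"},              # missing value
    {"id": 4, "source": "social", "value": None},
]

scores = completeness(records, ["id", "source", "value"])
print(scores)
```

A low score for a field is a hint that the collection or parsing step for that source needs attention before the data moves on.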

Big data acquisition is generally divided into two layers.

Intelligent perception layer: this mainly includes the data sensing system, the network communication system, the sensor adaptation system, the intelligent identification system, and the hardware resource access system. It performs intelligent identification, positioning, tracking, access, transmission, signal conversion, monitoring, preliminary processing, and management of massive structured, semi-structured, and unstructured data. The focus here is on technologies for intelligent identification, perception, adaptation, transmission, and access for big data sources.

Basic support layer: this provides the infrastructure environment a big data service platform needs, including virtual servers, databases for structured, semi-structured, and unstructured data, and physical network resources. The focus here is on distributed virtual storage technology; visualization interface technology for big data acquisition, storage, organization, analysis, and decision-making operations; network transmission and compression technology for big data; and big data privacy protection technology.

Second, big data preprocessing technology

Preprocessing mainly covers discriminating, extracting, and cleaning the data that has been received.

1. Extraction: because the acquired data may have many structures and types, the extraction process converts this complex data into a single structure, or one that is easy to process, so that it can be analyzed and processed quickly.

2. Cleaning: not all big data is valuable. Some of it is simply not what we care about, and some of it is outright erroneous interference. The data therefore has to be filtered and "de-noised" so that only the valid data is kept.
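
The two steps above can be sketched in a few lines of Python. This is only a minimal illustration: the input shapes, the field names `sensor` and `temp_c`, and the plausible temperature range are all assumptions made for the example.

```python
import json

# Hypothetical raw inputs arriving in different shapes.
RAW_INPUTS = [
    '{"sensor": "t1", "temp_c": 21.5}',   # JSON text
    "t2,19.0",                            # CSV-style line
    {"sensor": "t3", "temp_c": 999.0},    # dict with an implausible reading
    "t4,not-a-number",                    # corrupt line
]

def extract(raw):
    """Convert one raw input into a common {sensor, temp_c} dict, or None."""
    try:
        if isinstance(raw, dict):
            return {"sensor": str(raw["sensor"]), "temp_c": float(raw["temp_c"])}
        if raw.lstrip().startswith("{"):
            obj = json.loads(raw)
            return {"sensor": str(obj["sensor"]), "temp_c": float(obj["temp_c"])}
        sensor, temp = raw.split(",")
        return {"sensor": sensor.strip(), "temp_c": float(temp)}
    except (KeyError, ValueError, TypeError):
        return None  # could not be brought into the common structure

def clean(records, low=-50.0, high=60.0):
    """Drop unparsable records and readings outside an assumed valid range."""
    return [r for r in records if r is not None and low <= r["temp_c"] <= high]

extracted = [extract(raw) for raw in RAW_INPUTS]
valid = clean(extracted)
print(valid)
```

Extraction brings every input into one structure, and cleaning then filters out the corrupt line and the implausible reading, leaving only the records worth analyzing.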

Third, big data storage and management technology

Big data storage and management means using storage devices to store the collected data, building the corresponding databases, and managing and calling that data. The focus is on management and processing technology for complex structured, semi-structured, and unstructured big data, and mainly on several key issues: making big data storable, representable, and processable, and ensuring its reliability and efficient transmission. This involves developing reliable distributed file systems (DFS), energy-efficiency-optimized storage, computation integrated into storage, big data de-redundancy, and efficient, low-cost big data storage technology; breaking through distributed non-relational big data management and processing technology, data fusion technology for heterogeneous data, and data organization technology, and researching big data modeling technology; breaking through big data indexing technology; breaking through big data movement, backup, and replication technology; and developing big data visualization technology.

New database technology also needs to be developed. Databases are divided into relational databases, non-relational databases, and database caching systems. Non-relational databases mainly means NoSQL databases, which are divided into key-value databases, column-oriented databases, document databases, and graph databases. Relational databases include traditional relational database systems and NewSQL databases.
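
To make the NoSQL categories a little more concrete, here is a toy sketch that models the same kind of record in four shapes using only Python built-ins. No real database is involved, and the record contents and helper names are invented for illustration.

```python
# Key-value: an opaque value looked up by a single key.
key_value = {"user:1": '{"name": "Ann", "city": "Oslo"}'}

# Document: one nested, self-describing document per record.
document = {"_id": 1, "name": "Ann", "address": {"city": "Oslo"}}

# Column-oriented: the values of each column are stored together.
columns = {"id": [1, 2], "name": ["Ann", "Bo"], "city": ["Oslo", "Rome"]}

# Graph: nodes plus labeled edges between them.
nodes = {1: {"name": "Ann"}, 2: {"name": "Bo"}}
edges = [(1, "FOLLOWS", 2)]

def cities(column_store):
    """Read a single column without touching the others."""
    return column_store["city"]

def followed_by(node_id):
    """Names of the nodes that node_id follows."""
    return [nodes[dst]["name"] for src, rel, dst in edges
            if src == node_id and rel == "FOLLOWS"]

print(cities(columns))
print(followed_by(1))
```

The point is only the difference in shape: each model makes a different kind of access cheap, such as lookup by key, reading whole documents, scanning one column, or following relationships.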

Big data security technology needs to be developed as well: improve technologies such as data destruction, transparent encryption and decryption, distributed access control, and data auditing; and break through technologies such as privacy protection and inference control, data authenticity identification and forensics, and verification of data possession and integrity.

Fourth, big data analysis and mining technology

Big data analysis technology. Improve existing data mining and machine learning techniques; develop new data mining techniques such as data network mining, special group mining, and graph mining; break through big data fusion technologies such as object-based data connection and similarity connection; and break through domain-oriented big data mining technologies such as user interest analysis, network behavior analysis, and sentiment and semantic analysis.

Data mining is the process of extracting information and knowledge that is hidden, not known in advance, but potentially useful from large amounts of incomplete, noisy, fuzzy, and random real-world application data. Data mining involves many techniques, and they can be classified in several ways.

By mining task, it can be divided into classification or prediction model discovery, data summarization, clustering, association rule discovery, sequential pattern discovery, dependency or dependency model discovery, and anomaly and trend discovery;

By mining object, it can be divided into relational databases, object-oriented databases, spatial databases, temporal databases, text data sources, multimedia databases, heterogeneous databases, legacy databases, and the Web;

By mining method, it can be roughly divided into machine learning methods, statistical methods, neural network methods, and database methods. Machine learning methods can be subdivided into inductive learning (decision trees, rule induction, etc.), example-based learning, and genetic algorithms. Statistical methods can be subdivided into regression analysis (multiple regression, autoregression, etc.), discriminant analysis (Bayesian discrimination, Fisher discrimination, non-parametric discrimination, etc.), cluster analysis (hierarchical clustering, dynamic clustering, etc.), and exploratory analysis (principal component analysis, correlation analysis, etc.). Neural network methods can be subdivided into feedforward neural networks (BP algorithm, etc.) and self-organizing neural networks (self-organizing feature maps, competitive learning, etc.). Database methods are mainly multidimensional data analysis or OLAP methods, along with attribute-oriented induction.
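
As one small instance of the cluster analysis listed under statistical methods, the sketch below runs plain k-means on one-dimensional data. K-means is my own choice for illustration, not something the text prescribes, and the sample values and starting centers are made up.

```python
def kmeans_1d(values, centers, max_iter=100):
    """Cluster 1-D values around the given starting centers (plain k-means)."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(max_iter):
        # Assignment step: attach each value to its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster.
        new_centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # converged: assignments can no longer change
        centers = new_centers
    return centers, clusters

values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centers, clusters = kmeans_1d(values, centers=[0.0, 5.0])
print(centers, clusters)
```

The loop alternates between assigning points and moving centers until nothing changes, which is why the result depends on the starting centers as well as on the data.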

From the perspective of mining tasks and mining methods, the focus is on breakthroughs in the following:

1. Visual analysis. Whether for ordinary users or data analysts, data visualization is the most basic function. Presenting data graphically lets the data speak for itself and lets users grasp the results intuitively.

2. Data mining algorithms. Visualization translates machine language for people to read, while data mining is the machine's native language. Segmentation, clustering, outlier analysis, and a wide variety of other algorithms let us refine data and mine its value. These algorithms must cope with the volume of big data and still process it at high speed.

3. Predictive analysis. Predictive analysis lets analysts make forward-looking judgments based on the results of visual analysis and data mining.

4. Semantic engine. A semantic engine needs enough artificial intelligence to actively extract information from data. The language processing technologies involved include machine translation, sentiment analysis, public opinion analysis, intelligent input, and question answering systems.

5. Data quality and data management. Data quality and data management are a management best practice: processing data through standardized processes and tools helps ensure analytical results of a preset quality.
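
The outlier analysis mentioned in item 2 can be illustrated with a simple z-score check. This is only a sketch under assumptions: the threshold of 2 standard deviations and the sample readings are invented, and real workloads would typically need more robust or distributed methods.

```python
from statistics import mean, pstdev

def find_outliers(values, threshold=2.0):
    """Return the values whose z-score exceeds the given threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, so nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical readings with one obviously abnormal value.
readings = [10, 11, 9, 10, 12, 10, 11, 95]
outliers = find_outliers(readings)
print(outliers)
```

Note that a single extreme value also inflates the mean and the standard deviation, which is one reason median-based variants are often preferred in practice.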

Fifth, big data presentation and application technology

Big data technology can mine the information and knowledge hidden in massive data and provide a basis for human social and economic activity, thereby improving operational efficiency in every field and greatly raising the degree of intensification of the whole social economy.

In China, big data will be applied mainly in three areas: business intelligence, government decision-making, and public services. Examples include business intelligence technology, government decision-making technology, telecommunications data processing and mining technology, network data processing and mining technology, meteorological information analysis technology, environmental monitoring technology, police cloud application systems (public security information systems such as road monitoring, video surveillance, network monitoring, intelligent transportation, anti-telecom-fraud, and command and dispatch), large-scale gene sequence analysis and comparison technology, Web information mining technology, multimedia data parallel processing technology, film and video production rendering technology, and cloud computing and massive data processing application technologies for various other industries.


Origin blog.csdn.net/kangshifu66/article/details/93782965