Big Data Lecture 1: Data Flow in Big Data

This article briefly describes how data is generated, processed, and turned into value. As the first lecture in ForeNose's big data training series, it covers foundational knowledge that big data enthusiasts will not want to miss!

With the rapid development of information technology, big data applications have penetrated every area of daily life, and everyone is in direct or indirect contact with big data, which shows the importance of big data technology.

For workers in the IT industry, the field of big data is both fascinating and mysterious. The editor attended ForeNose's internal staff training, took detailed notes, and shares the knowledge points below.

(1) Generation of data

     ① Web service protocols. The Web (World Wide Web) is a global, dynamic, interactive, cross-platform, distributed graphic information system built on hypertext and HTTP.

     It is a network service built on the Internet that provides browsers with a graphical, intuitive, easy-to-access interface for finding and browsing information. Its documents and hyperlinks organize the information nodes on the Internet into an interconnected network structure.

     The protocols involved include HTTP-GET, HTTP-POST, and SOAP.

     Each protocol consists of a series of HTTP request headers, which, together with other information, define what the client is requesting from the server; on success, the server responds with a series of HTTP response headers and the requested data.
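As a minimal sketch of the request/response exchange described above, the following Python snippet assembles a raw HTTP-GET request by hand so the header structure is visible. The host and path are hypothetical examples, not part of the original lecture.

```python
# Minimal sketch: format a raw HTTP-GET request by hand so the
# request-header structure described above is visible.
# The host and path are hypothetical examples.
def build_get_request(host: str, path: str) -> str:
    """Return the raw text of a simple HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",   # request line: method, resource, version
        f"Host: {host}",          # which host the request is addressed to
        "Accept: text/html",      # what content type the client wants
        "Connection: close",      # close the connection after the response
    ]
    return "\r\n".join(lines) + "\r\n\r\n"  # blank line ends the headers

print(build_get_request("www.example.com", "/index.html"))
```

A real client would send this text over a TCP socket and then read the status line, response headers, and body that the server sends back.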

      ② Sensor data. For example, camera data: supermarkets, government offices, and enterprises install cameras, and the data those cameras store is sensor data.

      ③ Data source media, including barcodes, QR codes, and radio-frequency (RFID) codes.

      ④ System data, including log data and monitoring data. When crawler software collects data, the log records the history of the collection process and is used to manage the collection runs.
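To illustrate the kind of log data a crawler produces, here is a generic sketch using Python's standard `logging` module (this is not ForeSpider's actual logging; the URLs and byte count are made-up example values):

```python
import io
import logging

# Generic illustration (not ForeSpider's actual logging): a crawler
# recording its collection history to a log stream.
# The URLs and byte count are made-up example values.
buf = io.StringIO()
handler = logging.StreamHandler(buf)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
log = logging.getLogger("crawler_demo")
log.setLevel(logging.INFO)
log.addHandler(handler)

log.info("fetched %s (%d bytes)", "http://example.com/page1", 2048)
log.warning("retrying %s after timeout", "http://example.com/page2")

print(buf.getvalue())
```

Each record captures what happened during collection, so the run can be audited or resumed later.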

 

(2) Data processing

       ① Storage. Objects of data storage include the temporary files generated while processing a data stream and the information that needs to be searched during processing.

       ② Cleaning. Cleaning removes junk data from the data set, thereby improving data quality.

Crawler software such as ForeSpider uses collection templates to harvest search engines and mine feature information across the whole web; collection, mining, rearrangement, cleaning, weighting analysis, and storage are completed in one synchronized pass. In this pipeline, the cleaning step removes duplicate data and junk data.
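An illustrative sketch of the cleaning step (not ForeSpider's actual implementation): drop empty "junk" rows and exact duplicates while preserving first-seen order.

```python
# Illustrative sketch of the cleaning step (not ForeSpider's actual
# implementation): drop empty rows and exact duplicates,
# keeping the first occurrence of each record.
def clean_records(records):
    seen = set()
    cleaned = []
    for rec in records:
        text = rec.strip()
        if not text:        # junk data: empty or whitespace-only row
            continue
        if text in seen:    # duplicate of a row already kept
            continue
        seen.add(text)
        cleaned.append(text)
    return cleaned

rows = ["apple", " apple ", "", "banana", "apple"]
print(clean_records(rows))  # ['apple', 'banana']
```

Real cleaning pipelines add fuzzier rules (near-duplicate detection, format validation), but the goal is the same: fewer, higher-quality records.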

 

     ③ Mining. Data mining generally refers to the process of discovering information hidden in large amounts of data by means of algorithms. When collecting data with crawler software, deciding whether a page is needed according to keywords in its text is a form of data mining.
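A toy example of that keyword-based judgment: keep only documents whose text contains any of the wanted keywords. The sample documents and keyword are hypothetical.

```python
# Toy example of keyword-based filtering during collection:
# keep only documents whose text contains any wanted keyword.
# The sample documents and keyword are hypothetical.
def matches_keywords(text, keywords):
    lowered = text.lower()
    return any(kw in lowered for kw in keywords)

docs = [
    "Big data training lecture",
    "Weekend cooking recipes",
    "Introduction to data mining",
]
wanted = [d for d in docs if matches_keywords(d, ["data"])]
print(wanted)  # ['Big data training lecture', 'Introduction to data mining']
```

Production systems replace the substring test with scoring or classification models, but the filtering idea is the same.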

      ④ Simulation / learning. Analog data are continuously changing values collected by sensors, such as temperature, pressure, and the sounds and images carried in telephone, radio, and television broadcasts.


(3) The value of data

     ① Graphs. Huge volumes of data are collected and displayed intuitively in chart form. ForeSpider exports data as CSV files, which can be opened in Excel; it can also collect unstructured data such as pictures, files, videos, and reports.
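The CSV format mentioned above can be produced with Python's standard `csv` module, as in this sketch (the rows are made-up sample data, not ForeSpider output):

```python
import csv
import io

# Sketch of the CSV export format mentioned above: write records with
# Python's csv module, producing a file Excel can open directly.
# The rows are made-up sample data, not ForeSpider output.
rows = [
    ("product", "price"),
    ("tea", 12),
    ("coffee", 15),
]
buf = io.StringIO()
csv.writer(buf).writerows(rows)
print(buf.getvalue())
```

In practice the writer would target a file on disk rather than an in-memory buffer, and the resulting `.csv` opens in Excel or any charting tool.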

      ② Prediction, which covers both models and their guiding significance.


       Models include deterministic models and probabilistic models: a deterministic model corresponds to an inevitable event, while a probabilistic model corresponds to an event that occurs with some probability.
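A toy contrast between the two kinds of model (the linear trend `2*x + 1` is an arbitrary example, not from the lecture):

```python
import random

# Toy contrast between the two model kinds described above.
# The linear trend 2*x + 1 is an arbitrary example.
def deterministic_model(x):
    """Same input always yields the same output (an 'inevitable' event)."""
    return 2 * x + 1

def probabilistic_model(x, rng):
    """Same trend plus random noise: the output varies from run to run."""
    return 2 * x + 1 + rng.gauss(0, 0.5)

rng = random.Random(42)
print(deterministic_model(3))        # always 7
print(probabilistic_model(3, rng))   # 7 plus some random noise
```

The deterministic model is reproducible; the probabilistic model can only be characterized by a distribution over outcomes.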

       Guiding significance refers to the application of the data, for example in autonomous driving. Big data is applied in a very wide range of fields: during the editor's time at ForeNose, customers needed all kinds of collected data, such as Taobao e-commerce data, information about government tenders, and data from news websites.


In a word, the field of big data is both mysterious and attractive. As an internal company benefit, the editor will regularly share training results like these with you. Let's follow the ForeNose team on an adventure into the field of big data!


ForeNose Big Data: Deep Big Data Experts

ForeNose (www.forenose.com) is the first deep big data expert,

providing data collection, analysis, processing, management, marketing, and application:

a full suite of big data products with independent intellectual property rights.

