Big data weekly meeting 014 - summary of this week's learning content

Meeting time: 2023.05.21, 15:00 (offline meeting)

Table of contents

01【fhzn project】

02【Apache NiFi】

03【Data Collection-Research】

3.1 [Data collection: flow chart, architecture diagram, usage scenarios]

3.2 [Common data collection technologies and their implementation methods]

3.3 [Data collection technologies usable in the college entrance examination big data project]

04【Patent】


01【fhzn project】

  1. GitLab
  2. MyBatis-Plus

02【Apache NiFi】

  1. Case 1: File synchronization. Requirement: synchronize local disk files to HDFS; NiFi automatically monitors the disk directory and uploads new files to the corresponding HDFS folders.
  2. Case 2: Offline synchronization of MySQL data to HDFS. Requirement: export the MySQL data, convert it to JSON strings, and save them to HDFS.
  3. Case 3: Real-time ingestion of Kafka data into HDFS. Requirement: monitor Kafka topics in real time and write the incoming data to HDFS. (A rough processor-chain sketch for all three cases follows this list.)
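
As a rough sketch, the three cases map to chains of standard NiFi processors roughly as follows (GetFile, ExecuteSQL, ConvertAvroToJSON, ConsumeKafka, and PutHDFS are standard NiFi processors; exact names, such as the versioned ConsumeKafka variants, depend on the NiFi release):

    Case 1: GetFile (watch the local directory)    -> PutHDFS (target HDFS folder)
    Case 2: ExecuteSQL (query MySQL, Avro output)  -> ConvertAvroToJSON -> PutHDFS
    Case 3: ConsumeKafka (subscribe to the topic)  -> PutHDFS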

03【Data Collection-Research】

Share "data collection", flow charts, architecture diagrams, usage scenarios, five minutes.

3.1 [Data collection: flow chart, architecture diagram, usage scenarios]

Data collection flow chart:

  1. Identify data needs and goals.
  2. Select data sources.
  3. Design the data collection scheme and techniques.
  4. Implement the data collection plan to obtain raw data.
  5. Clean and preprocess the data.
  6. Store and manage the data.
  7. Analyze and apply the data.

Data collection architecture diagram:

    +---------------------------------+
    |          Data sources           |
    +---------------------------------+
                    |
                    v
    +---------------------------------+
    |      Data collection tools      |
    +---------------------------------+
                    |
                    v
    +---------------------------------+
    | Data cleaning and preprocessing |
    +---------------------------------+
                    |
                    v
    +---------------------------------+
    |      Data storage systems       |
    +---------------------------------+
                    |
                    v
    +---------------------------------+
    | Data analysis and applications  |
    +---------------------------------+

Common data collection usage scenarios:

  1. E-commerce: Collect sales data, user behavior data, and product information from online stores for market research, recommendation systems, and personalized marketing.
  2. Social media: Collect user-generated data from platforms such as Twitter and Facebook for sentiment analysis, social network analysis, and public opinion monitoring.
  3. Internet of Things: Collect real-time data generated by sensors for monitoring and controlling IoT applications and smart devices.
  4. Finance: Collect data from financial markets, exchanges, banks, and other institutions for risk management, investment analysis, and trading decisions.
  5. Healthcare: Collect medical records, biosensor data, and health device data for disease prediction, patient monitoring, and medical research.
  6. Education: Collect student learning data and school management data for student performance evaluation, teaching optimization, and education policy formulation.

These scenarios are just a few examples of data collection; the range of practical applications is much wider. Based on specific business needs, you can design a suitable data collection process and architecture and use appropriate tools and technologies to implement it.

3.2 [Common data collection technologies and their implementation methods]

In the field of big data, a variety of technologies and methods are used for data collection; which one to choose depends on the data source and the collection requirements. The following are several common data collection techniques and how to implement them:

  1. Web crawler: A web crawler is an automated program that extracts information from web pages on the Internet. It collects data by sending HTTP requests and parsing the returned HTML pages. Crawlers can be implemented in a programming language such as Python, using libraries such as BeautifulSoup or Scrapy to parse and extract page data (a crawler sketch follows this list).
  2. Database connectivity: When data is stored in a structured database, database connectivity techniques can be used for collection. After establishing a connection to the database, query statements can be executed to extract the required data. Common approaches include Java's JDBC (Java Database Connectivity) and Python's SQLAlchemy (see the SQLAlchemy sketch below).
  3. File import: When the data exists in the form of files, such as CSV (comma-separated values) files, Excel files, or plain-text files, it can be collected by file import. The file I/O functions of most programming languages can read the file content and load the data into memory for subsequent processing (a CSV example appears below).
  4. API calls: Many applications and services provide APIs (application programming interfaces) for accessing their data in a structured way. An API typically exposes a set of HTTP request methods (GET, POST, PUT, DELETE, and so on); the required data is obtained by sending a request and parsing the response. Depending on the type and specification of the API, various programming languages and libraries can be used to make the calls (see the API example below).
  5. Sensor data collection: In the IoT and sensor networks, large numbers of sensors generate real-time data, measuring environmental parameters such as temperature, humidity, and pressure. Collection can be done through a physical connection to the sensor or through the network interface the sensor provides. The collected data can be stored directly in a database or passed to downstream systems through a message queue (an MQTT-based sketch follows the list).
  6. Log file analysis: Many systems and applications generate log files containing runtime information and event records. Useful data can be extracted by periodically scanning these files and parsing the key information in them, or by using specialized log analysis tools (a regex example follows the list).
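
For item 1, a minimal crawler sketch in Python using requests plus BeautifulSoup; the URL and CSS selector are placeholders for illustration:

    # Minimal crawler sketch: fetch a page and extract text by CSS selector.
    import requests
    from bs4 import BeautifulSoup

    resp = requests.get("https://example.com/products", timeout=10)  # placeholder URL
    resp.raise_for_status()

    soup = BeautifulSoup(resp.text, "html.parser")
    for item in soup.select(".product-title"):  # placeholder selector
        print(item.get_text(strip=True))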
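
For item 2, a minimal SQLAlchemy sketch; the connection string, table, and columns are assumptions for illustration (the mysql+pymysql driver is assumed to be installed):

    # Query a MySQL table through SQLAlchemy and iterate over the result rows.
    from sqlalchemy import create_engine, text

    engine = create_engine("mysql+pymysql://user:password@localhost:3306/shop")  # placeholder DSN
    with engine.connect() as conn:
        rows = conn.execute(text("SELECT id, amount FROM orders WHERE amount > :min"),
                            {"min": 100})
        for row in rows:
            print(row.id, row.amount)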
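
For item 3, reading a CSV file with the Python standard library; the file name is a placeholder:

    # Load a CSV file into memory; the first row is treated as the header.
    import csv

    with open("data.csv", newline="", encoding="utf-8") as f:
        records = list(csv.DictReader(f))  # each row becomes a dict

    print(len(records), "rows loaded")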
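
For item 4, calling a hypothetical REST API with requests and parsing the JSON response; the endpoint, parameters, and response shape are all assumptions:

    # Send a GET request to a (hypothetical) API and walk the JSON payload.
    import requests

    resp = requests.get(
        "https://api.example.com/v1/metrics",              # placeholder endpoint
        params={"from": "2023-05-01", "to": "2023-05-21"},
        headers={"Authorization": "Bearer <token>"},       # placeholder token
        timeout=10,
    )
    resp.raise_for_status()
    for record in resp.json().get("data", []):
        print(record)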
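
For item 5, a sketch of subscribing to sensor readings over MQTT, assuming the paho-mqtt 1.x client API; the broker address and topic are placeholders:

    # Subscribe to an MQTT topic and print each sensor reading as it arrives.
    import paho.mqtt.client as mqtt

    def on_message(client, userdata, msg):
        print(msg.topic, msg.payload.decode())  # e.g. a temperature reading

    client = mqtt.Client()                      # paho-mqtt 1.x constructor
    client.on_message = on_message
    client.connect("broker.example.com", 1883)  # placeholder broker
    client.subscribe("sensors/+/temperature")   # placeholder topic pattern
    client.loop_forever()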
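
For item 6, extracting fields from an access log with a regular expression; the nginx/Apache-style log format is an assumption:

    # Pull the IP, request path, and status code out of each access-log line.
    import re

    pattern = re.compile(r'(?P<ip>\S+) .* "\w+ (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})')

    with open("access.log", encoding="utf-8") as f:
        for line in f:
            m = pattern.search(line)
            if m:
                print(m.group("ip"), m.group("path"), m.group("status"))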

In general, data collection techniques and implementations vary with the source and format of the data. Common techniques include web crawlers, database connections, file imports, API calls, sensor data collection, and log file analysis, among others. Select the appropriate technology and tools for the specific needs, and make sure the collected data meets the requirements of subsequent analysis and processing.

3.3 [Data collection technologies usable in the college entrance examination big data project]

In the college entrance examination big data project, the following technologies can be used for data collection:

  1. Official data sources: Relevant data such as college entrance examination scores is usually provided by education departments or admissions examination institutions. Data can be obtained directly from these official agencies, which may require applying for access or establishing a cooperation. Official sources are the best way to obtain accurate, authoritative data.
  2. Web crawler: If an official source does not offer external access, or obtaining the data from it is inconvenient, web crawler technology can be used to collect data from relevant education or admissions examination websites. By crawling score information and examinee data on the web pages, college entrance examination data can be obtained within a certain scope.
  3. API calls: Some education departments or admissions examination institutions may provide APIs for accessing their data. Check their open API documentation to learn how to use the API for collection; through API calls, college entrance examination data can be obtained in a structured way (a paginated-fetch sketch follows this list).
  4. Partner data sharing: Establish partnerships with schools, educational institutions, or other educational data providers to obtain more comprehensive and accurate college entrance examination data. Such partners may have broader data sources and more detailed content, enabling deeper analysis.
  5. School system data extraction: Cooperate with individual schools to extract college entrance examination-related data from their educational administration or student information systems. This requires a data sharing agreement with the school and ensuring the compliance and privacy protection of the collection.
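
For item 3 above, a hedged sketch of paging through a hypothetical admissions-data API until it returns an empty batch; the endpoint, parameters, and response shape are all assumptions:

    # Fetch every page from a (hypothetical) paginated API.
    import requests

    def fetch_all(base_url, page_size=100):
        page, results = 1, []
        while True:
            resp = requests.get(base_url,
                                params={"page": page, "size": page_size},
                                timeout=10)
            resp.raise_for_status()
            batch = resp.json().get("records", [])
            if not batch:
                break
            results.extend(batch)
            page += 1
        return results

    data = fetch_all("https://api.example.edu/v1/scores")  # placeholder URL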

No matter which data collection technology is chosen, relevant laws, regulations, and privacy policies must be observed to protect personal information and data security. Sensitive information must be handled and stored carefully, and cooperation and communication with data providers or relevant institutions should be maintained to ensure the accuracy and reliability of the data.

04【Patent】

Smart Backend and Architecture

Data analysis, data visualization, data…
