Big data weekly meeting - summary of learning content this week 018

Meeting time: 2023.06.18 15:00 offline meeting

01【Research-Data Analysis (Quality, ETL, Visualization)】

ETL, the abbreviation of Extract-Transform-Load, describes the process of extracting, transforming, and loading data from a source to a destination. The term ETL is most often used in connection with data warehouses, but its use is not limited to data warehouses.

Data analysis is the process of collecting, cleaning, organizing and interpreting data to extract valuable information and insights. In data analysis, there are several important aspects to consider, including data quality, ETL (extract, transform, and load), and visualization.

  1. Data Quality: Data quality refers to how accurate, complete, consistent, and reliable data is. In data analysis, data quality is crucial for drawing accurate conclusions and making correct decisions. Here are some common ways to ensure data quality:
    1. Data cleaning: remove duplicates, null values, outliers, and erroneous records from the data.
    2. Data validation: check whether the data conforms to predefined rules and constraints.
    3. Data integration: combine data from different sources while ensuring consistency and integrity.
    4. Data review: assess data accuracy, completeness, and consistency, and address potential data quality issues.
  2. ETL (Extract, Transform, Load): ETL refers to the process of extracting data from various sources (such as databases, log files, APIs, etc.), performing the necessary transformation and cleaning, and then loading the result into a target system for analysis. The main steps of the ETL process are listed below (a minimal code sketch follows after this list):
    1. Data extraction: pull data from various sources, usually via queries, API calls, or file imports.
    2. Data transformation: clean, integrate, transform, and normalize the extracted data to meet the needs of the analysis.
    3. Data loading: load the transformed data into the target system (such as a data warehouse or data lake) for further analysis and visualization.
  3. Visualization: Visualization is the process of turning data into charts, graphs, and other visual elements to better understand and communicate the patterns, trends, and insights in the data. Here are some common methods and tools for visualization:
    1. Charts and graphs: use chart types such as column charts, line charts, pie charts, and scatter charts to present the data.
    2. Dashboards: combine multiple charts and indicators to provide a comprehensive, real-time view of the data.
    3. Data visualization tools: tools such as Tableau, Power BI, Matplotlib, and D3.js provide rich visualization features and interactivity to help users explore and interpret data (a small matplotlib sketch appears after the summary paragraph below).
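
As a concrete illustration of the data-cleaning and ETL steps described above, here is a minimal Python sketch using pandas and SQLite. It is not the project's actual pipeline; the file name sales.csv, the column names, and the target table fact_sales are hypothetical.

```python
import sqlite3

import pandas as pd

# Extract: read raw data from a (hypothetical) CSV source.
raw = pd.read_csv("sales.csv")  # assumed columns: order_id, region, amount

# Transform / clean: deduplicate, drop nulls, filter obvious outliers.
df = raw.drop_duplicates(subset="order_id")
df = df.dropna(subset=["region", "amount"])
df = df[df["amount"].between(0, 1_000_000)]  # simple validation rule

# Normalize a field so downstream analysis is consistent.
df["region"] = df["region"].str.strip().str.upper()

# Load: write the cleaned data into a target store (SQLite stands in for a warehouse).
with sqlite3.connect("warehouse.db") as conn:
    df.to_sql("fact_sales", conn, if_exists="replace", index=False)
```

In a real pipeline the extract step would more likely be a database query or API call, and the load target would be a data warehouse or data lake rather than a local SQLite file.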

To sum up, the data analysis process involves ensuring data quality, conducting ETL, and performing data visualization to obtain accurate, reliable, and meaningful results and insights.
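
For the visualization step, here is a minimal matplotlib sketch of a column chart; the dataset and numbers below are made up purely for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# A small illustrative dataset (hypothetical values).
df = pd.DataFrame({
    "region": ["NORTH", "SOUTH", "EAST", "WEST"],
    "amount": [1200, 950, 700, 430],
})

# Column chart: one of the basic chart types mentioned above.
fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(df["region"], df["amount"])
ax.set_xlabel("Region")
ax.set_ylabel("Total amount")
ax.set_title("Sales amount by region (example data)")
fig.tight_layout()
plt.show()
```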

Introduction to and comparison of three commonly used ETL tools: DataStage, Informatica, and Kettle

1.1 [Flowchart]

1.2 [Structure Diagram]

1.3 [Usage scenarios]

1.4 [Technical Architecture]

02【fhzn project】

2.1 [es multi-dimensional search interface]

Designed the multi-dimensional retrieval scheme and the ES (Elasticsearch) multi-condition query interface.

Wrote the interface and committed the code to git.
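
The notes do not include the interface code itself, so the following is only a sketch of what a multi-condition ES query could look like with the Python Elasticsearch client (assuming the 8.x client API); the index name device_events and all field names are hypothetical.

```python
from elasticsearch import Elasticsearch

# Connect to a (hypothetical) local cluster.
es = Elasticsearch("http://localhost:9200")

# Multi-dimensional retrieval: combine several conditions in one bool query.
query = {
    "bool": {
        "must": [
            {"match": {"device_name": "camera"}},  # full-text condition
        ],
        "filter": [
            {"term": {"status": "online"}},  # exact-value condition
            {"range": {"created_at": {"gte": "2023-06-01", "lte": "2023-06-18"}}},
        ],
    }
}

resp = es.search(index="device_events", query=query, size=20)
for hit in resp["hits"]["hits"]:
    print(hit["_id"], hit["_source"])
```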

2.2 [AI Algorithm Library Data Arrangement]

In the first phase of the task, refer to the two recommended documents and abstract an overall structure for the AI field, including the different types of algorithms and the commonly used algorithms within each type. The summary can be divided into four parts, as follows.

  1. Part 1: an overall description of the development of the AI field, from the early expert systems to the later neural networks.
  2. Part 2: explanations of common concepts and terms in the AI field, such as training, loss, evaluator, and optimizer (a small illustrative sketch follows below).
  3. Part 3: the theoretical level, covering common algorithms for tasks such as regression and classification.
  4. Part 4: a list of the currently hot AI sub-fields in industry, such as NLP and computer vision, each further subdivided into more detailed directions with the common algorithms in each direction; first abstract the overall skeleton of the library, then fill in the content.

Neural Network and Deep Learning-Qiu Xipeng.pdf
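
To make the terms in Part 2 (training, loss, optimizer) concrete, here is a tiny illustrative sketch that is not taken from the referenced documents: linear regression trained by plain gradient descent, with mean squared error as the loss.

```python
import numpy as np

# Toy data: y = 2x + 1 with a little noise (illustrative only).
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=100)

# Model parameters to be learned.
w, b = 0.0, 0.0
lr = 0.1  # learning rate used by the (plain gradient descent) optimizer

for epoch in range(200):  # training loop
    y_pred = w * x + b
    loss = np.mean((y_pred - y) ** 2)       # loss: mean squared error
    grad_w = np.mean(2 * (y_pred - y) * x)  # gradient of the loss w.r.t. w
    grad_b = np.mean(2 * (y_pred - y))      # gradient of the loss w.r.t. b
    w -= lr * grad_w                        # optimizer step
    b -= lr * grad_b

print(f"learned w={w:.2f}, b={b:.2f}")  # should end up close to 2 and 1
```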

03【Patent】

FastDFS, round-robin mechanism.
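
The note above is only a keyword list. As an illustration of the round-robin idea itself (independent of any real FastDFS API), here is a hypothetical sketch of picking a storage server in rotation; the server addresses are made up.

```python
from itertools import cycle

# Hypothetical storage server addresses (illustrative only).
storage_servers = ["192.168.1.11:23000", "192.168.1.12:23000", "192.168.1.13:23000"]

# Round robin: hand out servers one after another, wrapping around at the end.
rr = cycle(storage_servers)

def pick_storage_server() -> str:
    """Return the next storage server in round-robin order."""
    return next(rr)

# Example: ten uploads are spread evenly across the three servers.
for i in range(10):
    print(f"upload {i} -> {pick_storage_server()}")
```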

04【Learning content】

4.1【Big Data】

  1. Shang Silicon Valley Big Data Flink 1.17 Practical Tutorial - Notes 01 [Flink Overview, Flink Quick Start]
  2. Shang Silicon Valley Big Data Flink 1.17 Practical Tutorial - Notes 02 [Flink Deployment]

Origin blog.csdn.net/weixin_44949135/article/details/131276349