What is data analysis? What is the data analysis process?

1. What is data analysis

Data analysis means applying statistical methods to large amounts of data: studying and summarizing the data in detail, extracting valuable information, and forming sound conclusions that inform business decisions.

2. The importance of data analysis

If we can't quantify something, we can't really understand it; if we can't understand it, we can't really control it; if we can't control it, we can't really change it.

In the era of big data, the human brain cannot take in the full complexity of the data, but data analysis can interpret what it means; faced with unknown factors that are hard to control, data analysis can uncover patterns that support prediction.

Data analysis compensates for our overconfidence in intuition and helps us think about problems and make decisions more scientifically and rationally.

3. The role of data analysis

Current situation analysis: what happened in the past? For example, diagnose business conditions through descriptive statistics.

Cause analysis: why did it happen? For example, use methods such as dimension breakdown and metric breakdown, combined with the actual business context, to locate business anomalies.

Predictive analysis: what might happen in the future? For example, use user behavior data to predict which users are about to churn, and take measures to retain them.
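As a small illustration of current situation analysis, the sketch below computes descriptive statistics with Python's standard `statistics` module. The `daily_orders` figures are made up for the example and do not come from any real business.

```python
import statistics

# Hypothetical daily order counts for one week (illustrative data only).
daily_orders = [120, 135, 128, 90, 150, 160, 142]

summary = {
    "count": len(daily_orders),
    "mean": statistics.mean(daily_orders),
    "median": statistics.median(daily_orders),
    "stdev": statistics.stdev(daily_orders),  # sample standard deviation
    "min": min(daily_orders),
    "max": max(daily_orders),
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

A low minimum next to a much higher median, as here, is the kind of signal that would lead on to cause analysis for that day.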

4. How to analyze data?

1. Clarify the purpose and thinking of the analysis

The way you think determines the result. Clarify the purpose of the analysis and form a clear thinking framework, so that you avoid analysis for the sake of analysis.

2. Data collection

Collect relevant data sets based on the purpose of the analysis. Most are internal company data, but external data may also be involved.

Database: relational database (RDBMS, data fetched with SQL), data warehouse (data fetched with HiveSQL)

File: Excel, CSV, TXT, etc.

System/platform: manual export, or Python automation scripts such as Selenium

Internet: web crawlers

API: the requests library, parsing JSON responses, etc.
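Two of the sources above, a SQL database and a JSON API response, can be sketched with the standard library alone. This is a minimal sketch under stated assumptions: the in-memory SQLite database stands in for a company database, and the `orders` table and the JSON `payload` are hypothetical. A real project would connect to its own database and fetch the payload over HTTP.

```python
import json
import sqlite3

# In-memory SQLite database standing in for a company database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, user_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 101, 25.0), (2, 102, 40.0), (3, 101, 15.5)],
)

# Fetch data with SQL: total spend per user.
rows = conn.execute(
    "SELECT user_id, SUM(amount) FROM orders GROUP BY user_id ORDER BY user_id"
).fetchall()
conn.close()

# A JSON payload such as an API might return (hypothetical shape).
payload = '{"users": [{"id": 101, "city": "Beijing"}, {"id": 102, "city": "Shanghai"}]}'
users = json.loads(payload)["users"]
city_by_user = {u["id"]: u["city"] for u in users}

print(rows)
print(city_by_user)
```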

3. Data cleaning

Organize the data into a clean, tidy structure and format that suits the subsequent analysis. The data may be scattered, so multiple data sets often need to be integrated.

Handling outliers, erroneous values, and missing values
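One common way to handle missing values and outliers is sketched below. Filling with the median and flagging outliers with the 1.5 × IQR rule are choices made for this example, not the only valid ones, and the `raw` values are hypothetical.

```python
import statistics

# Hypothetical order amounts; None marks a missing value.
raw = [52.0, 48.5, None, 50.0, 47.0, 980.0, 51.5, None, 49.0]

# 1. Missing values: fill with the median of the observed values.
observed = [x for x in raw if x is not None]
median = statistics.median(observed)
filled = [median if x is None else x for x in raw]

# 2. Outliers: flag values more than 1.5 * IQR outside the quartiles.
q1, _, q3 = statistics.quantiles(filled, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
cleaned = [x for x in filled if low <= x <= high]
outliers = [x for x in filled if not (low <= x <= high)]

print("outliers:", outliers)
print("cleaned:", cleaned)
```

Whether a flagged value is an error or a genuine extreme case still has to be judged against the business context before it is dropped.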

Field splitting, merging, information extraction, format conversion, etc.
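The field operations above might look like this on a single record. The `record` contents, the `|` separator, and the `#A1023` order-number pattern are all assumptions made for illustration.

```python
import re
from datetime import datetime

# Hypothetical raw record exported from a business system.
record = {
    "name_city": "Li Lei|Beijing",
    "note": "order #A1023 refunded",
    "date": "2023/03/16",
}

# Field splitting: one column becomes two.
name, city = record["name_city"].split("|")

# Information extraction: pull the order number out of free text.
match = re.search(r"#(\w+)", record["note"])
order_no = match.group(1) if match else None

# Format conversion: normalize the date to ISO 8601.
iso_date = datetime.strptime(record["date"], "%Y/%m/%d").date().isoformat()

# Field merging: build a composite key from two fields.
key = f"{city}-{order_no}"

print(name, city, order_no, iso_date, key)
```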

Table association: left, right, full outer, and inner joins, Cartesian product (cross join), as well as left semi join, left anti join, etc.
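To make the difference between these join types concrete, here is a minimal sketch in plain Python. The `users` and `orders` tables are hypothetical; in practice the same logic is usually written in SQL or with a data-frame library.

```python
# Hypothetical tables as lists of dicts, joined on user_id.
users = [
    {"user_id": 1, "name": "Ann"},
    {"user_id": 2, "name": "Bob"},
    {"user_id": 3, "name": "Cid"},
]
orders = [
    {"user_id": 1, "amount": 30},
    {"user_id": 1, "amount": 20},
    {"user_id": 3, "amount": 50},
    {"user_id": 4, "amount": 10},  # no matching user
]

# Index the right table by key so each lookup is a dict access.
orders_by_user = {}
for o in orders:
    orders_by_user.setdefault(o["user_id"], []).append(o)

# Inner join: only users with at least one order, one row per match.
inner = [{**u, **o} for u in users for o in orders_by_user.get(u["user_id"], [])]

# Left join: every user; amount is None when there is no matching order.
left = [
    {**u, "amount": o["amount"] if o else None}
    for u in users
    for o in (orders_by_user.get(u["user_id"]) or [None])
]

# Left semi join: users that have orders, one row per user, left columns only.
semi = [u for u in users if u["user_id"] in orders_by_user]

# Left anti join: users with no orders at all.
anti = [u for u in users if u["user_id"] not in orders_by_user]

print(len(inner), len(left), len(semi), len(anti))
```

Note that the inner and left joins repeat a user once per matching order, while the semi join keeps each user only once.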

Table structure conversion: row to column (long table to wide table), column to row (wide table to long table), row and column transposition, pivot and unpivot
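The conversions between long and wide tables can be sketched as follows, again in plain Python with hypothetical sales figures.

```python
# Long table: one row per (month, product) pair. Hypothetical data.
long_rows = [
    ("2023-01", "A", 100),
    ("2023-01", "B", 80),
    ("2023-02", "A", 120),
    ("2023-02", "B", 90),
]

# Row to column (long -> wide): one row per month, one column per product.
wide = {}
for month, product, sales in long_rows:
    wide.setdefault(month, {})[product] = sales

# Column to row (wide -> long): the reverse pivot.
back_to_long = [
    (month, product, sales)
    for month, cols in wide.items()
    for product, sales in cols.items()
]

# Transposition: swap the rows and columns of a plain matrix.
matrix = [[100, 80], [120, 90]]
transposed = [list(col) for col in zip(*matrix)]

print(wide)
print(transposed)
```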

4. Data analysis

You need to master common analysis methods and machine learning algorithms.

Basic analysis methods: composition analysis, comparative analysis, group analysis, cross analysis, trend analysis, etc.
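Several of the basic methods reduce to grouping and simple ratios. The sketch below shows group, composition, and trend analysis on hypothetical `sales` records; the regions, months, and revenue figures are invented for the example.

```python
from collections import defaultdict

# Hypothetical sales records: (region, month, revenue).
sales = [
    ("North", "2023-01", 200), ("North", "2023-02", 260),
    ("South", "2023-01", 300), ("South", "2023-02", 240),
]

# Group analysis: total revenue per region.
by_region = defaultdict(int)
for region, _, revenue in sales:
    by_region[region] += revenue

# Composition analysis: each region's share of the total.
total = sum(by_region.values())
share = {region: value / total for region, value in by_region.items()}

# Trend / comparative analysis: month-over-month change per region.
by_region_month = {(r, m): v for r, m, v in sales}
mom_change = {
    r: (by_region_month[(r, "2023-02")] - by_region_month[(r, "2023-01")])
    / by_region_month[(r, "2023-01")]
    for r in by_region
}

print(dict(by_region))
print(share)
print(mom_change)
```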

Advanced analysis methods: linear regression, logistic regression, decision tree, random forest, clustering and other algorithms
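As an example of the simplest of these methods, the sketch below fits a one-variable linear regression by ordinary least squares without any external library. The `xs` and `ys` values are hypothetical, and `fit_line` is a helper name introduced here for illustration; real projects would normally use an established library instead.

```python
# Hypothetical data: advertising spend (x) versus sales (y).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 4.1, 6.2, 7.9, 10.2]


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(xs, ys)
predicted = slope * 6.0 + intercept  # forecast for an unseen x value

print(f"y = {slope:.3f} * x + {intercept:.3f}")
print(f"prediction at x = 6: {predicted:.2f}")
```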

5. Data visualization

Present the analysis findings in graphical form.

Words are not as good as a table, and a table is not as good as a chart: a picture is worth a thousand words.

Basic statistical charts: pie charts, bar charts, line charts, scatter plots, radar charts, funnel charts, etc.

Professional statistical charts: histograms, heat maps, box plots, violin plots, kernel density estimation plots, etc.
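Charts are normally drawn with a plotting library or a BI tool, but the computation behind a histogram is simple enough to sketch by hand. The example below bins hypothetical `values` into equal-width bins and prints a text rendering; the bin count of 4 is an arbitrary choice for the illustration.

```python
# Hypothetical measurements.
values = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]
n_bins = 4

lo, hi = min(values), max(values)
width = (hi - lo) / n_bins

counts = [0] * n_bins
for v in values:
    # The maximum value goes into the last bin instead of opening a new one.
    idx = min(int((v - lo) / width), n_bins - 1)
    counts[idx] += 1

# Text rendering: one row per bin, one '#' per observation.
for i, c in enumerate(counts):
    left = lo + i * width
    print(f"[{left:4.1f}, {left + width:4.1f}) {'#' * c}")
```

The long, nearly empty tail in the last bins is exactly what a histogram or box plot makes visible at a glance.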

6. Data analysis report

Summarize the important conclusions and findings in a slide deck (PPT) to form a complete data analysis report.

Pyramid structure, in a general-specific-general form (summary, details, summary)

Conclusion first, top-down, inductive grouping, logical progression

Clear structure, clear hierarchy, highlighted key points, and a clearly stated main message

7. Data application

Apply feasible proposals to actual business scenarios and solve the company's actual business problems

Provide data support for business decision-making and realize data-driven business growth

5. Data analysis tools

To do a good job, you must first sharpen your tools, so you need to master the mainstream data analysis tools.

Excel, a very important foundation

PowerBI/Tableau, powerful business intelligence (BI) tools

SQL, the essential database query language

Python, the programming language of choice for artificial intelligence

6. How to get started with data analysis

Beginners with no prior background can refer to this learning route to get started.


Origin blog.csdn.net/JACK_SUJAVA/article/details/129592629