Data Analysis and Visualization Overview

Table of contents

1. Data Analysis

2. Data visualization

3. Commonly used software for data analysis and visualization

1. Microsoft Excel

2. R language

3. Python language

4. SAS Enterprise Miner

5. SPSS

6. Dedicated visual analysis tools

4. Commonly used Python libraries for data analysis and visualization

1. NumPy

2. SciPy

3. Pandas

4. Matplotlib

5. Seaborn

6. Scikit-learn


1. Data Analysis

Data analysis is the process of applying appropriate statistical methods to a large body of collected data in order to extract useful information, draw conclusions, and study and summarize the data in detail.
Data mining is the process of uncovering potentially valuable information in large volumes of incomplete, noisy, fuzzy, and random real-world data by applying techniques such as clustering, classification, regression, and association rules.

Data analysis can be understood in a narrow sense and in a broad sense. In the narrow sense, data analysis is the process of processing and analyzing collected data with methods such as comparative analysis, cross analysis, and regression analysis, chosen according to the purpose of the analysis, in order to extract valuable information, put the data to full use, and obtain a statistical result that characterizes the data. In the broad sense, data analysis uses basic exploration, statistical analysis, deep mining, and other methods to find useful information and previously unknown rules and patterns in the collected data, and so provides a theoretical and practical basis for subsequent business decisions. In other words, data analysis in the broad sense covers both narrow-sense data analysis and data mining.
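The three narrow-sense methods named above can be sketched with the Python standard library alone. This is a minimal illustration, not a prescribed workflow: the sales records, field layout, and function names are all hypothetical.

```python
from collections import Counter
from statistics import mean

# Hypothetical sales records: (region, channel, ad_spend, revenue).
records = [
    ("north", "online", 1.0, 3.0),
    ("north", "store", 2.0, 5.0),
    ("south", "online", 3.0, 7.0),
    ("south", "store", 4.0, 9.0),
]


def mean_revenue_by_region(rows):
    """Comparative analysis: compare mean revenue between regions."""
    groups = {}
    for region, _channel, _spend, revenue in rows:
        groups.setdefault(region, []).append(revenue)
    return {region: mean(values) for region, values in groups.items()}


def cross_table(rows):
    """Cross analysis: count records for each (region, channel) pair."""
    return Counter((region, channel) for region, channel, _s, _r in rows)


def fit_line(xs, ys):
    """Regression analysis: least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


region_means = mean_revenue_by_region(records)
counts = cross_table(records)
slope, intercept = fit_line([r[2] for r in records], [r[3] for r in records])
print(region_means, dict(counts), slope, intercept)
```

In practice these steps are usually done with the libraries described later in this article; the hand-written versions are only meant to show what each method computes.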

2. Data visualization

Data visualization is one of the key technologies of data analysis and data science. It encodes data or information as graphics or images, and uses graphics and image processing, computer vision, and user interface techniques, together with the display of three-dimensional surfaces, properties, and animations, to explain data visually.
The data visualization process consists of data processing, visual encoding, and visualization generation. Data processing covers data collection, cleaning, preprocessing, analysis, and mining; visual encoding concerns how visual information is received and how information is extracted from it, transformed, recognized as patterns, stored, and displayed; visualization generation converts the data into graphics and adds interactive processing.
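As a rough sketch of the three stages, the toy pipeline below cleans a small dataset, encodes each value as a bar length, and renders a text chart. The data, the function names, and the choice of bar length as the visual channel are assumptions made for illustration only.

```python
# Hypothetical raw data: category -> value (None marks a missing reading).
raw = {"A": 12, "B": None, "C": 30, "D": 18}


def clean(data):
    """Data processing: drop entries with missing values."""
    return {key: value for key, value in data.items() if value is not None}


def encode(data, max_width=20):
    """Visual encoding: map each value to a bar length proportional to it."""
    top = max(data.values())
    return {key: round(value * max_width / top) for key, value in data.items()}


def render(lengths):
    """Visualization generation: turn the encoded lengths into a text chart."""
    return "\n".join(f"{key} | {'#' * n}" for key, n in lengths.items())


lengths = encode(clean(raw))
chart = render(lengths)
print(chart)
```

A real tool would encode values as positions, colors, or sizes on screen and add interaction, but the division of labor between the stages is the same.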

3. Commonly used software for data analysis and visualization

1. Microsoft Excel

Excel is a commonly used office application that can process many kinds of data, perform statistical analysis, and support decision-making. It is widely used in fields such as management, statistics, and finance.

2. R language

R is a language and environment for statistical analysis and graphics, created by Ross Ihaka and Robert Gentleman at the University of Auckland in New Zealand. It is free, open-source software that belongs to the GNU project, and it is an excellent tool for statistical graphics.

3. Python language

Python was created by the Dutch programmer Guido van Rossum, who began work on it in 1989; the first public release followed in 1991. It is an easy-to-learn programming language, and code written in it tends to be concise, readable, and maintainable. It has a very rich collection of third-party modules, such as NumPy, Pandas, Matplotlib, and Seaborn, which users can apply to data science tasks.

4. SAS Enterprise Miner

SAS Enterprise Miner is a general-purpose data mining tool. It integrates a statistical analysis system with a graphical user interface and brings together data storage, management, analysis, and presentation. It is characterized by powerful functionality, a complete set of statistical methods, and simple, flexible operation.

5. SPSS

SPSS is one of the earliest statistical analysis software packages in the world. It packages advanced statistics and data mining techniques for obtaining predictive knowledge, and deploys the resulting decision-making solutions into existing business systems and processes, thereby improving business performance.

6. Dedicated visual analysis tools

Commonly used dedicated visual analysis tools currently include Power BI, Tableau, Gephi, and ECharts.

4. Commonly used Python libraries for data analysis and visualization

1. NumPy

NumPy is the workhorse of data analysis, machine learning, and scientific computing in the Python ecosystem. It greatly simplifies operations on vectors and matrices. Beyond slicing and dicing numerical data, familiarity with NumPy also makes it much easier to work with and debug advanced use cases in the libraries built on top of it, such as SciPy, Pandas, and Scikit-learn.
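The vector and matrix operations mentioned above might look like the following short sketch. The array values are made up for the example.

```python
import numpy as np

# A small matrix and vector with made-up values.
matrix = np.array([[1.0, 2.0], [3.0, 4.0]])
vector = np.array([10.0, 20.0])

# Vectorized arithmetic: no explicit Python loop is needed.
scaled = matrix * 2        # element-wise multiplication
product = matrix @ vector  # matrix-vector product

# Slicing: take the first column and the last row.
first_column = matrix[:, 0]
last_row = matrix[-1, :]

# Boolean masking: keep only the entries greater than 2.
large = matrix[matrix > 2]

print(product, first_column, last_row, large)
```

Because these operations run in compiled code over whole arrays, they are typically much faster than equivalent loops over Python lists.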

2. SciPy

SciPy is a higher-level package built on NumPy. It implements many mathematical algorithms and functions and makes it easy to solve standard problems in scientific computing. It is organized into submodules, each aimed at a different class of problem, such as optimization, integration, interpolation, and linear algebra.
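As a small illustration of how different SciPy submodules address different problems, the sketch below uses `scipy.integrate` for numerical integration and `scipy.optimize` for minimization. The two test functions are arbitrary choices for the example.

```python
import math

from scipy import integrate, optimize

# scipy.integrate: numerically integrate sin(x) from 0 to pi.
# The exact value is 2, which makes the result easy to check.
area, abs_error = integrate.quad(math.sin, 0.0, math.pi)

# scipy.optimize: find the minimum of the quadratic (x - 3)^2 + 1,
# which lies at x = 3 with a minimum value of 1.
result = optimize.minimize_scalar(lambda x: (x - 3.0) ** 2 + 1.0)

print(area, result.x, result.fun)
```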

3. Pandas

Pandas is a tool built on NumPy that provides a large number of functions and methods for convenient data processing. Its main data structures are Series, a one-dimensional labeled array, and DataFrame, a two-dimensional tabular structure. Older versions also offered Panel, a three-dimensional structure that could be regarded as a container of DataFrames, but Panel was deprecated and then removed in pandas 0.25; a DataFrame with a MultiIndex is the usual replacement.
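A minimal sketch of Series and DataFrame in use is shown below. The fruit prices and order quantities are invented for the example.

```python
import pandas as pd

# Series: a one-dimensional labeled array.
prices = pd.Series([3.0, 5.0, 4.0], index=["apple", "banana", "cherry"])

# DataFrame: a two-dimensional table whose columns are Series.
orders = pd.DataFrame(
    {
        "fruit": ["apple", "banana", "apple", "cherry"],
        "quantity": [2, 1, 3, 5],
    }
)

# Look up each order's unit price, then compute the cost per row.
orders["cost"] = orders["fruit"].map(prices) * orders["quantity"]

# Aggregate: total cost per fruit.
totals = orders.groupby("fruit")["cost"].sum()
print(totals)
```

The same pattern of adding a derived column and then grouping covers a large share of everyday tabular analysis.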

4. Matplotlib

Matplotlib is a plotting library for Python that produces publication-quality figures. Used together with NumPy, it provides an effective open-source alternative to MATLAB. It can also be used with graphical user interface toolkits, which makes it easy to visualize data inside desktop applications.
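The following sketch shows Matplotlib used together with NumPy to draw two curves. It assumes a non-interactive environment, so it selects the `Agg` backend and writes the figure to an in-memory buffer; the curve choice and labels are illustrative.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0.0, 2.0 * np.pi, 100)

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), label="cos(x)", linestyle="--")
ax.set_xlabel("x (radians)")
ax.set_ylabel("value")
ax.set_title("Sine and cosine")
ax.legend()

# Save to an in-memory buffer; pass a file name instead to write to disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=100)
png_bytes = buffer.getvalue()
plt.close(fig)
```

On a desktop, dropping the `matplotlib.use("Agg")` line and calling `plt.show()` would open the figure in a window instead.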

5. Seaborn

Seaborn provides a high-level interface, built on Matplotlib, for drawing statistical graphics. It makes plotting simpler and is very convenient for the visual analysis of data.
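To give a feel for the high-level interface, the sketch below draws a bar chart of group means with a single Seaborn call. The scores are made-up data, and the `Agg` backend is an assumption so that the example runs without a display.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up scores for two groups.
scores = pd.DataFrame(
    {
        "group": ["A", "A", "A", "B", "B", "B"],
        "score": [70, 75, 80, 85, 90, 95],
    }
)

fig, ax = plt.subplots()
# One call aggregates the scores per group and draws the bars;
# axis labels are taken from the column names.
sns.barplot(data=scores, x="group", y="score", ax=ax)
ax.set_title("Mean score by group")

buffer = io.BytesIO()
fig.savefig(buffer, format="png")
png_bytes = buffer.getvalue()
plt.close(fig)
```

Doing the same in plain Matplotlib would require computing the group means and laying out the bars by hand.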

6. Scikit-learn

Scikit-learn is an open-source Python framework dedicated to machine learning. It implements a wide range of mature machine learning algorithms and is easy to install and use. Its functionality falls into six main areas: classification, regression, clustering, dimensionality reduction, model selection, and data preprocessing.
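Several of these areas can be combined in a few lines. The sketch below chains preprocessing, dimensionality reduction, and classification on the iris dataset bundled with Scikit-learn; the choice of estimators and parameters is an illustrative assumption, not a recommendation.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Model selection: hold out a quarter of the samples for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Preprocessing, dimensionality reduction, and classification in one pipeline.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=2),
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"test accuracy: {accuracy:.2f}")
```

Because every estimator shares the same `fit` and `predict` interface, swapping the classifier for another algorithm usually means changing a single line.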


Reference books:

[1] Wei Weiyi, Li Xiaohong, Gao Zhiling. Python Data Analysis and Visualization. Tsinghua University Press

Origin blog.csdn.net/m0_64087341/article/details/123451608