The most commonly used Python data analysis libraries

1. Pandas library

Between 70% and 80% of a data analyst's day-to-day work involves understanding and cleaning data, that is, data exploration and data munging.

Pandas is mainly used for data analysis and is one of the most widely used Python libraries. It provides some of the most useful tools for exploring, cleaning, and analyzing data. With Pandas you can load, prepare, manipulate, and analyze all kinds of structured data.
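As a rough illustration of the load, prepare, manipulate, and analyze steps, here is a minimal sketch. The sales data and the column names (region, units, price) are made up for the example and are not from any real dataset.

```python
import io

import pandas as pd

# Hypothetical sales data, inlined as CSV text so the example is self-contained.
csv_text = """region,units,price
north,10,2.5
south,,3.0
north,7,2.5
south,4,3.0
"""

# Load: read_csv accepts any file-like object, not just a file path.
df = pd.read_csv(io.StringIO(csv_text))

# Clean: treat the missing unit count as 0 and store the column as integers.
df["units"] = df["units"].fillna(0).astype(int)

# Manipulate: derive a new column from existing ones.
df["revenue"] = df["units"] * df["price"]

# Analyze: total revenue per region.
revenue_by_region = df.groupby("region")["revenue"].sum()
print(revenue_by_region)
```

Filling the missing value with 0 is only one possible choice; depending on the data, dropping the row or imputing a typical value may be more appropriate.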

2. NumPy library

NumPy is mainly used to support N-dimensional arrays. These multidimensional arrays are commonly cited as being up to 50 times faster than Python lists, which makes NumPy a favorite of many data scientists.

Other libraries, such as TensorFlow, use NumPy for internal computation on tensors. NumPy provides fast precompiled functions for numerical routines that would be difficult to implement by hand. For better efficiency, NumPy uses array-oriented computation, which applies an operation to a whole array at once instead of looping over its elements in Python.
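To make array-oriented computation concrete, the sketch below scales a small 2-D array in a single expression and compares the result with an equivalent Python loop. The array values are arbitrary.

```python
import numpy as np

# A 2-D array with 2 rows and 3 columns; the values are arbitrary.
a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

# Array-oriented computation: the arithmetic is applied to every element
# at once, with no explicit Python loop.
scaled = a * 2.0 + 1.0

# A reduction along an axis: axis=0 collapses the rows, giving one mean per column.
column_means = a.mean(axis=0)

# The same scaling written with plain Python lists, for comparison.
scaled_loop = [[x * 2.0 + 1.0 for x in row] for row in a.tolist()]

print(scaled)
print(column_means)
```

Both versions produce the same numbers; the NumPy version runs the loop in compiled code, which is where the speed advantage on large arrays comes from.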

3. Scikit-learn library

Scikit-learn is arguably the most important machine learning library in Python. After you clean and process data with Pandas or NumPy, you can use Scikit-learn to build machine learning models from it, since Scikit-learn includes a large number of tools for predictive modeling and analysis.

There are many advantages to using Scikit-learn. For example, you can use Scikit-learn to build several types of machine learning models, including supervised and unsupervised models, cross-validate the accuracy of the models, and perform feature importance analysis.
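The workflow described above (build a supervised model, cross-validate it, inspect feature importances) can be sketched as follows. The choice of the bundled iris dataset and of a random forest is an assumption made for illustration; any other estimator with the same interface would work similarly.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# A small dataset that ships with scikit-learn (chosen only for illustration).
iris = load_iris()
X, y = iris.data, iris.target

# A supervised model; random_state makes the run repeatable.
model = RandomForestClassifier(n_estimators=100, random_state=0)

# 5-fold cross-validation: returns one accuracy score per fold.
scores = cross_val_score(model, X, y, cv=5)
print("mean accuracy:", scores.mean())

# Fit on all the data, then inspect how much each feature contributed.
model.fit(X, y)
for name, importance in zip(iris.feature_names, model.feature_importances_):
    print(f"{name}: {importance:.3f}")
```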

4. Matplotlib library

Matplotlib is a Python library used primarily for drawing 2D graphics. It provides a wide range of plotting functions, including line charts, scatter plots, histograms, pie charts, contour plots, and basic 3D plots. Matplotlib is one of the most popular plotting libraries in Python, and it can be used with NumPy to make data visualization much easier.

The main components of Matplotlib include:

  • Figure object: Represents the entire figure and can contain one or more subplots.
  • Axes object: Represents a subplot, including the x-axis, y-axis, axis labels, legend, etc.
  • Axis object: Represents a single axis in the plot, including tick marks, tick labels, axis labels, etc.
  • Artist object: Represents the individual elements drawn in the figure, such as text, lines, rectangles, etc.

Matplotlib can be used in a variety of ways, including interactive command-line plotting, script plotting, GUI application plotting, and more. It can output image files in various formats, including PNG, PDF, SVG, etc.
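The Figure and Axes objects and the file output described above can be sketched as follows. The file names and the temporary output directory are hypothetical, and the non-interactive Agg backend is selected only so that the script runs without a display.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

# One Figure containing two Axes (subplots) side by side.
fig, (ax_line, ax_hist) = plt.subplots(1, 2, figsize=(8, 3))

# Left subplot: a line chart with axis labels and a legend.
ax_line.plot(x, np.sin(x), label="sin(x)")
ax_line.set_xlabel("x")
ax_line.set_ylabel("y")
ax_line.legend()

# Right subplot: a histogram of the same values.
ax_hist.hist(np.sin(x), bins=10)
ax_hist.set_title("Distribution of sin(x)")

fig.tight_layout()

# The output format is inferred from the file extension.
out_dir = Path(tempfile.mkdtemp())  # hypothetical output location
png_path = out_dir / "example.png"
fig.savefig(png_path)
fig.savefig(out_dir / "example.svg")
```

In an interactive session you would typically skip the backend line and call plt.show() instead of saving to a file.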

Many other libraries extend or build on Matplotlib, such as Seaborn and ggplot. They provide more advanced plotting functions and more attractive default styles.

5. Seaborn library

Seaborn is a library built on Matplotlib that can create many different kinds of visualizations.

One of the most important features of Seaborn is that it makes patterns in the data stand out. Relationships that were not obvious at first are brought into focus, which helps people working with the data understand their models more accurately.

Seaborn also offers customizable themes and high-level interfaces, and it produces well-designed visualizations that make data easier to report.
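As a small sketch of both ideas (a theme, plus a plot that surfaces a relationship in the data), the example below draws a correlation heat map. The data is randomly generated and the column names (x, y, noise) are made up for the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import numpy as np
import pandas as pd
import seaborn as sns

# Apply one of Seaborn's built-in themes to all subsequent plots.
sns.set_theme(style="whitegrid")

# Synthetic data: y depends strongly on x, while noise is unrelated to both.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
df = pd.DataFrame({
    "x": x,
    "y": 2 * x + rng.normal(scale=0.5, size=200),
    "noise": rng.normal(size=200),
})

# The pairwise correlations, drawn as an annotated heat map.
corr = df.corr()
ax = sns.heatmap(corr, annot=True, vmin=-1, vmax=1, cmap="coolwarm")
ax.set_title("Correlation matrix")
```

The strong x/y correlation shows up as a saturated cell, while the cells involving noise stay close to neutral.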

6. Summary of the basic libraries

NumPy: A Python library for numerical computation. It includes a large number of mathematical functions and data structures, such as the ndarray array object, and supports vectorized computation, which makes processing large-scale data more efficient.

Pandas: A Python library for data analysis. It provides two core data structures, DataFrame and Series, which make it easy to clean, filter, slice, and aggregate data, and it supports a range of data types.

Matplotlib: A Python library for drawing 2D graphics. It can create many types of plots, such as line charts, scatter plots, and histograms, and is used for data visualization and exploratory data analysis.

Seaborn: A data visualization library based on Matplotlib. It can create more complex plots, such as heat maps, density plots, and violin plots, and is used for exploratory data analysis and for presenting reports.

Scikit-learn: A Python library for machine learning. It includes many machine learning algorithms for regression, classification, clustering, and more, and also provides tools for feature engineering and model evaluation.

7. Other libraries

Gradio

Gradio lets you build and deploy a web application for a machine learning model. It serves the same purpose as Streamlit or Flask, but deploying a model with it is much faster and easier.

The advantages of Gradio are the following:

  • It allows for further model validation. Specifically, different inputs to the model can be tested interactively.

  • It is easy to use for demonstrations.

  • It is easy to implement and distribute: anyone can access the web application through a public link.

TensorFlow

TensorFlow is one of the most popular Python libraries for implementing neural networks. It uses multidimensional arrays, also known as tensors, on which multiple operations can be performed for a given input. It can be used to build various neural network models, such as convolutional neural networks and recurrent neural networks.
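As a small illustration of tensors and operations on them, the sketch below multiplies two constant tensors and sums the elements of one. The values are arbitrary.

```python
import tensorflow as tf

# Two constant tensors; the values are arbitrary.
a = tf.constant([[1.0, 2.0],
                 [3.0, 4.0]])   # shape (2, 2)
b = tf.constant([[1.0],
                 [1.0]])        # shape (2, 1)

# Matrix multiplication: each row of a is summed, giving shape (2, 1).
product = tf.matmul(a, b)

# A reduction over all elements of a.
total = tf.reduce_sum(a)

print(product.numpy())
print(float(total))
```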

Because TensorFlow is highly parallel in nature, it can train multiple neural networks across multiple GPUs, which yields efficient and scalable models. This feature of TensorFlow is also known as pipelining.

Keras

Keras is a high-level neural network API built on TensorFlow. It provides an easy-to-use interface for building and training deep learning models, especially neural networks. Earlier versions of Keras could also run on top of Theano. Because Keras relies on a backend to generate computational graphs, it is relatively slow compared to other libraries.
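To give a sense of the interface, here is a minimal sketch that builds and trains a tiny binary classifier. The data is randomly generated, and the layer sizes and number of epochs are arbitrary choices for the example rather than recommendations.

```python
import numpy as np
from tensorflow import keras

# Synthetic data: 64 samples with 4 features; the label is 1 when the
# features sum to a positive number.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 4)).astype("float32")
y = (X.sum(axis=1) > 0).astype("float32")

# A small fully connected network defined layer by layer.
model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])

# Choose the optimizer, loss, and metric, then train for a few epochs.
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
history = model.fit(X, y, epochs=3, batch_size=16, verbose=0)

# Predicted probabilities, one per sample.
probs = model.predict(X, verbose=0)
print(probs.shape)
```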

Statsmodels

Statsmodels is a Python library for statistical analysis. It includes a variety of statistical models and methods, such as linear regression, time series analysis, and hypothesis testing.

PyTorch

PyTorch is a Python library for deep learning, developed by Facebook (now Meta). It can be used to build a variety of neural network models, and it provides an easy-to-use interface and support for dynamic computational graphs.
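The idea of a dynamic computational graph can be sketched as follows: the graph is recorded while ordinary Python code runs, so a regular loop decides its shape. The forward function and its steps parameter are hypothetical names used only for this example.

```python
import torch


def forward(x: torch.Tensor, steps: int) -> torch.Tensor:
    """Double the input `steps` times, then sum; the loop is plain Python."""
    out = x
    for _ in range(steps):
        out = out * 2
    return out.sum()


# requires_grad=True asks PyTorch to record operations on this tensor.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# The graph is built during this call; with steps=3 the result is sum(8 * x).
loss = forward(x, steps=3)

# Backpropagate through the recorded graph; each gradient entry is 8.
loss.backward()
print(x.grad)
```

Calling forward with a different steps value would record a different graph, with no separate compilation step.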

XGBoost

XGBoost is a Python library for gradient-boosted decision trees. It can be used to solve a variety of regression and classification problems and is especially suited to large-scale and high-dimensional data.

Origin blog.csdn.net/qq_54015136/article/details/129526747