Commonly used Python libraries for deep learning (core libraries and statistics, visualization, machine learning, deep learning, NLP, computer vision, etc.)

(1) Core libraries and statistics: NumPy, SciPy, Pandas, StatsModels.

(2) Visualization: Matplotlib, Seaborn, Plotly, Bokeh, Pydot.

(3) Machine learning: Scikit-learn, XGBoost/LightGBM/CatBoost, Eli5.

(4) Deep learning (including distributed): TensorFlow, PyTorch, Keras, dist-keras/elephas/spark-deep-learning.

(5) Natural language processing and data capture: NLTK, SpaCy, Gensim, Scrapy, Tokenizers.

(6) Computer vision: Pillow, scikit-image, OpenCV-Python, SimpleCV, Mahotas.

1. Core libraries and statistics

  • NumPy: Let's start with scientific computing: NumPy is one of the main packages in this field. It is designed to handle large multidimensional arrays and matrices, and its extensive set of mathematical functions and implemented methods makes it possible to perform a wide variety of operations on these objects. Recent releases, besides bug fixes and compatibility work, have improved the printing format of NumPy objects, and certain functions can now handle files in any encoding available in Python.
  • SciPy: Another core library for scientific computing is SciPy. It is built on NumPy and extends its functionality; the main SciPy data structure is again the multidimensional array, implemented by NumPy. The package contains tools for linear algebra, probability theory, numerical integration, optimization, and more, and it runs on all major operating systems. Recent SciPy releases have brought many function updates, especially to the optimizers, and wrap many additional BLAS and LAPACK routines.
  • Pandas: Pandas provides high-level data structures and a wide range of analysis tools. A great feature of this package is that fairly complex data operations can often be expressed in one or two commands. Pandas includes many built-in methods for grouping, filtering, and combining data, as well as time-series functionality. Releases throughout the year have added hundreds of new features, bug fixes, and API changes.
  • StatsModels: Statsmodels is a library for statistical data analysis, such as estimating statistical models and performing statistical tests. With its help you can implement many statistical and machine learning methods. The library is updated constantly; recent releases bring time-series improvements and new count models, namely GeneralizedPoisson, zero-inflated models, and NegativeBinomialP, as well as new multivariate methods: factor analysis, MANOVA, and repeated measures in ANOVA. A short sketch combining these four libraries follows this list.
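
To make the division of labor concrete, here is a minimal sketch that touches all four libraries: NumPy for array math, SciPy for a small optimization, Pandas for grouping, and statsmodels for an ordinary least squares fit. The data and column names are made up purely for illustration.

```python
import numpy as np
import pandas as pd
from scipy import optimize
import statsmodels.api as sm

# NumPy: vectorized math on arrays
x = np.linspace(0, 10, 100)
noise = np.random.default_rng(0).normal(scale=0.5, size=x.shape)
y = 2.0 * x + 1.0 + noise

# SciPy: minimize a simple quadratic function
result = optimize.minimize(lambda w: (w[0] - 3.0) ** 2, x0=[0.0])
print("SciPy minimum found at:", result.x)

# Pandas: group rows and aggregate with one command
df = pd.DataFrame({"x": x, "y": y, "bucket": (x // 2).astype(int)})
print(df.groupby("bucket")["y"].mean())

# statsmodels: ordinary least squares with an intercept term
X = sm.add_constant(df["x"])
model = sm.OLS(df["y"], X).fit()
print(model.params)  # estimated intercept and slope
```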

2. Visualization

  • Matplotlib: Matplotlib is a low-level library for creating two-dimensional charts and graphs. With its help, you can build all kinds of charts, from histograms and scatter plots to non-Cartesian coordinate plots, and many popular plotting libraries are designed to work together with Matplotlib. Recent releases have refined colors, sizes, fonts, and legend styles, for example automatic alignment of axis legends and friendlier default colors.
  • Seaborn: Seaborn is essentially a higher-level API built on top of Matplotlib. It contains a rich set of visualizations, including complex types such as time series, joint plots, and violin plots (which show data density distributions). Recent Seaborn updates mainly include bug fixes; in addition, compatibility between FacetGrid or PairGrid and enhanced interactive Matplotlib backends has been improved, with new parameters and options for visualization. A short plotting sketch follows this list.
  • Plotly: Plotly is a popular library that lets you build complex graphics easily. The package is well suited to interactive web applications, and its visualizations include contour plots, ternary plots, and 3D charts. Updates to the library this year include support for multiple linked views as well as animation and crosstalk integration.
  • Bokeh: The Bokeh library uses JavaScript widgets to create interactive, scalable visualizations in the browser. It provides a variety of interactive capabilities in the form of graphics, styles, linked plots, defined callbacks, and many more useful features. Recent improvements include rotation of categorical tick labels, small zoom tools, and customizable tooltip fields.
  • Pydot: Pydot is an interface to Graphviz written in pure Python. With its help, it is possible to display the structure of graphs, which is often needed when visualizing neural networks and decision-tree-based algorithms.
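
Below is a minimal sketch, assuming Matplotlib and Seaborn are installed and figures are saved to a file rather than displayed interactively; the synthetic data, column names, and file name are purely illustrative.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

rng = np.random.default_rng(42)
df = pd.DataFrame({
    "value": np.concatenate([rng.normal(0, 1, 500), rng.normal(3, 0.5, 500)]),
    "group": ["a"] * 500 + ["b"] * 500,
})

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Low-level Matplotlib: a plain histogram
ax1.hist(df["value"], bins=30, color="steelblue")
ax1.set_title("Matplotlib histogram")

# Higher-level Seaborn API: a violin plot showing density per group
sns.violinplot(data=df, x="group", y="value", ax=ax2)
ax2.set_title("Seaborn violin plot")

fig.tight_layout()
fig.savefig("viz_demo.png")  # or plt.show() in an interactive session
```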

3. Machine learning

  • Scikit-learn: This Python module, based on NumPy and SciPy, is one of the best libraries for working with data. It provides algorithms for many standard machine learning and data mining tasks such as clustering, regression, classification, dimensionality reduction, and model selection. Updates to this library this year include modifications to cross-validation that allow several evaluation metrics to be used at once (see the sketch after this list), and some minor improvements to training methods such as nearest neighbors and logistic regression.
  • XGBoost/LightGBM/CatBoost: Gradient boosting is one of the most popular machine learning algorithms; it consists of building an ensemble of base models, typically decision trees. Specialized libraries therefore exist to implement this method quickly and easily, and XGBoost, LightGBM, and CatBoost deserve special attention. These libraries provide highly optimized, scalable, and fast gradient boosting implementations, which makes them very popular among data scientists and in Kaggle competitions.
  • Eli5: Often, the predictions of a machine learning model are not entirely easy to interpret, and the eli5 library helps solve this problem. It is a package for visually debugging machine learning models and tracking the working of an algorithm step by step. It is compatible with the scikit-learn, XGBoost, LightGBM, lightning, and sklearn-crfsuite libraries.
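
The sketch below illustrates the scikit-learn workflow described above, including cross-validation with multiple metrics; scikit-learn's own gradient boosting classifier stands in for XGBoost/LightGBM/CatBoost, which expose a very similar fit/predict interface. The dataset and hyperparameters are chosen purely for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_validate

X, y = load_breast_cancer(return_X_y=True)
clf = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, random_state=0)

# cross_validate accepts several scoring metrics at once
scores = cross_validate(clf, X, y, cv=5, scoring=["accuracy", "roc_auc"])
print("accuracy:", scores["test_accuracy"].mean())
print("roc_auc:", scores["test_roc_auc"].mean())
```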

4. Deep learning

  • TensorFlow: TensorFlow is a popular deep learning and machine learning framework developed by Google Brain. It provides the ability to work with artificial neural networks on multiple datasets; the most popular TensorFlow applications include object recognition, speech recognition, and more. The library iterates quickly, with new versions regularly introducing features and functionality. Recent fixes address potential security vulnerabilities and improve TensorFlow's GPU integration, for example letting Estimator models run on multiple GPUs on a single machine.
  • PyTorch: PyTorch is a large-scale framework for performing tensor computations with GPU acceleration, creating dynamic computational graphs, and automatically computing gradients (a small autograd sketch follows this list). On top of this, PyTorch offers a rich API for building neural-network applications. The library is based on Torch, an open-source deep learning library implemented in C with a Lua wrapper. The Python API launched in 2017, and since then the framework has grown in popularity and attracted more and more data scientists.
  • Keras: Keras is a high-level library for working with neural networks that runs on top of TensorFlow or Theano, and can now also use CNTK or MXNet as a backend. It simplifies many common tasks and greatly reduces the amount of boilerplate code, although it may not be flexible enough for some complex use cases. The library has seen improvements in performance, usability, documentation, and its API; recent additions include the Conv3DTranspose layer, a new MobileNet application, and support for self-normalizing networks.
  • Dist-keras/elephas/spark-deep-learning: Deep learning often involves very large datasets, which can be handled more easily on distributed computing systems such as Apache Spark, again expanding the possibilities of deep learning. dist-keras, elephas, and spark-deep-learning are therefore developing rapidly. These packages can train neural networks built with the Keras library directly on Apache Spark, and spark-deep-learning also provides tools for creating pipelines around Python neural networks.
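
As one concrete example from this group, here is a minimal PyTorch sketch of the dynamic-graph/autograd behavior described in the PyTorch bullet, followed by a tiny training loop. The layer sizes and synthetic data are arbitrary illustrations, not a recommended architecture.

```python
import torch
import torch.nn as nn

# Autograd: gradients are tracked through ordinary tensor operations
w = torch.tensor(1.0, requires_grad=True)
loss = (w * 3.0 - 6.0) ** 2
loss.backward()
print(w.grad)  # d(loss)/dw, computed automatically

# A small network trained on random data
X = torch.randn(256, 20)
y = (X[:, 0] + X[:, 1] > 0).float().unsqueeze(1)

model = nn.Sequential(nn.Linear(20, 16), nn.ReLU(), nn.Linear(16, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.BCEWithLogitsLoss()

for epoch in range(5):
    optimizer.zero_grad()
    output = model(X)
    loss = criterion(output, y)
    loss.backward()   # gradients flow through the dynamic graph
    optimizer.step()
print("final loss:", loss.item())
```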

5. Natural language processing

  • NLTK: NLTK is a set of libraries forming a complete platform for natural language processing. With NLTK you can process and analyze text in various ways, tag it, extract information, and more. NLTK is also used for prototyping and building research systems. Enhancements to this library include minor changes to the API and compatibility, as well as a new interface for CoreNLP.
  • SpaCy: SpaCy is a natural language processing library with excellent demos, API documentation, and demonstration applications. The library is written in Cython, a C-extension language for Python. It supports nearly 30 languages, provides simple deep learning integration, and is built for robustness and high accuracy. Another important feature of SpaCy is that it is designed to process entire documents rather than requiring the text to be broken into phrases first.
  • Gensim: Gensim is a Python library for powerful semantic analysis, topic modeling, and vector space modeling, built on NumPy and SciPy. It provides implementations of popular NLP algorithms such as word2vec (see the sketch after this list).
  • Scrapy: Scrapy is a library for creating crawlers (spiders) that scan website pages and collect structured data. Scrapy can also extract data from APIs. The library is very convenient thanks to its extensibility and portability.
  • Tokenizers: developed by Hugging Face, these are fast, state-of-the-art tokenizers optimized for both research and production.
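
Here is a minimal sketch combining NLTK and Gensim, assuming Gensim 4.x (the vector_size argument); the toy corpus and model settings are made up for illustration, and NLTK's TreebankWordTokenizer is used so that no extra data downloads are needed.

```python
from nltk.tokenize import TreebankWordTokenizer
from gensim.models import Word2Vec

corpus = [
    "Natural language processing turns text into data.",
    "Gensim provides word2vec and other vector space models.",
    "NLTK tokenizes, tags, and parses text.",
]

# NLTK: split each document into word tokens
tokenizer = TreebankWordTokenizer()
sentences = [tokenizer.tokenize(doc.lower()) for doc in corpus]

# Gensim: train a tiny word2vec model on the tokenized corpus
model = Word2Vec(sentences, vector_size=32, window=3, min_count=1, epochs=50)
print(model.wv.most_similar("text", topn=3))
```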

6. Computer Vision

  • Pillow: PIL (Python Imaging Library) is a free library for the Python programming language that adds support for opening, manipulating, and saving many different image formats. However, its development stalled, and its last version was released in 2009. Fortunately, Pillow is an actively developed fork of PIL that is easier to install, runs on all major operating systems, and supports Python 3. The library contains basic image processing functionality, including point operations, filtering with a set of built-in convolution kernels, and color space conversion (a short sketch follows this list).
  • scikit-image: scikit-image is an open source Python package for use with NumPy arrays. It implements algorithms and utilities for research, education, and industry applications. It includes algorithms for segmentation, geometric transformations, color space operations, analysis, filtering, morphology, feature detection, and more.
  • OpenCV-Python: OpenCV (Open Source Computer Vision Library) is one of the most widely used libraries for computer vision applications, and OpenCV-Python is its Python API. Because the backend is written in C/C++, OpenCV-Python is fast, while remaining easy to code and deploy thanks to the Python wrapper on the frontend. This makes it an excellent choice for computationally intensive computer vision programs. It includes algorithms for object detection, video analysis, and image recognition.
  • SimpleCV: SimpleCV is another open-source framework for building computer vision applications. It allows users to access and manipulate digital images and provides a variety of image processing functions, including filters, morphological operations, color conversion, and edge detection.
  • Mahotas: Mahotas is another computer vision and image processing library for Python. It contains traditional image processing functions such as filtering and morphological operations, as well as more modern computer vision functions for feature computation, including interest point detection and local descriptors.
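
To tie the image libraries together, here is a minimal sketch, assuming Pillow and scikit-image are installed: a synthetic image is created with NumPy, blurred with a Pillow filter, and passed to a scikit-image edge detector. The image content and file name are purely illustrative.

```python
import numpy as np
from PIL import Image, ImageFilter
from skimage import filters

# Synthetic grayscale image: a bright square on a dark background
array = np.zeros((128, 128), dtype=np.uint8)
array[32:96, 32:96] = 200

# Pillow: basic filtering and saving in a standard format
img = Image.fromarray(array)
blurred = img.filter(ImageFilter.GaussianBlur(radius=2))
blurred.save("square_blurred.png")

# scikit-image: a Sobel filter highlights the square's edges
edges = filters.sobel(np.asarray(blurred, dtype=float) / 255.0)
print("edge response range:", edges.min(), edges.max())
```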

Origin blog.csdn.net/qq_43687860/article/details/132797409