Common Python modules (reposted from Zhihu)

Core libraries and statistics

1. NumPy (commits: 17911, contributors: 641)

Any introduction to scientific libraries has to start with NumPy. NumPy is designed for working with large multidimensional arrays and matrices, and it provides a large collection of high-level mathematical functions and methods for operating on them. NumPy received a number of improvements over the past year. Besides bug fixes and compatibility fixes, the key change is expanded styling options for the printed format of NumPy objects.
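As a minimal sketch of the array operations and print styling described above (the array values are made up for illustration, and the `np.printoptions` context manager assumes NumPy 1.15 or later):

```python
import numpy as np

# A 2-D array (matrix) and a few vectorized operations on it.
a = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
col_means = a.mean(axis=0)       # mean of each column
product = a @ a.T                # matrix product, shape (2, 2)

# Print styling can be changed temporarily without touching the data.
with np.printoptions(precision=2, suppress=True):
    print(col_means)
    print(product)
```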

Recommended Resources:

NumPy numerical computing basics course (machine learning), Shiyanlou, www.shiyanlou.com

2. SciPy (commits: 19150, contributors: 608)

SciPy is another core library for scientific computing. It is built on NumPy and therefore extends NumPy's capabilities. SciPy's main data structure is the multidimensional array, which is implemented by NumPy. The package contains tools for solving linear algebra, probability theory, integral calculus, and many other tasks. SciPy's main improvements include continuous integration on different operating systems, plus new functions and methods. In addition, many new BLAS and LAPACK functions were wrapped.
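A small sketch of two of the task types mentioned above, linear algebra and integration. The system of equations and the integrand are arbitrary examples chosen for illustration:

```python
import numpy as np
from scipy import integrate, linalg

# Linear algebra: solve the system A x = b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = linalg.solve(A, b)

# Integration: integrate t^2 over [0, 3].
# quad returns the value and an estimate of the absolute error.
area, abs_err = integrate.quad(lambda t: t ** 2, 0.0, 3.0)

print(x, area)
```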

Recommended Resources:

SciPy scientific computing basics course (machine learning), Shiyanlou, www.shiyanlou.com

3. Pandas (commits: 17144, contributors: 1165)

Pandas is a Python library that provides high-level data structures and a wide range of analysis tools. Its main strength is the ability to express fairly complex data operations in one or two commands. Pandas contains many built-in methods for grouping, filtering, and combining data, as well as time series functionality. The library has shipped several new releases with hundreds of new features, enhancements, bug fixes, and API changes. The improvements cover grouping and sorting data, more suitable output for the apply method, and support for custom operations.
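To illustrate the "one or two commands" point, here is a minimal sketch of filtering, grouping, and combining. The tables and column names are invented for the example:

```python
import pandas as pd

sales = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Rome", "Rome"],
    "units": [3, 5, 2, 8],
})
regions = pd.DataFrame({
    "city": ["Oslo", "Rome"],
    "region": ["north", "south"],
})

# Filter rows, then group and aggregate in one chained expression.
totals = sales[sales["units"] > 2].groupby("city")["units"].sum()

# Combine two tables on a shared key column.
merged = sales.merge(regions, on="city")

print(totals)
print(merged)
```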

Recommended Resources:

Pandas data processing basics course (machine learning), Shiyanlou, www.shiyanlou.com

4. StatsModels (commits: 10067, contributors: 153)

Statsmodels is a Python module for estimating statistical models, running statistical tests, and performing statistical data analysis. It lets you apply many machine learning methods and explore different plotting options. Statsmodels keeps improving. Recent work brought time series improvements and new count models, namely generalized Poisson, zero-inflated, and negative binomial models. It also added new multivariate methods: factor analysis, multivariate analysis of variance (MANOVA), and repeated measures analysis of variance.

Visualization

5. Matplotlib (commits: 25747, contributors: 725)

Matplotlib is a low-level library for creating two-dimensional charts and graphs. With it you can build histograms, scatter plots, non-Cartesian coordinate plots, and more. Many popular plotting libraries are also designed to work in conjunction with Matplotlib. Recent releases improved colors, sizes, fonts, legends, and so on. Appearance improvements include automatic alignment of axes legends, and the color changes are friendlier to colorblind users.
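A short sketch of two of the chart types mentioned above. The data is random, and the `Agg` backend is chosen only so the script runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=500)

fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(8, 3))

ax_hist.hist(data, bins=20)
ax_hist.set_title("Histogram")

# Plot each sample against the next one.
ax_scatter.scatter(data[:-1], data[1:], s=5, label="consecutive samples")
ax_scatter.set_title("Scatter plot")
ax_scatter.legend()

fig.tight_layout()
```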

Recommended Resources:

Matplotlib data plotting basics course, www.shiyanlou.com

6. Seaborn (commits: 2044, contributors: 83)

Seaborn is a higher-level API built on the matplotlib library. Its default settings are better suited to producing charts. It also includes a rich gallery of visualizations, such as time series plots. Seaborn's updates consist mostly of bug fixes. They also improve compatibility between FacetGrid or PairGrid and the enhanced interactive matplotlib backends, and add parameters and options to visualizations.

Recommended Resources:

Seaborn data visualization basics course, www.shiyanlou.com

7. Plotly (commits: 2906, contributors: 48)

Plotly lets you build sophisticated graphics with little effort. It is designed for interactive web applications. Its visualizations include contour plots, ternary plots, and 3D charts. Plotly keeps adding new graphics and features, including support for animation.

8. Bokeh (commits: 16983, contributors: 294)

The Bokeh library creates interactive, scalable visualizations in the browser using JavaScript widgets. It provides a versatile collection of graphs and styling options, and it supports interaction through linked plots, added widgets, and user-defined callbacks. Bokeh's interactive features have improved, with rotated categorical tick labels, a small zoom tool, and enhanced customization of tooltip fields.
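A hedged sketch of an interactive plot with custom tooltip fields. The data and the `label` column are made up, and the snippet only builds the figure object without rendering it:

```python
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

source = ColumnDataSource(data={
    "x": [1, 2, 3],
    "y": [4, 6, 5],
    "label": ["first", "second", "third"],
})

p = figure(title="Hover example", tools="pan,wheel_zoom,reset")
p.scatter("x", "y", source=source, size=10)

# Tooltip fields refer to data source columns with the @ prefix.
p.add_tools(HoverTool(tooltips=[("label", "@label"), ("y", "@y")]))

# bokeh.plotting.show(p) would open the plot in a browser.
```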

9. Pydot (commits: 169, contributors: 12)

Pydot is used to generate complex directed and undirected graphs. It is an interface to Graphviz written in Python. With Pydot you can display the structure of a graph, which is often needed when building decision trees and neural network algorithms.

Machine Learning

10. Scikit-learn (commits: 22753, contributors: 1084)

Scikit-learn is a Python module built on NumPy and SciPy, and it is a good choice for working with data. It provides algorithms for many machine learning and data mining tasks, such as clustering, regression, classification, dimensionality reduction, and model selection. Scikit-learn has seen many improvements: cross-validation now supports multiple metrics, and several training methods, such as nearest neighbors and logistic regression, received minor improvements. A major update is the glossary of common terms and API elements, which helps users become familiar with Scikit-learn's terminology and conventions.
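A minimal sketch of cross-validation with multiple metrics. The dataset (iris), the model, and the two metrics are choices made for this example only:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Scale the features, then fit a logistic regression classifier.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# 5-fold cross-validation, scored with two metrics at once.
scores = cross_validate(model, X, y, cv=5, scoring=["accuracy", "f1_macro"])
mean_accuracy = scores["test_accuracy"].mean()

print(mean_accuracy, scores["test_f1_macro"].mean())
```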

11. XGBoost / LightGBM / CatBoost (commits: 3277/1083/1509, contributors: 280/79/61)

Gradient boosting is one of the most popular machine learning algorithms. It builds an ensemble of decision trees in which each tree refines the previous ones, so XGBoost, LightGBM, and CatBoost deserve attention. These libraries all solve the same common problem and are used in much the same way. They offer more optimized, scalable, and faster implementations of gradient boosting, which makes them highly popular among data scientists and Kaggle competitors, many of whom have won contests with the help of these algorithms.
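The sketch below shows the general fit-and-score workflow for gradient boosting. Note that it uses scikit-learn's `GradientBoostingClassifier` as a stand-in, not any of the three libraries above; their scikit-learn style estimators (`XGBClassifier`, `LGBMClassifier`, `CatBoostClassifier`) are used in a broadly similar way. The dataset and hyperparameters are arbitrary:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# 100 shallow trees, each one fitted to correct the errors of the ensemble so far.
booster = GradientBoostingClassifier(
    n_estimators=100, learning_rate=0.1, max_depth=3, random_state=0
)
booster.fit(X_train, y_train)

test_accuracy = booster.score(X_test, y_test)
print(test_accuracy)
```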

12. eli5 (commits: 922, contributors: 6)

The predictions of machine learning models are often not entirely clear, and this is where eli5 helps. It can be used to visualize and debug machine learning models and to trace an algorithm's work step by step. eli5 supports the scikit-learn, XGBoost, LightGBM, lightning, and sklearn-crfsuite libraries.

Deep learning

13. TensorFlow (commits: 33339, contributors: 1469)

TensorFlow is a popular framework for deep learning and machine learning, developed by Google Brain. It lets you work with artificial neural networks across multiple data sets. Its main applications include object recognition, speech recognition, and more. New releases keep adding features. Recent improvements include fixes for security vulnerabilities and better integration between TensorFlow and GPUs, for example an Estimator model can run on multiple GPUs on a single machine.

Recommended Resources:

TensorFlow deep learning basics course, www.shiyanlou.com

14. PyTorch (commits: 11306, contributors: 635)

PyTorch is a large framework for tensor computation with GPU acceleration. It builds dynamic computational graphs and computes gradients automatically. In addition, PyTorch offers a rich API for neural network applications. PyTorch is based on Torch, an open source deep learning library implemented in C. The Python API was introduced in 2017, and since then the framework has grown increasingly popular and attracted a large number of data scientists.
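A minimal sketch of the dynamic graph and automatic gradients described above. The tensor values are arbitrary, and the GPU is used only if one is available:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# The graph is built while the code runs, so ordinary Python
# control flow can decide which operations are recorded.
if x.sum() > 0:
    y = (x ** 2).sum()
else:
    y = x.sum()

y.backward()     # backpropagate through the recorded operations
grad = x.grad    # dy/dx = 2x for the branch taken here

# Tensors can be moved to a GPU when one is present.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x_on_device = x.detach().to(device)

print(grad, device)
```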

Recommended Resources:

PyTorch deep learning basics course, www.shiyanlou.com

15. Keras (commits: 4539, contributors: 671)

Keras is a high-level library for neural networks that runs on top of TensorFlow and Theano. With the newer releases, CNTK and MXNet can also be used as backends. It simplifies many tasks and greatly reduces the amount of code. Its drawback is that it is not well suited to complex tasks. Keras has improved in performance, usability, and API documentation. New features include the Conv3DTranspose layer and the new MobileNet application.
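To show how little code a small network needs, here is a hedged sketch that assumes the Keras bundled with TensorFlow 2.x (`tensorflow.keras`). The layer sizes are arbitrary:

```python
from tensorflow import keras

# A tiny classifier: 4 input features, one hidden layer, 3 output classes.
model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(3, activation="softmax"),
])

model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

n_params = model.count_params()
print(n_params)
```

Training would then be a single call such as `model.fit(features, labels, epochs=...)` on your own data.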

Recommended Resources:

Transfer learning with Keras pretrained models, www.shiyanlou.com

Distributed deep learning

16. dist-keras / elephas / spark-deep-learning (commits: 1125/170/67, contributors: 5/13/11)

As deep learning is applied to more and more use cases, and since it demands a great deal of effort and time, deep learning problems are becoming more important. Distributed computing systems such as Apache Spark make it easier to process large volumes of data, which in turn expands what deep learning can do. As a result, dist-keras, elephas, and spark-deep-learning are gaining popularity. Because they can all be used to solve the same tasks, it is hard to single one out. These packages let you train neural networks based on the Keras library directly with the help of Apache Spark. Spark-deep-learning also provides tools for creating pipelines with Python neural networks.

Natural Language Processing

17. NLTK (commits: 13041, contributors: 236)

NLTK is a set of libraries that together form a platform for natural language processing. With NLTK you can process and analyze text in many ways, tokenize and tag it, and extract information. NLTK is also used for prototyping and building research systems. Its improvements include minor API and compatibility changes and a new interface to CoreNLP.
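A small sketch of basic text processing with NLTK. It deliberately uses only components that need no extra data downloads (a regex-based tokenizer, the Porter stemmer, and a frequency counter); the sample sentence is made up:

```python
from nltk import FreqDist
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize

text = "Cats chase mice. The cat chased a mouse, and the mice ran."

# Split into tokens, keep alphabetic ones, and lowercase them.
tokens = [t.lower() for t in wordpunct_tokenize(text) if t.isalpha()]

# Reduce each word to its stem, then count the stems.
stemmer = PorterStemmer()
stems = [stemmer.stem(t) for t in tokens]
freq = FreqDist(stems)

print(freq.most_common(3))
```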

18. spaCy (commits: 8623, contributors: 215)

spaCy is a natural language processing library with excellent examples, API documentation, and demo applications. The library is written in Cython, a C extension of Python. It supports almost 30 languages, provides easy deep learning integration, and promises robustness and high accuracy. Another powerful feature of spaCy is that it can process a whole document at once, without breaking it into smaller pieces.
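A hedged sketch of passing a whole text to spaCy in one call. It uses a blank English pipeline, which only tokenizes, so no trained model download is assumed; tagging or entity recognition would need a trained pipeline instead:

```python
import spacy

# A blank pipeline contains just the language-specific tokenizer.
nlp = spacy.blank("en")

# The whole text goes in as one string and comes back as a Doc object.
doc = nlp("Process the whole document in one call.")
tokens = [token.text for token in doc]

print(tokens)
```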

19. Gensim (commits: 3603, contributors: 273)

Gensim is a Python library for semantic analysis, topic modeling, and vector space modeling, built on NumPy and SciPy. It provides implementations of NLP algorithms such as word2vec. Although gensim has its own models.wrappers.fasttext implementation, the fasttext library can also be used to learn word representations efficiently.

Data scraping

20. Scrapy (commits: 6625, contributors: 281)

Scrapy can be used to create spiders that scan web pages and collect structured data. Scrapy can also extract data from APIs. Its extensibility and portability make it very convenient to use. This year's updates include proxy server upgrades and an improved system for error notification and problem identification. A new way of setting metadata with scrapy parse was also added.

Author: Shiyanlou online education
Link: https://www.zhihu.com/question/322291562/answer/818168087
Source: Zhihu
Copyright belongs to the author. For commercial reprints, contact the author for authorization; for non-commercial reprints, cite the source.