[Master Python in 100 Days] Day74: The Python machine learning ecosystem (NumPy, SciPy, scikit-learn, etc.), setting up the environment and installing libraries (conda, virtualenv), and introductory code examples

The Python machine learning ecosystem is a large and growing community that includes many open source libraries, frameworks, and tools, providing a wide range of choices for machine learning practitioners.

Here are some key components of the Python machine learning ecosystem:

1.1 NumPy and SciPy:

(1) NumPy (Numerical Python): Provides multi-dimensional arrays and matrix operations, and is the foundation of almost all data science and machine learning libraries.

(2) SciPy (Scientific Python): Built on top of NumPy, it adds many advanced scientific computing functions, such as optimization, signal processing, and linear algebra.

Code example: NumPy is used to process multi-dimensional arrays, and SciPy provides more scientific computing tools.

import numpy as np
from scipy import optimize

# NumPy array operations
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.shape)

# SciPy minimization example
def objective_function(x):
    return x[0]**2 + x[1]**2

result = optimize.minimize(objective_function, [1, 1])
print(result.x)
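
The text above also mentions matrix operations and linear algebra. As a minimal sketch (the matrix `A` and vector `b` are made-up illustration values, not from the original article), the following shows element-wise NumPy arithmetic and solving a small linear system with `scipy.linalg.solve`:

```python
import numpy as np
from scipy import linalg

# Vectorized arithmetic: the operation is applied to every element at once
arr = np.array([[1, 2, 3], [4, 5, 6]])
doubled = arr * 2
column_means = arr.mean(axis=0)  # mean of each column

# Solve the linear system A @ x = b (illustrative values)
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = linalg.solve(A, b)

print(doubled)
print(column_means)
print(x)
```

Here `x` should come out close to `[2, 3]`, since 3*2 + 3 = 9 and 2 + 2*3 = 8.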

1.2 Pandas:

Pandas provides data structures (such as the DataFrame) and data analysis tools that make it convenient to clean, transform, and analyze structured data.

import pandas as pd

# Create a simple DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35]}
df = pd.DataFrame(data)

# Show the first rows of the DataFrame
print(df.head())
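
To make the cleaning, conversion, and analysis steps concrete, here is a small sketch. The records, the `City` column, and the variable names are hypothetical and only serve as an illustration:

```python
import pandas as pd

# Hypothetical raw records: ages stored as strings, one of them missing
raw = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Dana'],
    'Age': ['25', '30', None, '35'],
    'City': ['Paris', 'Berlin', 'Paris', 'Berlin'],
})

# Conversion: turn the string column into numbers (missing values become NaN)
raw['Age'] = pd.to_numeric(raw['Age'])

# Cleaning: drop the rows that have no age
clean = raw.dropna(subset=['Age'])

# Analysis: average age per city
mean_age_by_city = clean.groupby('City')['Age'].mean()
print(mean_age_by_city)
```

Dropping incomplete rows is only one option; depending on the data, filling gaps with `fillna` may be more appropriate.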

1.3 Matplotlib and Seaborn:

Matplotlib: Used to generate a wide variety of static, animated, and interactive charts; essential for data visualization.

Seaborn: Built on Matplotlib, it provides a higher-level interface for statistical graphics, which makes plotting simpler.

import matplotlib.pyplot as plt
import seaborn as sns

# Matplotlib line plot
plt.plot([1, 2, 3, 4], [10, 20, 25, 30])
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Simple Line Plot')
plt.show()

# Seaborn scatter plot (reuses the DataFrame df created in section 1.2)
sns.scatterplot(x='Age', y='Name', data=df)
plt.show()

1.4 Scikit-Learn:

Scikit-Learn provides a wealth of machine learning algorithms and models, including classification, regression, clustering, and dimensionality reduction, as well as tools for model evaluation and selection.

Code example:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Load the dataset (a tiny toy dataset where y = 2 * x)
X, y = np.array([[1], [2], [3]]), np.array([2, 4, 6])

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Linear regression model
model = LinearRegression()

# Fit the model
model.fit(X_train, y_train)

# Predict
predictions = model.predict(X_test)

# Evaluate
mse = mean_squared_error(y_test, predictions)
print(f'Mean Squared Error: {mse}')
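
The regression example covers only one of the tasks listed above. As a hedged sketch of classification combined with model evaluation, the following uses the Iris dataset bundled with scikit-learn and 5-fold cross-validation. The dataset, the scaler, and the classifier are choices made for this illustration, not part of the original article:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load a small built-in classification dataset
X_iris, y_iris = load_iris(return_X_y=True)

# Chain feature scaling and a classifier into a single estimator
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Model evaluation: accuracy on each of 5 cross-validation folds
scores = cross_val_score(clf, X_iris, y_iris, cv=5)
print(scores)
print(scores.mean())
```

Wrapping the scaler in a pipeline means it is refit on each training fold, so no information from the held-out fold leaks into preprocessing.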

1.5 TensorFlow and PyTorch:

TensorFlow: Developed by Google, it is a powerful deep learning framework. TensorFlow 1.x was built around static computation graphs; TensorFlow 2.x executes eagerly by default and can still compile graphs with tf.function.

PyTorch: Developed by Facebook (now Meta), it is known for dynamic computation graphs, which make it more flexible to define and modify models.

Both frameworks can be used to build and train neural networks.

Code example:

# TensorFlow example (X_train and y_train come from the section 1.4 example)
import tensorflow as tf

# Create a simple neural network
model = tf.keras.Sequential([
    tf.keras.layers.Dense(units=1, input_shape=[1])
])

# Compile the model
model.compile(optimizer='sgd', loss='mean_squared_error')

# Train the model
model.fit(X_train, y_train, epochs=100)

# PyTorch example (X_train and y_train come from the section 1.4 example)
import torch
import torch.nn as nn
import torch.optim as optim

# Define a simple neural network with one linear layer
class SimpleNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(1, 1)

    def forward(self, x):
        return self.fc(x)

# Initialize the model, loss function, and optimizer
model = SimpleNN()
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

# Convert the NumPy arrays to float tensors once, outside the training loop
inputs = torch.tensor(X_train, dtype=torch.float32)
labels = torch.tensor(y_train, dtype=torch.float32).view(-1, 1)

# Train the model
for epoch in range(100):
    optimizer.zero_grad()
    outputs = model(inputs)
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()

1.6 Jupyter Notebooks:

Jupyter Notebooks provide an interactive computing and visualization environment that is widely used for teaching, experimenting, and sharing machine learning projects.

# Visualization inside a Jupyter Notebook
%matplotlib inline
import matplotlib.pyplot as plt

# Matplotlib line plot
plt.plot([1, 2, 3, 4], [10, 20, 25, 30])
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Simple Line Plot')
plt.show()

1.7 NLTK (Natural Language Toolkit):

NLTK is a library for processing human language data and a powerful tool in the field of natural language processing (NLP). It includes various text processing and analysis functions, such as tokenization, stemming, and bag-of-words models.

Using NLTK for text processing in Python:

import nltk
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer
from nltk.corpus import stopwords

# Download the NLTK data used below
# (recent NLTK releases may also need nltk.download('punkt_tab'))
nltk.download('punkt')
nltk.download('stopwords')

# Sample text
text = "NLTK is a powerful library for natural language processing."

# Tokenization
tokens = word_tokenize(text)
print("Tokens:", tokens)

# Stemming
stemmer = PorterStemmer()
stemmed_tokens = [stemmer.stem(token) for token in tokens]
print("Stemmed Tokens:", stemmed_tokens)

# Stop word removal
stop_words = set(stopwords.words('english'))
filtered_tokens = [token for token in stemmed_tokens if token.lower() not in stop_words]
print("Filtered Tokens:", filtered_tokens)
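
The bag-of-words model mentioned above is not shown in the NLTK example. The sketch below illustrates the idea using only the standard library, with plain whitespace tokenization as a simplifying assumption. It is not NLTK's API, and the two sample sentences are made up:

```python
from collections import Counter

# Two made-up sample documents
docs = [
    "NLTK is a powerful library",
    "NLTK is a library for language processing",
]

# Simplifying assumption: lowercase and split on whitespace
tokenized = [doc.lower().split() for doc in docs]

# The vocabulary is the sorted set of all distinct tokens
vocabulary = sorted({token for tokens in tokenized for token in tokens})

# Each document becomes a vector of word counts over the vocabulary
vectors = []
for tokens in tokenized:
    counts = Counter(tokens)
    vectors.append([counts[word] for word in vocabulary])

print(vocabulary)
for vector in vectors:
    print(vector)
```

In practice the tokens would come from `word_tokenize` (after stemming and stop word removal) rather than from `split()`.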

1.8 Statsmodels:

Statsmodels is a library for estimating and testing statistical models, suitable for statistical analysis and empirical research. It provides tools such as linear models, time series analysis, and non-parametric estimation.

Using Statsmodels for statistical analysis in Python:

import statsmodels.api as sm
import numpy as np

# Generate sample data
np.random.seed(42)
X = np.random.rand(100, 2)
y = 2 * X[:, 0] + 3 * X[:, 1] + np.random.randn(100)

# Add a constant column to serve as the intercept
X = sm.add_constant(X)

# Fit the linear model
model = sm.OLS(y, X).fit()

# Print the model summary
print(model.summary())

1.9 Virtualenv and Conda:

Tools for creating and managing Python virtual environments, helping to isolate dependencies between projects.

These libraries and tools work together to form a powerful and rich ecosystem that machine learning practitioners can apply flexibly across different fields and tasks.

virtualenv is a tool for creating and managing Python virtual environments. Virtual environments let you isolate the dependencies of different projects and avoid version conflicts.

(1) `virtualenv` installation and use

First, make sure you have installed virtualenv:

pip install virtualenv

Then, create a virtual environment:

# Create a virtual environment in the project directory
virtualenv venv

# Activate the virtual environment
# On Windows:
venv\Scripts\activate
# On Linux/Mac:
source venv/bin/activate

Your command line prompt should now display the name of the virtual environment. In a virtual environment, you can install project-specific dependencies without affecting the global Python environment.
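
Besides looking at the prompt, you can check from within Python whether the interpreter is running inside a virtual environment. The helper below is a minimal sketch and its name `in_virtual_environment` is hypothetical. It relies on `sys.prefix` differing from `sys.base_prefix` (set by `venv` and recent virtualenv releases) or on `sys.real_prefix` (set by older virtualenv releases):

```python
import sys

def in_virtual_environment():
    """Return True when the interpreter runs inside a virtual environment."""
    # venv and recent virtualenv releases point sys.base_prefix at the base
    # installation; older virtualenv releases set sys.real_prefix instead.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or base_prefix != sys.prefix

print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtual_environment())
```

This check targets virtualenv-style environments. A Conda environment is a full Python installation of its own, so this function is expected to report False there.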

(2) `conda` installation and use

Conda is an open source package and environment management system for many programming languages, including Python. It can manage not only Python packages but also non-Python dependencies such as system libraries.

First, make sure you have installed Anaconda or Miniconda (a streamlined version of Anaconda that contains only Conda and Python).

Create a Conda virtual environment:

# Create a new environment
conda create --name myenv

# Activate the environment
conda activate myenv

In this environment, you can use conda install to install Python packages without affecting other environments.

Deactivate the virtual environment:

# For a Conda environment (same command on Windows, Linux, and Mac):
conda deactivate
# For a virtualenv environment:
deactivate

1.10 Flask and Django:

Web frameworks that make it possible to deploy machine learning models, expose them through APIs, or build complete web applications.

Joblib: A tool for efficient parallel processing, especially suitable for training and evaluating large models.
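
As a small, hedged illustration of Joblib, the sketch below runs a function over several inputs in parallel with `Parallel` and `delayed`, then saves an object to disk and reloads it with `dump` and `load`. The `params` dictionary and the file name are made up and stand in for a trained model:

```python
import os
import tempfile
from math import sqrt

from joblib import Parallel, delayed, dump, load

# Run sqrt over several inputs using two workers
roots = Parallel(n_jobs=2)(delayed(sqrt)(i ** 2) for i in range(5))
print(roots)

# Persist a Python object to disk and read it back
# (params is a stand-in for a trained model)
params = {"weights": [0.5, 1.5], "bias": 0.1}
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "params.joblib")
    dump(params, path)
    restored = load(path)

print(restored == params)
```

Persisting a fitted model this way is one common route to serving it later from a web application built with Flask or Django.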

1.11 Scrapy:

Scrapy is an advanced Python framework for crawling websites and extracting data from web pages, which makes it useful for building training datasets. It provides powerful tools and structures that let users define how to crawl a site and what to do with the extracted data. Scrapy is built on the Twisted asynchronous networking library, which makes crawling more efficient.

2 Environment installation

2.1 Install python

First, install Python. You can download the latest version of the Python installer from the Python official website and follow the prompts to install it.

Reference: [100 days to master python] Day1: Getting started with Python, first introduction to Python, setting up a Python environment, running the first Python program https://blog.csdn.net/qq_35831906/article/details/131671309

When the installation is complete, enter the following command in a terminal to confirm the Python version:

python --version

2.2 Install SciPy

SciPy is built on top of NumPy and declares it as a dependency, so installing SciPy with pip also installs NumPy if it is missing. You can install SciPy using the following command:

pip install scipy

2.3 Install scikit-learn

Scikit-learn is a library for machine learning that provides many commonly used algorithms and tools. You can install scikit-learn using the following command:

pip install scikit-learn

2.4 A more convenient installation method: Anaconda

Anaconda is an open source distribution for scientific computing and machine learning, including a large number of commonly used libraries and tools. You can install Anaconda by following these steps:

Download Anaconda: Download the Anaconda installer for your operating system from the Anaconda official website.
Install Anaconda: Execute the downloaded installer and follow the prompts to install. During installation, you can choose whether to add Anaconda to the system path.
Create and activate the environment: Open the command line or the terminal provided by Anaconda, create a new environment and activate it:
```
conda create --name myenv
conda activate myenv
```
Install SciPy and scikit-learn: In the activated environment, use conda to install SciPy and scikit-learn:
```
conda install scipy scikit-learn
```
Anaconda will take care of resolving dependencies and installing the appropriate packages.

With these steps, you can install Python, SciPy, and scikit-learn, either through pip or Anaconda. Using Anaconda also makes it easier to manage the environments and dependencies of different projects.
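
After installing, it can be useful to confirm from Python which packages are actually available in the active environment. The following is a minimal sketch using the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical:

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None

# Package names as used by pip
for name in ("numpy", "scipy", "scikit-learn"):
    found = installed_version(name)
    print(f"{name}: {found if found else 'not installed'}")
```

Run it inside the activated environment, since the result depends on which interpreter executes the script.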

3 Summary

This article introduced Python's machine learning ecosystem and how to install the related libraries. It covered Python and the libraries commonly used in machine learning, the main functions of SciPy and the libraries it builds on, and scikit-learn and the machine learning algorithms it provides.

Next, a machine learning example will be introduced. Through this example, readers can have a preliminary understanding of machine learning projects and understand the basic steps and processes of machine learning projects.