XGBoost study summary (Part 2)

1_XGBoost principle

\[ \begin{align} X\!G\!Boost&=eXtreme+GBDT\\ &=eXtreme+(Gradient+BDT) \\ &=eXtreme+Gradient+(Boosting+DecisionTree) \end{align} \]

\[Boosting \to BDT \to GBDT \to X\!G\!Boost\]

  * Boosting + DecisionTree -> BDT (boosting decision tree): boosting built on decision trees

  * BDT + Gradient (fit the residuals with gradients) -> GBDT (gradient boosting decision tree)

  * GBDT + eXtreme (engineering optimizations) -> XGBoost

Decision tree representations:

 (1) Tree structure, from the root node to the leaf nodes
 (2) Rule-set representation: if--else---
 (3) Regression tree: draw the regression decision tree in a coordinate system
 (4) Formula representation

Feature selection method of the decision tree

Tree pruning method

1_1_ Boosting method (Boosting)

  The boosting method uses an additive model and a forward stagewise algorithm.

  Additive model: the model is required to be additive (e.g., decision trees).

   The additive model is

\[f\left(x\right)=\sum_{m=1}^M\beta_m b\left(x;\gamma_m\right) \tag{1.1}\]

where \(b\left(x;\gamma_m\right)\) is the basis function, \(\gamma_m\) is the parameter of the basis function, and \(\beta_m\) is the coefficient of the basis function.

Given a data set, learning the additive model becomes a problem of empirical risk minimization.

  Given the training data \(\{\left(x_i,y_i\right)\}_{i=1}^N\) and the loss function \(L\left(y,f\left(x\right)\right)\), learning the additive model \(f\left(x\right)\) becomes the following empirical risk minimization problem:

\[\min_{\beta_m,\gamma_m}\sum_{i=1}^N L\left(y_i,\sum_{m=1}^M\beta_m b\left(x_i;\gamma_m\right)\right)\tag{1.2}\]

The algorithm does not optimize the full additive model at once, but one base model of the additive model at a time.

After one base model is optimized, it is accumulated into the current model, and then the next base model is optimized; in this way one large problem is decomposed into many small ones.

  The forward stagewise algorithm solves this optimization problem with the following idea: since the additive model is learned from front to back, learning only one basis function and its coefficient at each step and gradually approaching the objective in equation (1.2), the complexity of the optimization is reduced. Specifically, each step only optimizes the following loss function:

\[\min_{\beta,\gamma}\sum_{i=1}^N L\left(y_i,\beta b\left(x_i;\gamma\right)\right)\tag{1.3}\]

Algorithm 1.1 Forward stagewise algorithm

Input: training data set \(T=\{\left(x_1,y_1\right),\left(x_2,y_2\right),\dots,\left(x_N,y_N\right)\}\); loss function \(L\left(y,f\left(x\right)\right)\); set of basis functions \(\{b\left(x;\gamma\right)\}\);

Output: additive model \(f\left(x\right)\)

(1) Initialize \(f_0\left(x\right)=0\)

(2) For \(m=1,2,\dots,M\)

(a) Minimize the loss function

\[\left(\beta_m,\gamma_m\right)=\mathop{\arg\min}_{\beta,\gamma} \sum_{i=1}^N L\left(y_i, f_{m-1}\left(x_i\right)+\beta b\left(x_i;\gamma\right)\right) \tag{1.4}\]

to obtain the parameters \(\beta_m\), \(\gamma_m\)

(b) Update

\[f_m\left(x\right)=f_{m-1}\left(x\right)+\beta_m b\left(x;\gamma_m\right) \tag{1.5}\]

(3) Obtain the additive model

\[f\left(x\right)=f_M\left(x\right)=\sum_{m=1}^M\beta_m b\left(x;\gamma_m\right) \tag{1.6}\]

  The forward stagewise algorithm thus reduces the problem of simultaneously optimizing all the parameters \(\beta_m,\gamma_m\) for \(m=1\) to \(M\) to successively solving for each \(\beta_m,\gamma_m\) in turn.

Each basis function has a different learning ability, so each basis function is given a weight that is learned along with it; the learned weight is \(\beta_m\).

\(\gamma_m\) is the parameter inside the base model.

In general boosting, the basis function is not necessarily a decision tree.
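To make Algorithm 1.1 concrete, here is a minimal Python sketch of forward stagewise fitting, assuming squared error loss and depth-1 regression stumps as the basis functions; the helper names `fit_stump`, `stump_predict`, and `forward_stagewise` are illustrative, not from any library.

```python
import numpy as np

def fit_stump(x, residual):
    """Fit a depth-1 regression tree (stump) b(x; gamma) to the residual.
    gamma = (threshold, left_value, right_value)."""
    best = None
    for t in np.unique(x):
        left, right = residual[x <= t], residual[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        pred = np.where(x <= t, left.mean(), right.mean())
        err = ((residual - pred) ** 2).sum()
        if best is None or err < best[0]:
            best = (err, (t, left.mean(), right.mean()))
    return best[1]

def stump_predict(gamma, x):
    t, cl, cr = gamma
    return np.where(x <= t, cl, cr)

def forward_stagewise(x, y, M=10):
    """Learn f(x) = sum_m beta_m * b(x; gamma_m), one basis function at a time."""
    f = np.zeros_like(y, dtype=float)
    model = []
    for m in range(M):
        gamma = fit_stump(x, y - f)              # step (a): minimize the stagewise loss
        beta = 1.0                               # with squared loss the leaf means already absorb beta
        f = f + beta * stump_predict(gamma, x)   # step (b): f_m = f_{m-1} + beta * b(x; gamma)
        model.append((beta, gamma))
    return model
```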

1_2_ Boosting decision tree (BDT, Boosting Decision Tree)

  A boosting method that uses decision trees as the basis functions is called a boosting decision tree.

  The boosting decision tree model can be represented as an additive model of decision trees:

\[f_M\left(x\right)=\sum_{m=1}^M T\left(x;\Theta_m\right) \tag{2.1}\]

where \(T\left(x;\Theta_m\right)\) denotes a decision tree, \(\Theta_m\) is the parameter of the decision tree, and \(M\) is the number of trees.

There is no coefficient here; the weights default to 1 and are omitted.

  The boosting decision tree uses the forward stagewise algorithm. First, the initial boosting tree is set to \(f_0\left(x\right)=0\); the model at step \(m\) is

\[f_m\left(x\right)=f_{m-1}\left(x\right)+T\left(x;\Theta_m\right) \tag{2.2}\]

where \(f_{m-1}\left(x\right)\) is the current model; the parameters \(\Theta_m\) of the next decision tree are determined by empirical risk minimization,

\[\hat{\Theta}_m=\mathop{\arg\min}_{\Theta_m}\sum_{i=1}^N L\left(y_i,f_{m-1}\left(x_i\right)+T\left(x_i;\Theta_m\right)\right) \tag{2.3}\]

  Given the training data set \(T=\{\left(x_1,y_1\right),\left(x_2,y_2\right),\dots,\left(x_N,y_N\right)\}\), \(x_i\in\mathcal{X}\subseteq\mathbb{R}^n\), where \(\mathcal{X}\) is the input space, and \(y_i\in\mathcal{Y}\subseteq\mathbb{R}\), where \(\mathcal{Y}\) is the output space. If the input space \(\mathcal{X}\) is divided into \(J\) disjoint regions \(R_1,R_2,\dots,R_J\) and the output on each region is a constant \(c_j\), then a decision tree can be expressed as

\[T\left(x;\Theta\right)=\sum_{j=1}^J c_j I\left(x\in R_j\right) \tag{2.4}\]

where the parameter \(\Theta=\{\left(R_1,c_1\right),\left(R_2,c_2\right),\dots,\left(R_J,c_J\right)\}\) represents the regions of the decision tree and the constant values on those regions; \(J\) is the number of leaf nodes, i.e., the complexity of the decision tree.

Interpretation of the decision tree: the input space X is divided into J disjoint regions corresponding to the leaf nodes; each region outputs a constant \(c_j\), which is the output value of that leaf.

If a sample x falls into the leaf of region \(R_j\), the tree outputs the value \(c_j\); \(I\) is the indicator function: \(I=1\) if x is in \(R_j\), and \(I=0\) if x is not in \(R_j\).

Regression Trees + softmax / sigmoid -> Classification

  The boosting decision tree uses the following forward stagewise algorithm:

\[\begin{align} f_0\left(x\right)&=0 \\ f_m\left(x\right)&=f_{m-1}\left(x\right)+T\left(x;\Theta_m\right), \quad m=1,2,\dots,M \\ f_M\left(x\right)&=\sum_{m=1}^M T\left(x;\Theta_m\right) \end{align}\]

At step \(m\) of the forward stagewise algorithm, given the current model \(f_{m-1}\left(x\right)\), we need to solve

\[\hat{\Theta}_m=\mathop{\arg\min}_{\Theta_m}\sum_{i=1}^N L\left(y_i,f_{m-1}\left(x_i\right)+T\left(x_i;\Theta_m\right)\right)\]

to obtain \(\hat{\Theta}_m\), the parameters of the \(m\)-th tree.

  When the squared error loss function is used,

\[L\left(y,f\left(x\right)\right)=\left(y-f\left(x\right)\right)^2\]

Its loss becomes

\[\begin{align} L\left(y,f_{m-1}\left(x\right)+T\left(x;\Theta_m\right)\right) &=\left[y-f_{m-1}\left(x\right)-T\left(x;\Theta_m\right)\right]^2 \\ &=\left[r-T\left(x;\Theta_m\right)\right]^2 \end{align}\]

where

\[r=y-f_{m-1}\left(x\right) \tag{2.5}\]

is the residual of the current model fitting the data. So, for the regression problem, the boosting decision tree only needs to fit the residual of the current model.

Comparing equation (2.5) with the squared error loss function \(L\left(y,f\left(x\right)\right)=\left(y-f\left(x\right)\right)^2\):

if \(r\) plays the role of the actual output and \(T\left(x;\Theta_m\right)\) is the fitted predicted output,

then \(r\) is the residual of the current model, and equation (2.5) is what the new tree fits.

Algorithm 2.1 Boosting decision tree algorithm for regression

Input: training data set \(T=\{\left(x_1,y_1\right),\left(x_2,y_2\right),\dots,\left(x_N,y_N\right)\}\);

Output: boosting decision tree \(f_M\left(x\right)\)

(1) Initialize \(f_0\left(x\right)=0\)

(2) For \(m=1,2,\dots,M\)

(a) Compute the residuals according to equation (2.5)

\[r_{mi}=y_i-f_{m-1}\left(x_i\right), \quad i=1,2,\dots,N\]

(b) Fit the residuals \(r_{mi}\) to learn a regression tree, obtaining \(T\left(x;\Theta_m\right)\)

(c) Update \(f_m\left(x\right)=f_{m-1}\left(x\right)+T\left(x;\Theta_m\right)\)

(3) Obtain the boosting regression tree

\[f_M\left(x\right)=\sum_{m=1}^M T\left(x;\Theta_m\right) \]

When does the loop end?

  • 1. When the specified number of iterations is reached
  • 2. When the residual is sufficiently small
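A minimal sketch of Algorithm 2.1, assuming scikit-learn's `DecisionTreeRegressor` is available as the base regression tree; each new tree is fit to the residuals of the current model, and the loop stops after `M` rounds or when the residual is already very small.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_bdt(X, y, M=20, max_depth=2):
    """Boosting decision tree for regression: each tree fits r = y - f_{m-1}(x)."""
    trees, f = [], np.zeros_like(y, dtype=float)
    for m in range(M):
        r = y - f                                   # residual, equation (2.5)
        if np.abs(r).max() < 1e-6:                  # stop when the residual is small enough
            break
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, r)
        f += tree.predict(X)                        # f_m = f_{m-1} + T(x; Theta_m)
        trees.append(tree)
    return trees

def predict_bdt(trees, X):
    return sum(t.predict(X) for t in trees)
```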

1_3_ Gradient boosting tree (GBDT, Gradient Boosting Decision Tree)

  The gradient boosting algorithm uses the negative gradient of the loss function at the current model,

\[-\left[\frac{\partial L\left(y,f\left(x_i\right)\right)}{\partial f\left(x_i\right)}\right]_{f\left(x\right)=f_{m-1}\left(x\right)} \tag{3.1}\]

as an approximation of the residual in the boosting tree algorithm for regression, and fits a regression tree to it.

That is, (3.1) is used as an approximation of the current residual.

The loss function \(L\left(y,f\left(x_i\right)\right)\) is a function of two variables, where \(y\) is a known variable.

The descent here is not in parameter space but in function space: the loss keeps getting smaller and smaller.

In function space, the derivative is taken with respect to \(f(x_i)\); the negative gradient of the loss function approximates the residual.

Algorithm 3.1 Gradient boosting algorithm

Input: training data set \(T=\{\left(x_1,y_1\right),\left(x_2,y_2\right),\dots,\left(x_N,y_N\right)\}\); loss function \(L\left(y,f\left(x\right)\right)\)

Output: gradient boosting tree \(\hat{f}\left(x\right)\)

(1) Initialize

\[f_0\left(x\right)=\mathop{\arg\min}_c\sum_{i=1}^N L\left(y_i,c\right)\]

(2) For \(m=1,2,\dots,M\)

(a) For \(i=1,2,\dots,N\), compute

\[r_{mi}=-\left[\frac{\partial L\left(y,f\left(x_i\right)\right)}{\partial f\left(x_i\right)}\right]_{f\left(x\right)=f_{m-1}\left(x\right)}\]

(b) Fit a regression tree to \(r_{mi}\), obtaining the leaf node regions \(R_{mj},\ j=1,2,\dots,J\) of the \(m\)-th tree

(c) For \(j=1,2,\dots,J\), compute

\[c_{mj}=\mathop{\arg\min}_c\sum_{x_i\in R_{mj}} L\left(y_i, f_{m-1}\left(x_i\right)+c\right)\]

(d) Update \(f_m\left(x\right)=f_{m-1}\left(x\right)+\sum_{j=1}^J c_{mj} I\left(x\in R_{mj}\right)\)

(3) Obtain the gradient boosting regression tree

\[\hat{f}\left(x\right)=f_M\left(x\right)=\sum_{m=1}^M \sum_{j=1}^J c_{mj} I\left(x\in R_{mj}\right) \]
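A sketch of Algorithm 3.1 under the assumption of absolute error loss, so the negative gradient (the sign of the residual) is visibly different from the raw residual; scikit-learn trees are used only to find the regions \(R_{mj}\), and the leaf constants \(c_{mj}\) are re-optimized per leaf as in step (c).

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gbdt_l1(X, y, M=50, max_depth=2):
    """Gradient boosting with absolute error loss L(y, f) = |y - f|."""
    f0 = np.median(y)                      # (1) f_0 = argmin_c sum L(y_i, c) is the median for L1
    f = np.full_like(y, f0, dtype=float)
    stages = []
    for m in range(M):
        r = np.sign(y - f)                 # (2a) negative gradient of |y - f| w.r.t. f
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, r)  # (2b) fit regions R_mj
        leaf = tree.apply(X)
        c = {j: np.median(y[leaf == j] - f[leaf == j])  # (2c) c_mj = argmin_c sum L(y_i, f + c)
             for j in np.unique(leaf)}
        f += np.array([c[j] for j in leaf])             # (2d) update f_m
        stages.append((tree, c))
    return f0, stages

def predict_gbdt(f0, stages, X):
    pred = np.full(X.shape[0], f0, dtype=float)
    for tree, c in stages:
        leaf = tree.apply(X)
        pred += np.array([c[j] for j in leaf])
    return pred
```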

1_4_ Extreme gradient boosting tree (XGBoost, eXtreme Gradient Boosting Decision Tree)

  Training data set \(\mathcal{D}=\{\left(\mathbf{x}_i,y_i\right)\}\), where \(\mathbf{x}_i\in\mathbb{R}^m,y_i\in\mathbb{R},\left|\mathcal{D}\right|=n\)

  Decision Tree Model

\[f\left(\mathbf{x}\right)=w_{q\left(\mathbf{x}\right)} \tag{4.1}\]

where \(q:\mathbb{R}^m\to\{1,\dots,T\},\ w\in\mathbb{R}^T\), and \(T\) is the number of leaf nodes of the decision tree.

In equation (4.1), \(w_{q\left(\mathbf{x}\right)}\) means: for an input \(\mathbf{x}\), the function \(q\left(\mathbf{x}\right)\) gives the leaf node that the sample \(\mathbf{x}\) belongs to, and \(w\) holds the score of that leaf node.

\(q\left(\mathbf{x}\right)\) outputs the leaf index.

\(w\) is a vector with \(T\) entries, one score per leaf.

\(w_{q\left(\mathbf{x}\right)}\) is the score of a single leaf.
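A toy illustration of equation (4.1), assuming a hand-written leaf-assignment function `q` and score vector `w`; it is only meant to show how \(f\left(\mathbf{x}\right)=w_{q\left(\mathbf{x}\right)}\) reads in code.

```python
import numpy as np

# A toy tree with T = 3 leaves: q maps an input x in R^m to a leaf index in {0, 1, 2},
# and w holds one score per leaf, so f(x) = w[q(x)].
w = np.array([-0.4, 0.1, 0.9])             # leaf scores, w in R^T

def q(x):
    """Hypothetical leaf assignment: split on x[0], then on x[1]."""
    if x[0] < 0.5:
        return 0
    return 1 if x[1] < 2.0 else 2

def f(x):
    return w[q(x)]                          # equation (4.1): f(x) = w_{q(x)}

print(f(np.array([0.2, 5.0])))              # lands in leaf 0 -> -0.4
print(f(np.array([0.9, 1.0])))              # lands in leaf 1 ->  0.1
```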

  The predicted output of the boosted tree model is

\[\hat{y}_i=\phi\left(\mathbf{x}_i\right)=\sum_{k=1}^K f_k\left(\mathbf{x}_i\right) \tag{4.2}\]

where \(f_k\left(\mathbf{x}\right)\) is the \(k\)-th decision tree.

  The regularized objective function is

\[\mathcal{L}\left(\phi\right)=\sum_i l\left(\hat{y}_i,y_i\right)+\sum_k \Omega\left(f_k\right) \tag{4.3}\]

where \(\Omega\left(f\right)=\gamma T+\frac{1}{2}\lambda\|w\|^2=\gamma T+\frac{1}{2}\lambda\sum_{j=1}^T w_j^2\)

Minimizing only the loss function would make the model more complex; the complexity of the model should be reflected in the regularization term:

  • The scores \(w\) should not be allowed to grow too large, so the scores are put into the regularization term
  • The more leaves, the more complex the tree, so the number of leaves is also put into the regularization term; rather than constraining each leaf individually, the constraint is placed on all the leaves together, so that small overall leaf scores and a small number of leaves keep the tree model simple (a minimal sketch of this penalty follows the list)
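A direct transcription of this regularizer as a sketch, assuming the leaf scores `w` of one tree are given as an array and that `gamma_` and `lambda_` stand for \(\gamma\) and \(\lambda\):

```python
import numpy as np

def omega(w, gamma_=1.0, lambda_=1.0):
    """Omega(f) = gamma * T + 0.5 * lambda * sum_j w_j^2, where T = number of leaves."""
    T = len(w)
    return gamma_ * T + 0.5 * lambda_ * np.sum(w ** 2)

# Fewer leaves and smaller scores -> smaller penalty -> simpler tree.
print(omega(np.array([0.2, -0.1, 0.3])))    # 3 leaves, small scores
print(omega(np.array([2.0, -3.0])))         # 2 leaves but large scores
```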

  The objective function at round \(t\) is

\[\mathcal{L}^{\left(t\right)}=\sum_{i=1}^n l\left(y_i,\hat{y}^{\left(t-1\right)}_i+f_t\left(\mathbf{x}_i\right)\right)+\Omega\left(f_t\right) \tag{4.4}\]

Second-order Taylor expansion:

\(\hat{y}^{\left(t-1\right)}_i\) is a known variable, corresponding to \(x\); \(f_t\left(\mathbf{x}_i\right)\) corresponds to \(\Delta x\):

\[f(x+\Delta x)\approx f(x) + f^{\prime}\left(x\right)\Delta x + \frac{f^{\prime \prime}\left(x\right)\Delta x^{2}}{2} + O\]

  Expanding the objective function at round \(t\) to second order around \(\hat{y}^{\left(t-1\right)}\):

\[\mathcal{L}^{\left(t\right)}\simeq\sum_{i=1}^n\left[l\left(y_i,\hat{y}^{\left(t-1\right)}\right)+g_i f_t\left(\mathbf{x}_i\right)+\frac{1}{2}h_i f^2_t\left(\mathbf{x}_i\right)\right]+\Omega\left(f_t\right) \tag{4.5}\]

where \(g_i=\partial_{\hat{y}^{\left(t-1\right)}}l\left(y_i,\hat{y}^{\left(t-1\right)}\right),h_i=\partial^2_{\hat{y}^{\left(t-1\right)}}l\left(y_i,\hat{y}^{\left(t-1\right)}\right)\)

The optimization target is \(f_t\left(\mathbf{x}_i\right)\), but \(l\left(y_i,\hat{y}^{\left(t-1\right)}\right)\) is the current loss, a known constant; it has no effect on solving the equation when the derivative is taken and set to \(0\), so it can be removed.
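For concreteness, a sketch of \(g_i\) and \(h_i\) for two common losses, squared error and logistic loss; these are standard derivative formulas, not code taken from the XGBoost library.

```python
import numpy as np

def grad_hess_squared(y, y_prev):
    """l(y, yhat) = (y - yhat)^2: g = dl/dyhat = 2(yhat - y), h = d2l/dyhat2 = 2."""
    g = 2.0 * (y_prev - y)
    h = np.full_like(y, 2.0, dtype=float)
    return g, h

def grad_hess_logistic(y, y_prev):
    """Logistic loss with labels y in {0, 1} and yhat = raw margin:
    g = sigmoid(yhat) - y, h = sigmoid(yhat) * (1 - sigmoid(yhat))."""
    p = 1.0 / (1.0 + np.exp(-y_prev))
    return p - y, p * (1.0 - p)
```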

  Removing from the second-order Taylor expansion of the round-\(t\) objective the terms that are constant with respect to \(f_t\left(\mathbf{x}_i\right)\):

\[\begin{align} \tilde{\mathcal{L}}^{\left(t\right)}&=\sum_{i=1}^n\left[g_i f_t\left(\mathbf{x}_i\right)+\frac{1}{2}h_i f^2_t\left(\mathbf{x}_i\right)\right]+\Omega\left(f_t\right) \tag{4.6}\\ &=\sum_{i=1}^n\left[g_i f_t\left(\mathbf{x}_i\right)+\frac{1}{2}h_i f^2_t\left(\mathbf{x}_i\right)\right]+\gamma T+\frac{1}{2}\lambda\sum_{j=1}^T w_j^2 \end{align} \\\]

Define the index set of samples on leaf node \(j\) as \(I_j=\{i\mid q\left(\mathbf{x}_i\right)=j\}\); then the objective function can be expressed in the form of an accumulation over the leaf nodes.


\(I_j\) represents the sample set of leaf node \(j\).

By the formula \(f\left(\mathbf{x}\right)=w_{q\left(\mathbf{x}\right)}\), \(f_t\left(\mathbf{x}_i\right)\) becomes \(w_j\):

\[\tilde{\mathcal{L}}^{\left(t\right)}=\sum_{j=1}^T\left[\left(\sum_{i\in I_j}g_i\right)w_j+\frac{1}{2}\left(\sum_{i\in I_j}h_i+\lambda\right)w_j^2\right]+\gamma T \tag{4.7}\]

When taking the derivative with respect to one of the \(T\) leaves, only that leaf is present; the derivatives of the other leaves' terms with respect to the current leaf are 0.

In the calculation above, \(T\) is treated as a constant. \(T\) is the number of leaf nodes of the decision tree, i.e., before computing the optimal output value of each leaf node, the \(T\) leaf nodes must already be fixed, which means the shape of the tree must be determined.

Under this assumption, once the shape of the tree is determined, the optimal leaf output value \(w^*\) can be obtained by computation.

But we do not yet know the structure of the tree, so \(w^*\) needs to be substituted back into the objective function.

Since \[w_j^*=\mathop{\arg\min}_{w_j}\tilde{\mathcal{L}}^{\left(t\right)}\]

we can set \[\frac{\partial\tilde{\mathcal{L}}^{\left(t\right)}}{\partial w_j}=0\]

which gives the optimal score of each leaf node \(j\):

\[w_j^*=-\frac{\sum_{i\in I_j}g_i}{\sum_{i\in I_j} h_i+\lambda} \tag{4.8}\]

Substituting the optimal score of each leaf node \(j\) back in gives the optimal value of the objective function:

\[\tilde{\mathcal{L}}^{\left(t\right)}\left(q\right)=-\frac{1}{2}\sum_{j=1}^T \frac{\left(\sum_{i\in I_j} g_i\right)^2}{\sum_{i\in I_j} h_i+\lambda}+\gamma T \tag{4.9}\]

The term \(\frac{\left(\sum_{i\in I_j} g_i\right)^2}{\sum_{i\in I_j} h_i+\lambda}\) in \(\tilde{\mathcal{L}}^{\left(t\right)}\left(q\right)\) is very similar to the term \(\frac{\sum_{i\in I_j}g_i}{\sum_{i\in I_j} h_i+\lambda}\) in \(w^*\).

\(\tilde{\mathcal{L}}^{\left(t\right)}\left(q\right)\) can be understood as an accumulation over the leaf nodes (each leaf node's optimal loss accumulated), so each leaf node's term can be regarded as a measure of its contribution to the objective function.
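A small sketch combining equations (4.8) and (4.9), assuming samples have already been assigned to leaves and `leaves` is a list of per-leaf index arrays:

```python
import numpy as np

def optimal_leaf_weights(g, h, leaves, lambda_=1.0):
    """w_j^* = -sum(g_i) / (sum(h_i) + lambda) for each leaf j (equation 4.8)."""
    return np.array([-g[idx].sum() / (h[idx].sum() + lambda_) for idx in leaves])

def structure_score(g, h, leaves, gamma_=1.0, lambda_=1.0):
    """Objective value for a fixed tree structure q (equation 4.9); smaller is better."""
    T = len(leaves)
    s = sum((g[idx].sum() ** 2) / (h[idx].sum() + lambda_) for idx in leaves)
    return -0.5 * s + gamma_ * T
```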

  Suppose \(I_L\) and \(I_R\) are the instance sets of the left and right nodes after a split, with \(I=I_L\cup I_R\); then the loss reduction after the split is given by

\[\mathcal{L}_{split}=\frac{1}{2}\left[\frac{\left(\sum_{i\in I_L} g_i\right)^2}{\sum_{i\in I_L}h_i+\lambda}+\frac{\left(\sum_{i\in I_R} g_i\right)^2}{\sum_{i\in I_R}h_i+\lambda}-\frac{\left(\sum_{i\in I} g_i\right)^2}{\sum_{i\in I}h_i+\lambda}\right]-\gamma \tag{4.10}\]

which is used to evaluate candidate split nodes.

XGBoost grows trees breadth-first, splitting layer by layer.


Each pass traverses the entire data set and computes \(\mathcal{L}_{split}\) for every feature of every sample as the metric; the optimal split is the one with the smallest resulting loss (the largest \(\mathcal{L}_{split}\)).

XGBoost splitting procedure:

1. Assume a tree structure -> 2. Compute the predicted output of each leaf node -> 3. Substitute it back into the objective function -> 4. Regard each leaf node's term as a measure of loss -> 5. Use this loss measure as the basis for splitting -> 6. Traverse all features of all samples in the data set to find the optimal split node, the one whose split gives the smallest loss (largest \(\mathcal{L}_{split}\)) -> 7. After splitting into left and right branches, continue splitting according to the same rule

Algorithm 4.1 Exact greedy algorithm for split finding

Input: instance set \(I\) of the current node; feature dimension \(d\)

Output: the split with the maximum score

(1)\(gain\leftarrow 0\)

(2)\(G\leftarrow\sum_{i\in I}g_i\), \(H\leftarrow\sum_{i\in I}h_i\)

(3)for \(k=1\) to \(d\) do

(3.1)\(G_L \leftarrow 0\), \(H_L \leftarrow 0\)

(3.2)for \(j\) in sorted(\(I\), by \(\mathbf{x}_{jk}\)) do

(3.2.1)\(G_L \leftarrow G_L+g_j\), \(H_L \leftarrow H_L+h_j\)

(3.2.2)\(G_R \leftarrow G-G_L\), \(H_R \leftarrow H-H_L\)

(3.2.3)\(score \leftarrow \max\left(score,\frac{G_L^2}{H_L+\lambda}+\frac{G_R^2}{H_R+\lambda}-\frac{G^2}{H+\lambda}\right)\)

(3.3)end

(4)end
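A sketch of Algorithm 4.1 for a single node, assuming dense features and ignoring ties between equal feature values for brevity; it accumulates \(G_L, H_L\) in sorted feature order and keeps the best score (equation 4.10 without the \(\frac{1}{2}\) factor and the \(\gamma\) term, as in the pseudocode above).

```python
import numpy as np

def exact_greedy_split(X, g, h, lambda_=1.0):
    """Enumerate every feature and every sorted threshold; return (best_score, feature, threshold)."""
    n, d = X.shape
    G, H = g.sum(), h.sum()
    parent = G * G / (H + lambda_)
    best = (0.0, None, None)                       # gain <- 0
    for k in range(d):                             # for k = 1 to d
        order = np.argsort(X[:, k])
        GL = HL = 0.0
        for j in order[:-1]:                       # for j in sorted(I, by x_jk)
            GL += g[j]; HL += h[j]
            GR, HR = G - GL, H - HL
            score = GL * GL / (HL + lambda_) + GR * GR / (HR + lambda_) - parent
            if score > best[0]:
                best = (score, k, X[j, k])         # split as: x_k <= threshold goes left
    return best
```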

Algorithm 4.2 Approximate algorithm for split finding

(1)for \(k=1\) to \(d\) do

(1.1) Propose candidate split points \(S_k=\{s_{k1},s_{k2},\dots,s_{kl}\}\) by the percentiles of feature \(k\)

(1.2) This can be done once per tree (global) or again after each split (local)

(2)end

(3)for \(k=1\) to \(m\) do

(3.1)\(G_{kv}\gets\sum_{j\in\{j|s_{k,v}\geq\mathbf{x}_{jk}>s_{k,v-1}\}}g_j\)

(3.2)\(H_{kv}\gets\sum_{j\in\{j|s_{k,v}\geq\mathbf{x}_{jk}>s_{k,v-1}\}}h_j\)

(4)end

Then, following the same steps as in the previous algorithm, the best split is found among the proposed candidate splits.
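A sketch of the bucket statistics in Algorithm 4.2, assuming the candidate split points \(S_k\) are plain (unweighted) percentiles of each feature; `bucket_statistics` then aggregates \(G_{kv}\) and \(H_{kv}\) per bucket.

```python
import numpy as np

def percentile_candidates(X, k, l=10):
    """Candidate split points S_k for feature k, here simple (unweighted) percentiles."""
    return np.percentile(X[:, k], np.linspace(0, 100, l + 1))

def bucket_statistics(X, g, h, k, S_k):
    """G_kv = sum of g_j and H_kv = sum of h_j over {j : s_{k,v-1} < x_jk <= s_{k,v}}."""
    # np.digitize with the interior split points assigns each x_jk to a bucket (s_{k,v-1}, s_{k,v}]
    v = np.digitize(X[:, k], S_k[1:-1], right=True)
    n_buckets = len(S_k) - 1
    G = np.array([g[v == b].sum() for b in range(n_buckets)])
    H = np.array([h[v == b].sum() for b in range(n_buckets)])
    return G, H
```

The best split is then found by scanning prefix sums of these bucket statistics, exactly as in the exact algorithm but with at most \(l\) candidates per feature.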

For the candidate split points \(S_k=\{s_{k1},s_{k2},\dots,s_{kl}\}\),

let

\[s_{k1}=\min_i \mathbf{x}_{ik},\quad s_{kl}=\max_i \mathbf{x}_{ik}\]

and the remaining split points satisfy

\[|r_k\left(s_{k,j}\right)-r_k\left(s_{k,j+1}\right)|<\epsilon \tag{4.11}\]

where the rank function \(r_k:\mathbb{R}\to[0,+\infty)\) is defined as

\[r_k\left(z\right)=\frac{1}{\sum_{\left(x,h\right)\in\mathcal{D}_k}h}\sum_{\left(x,h\right)\in\mathcal{D}_k,x<z}h\]

with \(\mathcal{D}_k=\{\left(x_{1k},h_1\right),\left(x_{2k},h_2\right),\dots,\left(x_{nk},h_n\right)\}\)

\(h_i\) can be regarded as the weight of the data point, because equation (4.6) can be rewritten as follows,

giving \[\sum_{i=1}^n\frac{1}{2}h_i\left(f_t\left(\mathbf{x}_i\right)-g_i/h_i\right)^2+\Omega\left(f_t\right)+constant\]

That is, a squared loss on \(f_t\left(\mathbf{x}_i\right)\) with weight \(h_i\) and label \(g_i/h_i\).
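A small sketch of the rank function \(r_k\), assuming `x_k` holds the values of feature \(k\) and `h` the second-order gradients; it returns the fraction of total \(h\)-weight strictly below \(z\), matching the definition above.

```python
import numpy as np

def r_k(z, x_k, h):
    """r_k(z) = (sum of h_i over x_ik < z) / (sum of all h_i)."""
    return h[x_k < z].sum() / h.sum()

# Candidate split points are then chosen so that consecutive candidates differ
# by less than eps in rank: |r_k(s_{k,j}) - r_k(s_{k,j+1})| < eps  (equation 4.11).
```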

These study notes are based on the July online machine learning training camp courseware by teacher Chen, the most detailed XGBoost derivation I have seen so far.
The derivation given in class is even better, so I followed the teacher's in-class derivation and refined some of the content on the basis of the original courseware, hoping it helps understanding.
If any of the content is wrong, please correct me, thank you!


Origin www.cnblogs.com/conan-ai/p/11343011.html