Cross-Entropy Loss: Principles and Derivation

I. Principles of Cross-Entropy

1 Information Content

The information content of an event is inversely related to the probability of the event: the less likely an event, the more information its occurrence carries.
The formula is:
$$I(x) = -\log P(x)$$
where $I(x)$ is the information content and $P(x)$ is the probability that event $x$ occurs.
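As a quick numerical illustration, here is a minimal Python sketch (NumPy assumed; the function name is illustrative):

```python
import numpy as np

def information(p):
    """Information content I(x) = -log(P(x)), in nats."""
    return -np.log(p)

# A rare event carries far more information than a near-certain one.
print(information(0.01))  # ~4.61 nats
print(information(0.99))  # ~0.01 nats
```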

2 Information Entropy (Entropy)

Information entropy is the expected value of the information content over all outcomes.
The formula is:
$$H(X) = -\sum_{i=1}^{n} P(x_i) \log P(x_i)$$
where $X$ is a discrete random variable taking values $x_1, x_2, \ldots, x_n$.
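A minimal NumPy sketch of this expectation:

```python
import numpy as np

def entropy(p):
    """H(X) = -sum_i P(x_i) * log(P(x_i)), in nats."""
    p = np.asarray(p, dtype=float)
    return -np.sum(p * np.log(p))

# A uniform distribution has maximal entropy; a peaked one has low entropy.
print(entropy([0.25, 0.25, 0.25, 0.25]))  # ln(4) ~= 1.386
print(entropy([0.97, 0.01, 0.01, 0.01]))  # ~= 0.168
```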

3 Relative Entropy (KL Divergence)

KL divergence measures the difference between two separate probability distributions of the same random variable.
The formula is:

$$D_{KL}(p \| q) = \sum_{i=1}^{n} p(x_i) \log \left(\frac{p(x_i)}{q(x_i)}\right)$$
Here $p(x)$ denotes the true distribution of the samples and $q(x)$ denotes the distribution predicted by the model.
The smaller the KL divergence, the closer $q(x)$ is to $p(x)$; training repeatedly adjusts $q(x)$ so that its distribution approaches $p(x)$.
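A minimal sketch of the same formula, showing that a prediction closer to the true distribution yields a smaller divergence (all numbers are made up for illustration):

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL(p || q) = sum_i p(x_i) * log(p(x_i) / q(x_i))."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sum(p * np.log(p / q))

p = [0.7, 0.2, 0.1]                       # true distribution
print(kl_divergence(p, [0.6, 0.3, 0.1]))  # close guess -> ~0.027
print(kl_divergence(p, [0.1, 0.2, 0.7]))  # poor guess  -> ~1.168
```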

4 Cross-Entropy

Cross-entropy = relative entropy + information entropy, that is, $H(p, q) = D_{KL}(p \| q) + H(p)$, where the cross-entropy is
$$H(p, q) = -\sum_{i=1}^{n} p(x_i) \log q(x_i)$$
Note:
$$\begin{aligned} D_{KL}(p \| q) &= \sum_{i=1}^{n} p(x_i) \log \left(\frac{p(x_i)}{q(x_i)}\right) \\ &= \sum_{i=1}^{n} p(x_i) \log p(x_i) - \sum_{i=1}^{n} p(x_i) \log q(x_i) \\ &= -H(p) + \left[-\sum_{i=1}^{n} p(x_i) \log q(x_i)\right] \\ &= H(p, q) - H(p) \end{aligned}$$
When training a network, the inputs and labels are fixed, so the true distribution $p(x)$ is fixed and the information entropy $H(p)$ is a constant. A smaller KL divergence means a better prediction, so we want to minimize the KL divergence; since $H(p)$ is constant, this amounts to minimizing the cross-entropy, which is exactly what the cross-entropy loss computes.
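The identity $H(p, q) = H(p) + D_{KL}(p \| q)$ is easy to confirm numerically (a sketch; the distributions are illustrative):

```python
import numpy as np

p = np.array([0.7, 0.2, 0.1])  # true distribution, fixed by the data
q = np.array([0.5, 0.3, 0.2])  # model's predicted distribution

entropy_p = -np.sum(p * np.log(p))
cross_entropy = -np.sum(p * np.log(q))
kl = np.sum(p * np.log(p / q))

# H(p, q) = H(p) + D_KL(p || q): with H(p) constant, minimizing the
# cross-entropy and minimizing the KL divergence are equivalent.
print(np.isclose(cross_entropy, entropy_p + kl))  # True
```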

5 Summary

Cross-entropy originates in information theory and is mainly used to measure the difference between two probability distributions.
For linear regression problems, MSE is the usual loss function; for classification problems, cross-entropy is the usual choice: the output layer applies softmax so that the predicted values over all classes sum to 1, and the loss is then computed with cross-entropy.
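A minimal sketch of that pipeline (softmax over raw logits, then cross-entropy against a one-hot label; all names and numbers are illustrative):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))  # shift by max(z) for numerical stability
    return e / e.sum()

def cross_entropy(y_true, y_pred):
    return -np.sum(y_true * np.log(y_pred))

logits = np.array([2.0, 1.0, 0.1])  # raw outputs of the last layer
y = np.array([1.0, 0.0, 0.0])       # one-hot true label
probs = softmax(logits)
print(np.isclose(probs.sum(), 1.0))  # True: predictions sum to 1
print(cross_entropy(y, probs))       # ~0.417
```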

II. Derivation

1 Cross-Entropy Loss for Logistic Regression

Formula
$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m}\left[y^{(i)} \log \left(h_{\theta}(x^{(i)})\right) + \left(1 - y^{(i)}\right) \log \left(1 - h_{\theta}(x^{(i)})\right)\right]$$
Derivative
$$\frac{\partial}{\partial \theta_{j}} J(\theta) = \frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}(x^{(i)}) - y^{(i)}\right) x_{j}^{(i)}$$
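Before the derivation, a minimal NumPy sketch of both formulas (the design matrix is assumed to carry a leading column of ones for the bias; function names are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_loss(theta, X, y):
    """J(theta) for labels y in {0, 1}; X has shape (m, p+1)."""
    h = sigmoid(X @ theta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

def logistic_grad(theta, X, y):
    """Closed-form gradient: (1/m) * X^T (h - y)."""
    h = sigmoid(X @ theta)
    return X.T @ (h - y) / len(y)
```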
Derivation
For logistic regression with $m$ samples, each input $x^{(i)} = \left(1, x_{1}^{(i)}, x_{2}^{(i)}, \ldots, x_{p}^{(i)}\right)^{T}$ is a $(p+1)$-dimensional vector (the leading 1 accounts for the bias); $y^{(i)}$ denotes the class, here 0 or 1; the model parameters are $\theta = \left(\theta_{0}, \theta_{1}, \ldots, \theta_{p}\right)^{T}$, so that
$$\theta^{T} x^{(i)} := \theta_{0} + \theta_{1} x_{1}^{(i)} + \cdots + \theta_{p} x_{p}^{(i)}.$$
The hypothesis function is defined as
$$h_{\theta}(x^{(i)}) = \frac{1}{1 + e^{-\theta^{T} x^{(i)}}}$$
which gives the class probabilities and their logarithms:
$$\begin{aligned} P\left(\hat{y}^{(i)}=1 \mid x^{(i)} ; \theta\right) &= h_{\theta}(x^{(i)}) \\ P\left(\hat{y}^{(i)}=0 \mid x^{(i)} ; \theta\right) &= 1 - h_{\theta}(x^{(i)}) \\ \log P\left(\hat{y}^{(i)}=1 \mid x^{(i)} ; \theta\right) &= \log h_{\theta}(x^{(i)}) = \log \frac{1}{1 + e^{-\theta^{T} x^{(i)}}} \\ \log P\left(\hat{y}^{(i)}=0 \mid x^{(i)} ; \theta\right) &= \log \left(1 - h_{\theta}(x^{(i)})\right) = \log \frac{e^{-\theta^{T} x^{(i)}}}{1 + e^{-\theta^{T} x^{(i)}}} \end{aligned}$$
For the $i$-th sample, the log-probability that the hypothesis assigns to the correct label is:
$$\begin{aligned} & I\left\{y^{(i)}=1\right\} \log P\left(\hat{y}^{(i)}=1 \mid x^{(i)} ; \theta\right) + I\left\{y^{(i)}=0\right\} \log P\left(\hat{y}^{(i)}=0 \mid x^{(i)} ; \theta\right) \\ &= y^{(i)} \log P\left(\hat{y}^{(i)}=1 \mid x^{(i)} ; \theta\right) + \left(1-y^{(i)}\right) \log P\left(\hat{y}^{(i)}=0 \mid x^{(i)} ; \theta\right) \\ &= y^{(i)} \log \left(h_{\theta}(x^{(i)})\right) + \left(1-y^{(i)}\right) \log \left(1 - h_{\theta}(x^{(i)})\right) \end{aligned}$$
Averaging over the $m$ samples and negating yields the loss function:
$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m}\left[y^{(i)} \log \left(h_{\theta}(x^{(i)})\right) + \left(1 - y^{(i)}\right) \log \left(1 - h_{\theta}(x^{(i)})\right)\right]$$
J J J取负号的原因:表征正确的概率值越大,模型对数据的表达能力越好;但在衡量模型优劣时表现误差的损失函数且越小越好。两相矛盾,所以令损失函数对表征正确的组合对数概率取反。
Taking the derivative
Step 1:
$$\begin{aligned} J(\theta) &= -\frac{1}{m} \sum_{i=1}^{m}\left[y^{(i)} \log \left(h_{\theta}(x^{(i)})\right) + \left(1-y^{(i)}\right) \log \left(1 - h_{\theta}(x^{(i)})\right)\right] \\ &= -\frac{1}{m} \sum_{i=1}^{m}\left[-y^{(i)} \log \left(1+e^{-\theta^{T} x^{(i)}}\right) + \left(1-y^{(i)}\right)\left(-\theta^{T} x^{(i)} - \log \left(1+e^{-\theta^{T} x^{(i)}}\right)\right)\right] \\ &= -\frac{1}{m} \sum_{i=1}^{m}\left[y^{(i)} \theta^{T} x^{(i)} - \theta^{T} x^{(i)} - \log \left(1+e^{-\theta^{T} x^{(i)}}\right)\right] \\ &= -\frac{1}{m} \sum_{i=1}^{m}\left[y^{(i)} \theta^{T} x^{(i)} - \log e^{\theta^{T} x^{(i)}} - \log \left(1+e^{-\theta^{T} x^{(i)}}\right)\right] \\ &= -\frac{1}{m} \sum_{i=1}^{m}\left[y^{(i)} \theta^{T} x^{(i)} - \left(\log e^{\theta^{T} x^{(i)}} + \log \left(1+e^{-\theta^{T} x^{(i)}}\right)\right)\right] \\ &= -\frac{1}{m} \sum_{i=1}^{m}\left[y^{(i)} \theta^{T} x^{(i)} - \log \left(1 + e^{\theta^{T} x^{(i)}}\right)\right] \end{aligned}$$
Step 2:
$$\begin{aligned} \frac{\partial}{\partial \theta_{j}} J(\theta) &= \frac{\partial}{\partial \theta_{j}}\left(\frac{1}{m} \sum_{i=1}^{m}\left[\log \left(1+e^{\theta^{T} x^{(i)}}\right) - y^{(i)} \theta^{T} x^{(i)}\right]\right) \\ &= \frac{1}{m} \sum_{i=1}^{m}\left(\frac{x_{j}^{(i)} e^{\theta^{T} x^{(i)}}}{1+e^{\theta^{T} x^{(i)}}} - y^{(i)} x_{j}^{(i)}\right) \\ &= \frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}(x^{(i)}) - y^{(i)}\right) x_{j}^{(i)} \end{aligned}$$
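The closed form can be sanity-checked against central finite differences of $J$ (a self-contained sketch with random data; all values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.hstack([np.ones((20, 1)), rng.normal(size=(20, 3))])  # bias column + 3 features
y = rng.integers(0, 2, size=20).astype(float)
theta = rng.normal(size=4)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
loss = lambda t: -np.mean(y * np.log(sigmoid(X @ t)) +
                          (1 - y) * np.log(1 - sigmoid(X @ t)))
grad = lambda t: X.T @ (sigmoid(X @ t) - y) / len(y)

# Central finite differences of J along each coordinate of theta.
eps = 1e-6
numeric = np.array([(loss(theta + eps * e) - loss(theta - eps * e)) / (2 * eps)
                    for e in np.eye(4)])
print(np.allclose(numeric, grad(theta)))  # True: matches the closed form
```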

2 Cross-Entropy Loss with Softmax

Formula
$$C = -\sum_{i} y_{i} \ln a_{i}$$
$$a_{i} = \frac{e^{z_{i}}}{\sum_{k} e^{z_{k}}}, \qquad z_{i} = \sum_{j} w_{ij} x_{j} + b$$
where $y_{i}$ is the ground-truth label for class $i$, $w_{ij}$ is the $j$-th weight of the $i$-th neuron, $b$ is the bias, $z_{i}$ is the $i$-th output of the network, and $a_{i}$ is the result of applying the softmax function to the $i$-th output.
Derivative
$$\frac{\partial C}{\partial z_{i}} = a_{i} - y_{i}$$
Derivation
$$\frac{\partial C}{\partial z_{i}} = \sum_{j}\left(\frac{\partial C_{j}}{\partial a_{j}} \frac{\partial a_{j}}{\partial z_{i}}\right)$$
where $C_{j} = -y_{j} \ln a_{j}$ denotes the $j$-th term of the loss.
$$\frac{\partial C_{j}}{\partial a_{j}} = \frac{\partial\left(-y_{j} \ln a_{j}\right)}{\partial a_{j}} = -\frac{y_{j}}{a_{j}}$$
For $\frac{\partial a_{j}}{\partial z_{i}}$ there are two cases:
(1) $i = j$:
$$\frac{\partial a_{i}}{\partial z_{i}} = \frac{\partial}{\partial z_{i}}\left(\frac{e^{z_{i}}}{\sum_{k} e^{z_{k}}}\right) = \frac{e^{z_{i}} \sum_{k} e^{z_{k}} - \left(e^{z_{i}}\right)^{2}}{\left(\sum_{k} e^{z_{k}}\right)^{2}} = \left(\frac{e^{z_{i}}}{\sum_{k} e^{z_{k}}}\right)\left(1 - \frac{e^{z_{i}}}{\sum_{k} e^{z_{k}}}\right) = a_{i}\left(1 - a_{i}\right)$$
(2) $i \neq j$:
$$\frac{\partial a_{j}}{\partial z_{i}} = \frac{\partial}{\partial z_{i}}\left(\frac{e^{z_{j}}}{\sum_{k} e^{z_{k}}}\right) = -e^{z_{j}}\left(\frac{1}{\sum_{k} e^{z_{k}}}\right)^{2} e^{z_{i}} = -a_{i} a_{j}$$
Putting it all together:
$$\begin{aligned} \frac{\partial C}{\partial z_{i}} &= \sum_{j}\left(\frac{\partial C_{j}}{\partial a_{j}} \frac{\partial a_{j}}{\partial z_{i}}\right) = \sum_{j \neq i}\left(\frac{\partial C_{j}}{\partial a_{j}} \frac{\partial a_{j}}{\partial z_{i}}\right) + \frac{\partial C_{i}}{\partial a_{i}} \frac{\partial a_{i}}{\partial z_{i}} \\ &= \sum_{j \neq i}\left(-\frac{y_{j}}{a_{j}}\right)\left(-a_{i} a_{j}\right) + \left(-\frac{y_{i}}{a_{i}}\right) a_{i}\left(1 - a_{i}\right) \\ &= \sum_{j \neq i} a_{i} y_{j} - y_{i}\left(1 - a_{i}\right) \\ &= \sum_{j \neq i} a_{i} y_{j} + a_{i} y_{i} - y_{i} \\ &= a_{i} \sum_{j} y_{j} - y_{i} \end{aligned}$$
For a classification problem the label vector $y$ is one-hot: exactly one class is 1 and all others are 0, so $\sum_{j} y_{j} = 1$ and therefore
$$\frac{\partial C}{\partial z_{i}} = a_{i} - y_{i}$$
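The result $\partial C / \partial z_{i} = a_{i} - y_{i}$ can likewise be verified with finite differences (a sketch; the logits and label are illustrative):

```python
import numpy as np

z = np.array([2.0, 1.0, 0.1])  # logits
y = np.array([1.0, 0.0, 0.0])  # one-hot label

softmax = lambda z: np.exp(z - z.max()) / np.exp(z - z.max()).sum()
loss = lambda z: -np.sum(y * np.log(softmax(z)))

# Central finite differences of C with respect to each logit z_i.
eps = 1e-6
numeric = np.array([(loss(z + eps * e) - loss(z - eps * e)) / (2 * eps)
                    for e in np.eye(3)])
print(np.allclose(numeric, softmax(z) - y))  # True: dC/dz_i = a_i - y_i
```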

Appendix: Derivative Formulas and Rules

Derivatives of basic elementary functions
(1) $(C)' = 0$
(2) $(x^{\mu})' = \mu x^{\mu-1}$
(3) $(\sin x)' = \cos x$
(4) $(\cos x)' = -\sin x$
(5) $(\tan x)' = \sec^{2} x$
(6) $(\cot x)' = -\csc^{2} x$
(7) $(\sec x)' = \sec x \tan x$
(8) $(\csc x)' = -\csc x \cot x$
(9) $(a^{x})' = a^{x} \ln a$
(10) $(e^{x})' = e^{x}$
(11) $(\log_{a} x)' = \frac{1}{x \ln a}$
(12) $(\ln x)' = \frac{1}{x}$
(13) $(\arcsin x)' = \frac{1}{\sqrt{1-x^{2}}}$
(14) $(\arccos x)' = -\frac{1}{\sqrt{1-x^{2}}}$
(15) $(\arctan x)' = \frac{1}{1+x^{2}}$
(16) $(\operatorname{arccot} x)' = -\frac{1}{1+x^{2}}$
Differentiation rules
If $u = u(x)$ and $v = v(x)$ are both differentiable, then
(1) $(u \pm v)' = u' \pm v'$
(2) $(Cu)' = Cu'$ ($C$ is a constant)
(3) $(uv)' = u'v + uv'$
(4) $\left(\frac{u}{v}\right)' = \frac{u'v - uv'}{v^{2}}$
Chain rule for composite functions
If $y = f(u)$ with $u = \varphi(x)$, and both $f(u)$ and $\varphi(x)$ are differentiable, then the derivative of the composite function $y = f[\varphi(x)]$ is
$$\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx} \quad \text{or} \quad y' = f'(u) \cdot \varphi'(x)$$
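As a small check connecting these rules back to the derivations above, a SymPy sketch (assuming sympy is installed) confirming the sigmoid identity $h'(x) = h(x)(1 - h(x))$, which is exactly what the chain and quotient rules give:

```python
import sympy as sp

x = sp.symbols('x')
h = 1 / (1 + sp.exp(-x))                         # the sigmoid hypothesis
print(sp.simplify(sp.diff(h, x) - h * (1 - h)))  # 0, so h' = h * (1 - h)
```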


Reposted from blog.csdn.net/weixin_50008473/article/details/120505614