[Mathematics] An intuitive understanding of Taylor's formula (useful for the Newton iteration method)

1. Introduction

Recently, I was looking at some optimization methods used in machine learning (gradient descent, Newton iteration, etc.), which rely on Taylor expansion. I had almost forgotten what I learned in college, so I read some blogs and organized my notes here.

Taylor's formula, also known as the Taylor expansion, uses information about a function at one point to describe its values nearby. If the function is smooth enough and its derivatives of each order at a point are known, Taylor's formula uses these derivative values to build the coefficients of a polynomial that approximates the function in a neighborhood of that point.

  • So what is Taylor's formula for?
    • Simply put, it approximates a given function with a polynomial (that is, it makes the graph of the polynomial fit the graph of the given function as closely as possible). Note that the approximation must be expanded around a specific point on the graph of the function.
    • If you need the value of a very complicated function that cannot be computed directly, you can use Taylor's formula to approximate it. This is one application of Taylor's formula.
    • In machine learning, Taylor's formula is mainly used in iterative optimization methods such as gradient descent and the Newton iteration method.
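As a rough illustration of the link to Newton iteration, the sketch below minimizes a one-variable function by repeatedly minimizing its second-order Taylor approximation. The name `newton_minimize`, the test function, the starting point, and the tolerance are all assumptions made for this example; they do not come from the referenced blogs.

```python
def newton_minimize(df, d2f, x, tol=1e-10, max_iter=50):
    """Minimize a 1-D function given its first (df) and second (d2f) derivatives.

    Near x_k the function is replaced by its second-order Taylor polynomial
        f(x_k) + f'(x_k) (x - x_k) + f''(x_k) (x - x_k)^2 / 2,
    whose stationary point is x_k - f'(x_k) / f''(x_k).
    """
    for _ in range(max_iter):
        step = df(x) / d2f(x)
        x -= step
        if abs(step) < tol:
            break
    return x


# Hypothetical test function: f(x) = x^4 - 3x^2 + 2, with a local minimum at sqrt(1.5).
df = lambda x: 4 * x**3 - 6 * x
d2f = lambda x: 12 * x**2 - 6

x_min = newton_minimize(df, d2f, x=2.0)
print(x_min)
```

This version assumes the second derivative stays positive along the iterates, which holds for the chosen starting point; a practical implementation would need a safeguard for the general case.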

2. Intuitive understanding

Polynomials, written below, are the simplest class of elementary functions:
$P_n(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n$

  • Evaluating a polynomial requires only finitely many additions, subtractions, and multiplications, so polynomials are convenient for numerical computation. This is why polynomials are often used to approximate other functions, and why Taylor's formula uses a polynomial to approximate a given function.

2.1 Approximate calculation

Elementary mathematics introduces many functions (trigonometric functions such as $\cos x$, for example) but does not explain how to compute their values. Take the approximate calculation of $f(x) = \cos x$ as an example:

1) First-degree (linear) approximation

Use the differential approximation formula $f(x) \approx f(x_0) + f'(x_0)(x - x_0)$ (this formula follows from the limit definition of the derivative). Near $x_0 = 0$, the linear approximation of $f(x)$ is $f(x) \approx f(0) + f'(0)x$. For $f(x) = \cos x$ we have $f(0) = 1$ and $f'(0) = -\sin 0 = 0$, so $\cos x \approx 1$. The linear approximation of $f(x)$ near $x_0 = 0$ is therefore $P_1(x) = 1$, as shown in the figure below:
[Figure: $\cos x$ and its linear approximation $P_1(x) = 1$]

  • Linear approximation, advantage: simple form and easy to compute.
  • Linear approximation, disadvantage: the farther $x$ is from the origin O, the worse the approximation.
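To make the disadvantage concrete, here is a minimal sketch that evaluates the first-degree approximation of $\cos x$ at $x_0 = 0$ and prints the error at a few points. The helper names `linear_approx` and `neg_sin` and the sample points are assumptions chosen for illustration.

```python
import math


def linear_approx(f, df, x0, x):
    """First-degree approximation f(x0) + f'(x0) * (x - x0)."""
    return f(x0) + df(x0) * (x - x0)


def neg_sin(x):
    # Derivative of cos x.
    return -math.sin(x)


errors = []
for x in (0.1, 0.5, 1.0, 2.0):
    p1 = linear_approx(math.cos, neg_sin, 0.0, x)
    errors.append(abs(math.cos(x) - p1))
    print(f"x = {x}: P1(x) = {p1:.4f}, |cos x - P1(x)| = {errors[-1]:.4f}")
```

The printed errors grow as the sample point moves away from the expansion point.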

2) Quadratic approximation

Approximate $f(x) = \cos x$ with the quadratic polynomial $P_2(x) = a_0 + a_1 x + a_2 x^2$. We require:

$P_2(0) = f(0) = \cos 0 = 1 \Rightarrow a_0 = 1$
$P_2'(0) = f'(0) = -\sin 0 = 0 \Rightarrow a_1 = 0$
$P_2''(0) = f''(0) = -\cos 0 = -1 \Rightarrow 2a_2 = -1 \Rightarrow a_2 = -\frac{1}{2}$

So $\cos x \approx P_2(x) = 1 - \frac{x^2}{2}$, as shown below:

[Figure: $\cos x$ and its quadratic approximation $P_2(x)$]

  • The quadratic approximation is much better than the linear one, but only on roughly $[-\frac{\pi}{2}, \frac{\pi}{2}]$; outside this range the two graphs differ significantly.
  • Why do we require the function value, first derivative, and second derivative of the two functions to be equal at a certain point?
    • Because these values capture the most basic properties of the function (and of its graph). Matching them makes the two functions close to each other, as the graphs above show intuitively. This is also the basic idea behind Taylor's formula.
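The matching conditions can be sketched in a few lines: matching the $k$-th derivative of the polynomial at $0$ gives $k!\,a_k = f^{(k)}(0)$, so each coefficient is a derivative value divided by a factorial. The variable names and sample points below are assumptions for illustration.

```python
import math

# Value, first derivative, and second derivative of cos x at x0 = 0.
derivs_at_0 = [math.cos(0.0), -math.sin(0.0), -math.cos(0.0)]

# Matching the k-th derivative of P2 at 0 gives k! * a_k = f^(k)(0).
coeffs = [d / math.factorial(k) for k, d in enumerate(derivs_at_0)]


def p2(x):
    """Quadratic approximation built from the matched coefficients."""
    return sum(a * x**k for k, a in enumerate(coeffs))


for x in (0.5, math.pi / 2, 3.0):
    print(f"x = {x:.4f}: cos x = {math.cos(x):+.4f}, P2(x) = {p2(x):+.4f}")
```

The coefficients come out as $1$, $0$, and $-\frac{1}{2}$, and the gap between the two columns widens noticeably once $x$ leaves $[-\frac{\pi}{2}, \frac{\pi}{2}]$.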

3) Eighth-degree approximation

Approximate $f(x) = \cos x$ with the eighth-degree polynomial $P_8(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_8 x^8$. We require:
$P_8(0) = f(0),\quad P_8'(0) = f'(0),\quad \ldots,\quad P_8^{(8)}(0) = f^{(8)}(0)$
Therefore, we get:
$P_8(x) = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \frac{x^8}{8!}$

  • The graphs are as follows:
    [Figure: $\cos x$ (red), $P_2(x)$ (blue), and $P_8(x)$ (green)]

  • $P_8(x)$ (green curve) stays close to the cosine function (red curve) over a larger range than $P_2(x)$ (blue curve).

In summary, the three approximations of different degrees show that when high accuracy is required and the error must be estimated, a higher-degree polynomial must be used to approximate the function, together with a formula for the error. The above is the process of approximating a given function with polynomials.
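The effect of the degree can be checked numerically. The sketch below is a minimal illustration, assuming a hypothetical helper `cos_taylor` that sums the expansion of $\cos x$ at $0$ up to a chosen degree, and compares the errors of the degree-2 and degree-8 polynomials.

```python
import math


def cos_taylor(x, degree):
    """Polynomial approximation of cos x at 0, keeping terms up to x**degree."""
    total = 0.0
    for k in range(0, degree + 1, 2):
        # The k-th derivative of cos at 0 is +1 or -1 for even k and 0 for odd k.
        sign = -1.0 if (k // 2) % 2 else 1.0
        total += sign * x**k / math.factorial(k)
    return total


for x in (1.0, 2.0, 3.0):
    exact = math.cos(x)
    err2 = abs(exact - cos_taylor(x, 2))
    err8 = abs(exact - cos_taylor(x, 8))
    print(f"x = {x}: |cos x - P2| = {err2:.2e}, |cos x - P8| = {err8:.2e}")
```

At every sample point the degree-8 error is far smaller than the degree-2 error, which matches the comparison of the green and blue curves.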

3. Derivation of Taylor's formula

This raises a question: given a function $f(x)$, find a polynomial $P_n(x)$ that closely approximates $f(x)$ near a specified point $x_0$, written as:
$P_n(x) = a_0 + a_1(x - x_0) + a_2(x - x_0)^2 + \cdots + a_n(x - x_0)^n$
such that $f(x) \approx P_n(x)$ and the error $R_n(x) = f(x) - P_n(x)$ can be estimated. What conditions should this polynomial satisfy, and what is the error?

  • Geometrically, $y = f(x)$ and $y = P_n(x)$ are two curves, as shown in the figure below:

[Figure: the curves $y = f(x)$ and $y = P_n(x)$ near $x_0$]
For the two curves to be very close near $x_0$, clearly:

  1. First, the two curves must intersect at the point $(x_0, f(x_0))$, that is, $P_n(x_0) = f(x_0)$.

  2. To be closer, the two curves must also be tangent at $(x_0, f(x_0))$, that is, $P_n'(x_0) = f'(x_0)$. (The figure shows that curves that merely intersect [brown and red] and curves that are tangent [green and red] differ clearly in how close they are near $x_0$; the tangent curves are closer.)

  3. To be closer still, the curves must also bend in the same direction at $(x_0, f(x_0))$, that is, $P_n''(x_0) = f''(x_0)$. (In the figure above, the green and red curves bend in opposite directions, while the blue and red curves bend in the same direction; away from $x_0$, the two functions that bend the same way clearly differ less.) It follows that if, at $(x_0, f(x_0))$, we have $P_n'(x_0) = f'(x_0)$, $P_n''(x_0) = f''(x_0)$, $\ldots$, $P_n^{(n)}(x_0) = f^{(n)}(x_0)$, the approximation gets better and better.

To sum up, the polynomial to be found should satisfy the following conditions:
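$P_n(x_0) = f(x_0) \Rightarrow a_0 = f(x_0)$
$P_n'(x_0) = f'(x_0) \Rightarrow 1!\,a_1 = f'(x_0) \Rightarrow a_1 = \frac{f'(x_0)}{1!}$
$P_n''(x_0) = f''(x_0) \Rightarrow 2!\,a_2 = f''(x_0) \Rightarrow a_2 = \frac{f''(x_0)}{2!}$
$\cdots$
$P_n^{(n)}(x_0) = f^{(n)}(x_0) \Rightarrow n!\,a_n = f^{(n)}(x_0) \Rightarrow a_n = \frac{f^{(n)}(x_0)}{n!}$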

  • How the conversions above are done, taking the second derivative (the third line above) as an example:
    • First arrow: differentiate $P_n(x)$ twice and substitute $x_0$ to get $P_n''(x_0) = 2!\,a_2$.
    • Second arrow: so $f''(x_0) = 2!\,a_2$, and therefore $a_2 = \frac{1}{2!} f''(x_0)$.
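The same argument works for every order, not only the second. As a worked step (written in this article's notation, not copied from the original figure): differentiating $P_n(x)$ $k$ times removes every term of degree below $k$, and substituting $x_0$ removes every term of degree above $k$.

```latex
P_n^{(k)}(x) = k!\,a_k + \frac{(k+1)!}{1!}\,a_{k+1}(x - x_0) + \cdots + \frac{n!}{(n-k)!}\,a_n (x - x_0)^{n-k}
\quad\Longrightarrow\quad
P_n^{(k)}(x_0) = k!\,a_k = f^{(k)}(x_0)
\quad\Longrightarrow\quad
a_k = \frac{f^{(k)}(x_0)}{k!}, \qquad k = 0, 1, \ldots, n.
```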

The coefficients $a_k$ of the polynomial $P_n(x) = a_0 + a_1(x - x_0) + a_2(x - x_0)^2 + \cdots + a_n(x - x_0)^n$ can therefore all be expressed in terms of the derivatives of $f(x)$ at $x_0$, which gives:
$P_n(x) = f(x_0) + f'(x_0)(x - x_0) + \frac{f''(x_0)}{2!}(x - x_0)^2 + \cdots + \frac{f^{(n)}(x_0)}{n!}(x - x_0)^n$
where the error is $R_n(x) = f(x) - P_n(x)$. Because a polynomial is used to approximate the given function, there is in general some error between the two.

4. Definition of Taylor formula

So we get the definition of Taylor's formula:

If the function $f(x)$ has derivatives up to order $(n+1)$ on an open interval $(a, b)$ containing $x_0$, then for all $x \in (a, b)$:
$f(x) = f(x_0) + f'(x_0)(x - x_0) + \frac{f''(x_0)}{2!}(x - x_0)^2 + \cdots + \frac{f^{(n)}(x_0)}{n!}(x - x_0)^n + R_n(x)$
where the remainder (i.e., the error) is $R_n(x) = \frac{f^{(n+1)}(\xi)}{(n+1)!}(x - x_0)^{n+1}$ and $\xi$ lies between $x_0$ and $x$. There are several ways to express the remainder of Taylor's formula; the form above is called the Lagrange remainder of the $n$th-order Taylor expansion.

  • The Lagrange remainder has the same form as the next term of the $n$th-order expansion (with $n$ replaced by $n+1$), except that the derivative is evaluated at $\xi$ instead of $x_0$.
  • Note that the remainder is the error: a polynomial expanded at a point and used to approximate a given function generally leaves some error, and that error is what we call the remainder.
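For $\cos x$ the Lagrange remainder gives a simple error bound: every derivative of $\cos x$ is bounded by 1 in absolute value, so $|R_n(x)| \le \frac{|x|^{n+1}}{(n+1)!}$ when $x_0 = 0$. The sketch below checks this bound numerically; the helper names `cos_taylor` and `lagrange_bound` and the sample point $x = 2$ are assumptions for illustration.

```python
import math


def cos_taylor(x, n):
    """n-th order expansion of cos x at 0 (odd-order terms are zero)."""
    return sum((-1) ** (k // 2) * x**k / math.factorial(k)
               for k in range(0, n + 1, 2))


def lagrange_bound(x, n):
    """Upper bound on |R_n(x)| for cos x, using |f^(n+1)(xi)| <= 1."""
    return abs(x) ** (n + 1) / math.factorial(n + 1)


x = 2.0
for n in (2, 4, 8):
    actual = abs(math.cos(x) - cos_taylor(x, n))
    bound = lagrange_bound(x, n)
    print(f"n = {n}: |R_n(x)| = {actual:.3e} <= bound {bound:.3e}")
```

The bound does not require knowing $\xi$; it only uses the fact that the $(n+1)$-th derivative cannot exceed 1 in absolute value anywhere.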

5. Extension: Maclaurin's formula

Maclaurin's formula is a special case of Taylor's formula: it is Taylor's formula at $x_0 = 0$. Substituting $x_0 = 0$ into the formula, we get:
$f(x) = f(0) + f'(0)x + \frac{f''(0)}{2!}x^2 + \cdots + \frac{f^{(n)}(0)}{n!}x^n + R_n(x)$
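Maclaurin's formula with Peano remainders for several common elementary functions:
$e^x = 1 + x + \frac{x^2}{2!} + \cdots + \frac{x^n}{n!} + o(x^n)$
$\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots + (-1)^m \frac{x^{2m+1}}{(2m+1)!} + o(x^{2m+1})$
$\cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \cdots + (-1)^m \frac{x^{2m}}{(2m)!} + o(x^{2m})$
$\ln(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots + (-1)^{n-1} \frac{x^n}{n} + o(x^n)$
$\frac{1}{1 - x} = 1 + x + x^2 + \cdots + x^n + o(x^n)$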
The Peano remainder is an infinitesimal of higher order than $(x - x_0)^n$:
$R_n(x) = o\left((x - x_0)^n\right) \quad (x \to x_0)$
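A quick numerical check of what "higher-order infinitesimal" means, using $e^x$ at $x_0 = 0$: the ratio $R_n(x) / x^n$ should shrink toward $0$ as $x \to 0$. The helper `exp_maclaurin`, the order $n = 3$, and the sample points are assumptions for this sketch.

```python
import math


def exp_maclaurin(x, n):
    """n-th order Maclaurin polynomial of e^x: sum of x^k / k! for k = 0..n."""
    return sum(x**k / math.factorial(k) for k in range(n + 1))


n = 3
ratios = []
for x in (0.5, 0.1, 0.02):
    remainder = math.exp(x) - exp_maclaurin(x, n)
    ratios.append(remainder / x**n)
    print(f"x = {x}: R_n(x) / x^n = {ratios[-1]:.6f}")
```

The ratio keeps decreasing as $x$ approaches $0$, which is the behavior the $o(x^n)$ notation describes.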

Reference

【1】https://blog.csdn.net/xiaojinger_123/article/details/127442655

Origin blog.csdn.net/qq_51392112/article/details/130645876