[Mathematics] An intuitive understanding of Taylor's formula (useful for Newton's iteration method)
1. Introduction
Recently I was looking at optimization methods used in machine learning (gradient descent, Newton's iteration, etc.), which rely on Taylor expansion. I had almost forgotten what I learned in college, so I read some blogs and organized the material here.
Taylor's formula, also known as the Taylor expansion, uses information about a function at a single point to describe its values near that point. If the function is smooth enough and its derivatives of each order at that point are known, Taylor's formula uses those derivative values as coefficients to build a polynomial that approximates the function in a neighborhood of the point.
- So what is the Taylor formula for?
- Put simply, it uses a polynomial function to approximate a given function (that is, it makes the graph of the polynomial fit the graph of the given function as closely as possible). Note that the approximation is always expanded about a specific point on the function's graph.
- If you need the value of a very complicated function that cannot be computed directly, you can use Taylor's formula to approximate it. This is one of the applications of Taylor's formula.
- In machine learning, Taylor's formula is mainly used in iterative optimization methods such as gradient descent and Newton's iteration method.
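To make the link to Newton's iteration concrete: keeping only the linear part of the Taylor expansion gives $f(x) \approx f(x_k) + f'(x_k)(x - x_k)$, and setting the right-hand side to zero gives the update $x_{k+1} = x_k - f(x_k)/f'(x_k)$. The following is a minimal sketch of that root-finding form. The function name `newton_root`, the tolerance, and the example $f(x) = x^2 - 2$ are illustrative assumptions, not part of the original article.

```python
def newton_root(f, df, x0, tol=1e-12, max_iter=50):
    """Find a root of f by Newton's iteration, starting from x0.

    Each step replaces f by its first-order Taylor polynomial at the
    current x and solves that linear equation exactly.
    """
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)  # assumes df(x) != 0 near the root
        x -= step
        if abs(step) < tol:
            break
    return x


# Hypothetical example: the positive root of x^2 - 2 is sqrt(2).
root = newton_root(lambda x: x * x - 2, lambda x: 2 * x, x0=1.0)
print(root)
```

In optimization the same idea is usually applied to the derivative of the objective, so that the iteration looks for a point where the gradient is zero.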
2. Intuitive understanding
Polynomials are the simplest class of elementary functions. A polynomial of degree $n$ has the form $P_n(x) = a_0 + a_1 x + a_2 x^2 + \dots + a_n x^n$.
- Evaluating a polynomial only requires a finite number of additions, subtractions and multiplications, so polynomials are a convenient tool for numerical calculation. For this reason we often use polynomials to approximate other functions, and this is why Taylor's formula uses a polynomial to approximate a given function.
2.1 Approximate calculation
Elementary mathematics introduces functions such as $\cos x$, but it does not explain how to compute their values. Take the approximate calculation of $f(x) = \cos x$ as an example:
1) First-degree (linear) approximation
Use the differential approximation formula $f(x) \approx f(x_0) + f'(x_0)(x - x_0)$ (which follows from the limit definition of the derivative). For $x_0 = 0$, the linear approximation of $f(x)$ near 0 is $f(x) \approx f(0) + f'(0)x$. Since $\cos 0 = 1$ and $\cos'(0) = -\sin 0 = 0$, we get $f(x) = \cos x \approx 1$, so the linear approximation of $f(x)$ near $x_0 = 0$ is $P_1(x) = 1$, as shown in the figure below:
- Advantage of the linear approximation: simple form, easy to compute.
- Disadvantage of the linear approximation: the farther you are from the origin O, the worse the approximation.
2) Quadratic approximation
Using the quadratic polynomial $P_2(x) = a_0 + a_1 x + a_2 x^2$ to approximate $f(x) = \cos x$, we expect the function value, the first derivative and the second derivative of the two functions to agree at $x_0 = 0$:

$$
\begin{aligned}
P_2(0) = f(0) = 1 &\;\Rightarrow\; a_0 = 1 \\
P_2'(0) = f'(0) = 0 &\;\Rightarrow\; a_1 = 0 \\
P_2''(0) = f''(0) = -1 &\;\Rightarrow\; 2a_2 = -1 \;\Rightarrow\; a_2 = -\frac{1}{2}
\end{aligned}
$$

So $\cos x \approx P_2(x) = 1 - \frac{x^2}{2}$, as shown below:
- The quadratic approximation is much better than the linear one, but only within $\left[-\frac{\pi}{2}, \frac{\pi}{2}\right]$. Outside this range, the graphs differ significantly.
- Why do we expect the function value, first derivative value, and second derivative value of two functions to be equal at a certain point?
- Because these values express the most basic properties of the function (and of its graph). If two functions agree in these properties, they stay close to each other, as the graphs above show intuitively. This is also the basic idea behind Taylor's formula.
3) Eighth-degree approximation
Using the eighth-degree polynomial $P_8(x) = a_0 + a_1 x + a_2 x^2 + \dots + a_8 x^8$ to approximate $f(x) = \cos x$, we expect:

$$P_8^{(k)}(0) = f^{(k)}(0), \quad k = 0, 1, \dots, 8$$

Therefore, we get:

$$\cos x \approx P_8(x) = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \frac{x^8}{8!}$$
The image is as follows:
$P_8(x)$ (green curve) stays close to the cosine function (red curve) over a larger range than $P_2(x)$ (blue curve).
In summary, the three approximations of different degrees show that when high accuracy is required and the error must be estimated, a higher-degree polynomial should be used to approximate the function, together with a formula for the error. This is the process of approximating a given function by polynomial functions.
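The three approximations of $\cos x$ can be compared numerically. This is a minimal sketch; the function names `p1`, `p2`, `p8` and the sample points are illustrative choices rather than anything from the original article.

```python
import math


def p1(x):
    """Linear approximation of cos x at x0 = 0."""
    return 1.0


def p2(x):
    """Quadratic approximation of cos x at x0 = 0."""
    return 1.0 - x**2 / 2


def p8(x):
    """Eighth-degree approximation of cos x at x0 = 0."""
    return sum((-1) ** k * x ** (2 * k) / math.factorial(2 * k) for k in range(5))


for x in (0.5, 1.5, 3.0):
    exact = math.cos(x)
    errors = [abs(p(x) - exact) for p in (p1, p2, p8)]
    print(f"x={x}: |P1-cos|={errors[0]:.2e}, |P2-cos|={errors[1]:.2e}, |P8-cos|={errors[2]:.2e}")
```

Inside $\left[-\frac{\pi}{2}, \frac{\pi}{2}\right]$ the error shrinks as the degree grows. At $x = 3.0$, well outside that interval, $P_2$ is no better than the constant $P_1$, while $P_8$ is still close to $\cos x$.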
3. Derivation of Taylor's formula
This leads to a question: given a function $f(x)$, find a polynomial function $P_n(x)$ that approximates $f(x)$ closely near a specified point $x_0$, written as:

$$P_n(x) = a_0 + a_1(x - x_0) + a_2(x - x_0)^2 + \dots + a_n(x - x_0)^n$$

such that $f(x) \approx P_n(x)$ and the error between the two, $R_n(x) = f(x) - P_n(x)$, can be estimated. So what conditions should this polynomial satisfy, and what is the error?
- Geometrically, $y = f(x)$ and $y = P_n(x)$ represent two curves, as shown in the figure below:
For the two curves to be very close near $x_0$, clearly:
- First, the two curves must intersect at the point $(x_0, f(x_0))$, that is, $P_n(x_0) = f(x_0)$.
- To get closer, the two curves must also be tangent at the point $(x_0, f(x_0))$, that is, $P_n'(x_0) = f'(x_0)$. (The figure shows this intuitively: curves that merely intersect [brown and red] and curves that are tangent [green and red] differ visibly in how close they stay near $x_0$, and the tangent ones are closer.)
- To get closer still, the two curves must also bend in the same direction at the point $(x_0, f(x_0))$, that is, $P_n''(x_0) = f''(x_0)$. (In the figure above, the green and red curves bend in opposite directions, while the blue and red curves bend in the same direction. Away from $x_0$, the curves that bend the same way clearly differ less.) Continuing in this way, if at $x_0$ we have $P_n'(x_0) = f'(x_0)$, $P_n''(x_0) = f''(x_0)$, $\dots$, $P_n^{(n)}(x_0) = f^{(n)}(x_0)$, the approximation gets better and better.
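To sum up, the polynomial to be found should satisfy the following conditions:

$$
\begin{aligned}
P_n(x_0) = f(x_0) &\;\Rightarrow\; a_0 = f(x_0) \\
P_n'(x_0) = f'(x_0) &\;\Rightarrow\; 1!\,a_1 = f'(x_0) \;\Rightarrow\; a_1 = \frac{1}{1!} f'(x_0) \\
P_n''(x_0) = f''(x_0) &\;\Rightarrow\; 2!\,a_2 = f''(x_0) \;\Rightarrow\; a_2 = \frac{1}{2!} f''(x_0) \\
&\;\;\vdots \\
P_n^{(n)}(x_0) = f^{(n)}(x_0) &\;\Rightarrow\; n!\,a_n = f^{(n)}(x_0) \;\Rightarrow\; a_n = \frac{1}{n!} f^{(n)}(x_0)
\end{aligned}
$$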
- How the transformations above are obtained, taking the second derivative (the third line above) as an example:
- The first arrow: compute the second derivative of $P_n(x)$ and substitute $x_0$, which gives $P_n''(x_0) = 2!\,a_2$.
- The second arrow: so $f''(x_0) = 2!\,a_2$, and therefore $a_2 = \frac{1}{2!} f''(x_0)$.
So every coefficient $a_k$ of the polynomial $P_n(x) = a_0 + a_1(x - x_0) + a_2(x - x_0)^2 + \dots + a_n(x - x_0)^n$ can be expressed in terms of $f(x)$, which gives:

$$P_n(x) = f(x_0) + f'(x_0)(x - x_0) + \frac{f''(x_0)}{2!}(x - x_0)^2 + \dots + \frac{f^{(n)}(x_0)}{n!}(x - x_0)^n$$

The error is $R_n(x) = f(x) - P_n(x)$. Because a polynomial only approximates the given function, there is in general some error between the two.
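The coefficient rule $a_k = \frac{1}{k!} f^{(k)}(x_0)$ derived above can be turned directly into code. This is a minimal sketch; the helper name `taylor_poly` and the example ($f(x) = e^x$ expanded at $x_0 = 1$) are assumptions chosen for illustration.

```python
import math


def taylor_poly(derivs, x0):
    """Build P_n from the list [f(x0), f'(x0), ..., f^(n)(x0)].

    The k-th coefficient is a_k = f^(k)(x0) / k!.
    """
    coeffs = [d / math.factorial(k) for k, d in enumerate(derivs)]

    def p(x):
        return sum(a * (x - x0) ** k for k, a in enumerate(coeffs))

    return p


# Hypothetical example: f(x) = e^x at x0 = 1, where every derivative equals e.
n = 6
p6 = taylor_poly([math.e] * (n + 1), x0=1.0)
print(p6(1.3), math.exp(1.3))
```

The caller supplies the derivative values, so the same helper works for any function whose derivatives at $x_0$ are known.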
4. Definition of Taylor formula
So we get the definition of Taylor's formula:
If the function $f(x)$ has derivatives up to order $(n+1)$ on an open interval $(a, b)$ containing $x_0$, then for every $x \in (a, b)$:

$$f(x) = f(x_0) + f'(x_0)(x - x_0) + \frac{f''(x_0)}{2!}(x - x_0)^2 + \dots + \frac{f^{(n)}(x_0)}{n!}(x - x_0)^n + R_n(x)$$

where the remainder (i.e., the error) is

$$R_n(x) = \frac{f^{(n+1)}(\xi)}{(n+1)!}(x - x_0)^{n+1}$$

with $\xi$ between $x_0$ and $x$. There are several ways to express the remainder of Taylor's formula. The form above is called the Lagrange remainder of the $n$-th order Taylor expansion.
- The Lagrange remainder has the same form as the next term (order $n+1$) of the Taylor polynomial, except that the derivative is evaluated at $\xi$ instead of $x_0$.
- Note that the remainder here is the error: expanding a polynomial about a point to approximate a given function always leaves some error, and we call that error the remainder.
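The Lagrange remainder gives a usable error bound whenever $|f^{(n+1)}|$ can be bounded. For $f(x) = \cos x$ every derivative is bounded by 1 in absolute value, so $|R_n(x)| \le \frac{|x|^{n+1}}{(n+1)!}$ when $x_0 = 0$. The following sketch checks that bound numerically; the function names and the sample point $x = 2$ are illustrative assumptions.

```python
import math


def cos_maclaurin(x, n):
    """Taylor polynomial of cos x at x0 = 0, keeping terms up to degree n."""
    return sum((-1) ** k * x ** (2 * k) / math.factorial(2 * k) for k in range(n // 2 + 1))


def lagrange_bound(x, n):
    """Upper bound on |R_n(x)| for cos x, using |f^(n+1)(xi)| <= 1."""
    return abs(x) ** (n + 1) / math.factorial(n + 1)


x = 2.0
for n in (2, 4, 8):
    actual = abs(math.cos(x) - cos_maclaurin(x, n))
    print(f"n={n}: actual error={actual:.3e}, bound={lagrange_bound(x, n):.3e}")
```

The bound only requires knowing that $\xi$ lies between $x_0$ and $x$, not its exact value, which is what makes the Lagrange form practical for error estimates.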
5. Extension - Maclaurin's formula
Maclaurin's formula is a special case of Taylor's formula: it is Taylor's formula at $x_0 = 0$. Substituting $x_0 = 0$ into the formula, we get:

$$f(x) = f(0) + f'(0)x + \frac{f''(0)}{2!}x^2 + \dots + \frac{f^{(n)}(0)}{n!}x^n + R_n(x)$$
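Maclaurin's formula with Peano remainders for several common elementary functions:

$$
\begin{aligned}
e^x &= 1 + x + \frac{x^2}{2!} + \dots + \frac{x^n}{n!} + o(x^n) \\
\sin x &= x - \frac{x^3}{3!} + \frac{x^5}{5!} - \dots + (-1)^{m-1}\frac{x^{2m-1}}{(2m-1)!} + o(x^{2m-1}) \\
\cos x &= 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \dots + (-1)^{m}\frac{x^{2m}}{(2m)!} + o(x^{2m}) \\
\ln(1 + x) &= x - \frac{x^2}{2} + \frac{x^3}{3} - \dots + (-1)^{n-1}\frac{x^n}{n} + o(x^n)
\end{aligned}
$$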
The Peano remainder is a higher-order infinitesimal of $(x - x_0)^n$ as $x \to x_0$:

$$R_n(x) = o\left((x - x_0)^n\right)$$
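Saying that the remainder is a higher-order infinitesimal of $(x - x_0)^n$ means that $R_n(x) / (x - x_0)^n \to 0$ as $x \to x_0$. This can be observed numerically. The sketch below uses $f(x) = e^x$ with $x_0 = 0$ and $n = 2$; the function name `peano_ratio` and the sample points are illustrative assumptions.

```python
import math


def peano_ratio(x, n=2):
    """Return R_n(x) / x^n for f(x) = e^x expanded at x0 = 0."""
    p_n = sum(x**k / math.factorial(k) for k in range(n + 1))
    return (math.exp(x) - p_n) / x**n


for x in (0.1, 0.01, 0.001):
    print(f"x={x}: R_2(x)/x^2 = {peano_ratio(x):.6f}")
```

The printed ratio shrinks roughly in proportion to $x$, which is what the $o(x^n)$ notation expresses. Unlike the Lagrange form, the Peano form says nothing about how large the error is at a fixed $x$.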
Reference
[1] https://blog.csdn.net/xiaojinger_123/article/details/127442655