Optimization problem
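Any optimization problem can be written in the standard form
$$\begin{aligned} \text{minimize} \quad & f_0(x) \\ \text{subject to} \quad & f_i(x) \le 0, \quad i = 1, \dots, m \\ & h_i(x) = 0, \quad i = 1, \dots, p, \end{aligned}$$
where $x \in \mathbb{R}^n$ is the optimization variable, $f_0$ is the objective function, the $f_i$ are the inequality constraint functions, and the $h_i$ are the equality constraint functions.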
Optimization Categories
- convex vs. non-convex: deep neural networks are non-convex.
- continuous vs. discrete: most problems have continuous variables; tree structures are discrete.
- constrained vs. unconstrained: adding a prior turns the problem into a constrained one.
- smooth vs. non-smooth: most are smooth optimization problems.
Different initializations lead to different optima (if the problem is not convex). Some ideas for dealing with this:
- Give up on the global optimum and find a good local optimum.
- Pre-training: find a good initialization to start training from, and then reach a better local optimum.
- Relaxation: convert the problem into a convex optimization problem.
- Brute force: if the problem is small, search it exhaustively.
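To make the effect of initialization concrete, here is a minimal sketch on a hypothetical 1-D objective $f(x) = x^4 - 3x^2 + x$, chosen only because it has two local minima. Plain gradient descent started from two points settles in two different minima, while a brute-force grid search (feasible because the problem is tiny) finds the better one. The step size, iteration count, grid range, and all function names are illustrative assumptions.

```python
# Hypothetical non-convex objective with two local minima (illustration only).
def f(x: float) -> float:
    return x**4 - 3 * x**2 + x

def grad_f(x: float) -> float:
    return 4 * x**3 - 6 * x + 1

def gradient_descent(x0: float, lr: float = 0.01, steps: int = 2000) -> float:
    """Plain fixed-step gradient descent; returns the final iterate."""
    x = x0
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x

def brute_force(lo: float, hi: float, n: int = 100_001) -> float:
    """Evaluate f on a uniform grid over [lo, hi] and return the best grid point."""
    grid = (lo + (hi - lo) * i / (n - 1) for i in range(n))
    return min(grid, key=f)

# Two initializations end in two different local minima.
x_left = gradient_descent(-2.0)   # converges near x ≈ -1.30 (the global minimum)
x_right = gradient_descent(2.0)   # converges near x ≈ 1.13 (a worse local minimum)
print(x_left, f(x_left))
print(x_right, f(x_right))

# A 1-D problem is small enough to search exhaustively.
x_best = brute_force(-3.0, 3.0)
print(x_best, f(x_best))
```

Neither run of gradient descent "fails"; each finds a stationary point of its own basin, which is exactly why a good initialization (or a global method on small problems) matters.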
Affine sets
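A set $C \subseteq \mathbb{R}^n$ is affine if the line through any two distinct points in $C$ lies in $C$, i.e., if for any $x_1, x_2 \in C$ and $\theta \in \mathbb{R}$, we have
$$\theta x_1 + (1 - \theta) x_2 \in C.$$
Note: the line passing through $x_1$ and $x_2$ is $\{\theta x_1 + (1 - \theta) x_2 \mid \theta \in \mathbb{R}\}$.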
Affine combination
We refer to a point of the form $\theta_1 x_1 + \cdots + \theta_k x_k$, where $\theta_1 + \cdots + \theta_k = 1$, as an affine combination of the points $x_1, \dots, x_k$. An affine set contains every affine combination of its points.
Affine hull
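The set of all affine combinations of points in some set $C \subseteq \mathbb{R}^n$ is called the affine hull of $C$, and denoted $\mathbf{aff}\, C$:
$$\mathbf{aff}\, C = \{\theta_1 x_1 + \cdots + \theta_k x_k \mid x_1, \dots, x_k \in C,\ \theta_1 + \cdots + \theta_k = 1\}.$$
The affine hull is the smallest affine set that contains $C$, in the following sense: if $S$ is any affine set with $C \subseteq S$, then $\mathbf{aff}\, C \subseteq S$.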
Affine dimension: We define the affine dimension of a set as the dimension of its affine hull.
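For a finite set of points $x_1, \dots, x_k$, the affine hull is $x_1 + \mathrm{span}\{x_2 - x_1, \dots, x_k - x_1\}$, so its dimension is the rank of those difference vectors. A minimal sketch, assuming points are given as coordinate tuples; `rank` and `affine_dimension` are hypothetical helper names, and exact `Fraction` arithmetic is used so no floating-point tolerance is needed.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of vectors, by exact Gaussian elimination over the rationals."""
    rows = [[Fraction(v) for v in vec] for vec in vectors]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        # Find a pivot row at or below row r with a nonzero entry in this column.
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Eliminate this column from every row below the pivot.
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def affine_dimension(points):
    """Dimension of aff{x_1, ..., x_k} = rank of the differences x_i - x_1."""
    base = points[0]
    diffs = [[a - b for a, b in zip(p, base)] for p in points[1:]]
    return rank(diffs) if diffs else 0

print(affine_dimension([(0, 0, 0), (1, 2, 3)]))             # 1: a line
print(affine_dimension([(1, 0, 0), (0, 1, 0), (0, 0, 1)]))  # 2: a plane
print(affine_dimension([(0, 0), (1, 1), (2, 2), (3, 3)]))   # 1: collinear points
```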
Convex Sets
A set $C$ is convex if the line segment between any two points in $C$ lies in $C$, i.e., if for any $x_1, x_2 \in C$ and any $\theta$ with $0 \le \theta \le 1$, we have
$$\theta x_1 + (1 - \theta) x_2 \in C.$$
Roughly speaking, a set is convex if every point in the set can be seen by every other point along an unobstructed straight path between them, where unobstructed means lying in the set. Every affine set is also convex, since it contains the entire line through any two distinct points in it, and therefore also the line segment between the points.
Convex combination
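We call a point of the form $\theta_1 x_1 + \cdots + \theta_k x_k$, where $\theta_1 + \cdots + \theta_k = 1$ and $\theta_i \ge 0$, $i = 1, \dots, k$, a convex combination of the points $x_1, \dots, x_k$.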
Convex hull
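The convex hull of a set $C$, denoted $\mathbf{conv}\, C$, is the set of all convex combinations of points in $C$:
$$\mathbf{conv}\, C = \{\theta_1 x_1 + \cdots + \theta_k x_k \mid x_i \in C,\ \theta_i \ge 0,\ i = 1, \dots, k,\ \theta_1 + \cdots + \theta_k = 1\}.$$
The convex hull is always convex. It is the smallest convex set that contains $C$: if $B$ is any convex set that contains $C$, then $\mathbf{conv}\, C \subseteq B$.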
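For a finite set of points in the plane, $\mathbf{conv}\, C$ is a convex polygon determined by its vertices. As a hedged sketch, the Python below uses Andrew's monotone chain algorithm (a standard computational-geometry method, swapped in here and not taken from these notes) to return those vertices; interior points and points on an edge are dropped because they are already convex combinations of the vertices. Function names are illustrative.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive means a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Vertices of conv(points) in counter-clockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:  # build the lower chain left to right
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):  # build the upper chain right to left
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other; drop duplicates.
    return lower[:-1] + upper[:-1]

square_plus_interior = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (1, 0)]
print(convex_hull(square_plus_interior))  # [(0, 0), (2, 0), (2, 2), (0, 2)]
```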
Cones
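A set $C$ is called a cone, or nonnegative homogeneous, if for every $x \in C$ and $\theta \ge 0$ we have $\theta x \in C$. A set $C$ is a convex cone if it is convex and a cone, which means that for any $x_1, x_2 \in C$ and $\theta_1, \theta_2 \ge 0$, we have
$$\theta_1 x_1 + \theta_2 x_2 \in C.$$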
Hyperplanes and halfspaces
A hyperplane is a set of the form
$$\{x \mid a^T x = b\},$$
where $a \in \mathbb{R}^n$, $a \ne 0$, and $b \in \mathbb{R}$.
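Geometrically, the hyperplane is the set of points with a constant inner product with the normal vector $a$; the constant $b$ determines the offset of the hyperplane from the origin. This interpretation can be understood by expressing the hyperplane in the form
$$\{x \mid a^T (x - x_0) = 0\},$$
where $x_0$ is any point in the hyperplane (i.e., any point with $a^T x_0 = b$).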
A hyperplane divides $\mathbb{R}^n$ into two halfspaces. A (closed) halfspace is a set of the form
$$\{x \mid a^T x \le b\},$$
where $a \ne 0$. Halfspaces are convex but not affine. The set $\{x \mid a^T x < b\}$ is called an open halfspace.
Polyhedra
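A polyhedron is defined as the solution set of a finite number of linear equalities and inequalities:
$$\mathcal{P} = \{x \mid a_j^T x \le b_j,\ j = 1, \dots, m,\ c_j^T x = d_j,\ j = 1, \dots, p\}.$$
A polyhedron is thus the intersection of a finite number of halfspaces and hyperplanes. In compact notation,
$$\mathcal{P} = \{x \mid Ax \preceq b,\ Cx = d\},$$
where $A \in \mathbb{R}^{m \times n}$ has rows $a_1^T, \dots, a_m^T$, $C \in \mathbb{R}^{p \times n}$ has rows $c_1^T, \dots, c_p^T$, and $\preceq$ denotes componentwise inequality.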
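Checking whether a point lies in a polyhedron amounts to checking each halfspace and each hyperplane in turn. A minimal sketch, with the rows of $A$ and $C$ passed as plain Python sequences; the function name `in_polyhedron` and the tolerance `tol` are illustrative assumptions (the tolerance is there because equality constraints rarely hold exactly in floating point).

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def in_polyhedron(x, A, b, C=(), d=(), tol=1e-9):
    """Check x in {x | Ax ⪯ b, Cx = d}.

    Each row a_j of A gives a halfspace a_j^T x <= b_j; each row c_j of C gives
    a hyperplane c_j^T x = d_j. `tol` absorbs floating-point rounding.
    """
    halfspaces_ok = all(dot(a, x) <= bj + tol for a, bj in zip(A, b))
    hyperplanes_ok = all(abs(dot(c, x) - dj) <= tol for c, dj in zip(C, d))
    return halfspaces_ok and hyperplanes_ok

# Unit square [0, 1]^2 as four halfspaces: -x1 <= 0, x1 <= 1, -x2 <= 0, x2 <= 1.
A = [(-1, 0), (1, 0), (0, -1), (0, 1)]
b = [0, 1, 0, 1]
print(in_polyhedron((0.5, 0.5), A, b))                     # True
print(in_polyhedron((1.5, 0.5), A, b))                     # False
# Adding the hyperplane x1 + x2 = 1 cuts the square down to a diagonal segment.
print(in_polyhedron((0.5, 0.5), A, b, C=[(1, 1)], d=[1]))  # True
print(in_polyhedron((0.2, 0.2), A, b, C=[(1, 1)], d=[1]))  # False
```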
Linearly Independent vs. Affinely Independent
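Consider the vectors (1,0), (0,1) and (1,1). These are affinely independent, but not linearly independent. If you remove any one of them, the affine hull of the remaining two is a line, which has dimension one, so no one of the points lies in the affine hull of the other two. In contrast, the span of any two of them is all of $\mathbb{R}^2$, so the third vector lies in that span and the three are not linearly independent.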
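Both claims can be checked directly. The short sketch below (helper names `det2` and `sub` are made up for this illustration) exhibits a nontrivial linear combination of the three vectors that equals zero, and uses a $2 \times 2$ determinant to confirm that the differences $v_2 - v_1$ and $v_3 - v_1$ are linearly independent, which is what affine independence means.

```python
def det2(u, v):
    """Determinant of the 2x2 matrix with columns u and v."""
    return u[0] * v[1] - u[1] * v[0]

def sub(u, v):
    return (u[0] - v[0], u[1] - v[1])

v1, v2, v3 = (1, 0), (0, 1), (1, 1)

# Not linearly independent: 1*v1 + 1*v2 - 1*v3 is the zero vector.
coeffs = (1, 1, -1)
combo = tuple(sum(c * v[i] for c, v in zip(coeffs, (v1, v2, v3))) for i in range(2))
print(combo)  # (0, 0)

# Affinely independent: v2 - v1 and v3 - v1 are linearly independent,
# i.e. the determinant of the two difference vectors is nonzero.
print(det2(sub(v2, v1), sub(v3, v1)))  # -1
```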
Simplexes
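Suppose the $k + 1$ points $v_0, \dots, v_k \in \mathbb{R}^n$ are affinely independent, which means $v_1 - v_0, \dots, v_k - v_0$ are linearly independent. The simplex determined by them is given by
$$C = \mathbf{conv}\{v_0, \dots, v_k\} = \{\theta_0 v_0 + \cdots + \theta_k v_k \mid \theta \succeq 0,\ \mathbf{1}^T \theta = 1\}.$$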
Note:
- The affine dimension of this simplex is $k$, so it is referred to as a $k$-dimensional simplex in $\mathbb{R}^n$.
A 1-dimensional simplex is a line segment; a 2-dimensional simplex is a triangle (including its interior); and a 3-dimensional simplex is a tetrahedron.
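A simplex is the set of convex combinations of its vertices, so drawing a random point from it amounts to drawing weights $\theta \succeq 0$ with $\mathbf{1}^T\theta = 1$. The sketch below assumes one common trick that is not from these notes: normalizing independent Exp(1) draws, which makes $\theta$ uniformly distributed on the probability simplex. The function name `sample_simplex` is hypothetical.

```python
import random

def sample_simplex(vertices, rng=random):
    """Draw one point theta_0 v_0 + ... + theta_k v_k with theta ⪰ 0 and 1^T theta = 1.

    Normalizing i.i.d. Exp(1) draws gives theta uniformly distributed on the
    probability simplex (a flat Dirichlet distribution).
    """
    weights = [rng.expovariate(1.0) for _ in vertices]
    total = sum(weights)
    theta = [w / total for w in weights]
    dim = len(vertices[0])
    point = tuple(sum(t * v[i] for t, v in zip(theta, vertices)) for i in range(dim))
    return point, theta

# A 2-dimensional simplex (a triangle, including its interior) in R^2.
triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
rng = random.Random(0)
point, theta = sample_simplex(triangle, rng)
print(point, theta)
```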
What is the key distinction between a convex hull and a simplex?
If the points on which the convex hull is defined are affinely independent, then the convex hull and the simplex determined by them are the same. Otherwise, no simplex is determined by these points, but the convex hull is still defined.
Convex Functions
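- A function $f: \mathbb{R}^n \to \mathbb{R}$ is convex if $\mathbf{dom}\, f$ is a convex set and if for all $x, y \in \mathbf{dom}\, f$, and $\theta$ with $0 \leq \theta \leq 1$, we have
$$f(\theta x + (1 - \theta) y) \le \theta f(x) + (1 - \theta) f(y).$$
  $f$ is strictly convex if strict inequality holds whenever $x \ne y$ and $0 < \theta < 1$.
- We say $f$ is concave if $-f$ is convex, and strictly concave if $-f$ is strictly convex.
- A function is convex if and only if it is convex when restricted to any line that intersects its domain. In other words, $f$ is convex if and only if for all $x \in \mathbf{dom}\, f$ and all $v$, the function $g(t) = f(x + tv)$ is convex (on its domain, $\{t \mid x + tv \in \mathbf{dom}\, f\}$).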
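The definition suggests a cheap sanity test: sample $x$, $y$, $\theta$ and look for a violation of the inequality. This is a hedged sketch, assuming $\mathbf{dom}\, f = \mathbb{R}^n$ and sampling from the box $[-5, 5]^n$ (both arbitrary choices); `find_convexity_violation` is a hypothetical name. A violation proves $f$ is not convex, while finding none is only evidence of convexity.

```python
import math
import random

def find_convexity_violation(f, dim, trials=10_000, scale=5.0, seed=0):
    """Randomized search for x, y, theta violating
    f(theta*x + (1-theta)*y) <= theta*f(x) + (1-theta)*f(y).

    Returns a violating (x, y, theta), or None if none was found.
    Assumes dom f = R^dim; a small tolerance absorbs rounding error.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-scale, scale) for _ in range(dim)]
        y = [rng.uniform(-scale, scale) for _ in range(dim)]
        theta = rng.random()
        z = [theta * a + (1 - theta) * b for a, b in zip(x, y)]
        if f(z) > theta * f(x) + (1 - theta) * f(y) + 1e-9:
            return x, y, theta
    return None

def sq_norm(x):
    return sum(a * a for a in x)  # convex

def sum_of_sines(x):
    return sum(math.sin(a) for a in x)  # not convex

print(find_convexity_violation(sq_norm, dim=3))                   # None
print(find_convexity_violation(sum_of_sines, dim=3) is not None)  # True
```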
First-order conditions
- Suppose $f$ is differentiable. Then $f$ is convex if and only if $\mathbf{dom}\, f$ is convex and
$$f(y) \ge f(x) + \nabla f(x)^T (y - x)$$
holds for all $x, y \in \mathbf{dom}\, f$.
- For a convex function, the first-order Taylor approximation $f(x) + \nabla f(x)^T (y - x)$ is in fact a global underestimator of the function. Conversely, if the first-order Taylor approximation of a function is always a global underestimator of the function, then the function is convex.
- The inequality shows that from local information about a convex function (i.e., its value and derivative at a point) we can derive global information (i.e., a global underestimator of it).
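The first-order condition can be spot-checked numerically by measuring the gap $f(q) - [f(p) + \nabla f(p)^T (q - p)]$ at random pairs of points. A minimal sketch, using as the test function the quadratic-over-linear function $f(x, y) = x^2 / y$ on $y > 0$, whose gradient is $(2x/y,\ -x^2/y^2)$; the sampling ranges and helper names are illustrative assumptions.

```python
import random

def quad_over_lin(p):
    """f(x, y) = x^2 / y on the domain y > 0."""
    x, y = p
    return x * x / y

def grad_quad_over_lin(p):
    """Gradient of x^2 / y: (2x / y, -x^2 / y^2)."""
    x, y = p
    return (2 * x / y, -x * x / (y * y))

def taylor_gap(f, grad, p, q):
    """f(q) - [f(p) + grad f(p)^T (q - p)]; >= 0 for all p, q when f is convex."""
    g = grad(p)
    return f(q) - (f(p) + sum(gi * (qi - pi) for gi, pi, qi in zip(g, p, q)))

rng = random.Random(0)
gaps = []
for _ in range(10_000):
    # Sample p and q inside the domain (second coordinate strictly positive).
    p = (rng.uniform(-3, 3), rng.uniform(0.1, 3))
    q = (rng.uniform(-3, 3), rng.uniform(0.1, 3))
    gaps.append(taylor_gap(quad_over_lin, grad_quad_over_lin, p, q))

print(min(gaps) >= -1e-9)  # True: the tangent plane never lies above f
```

For this particular $f$, with $p = (x_p, y_p)$ and $q = (x_q, y_q)$, the gap simplifies to $(x_q - x_p y_q / y_p)^2 / y_q$, which makes its nonnegativity explicit; the tolerance only absorbs rounding.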
Second-order conditions
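- Suppose that $f$ is twice differentiable. Then $f$ is convex if and only if $\mathbf{dom}\, f$ is convex and its Hessian is positive semidefinite: for all $x \in \mathbf{dom}\, f$,
$$\nabla^2 f(x) \succeq 0.$$
- $f$ is concave if and only if $\mathbf{dom}\, f$ is convex and $\nabla^2 f(x) \preceq 0$ for all $x \in \mathbf{dom}\, f$.
- If $\nabla^2 f(x) \succ 0$ for all $x \in \mathbf{dom}\, f$, then $f$ is strictly convex. The converse is not true: e.g., $f(x) = x^4$ has zero second derivative at $x = 0$ but is strictly convex.
- Quadratic functions: consider the quadratic function $f: \mathbb{R}^n \to \mathbb{R}$, with $\mathbf{dom}\, f = \mathbb{R}^n$, given by
$$f(x) = \tfrac{1}{2} x^T P x + q^T x + r,$$
with $P \in \mathbf{S}^n$ (the symmetric $n \times n$ matrices), $q \in \mathbb{R}^n$, and $r \in \mathbb{R}$. Since $\nabla^2 f(x) = P$ for all $x$, $f$ is convex if and only if $P \succeq 0$ (and concave if and only if $P \preceq 0$).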
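For $n = 2$ the second-order condition can be checked by hand: a symmetric $2 \times 2$ matrix is positive semidefinite if and only if both diagonal entries and the determinant (its principal minors) are nonnegative. The sketch below (hypothetical helper names, a small tolerance assumed for rounding) classifies $f(x) = \tfrac{1}{2} x^T P x + q^T x + r$ by looking only at $P$; $q$ and $r$ never enter, because they do not affect the Hessian.

```python
def is_psd_2x2(P, tol=1e-12):
    """P ⪰ 0 for a symmetric 2x2 matrix iff all principal minors are >= 0."""
    (a, b), (b_t, c) = P
    if b != b_t:
        raise ValueError("P must be symmetric")
    return a >= -tol and c >= -tol and a * c - b * b >= -tol

def classify_quadratic(P):
    """Classify f(x) = (1/2) x^T P x + q^T x + r via its Hessian, which is P."""
    neg_P = [[-v for v in row] for row in P]
    convex, concave = is_psd_2x2(P), is_psd_2x2(neg_P)
    if convex and concave:
        return "affine"  # only P = 0 is both PSD and NSD
    if convex:
        return "convex"
    if concave:
        return "concave"
    return "neither"

print(classify_quadratic([[2, 0], [0, 1]]))    # convex
print(classify_quadratic([[1, 1], [1, 1]]))    # convex (PSD but singular)
print(classify_quadratic([[-1, 0], [0, -3]]))  # concave
print(classify_quadratic([[1, 0], [0, -1]]))   # neither (a saddle)
print(classify_quadratic([[0, 0], [0, 0]]))    # affine
```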
Examples of Convex and Concave Functions
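- Exponential. $e^{ax}$ is convex on $\mathbb{R}$, for any $a \in \mathbb{R}$.
- Powers. $x^a$ is convex on $\mathbb{R}_{++}$ when $a \ge 1$ or $a \le 0$, and concave for $0 \le a \le 1$.
- Powers of absolute value. $|x|^p$, for $p \ge 1$, is convex on $\mathbb{R}$.
- Logarithm. $\log x$ is concave on $\mathbb{R}_{++}$.
- Negative entropy. $x \log x$ (either on $\mathbb{R}_{++}$, or on $\mathbb{R}_+$, defined as $0$ for $x = 0$) is convex.
- Norms. Every norm on $\mathbb{R}^n$ is convex.
- Max function. $f(x) = \max\{x_1, \dots, x_n\}$ is convex on $\mathbb{R}^n$.
- Quadratic-over-linear function. The function $f(x, y) = x^2 / y$, with $\mathbf{dom}\, f = \{(x, y) \in \mathbb{R}^2 \mid y > 0\}$, is convex.
- Log-sum-exp. The function $f(x) = \log(e^{x_1} + \cdots + e^{x_n})$ is convex on $\mathbb{R}^n$.
- Geometric mean. The geometric mean $f(x) = \left(\prod_{i=1}^n x_i\right)^{1/n}$ is concave on $\mathbf{dom}\, f = \mathbb{R}^n_{++}$.
- Log-determinant. The function $f(X) = \log \det X$ is concave on $\mathbf{dom}\, f = \mathbf{S}^n_{++}$.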
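Two of these entries are easy to experiment with numerically. The sketch below (illustrative helper names) computes log-sum-exp in the usual overflow-safe way, by factoring out $\max_i x_i$, and checks the known bounds $\max_i x_i \le f(x) \le \max_i x_i + \log n$ that make it a smooth approximation of the max function. It then checks concavity of the geometric mean at $\theta = 1/2$ for one pair of points, which illustrates the inequality but of course does not prove it.

```python
import math

def logsumexp(x):
    """log(e^{x_1} + ... + e^{x_n}), shifted by max(x) so that exp never overflows."""
    m = max(x)
    return m + math.log(sum(math.exp(v - m) for v in x))

def geometric_mean(x):
    """(x_1 * ... * x_n)^(1/n) on R^n_{++}, computed through logarithms."""
    return math.exp(sum(math.log(v) for v in x) / len(x))

# math.exp(1000.0) on its own would raise OverflowError; the shifted form is fine.
x = [1000.0, 999.0, 998.0]
val = logsumexp(x)
# Smooth approximation of max: max(x) <= lse(x) <= max(x) + log(n).
print(max(x) <= val <= max(x) + math.log(len(x)))  # True

# Concavity of the geometric mean at theta = 1/2 for one pair of points.
u, w = [1.0, 4.0], [9.0, 1.0]
mid = [(a + b) / 2 for a, b in zip(u, w)]
print(geometric_mean(mid) >= (geometric_mean(u) + geometric_mean(w)) / 2)  # True
```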
Reference: Convex Optimization by Stephen Boyd and Lieven Vandenberghe.