UA SIE545 优化理论基础3 Fritz-John与Kuhn-Tucker理论总结 带等式约束与不等式约束的极值问题

UA SIE545 优化理论基础3 Fritz-John与Kuhn-Tucker理论总结 带等式约束与不等式约束的极值问题

对于函数 f : X → Y f:X \to Y f:X→Y,我们希望 X X X是一个凸的度量空间(或者拓扑空间), Y Y Y是一个定义了良序的内积空间。 N ϵ ( x ) N_{\epsilon}(x) Nϵ​(x)表示中心为 x x x,半径为 ϵ \epsilon ϵ的领域,下面先列出几个会用到的概念。

Global minimum If x ˉ ∈ X \bar x \in X xˉ∈X, ∀ x ∈ X \forall x \in X ∀x∈X, f ( x ˉ ) ≤ f ( x ) f(\bar x) \le f(x) f(xˉ)≤f(x), then x ˉ \bar x xˉ is a global minimum. If equality in f ( x ˉ ) ≤ f ( x ) f(\bar x) \le f(x) f(xˉ)≤f(x) will never hold, then x ˉ \bar x xˉ is strict global minimum.

Local minimum If x ˉ ∈ X \bar x \in X xˉ∈X, ∃ ϵ > 0 \exists \epsilon>0 ∃ϵ>0, ∀ x ∈ N ϵ ( x ˉ ) \forall x \in N_{\epsilon}(\bar x) ∀x∈Nϵ​(xˉ), f ( x ˉ ) ≤ f ( x ) f(\bar x) \le f(x) f(xˉ)≤f(x), then x ˉ \bar x xˉ is a local minimum. If equality in f ( x ˉ ) ≤ f ( x ) f(\bar x) \le f(x) f(xˉ)≤f(x) will never hold, then x ˉ \bar x xˉ is strict local minimum.

Pseudoconvex ∀ x 1 , x 2 ∈ X \forall x_1,x_2 \in X ∀x1​,x2​∈X such that ∇ f ( x 1 ) T ( x 2 − x 1 ) ≥ 0 \nabla f(x_1)^T(x_2-x_1) \ge 0 ∇f(x1​)T(x2​−x1​)≥0, we have f ( x 2 ) ≥ f ( x 1 ) f(x_2) \ge f(x_1) f(x2​)≥f(x1​).

Quasiconvex ∀ x 1 , x 2 ∈ X , λ ∈ [ 0 , 1 ] \forall x_1,x_2 \in X,\lambda \in [0,1] ∀x1​,x2​∈X,λ∈[0,1] such that f ( λ x 1 + ( 1 − λ ) x 2 ) ≤ max ⁡ ( f ( x ) , f ( y ) ) f(\lambda x_1+ (1-\lambda)x_2) \le \max(f(x),f(y)) f(λx1​+(1−λ)x2​)≤max(f(x),f(y))


min ⁡ x ∈ S f ( x ) \min_{x \in S} f(x) minx∈S​f(x),其中 S S S是 X X X的子集。

Cone of feasible directions at x ˉ ∈ c l S \bar x \in clS xˉ∈clS,
D = { d ≠ 0 : ∃ δ > 0 , ∀ λ ∈ ( 0 , δ ) , x ˉ + λ d ∈ S } D=\{d \ne 0:\exists \delta>0, \forall \lambda \in (0,\delta),\bar x+\lambda d \in S\} D={d​=0:∃δ>0,∀λ∈(0,δ),xˉ+λd∈S}

Each d ∈ D d \in D d∈D is called feasible direction.

Cone of improving direction at x ˉ ∈ c l S \bar x \in clS xˉ∈clS,
F = { d : ∃ δ > 0 , ∀ λ ∈ ( 0 , δ ) , f ( x ˉ + λ d ) < f ( x ˉ ) } F=\{d:\exists \delta>0,\forall \lambda \in (0,\delta),f(\bar x+\lambda d)<f(\bar x)\} F={d:∃δ>0,∀λ∈(0,δ),f(xˉ+λd)<f(xˉ)}Each d ∈ F d \in F d∈F is called improving direction.


无约束极值问题
min ⁡ x ∈ X f ( x ) \min_{x \in X}f(x) x∈Xmin​f(x)

Let x ˉ ∈ X \bar x \in X xˉ∈X be a candidate point. PSD = positive semi-definite, PD = positive definite.

First order necessary condition:
∇ f ( x ˉ ) = 0 \nabla f(\bar x) = 0 ∇f(xˉ)=0

Second order necessary condition:
H ( x ˉ )   P S D H(\bar x)\ PSD H(xˉ) PSD

Second order sufficient condition:
H ( x ˉ )   P D H(\bar x)\ PD H(xˉ) PD

For convex optimization, sufficient and necessary condition:
∇ f ( x ˉ ) = 0 \nabla f(\bar x) = 0 ∇f(xˉ)=0


含等式约束与不等式约束的极值问题
min ⁡ x ∈ X f ( x ) s . t . g ( x ) ≤ 0 h ( x ) = 0 \min_{x \in X} f(x) \\ s.t. g(x) \le 0 \\ h(x)=0 x∈Xmin​f(x)s.t.g(x)≤0h(x)=0

Let L L L be Lagrange function:
L ( x ) = f ( x ) + u T g ( x ) + v T h ( x ) L(x) = f(x)+u^Tg(x)+v^Th(x) L(x)=f(x)+uTg(x)+vTh(x)

Fritz-John (FJ First order necessary condition):
u 0 T ∇ f ( x ˉ ) + u T ∇ g ( x ˉ ) + v T ∇ h ( x ˉ ) = 0 u T g ( x ˉ ) = 0 ( u 0 , u ) ≥ 0 , ( u 0 , u , v ) ≠ 0 u_0^T\nabla f(\bar x)+u^T\nabla g(\bar x)+v^T\nabla h(\bar x)=0 \\ u^T g(\bar x) = 0 \\ (u_0,u) \ge 0, (u_0,u,v) \ne 0 u0T​∇f(xˉ)+uT∇g(xˉ)+vT∇h(xˉ)=0uTg(xˉ)=0(u0​,u)≥0,(u0​,u,v)​=0

Karush-Kuhn-Tucker (KKT first order necessary condition):
g ( x ˉ ) ≤ 0 ∇ f ( x ˉ ) + u T ∇ g ( x ˉ ) + v T ∇ h ( x ˉ ) = 0 , u ≥ 0 u T g ( x ˉ ) = 0 g(\bar x) \le 0\\ \nabla f(\bar x)+u^T\nabla g(\bar x)+v^T\nabla h(\bar x)=0,u \ge 0 \\ u^T g(\bar x) = 0 g(xˉ)≤0∇f(xˉ)+uT∇g(xˉ)+vT∇h(xˉ)=0,u≥0uTg(xˉ)=0

and ∇ g ( x ˉ ) \nabla g(\bar x) ∇g(xˉ) and ∇ h ( x ˉ ) \nabla h(\bar x) ∇h(xˉ) are linearly independent. 第一个条件叫primal feasibility(PF)、第二个条件dual feasibility(DF),第三个条件约束的是complementary slacks (CS),最后一个条件叫constraint qualification,保证KKT与FJ等价。

KKT second-order necessary condition:
H L ( x ˉ )   P S D HL(\bar x)\ PSD HL(xˉ) PSD

KKT second-order sufficient condition:
H L ( x ˉ )   P D HL(\bar x)\ PD HL(xˉ) PD


考虑下面的优化问题
min ⁡ f ( x 1 , x 2 ) = ( x 1 − 1 ) 2 + x 2 2 s . t . g ( x 1 , x 2 ) = 2 k x 1 − x 2 2 ≤ 0 , k > 0 \min f(x_1,x_2)=(x_1-1)^2+x_2^2 \\ s.t.g(x_1,x_2)=2kx_1-x_2^2 \le 0, k > 0 minf(x1​,x2​)=(x1​−1)2+x22​s.t.g(x1​,x2​)=2kx1​−x22​≤0,k>0

因为
H f ( x 1 , x 2 ) = [ 2 0 0 2 ] ,   H g ( x 1 , x 2 ) = [ 0 0 0 − 2 ] Hf(x_1,x_2) = \left[ \begin{matrix} 2 & 0 \\ 0 & 2 \end{matrix}\right], \ Hg(x_1,x_2) = \left[ \begin{matrix} 0 & 0 \\ 0 & -2 \end{matrix}\right] Hf(x1​,x2​)=[20​02​], Hg(x1​,x2​)=[00​0−2​]

显然 f f f是凸函数但 g g g不是,因此这个优化不是凸优化,那么如何计算这个优化的最优解呢?


注意到目标函数是圆,约束的边界是抛物线,所以可以用作图法求解,非常简单读者可以自行尝试。我们来讨论一下代数解法,考虑KKT条件,但需要注意这个问题非凸,所以KKT条件并不是充分条件。因为
∇ g ( x 1 , x 2 ) = 2 [ k − x 2 ] , k > 0 \nabla g(x_1,x_2) = 2 \left[ \begin{matrix} k \\ -x_2 \end{matrix} \right],k>0 ∇g(x1​,x2​)=2[k−x2​​],k>0

也就是说 ∇ g ( x 1 , x 2 ) \nabla g(x_1,x_2) ∇g(x1​,x2​)非零,所以 ∇ g ( x ˉ ) \nabla g(\bar x) ∇g(xˉ) and ∇ h ( x ˉ ) \nabla h(\bar x) ∇h(xˉ) are linearly independent这个条件成立,因此KKT是必要条件。

定义Lagrange函数:
L ( x 1 , x 2 , u ) = f ( x 1 , x 2 ) + u g ( x 1 , x 2 ) = ( x 1 − 1 ) 2 + x 2 2 + u ( 2 k x 1 − x 2 2 ) L(x_1,x_2,u)=f(x_1,x_2)+ug(x_1,x_2) \\ =(x_1-1)^2+x_2^2 +u(2kx_1-x_2^2 ) L(x1​,x2​,u)=f(x1​,x2​)+ug(x1​,x2​)=(x1​−1)2+x22​+u(2kx1​−x22​)

写出KKT条件:
{ 2 ( x 1 − 1 ) + 2 k u = 0 2 x 2 − 2 u x 2 = 0 u g ( x 1 , x 2 ) = u ( 2 k x 1 − x 2 2 ) = 0 g ( x 1 , x 2 ) = 2 k x 1 − x 2 2 ≤ 0 u ≥ 0 \begin{cases} 2(x_1-1)+2ku = 0 \\ 2x_2 -2ux_2 = 0 \\ug(x_1,x_2) = u(2kx_1-x_2^2) = 0 \\ g(x_1,x_2)=2kx_1-x_2^2 \le 0 \\ u \ge 0 \end{cases} ⎩⎪⎪⎪⎪⎪⎪⎨⎪⎪⎪⎪⎪⎪⎧​2(x1​−1)+2ku=02x2​−2ux2​=0ug(x1​,x2​)=u(2kx1​−x22​)=0g(x1​,x2​)=2kx1​−x22​≤0u≥0​

接下来做分类讨论:

  1. u = 0 u=0 u=0, 则 x 2 = 0 x_2=0 x2​=0, x 1 = 1 x_1=1 x1​=1, not feasible ( x 1 = 1 , x 2 = 0 ⇒ g ( x 1 , x 2 ) > 0 x_1=1,x_2=0 \Rightarrow g(x_1,x_2) > 0 x1​=1,x2​=0⇒g(x1​,x2​)>0)
  2. u > 0 u>0 u>0, 第二个方程说明 2 x 2 ( u − 1 ) = 0 2x_2(u-1)=0 2x2​(u−1)=0,如果 x 2 = 0 x_2 =0 x2​=0,则 x 1 = 0 , u = 1 / k x_1=0,u=1/k x1​=0,u=1/k;如果 x 2 ≠ 0 x_2 \ne 0 x2​​=0, 则 u = 1 u=1 u=1,那么 x 1 = 1 − k x_1=1-k x1​=1−k, x 2 = ± 2 k ( 1 − k ) x_2=\pm \sqrt{2k(1-k)} x2​=±2k(1−k)

现在我们有了三组可能的解,但还需要根据 k k k的值做分类讨论:

  1. k > 1 k>1 k>1,只有 ( 0 , 0 ) (0,0) (0,0)是KKT point,因此 ( 0 , 0 ) (0,0) (0,0)是最优解;
  2. k ≤ 1 k \le 1 k≤1, ( 0 , 0 ) , ( 1 − k , 2 k ( 1 − k ) ) , ( 1 − k , − 2 k ( 1 − k ) ) (0,0),(1-k,\sqrt{2k(1-k)}),(1-k,-\sqrt{2k(1-k)}) (0,0),(1−k,2k(1−k) ​),(1−k,−2k(1−k) ​)都是KKT point,我们需要通过比较它们对应的目标函数的取值来确定谁是最优解

{ f ( 0 , 0 ) = 1 f ( 1 − k , ± 2 k ( 1 − k ) ) = 2 k − k 2 \begin{cases} f(0,0) = 1 \\ f(1-k,\pm \sqrt{2k(1-k)})=2k-k^2 \end{cases} {f(0,0)=1f(1−k,±2k(1−k) ​)=2k−k2​

因为
2 k − k 2 − 1 = − ( k − 1 ) 2 ≤ 0 2k-k^2-1 = -(k-1)^2 \le 0 2k−k2−1=−(k−1)2≤0

所以 ( 1 − k , 2 k ( 1 − k ) ) , ( 1 − k , − 2 k ( 1 − k ) ) (1-k,\sqrt{2k(1-k)}),(1-k,-\sqrt{2k(1-k)}) (1−k,2k(1−k) ​),(1−k,−2k(1−k) ​)是最优点。

上一篇:关于近端梯度下降的理解


下一篇:05