Seminar in Advanced Machine Learning Rong Jin
Course Description
- Introduction to the state-of-the-art techniques in machine learning
- Focus of this semester: convex optimization and semi-supervised learning
Course Description
Course Organization
- Each group has 1 to 3 students
- Each group covers one or two topics; usually each topic will take two lectures
- Please send me the information about each group and the topics you are interested in covering by the end of this week
- You may take 1~2 credits by enrolling in independent study (CSE890)
Course Organization
- Course website: http://www.cse.msu.edu/~rongjin/adv_ml
- The best way to learn is discussion, discussion, and discussion
- Never hesitate to raise questions
- Never ignore any details
- Let's have fun!
Convex Programming and Classification Problems Rong Jin
Outline
- The connection between classification and linear programming (LP) / convex quadratic programming (QP) has a long history
- Recent progress in convex optimization: conic and semidefinite programming; robust optimization
- The purpose of this lecture is to outline some connections between convex optimization and classification problems
Support Vector Machine (SVM)
Training examples: $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, where $x_i \in \mathbb{R}^d$ and $y_i \in \{-1, +1\}$.
$$\min_{w, b} \|w\|_2^2 \quad \text{s.t.} \quad y_i (w^\top x_i - b) \ge 1, \; i = 1, 2, \ldots, n$$
Can be solved efficiently by quadratic programming.
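A minimal sketch of this QP in cvxpy, assuming a small synthetic dataset (both the data and the choice of cvxpy are illustrative, not part of the slides):

```python
import numpy as np
import cvxpy as cp

# Hypothetical linearly separable toy data.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(+2.0, 1.0, size=(20, 2)),
               rng.normal(-2.0, 1.0, size=(20, 2))])
y = np.concatenate([np.ones(20), -np.ones(20)])

w, b = cp.Variable(2), cp.Variable()
# min ||w||_2^2  s.t.  y_i (w^T x_i - b) >= 1 for all i
margins = cp.multiply(y, X @ w - b)
prob = cp.Problem(cp.Minimize(cp.sum_squares(w)), [margins >= 1])
prob.solve()
print("w* =", w.value, " b* =", b.value)
```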
SVM: Robust Optimization
SVMs are a way to handle noise in data points:
- assume each data point is unknown-but-bounded in a sphere of radius $\rho$ and center $x_i$
- find the largest $\rho$ such that separation is still possible between the two classes of perturbed points
SVM: Robust Optimization
$$\max \rho \quad \text{s.t.} \quad \forall i = 1, 2, \ldots, n: \; \|x - x_i\|_2 \le \rho \;\Rightarrow\; y_i (w^\top x - b) \ge 1$$
How to solve it?
SVM: Robust Optimization
The worst case over the sphere is attained by moving $x_i$ a distance $\rho$ against $w$, so the implication constraint
$$\|x - x_i\|_2 \le \rho \;\Rightarrow\; y_i (w^\top x - b) \ge 1$$
is equivalent to $y_i (w^\top x_i - b) - \rho \|w\|_2 \ge 1$. The problem becomes
$$\max \rho \quad \text{s.t.} \quad \forall i = 1, 2, \ldots, n: \; y_i (w^\top x_i - b) - \rho \|w\|_2 \ge 1$$
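It is a known equivalence that the supremum of feasible $\rho$ equals the geometric margin $1/\|w^*\|_2$ of the standard hard-margin SVM, so the robust radius can be read off the QP solution. Continuing the hypothetical sketch above:

```python
import numpy as np

# With the hard-margin solution w* from the previous sketch, the largest
# perturbation radius that keeps the perturbed classes separable is the
# geometric margin rho* = 1 / ||w*||_2.
rho_star = 1.0 / np.linalg.norm(w.value)
print("largest safe perturbation radius rho* =", rho_star)
```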
Robust Optimization
Linear programming (LP):
$$\min c^\top x \quad \text{s.t.} \quad a_i^\top x \le b_i, \; i = 1, 2, \ldots, n$$
Assume the $a_i$'s are unknown-but-bounded in ellipsoids $E_i = \{a_i \mid (a_i - \hat{a}_i)^\top \Sigma_i^{-1} (a_i - \hat{a}_i) \le 1\}$.
Robust LP:
$$\min c^\top x \quad \text{s.t.} \quad \forall a_i \in E_i: \; a_i^\top x \le b_i, \; i = 1, 2, \ldots, n$$
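The robust constraint has a closed-form reformulation: $\sup_{a_i \in E_i} a_i^\top x = \hat{a}_i^\top x + \|\Sigma_i^{1/2} x\|_2$, so each robust linear constraint becomes a second-order cone constraint. A sketch on a made-up instance (the problem data and the $x \ge 0$ bound are assumptions to keep the toy problem small and bounded):

```python
import numpy as np
import cvxpy as cp

# Hypothetical robust LP: minimize c^T x subject to a^T x <= b
# for every a in an ellipsoid E_i around a_hat_i.
c = np.array([-1.0, -1.0])                      # i.e., maximize x1 + x2
a_hat = [np.array([1.0, 0.5]), np.array([0.2, 1.0])]
Sigma = [0.10 * np.eye(2), 0.05 * np.eye(2)]
b = [1.0, 1.0]

x = cp.Variable(2)
cons = [x >= 0]                                 # keeps this toy instance bounded
for ah, S, bi in zip(a_hat, Sigma, b):
    L = np.linalg.cholesky(S)                   # a square root of Sigma_i
    # sup over the ellipsoid: a_hat^T x + ||Sigma_i^{1/2} x||_2 <= b_i
    cons.append(ah @ x + cp.norm(L.T @ x, 2) <= bi)
prob = cp.Problem(cp.Minimize(c @ x), cons)
prob.solve()
print("robust x* =", x.value)
```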
Minimax Probability Machine (MPM)
How to decide the decision boundary $x^\top a = b$?
Positive class: $x_+ \sim \mathcal{N}(\bar{x}_+, \Sigma_+)$, classified by $x^\top a \le b$.
Negative class: $x_- \sim \mathcal{N}(\bar{x}_-, \Sigma_-)$, classified by $x^\top a \ge b$.
Minimax Probability Machine (MPM)
$$\min \max(\epsilon_+, \epsilon_-) \quad \text{s.t.} \quad \Pr(x_+^\top a \le b) = 1 - \epsilon_+, \;\; \Pr(x_-^\top a \ge b) = 1 - \epsilon_-$$
where $x_- \sim \mathcal{N}(\bar{x}_-, \Sigma_-)$ and $x_+ \sim \mathcal{N}(\bar{x}_+, \Sigma_+)$.
Minimax Probability Machine (MPM)
Equivalently:
$$\min \epsilon \quad \text{s.t.} \quad \Pr(x_+^\top a \le b) \ge 1 - \epsilon, \;\; \Pr(x_-^\top a \ge b) \ge 1 - \epsilon$$
where $x_- \sim \mathcal{N}(\bar{x}_-, \Sigma_-)$ and $x_+ \sim \mathcal{N}(\bar{x}_+, \Sigma_+)$.
Minimax Probability Machine (MPM)
Assume $x$ follows the Gaussian distribution $\mathcal{N}(\bar{x}, \Sigma)$. Then
$$\Pr(x^\top a \le b) \ge 1 - \epsilon \;\iff\; \bar{x}^\top a + \kappa \|\Sigma^{1/2} a\|_2 \le b, \quad \text{where } \kappa = \Phi^{-1}(1 - \epsilon).$$
Minimax Probability Machine (MPM)
$$\max \kappa \quad \text{s.t.} \quad \bar{x}_+^\top a + \kappa \|\Sigma_+^{1/2} a\|_2 \le b, \;\; \bar{x}_-^\top a - \kappa \|\Sigma_-^{1/2} a\|_2 \ge b$$
These are second-order cone constraints. An equivalent formulation:
$$\min \alpha + \beta \quad \text{s.t.} \quad a^\top (\bar{x}_- - \bar{x}_+) = 1, \;\; \alpha \ge \|\Sigma_+^{1/2} a\|_2, \;\; \beta \ge \|\Sigma_-^{1/2} a\|_2$$
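A sketch of this SOCP in cvxpy, with made-up class means and covariances; at the optimum, the worst-case $\kappa$ equals $1/(\alpha^* + \beta^*)$, a standard fact about this formulation that the slide does not state explicitly:

```python
import numpy as np
import cvxpy as cp

# Hypothetical class statistics.
x_pos, S_pos = np.array([2.0, 2.0]), np.array([[1.0, 0.2], [0.2, 1.0]])
x_neg, S_neg = np.array([-2.0, -2.0]), np.array([[1.5, -0.3], [-0.3, 1.0]])
L_pos, L_neg = np.linalg.cholesky(S_pos), np.linalg.cholesky(S_neg)

a = cp.Variable(2)
alpha = cp.norm(L_pos.T @ a, 2)                 # ||Sigma_+^{1/2} a||_2
beta = cp.norm(L_neg.T @ a, 2)                  # ||Sigma_-^{1/2} a||_2
prob = cp.Problem(cp.Minimize(alpha + beta),
                  [a @ (x_neg - x_pos) == 1])
prob.solve()

kappa = 1.0 / prob.value                        # worst-case kappa at the optimum
b = x_pos @ a.value + kappa * np.linalg.norm(L_pos.T @ a.value)
print("a* =", a.value, " b =", b, " kappa =", kappa)
```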
Second Order Cone Programming (SOCP)
A second-order cone constraint: $x_0 \ge \sqrt{x_1^2 + x_2^2}$.
The constraint $\alpha \ge \|\Sigma_-^{1/2} a\|_2$ can be written as $y = \Sigma_-^{1/2} a$, $\alpha \ge \|y\|_2$.
$$z \in Q \;\iff\; z \succeq_Q 0, \quad \text{where the cone } Q = \{z \mid z_0 \ge \|\bar{z}\|_2\}.$$
Second Order Cone Programming (SOCP)
SOCP generalizes LP by generalizing the inequality: the componentwise inequality of LP is replaced by the cone inequality $\succeq_Q$.
Minimax Probability Machine (MPM)
Under the Gaussian assumption:
$$\min \epsilon \quad \text{s.t.} \quad \Pr(x_+^\top a \le b) \ge 1 - \epsilon, \;\; \Pr(x_-^\top a \ge b) \ge 1 - \epsilon$$
where $x_- \sim \mathcal{N}(\bar{x}_-, \Sigma_-)$ and $x_+ \sim \mathcal{N}(\bar{x}_+, \Sigma_+)$. Dropping the Gaussian assumption and keeping only the means and covariances gives the distribution-free version:
$$\min \epsilon \quad \text{s.t.} \quad \inf_{x_+ \sim (\bar{x}_+, \Sigma_+)} \Pr(x_+^\top a \le b) \ge 1 - \epsilon, \;\; \inf_{x_- \sim (\bar{x}_-, \Sigma_-)} \Pr(x_-^\top a \ge b) \ge 1 - \epsilon$$
MPM
Chebyshev inequality:
$$\inf_{x \sim (\bar{x}, \Sigma)} \Pr(x^\top a \le b) = \frac{[b - \bar{x}^\top a]_+^2}{[b - \bar{x}^\top a]_+^2 + a^\top \Sigma a}$$
where $[x]_+$ outputs $0$ when $x < 0$ and $x$ when $x \ge 0$. This leads to the same formulation:
$$\min \alpha + \beta \quad \text{s.t.} \quad a^\top (\bar{x}_- - \bar{x}_+) = 1, \;\; \alpha \ge \|\Sigma_+^{1/2} a\|_2, \;\; \beta \ge \|\Sigma_-^{1/2} a\|_2$$
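The worst-case bound is easy to evaluate directly. A small helper with hypothetical numbers (one "standard deviation" of slack gives a guarantee of at least 1/2):

```python
import numpy as np

def worst_case_prob(a, b, x_bar, Sigma):
    # Chebyshev bound: the infimum over all distributions with mean x_bar and
    # covariance Sigma of Pr(x^T a <= b), using [.]_+ = max(., 0).
    d = max(b - x_bar @ a, 0.0)
    return d * d / (d * d + a @ Sigma @ a)

print(worst_case_prob(np.array([1.0, 0.0]), 1.0, np.zeros(2), np.eye(2)))  # 0.5
```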
Pattern Invariance in Images: translation, rotation, shear.
Learning from Invariance Transformations (figure: transformed patterns $\phi_1$ and $\phi_2$).
Incorporating Invariance Transformations
Invariance transformation: $x(\theta) = T(x, \theta): \mathbb{R}^d \times \mathbb{R} \to \mathbb{R}^d$, with $T(x, \theta = 0) = x$.
SVM incorporating the invariance transformation:
$$\min \|w\|_2^2 \quad \text{s.t.} \quad \forall \theta \in \mathbb{R}, \; i = 1, 2, \ldots, n: \; y_i (w^\top x_i(\theta) - b) \ge 1$$
This is an infinite number of examples: one constraint per value of $\theta$.
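One crude way to make the problem finite, before the polynomial machinery developed on the next slides, is to enforce the constraint only on a sampled grid of $\theta$ values. A sketch, assuming planar rotation as the invariance transformation and synthetic data (both are assumptions of this illustration):

```python
import numpy as np
import cvxpy as cp

def rotate(x, theta):
    # Planar rotation as a concrete invariance transformation T(x, theta).
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]]) @ x

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(+2.0, 1.0, (10, 2)), rng.normal(-2.0, 1.0, (10, 2))])
y = np.concatenate([np.ones(10), -np.ones(10)])
thetas = np.linspace(-0.3, 0.3, 7)              # sampled invariance range

w, b = cp.Variable(2), cp.Variable()
cons = [y[i] * (w @ rotate(X[i], t) - b) >= 1
        for i in range(len(y)) for t in thetas]
prob = cp.Problem(cp.Minimize(cp.sum_squares(w)), cons)
prob.solve()
print("w* =", w.value, " b* =", b.value)
```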
Taylor Approximation of Invariance
A Taylor expansion about $\theta_0 = 0$ gives the polynomial approximation
$$x(\theta) \approx \sum_{k=0}^{r} \frac{\theta^k}{k!} \left. \frac{\partial^k T(x, \theta)}{\partial \theta^k} \right|_{\theta = 0},$$
so each margin constraint becomes a polynomial in $\theta$.
Polynomial Approximation
$$\forall \theta \in \mathbb{R}: \; y\,(w^\top x(\theta)) - 1 \ge 0$$
What is the necessary and sufficient condition for a polynomial to be non-negative everywhere?
Non-Negative Polynomials (I)
Theorem (Nesterov, 2000): if $r = 2l$, the necessary and sufficient condition for a polynomial $p(\theta)$ of degree $r$ to be non-negative everywhere is
$$\exists P \succeq 0 \; \text{s.t.} \; p(\theta) = v(\theta)^\top P\, v(\theta), \quad v(\theta) = (1, \theta, \ldots, \theta^l)^\top.$$
For instance, $p(\theta) = \theta^2 + 2\theta + 1 = (\theta + 1)^2$ corresponds to $P = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} \succeq 0$.
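The theorem turns the non-negativity check into an SDP feasibility problem: match the coefficients of $p$ against $v(\theta)^\top P v(\theta)$ and search for a PSD $P$. A sketch for a hypothetical degree-4 polynomial:

```python
import cvxpy as cp

# Hypothetical example: p(theta) = theta^4 + 2 theta^2 + 1 = (theta^2 + 1)^2.
coeffs = {0: 1.0, 1: 0.0, 2: 2.0, 3: 0.0, 4: 1.0}   # c_k for theta^k

# Monomial vector v(theta) = (1, theta, theta^2): p >= 0 everywhere
# iff some PSD matrix P satisfies p(theta) = v(theta)^T P v(theta).
P = cp.Variable((3, 3), symmetric=True)
cons = [P >> 0]
for k, c in coeffs.items():
    # The coefficient of theta^k in v^T P v is the sum of P[i, j] over i + j = k.
    terms = [P[i, k - i] for i in range(3) if 0 <= k - i <= 2]
    cons.append(sum(terms) == c)
prob = cp.Problem(cp.Minimize(0), cons)
prob.solve()
print("p is non-negative everywhere:", prob.status == cp.OPTIMAL)
```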
Semidefinite Programming Machines
(Slide figure: each example's polynomial margin constraint is encoded as a linear matrix inequality, with block matrices $A_j$ built from blocks $G_{1,j}, \ldots, G_{m,j}$ of the Taylor coefficients $g_{i,j}$, and a block-diagonal matrix $B$ of identity blocks.) The resulting problem is a semidefinite program.
Semidefinite Programming (SDP)
SDP generalizes LP by generalizing the inequality: the componentwise inequality of LP is replaced by the semidefinite cone inequality $\succeq$.
Beyond Convex Programming
In most cases, real problems are non-convex optimization problems. Approximation strategies:
- Linear programming approximation
- LMI relaxation (drop rank constraints)
- Submodular function approximation
- Difference of two convex functions (DC)
Example: MAXCUT Problem
$$\min x^\top Q x \;\; \text{s.t.} \;\; x_i \in \{-1, +1\} \quad\Longleftrightarrow\quad \min x^\top Q x \;\; \text{s.t.} \;\; x_i^2 = 1$$
An exponential number of feasible points: an NP-hard problem!
LMI Relaxation
Substituting $X = x x^\top$ turns $x_i^2 = 1$ into $X_{i,i} = 1$:
$$\min x^\top Q x \;\; \text{s.t.} \;\; x_i \in \{-1, +1\} \quad\Longleftrightarrow\quad \min \sum_{i,j} Q_{i,j} X_{i,j} \;\; \text{s.t.} \;\; X_{i,i} = 1, \; X \succeq 0, \; \operatorname{rank}(X) = 1$$
Dropping the rank constraint gives the convex relaxation:
$$\min \sum_{i,j} Q_{i,j} X_{i,j} \quad \text{s.t.} \quad X_{i,i} = 1, \; X \succeq 0$$
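A sketch of the relaxation in cvxpy on a hypothetical 4-variable instance with $Q \succeq 0$ (the setting of the guarantee on the next slide), small enough to compare against brute force:

```python
import numpy as np
import cvxpy as cp
from itertools import product

# Hypothetical PSD Q.
rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))
Q = A.T @ A

X = cp.Variable((4, 4), symmetric=True)
cons = [X >> 0, cp.diag(X) == 1]                # rank-1 constraint dropped
prob = cp.Problem(cp.Minimize(cp.trace(Q @ X)), cons)
prob.solve()
g_star = prob.value

# Exact discrete optimum d* by brute force over all 2^4 sign vectors.
d_star = min(np.array(s) @ Q @ np.array(s) for s in product([-1, 1], repeat=4))
print("g* =", g_star, " d* =", d_star, " g*/d* =", g_star / d_star)
```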
How Good is the Approximation?
Nesterov proved that, with
$$d^* = \min x^\top Q x \;\; \text{s.t.} \;\; x_i \in \{-1, +1\}, \qquad g^* = \min \sum_{i,j} Q_{i,j} X_{i,j} \;\; \text{s.t.} \;\; X_{i,i} = 1, \; X \succeq 0,$$
$$1 \ge \frac{g^*}{d^*} \ge \frac{2}{\pi} \approx 0.6366.$$
What Should You Learn?
- Basic concepts of convex sets and functions
- Basic theory of convex optimization
- How to formulate a problem as a standard convex optimization problem
- How to efficiently approximate the solution on large datasets
- (Optional) How to approximate non-convex programming problems by convex ones

Editor's Notes

  • #29: This was previously on the poster: each trajectory (data point + transformation) is represented by an SDP constraint $G_i$.