Paper Report for SDM course in 2016
Ad Click Prediction: a View from the Trenches
(Online Machine Learning)
Presenters: 蔡宗倫, 洪紹嚴, 蔡佳盈
Date: 2016/12/22
https://aci.info/2014/07/12/the-data-explosion-in-2014-minute-by-minute-infographic/
Reading a 2 GB file (4 million records, 200 variables):

READ DATA        | Time          | Memory
read.csv         | 264.5 (secs)  | 8.73 (GB)
fread            | 33.18 (secs)  | 2.98 (GB)
read.big.matrix  | 205.03 (secs) | 0.2 (MB)

Fitting a linear model (lm) after each read method (X: not completed):

lm               | Time          | Memory
read.csv         | X             | X
fread            | X             | X
read.big.matrix  | 2.72 (mins)   | 83.6 (MB)
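As a rough illustration (not the presenters' script), the three readers compared above can be invoked as below in R; the file name train.csv and the backing-file options are placeholders.

```r
# Compare three ways to read a large CSV in R; timings depend on the machine.
library(data.table)   # provides fread()
library(bigmemory)    # provides read.big.matrix()

t_csv <- system.time(d_csv <- read.csv("train.csv"))      # base R, everything held in RAM
t_dt  <- system.time(d_dt  <- fread("train.csv"))         # data.table, multi-threaded parser
t_bm  <- system.time(d_bm  <- read.big.matrix(
           "train.csv", header = TRUE, type = "double",
           backingfile = "train.bin",                      # file-backed matrix,
           descriptorfile = "train.desc"))                 # so almost nothing stays in RAM

# A linear model can then be fit on the big.matrix out of core,
# e.g. with biganalytics::biglm.big.matrix().
```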
Problem: training a model on Big Data (TB, PB, ZB) runs into memory and time/accuracy limits.
Solutions:
• Parallel computation: Hadoop, MapReduce, Spark (TB, PB, ZB)
• R packages: read.table, bigmemory, ff (GB)
• Online learning algorithms
Online learning algorithms (applied to logistic regression):
• AOGD: Adaptive Online Gradient Descent (2007, IBM)
• TG: Truncated Gradient (2009, Microsoft)
• FOBOS: Forward-Backward Splitting (2009, Google)
• RDA: Regularized Dual Averaging (2010, Microsoft)
• FTRL-Proximal: Follow-the-Regularized-Leader Proximal (2011, Google)
The online setting: train the model on Big Data (TB, PB, ZB), then renew its weights as new data arrives.
Problem: memory and time/accuracy.
Two ideas from the diagram: sparsity (LASSO) and SGD/OGD (as used for NN/GBM).
The same family of online learning algorithms (AOGD, TG, FOBOS, RDA, FTRL-Proximal) is then combined with logistic regression; the "+ =" in the original diagram marks this combination.
Online Gradient Descent (OGD)
A class of algorithms used in online convex optimization.
• The setting can be formulated as a repeated game between a player and an adversary.
• At round $t$, the player chooses an action $x_t$ from some convex subset $K$ of $\mathbb{R}^n$, and then the adversary chooses a convex loss function $f_t$.
• The regret after $T$ rounds is
$$\mathcal{R}_T = \sum_{t=1}^{T} f_t(x_t) - \min_{x \in K} \sum_{t=1}^{T} f_t(x),$$
where $x$ is any fixed action.
A central question is how the regret grows with the number of rounds of the game.
Zinkevich considered the following gradient descent algorithm, with step size $\eta_t = \Theta(1/\sqrt{t})$:
1: Initialize $x_1$ arbitrarily.
2: for $t = 1$ to $T$ do
3:   Predict $x_t$, observe $f_t$.
4:   Update $x_{t+1} = \Pi_K\big(x_t - \eta_{t+1} \nabla f_t(x_t)\big)$.
5: end for
Here, $\Pi_K(v)$ denotes the Euclidean projection of $v$ onto the convex set $K$. A small sketch of this loop follows.
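A minimal R sketch of projected OGD (mine, not from the slides), assuming K is the unit L2 ball and the simple quadratic loss f_t(x) = ½‖x − z_t‖², so the gradient at x_t is x_t − z_t:

```r
# Projected online gradient descent with step size eta_t = eta0 / sqrt(t).
project_K <- function(v) {              # Euclidean projection onto the unit L2 ball
  nv <- sqrt(sum(v^2))
  if (nv > 1) v / nv else v
}

ogd <- function(zs, eta0 = 1) {         # zs: one row per round (the z_t's)
  x <- rep(0, ncol(zs))                 # step 1: initialize x_1 (here: the origin)
  for (t in seq_len(nrow(zs))) {        # step 2: for t = 1 to T
    grad <- x - zs[t, ]                 # step 3: observe f_t via its gradient at x_t
    x <- project_K(x - (eta0 / sqrt(t)) * grad)   # step 4: projected gradient step
  }
  x
}

# Example: ogd(matrix(rnorm(200), ncol = 2))
```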
Forward-Backward Splitting (FOBOS)
(1) Loss function of logistic regression:
$$l(W, X) = \sum_{t=1}^{n} \log\big(1 + e^{-y_t (W^{T} x_t)}\big)$$
Batch gradient descent update:
$$W_{t+1} = W_t - \eta\, \frac{\partial\, l(W_t, X)}{\partial W_t}$$
Online gradient descent update (one example at a time):
$$W_{t+1} = W_t - \eta\, g_t, \qquad g_t = \nabla l(W_t, x_t)$$
(2) The FOBOS update can be split into two parts (reconstructed below):
• First part: the fine-tuning happens around the result of the gradient step, $W_{t+\frac{1}{2}}$.
• Second part: handles the regularization $r(w) = \lambda\lVert w\rVert_1$ and produces sparsity.
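The update itself appears only as an image in the slides; reconstructed from Duchi & Singer [2], the two parts are

$$W_{t+\frac{1}{2}} = W_t - \eta_t\, g_t$$
$$W_{t+1} = \arg\min_{w}\Big(\tfrac{1}{2}\lVert w - W_{t+\frac{1}{2}}\rVert_2^2 + \eta_{t+\frac{1}{2}}\, r(w)\Big)$$

With $r(w) = \lambda\lVert w\rVert_1$, the second step has a closed form (coordinate-wise soft-thresholding), which is what produces the sparsity.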
(3) A sufficient condition for the minimizer of (2): 0 belongs to its subgradient set.
(4) Using this (the intermediate equation is shown only as an image in the slides), (3) can be rewritten in terms of $W_t$, the gradient $g_t$, and $\partial r(W_{t+1})$.
(5) In other words, rearranging (4), the new iterate is built from:
• the pre-iteration state $W_t$ and the gradient (the forward, explicit part), and
• the regularization information of the current iteration, $\partial r(W_{t+1})$ (the backward, implicit part).
A code sketch of one such step follows.
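A hedged R sketch of a single FOBOS step for the logistic loss with r(w) = λ‖w‖₁; the soft-thresholding below is the closed form of the backward step, and the variable names are mine, not the presenters':

```r
# One FOBOS step: forward gradient step, then backward (proximal) soft-thresholding.
soft_threshold <- function(v, tau) sign(v) * pmax(abs(v) - tau, 0)

fobos_step <- function(w, x, y, eta, lambda) {   # y is coded as -1 / +1
  g      <- -y * x / (1 + exp(y * sum(w * x)))   # gradient of log(1 + exp(-y * w'x))
  w_half <- w - eta * g                          # forward: plain gradient step -> W_{t+1/2}
  soft_threshold(w_half, eta * lambda)           # backward: L1 proximal step -> W_{t+1}
}

# Example: fobos_step(w = rep(0, 3), x = c(1, 0, 2), y = 1, eta = 0.1, lambda = 0.05)
```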
FOBOS, RDA, FTRL-Proximal
(A): the accumulated past gradients
(B): the regularization functions
(C): the proximal term, where $Q_s$ is tied to the learning rate (it guarantees the fine-tuning does not move too far from 0 or from the previously iterated solutions)
$\Psi(x) = \lambda\lVert x\rVert_1$ (a non-smooth convex function)
$\Phi_t$: a certain subgradient of $\Psi(x)$
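The labeled equation is only shown as an image in the slides; a reconstruction of the FTRL-Proximal update in the notation of [4] and [5], with the three labeled pieces (the slides write $Q_s$ for what is $\sigma_s$ here):

$$w_{t+1} = \arg\min_{w}\Big(\underbrace{g_{1:t}\cdot w}_{(A)} + \underbrace{\lambda_1\lVert w\rVert_1}_{(B)} + \underbrace{\tfrac{1}{2}\sum_{s=1}^{t}\sigma_s\lVert w - w_s\rVert_2^2}_{(C)}\Big), \qquad g_{1:t}=\sum_{s=1}^{t} g_s,\quad \sigma_{1:t}=\tfrac{1}{\eta_t}.$$

The three methods instantiate this template differently; in particular, RDA centers the proximal term at the origin, while FTRL-Proximal centers it at the past iterates $w_s$.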
• OGD is not sparse enough; FOBOS produces better (sparser) features.
• Gradient-descent-style methods have better accuracy; RDA strikes a better balance between accuracy and sparsity, and its sparsity is even better.
• The key difference is how the accumulated L1 penalty is handled.
• FTRL-Proximal combines the accuracy of FOBOS with the sparsity of RDA.
Per-Coordinate
The same model is retrained as data arrives; the row of numbers under each equation is the slide's annotation for the individual coordinates.
• f(x) = 0.5A + 1.1B + 3.8C + 0.1D + 11E + 41F   (1 2 3 4)
• f(x) = 0.4A + 0.8B + 3.8C + 0.8D + 0E + 41F   (1 2 3 4 / 8 5 7 3)
• f(x) = 0.4A + 1.2B + 3.5C + 0.9D + 0.3E + 41F   (1 2 3 4 / 8 5 7 3)
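The point of the example is that different coordinates (features) see very different amounts of data, so a single global learning rate is wasteful. The paper [5] therefore gives each coordinate its own rate:

$$\eta_{t,i} = \frac{\alpha}{\beta + \sqrt{\sum_{s=1}^{t} g_{s,i}^{2}}}$$

where $g_{s,i}$ is coordinate $i$ of the gradient at round $s$ and $\alpha$, $\beta$ are tuning parameters; rarely updated coordinates keep a large step size, frequently updated ones get a small step size.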
Putting it together: train a logistic regression model on Big Data (TB, PB, ZB), and as new data arrives, renew the weights per-coordinate; memory and time/accuracy are handled with sparsity (LASSO) and SGD/OGD (as for NN/GBM). The update rule is FTRL-Proximal (2011, Google), built on FOBOS (2009, Google) and RDA (2010, Microsoft); a code sketch follows.
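A compact R sketch of that combination: per-coordinate FTRL-Proximal for logistic regression, following Algorithm 1 of [5]. The hyperparameter defaults and variable names are illustrative, not taken from the slides.

```r
# Per-coordinate FTRL-Proximal (lazy weights, per-coordinate learning rates).
ftrl_new <- function(d, alpha = 0.1, beta = 1, l1 = 1, l2 = 1) {
  list(z = rep(0, d), n = rep(0, d), alpha = alpha, beta = beta, l1 = l1, l2 = l2)
}

ftrl_weights <- function(m) {                      # recover w from the state (z, n)
  w <- rep(0, length(m$z))
  act <- abs(m$z) > m$l1                           # coordinates the L1 penalty did not zero out
  w[act] <- -(m$z[act] - sign(m$z[act]) * m$l1) /
             ((m$beta + sqrt(m$n[act])) / m$alpha + m$l2)
  w
}

ftrl_update <- function(m, x, y) {                 # x: feature vector, y in {0, 1}
  w <- ftrl_weights(m)
  p <- 1 / (1 + exp(-sum(w * x)))                  # predicted click probability
  g <- (p - y) * x                                 # gradient of the logistic loss
  sigma <- (sqrt(m$n + g^2) - sqrt(m$n)) / m$alpha # per-coordinate rate adjustment
  m$z <- m$z + g - sigma * w
  m$n <- m$n + g^2
  m
}

# Example: m <- ftrl_new(3); m <- ftrl_update(m, c(1, 0, 1), 1); ftrl_weights(m)
```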
R package: FTRLProximal
https://www.kaggle.com/c/avazu-ctr-prediction
Prediction result (data set size: 5.87 GB)
References
[1] John Langford, Lihong Li & Tong Zhang. Sparse Online Learning via Truncated Gradient. Journal of Machine Learning Research, 2009.
[2] John Duchi & Yoram Singer. Efficient Online and Batch Learning using Forward Backward Splitting. Journal of Machine Learning Research, 2009.
[3] Lin Xiao. Dual Averaging Methods for Regularized Stochastic Learning and Online Optimization. Journal of Machine Learning Research, 2010.
[4] H. B. McMahan. Follow-the-regularized-leader and mirror descent: Equivalence theorems and L1 regularization. In AISTATS, 2011.
[5] H. Brendan McMahan, Gary Holt, D. Sculley et al. Ad Click Prediction: a View from the Trenches. In KDD, 2013.
[6] Peter Bartlett, Elad Hazan, and Alexander Rakhlin. Adaptive online gradient descent. Technical Report UCB/EECS-2007-82, EECS Department, University of California, Berkeley, Jun 2007.
[7] Martin Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In ICML, pages 928–936, 2003.
Editor's Notes
• #13: 1. The player aims to ensure that the total loss is not much larger than the smallest total loss of any fixed action x. 2. The difference between the total loss and its optimal value for a fixed action is known as the "regret". 3. Many problems of online prediction of individual sequences can be viewed as special cases of online convex optimization, including prediction with expert advice, sequential probability assignment, and sequential investment.
• #14: 1. $\Pi_K(v)$ is the point that achieves the smallest Euclidean distance from $v$ to the set $K$.