Reinforcement Learning
with Thompson Sampling
(3rd)
ujava.org workshop
2016-08-28
www.idosi.com
CEO 강신동
Shindong KANG
(주)지능도시
ujava.org
spaceapi.org
Reinforcement Learning for Brick Game
Reinforcement Learning
Forecast
Forecast with probability
Probability
Conditional Probability
Bayesian Probability
Bayes Rule Words
Bayesian Probability
P(fair|H) = ?
P(A) = P(fair) = ½
P(B) = P(H) = ¾
P(B|A) = P(H|fair) = ½
P(fair|H) = P(H|fair) · P(fair) / P(H) = (½ · ½) / ¾ = 1/3
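The numbers on this slide are consistent with picking one of two coins at random, a fair coin and a two-headed coin. That setup is an assumption here, since the slide only lists the probabilities. Under it, a minimal sketch with exact fractions reproduces P(H) = ¾ and P(fair|H) = 1/3:

```python
from fractions import Fraction

# Assumed setup (not stated on the slide): a fair coin and a two-headed coin,
# each picked with probability 1/2.
prior = {"fair": Fraction(1, 2), "two_headed": Fraction(1, 2)}
p_heads_given = {"fair": Fraction(1, 2), "two_headed": Fraction(1)}

# Law of total probability: P(H) = sum over coins of P(H|coin) * P(coin).
p_heads = sum(p_heads_given[c] * prior[c] for c in prior)

# Bayes' rule: P(fair|H) = P(H|fair) * P(fair) / P(H).
p_fair_given_heads = p_heads_given["fair"] * prior["fair"] / p_heads

print(p_heads)             # 3/4
print(p_fair_given_heads)  # 1/3
```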
Brownian motion, Gaussian distribution
Markov Process
Stochastic Matrix
0.4 0.6
0.7 0.3
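Reading the two rows above as a row-stochastic transition matrix (each row sums to 1), the following sketch pushes a distribution through repeated transitions. The starting state and the number of steps are assumptions chosen for illustration.

```python
# Row-stochastic transition matrix from the slide: P[i][j] is the
# probability of moving from state i to state j, so each row sums to 1.
P = [[0.4, 0.6],
     [0.7, 0.3]]

def step(dist, P):
    """Advance a probability distribution over states by one transition."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

dist = [1.0, 0.0]  # assumed start: state 0 with certainty
for _ in range(50):
    dist = step(dist, P)

# The distribution approaches the stationary vector [7/13, 6/13].
print([round(p, 4) for p in dist])  # [0.5385, 0.4615]
```

The stationary vector follows from solving pi = pi P by hand: 0.6 · pi_0 = 0.7 · pi_1 together with pi_0 + pi_1 = 1.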
Exploitation and Exploration
State-action exploration vs. Parameter exploration
Multi-armed bandit problem
Simulated Bandit Performance
Multi-armed bandit problem
Multi-Armed Bandit Algorithms
MAB Reward
Gaussian Distribution
GMM (Gaussian Mixture Model)
Gaussian Mixture Model
Function's Probability Distribution
Function's Probability Distribution?
Function's Probability Distribution
y = ax^2 + b
Function's Probability Distribution with Gaussian Distribution
y = ax^2 + b
Gaussian Process Regression
Gaussian Process
From “C. E. Rasmussen & C. K. I. Williams, Gaussian Processes for Machine Learning, the MIT Press, 2006”
Bayesian Optimization
Acquisition function
Why Bayesian Optimization works
Bayesian reasoners
Intelligent user interfaces regression
Slot Machine
Multi-Armed Bandit
MAB: Regret
A/B Testing
Greedy Algorithm
Greedy Algorithm (Search Maximum)
Greedy Algorithm (Search Tree)
epsilon Greedy (epsilon = exploration)
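As a rough illustration of the epsilon-greedy idea, with probability epsilon a random arm is explored, otherwise the arm with the best running estimate is exploited. The arm probabilities, epsilon value, step count, and function name below are all hypothetical.

```python
import random

def epsilon_greedy(true_probs, epsilon=0.1, steps=5000, seed=0):
    """Play a Bernoulli bandit with epsilon-greedy; return pull counts and value estimates."""
    rng = random.Random(seed)
    k = len(true_probs)
    counts = [0] * k
    values = [0.0] * k  # running mean reward per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)  # explore: uniform random arm
        else:
            arm = max(range(k), key=lambda a: values[a])  # exploit: best estimate
        reward = 1.0 if rng.random() < true_probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
    return counts, values

# Hypothetical arms with success probabilities 0.2, 0.5 and 0.8.
counts, values = epsilon_greedy([0.2, 0.5, 0.8])
print(counts, [round(v, 2) for v in values])
```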
Softmax
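One common reading of softmax in the bandit setting is Boltzmann action selection, where an arm is chosen with probability proportional to exp(estimate / temperature). The sketch below assumes that reading; the estimates and temperatures are made up for illustration.

```python
import math

def softmax_probs(values, temperature=0.1):
    """Boltzmann selection probabilities: p_a proportional to exp(values[a] / temperature)."""
    m = max(values)  # subtract the max for numerical stability
    weights = [math.exp((v - m) / temperature) for v in values]
    total = sum(weights)
    return [w / total for w in weights]

estimates = [0.2, 0.5, 0.8]  # hypothetical value estimates per arm
for tau in (1.0, 0.1):
    print(tau, [round(p, 3) for p in softmax_probs(estimates, tau)])
```

A lower temperature concentrates the probability on the best-looking arm (more exploitation), while a higher temperature flattens the distribution (more exploration).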
UCB
argmax
UCB
UCB1
Log graph
UCB1
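A minimal sketch of UCB1 in its standard form: pull each arm once, then pick the argmax of the empirical mean plus the bonus sqrt(2 ln t / n_a). The bandit instance and the function name are hypothetical.

```python
import math
import random

def ucb1(true_probs, steps=5000, seed=0):
    """Run UCB1 on a Bernoulli bandit; return pull counts per arm."""
    rng = random.Random(seed)
    k = len(true_probs)
    counts = [0] * k
    means = [0.0] * k
    for t in range(1, steps + 1):
        if t <= k:
            arm = t - 1  # initialisation: pull every arm once
        else:
            # index = empirical mean + confidence bonus sqrt(2 ln t / n_a)
            arm = max(range(k),
                      key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]))
        reward = 1.0 if rng.random() < true_probs[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts

ucb_counts = ucb1([0.2, 0.5, 0.8])  # hypothetical arm probabilities
print(ucb_counts)
```

The logarithm in the bonus grows slowly, so rarely pulled arms are revisited from time to time but less and less often.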
Indicator function
Thompson sampling
Probability Matching, Bayesian Bandit
Thompson sampling
(from SlideShare “Slice Technologies”)
Thompson sampling (area = 1)
Thompson sampling
19 / (19 + 9) = 19 / 28 = 0.679
59 / (59 + 39) = 59 / 98 = 0.602
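The two ratios have the form a / (a + b), which is the mean of a Beta(a, b) distribution. Treating them as the posteriors Beta(19, 9) and Beta(59, 39) is an assumption, since the plots are not in the transcript. Under it, the sketch below compares the mean and spread of the two posteriors and takes one Thompson draw from each.

```python
import math
import random

def beta_mean_std(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(var)

# Parameters read off the slide's ratios; treating them as Beta(a, b) is an assumption.
for a, b in [(19, 9), (59, 39)]:
    mean, std = beta_mean_std(a, b)
    print(f"Beta({a}, {b}): mean={mean:.3f}, std={std:.3f}")

# One Thompson draw per arm: the arm with the larger sample would be played.
rng = random.Random(0)
samples = [rng.betavariate(19, 9), rng.betavariate(59, 39)]
print(samples.index(max(samples)))
```

The second posterior is built from more observations, so it is narrower, and its samples stray less from its mean.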
Thompson sampling
Thompson sampling Algorithm for Bernoulli bandits
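A minimal sketch of Thompson sampling for Bernoulli bandits, assuming a uniform Beta(1, 1) prior on each arm. Each round draws one sample per arm from its Beta posterior, plays the arm with the largest sample, and updates that arm's success or failure count. The arm probabilities and function name are hypothetical.

```python
import random

def thompson_bernoulli(true_probs, steps=5000, seed=0):
    """Thompson sampling with a Beta(1, 1) prior on each arm's success probability."""
    rng = random.Random(seed)
    k = len(true_probs)
    successes = [0] * k
    failures = [0] * k
    for _ in range(steps):
        # Draw one sample per arm from its Beta posterior and play the argmax.
        samples = [rng.betavariate(successes[a] + 1, failures[a] + 1) for a in range(k)]
        arm = max(range(k), key=lambda a: samples[a])
        if rng.random() < true_probs[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures

successes, failures = thompson_bernoulli([0.2, 0.5, 0.8])
pulls = [s + f for s, f in zip(successes, failures)]
print(pulls)
```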
Thompson sampling Algorithm for general stochastic bandits
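One known way to extend the Bernoulli version to arbitrary rewards in [0, 1] is to run a Bernoulli trial whose success probability equals the observed reward, then update the Beta posterior with the trial's outcome. Whether the slide shows exactly this variant is an assumption; the reward functions below are hypothetical.

```python
import random

def thompson_general(reward_fns, steps=5000, seed=0):
    """Thompson sampling for rewards in [0, 1] via a Bernoulli trial on each reward."""
    rng = random.Random(seed)
    k = len(reward_fns)
    successes = [0] * k
    failures = [0] * k
    for _ in range(steps):
        samples = [rng.betavariate(successes[a] + 1, failures[a] + 1) for a in range(k)]
        arm = max(range(k), key=lambda a: samples[a])
        reward = reward_fns[arm](rng)  # any value in [0, 1]
        if rng.random() < reward:      # Bernoulli trial with P(success) = reward
            successes[arm] += 1
        else:
            failures[arm] += 1
    return [s + f for s, f in zip(successes, failures)]

# Hypothetical arms: uniform rewards on different sub-intervals of [0, 1].
arms = [lambda r: r.uniform(0.0, 0.4),
        lambda r: r.uniform(0.3, 0.7),
        lambda r: r.uniform(0.6, 1.0)]
pulls_general = thompson_general(arms)
print(pulls_general)
```

The Bernoulli trial succeeds with probability equal to the arm's mean reward, so the Beta posterior tracks that mean.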
Thompson sampling
Multi-play Thompson Sampling
(from MS Research)
Multi-play Thompson sampling
Multi-play Thompson Sampling (MP-TS)
Improved Multi-play Thompson Sampling (IMP-TS)
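A rough sketch of the multi-play idea (MP-TS only): in each round, draw one posterior sample per arm and play the arms with the largest samples, instead of just one. The number of plays, the arm probabilities, and the function name are hypothetical, and IMP-TS is not covered here.

```python
import random

def multi_play_ts(true_probs, num_plays=2, steps=3000, seed=0):
    """Multi-play Thompson sampling sketch: play the num_plays arms with the largest samples."""
    rng = random.Random(seed)
    k = len(true_probs)
    successes = [0] * k
    failures = [0] * k
    for _ in range(steps):
        samples = [rng.betavariate(successes[a] + 1, failures[a] + 1) for a in range(k)]
        chosen = sorted(range(k), key=lambda a: samples[a], reverse=True)[:num_plays]
        for arm in chosen:  # each chosen arm yields its own Bernoulli reward
            if rng.random() < true_probs[arm]:
                successes[arm] += 1
            else:
                failures[arm] += 1
    return [s + f for s, f in zip(successes, failures)]

mp_pulls = multi_play_ts([0.2, 0.4, 0.6, 0.8])
print(mp_pulls)
```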
Thank you!
(주)지능도시
Intelligent City Ltd.
강신동
Shindong KANG
www.idosi.com
ceo@idosi.com

ujava.org workshop : Reinforcement Learning with Thompson Sampling