© 2016. SNU CSE Biointelligence Lab., http://bi.snu.ac.kr
Introduction to Reinforcement Learning
Donghyun Kwak (곽동현)
Biointelligence Lab, Seoul National University
Background
• Led by Google DeepMind, recent research has actively explored
approximating the Q function of conventional reinforcement learning
with a DNN or CNN.
• Recent work has achieved remarkable results, playing Atari 2600 games
and Go better than humans, and is being extended to 3D games and
robot control problems.
What is AI? ML?
https://www.linkedin.com/pulse/deep-dive-venture-landscape-ai-ajit-nazre-rahul-garg-nazre
Various Fields Related to ML
https://www.linkedin.com/pulse/how-exceed-your-goals-2016-dr-travis-bradberry-1
Various Algorithms in ML
Function Approximation
http://arxiv.org/pdf/1411.4555.pdf
https://people.mpi-inf.mpg.de/~kkim/supres/supres.htm
What is Deep Learning?
Machine Learning
• Supervised Learning :
y = f(x)
• Unsupervised Learning :
x ~ p(x) , x = f(x)
• Reinforcement Learning :
??
Agent-Environment Interaction
• Objective : Maximize the expected sum of future rewards
• Algorithms
1) Planning : Dynamic Programming Based
2) Reinforcement Learning : Machine Learning Based
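The objective and the planning branch listed above can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and does not come from the slides: a hypothetical four-state corridor, a discount factor of 0.9, and a transition model `step` that the planner is allowed to read. The code computes a discounted sum of rewards and then runs value iteration, one common dynamic-programming planner.

```python
# Hypothetical 4-state corridor: states 0..3, state 3 is the terminal goal.
# Actions: 0 = left, 1 = right. Entering state 3 yields reward 1, otherwise 0.
N_STATES = 4
ACTIONS = (0, 1)
GAMMA = 0.9  # discount factor (an assumption; the slides do not fix one)


def step(state, action):
    """Known environment model: returns (next_state, reward)."""
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def discounted_return(rewards, gamma=GAMMA):
    """Sum of future rewards, each discounted by gamma per time step."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def value_iteration(tol=1e-9):
    """Planning: repeat the Bellman optimality backup using the known model."""
    values = [0.0] * N_STATES  # the terminal state keeps value 0
    while True:
        delta = 0.0
        for s in range(N_STATES - 1):
            outcomes = [step(s, a) for a in ACTIONS]
            best = max(r + GAMMA * values[s2] for s2, r in outcomes)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


values = value_iteration()  # approximately [0.81, 0.9, 1.0, 0.0]
```

Planning of this kind needs the model `step` in advance. Reinforcement learning targets the same objective but has to estimate values from sampled interaction instead.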
Example of Supervised Learning
Polynomial Curve Fitting
Trendline in Microsoft Excel 2007
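Polynomial curve fitting is supervised learning in the sense of y = f(x): given (x, y) pairs, find the coefficients of f. The sketch below is one possible illustration, not the slide's method. It uses ordinary least squares solved through the normal equations, and the data set and the helper names `fit_polynomial` and `predict` are made up for this example.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares fit; returns coefficients [w0, w1, ..., w_degree]."""
    n = degree + 1
    # Normal equations A w = b, with A[i][j] = sum(x^(i+j)), b[i] = sum(y * x^i).
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]

    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]

    # Back substitution on the upper-triangular system.
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(a[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (b[i] - tail) / a[i][i]
    return w


def predict(w, x):
    """Evaluate the fitted polynomial f(x) = sum_k w[k] * x^k."""
    return sum(coef * x ** k for k, coef in enumerate(w))


# Made-up, noise-free training data generated from y = 1 + 2x + 3x^2.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [1 + 2 * x + 3 * x ** 2 for x in xs]
w = fit_polynomial(xs, ys, degree=2)  # close to [1.0, 2.0, 3.0]
```

With noisy data the recovered coefficients would only approximate the generating ones, and a degree that is too high would start fitting the noise.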
Example of Unsupervised Learning
Clustering
http://www.frankichamaki.com/data-driven-market-segmentation-more-effective-marketing-to-segments-using-ai/
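Clustering groups unlabeled points by similarity, with no target y to learn from. The slide does not name an algorithm, so the sketch below picks k-means (Lloyd's algorithm) as one familiar choice. The six 2-D points and the initial centers are made up for illustration.

```python
def kmeans(points, centers, iterations=10):
    """Lloyd's algorithm on 2-D points; returns (centers, labels)."""
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        labels = [
            min(range(len(centers)),
                key=lambda c: (p[0] - centers[c][0]) ** 2 + (p[1] - centers[c][1]) ** 2)
            for p in points
        ]
        # Update step: move each center to the mean of its assigned points.
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centers.append((sum(p[0] for p in members) / len(members),
                                    sum(p[1] for p in members) / len(members)))
            else:
                new_centers.append(centers[c])  # keep the center of an empty cluster
        centers = new_centers
    return centers, labels


# Made-up data: two visually separate groups of three points each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
initial_centers = [points[0], points[3]]  # deterministic start, chosen by hand
centers, labels = kmeans(points, initial_centers)
# labels == [0, 0, 0, 1, 1, 1]
```

Choosing the initial centers by hand keeps the example deterministic. In practice they are usually sampled, and the result can depend on that choice.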
Example of Reinforcement Learning
Videos
• A crawling robot: a Q-learning example
https://www.youtube.com/watch?v=2iNrJx6IDEo
• Deep Reinforcement Learning for Robotic Manipulation
https://youtu.be/ZhsEKTo7V04?t=1m27s
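For readers who want to see what Q-learning does before watching the videos, here is a minimal tabular sketch. It is not the crawling robot's code. The five-state corridor, the reward of 1 at the goal, the learning rate, the discount factor, and the uniformly random behavior policy are all assumptions chosen to keep the example small.

```python
import random

N_STATES = 5             # states 0..4; state 4 is the terminal goal
LEFT, RIGHT = 0, 1
ALPHA, GAMMA = 0.5, 0.9  # learning rate and discount factor (illustrative values)


def step(state, action):
    """Environment dynamics, seen by the learner only through samples."""
    next_state = max(0, state - 1) if action == LEFT else state + 1
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def q_learning(episodes=500, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            # Behavior policy: uniformly random. Q-learning is off-policy,
            # so it can still learn the values of the greedy policy.
            action = rng.choice((LEFT, RIGHT))
            next_state, reward = step(state, action)
            # Q-learning update: move q toward reward + gamma * max_a' q(s', a').
            target = reward + GAMMA * max(q[next_state])
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
    return q


q = q_learning()
# Greedy action in each non-terminal state; RIGHT (1) everywhere after training.
greedy_policy = [max((LEFT, RIGHT), key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
```

A table works here because there are only five states. The deep reinforcement learning approach described in the Background replaces the table `q` with a neural network that approximates the Q function.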
THANK YOU

Editor's Notes

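  • #4: In the classical taxonomy of AI, ML was originally one small part. ML was one of several ways to implement AI, and deep learning sat inside ML as a very small subset. The trend has since shifted, and a large share of the areas proposed within AI are now studied through ML. As a result, the field increasingly treats AI and ML as nearly the same thing, but many people still work only on classical AI, so putting it that way can get you into trouble.
  • #5: Machine learning is a discipline that grew out of this many broad fields. When you first study it, it therefore seems disorganized and hard to follow, so good textbooks and seminars are essential in the early stages.
  • #6: Each algorithm finds the decision boundary in a different way.
  • #7: In this way, the function f that produces an output for given input data is found through learning, not through explicit implementation.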