CSCI S-89C Deep Reinforcement Learning
Syllabus Spring 2021
Lectures: Online web conference, Wednesdays, 7:40-9:40 pm
Lectures will be live-streamed, and recordings will be available via the course website within 24 hours.
Instructor: Dr. Dmitry Kurochkin, Senior Research Analyst, Harvard University
E-mail: dkurochkin@fas.harvard.edu
Website: https://canvas.harvard.edu/courses/81664
Office Hours: By request
Teaching Fellows: TBA e-mail: TBA
Prerequisites:
Introductory probability and statistics, multivariate calculus equivalent to MATH E-21a, and
proficiency in Python programming equivalent to CSCI E-7.
Note on the prerequisites:
We will be formulating value (cost) functions and performing optimization. Students are expected to be
comfortable taking derivatives. Basic knowledge of probability theory (in particular, conditional
probability distributions and conditional expectations) is necessary. Understanding matrix-vector
operations and notation is helpful but not required. All coding exercises are performed in Python.
Students are required to take a short pretest at the beginning of the course. The pretest score will not
count toward the final grade but will help you understand whether your background in calculus and
probability theory, as well as your command of coding, positions you for success in this course.
Text:
Richard Sutton and Andrew Barto, Reinforcement Learning: An Introduction, 2nd ed.
ISBN: 978-0-262-03924-6
An electronic copy of the book is available on the author’s webpage (under “Full Pdf”):
http://incompleteideas.net/book/the-book-2nd.html
Optional reading:
Ian Goodfellow, Yoshua Bengio, and Aaron Courville, Deep Learning, MIT Press, 2016
ISBN: 978-0-262-03561-3
An HTML version of the book is available at http://www.deeplearningbook.org
Course Description:
This course introduces Deep Reinforcement Learning (Deep RL), one of the most modern techniques in
machine learning. Deep RL has attracted the attention of many researchers and developers in recent years
due to its wide range of applications in a variety of fields such as robotics, robotic surgery, pattern
recognition, diagnosis based on medical images, treatment strategies in clinical decision making,
personalized medical treatment, drug discovery, speech recognition, computer vision, and natural language
processing. RL is often seen as the third area of machine learning, in addition to supervised and
unsupervised learning, in which an agent learns as a result of its own actions and its interaction
with the environment. Generally, such learning processes do not need to be guided externally,
but until recently it has been difficult to apply RL ideas in practice. This course primarily focuses on
problems that emerge in healthcare and life science applications.
Tentative List of Topics:
I. Reinforcement Learning (RL)
◦ Markov Decision Processes (MDP): Value Functions and Policies
◦ Dynamic Programming (DP): Bellman Equation
◦ Monte Carlo (MC) Methods
◦ Temporal-difference (TD) Prediction and Control: SARSA and Q-learning
◦ n-step TD
◦ Approximation Methods: Stochastic-gradient, Semi-gradient TD Update, Least-squares TD
II. Deep Learning
◦ Neural Networks (NN): Classification & Regression
◦ Training NNs: Backpropagation
◦ Tuning NNs: Regularization
◦ Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN)
III. Deep RL
◦ Value-based Deep RL: Q-network
◦ Policy-based Deep RL: REINFORCE
◦ Asynchronous Methods for Deep RL: Advantage Actor-Critic (A2C)
◦ Model-based Deep RL
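To give a concrete sense of the Q-learning topic listed above, here is a minimal tabular sketch in Python. It is not course material: the five-state chain environment, the names (`step`, `train`, `q_table`), the uniform random behavior policy, and the hyperparameters are all illustrative assumptions.

```python
import random

N_STATES = 5          # states 0..4; state 4 is terminal (hypothetical toy chain)
ACTIONS = (0, 1)      # 0 = step left, 1 = step right
ALPHA, GAMMA = 0.5, 0.9  # assumed step size and discount factor


def step(state, action):
    """Deterministic toy dynamics: reward 1 only on reaching the last state."""
    next_state = max(state - 1, 0) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, seed=0):
    """Run tabular Q-learning and return the learned action-value table."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Q-learning is off-policy, so a uniform random behavior policy suffices here.
            action = rng.choice(ACTIONS)
            next_state, reward, done = step(state, action)
            # The target bootstraps from the greedy value of the next state.
            target = reward + (0.0 if done else GAMMA * max(q[next_state]))
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
    return q


q_table = train()
# Greedy action in each non-terminal state, read off the learned table.
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
```

In this toy problem the greedy policy read from `q_table` moves right in every non-terminal state, which is the shortest path to the reward.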
Homework:
Unless otherwise noted, homework assignments will be due each Sunday. The assignments will
be posted on the Canvas website and will consist of a series of programming exercises (solutions should
be implemented in Python) as well as analytical problems (knowledge of calculus and probability theory
should suffice) that help students enhance their understanding of the underlying theory. Solutions to
the programming exercises should be submitted via Canvas in the form of a single .ipynb (Jupyter
Notebook) file. Solutions to the theoretical problems should be submitted in the form of a single PDF
file.
Note on the deadline and penalty:
Solutions to the assignments submitted later than 1, 2, 3, 4, and 5 days after the due date will be
penalized by 10%, 20%, 30%, 40%, and 100%, respectively. If you need an extension, please
coordinate with the instructor before the due date.
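The penalty schedule above can be sketched as a small lookup in Python. This is only an illustration under stated assumptions: the function names are hypothetical, lateness is taken as a whole number of days as counted by the course staff, 5 or more days all receive the 100% penalty, and the penalty is applied as a percentage of the raw score.

```python
# Penalty schedule from the syllabus: whole days late -> percent deducted.
LATE_PENALTY_PERCENT = {1: 10, 2: 20, 3: 30, 4: 40}


def late_penalty_percent(days_late: int) -> int:
    """Return the percent penalty for a submission `days_late` days past due."""
    if days_late <= 0:
        return 0
    # Assumption: anything 5 or more days late receives the full 100% penalty.
    return LATE_PENALTY_PERCENT.get(days_late, 100)


def penalized_score(raw_score: float, days_late: int) -> float:
    """Apply the penalty as a percentage of the raw score (an assumption)."""
    return raw_score * (100 - late_penalty_percent(days_late)) / 100
```

For example, under these assumptions a raw score of 80 submitted 2 days late would be recorded as 64.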
Quizzes:
An online quiz will be due before each class, unless announced otherwise. Each quiz will consist of
approximately 5 basic questions testing understanding of the principles studied. No late quizzes will be allowed.
Midterm Exam:
The midterm exam will be released on March 10 (no lecture on March 10) and is due March 17 at 7:40
pm (Eastern Time). The exam will be similar to the homework exercises but will cover all topics studied
up to that date. Late midterms will not be accepted.
Final:
The final examination will be due at 11:59 pm (Eastern Time) on May 12 (no lecture on May 12). The
exam will be cumulative, covering all topics studied. Late finals will not be accepted.
Attendance:
Regular attendance (whether on campus or online) is expected, but attendance will not be taken. Recorded
lectures will be available via the course website within 24 hours after the lecture.
Participation:
Although no credit is allocated for participation, everyone is encouraged to participate constructively
in class by asking relevant questions. It is important that you regularly check the e-mail address
registered with Canvas, monitor course announcements, and participate in discussions on Piazza, the
forum available at https://piazza.com/class/kh5mr9vj75c2ah. All technical and data-science-related
questions will be discussed on Piazza.
Grading:
The semester average is calculated using the formula:
Grade = 0.25 · Homework + 0.20 · Quizzes + 0.25 · Midterm + 0.30 · Final
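As a worked illustration of the formula above, the following Python sketch computes the weighted average. The function name, the dictionary keys, and the sample scores are hypothetical; it also assumes each component is already expressed on a common 0 to 100 scale.

```python
# Component weights taken from the grading formula in the syllabus.
WEIGHTS = {"homework": 0.25, "quizzes": 0.20, "midterm": 0.25, "final": 0.30}


def semester_average(scores: dict) -> float:
    """Weighted average of component scores (assumed to share a 0-100 scale)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example = {"homework": 90, "quizzes": 80, "midterm": 70, "final": 85}
average = semester_average(example)
```

With these sample scores the terms are 22.5 + 16 + 17.5 + 25.5, so the semester average is 81.5.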
Student Learning Objectives:
◦ proficiency in building optimal NNs using Python
◦ understanding of RL including MDP, Bellman equation, and optimal policy
◦ firm understanding of Deep RL and getting comfortable with approximation methods used in
conjunction with RL
◦ hands-on experience in estimating the optimal policy and value functions
Academic Integrity:
You are responsible for understanding Harvard Extension School policies on academic integrity
(www.extension.harvard.edu/resources-policies/student-conduct/academic-integrity) and how to
use sources responsibly. Not knowing the rules, misunderstanding the rules, running out of time,
submitting the wrong draft, or being overwhelmed with multiple demands are not acceptable excuses.
There are no excuses for failure to uphold academic integrity. To support your learning about academic
citation rules, please visit the Harvard Extension School Tips to Avoid Plagiarism
(www.extension.harvard.edu/resources-policies/resources/tips-avoid-plagiarism), where you’ll find links to
the Harvard Guide to Using Sources and two free online 15-minute tutorials to test your knowledge of
academic citation policy. The tutorials are anonymous open-learning tools.
Disability Accommodations:
The Extension School is committed to providing an accessible academic community. The Accessibility
Office offers a variety of accommodations and services to students with documented disabilities.
More information can be found at
www.extension.harvard.edu/resources-policies/resources/accessibility-student-services
Dates of Interest:
◦ Harvard Extension School classes begin, January 25, 2021
◦ Pretest is due, January 29
◦ Last day to change the credit status, January 31
◦ Course drop deadline for full-tuition refund, January 31
◦ Quiz 1 is due, February 3
◦ Assignment 1 is due, February 7
◦ Course drop deadline for half-tuition refund, February 7
◦ Midterm Exam is due, March 17, 7:40 pm (Eastern Time)
◦ Withdrawal deadline, April 23
◦ Final Exam is due, May 12, 11:59 pm (Eastern Time)