Instructor: Zizhan Zheng (zzheng3@tulane.edu), Stanley Thomas 307B
Class Time & Place: TR 12:25PM-01:35PM, Dinwiddie Hall 103
Office Hours: Wed 10-11AM and by appointment
Reinforcement learning (RL) has found successful applications in various domains, including recommender systems, health care, energy, finance, robotics, transportation, and computer systems. Many people believe that RL is a step toward Artificial General Intelligence (AGI). This course introduces both the classic results and state-of-the-art research in RL at the graduate level. We will cover both the theoretical foundation of RL and its applications through case studies. Topics to be covered include:
We will meet both in person and online (about 40% of lectures will be online). A detailed class schedule can be found on the course webpage. To compensate for the shortened class time, extra reading and discussion material will be assigned on Canvas.
There will be both written problem assignments and labs (programming assignments). Graduate students will be given extra questions that require advanced algorithmic/analytic techniques. Specific instructions will be given in each assignment. All the assignments will be posted on the course webpage.
The midterm will be closed-book and closed-notes, but you may bring one cheat sheet (a single-sided, letter-size page). Undergraduate and graduate students will receive different sets of questions.
Students will work in groups on a final project; each group may have at most two members. The project should center on a well-defined problem related to reinforcement learning and (ideally) your own research area. You will develop the project through close interaction with the instructor and your peers, and write a paper that contains all the sections of a typical research paper, including some preliminary results.
Milestone presentations (a project proposal and a project update) will be scheduled during the semester, and the final presentation will take place during final exam week (Nov. 30 – Dec. 5). The final paper is due after the final presentation. A tentative schedule for the final project can be found on the course website.
Each student has a total of 6 grace days that may be applied to homework assignments, and no more than 2 grace days may be used on any single assignment. An assignment submitted more than 2 days past the deadline, or after the student has run out of grace days, will receive zero credit. Grace days may not be used for the final presentation or report.
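The grace-day rules amount to a small bookkeeping procedure. The Python sketch below encodes one reading of them, under two assumptions the syllabus does not state: lateness is counted in whole days, and a submission that earns zero credit does not consume any grace days. The function name `apply_grace_days` is hypothetical.

```python
TOTAL_GRACE_DAYS = 6      # total grace days per student (from the syllabus)
MAX_PER_ASSIGNMENT = 2    # cap on grace days for any single assignment (from the syllabus)


def apply_grace_days(days_late):
    """Return, for each homework in submission order, whether it can still earn credit.

    days_late: whole days each assignment was submitted past its deadline
    (0 means on time). Hypothetical helper illustrating one reading of the policy.
    """
    remaining = TOTAL_GRACE_DAYS
    credited = []
    for late in days_late:
        if late == 0:
            credited.append(True)          # on time: no grace days needed
        elif late <= MAX_PER_ASSIGNMENT and late <= remaining:
            remaining -= late              # spend grace days on this assignment
            credited.append(True)
        else:
            # More than 2 days late, or not enough grace days left: zero credit.
            # Assumption: no grace days are deducted for such a submission.
            credited.append(False)
    return credited


if __name__ == "__main__":
    # Five homeworks, late by 2, 2, 3, 2 and 1 days respectively.
    print(apply_grace_days([2, 2, 3, 2, 1]))  # [True, True, False, True, False]
```

In this example the third homework exceeds the 2-day cap, and the fourth uses the last two grace days, so the fifth, although only one day late, receives zero credit.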
Faculty and students must comply with University policies on COVID-19 testing and isolation, which are available at https://tulane.edu/covid-19/health-strategies. Faculty and students must wear face coverings in all common areas, including classrooms, and follow social distancing rules. Failure to comply is a violation of the Code of Student Conduct, and students will be subject to University discipline, which can include suspension or permanent dismissal.
If a student cannot attend class for any reason, the student is responsible for communicating with their instructor to make up any work they may miss. Faculty will provide online options for class participation, as outlined in this document, and unless a student is seriously ill, they are expected to use these options. The University Health Center will provide documentation verifying a student is ill, as well as verification that a student may return to class. With the approval of the Newcomb-Tulane College dean, an instructor may have a student who has excessive absences involuntarily withdrawn from a course with a WF grade after written warning at any time during the semester.
The weighted average will determine your letter grade roughly as follows: A >= 90%; B >= 80%; C >= 70%; D >= 60%; F < 60%.
+/- grades will be given for borderline cases.
All grades will be posted on Canvas.
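The cutoffs above amount to a threshold lookup on the weighted average. The Python sketch below shows that mapping. The component names and weights passed to `weighted_average` are hypothetical placeholders, since this section does not state the actual weights, and the +/- adjustments for borderline cases are left to the instructor and are not modeled.

```python
# Letter-grade cutoffs from the syllabus (percent of the weighted average).
CUTOFFS = [(90.0, "A"), (80.0, "B"), (70.0, "C"), (60.0, "D")]


def weighted_average(scores, weights):
    """Weighted average of component scores (each in percent).

    The component names and weights passed in are hypothetical; the
    syllabus section does not list the actual weights.
    """
    total_weight = sum(weights.values())
    return sum(scores[name] * w for name, w in weights.items()) / total_weight


def base_letter(percent):
    """Map a percentage to A-F using the cutoffs above (no +/- adjustment)."""
    for threshold, letter in CUTOFFS:
        if percent >= threshold:
            return letter
    return "F"


if __name__ == "__main__":
    # Hypothetical components and equal weights, for illustration only.
    avg = weighted_average({"homework": 100.0, "project": 80.0},
                           {"homework": 0.5, "project": 0.5})
    print(avg, base_letter(avg))  # 90.0 A
```

Because the cutoffs are checked from highest to lowest with `>=`, a score exactly on a boundary (e.g. 90%) receives the higher letter, matching the thresholds as written.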
Acknowledgment: many slides are adapted from Richard Sutton's RL slides, David Silver's RL course, and Berkeley CS 285.
Lecture | Date | Unit | Lecture Topic | Reading | Assignments |
1 | Aug 20 (R) | Introduction | Logistics; Intro to RL | SB 1.1-1.5; Probability review; Linear algebra review | Forming groups (due Sep 1) |
2 | Aug 25 (T) | Markov decision processes and dynamic programming | Markov reward processes; Episodic and continuing tasks | SB 3.1-3.5, DB 4.1 | |
3 | Aug 27 (R) | | Finite MDP | SB 3.1-3.5, CS 2.1-2.2, DB 4.2 | |
4 | Sep 1 (T) | | Bellman equations | SB 3.6, CS 2.3, DB 4.3 | Homework 1 (due Sep 10) |
5 | Sep 3 (R) | | Bellman optimality equation | SB 3.6, CS 2.4 | |
6 | Sep 8 (T) | | Contractions and fixed point theorem; DP for prediction | CS 2.4, A.1-A.2; SB 4.1-4.2 | |
7 | Sep 10 (R) | | Value iteration | SB 4.3-4.7 | Homework 2 (due Sep 22) |
- | Sep 15 (T) | | Class cancelled | | |
8 | Sep 17 (R) | | Policy iteration | SB 4.3-4.7 | |
9 | Sep 22 (T) | Model-free prediction and control | LP approach for MDP, POMDP; Monte Carlo prediction | SB 17.3, 5.1-5.2, CS 3.1 | |
10 | Sep 24 (R) | | Student presentations: project proposal | | Lab 1 (due Oct 6) |
11 | Sep 29 (T) | | Stochastic approximation, TD(0) | SB 6.1-6.3, CS 3.1 | |
12 | Oct 1 (R) | | TD(0) | SB 6.1-6.3, CS 3.1 | |
13 | Oct 6 (T) | | n-step TD, TD(λ) | SB 7.1, 12.1-12.2, CS 3.1 | Homework 3 (due Oct 13) |
14 | Oct 8 (R) | | TD(λ); Monte Carlo control | SB 5.3-5.7 | |
15 | Oct 11 (U) | | Sarsa; Midterm review | SB 6.4 | |
16 | Oct 13 (T) | | Q-learning | SB 6.5-6.7 | |
17 | Oct 15 (R) | | Midterm (Thursday, Oct 15) | | |
18 | Oct 20 (T) | Approximate solution methods | On-policy prediction | SB 9.1-9.2, CS 3.2 | |
19 | Oct 22 (R) | | On-policy prediction | SB 9.3-9.4, CS 3.2 | |
20 | Oct 27 (T) | | On-policy control; Off-policy methods | SB 9.5, 9.8, 11.1-11.3 | |
- | Oct 29 (R) | | Class cancelled | | |
21 | Nov 3 (T) | | Batch methods, DQN | SB 16.5, 13; Mnih et al., “Human-level control through deep reinforcement learning”, Nature, 2015 | Lab 2 |
22 | Nov 5 (R) | | Student presentations: project update | | |
23 | Nov 7 (S) | | Policy gradients | SB 13 | |
24 | Nov 10 (T) | | Policy gradients | SB 13 | |
25 | Nov 12 (R) | | Mini-lectures: Arie, Eli, and Sri, “Deep Q-Networks”; Farzad and Tianyi, “Multi-armed bandits for wireless network” | | |
26 | Nov 17 (T) | Planning | DDPG, model-based RL | Lillicrap et al., “Continuous control with deep reinforcement learning”, ICLR, 2016; SB 8 | |
27 | Nov 19 (R) | | Mini-lectures: Ningxiao and Xiaolin, “Robust Deep Reinforcement Learning against Adversarial Perturbations on State Observations”; Henger, “Convergence of Q-learning” | | |
28 | Nov 24 (T) | | Dyna, Rollout, Monte Carlo tree search | SB 8, 16.6 | |

Final presentations: Wednesday, Dec 2, 4:00-6:00pm. Final report: Friday, Dec 4, 11:59pm.
Tulane University strives to make all learning experiences as accessible as possible. If you anticipate or experience academic barriers based on your disability, please let me know immediately so that we can privately discuss options. I will never ask for medical documentation from you to support potential accommodation needs. Instead, to establish reasonable accommodations, I may request that you register with the Goldman Center for Student Accessibility. After registration, make arrangements with me as soon as possible to discuss your accommodations so that they may be implemented in a timely fashion. Goldman Center contact information: goldman@tulane.edu; (504) 862-8433; accessibility.tulane.edu.
The Code of Academic Conduct applies to all students, full-time and part-time, in Tulane University. Tulane University expects and requires behavior compatible with its high standards of scholarship. By accepting admission to the university, a student accepts its regulations (i.e., Code of Academic Conduct and Code of Student Conduct) and acknowledges the right of the university to take disciplinary action, including suspension or expulsion, for conduct judged unsatisfactory or disruptive.
Per Tulane’s religious accommodation policy, I will make every reasonable effort to ensure that students are able to observe religious holidays without jeopardizing their ability to fulfill their academic obligations. Excused absences do not relieve the student from the responsibility for any course work required during the period of absence. Students should notify me within the first two weeks of the semester about their intent to observe any holidays that fall on a class day or on the day of the final exam.
Tulane University recognizes the inherent dignity of all individuals and promotes respect for all people. As such, Tulane is committed to providing an environment free of all forms of discrimination including sexual and gender-based discrimination, harassment, and violence like sexual assault, intimate partner violence, and stalking. If you (or someone you know) have experienced or are experiencing these types of behaviors, know that you are not alone. Resources and support are available: you can learn more at allin.tulane.edu. Any and all of your communications on these matters will be treated as either “Confidential” or “Private” as explained in the chart below. Please know that if you choose to confide in me, I am mandated by the university to report to the Title IX Coordinator, as Tulane and I want to be sure you are connected with all the support the university can offer. You do not need to respond to outreach from the university if you do not want to. You can also make a report yourself, including an anonymous report, through the form at tulane.edu/concerns.
Confidential | Private |
Except in extreme circumstances, involving imminent danger to one’s self or others, nothing will be shared without your explicit permission. | Conversations are kept as confidential as possible, but information is shared with key staff members so the University can offer resources and accommodations and take action if necessary for safety reasons. |
Counseling & Psychological Services (CAPS): (504) 314-2277, or The Line (24/7): (504) 264-6074 | Case Management & Victim Support Services: (504) 314-2160 or srss@tulane.edu |
Student Health Center: (504) 865-5255 | Tulane University Police (TUPD): Uptown (504) 865-5911; Downtown (504) 988-5531 |
Sexual Aggression Peer Hotline and Education (SAPHE): (504) 654-9543 | Title IX Coordinator: (504) 865-5615 or msmith76@tulane.edu |