Bandits, Experts & Games

Instructor: Alex Slivkins, Senior Researcher, Microsoft Research NYC.
Schedule: Mondays 2:30pm - 5:30pm
Location: A.V. Williams Building (AVW) 3258
Computer Science department, University of Maryland at College Park
Office Hours: Mondays 11am-2pm (by appointment), AVW 3171.
Q&A: We will use Piazza: https://piazza.com/umd/fall2016/cmsc858g/.
Contact: slivkins@cs.umd.edu (but please use Piazza if appropriate).

Course description

This course is on mathematical foundations of sequential decision-making under uncertainty, a subject on the intersection of algorithms, machine learning, and operations research. Such decision-making problems arise in many areas ranging from the web (web search, online advertising, recommendation systems, content optimization) to revenue and inventory optimization to human computation systems to computer networking. The course will cover fundamental results, as well as more recent research. The emphasis will be on problems that feature a tension between acquisition and usage of data, a.k.a. exploration-exploitation tradeoff. We will also touch upon economic aspects of sequential decision-making, making connections to game theory and mechanism design.

Prerequesites

Competence with algorithm design and mathematical proofs at the level of CMSC 451. Deeper familiarity with probability theory or statistics, as well as some exposure to game theory and mechanism design, would help but are not required. Familiarity with programming languages is not required.

Tentative topics

Fundamental results on "learning with expert advice":
algorithms: weighted majority, follow-the-leader, multiplicative weights
lower bounds
Fundamental results on "multi-armed bandits":
    IID rewards: UCB algorithm, active-arms elimination
    IID rewards: Lower bounds
    adversarial rewards: EXP3 algorithm
    Bayesian and IID rewards: Thompson Sampling
    MDP version: Gittins algorithm
Problems with a large action space
Lipschitz rewards
linear and convex rewards
Recent advances
    global constraints: "bandits with knapsacks" and dynamic pricing
    contextual bandits and applications thereof
    exploration and incentives: how to incentivize self-interested agents to explore?
    convergence of no-regret learning dynamics

The level of detail and the generality of exposition will vary from one topic to another, depending on what is needed to transmit the key ideas and on how much fits into a lecture.

Resources

We will not follow a particular textbook. As the course progresses, reading and lecture notes will be provided. Some resources on course-related topics are listed below:

A survey on (non-Bayesian) bandits by Sebastien Bubeck and Nicolo Cesa-Bianchi (2012).
Prediction, Learning, and Games: a book by Nicolo Cesa-Bianchi and Gabor Lugosi (2006).
Learning and Optimization for Sequential Decision Making: a course by Shipra Agrawal at Columbia University (Spring'16).
Learning, Games and Electronic Markets: a course by Bobby Kleinberg at Cornell University (Spring'07).
(Brief) lectures on bandit theory by Sebastien Bubeck: part 1 and part 2 (Spring'16).
A survey on concentration inequalities by Fan Chung and Linyuan Lu (2010)
Another survey on concentration inequalities by Colin McDiarmid (1998).

Coursework and assessment

There will be 2-3 homeworks and a project. Also, everyone will be expected to scribe one lecture (in pairs).The project can involve a mix of reading, coding, and/or original research on a course-related topic of your choice (more details later).

The grade will be based on: homeworks (50%), project (40%, plus up to 10% bonus for coding and/or original research), and scribing (10%, plus 5% bonus if scribing another lecture).

Dec 18: HW2 is graded! You can pick it up if you come by on Monday (11am-5pm), or from an envelope that I will post on my office door. Post-grading discussion for HW2 is out (but please feel free to email me after reading it if you have questions). I graded HW2 out of 21 points max (without the bonus problem); but both HWs will be worth the same in the final grade.

Dec 1: Some lecture notes out: the second part of Lecture 9 (FPL), Lecture 11 (Bandits with Knapsacks), and my hand-written notes for the last two lectures ("bandits and incentives" and "bandits and games").

Nov 21: announcements from class

HW1 grading. There are 10 items, incl. one "bonus" (#2b). Each item is graded out of 3 points. (I used (-, -+, +-, +) instead of (0,1,2,3).) The overall grade is just the sum. I've marked the per-problem grades the homeworks that I've handed back, and I'll be happy to email you the total grade if you ask.
HW2 questions. I strongly encourage you to take a look at HW2 early and ask on Piazza if you see issues or if you have questions.
Lecture notes for some of the previous lectures are missing. I've received most of them from the scribes, and I plan to work on them next week.
Project reports. Guidelines are online. In short, a project report should be like a small research paper; please email me if you have questions. Aim to have a draft by Dec 5.
Project presentation. Each project will need to make a short presentation in the last class (Dec 12). Aim for 10-15 mins (but let me know if you'd like to take longer). Whiteboard or slides or none -- up to you. For projects with multiple people, it is up to you to decide who presents; multiple presenters are OK, too. Please prepare in advance -- this is your chance to tell the class about the exciting stuff you've done and/or learned about!
End-game. The remainder of the class will work as follows:
    Nov 28: last lecture, bandits & games.
    Dec 5: no class, meetings all day with each projects (sign-up coming)
    Dec 12: project presentations (sign-up coming).
Please let me know in advance, by email, if no-one from your project can attend on Dec 5 or Dec 12.

Nov 14: Homework 2 out.

Nov 11: Lecture notes for adversarial bandits (Oct 24 class) and for combinatorial bandits (Oct 31 class, part I) are out.

Oct 27: Lecture notes for best-expert problem (Oct 17 class) are out.

Oct 25: The sign-up sheet is up here. Please Pick a 20-min slot between 2:20pm and 5pm.

Oct 23: I am giving a theory seminar talk on Friday, Oct 28 (1-2pm), on incentitvizing exploration -- a topic closely related to this course (and a subject of one of the later lectures). So, you all are very welcome to attend! That said: there is absolutely no pressure to attend.

Oct 22: Lecture notes for Lipschitz bandits (Oct 10 class) are out. Compared to the presentation in class, I filled in two minor gaps in the analysis of the lower bound and the zooming algorithm, respectively.

Oct 21: Re projects

I’ll have an “office hour” right before Monday’s class: just show up at 2pm to the same room, and let’s talk about projects. But no pressure. It would help if you give it a little thought beforehand.
For people who already have some idea of what they want, and are looking for a partner: think if you’d like to advertise it to the class (say, a 2-3 mins spiel).
I would like to have a short face-to-face conversation next week with everyone who is planning to do a project (unless you’ve talked with me already). I will have a long “office hour” on Friday (Oct 28; tent. 2:30-5:30pm), and there will be a sign-up sheet. Or talk to me on Monday.

Oct 17: Due date for HW1 extended till Oct 24 (Monday).

Oct 10: Lecture notes from Oct 3 class are online (regret bounds for Thompson Sampling).

Oct 10: For those of you who missed today's class (either because of FOCS, or because of Columbus Day, or for other reasons). Please read the lecture notes when they come out. If there is interest, I will be happy to hold an office hour for a recap; please let me know. Next lectures will not depend on today's material, but one of the problems on HW2 might.

Oct 10: some edits in HW1.

Oct 9: Tentative timeline for the rest of the course:
    Oct 17 (Mon): HW1 due
    Oct 31 (Mon): HW2 out, Project proposals due
    Nov 14 (Mon): HW2 due
    Dec 5 (Mon): soft deadline for project reports
    Dec 5, 12: project presentations.

Oct 9: Project guidelines are out, incl. the deadlines and a sign-up sheet.

Oct 7: Discussion notes for Oct 3 class are out.

Oct 5: Lecture notes for Sep 26 class are out. You are encouraged to take a look, to check some of the details that may have come out unclearly in class. (Same applies to other lectures, too.)

Oct 4: Since Since Coursemail emails seem go directly to your spam folders, I will use Piazza for announcements instead.

Sep 29: Homework 1 is out, due Oct 17 (Mon).

Sep 27: Lecture notes for Sep 19 class are out. Take a look for a slightly more careful exposition of the technical details and the connections between various pieces of the big proof.

Sep 19: We need two brave volunteers to scribe today's class. True, bandit lower bounds are subtle, but as today's class will show, one should not be intimidated!

Sep 19: Lecture notes for Sep 12 class are out. The lecture notes reflect a few clarifications which I will also mention in today's class.

Sep 19: Homework 1 is delayed for a few days (and the due date will be moved accordingly).

Sep 12: Homework 1 will be out next Monday (Sep 19), due in two weeks (Oct 3).

Sep 12: Posted a minor revision to the "concentration inequalities" slides -- stated Hoeffding Inequality in a slightly more general way which we will use in class.

Aug 30: Slides for the first class are up.

Aug 30: Instructions for scribes posted on the "lectures" tab.

Aug 29: Please sign up for Piazza: https://piazza.com/umd/fall2016/cmsc858g/.

Instructions for scribes:

Sign-up: Please sign up for scribes here. Everyone registered should sign up for at least one lecture. I will need volunteers to cover the remaining lectures (each volunteer scribes one more lecture, for 5% bonus to the course grade).
LaTeX: Use LaTeX with this template (please see README). You are encouraged to take a look at the macro in "my-setup.sty".
Submission: email me (slivkins@cs.umd.edu) the source files and the PDF, with subj smth like "CMSC 858G: lecture notes for lecture 2".
Process: Each scribe is responsible for his/her half of the class. However, you are encouraged to comment on each other's drafts (and revise your draft, if needed) before sending it to me. Usually I'd also make a pass on the lecture notes before posting it.
Timeline: Please have the scribes ready by Friday same week (ideally), or by Monday morning next week. Then I'd have time to revise and post the lecture notes by Tuesday night. But do email me if you need more time, that's OK, too.

Lecture #1 (Aug 29): Class organization and intro to the problem space (slides).
Recap of probability and concentration inequalities (slides -- feel free to use as a reference).

Lecture #2 (Sep 12): Bandits with IID rewards
Lecture notes: part I and part II.
References:

[Algorithm UCB1]
- Peter Auer, Nicolo Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning (2002).
- Chapter 2.2 of this survey (analyzes a slightly more general algorithm).
[Successive Elimination] Eyal Even-Dar, Shie Mannor and Yishay Mansour. Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems. Journal of Machine Learning Research (2006).

Lecture #3 (Sep 19): Lower bounds on regret
Lecture notes: both parts of the class.
References:

[Sqrt{KT} lower bound]
- Peter Auer, Nicolò Cesa-Bianchi, Yoav Freund, and Robert E. Schapire. The non-stochastic multi-armed bandit problem. SIAM Journal on Computing (2002).
- Chapter 3.3 of this survey.
- Lecture 9 in this class.
[log(T) per-instance lower bound ]
- T. L. Lai and Herbert Robbins Asymptotically efficient Adaptive Allocation Rules, , Advances in Applied Mathematics, 1985.
- Chapter 2.3 of this survey.

Lecture #4 (Sep 26): Lower bounds (ending); Bayesian Bandits and Thompson Sampling
Lecture notes: entire class.

References for Thompson Sampling:

[The original paper from 1933] William R. Thompson, On the likelihood that one unknown probability exceeds another in view of the evidence of two samples, Biometrika, 1933.
[Bayesian regret analysis from the class] Daniel Russo and Benjamin Van Roy, Learning to Optimize via Posterior Sampling, Math. Oper. Res., 2014.
[Non-Bayesian regret analysis] Shipra Agrawal and Navin Goyal, Further Optimal Regret Bounds for Thompson Sampling, AISTATS 2013.

Lecture #5 Thompson Sampling (ending); Discussion
Lecture notes for Thompson Sampling: regret bounds.
Discussion notes: where we are, what's next, practical aspects, simulations & deployments.

Lecture #6 (Oct 10) Lipschitz bandits
Lecture notes
Slide deck for Lipschitz bandits & the zooming algorithm (extracted from a longer talk).

[Main reference] Robert Kleinberg, Aleksandrs Slivkins, Eli Upfal Bandits and Experts in Metric Spaces (working paper, 2015; subsumes papers in STOC'08 and SODA'10).
- pp. 18-23: UniformMesh and Zooming algorithm.
- page 6 and Section 3.3: covering dimension and zooming dimension.
- Section 2.1: discussion of other papers in this line of work.

[Extension to contextual bandits] Aleksandrs Slivkins, Contextual Bandits with Similarity Information (COLT'11, JMLR'14).

Lecture #7 (Oct 17) The best-expert problem (full feedback & adversarial rewards)

Lecture notes: part I and part II.

[References]

This material is presented in various courses and books on online learning, e.g. this book and that book. I followed a lecture plan from this class, but presented the analysis of Hedge a little differently (so as to make it immediately applicable to the analysis of EXP3/EXP4 in the next lecture).
The lower bound can be found in Theorem 3.7 of Prediction, Learning & Games, and (for a stronger, non-asymptotic version) in Francesco Orabona and David Pal, Optimal Non-Asymptotic Lower Bound on the Minimax Regret of Learning with Expert Advice (working paper, 2015).

Lecture #8 (Oct 24) Adversarial bandits

Lecture notes: both parts.

[References]

The main reference is the original "EXP3 paper": Peter Auer, Nicolò Cesa-Bianchi, Yoav Freund, and Robert E. Schapire.The non-stochastic multi-armed bandit problem. SIAM Journal on Computing, 2002. It contains EXP3 and many extensions thereof, incl. EXP4.
This material is presented in various books and courses on online learning, e.g. here and there. Among these sources, my presentation was most influenced by one in this class, but I made the "reduction to Hedge" even more explicit.

Lecture #9 (Oct 31): (Semi-)bandits & experts with linear costs

Lecture notes: part I and part II.

[References]

[follow the perturbed leader] This material is found in various books and surveys, often in more generality (for convex rewards). I was mainly using the notes from this class, simplifying to uniform noise.
[background on linear & combinatorial bandits] see Section 5 in this survey.

Lecture #10 (Nov 7): contextual bandits

Lecture notes (for "practical issues", see "ML methodology" section of this paper).

[References]

[CB with policy sets]
- [reduction from CB to supervised ML] John Langford and Tong Zhang, The Epoch-Greedy Algorithm for Contextual Multi-armed Bandits, NIPS'07.
- [optimal regret and small #oracle calls] Alekh Agarwal and Daniel Hsu and Satyen Kale and John Langford and Lihong Li and Robert Schapire, "Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits", ICML'14
- [adversarial rewards] Vasilis Syrgkanis and Akshay Krishnamurthy and Robert E. Schapire, "Efficient Algorithms for Adversarial Contextual Learning", ICML'16.
- [practical system] "ML methodology" section of this paper.
[CB with linear rewards: algorithm LinUCB]
- Lihong Li and Wei Chu and John Langford and Robert E. Schapire,
  "A contextual-bandit approach to personalized news article recommendation", WWW'10
- Wei Chu and Lihong Li and Lev Reyzin and Robert E. Schapire,
  "Contextual Bandits with Linear Payoff Functions", AISTATS'11.
[CB with Lipschitz rewards]
- [uniform discretization] Elad Hazan and Nimrod Megiddo, "Online Learning with Prior Information", COLT'07.
- [adaptive discretization & application to bounded change over time] Alex Slivkins, "Contextual bandits with similarity information", JMLR'14 and COLT'11.

Lecture #11 (Nov 14): "Bandits with Knapsacks"

Lecture notes (the first half of the class was a discussion of HW1, which is not included).
NB: the discussion of the "fake costs" for BwK is a bit clearer (but less detailed) compared to what was presented in class.

[References] The main reference is Ashwinkumar Badanidiyuru, Robert Kleinberg and Aleksandrs Slivkins, Bandits with Knapsacks (FOCS'13, to appear in JACM). This paper also contains an exstensive discussion of motivations and related work, incl. the follow-up work. The UCB-based algorithm is from Shipra Agrawal and Nikhil Devanur, Bandits with concave rewards and convex knapsacks (EC'14).

Lecture #12 (Nov 21): Bandits and incentives

Lecture notes: my hand-written notes [some pages are upside down]
[to be replaced by LaTeX notes eventually]

[References] On incentive-commpatible bandit algorithms for recommendation systems, the main reference is: Yishay Mansour, Aleksandrs Slivkins and Vasilis Syrgkanis, Bayesian Incentive-Compatible Bandit Exploration (EC'16). Bandit mechanisms for pay-per-click ad auctions are from this paper (negative result) and that paper (positive result).

Lecture #13 (Nov 28): Bandits and games

Lecture notes: my hand-written notes [to be replaced by LaTeX lecture notes eventually]

[References] For ALG vs ALG', see lecture notes from this class. For ALG vs best-response, see Yoav Freund and Robert E. Schapire, Adaptive game playing using multiplicative weights,
Games and Economic Behavior (1999). Both sources prove the minimax theorem via no-regret algorithms. For a (very simple) coarse correlated equilibrium argument, see, e.g., Chapter 17.4.2 from this recent textbook by Tim Roughrarden.

Jan 2017: removed homeworks from open access (for now). Please email me at slivkins@microsoft.com to request.

Submission: Homeworks can be submitted either by email (if you are using LaTeX), or in class, preferably both. For email: please submit PDF, to slivkins@cs.umd.edu, with subject like "homework 1 submission". LaTeX preferred; handwritten OK, too, but please write legibly.

Homework 1: out Sep 29 (Thu), due Oct 17 (Mon) New due date: Oct 24 (Mon).
Oct 10: posted a clarification in the hint for problem 5.
Oct 10: in problem 2(a), changed sqrt{t} to sqrt{t/K}.
Oct 17: posted a number of clarifications.
Nov 14: Homework 1 with discussion.

Homework 2: out Nov 14 (Mon), due Nov 28 (Mon) [encouraged], Nov 30 (Wed) [definite]
Nov 22: fixed some minor issues pointed via Piazza, please re-print the HW if needed.
Dec 18: Homework 2 with discussion.

Sign-up: Please use this sign-up sheet (tab: "sign-up for projects") to help me keep track of the projects. Please update it as appropriate, including your names/emails, brief project description (or the current/tentative version thereof), and project status. Also, use this sheet to to find partners and to be aware of each other's topics and progress.

Deliverables & due dates

Discussion with me regarding the possible project topics, in person and/or via email.
See below for advice on choosing a topic and topic suggestions.
Project proposal: a short document explaining the topic and the proposed approach (due Oct 31; but the sooner the better).
Final report: a short (3-7 pp long) academic-style paper describing the background and what you've done for the project. (Soft deadline: Dec 5).
Final presentation: A short (~15 mins) presentation during the last classes (Dec 12).

You are encouraged to start the discussion early (and several of you have talked with me already).

Choosing a topic. The project topic can be anything directly related to bandits and/or experts. (If you are unsure whether a given topic is sufficiently related, please email/talk to me.) Generally, there are three types of projects:

Reading (literature review): read up on a given sub-topic. What is are the problem formulations and how they are motivated? What are the main results and the crucual techniques to obtain these results? What are the open questions? What are the papers in this line of work, and who did what?
Experiments: pick a sub-topic and several algorithms for this sub-topic, run experiments with these algorithms to see which ones are better (for which "regimes"). You can use simulated data and/or some of the real datasets.
Original research: pick an open problem and try to make a progress on it. Could be theoretical or experimental (or both), up to you. Such a project is likely to involve some amount of reading, to figure out what is known and what is likely to help. It is great if by the end of the class you make a substantial progress towards a future publication, but it is totally OK to only have preliminary results, or no results at all. (After all, good research is rarely done in a few weeks.)

These three types are not mutually exclusive: an original research may involve much reading; and reading- and experimets- type projects may be driven by a specific research question, and/or may help towards research projects in the future.

You are very encouraged to work in pairs: this is more productive, more fun and also a useful collaborative experience (and less projects for me). Triples are fine, too.

If you are working on a course-related research project with some collaborators who are not attending this course, you are encouraged to submit an account of this research project as the "class project". But please ask me first.

Project proposal. For a reading project, mention a few papers to start with. For simulations, mention a few algorithms, a few types of problem instances you are planning to consider (if doing simulations), and the data sources you are planning to use (if working with real data). For an open research project, describe the open problem and how you plan/hope to go about solving it. While a project may evolve and morph compared to the proposal, a good proposal usually helps to organize your thinking.

NB: It is OK if an original research project evolves into a reading and/or experiments project and vice versa. However, please keep me posted.

NB: Sometimes a bigh chunk of the overall work for the project happens before (and while) you write the project proposal. So, don't be alarmed if generating a project proposal takes much work, this probably means that you'd have less work to do later.

A page of text should usually suffice, possibly less. Please use LaTeX, and insert bibliography as appropriate. I will need a PDF, sent to slivkins@cs.umd.edu, with a subject like "[CMSC 858G] project proposal".

Topic suggestions. [I may modify or elaborate these suggestions over time... ]

[Reading: bandits and X] A reading project is a safer bet. One good "template" for a project topic is "bandits and X", where X is some other topic that you like. For example, X could be any one of the following: convex optimization, submodular functions, metric spaces and graphs, Gaussian processes, MDPs, supervised learning, stochastic packing problems, crowdsourcing markets, economic incentives, medical trials. Drop me a line if you are interested in any of the above (or something else), and I'll be happy to point you to some papers.
[IID bandits with initial observations: experiments, possibly also research]
In IID bandits, some observations of the arms may be available before the algorithm starts. What is the best way to incorporate this "initial info"? For example, one can intepret this info as a prior (and there are multiple ways to do this!), and run a Bayesian bandit algorithm (Thompson Sampling or Gittins Algorithm or smth else). Alternatively, one can run any "non-Bayesian" algorithm such as UCB or Successive Elimination and just add the initial samples to the history. And there may be other ways to do it, too.
["Optimism under uncertainty": reading and/or research]
The technique in UCB1 has been used in many bandit-like problems. Is it possible to formulate a general setting/algorithm/proof which would cover many of these applications? There are some partial results, but no unified inderstanding, as far as I am aware.
[A/B testing vs. Explore-First: reading]
A/B testing is a standard technique for randomized experiments. For IID rewards, it is (essentially) the same as Explore-First bandit algorithm. So, in which ways does A/B testing goes beyond Explore-First?
[Thompson Sampling: reading, possibly also research]
What is known about [the theory of] Thompson Sampling for bandit-like settings? Where does it fall short compared to other approaches? You may also try to extend the techniques from prior work to analyze Thompson Sampling for some setting(s) not covered in the prior work. (I'll give specific pointers if you are interested.)
[Contextual bandits in practice: experimental]
Use state-of-art contextual bandit algorithms on real data sets. In particular, learn to use contextual bandit algorithms in Vowpal Wabbit.
[Adversarial bandits: experimental, possibly reading and/or research]
Say one would like to run simulations with several algorithms for adversarial bandits, to see which of the algorithms works better. Which "regimes" (types of problem instances) should one consider? Ideally, we'd have a well-established "library" of problem instances, so that we could run a new algorithm against this library and quickly see where it does well and where it does not.
["semi-adversarial bandits": reading, possibly also experiments and/or research]
Between a "super-optimistic" model of IID bandits and "super-pessimistic" model of adversarial bandits, there is a spectrum of models in which the rewards can change over time, but in some limited way. One would like to design algorithms that start out not knowing how "nice" the problem instance is, perform well in the more pessimistic models, and much better if the problem instance is actually "nice". See my papers in COLT'12 and ICML'14 for an example.
[Dynamic pricing with limited supply, etc.: research]
- [slowly changing environments] Dynamic pricing with limited supply is (mainly) studied for IID environments, e.g. see my paper in EC'12. Extend it to slowly changing environment, e.g. as in my papers in COLT'08 and COLT'11.
- [discretization error] The most general results proceed by "discretizing" the price space and then running a K-armed bandit algorithm for limited supply. However, discretization error is well-understood only when there is (at most) one product to sell and no contexts. (See my papers in EC'12, FOCS'13 and COLT'14.) How to go beyond that?
- A trivial tweak of the basic UCB1 algorithm happens to "work" in a one specific setting with many globlal constraints, see this paper. Can same/similar tweak work in other settings? What are the limitations of this simple approach?
[Contextual bandits with partially observed contexts: reading and/or research]
What if in each round there is a "context", but it is only partially observed by the algorithm? For example: what if the context corresponds to a user profile, and the algorithm gradually learns more about each user over time. Another example: what if the algorithm receives "advice" from some "experts" that have access to some of the features of the context that are not observed by the algorithm.
["Free exploration" in a recommendation system: research, possibly reading.]
Bandit algorithms can help optimize recommendations in a recommendation system (think: Netflix for movies, Yelp for restaurants, etc.) However, some exploration happens "for free", simply because different people try out different things. How to combine this "free exploration" with exploration done by a bandit algorithm?

Final report. The final report should clearly and concisely describe the project: what was the chosen topic, why it is interesting, what you've set out to accomplish, what you've learned/understood/accomplished, and what it means in a broader context.

The report should resemble a research paper in terms of style and presentation. Try to make it accessible to students in this course: explain things clearly, and try not assume background beyond what was covered in class.

"Introduction" section should state what is the topic/problem, why it is interesting, background and [some] prior work with appropriate citations, what you've set out to accomplish, and (on a high level) what you've accomplished.
"Preliminaries" section is a place to introduce models and/or notation. Also, if you'll be using tools that are somewhat "advanced" for this course (e.g., martingales, linear duality, KL-divergence) mention them here, and perhaps explain briefly what it is and how you'd be using it.
"Body" of the paper presents all the details.

For new results: explain what the results mean, be as explicit as possible about why they are interesting given the prior work. Include enough detail: [for theory] precise theorem statements, and full proofs, or at least proof sketches that contain the main ideas, [for simulations/experiments] detailed experimental setup, algorithms used, and what was observed.

If you tried some approach and it didn't work out, write about it, too! Try to explain why it didn't work and what are the obstacles; present lessons / conclusions, should you have any.

For reading projects: what is the goal of your project? Clearly state and cite the papers you've looked at. Try to state the main results as precisely and explicitly as possible (but also try to be succinct, and do use informal descriptions if needed). Explain what these results mean. What are the main techniques to achieve them. Ideally, also what are the cool open problems in the area, and why prior work falls short in solving them.

Mapping out the relevant work is an imporant part of the project. In particular, when citing a line of work, cite several papers, not just one; ideally, the initial paper or two, and a few of the most important follow-ups. Cite surveys whenever applicable. DBLP is a good tool to track down correct references (e.g., do not cite an arxiv version if there is a conference publication), and also to get BibTeX entries.

Formalities. The document should be 3-5 pp long, not including fugures and references, in 11pt font with 1-inch margins and single spacing. Include full names and emails for all authors. LaTeX strongly preferred (but do talk to me if this is a problem). Then the desired formatting can be achieved by
\documentclass[11pt,letterpaper]{article}
\usepackage{fullpage}.
BibTeX is recommended for bibliography.

Submission. Submit PDF file by email to "slivkins@cs.umd.edu" with subj "final project report".

Project presentation. Each project will need to make a short presentation.

Aim for 10-15 mins (but let me know if you'd like to take longer).
Whiteboard or slides or none -- up to you.
For projects with multiple people, it is up to you to decide who presents; multiple presenters are OK, too.

Please prepare in advance -- this is your chance to tell the class about the exciting stuff you've done and/or learned about!

Some generic advice:

Aim to make your presentation as accessible as possible. Explain what is the topic/problem and why it is exciting. Transmitting the high-level bits and excitement is more important in a short talk than going into any details.
Try to distill crucial ideas and find simple high-level explanations for them. Include hard details if you can state them clearly and concisely, in a way accessible to the class. But if you know that this or that technical statement is not likely to be understood, then no need to say it.
It is totally OK to “sweep things under the rug” if it helps you transmit crucial points.
It is totally OK to say “this is what we’ve been trying to do, but we don’t know how to do it”.

Re slides, should you choose to make them:

slides are useful, among other things, to remind you what you are supposed to say!
strive to make the slides clear and non-crowded. Write the main points, but no need to write full sentences, or to put on the slide everything that you intend to say.
If you need to discuss a formula, it is often useful to put it on a slide and point to it.
Use large enough font size so that it is visible from the back row.
No need to make slides fancy. In particular, bulleted list with plain text is totally fine!

CMSC 858G (Fall 2016)

Advanced Topics in Theory of Computing: Bandits, Experts, and Games