CMSC 828L Deep Learning




Professor David Jacobs, AV Williams 4421

Email: djacobs-at-cs


TAs:     Justin Terry  Email: justinkterry-at-gmail

             Chen Zhu  Email: chenzhu-at-cs


Office Hours:

                    Monday, 5-6.  Justin.

                    Tuesday, 3-4.  David.

                    Wednesday, 10-11.  David.

                    Wednesday, 5-6. Justin.

                    Thursday, 4-6.  Chen.

                   Location: TA office hours will be in 4101 or 4103 AV Williams, depending on availability (check both rooms).  Prof. Jacobs's office hours will be in 4421 AV Williams.





The following two books are available online.

Deep Learning, by Ian Goodfellow, Yoshua Bengio, and Aaron Courville

Neural Networks and Deep Learning, by Michael Nielsen


Other reading material appears in the schedule below.




Students registered for this class must complete the following assignments:


Presentation: Students will form groups of three.  Each group will prepare a 30-minute presentation on a topic of its choice.  The group will select two papers and present a summary and critical analysis of the material in these papers, along with any other appropriate background or related material.  Students should video record their presentation and submit a link to the video.  Presentations will be graded on choice of topic (is the material interesting?), clarity of presentation (do we understand the key points?), focus (does the presentation highlight the most important parts of the work, rather than uniformly summarizing everything?), and analysis (does the presentation help us understand the strengths and limitations of the presented work?).  The six leading presentations will be selected for live presentation to the full class.

Problem Sets: There will be three problem sets assigned during the course.  These will include programming projects and may also include written exercises.

Midterm: There will be a one week, take-home midterm.  This will include paper and pencil exercises.

Final Exam: There will be an in-class final exam.


Course Policies






Course work, late policies, and grading

Homework and the take-home midterm are due at the start of class. Problems may be turned in late, with a penalty of 10% for each day they are late, but may not be turned in after the start of the next class after they are due. For example, if a problem set is due at the start of class on Tuesday, it may be turned in before Wednesday at 12:30pm with a 10% penalty, or before Thursday at 12:30pm with a 20% penalty; nothing will be accepted after Thursday at 12:30pm.
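In code, the late rule above is a linear 10%-per-day penalty with a two-day cutoff; a minimal sketch (the function name is illustrative, not an official grading tool):

```python
def late_score(raw_score, days_late):
    """Apply the course late policy: 10% off per day late, and no credit
    for work turned in more than two days after the deadline."""
    if days_late <= 0:
        return raw_score
    if days_late > 2:
        return 0.0
    return raw_score * (1 - 0.10 * days_late)

print(late_score(100, 1))   # one day late: 90% credit
```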

Some homeworks and the exams may have a special challenge problem. Points from the challenge problems are extra credit. This means that I do not consider these points until after the final course grade cutoffs have been set. Students participating in class discussion or asking good questions may also receive extra credit.

Each problem set and the presentation will count for 10% of the final grade.  The midterm will count for 20%, and the final will count for 40%.
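As a sanity check, the weights above (three problem sets and the presentation at 10% each, the midterm at 20%, and the final at 40%) sum to 100%. A minimal sketch of the weighted average (component names and scores are illustrative, not official identifiers):

```python
# Grade weights from the syllabus; keys are illustrative shorthand.
weights = {"ps1": 0.10, "ps2": 0.10, "ps3": 0.10,
           "presentation": 0.10, "midterm": 0.20, "final": 0.40}

def final_grade(scores):
    """Weighted average of component scores on a 0-100 scale."""
    return sum(weights[k] * scores[k] for k in weights)

print(final_grade({"ps1": 90, "ps2": 80, "ps3": 70,
                   "presentation": 100, "midterm": 85, "final": 90}))  # ~87
```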



Academic Honesty

All class work is to be done independently. You are allowed to discuss class material, homework problems, and general solution strategies with your classmates. When it comes to formulating/writing/programming solutions you must work alone. If you make use of other sources in coming up with your answers you must cite these sources clearly (papers or books in the literature, friends or classmates, information downloaded from the web, whatever).

It is best to try to solve problems on your own, since problem solving is an important component of the course. But I will not deduct points if you make use of outside help, provided that you cite your sources clearly. Representing other people's work as your own, however, is plagiarism and is in violation of university policies. Instances of academic dishonesty will be dealt with harshly, and usually result in a hearing in front of a student honor council, and a grade of XF. (Note, this and other course policies are taken from those of Prof. David Mount).


Excused Absences

Any student who needs to be excused for an absence from a single lecture, recitation, or lab due to a medically necessitated absence shall: a) make a reasonable attempt to inform the instructor of his/her illness prior to the class; and b) upon returning to class, present the instructor with a self-signed note attesting to the date of the illness. Each note must contain an acknowledgment by the student that the information provided is true and correct. Providing false information to University officials is prohibited under Part 9(h) of the Code of Student Conduct (V-1.00(B) University of Maryland Code of Student Conduct) and may result in disciplinary action. Self-documentation may not be used for the Major Scheduled Grading Events defined below, and it may be used for only one class meeting during the semester.

Any student who needs to be excused for a prolonged absence (two or more consecutive class meetings), or for a Major Scheduled Grading Event, must provide written documentation of the illness from the Health Center or from an outside health care provider. This documentation must verify the dates of treatment and indicate the timeframe during which the student was unable to meet academic responsibilities. In addition, it must contain the name and phone number of the medical service provider, to be used if verification is needed. No diagnostic information will ever be requested.

The Major Scheduled Grading Events for this course are: the final exam, as given in the University schedule.

Academic Accommodations

Any student eligible for and requesting reasonable academic accommodations due to a disability should provide the instructor, in office hours, with a letter of accommodation from the Office of Disability Support Services (DSS) within the first two weeks of the semester.






Problem Set 1



Problem Set 2



Problem Set 3











Tentative Schedule








Class 1





Class 2


Intro to Machine Learning


Deep Learning, Chapter 5

Class 3


Intro to Machine Learning: Linear models (SVMs and Perceptrons, logistic regression)


For logistic regression, see this chapter from Cosma Shalizi

Class 4


Intro to Neural Nets: What a network computes.


Deep Learning, Chapter 6


Neural Networks and Deep Learning, Chapter 2

Class 5


Training a network: loss functions, backpropagation.


A Tutorial on Energy-Based Learning, by LeCun et al.


Neural Networks and Deep Learning, Chapter 3
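As a concrete companion to the readings for this class, here is a minimal NumPy sketch of backpropagation for a one-hidden-layer network with sigmoid units and squared-error loss (the architecture, seed, and step size are illustrative choices, not from the course):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(3)          # a single input
y = np.array([1.0])                 # its target
W1, W2 = rng.standard_normal((4, 3)), rng.standard_normal((1, 4))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_and_grads(W1, W2):
    # Forward pass
    h = sigmoid(W1 @ x)             # hidden activations
    out = sigmoid(W2 @ h)           # network output
    loss = 0.5 * np.sum((out - y) ** 2)
    # Backward pass: chain rule, layer by layer
    d_out = (out - y) * out * (1 - out)      # dL/d(output pre-activation)
    gW2 = np.outer(d_out, h)
    d_h = (W2.T @ d_out) * h * (1 - h)       # error propagated to hidden layer
    gW1 = np.outer(d_h, x)
    return loss, gW1, gW2

# A few steps of gradient descent should reduce the loss.
loss0, *_ = loss_and_grads(W1, W2)
for _ in range(100):
    _, gW1, gW2 = loss_and_grads(W1, W2)
    W1 -= 0.5 * gW1
    W2 -= 0.5 * gW2
loss1, *_ = loss_and_grads(W1, W2)
print(loss0, "->", loss1)
```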

Class 6


Neural networks as universal function approximators


Approximation by superpositions of a sigmoidal function, by George Cybenko (1989). 


Multilayer feedforward networks are universal approximators, by Kurt Hornik, Maxwell Stinchcombe, and Halbert White (1989)


Neural Networks and Deep Learning, Chapter 4


Class 7


Convolution and Fourier Transforms


Convolution and Fourier Transforms
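A small NumPy sketch of the convolution theorem, the key link between the two topics of this class: linear convolution of two signals equals the inverse Fourier transform of the product of their zero-padded transforms (signal lengths here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
f = rng.standard_normal(16)                 # a signal
g = rng.standard_normal(5)                  # a filter

direct = np.convolve(f, g)                  # full linear convolution, length 20
n = len(f) + len(g) - 1                     # pad to avoid circular wrap-around
via_fft = np.fft.irfft(np.fft.rfft(f, n) * np.fft.rfft(g, n), n)

print(np.max(np.abs(direct - via_fft)))     # agreement up to float error
```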

Class 8


CNNs, cont'd

Stochastic Gradient Descent, batch normalization, Siamese networks, early stopping, transfer learning, brief history of neural networks.

Deep Learning, Chapter 7


Deep Learning, Chapter 9

Class 9


Implementation of deep learning.  Deep learning frameworks and the software stack, hyperparameter optimization, hardware acceleration, debugging.



Class 10


Implementation of deep learning, cont'd


Class 11


Deeper networks.  The vanishing gradient, skip connections, resnet.


Very Deep Convolutional Networks for Large-Scale Image Recognition, by Simonyan and Zisserman


Deep Residual Learning for Image Recognition by He et al.


Residual Networks are Exponential Ensembles of Relatively Shallow Networks by Veit et al.


Densely Connected Convolutional Networks, by Huang et al.


Also of interest:


Neural Networks and Deep Learning Chapter 5


On the Difficulty of Training Recurrent Neural Networks by Pascanu et al.
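A tiny sketch of the intuition behind skip connections discussed in the readings above: for a residual layer y = x + F(x), the Jacobian is I + dF/dx, so gradients flow through the identity path even when F's own Jacobian is small. The small linear layer below is an illustrative stand-in for F:

```python
import numpy as np

rng = np.random.default_rng(0)
W = 0.01 * rng.standard_normal((4, 4))      # a "weak" layer: tiny weights

def plain(x):      # plain layer: gradients scale with W and can vanish
    return W @ x

def residual(x):   # residual layer: the identity path preserves the gradient
    return x + W @ x

# Both layers are linear, so their Jacobians are exact:
J_plain, J_res = W, np.eye(4) + W
print(np.linalg.norm(J_plain), np.linalg.norm(J_res))  # tiny vs. near-identity
```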

Class 12


Optimization.  Convex vs. non-convex functions, convergence of GD and SGD, the Adam optimizer, initialization, leaky ReLU, momentum, changing step sizes.


Deep Learning, Chapter 8
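A minimal sketch of the Adam update rule on a toy quadratic objective f(w) = ||w||²/2, whose gradient is simply w; the hyperparameters are the commonly cited defaults, and the objective and step count are illustrative:

```python
import numpy as np

lr, b1, b2, eps = 0.1, 0.9, 0.999, 1e-8
w = np.array([5.0, -3.0])
m = np.zeros_like(w)    # first-moment (mean of gradients) estimate
v = np.zeros_like(w)    # second-moment (uncentered variance) estimate

for t in range(1, 201):
    g = w                                # gradient of 0.5 * ||w||^2
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g**2
    m_hat = m / (1 - b1**t)              # bias correction for the warm-up
    v_hat = v / (1 - b2**t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)

print(np.linalg.norm(w))                 # near the minimum at the origin
```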

Class 13


Convergence in deep networks.  Minima that do/don't generalize.  Broad vs. narrow minima.  GD vs. SGD.  The loss landscape.

Understanding deep learning requires rethinking generalization, by Zhang et al.


Visualizing the loss landscape of neural nets, by Li et al.


VC dimension and Rademacher complexity are discussed in many places, e.g., these notes.


On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima, by Keskar et al.


Class 14


Dimensionality reduction: linear methods (PCA, LDA), manifolds, and random projections.

PCA (slides from Olga Veksler)


LDA (slides from Olga Veksler)


An elementary proof of the Johnson-Lindenstrauss Lemma, by Dasgupta and Gupta 
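A short NumPy sketch of PCA via the SVD, matching the slides above: center the data, take the top right-singular vectors as the principal directions, and read off the variance explained. The synthetic data (points near a 2-D plane in 5-D) is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
basis = rng.standard_normal((2, 5))                     # a random 2-D plane in 5-D
X = rng.standard_normal((200, 2)) @ basis \
    + 0.01 * rng.standard_normal((200, 5))              # points on it, plus noise

Xc = X - X.mean(axis=0)                  # center the data
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = (S**2) / np.sum(S**2)        # variance fraction per component
Z = Xc @ Vt[:2].T                        # coordinates in the top-2 subspace

print(explained[:2].sum())               # ~1.0: two components capture the data
```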

Class 15


Low-dimensional embedding, metric learning

Efficient Estimation of Word Representations in Vector Space by Mikolov et al.


FaceNet: A Unified Embedding for Face Recognition and Clustering, by Schroff et al.


Metric Learning: A Survey, by Brian Kulis

Class 16


Adversarial attacks


Threat of Adversarial Attacks on Deep Learning in Computer Vision: A Survey, by Akhtar and Mian


Intriguing Properties of Neural Networks, by Szegedy et al.


Explaining and Harnessing Adversarial Examples by Goodfellow et al.


A Boundary Tilting Perspective on the Phenomenon of Adversarial Examples, by Tanay and Griffin


Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks, by Shafahi et al.
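A minimal sketch of the fast gradient sign method from the Goodfellow et al. paper above, applied to a fixed logistic-regression model rather than a trained deep network (the weights, input, and step size eps are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
w, b = rng.standard_normal(10), 0.0      # a fixed linear classifier
x = rng.standard_normal(10)              # the input to attack

def prob(x):                             # P(label = 1) under the logistic model
    return 1.0 / (1.0 + np.exp(-(w @ x + b)))

y = 1.0 if prob(x) > 0.5 else 0.0        # treat the current prediction as truth
# For logistic loss, the gradient w.r.t. the input is (p - y) * w.
grad_x = (prob(x) - y) * w
x_adv = x + 0.25 * np.sign(grad_x)       # one signed step of size eps = 0.25

print(prob(x), prob(x_adv))              # confidence in the label y drops
```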

Class 17


AI Safety and the future of AI


Class 18


Autoencoders, variational autoencoders, and dimensionality reduction in networks


Deep Learning, Chapter 14


Tutorial on Variational Autoencoders, by Carl Doersch

Class 19


Generative models, GANs.



Generative Adversarial Networks by Goodfellow et al.


Towards Principled Methods for Training Generative Adversarial Networks, by Arjovsky and Bottou


Wasserstein GAN by Arjovsky et al.



Class 20


Go over midterm.


Image-to-image translation

Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks by Zhu et al.


Class 21


Reinforcement learning

Reinforcement Learning, An Introduction by Sutton and Barto


Understanding Chapters 3 and 6 is important, but reading 4 and 5 will probably help with 6.  Chapter 1 is fun and quick to read.
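As a companion to Chapter 6, a minimal sketch of tabular Q-learning on a 5-state chain where only the rightmost state pays reward. The behavior policy is uniformly random, which still works because Q-learning is off-policy (the environment and constants are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n_states = 5
Q = np.zeros((n_states, 2))               # Q[state, action]; 0 = left, 1 = right
alpha, gamma = 0.5, 0.9

for _ in range(500):                      # episodes
    s = 0
    while s != n_states - 1:              # rightmost state is terminal
        a = int(rng.integers(2))          # explore uniformly at random
        s2 = min(max(s + (1 if a == 1 else -1), 0), n_states - 1)
        r = 1.0 if s2 == n_states - 1 else 0.0
        # One-step Q-learning update: bootstrap off the greedy value of s2.
        Q[s, a] += alpha * (r + gamma * np.max(Q[s2]) - Q[s, a])
        s = s2

print(np.argmax(Q[:-1], axis=1))          # greedy policy: move right everywhere
```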

Class 22


Deep reinforcement learning

Reinforcement Learning, An Introduction by Sutton and Barto


Deep Learning Sections 16.1, 16.5, 16.6

Class 23


Why are deep networks better than shallow?

On the Number of Linear Regions of Deep Neural Networks, by Montufar, Pascanu, Cho, and Bengio (NIPS 2014)

The Power of Depth for Feedforward Neural Networks, by Ronen Eldan and Ohad Shamir (COLT 2016)


Benefits of Depth in Neural Networks, by Matus Telgarsky

Class 24


Catching up on previous topics.

Class 25


Recurrent neural nets.

Deep Learning, Chapter 10, especially from the beginning through 10.2, and Section 10.10

Class 26


Student presentations

Visual Question Answering -- Ishita, Pranav and Shlok


Bayesian Deep Learning -- Sam and Susmija

Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering, by Agrawal et al.



Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning, by Gal and Ghahramani


Class 27





Class 28


Student presentations

Mansi, Sahil, and Saumya – Capsule Networks


Kamal, Sneha, and Uttaran – Graph Convolutional Networks

Dynamic Routing Between Capsules, by Sabour, Frosst, and Hinton


Spectral Networks and Locally Connected Networks on Graphs, by Bruna, Zaremba, Szlam, and LeCun


Class 29


Student presentations

Samuel, Alex and Alex -- Memory Augmented Neural Networks and Meta-Learning


Abhishek, Nirat, Snehesh, Chahat – Depth, Pose, and Flow from Images.

One-Shot Learning with Memory-Augmented Neural Networks, by Santoro et al.



GeoNet: Unsupervised Learning of Dense Depth, Optical Flow and Camera Pose, by Yin and Shi