Course Description: The future of Artificial Intelligence demands a paradigm shift towards multimodal perception, enabling systems to interpret and fuse information from diverse sensory inputs. While we humans perceive the world by looking, listening, touching, smelling, and tasting, traditional forms of machine intelligence have primarily focused on a single sensory modality, often vision. To truly understand the world around us, AI must learn to jointly interpret multimodal signals. This graduate-level seminar course explores computer vision from a multimodal perspective, focusing on learning algorithms that augment vision with other essential modalities, such as audio, touch, and language. The majority of the course will consist of student presentations, experiments, and paper discussions, in which we will delve into the latest research and advancements in multimodal perception.
Logistics
Instructor: Ruohan Gao (rhgao [at] umd.edu)
Office: IRB-4248
Office hours: by appointment (send email)
TA: Zihao Wei (zihaowei [at] umd.edu)
Office: IRB-3116
Office hours: by appointment (send email)
- Lectures: Tuesday 3:30PM - 6:00PM Eastern Time at CSI 2120.
- Piazza: We will be using Piazza as the primary platform for communication.
- Canvas: Submit your paper reviews on Canvas.
- Gradescope: Submit your coding assignments on Gradescope.
- Topic Preferences: Submit your topic preferences for presentation through this Google Form.
Course Requirements
Course Prerequisites: Familiarity with introductory courses in computer vision (CMSC426 or similar) and machine/deep learning (CMSC422 or similar) is recommended; the ability to understand and analyze conference papers in this area is required; programming experience with deep learning frameworks is needed for experiment presentations and projects. I strongly suggest scanning through a few papers and the topics on the syllabus to gauge what kind of background is expected. You don't have to know every single algorithm/tool/feature a given paper mentions, but you should feel comfortable following the key ideas. Please talk to me if you are unsure whether the course is a good match for your background.
Requirements Summary:
- Paper Reviews: writing two paper reviews for each paper presentation week and submitting them as a PDF on Canvas.
- Paper Presentation: presenting a cluster of papers on one shared topic with a partner, including one required paper read and reviewed by the entire class and four other related papers. The presentation must include an experiment component that shows new results or analyses for at least one of the papers.
- Paper Discussion: leading the discussion for one presentation session, challenging the presenters and facilitating discussion among the entire class.
- Coding Assignments: two warmup coding assignments during the first half of the course.
- Midterm Exam: a take-home midterm exam that contains questions based on readings and lectures, as well as a mock peer review task.
- Final Project: completing a research-oriented final project in groups of up to 3 students.
Grading Summary:
- 20% Presentation (paper presentation + discussion)
- 20% Assignments (paper reviews + two coding assignments)
- 30% Midterm Exam
- 30% Final Project (including project proposal, extended abstract, final report, and presentation)
- Up to 5% Extra Credit on Class Participation (attendance, paper discussions, debate, exceptional presentation, etc.)
Important Dates
- Monday of each paper presentation week: paper reviews for that week are due at 8pm ET.
- Tuesday of the week before your presentation: send a draft of your presentation slides to the instructor by email, due at 11:59pm ET.
- Friday, Feb 6: paper presentation topic preference due.
- Monday, Feb 9: coding assignment 1 released.
- Monday, Feb 23: coding assignment 1 due at 11:59pm ET.
- Tuesday, Feb 24: coding assignment 2 released.
- Friday, March 13: coding assignment 2 due at 11:59pm ET.
- Monday, March 23: one-page project proposal due at 11:59pm ET.
- Friday, April 24: four-page extended abstract for peer review due at 11:59pm ET.
- Tuesday, April 28: take-home midterm exam released.
- Tuesday, May 5: final project presentation.
- Friday, May 15: final project report due at 11:59pm ET.
Schedule
- Marked in Green: required papers (2 or 3 each week). The entire class should read them and choose two to review.
- Marked in Blue: optional papers (4 accompanying papers for each required paper on a given topic). Paper presenters are required to read and present these as well; they are optional reading for the rest of the class.
Policy
Academic Integrity: Note that academic dishonesty includes not only cheating, fabrication, and plagiarism, but also helping other students commit acts of academic dishonesty by allowing them to obtain copies of your work. In short, all submitted work must be your own. Cases of academic dishonesty will be pursued to the fullest extent possible as stipulated by the Office of Student Conduct. It is very important for you to be aware of the consequences of cheating, fabrication, facilitation, and plagiarism. For more information on the Code of Academic Integrity or the Student Honor Council, please visit University of Maryland Code of Academic Integrity and Computer Science Department Academic Integrity Information.
Excused Absences: Any student who needs to be excused for an absence from a single lecture, recitation, or lab due to a medically necessitated absence shall make a reasonable attempt to inform the instructor of their illness prior to the class. Upon returning to the class, they should present their instructor with a self-signed note attesting to the date of their illness. Each note must contain an acknowledgment by the student that the information provided is true and correct. Providing false information to University officials is prohibited under Part 9(i) of the Code of Student Conduct (V-1.00(B) University of Maryland Code of Student Conduct) and may result in disciplinary action. For further details, please see University of Maryland Policy on Excused Absence.
Other Accommodations and Policies: