Multilingual Natural Language Processing
CMSC828I Advanced Topics in Information Processing
LING848 Seminar in Computational Linguistics


About this class

Statistical Natural Language Processing provides representations and algorithms that make text useful for computer applications. This seminar course will address the challenge of scaling these approaches to the rich diversity of human languages. There are more than 7000 living languages in the world today, but the majority of techniques and resources are developed for English.

The course will introduce key ideas and techniques that make it possible to port existing resources into other languages, and to learn representations of language from multilingual data. Along the way, we will see a wide range of natural language processing tasks (from automatic analysis of syntax to semantics and discourse), as well as machine learning techniques (including supervised and unsupervised learning). By the end of the semester, students should be well prepared to read and understand current research papers on these topics.

This course is aimed at graduate students interested in automatic processing of language from various angles: e.g., students interested in Natural Language Processing who would like to learn how to address the challenges of multilinguality, Linguistics students who would like to learn how to use their insights on language to inform language processing systems, or students with interests in Machine Learning who would like to learn more about applications in the field of natural language. Previous coursework in these areas is recommended, but you are encouraged to contact the instructor regardless of your background if this sounds interesting to you!


  • Instructor: Marine Carpuat
  • Course meeting times: Tuesday/Thursday 2:00-3:15PM. AVW 3258.
  • Office hours: Thursday 11:00AM-12:00PM
  • Piazza page for discussion and announcements: https://piazza.com/university_of_maryland_college_park/spring2015/cmsc828i/home
  • Prerequisite: CMSC/LING723 or permission of instructor.


  • Why Multilingual NLP?
  • Machine learning of cross-lingual mappings: introduction to alignment models for document structure, sentences, words and constituents
  • Porting resources and models across languages
  • Learning representations using cross-lingual supervision
  • Learning from multilingual text beyond translations (e.g., comparable corpora, code-switched and mixed language documents)

    View schedule and readings


    This course will involve a substantial amount of reading and discussion, as well as a project to put ideas into practice. Accordingly, grading will be based on:

  • Project (50%).
  • Participation (25%), based on attendance, in-class presentations and discussions, and paper reading notes posted on Piazza.
  • Homework (15%) and final exam (10%).

    This course is a PhD/MS qualifying course in AI.

    Submitting homework

    See homework questions and hand in your work here.

    Reading notes

    When reading notes are required, you should post the following on the Piazza discussion board for each of the readings: a short summary (2-3 sentences) of what the paper is about, and a short comment (2-3 sentences) for discussion. The comment can include, e.g., questions, suggestions for doing something differently, ideas or hypotheses to test based on the methods or data presented in the paper, or any other paper-related point that you would like to talk about.

    Reading notes will account for 15% of the total grade, with 1% per note marked as a "good note" on Piazza.

    Language-in-10 Presentation

    Each student will prepare a 10-minute presentation on a language they do not speak natively. The slides must cover (1) language facts (demographics, location, etc.), (2) important linguistic characteristics (orthography, morphology, syntax), and (3) computational efforts such as resources, tools, and papers. Be creative and have fun with this! Asking for help from native speakers or language experts is fine, but the student is ultimately responsible for the presentation. You can find examples, inspiration and resources in Nizar Habash's machine translation course at Columbia and on the Langscape site developed by the UMD Language Science Center.

    The Language-in-10 Presentation will account for 5% of the total grade.


    You will have the opportunity to define the project that you want to tackle, together with the instructor. The goal of the project is to dive deeper into a topic of your choosing, and to put ideas from the course into practice by designing, implementing, and reporting on experiments. Project work will include presentations and write-ups, building toward a final report modeled after short papers published at conferences such as ACL or EMNLP.


    Syllabus can change

    This syllabus is subject to change. Students will be notified in advance of important changes that could affect grading, assignments, etc.

    Attendance and absences

    Students are expected to come to class. Unexcused absences will be taken into account in your participation grade. If you have to be absent for foreseeable reasons (e.g., religious observance, conference travel), please let the instructor know within the first two weeks of class, preferably by email. If you will be absent for unforeseeable reasons (e.g., illness), please email the instructor as soon as possible. Prolonged absence or illness preventing class attendance requires written documentation from the Health Center and/or a health care provider verifying the dates of treatment when the student was unable to meet academic responsibilities.


    Assignments will not be accepted late. We will post notes on Piazza when assignments have been graded. If you handed something in and do not get a score for an assignment, or if you would like it to be regraded, please email the instructor within one week.

    Academic integrity

    The student-administered Honor Code and Honor Pledge prohibit students from cheating on exams, plagiarizing papers, submitting the same paper for credit in two courses without authorization, buying papers, submitting fraudulent documents and forging signatures. Allegations of academic dishonesty will be reported directly to the Student Honor Council: http://www.shc.umd.edu.

    Students with disabilities

    The University of Maryland is committed to providing appropriate accommodations for students with disabilities. Students with a documented disability should inform the instructors within the add-drop period if academic accommodations are needed. To obtain an Accommodation Letter prepared by Disability Support Service (DSS), a division of the University Counseling Center, please call 301-314-7682, e-mail dissup@umd.edu, or visit the Shoemaker Building for more information.

    Course evaluations

    Course evaluations are a part of the process by which the University of Maryland seeks to improve teaching and learning. Your participation in this official system is critical to the success of the process, and all information submitted to CourseEvalUM is confidential.
