Natural Language Processing Graduate Certificate

In August of 2021, the Department of Linguistics launched the Graduate Certificate in Natural Language Processing (NLP). The certificate can be completed entirely online, in-person, or as a mix of both modalities. Find more information on the Graduate College's Certificate Catalog under "Natural Language Processing" or read through the NLP Certificate frequently asked questions.

The NLP Graduate Certificate consists of 9 units.

The following two courses are required:

  • LING 539: Statistical Natural Language Processing
  • LING 582: Advanced Statistical Natural Language Processing

The elective slot can be filled with any of the elective courses listed below.

For students with little background looking for a gentler introduction, we recommend the following sequence:

  1. LING 529: Human Language Technology I (see public Course Overview page for LING 529)
  2. LING 539: Statistical Natural Language Processing (see public Course Overview page for LING 539)
  3. LING 582: Advanced Statistical Natural Language Processing

Core Courses

LING 539: Statistical Natural Language Processing
This course introduces the key concepts underlying statistical natural language processing. Students will learn a variety of techniques for the computational modeling of natural language, including n-gram models, smoothing, hidden Markov models, Bayesian inference, expectation maximization, the Viterbi algorithm, the inside-outside algorithm for probabilistic context-free grammars, and higher-order language models. Graduate-level requirements include assignments of greater scope than undergraduate assignments. In addition to being more in depth, graduate assignments are typically longer and require additional readings.

LING 582: Advanced Statistical Natural Language Processing
This course focuses on statistical approaches to pattern classification and applications of natural language processing to real-world problems.

Elective Courses

LING 529: Human Language Technology I
This class serves as an introduction to human language technology (HLT), an emerging interdisciplinary field that encompasses most subdisciplines of linguistics, as well as computational linguistics, natural language processing, computer science, artificial intelligence, psychology, philosophy, mathematics, and statistics. Content includes a combination of theoretical and applied topics such as (but not limited to) tokenization across languages, n-grams, word representations, basic probability theory, introductory programming, and version control.

LING 531: Human Language Technology II
This intermediate-level course is a continuation of LING 529 and covers a combination of theoretical and applied topics such as (but not limited to) unsupervised learning (clustering), decision tree classifiers, and the basics of information retrieval.

LING 538: Computational Linguistics
Fundamentals of formal language theory; syntactic and semantic processing; the place of world knowledge in natural language processing. Graduate-level requirements include a greater number of assignments and a higher level of performance.

LING 581: Advanced Computational Linguistics
This course provides a hands-on project-based approach to particular problems and issues in computational linguistics.

LING 578: Speech Technology
Topics include speech synthesis, speech recognition, and other speech technologies. This course gives students background for a career in the speech technology industry. Graduate students will complete extra readings and assignments and give an extra presentation. Their final project must constitute original work in a speech technology.

LING 508: Computational Techniques for Linguists
Students are introduced to computer programming as it pertains to collecting and analyzing linguistic data. The particular programming language is chosen at the discretion of the instructor. Graduate-level requirements include more challenging exams; a 50% greater contribution to their respective group projects; 9 instead of 6 assignments; and additional readings from the primary literature.
NOTE: The version offered in the online program is closer to “advanced programming techniques for computational linguists.”

LING 696G: Topics in Computational Linguistics
The development and exchange of scholarly information, usually in a small group setting with an in-depth investigation of computational linguistics theory and application. The scope of work shall consist of research by course registrants, with the exchange of the results of such research through discussion, reports, and/or papers.
NOTE: The topic/focus may vary from offering to offering. The next online offering will be structured as a follow-up to the Speech Technology course (LING 578).

INFO 523: Data Mining and Discovery
This course will introduce students to the concepts and techniques of data mining for knowledge discovery. It includes methods developed in the fields of statistics, large-scale data analytics, machine learning, pattern recognition, database technology, and artificial intelligence for automatic or semi-automatic analysis of large quantities of data to extract previously unknown interesting patterns. Topics include understanding varieties of data, data preprocessing, classification, association and correlation rule analysis, cluster analysis, outlier detection, and data mining trends and research frontiers. We will use software packages for data mining, explaining the underlying algorithms and their use and limitations. The course includes laboratory exercises, with data mining case studies using data from many different sources such as social networks, linguistics, geo-spatial applications, marketing, and/or psychology.

INFO 536: Data Science and Public Interests
This course focuses on the use of modern data science methods to help learners make socially responsible decisions and mitigate harm that arises from issues like bias, discrimination, and threats to one’s personal privacy. More and more individuals need to make data-driven decisions in a wide variety of contexts, including non-governmental organizations, not-for-profit industries, human services, environmental organizations, refugee camps, and more. Students in this class will thus learn about data science and how it can be used in contexts where socially good decisions are desired and emphasized. This active learning class is designed for students who have an interest in the topic but who may have little to no previous experience with data science or programming.

CSC 583: Text Retrieval and Web Search
Most of the web data today consists of unstructured text. Of course, the fact that this data exists is irrelevant unless it is made available such that users can quickly find information that is relevant for their needs. This course will cover the fundamental knowledge necessary to build such search systems, including web crawling, index construction and compression, Boolean, vector-based, and probabilistic retrieval models, text classification and clustering, link analysis algorithms such as PageRank, learning to rank, and computational advertising. Students will also complete one programming project, in which they will construct one complex application that combines multiple algorithms into a system that solves real-world problems.

CSC 585: Algorithms for Natural Language Processing
This course covers important algorithms useful for natural language processing (NLP), including distributional similarity algorithms such as word embeddings, recurrent and recursive neural networks (NN), probabilistic graphical models useful for sequence prediction, and parsing algorithms such as shift-reduce. This course will focus on the algorithms that underlie NLP, rather than the application of NLP to various problem domains.

INFO 557: Neural Networks
Neural networks are a branch of machine learning that combines a large number of simple computational units to allow computers to learn from and generalize over complex patterns in data. Students in this course will learn how to train and optimize feed-forward, convolutional, and recurrent neural networks for tasks such as text classification, image recognition, and game playing.