Natural Language Processing for Arabic and its Dialects: Challenges and Approaches (and Why Theoretical Linguists Should Care)

When

3 – 4 p.m., Sept. 27, 2024

Where

Owen Rambow, Department of Linguistics and IACS, Stony Brook University
owen.rambow@stonybrook.edu

Arabic is a challenge for natural language processing (NLP) for at least two reasons: (1) it has rich morphology, and (2) it shows dialectal variation that affects many levels of linguistic analysis, including phonology and morphology. In this talk, I will review work in Arabic NLP that I have been involved in. In the first part, which relates to the richness of the morphology, I will talk about the tasks of morphological analysis (determining all possible morphological analyses for a word) and morphological tagging (determining the correct analysis in context). This part will be based on Modern Standard Arabic (MSA). In the second part, I will address the question of how we can build NLP resources in the presence of profound dialectal variation, given that most Arabic dialects are under-studied and under-resourced. Specifically, I will present recent work on learning morphophonological rules from small data sets. Such rules can be used to create morphological analyzers and taggers for dialects.

The work presented is based heavily on the contributions of my collaborators: Nizar Habash and Mona Diab for the work on morphological analysis and tagging for MSA, and Salam Khalifa for the rule-learning work.
 
Password: LingCo1