Instructors: Dr. Jon Dehdari and Dr. Asad Sayeed
Class Location: (Former) CiP Room, building C7.2
Class Times: Lecture: Mondays 14:00-16:00 (c.t.); Lab: Wednesdays 16:00-18:00 (s.t.)
Class Dates: Oct. 31st - Feb. 15th
Jon's Offices: either room 1.15, building A2.2, or room 1.11, building D3.1
Asad's Office: room 3.04, building C7.4
Final exam: Feb. 22nd, 16:00 - 18:00, Conference Room of Building C7.4
Language Technologies I teaches the theoretical foundations of modern computational linguistics and natural language processing, including important machine learning techniques.
- Formal models of language: possibilities (homework)
- Statistical models of language: probabilities (homework)
- Applications of language models
- n-gram language models and smoothing (more info) (homework) (training data) (testing data) (example transcript) (Bad-Turing transcript) (lazy tokenization script)
- Parts of speech, word clusters, and class-based language models
- Log-linear models (homework)
- Word vectors and their applications (homework)
- Feedforward neural networks and autoencoders (homework)
- Recurrent neural networks and their language models (homework) (example transcript)
- Probabilistic context-free grammars, parsing, and syntactic language models (homework - small updates)
- Sequence-to-sequence models and neural machine translation
- Convolutional networks and character-based models of language
- Neural Network Software:
- Keras, an easy neural network library. You can install it by typing the following from the command line:
    pip3 install --user keras
- Free Corpora:
- WMT Data, both parallel and monolingual corpora
- ACL Wiki, "Resources by Language"
- OPUS - open parallel corpora
- Corpus Processing Tools: