| W | Date | Topic | Reference | Assignment (due next class) |
|---|------|-------|-----------|-----------------------------|
| 1 | 8/29 (T) | Intro to CL, setup, string processing [Lecture1.pdf] | Python 3 Notes | Exercise 1: Python refresher quiz |
| | 8/31 (Th) | Encoding systems, Python structural programming [Lecture2.pdf, palindrome] | L&C ch.1 Encoding language | Exercise 2: Python quiz, Pig Latin |
| 2 | 9/5 (T) | Encoding systems & Unicode; text processing with NLTK [Lecture3.pdf] | NLTK ch.1, ch.2, ch.3 | Exercise 3: Processing O. Henry |
| | 9/7 (Th) | Unicode, spell checking fundamentals: edit distance, more NLTK [Lecture4.pdf] | L&C ch.2 Writers' aids: spell checkers | HW 1: Spell checker, text processing |
| 3 | 9/12 (T) | N-gram context, n-gram frequency, data resources on web, list comprehension [Lecture5.pdf] | | Exercise 4: Austen vs. ENABLE |
| | 9/14 (Th) | Conditional probability, n-gram frequency, conditional frequency distribution [Lecture6.pdf, CFD + ENABLE practice shell txt/PDF] | NLTK ch.2, ch.3 | HW 2: Bigram Speak |
| 4 | 9/19 (T) | N-gram language models, web resources [Lecture7.pdf, process Norvig's unigram data txt/PDF] | J&M ed.3 ch.3 N-gram language models | Exercise 5: Big-data n-gram stats |
| | 9/21 (Th) | N-gram resources, NLTK's corpus tools, corpus linguistics [Lecture8.pdf] | | HW 3: BU/JA EFL writing (week-long) |
| 5 | 9/26 (T) | Type, token, TTR; Zipf's law, frequency distribution, n-grams | | - |
| | 9/28 (Th) | HW 3 review, classifier intro | NLTK ch.6: 6.1.3 Learning to classify text | Exercise 6 |
| 6 | 10/3 (T) | Naive Bayes classifier | L&C ch.5 Classifying documents | HW 4 (week-long) |
| | 10/5 (Th) | Bayes theorem, evaluation metrics | NLTK 6.5 Naive Bayes classifiers, 6.3 Evaluation | - |
| 7 | 10/10 (T) | HW 4 review | | - |
| | 10/12 (Th) | Midterm exam | | |
| 8 | 10/17 (T) | Regular expressions | NLTK 3.4 Regular expressions; J&M ed.3 ch.2 Regular expressions | Exercise 7 |
| | 10/19 (Th) | RE in Python, FSA | L&C ch.4 FSA | HW 5 |
| 9 | 10/24 (T) | FSA, morphology, FST | J&M ed.2 (older edition!) ch.3 Words and Transducers; Hulden (2011) Morphological analysis with FSTs | Exercise 8 |
| | 10/26 (Th) | FST and Foma | | HW 6 |
| 10 | 10/31 (T) | FST review, POS tags | L&C ch.3 Language tutoring systems, 3.4 Tokenization, POS tagging; NLTK ch.5 Categorizing and tagging words | Exercise 9 |
| | 11/2 (Th) | POS tagging | NLTK 5.5 N-gram tagging; J&M ed.3 ch.8 POS tagging | HW 7 |
| 11 | 11/7 (T) | N-gram tagger review, HMM tagger, trees, CFG | L&C ch.3 Beyond words; J&M ed.3 ch.12 Constituency grammars; NLTK 7.4.2 Trees | Exercise 10 |
| | 11/9 (Th) | Parsing, CFG, treebanks | NLTK ch.8 Analyzing sentence structure | HW 8 |
| 12 | 11/14 (T) | Probabilistic CFG, dependency grammar; computational semantics: WordNet | J&M ed.3 ch.14 Dependency parsing; J&M ed.3 ch.18 Word sense and WordNet; NLTK 2.5 WordNet | Exercise 11 |
| | 11/16 (Th) | Computational semantics: formal, semantic roles | NLTK ch.10 Analyzing the meaning of sentences; J&M ed.3 ch.19 Semantic role labeling | HW 9 (due 11/28) |
| | | Thanksgiving break (whole week) | | |
| 13 | 11/28 (T) | Vector semantics, word embeddings | J&M ed.3 ch.6 Vector semantics and embeddings | - |
| | 11/30 (Th) | Machine translation | L&C ch.7 Machine translation; Eisenstein (2019) ch.18 MT, draft copy; J&M ed.3 Appendix B The noisy channel model, PPT slides | HW 10 |
| 14 | 12/5 (T) | MT wrap-up, formal language theory | Eisenstein (2019) ch.9 Formal language theory, draft copy | - |
| | 12/7 (Th) | Formal language theory | Partee et al. (1993) ch.16 | - |
| 15 | TBA | Final exam | | |

*Class schedule is subject to revision throughout the semester.