Na-Rae Han's home page

LING 1901 Fundamentals of Text Processing for Linguists (Independent Study)

Spring 2014, University of Pittsburgh

Meetings: Wednesdays 4:30pm - 5:45pm, in CL 2818 (Linguistics conference room)
This is a one-credit course designed to quickly introduce linguists to the fundamentals of text processing. We will use Python, a popular programming language in the computational linguistics community, as the platform. Students will learn the basics of computer programming and apply them to process written language. By the end, they will be able to perform such tasks as implementing language-based games, processing a text into word and sentence lists, extracting word frequency counts, producing textual statistics, and more.
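
As a rough illustration of the kind of task described above (not taken from the course materials), the sketch below splits a short text into sentence and word lists and counts word frequencies. The function name `text_stats` and the sample sentence are made up for this example, and the sentence-splitting rule is deliberately naive.

```python
import re
from collections import Counter


def text_stats(text):
    """Return (sentences, words, word frequencies) for a plain-text string.

    Hypothetical helper for illustration only; real text needs more careful
    sentence splitting (abbreviations, quotes, and so on).
    """
    # Split after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    # Lowercase the text and keep runs of letters and apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    # Count how often each word occurs.
    freq = Counter(words)
    return sentences, words, freq


sample = "The cat sat on the mat. The dog sat too! Did the cat mind?"
sentences, words, freq = text_stats(sample)
print(len(sentences), "sentences,", len(words), "words")
print(freq.most_common(3))
```

Lowercasing before counting means "The" and "the" are treated as the same word, which is usually what a frequency count is after.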

Please note that this course is not intended as a general introduction to computer programming. For that, students are encouraged to follow up with CS 0008 "Introduction to Computer Programming with Python", offered by the CS department every semester. CS 0008 or this course will be the pre-requisite for LING 1330 "Introduction to Computational Linguistics", to be offered in Fall 2014.

Na-Rae Han           Office hours: Mon & Thu 2:30-4pm (at CL G17)         Pitt ID & Google ID: naraehan
Shameek Poddar (TA)     Office hours: Mon 11am-1pm & Tue 5-6pm (at CL 2832)           Pitt ID: shp54, Google ID: shameek.poddar
Minas Abovyan (TA)     Office hours: Mon 5-6pm & Tue 4-5pm (at CL 2832)       Pitt ID: mia36, Google ID: abminara

Course Organization:
Bring your laptop! The class will center around hands-on learning of Python.

Requirements, Grading and Policies
Grading will be based on programming assignments and attendance. For details on course requirements and other policies, please read the Course Policies page.

Class Schedule*
Week  Date  Topic                                                                     Notes                    Assignment Out (due Tuesday midnight)
1     1/8   Getting around in IDLE, print, assigning variables                        Lesson1.pdf              Exercise1
2     1/15  Python basic syntax, string operations, writing and running script files  Lesson2.pdf              Exercise2
3     1/22  Control flow, commenting, list-string conversion                          Lesson3.pdf              HW1
4     1/29  Type conversion, mutability, sequence indexing                            Lesson4.pdf              Exercise3
5     2/5   for & while loops                                                         Lesson5.pdf              Exercise4
6     2/12  Dictionary, user-defined function                                         Lesson6.pdf              HW2
7     2/19  How to use help, list and dictionary methods                              Lesson7.pdf              Exercise5
8     2/26  Sorting, file IO                                                          Lesson8.pdf              Exercise6
9     3/5   Modules, file & dir path, basic text stats                                Lesson9.pdf              HW3
            Spring break
10    3/19  List comprehension, text processing on the fly                            Lesson10.pdf             Exercise7
11    3/26  Pickling objects, working with formatted data                             Lesson11.pdf             Exercise8
12    4/2   Working with corpora                                                      Lesson12.pdf             HW4
13    4/9   Object-oriented programming (presented by Shameek), Unicode handling      Lesson13.pdf, Shameek's  -
14    4/16  Looking forward: NLTK (Natural Language Toolkit)                                                   -
15          Finals week: no exam
*Class schedule is subject to revision throughout the semester.

Assignment Schedule

  1. As a rule, there will always be some form of assignment between classes. There are two types: programming exercises and homework assignments. Both are due by Tuesday midnight and are submitted through CourseWeb.
  2. Programming Exercises: These are designed to help you practice what you learned in class. As long as you keep up with the course content, you should be able to complete each one well within an hour. They are worth 20 points each. Currently 9 are planned.
  3. Homework Assignments: These are longer assignments that involve writing a Python script. They are worth 40-50 points each. Currently 4 are planned.
Most of your final grade will be based on the assignments. For details, refer to the Course Policies page.