This is a one-credit course designed to quickly introduce linguists to the fundamentals of text processing. We will use Python, a popular programming language in the computational linguistics community, as the platform. Students will learn the basics of computer programming and apply them to process written language. By the end, they will be able to perform such tasks as implementing language-based games, processing a text into word and sentence lists, extracting word frequency counts, producing textual statistics, and more.
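As a rough illustration of the kind of task described above, here is a minimal sketch that splits a short text into sentence and word lists and counts word frequencies. The sample text, the variable names, and the naive regular-expression splitting are all assumptions made for this example; they are not taken from the course materials and would not handle abbreviations, hyphenation, or other real-world text reliably.

```python
import re
from collections import Counter

# Hypothetical sample text, invented for this illustration.
text = "The cat sat. The dog sat too! Did the cat see the dog?"

# Naive sentence split: break on whitespace that follows ., ! or ?
sentences = re.split(r"(?<=[.!?])\s+", text.strip())

# Naive word list: lowercase the text, then keep runs of letters/apostrophes.
words = re.findall(r"[a-z']+", text.lower())

# Word frequency counts: maps each word type to its number of occurrences.
freq = Counter(words)

print(sentences)
print(len(words), "tokens;", len(freq), "types")
print(freq.most_common(3))
```

Here `len(words)` gives the token count and `len(freq)` the number of distinct word types, two of the simplest textual statistics.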
Please note that this course is not intended as a general introduction to computer programming. For that, students are encouraged to follow up with CS 0008 "Introduction to Computer Programming with Python", offered by the CS department every semester. CS 0008 or this course will be the pre-requisite for LING 1330 "Introduction to Computational Linguistics", to be offered in Fall 2014.
Na-Rae Han Office hours: Mon & Thu 2:30-4pm (at CL G17) Pitt ID & Google ID: naraehan
Shameek Poddar (TA) Office hours: Mon 11am-1pm & Tue 5-6pm (at CL 2832) Pitt ID: shp54, Google ID: shameek.poddar
Minas Abovyan (TA) Office hours: Mon 5-6pm & Tue 4-5pm (at CL 2832) Pitt ID: mia36, Google ID: abminara
Bring your laptop! The class will center around hands-on learning of Python.
Requirements, Grading and Policies
Grading will be based on programming assignments and attendance. For details on course requirements and other policies, please read the Course Policies page.
Topics (assignments are due Tuesday midnight):
- Getting around in IDLE, print, assigning variables
- Python basic syntax, string operations, writing and running script files
- Control flow, commenting, list-string conversion
- Type conversion, mutability, sequence indexing
- for & while loops
- Dictionary, user-defined function
- How to use help, list and dictionary methods
- Sorting, file IO
- Modules, file & dir path, basic text stats
- List comprehension, text processing on the fly
- Pickling objects, working with formatted data
- Working with corpora
- Object-oriented programming (presented by Shameek), Unicode handling
- Looking forward: NLTK (Natural Language Toolkit)
- Finals week: no exam
*Class schedule is subject to revision throughout the semester.
Most of your final grade will be based on the assignments. For details, refer to the Course Policies page.
- As a rule, there will be an assignment between every two classes. There are two types: programming exercises and homework assignments. Both are due Tuesday midnight and are submitted through CourseWeb.
- Programming Exercises: These are designed to help you practice what you learned in class. As long as you are keeping up with the course contents, you should be able to complete each one well within an hour. They are 20 points each. Currently 9 are planned.
- Homework Assignments: These are longer assignments that involve writing a Python script. They are 40-50 points each. Currently 4 are planned.