Homework Assignment #3

Two inaugural speeches: Washington vs. Obama
George Washington wrote the first of his two inaugural speeches in 1789. Barack Obama wrote his in 2009. How do the intervening 220 years affect their language? We will explore these two historic texts in this homework.

Your job is to complete two Python scripts: (A) textproc.py, and (B) HW3.inaugural.TEMPLATE.py. The former is a Python module that includes many essential functions for text processing, for example getToks() for tokenization. The latter is your main script, and it calls various functions in the textproc.py module to process the two inaugural speech text files. Additionally, you will be answering some questions about the two speeches: write up your answers in a separate document (.txt, MS Word, or .pdf file).

Download the two speech text files (Mac users: make sure to download them as the source text files) and the two script files, and save them in the same directory. Then follow the instructions below to complete the assignment.

Part A: Complete textproc.py and Learn the Functions

The goal here is two-fold: (1) complete the module, and (2) familiarize yourself with the workings of the various functions so you can comfortably use them in Part B.
  1. Complete the getRelFreq() function, marked with [1].
  2. Try out the main() function by making the edits marked with [2] and running the script. You should be getting this shell output.
  3. Learn how the individual functions work by examining how they are called in the main() function. You may also experiment with the functions immediately following the execution of textproc.py.
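
As a point of reference for step 1, a word's relative frequency is its count divided by the total number of tokens. The sketch below is illustrative only: the name rel_freq, its argument, and its return type are assumptions made for this example, and the actual getRelFreq() in textproc.py may expect different arguments (for instance, a precomputed frequency dictionary). Follow the docstring and comments in the module itself.

```python
from collections import Counter

def rel_freq(tokens):
    """Map each token to count / total tokens.

    Hypothetical helper for illustration; not the getRelFreq() from textproc.py.
    """
    total = len(tokens)
    if total == 0:
        return {}  # avoid dividing by zero on an empty text
    counts = Counter(tokens)
    return {word: count / total for word, count in counts.items()}

# Toy token list standing in for a tokenized speech
toks = ["we", "the", "people", "the"]
freqs = rel_freq(toks)
print(freqs["the"])  # 2 occurrences out of 4 tokens -> 0.5
```

Relative frequencies over a whole text sum to 1, which makes a quick sanity check for your own implementation.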

Part B: Complete HW3.inaugural.TEMPLATE.py and Answer Questions

Now you are ready to explore the two speeches and address some linguistically motivated questions. You will find the answers through completing HW3.inaugural.TEMPLATE.py, while using the functions in textproc.py. Have your script calculate and write out the relevant data points, and then write down in a separate document the answers and your analysis.

  • Question 1: Initial Impressions
    First, take a quick look through the two speeches (Washington, Obama) and form an impression. Are there any differences that are immediately noticeable?

  • Question 2: Text Length
    Whose speech is longer: Washington's or Obama's? How long are the speeches?

  • Question 3: Vocabulary Diversity
    Whose speech has more diverse vocabulary? Vocabulary diversity can be represented by TTR (Type-Token Ratio), the number of distinct word types divided by the total number of tokens. Have your script write out both the type count and the TTR.

  • Question 4: Sentence Length
    Who uses longer sentences -- Washington or Obama? Have your script write out both the sentence count and the average sentence length.

  • Question 5: Word Length
    Who uses longer words -- Washington or Obama? What are their average word lengths? Exclude symbols when calculating these numbers.

  • Question 6: Top 20 Washington Words
    What are the top 20 most frequent words in Washington's speech? Have your script write them out in descending order of frequency, one word per line: the word, its frequency count, and its relative frequency, separated by tabs.

  • Question 7: Top 20 Obama Words
    Do the same for Obama's speech. How does this list compare with the one from Washington's?

  • Question 8: Frequent Words Found in One Speech Only
    What are the frequent words that are found in only one speech? That is: what are the frequent words in Washington's speech that do not occur in Obama's at all, and vice versa? Have your script print out the top 10 words in each case, followed by their Washington count and their Obama count, separated by tabs.

  • Question 9: Top 20 Favored by Washington
    Of those words used by both, which were used by Washington much more than by Obama? Give the top 20 such words along with the degree of preference (calculated as the difference in their relative frequencies in the two texts). What observations can you make?

  • Question 10: Top 20 Favored by Obama
    Do the same for Obama's speech. Anything noteworthy?

  • Question 11: Your Own Question
    Anything else you want to investigate? Pick your own question and find the answer.

  • Question 12: Comparison Summary
    In your own words, summarize the findings from Q1-Q11. Feel free to interpret the results and provide your own insights.
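
The measures named in Questions 3, 5, 8, and 9 can be sketched on toy data as follows. This is a rough illustration under stated assumptions, not the expected solution: every function name here is hypothetical, none of them comes from textproc.py, and the toy token lists stand in for whatever your tokenizer returns. In your own script, prefer the functions the module already provides.

```python
from collections import Counter

def type_token_ratio(tokens):
    """Distinct tokens (types) divided by total tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def avg_word_length(tokens):
    """Mean length of tokens containing at least one letter or digit,
    so pure-symbol tokens such as '.' or ',' are excluded."""
    words = [t for t in tokens if any(ch.isalnum() for ch in t)]
    return sum(len(w) for w in words) / len(words) if words else 0.0

def only_in_first(counts_a, counts_b, n=10):
    """Top-n words by count in counts_a that never occur in counts_b."""
    unique = Counter({w: c for w, c in counts_a.items() if w not in counts_b})
    return unique.most_common(n)

def preference(counts_a, counts_b, n=20):
    """Shared words ranked by (relative freq in A) - (relative freq in B)."""
    total_a, total_b = sum(counts_a.values()), sum(counts_b.values())
    diffs = {w: counts_a[w] / total_a - counts_b[w] / total_b
             for w in counts_a.keys() & counts_b.keys()}
    return sorted(diffs.items(), key=lambda kv: kv[1], reverse=True)[:n]

# Toy token lists standing in for the two tokenized speeches
toks_a = ["the", "people", "of", "the", "union", "."]
toks_b = ["we", "the", "people", ",", "we", "hope"]
counts_a, counts_b = Counter(toks_a), Counter(toks_b)

print(f"TTR\t{type_token_ratio(toks_a):.3f}")
print(f"Avg word length\t{avg_word_length(toks_a):.2f}")
for word, count in only_in_first(counts_a, counts_b):
    # word, count in text A, count in text B, tab-separated
    print(f"{word}\t{count}\t{counts_b[word]}")
for word, diff in preference(counts_a, counts_b):
    print(f"{word}\t{diff:.4f}")
```

Swapping the two arguments of only_in_first() and preference() gives the corresponding lists for the other speaker. Whether punctuation counts as a token, and whether words are lowercased first, depends on the tokenizer, and both choices affect every number above.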

When you are done, upload the four files:

  • textproc.py
  • HW3.inaugural.YOUR-LAST-NAME.py
  • Your output file HW3.inaugural.OUT.txt
  • A document (.txt, .docx, or .pdf) containing your answers