
Homework Assignment 7: N-gram Tagger

This homework assignment contains two parts.
They are based on a couple of sections of Ch.5: 5-4 Automatic Tagging and 5-5 N-Gram Tagging.

PART 1: Building a Bigram Tagger [35 points]


In this part, the goal is to build a bigram tagger and test it out. We will use the Brown Corpus and its native tagset. Follow the steps below.

STEP 1: Prepare data sets

There are a total of 57,340 POS-tagged sentences in the Brown Corpus. Among them, assign the first 50,000 to your list of training sentences. Then, assign the remaining sentences to your list of testing sentences. The first of your testing sentences should look like this:
 
>>> br_test[0]
[('I', 'PPSS'), ('was', 'BEDZ'), ('loaded', 'VBN'), ('with', 'IN'), ('suds', 'NNS'), 
('when', 'WRB'), ('I', 'PPSS'), ('ran', 'VBD'), ('away', 'RB'), (',', ','), ('and', 'CC'), 
('I', 'PPSS'), ("haven't", 'HV*'), ('had', 'HVN'), ('a', 'AT'), ('chance', 'NN'), 
('to', 'TO'), ('wash', 'VB'), ('it', 'PPO'), ('off', 'RP'), ('.', '.')]
>>> 
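
If you are not sure how to set up the split, here is a minimal sketch; it assumes the standard NLTK Brown corpus reader, and the names br_train and br_test are simply the ones used in the examples on this page:

>>> import nltk
>>> from nltk.corpus import brown
>>> br_tagged = brown.tagged_sents()       # all 57,340 POS-tagged sentences
>>> br_train = br_tagged[:50000]           # first 50,000 sentences for training
>>> br_test = br_tagged[50000:]            # remaining 7,340 sentences for testing
>>> len(br_train), len(br_test)
(50000, 7340)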

STEP 2: Build a bigram tagger

Following the steps shown in the chapter, build a bigram tagger with two back-off models. The base of the back-off stack should be a default tagger that assigns 'NN' to every word.
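
As a reminder of the pattern from the chapter, the back-off chain can be set up along these lines (a sketch only; the variable names t0, t1, t2 are arbitrary):

>>> t0 = nltk.DefaultTagger('NN')                    # last resort: tag everything as 'NN'
>>> t1 = nltk.UnigramTagger(br_train, backoff=t0)    # most frequent tag for each word
>>> t2 = nltk.BigramTagger(br_train, backoff=t1)     # conditions on the previous word's tag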

STEP 3: Evaluate

Evaluate your bigram tagger on the test sentences using .accuracy(). (Note: .evaluate() is the outdated method.) You should get an accuracy score of 0.911. If not, something went wrong: go back and re-build your tagger.
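
Assuming the bigram tagger is bound to t2 as in the sketch above, evaluating it is a single call:

>>> t2.accuracy(br_test)       # should come out at about 0.911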

STEP 4: Explore

Now, explore your tagger to answer the questions below.
  1. How big are your training data and testing data? Answer in terms of the total number of words in each.
  2. What is the performance of each of the two back-off taggers? How much improvement did you get: (1) going from the default tagger to the unigram tagger, and (2) going from the unigram tagger to the bigram tagger?
  3. Recall that 'cold' is ambiguous between JJ 'adjective' and NN 'singular noun'. Let's explore the word in the training data. The problem with the training data, though, is that it is a list of tagged sentences, and it's difficult to get at the tagged words, which are one level below:
     
    >>> br_train[0]
    [('The', 'AT'), ('Fulton', 'NP-TL'), ('County', 'NN-TL'), ('Grand', 'JJ-TL'),
    ('Jury', 'NN-TL'), ('said', 'VBD'), ('Friday', 'NR'), ('an', 'AT'), ('investigation', 
    'NN'), ('of', 'IN'), ("Atlanta's", 'NP$'), ('recent', 'JJ'), ('primary', 'NN'), 
    ('election', 'NN'), ('produced', 'VBD'), ('``', '``'), ('no', 'AT'), ('evidence', 
    'NN'), ("''", "''"), ('that', 'CS'), ('any', 'DTI'), ('irregularities', 'NNS'), 
    ('took', 'VBD'), ('place', 'NN'), ('.', '.')]
    >>> br_train[1277]         # 1278th sentence
    [('``', '``'), ('I', 'PPSS'), ('told', 'VBD'), ('him', 'PPO'), ('who', 'WPS'), 
    ('I', 'PPSS'), ('was', 'BEDZ'), ('and', 'CC'), ('he', 'PPS'), ('was', 'BEDZ'), 
    ('quite', 'QL'), ('cold', 'JJ'), ('.', '.')]
    >>> br_train[1277][11]     # 1278th sentence, 12th word
    ('cold', 'JJ')
    >>> 
    
    To compile tagged-word-level statistics, we will need a flat list of tagged words, not organized into sentences. And let's lowercase all the words while we're at it, so we don't have to deal with 'Cold' and 'cold' as separate cases. How to do this? You can use a double-loop list comprehension to construct the list while applying .lower():
     
    >>> br_train_flat = [(word.lower(), tag) for sent in br_train for (word, tag) in sent]
                      # [x for innerlist in outerlist for x in innerlist]
    >>> br_train_flat[:40]
    [('the', 'AT'), ('fulton', 'NP-TL'), ('county', 'NN-TL'), ('grand', 'JJ-TL'), 
    ('jury', 'NN-TL'), ('said', 'VBD'), ('friday', 'NR'), ('an', 'AT'), ('investigation', 
    'NN'), ('of', 'IN'), ("atlanta's", 'NP$'), ('recent', 'JJ'), ('primary', 'NN'), 
    ('election', 'NN'), ('produced', 'VBD'), ('``', '``'), ('no', 'AT'), ('evidence', 
    'NN'), ("''", "''"), ('that', 'CS'), ('any', 'DTI'), ('irregularities', 'NNS'), 
    ('took', 'VBD'), ('place', 'NN'), ('.', '.'), ('the', 'AT'), ('jury', 'NN'), 
    ('further', 'RBR'), ('said', 'VBD'), ('in', 'IN'), ('term-end', 'NN'), ('presentments', 
    'NNS'), ('that', 'CS'), ('the', 'AT'), ('city', 'NN-TL'), ('executive', 'JJ-TL'), 
    ('committee', 'NN-TL'), (',', ','), ('which', 'WDT'), ('had', 'HVD')]
    >>> br_train_flat[13]       # 14th word
    ('election', 'NN')
    >>> 
    
    Now, exploring this list of (word, POS) pairs from the training data, answer the questions below. (One way to compile these counts is shown in the sketch that follows this question list.)
    1. Which is the more likely tag for 'cold' overall?
    2. When the POS tag of the preceding word (call it POSn-1) is AT, what is the likelihood of 'cold' being a noun? How about it being an adjective?
    3. When POSn-1 is JJ, what is the likelihood of 'cold' being a noun? How about it being an adjective?
    4. Can you find any POSn-1 that favors NN over JJ for the following word 'cold'?
  4. Based on what you found, how is your bigram tagger expected to tag 'cold' in the following sentences?
    1. I was very cold.
    2. I had a cold.
    3. I had a severe cold.
    4. January was a cold month.
  5. Verify your prediction by having the tagger actually tag the four sentences (one way to tag a raw sentence is shown in the sketch after this list). What did you find?
  6. Have the tagger tag the following sentences, all of which contain the word 'so':
    1. I failed to do so.
    2. I was happy, but so was my enemy.
    3. So, how was the exam?
    4. The students came in early so they can get good seats.
    5. She failed the exam, so she must take it again.
    6. That was so incredible.
    7. Wow, so incredible.
  7. Examine the tagger's performance on the sentences, focusing on the word 'so'. For each of them, decide if the tagger's output is correct, and explain how the tagger determined the POS tag.
  8. Based on what you have observed so far, offer a critique on the bigram tagger. What are its strengths and what are its limitations?
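
For questions 3 through 7 above, the sketch below may be a useful starting point. It assumes br_train_flat and the bigram tagger t2 from the earlier steps; the names cold_fd, cold_cfd, and sent are made up for illustration. The first half counts the tags of 'cold', overall and conditioned on the preceding word's POS tag; the second half shows how to run the tagger on a raw sentence.

>>> # overall tag counts for 'cold' in the flattened training data
>>> cold_fd = nltk.FreqDist(tag for (word, tag) in br_train_flat if word == 'cold')
>>> cold_fd.most_common()
>>> # tag counts for 'cold', conditioned on the POS tag of the preceding word
>>> cold_cfd = nltk.ConditionalFreqDist((ptag, tag) for ((pw, ptag), (w, tag)) in nltk.bigrams(br_train_flat) if w == 'cold')
>>> cold_cfd['AT'].most_common()       # e.g., counts when the preceding tag is AT
>>> # tagging a raw sentence: tokenize it first, then call .tag()
>>> sent = nltk.word_tokenize("I was very cold.")
>>> t2.tag(sent)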


PART 2: Building a Better Tagger [15 points]

There are multiple ways to design a more complex tagger with better performance: the book sections illustrate at least two obvious ways to achieve this. In this part, your task is to improve the bigram tagger we built in PART 1. Make sure to use the same training and testing data you used above, and do not overwrite the original bigram tagger, because you will need it for comparison. First implement your new version of the tagger, test it to make sure its performance has indeed improved, and then answer the following questions. (One possible direction is sketched after the questions.)
  1. Explain what you did to improve your tagger's performance.
  2. How much performance gain were you able to achieve? Was it as significant as you hoped?
  3. Make up a sentence on which your new POS tagger produces a better result. Explain why the new tagger is more successful with this particular example.
  4. Find a sentence from your test data that shows an improved tagging result by your new POS tagger. Explain how your new tagger was more successful in handling it.
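
For reference, one common direction (using the same machinery as in PART 1) is to add another n-gram layer on top of the existing back-off chain. A minimal sketch, assuming the bigram tagger from PART 1 is still bound to t2; your own improvement does not have to take this form:

>>> t3 = nltk.TrigramTagger(br_train, backoff=t2)    # backs off to the bigram tagger
>>> t3.accuracy(br_test)                             # compare against the bigram tagger's score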


SUBMIT

Two format choices:
  • MS Word answer sheet: HW7 n-gram tagger.docx plus your saved IDLE session file "HW7_shell.txt".
  • OR, you may submit a Jupyter Notebook file (.ipynb) if you're comfortable with the format.