Work on a Python script for this assignment. As always, though, your best bet is to switch back and forth between your script and the shell: run your script, try out follow-up commands in the shell, copy the code that works into your script, and repeat.
The questions in this assignment should be answered directly by your script: have the script print out the answers and/or include them as comments.
Document your script with appropriate comments. The script is going to get long, so comments will help keep your code organized and also make my grading easier.
You might not like the particular flavor of phrase structure grammar I used here. Well, trust me, I could (and would love to) give you a much bigger, more detailed, and overall better grammar! But do stick to this version for the purpose of this assignment: your rules and trees should exactly match the examples given.
In the trees and rules, use lower-case ('the', 'he') for all words, even at the beginning of a sentence. The only exceptions are the proper names ('Homer', 'Marge', etc.). This simplifies grammar development and parsing.
For the same reason, disregard punctuation and symbols for this assignment.
In this assignment, you will practice building tree objects.
Build the following three tree objects, named np, aux, and vp.
Using them, build two tree objects, named s1 and s2, for the following sentences. The trees should look exactly like the ones shown on this page.
(s1) Marge will make a ham sandwich
(s2) will Marge make a ham sandwich
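Here is a minimal sketch of how smaller trees combine into larger ones. It uses the s3 sentence "Homer ate the donut" instead of s1 and s2, so it does not give away the trees you are asked to build. The variable names are hypothetical. nltk.Tree(label, children) takes a label string and a list of children. Each child is either a word string or another Tree object.

```python
from nltk import Tree

# Illustration only: "Homer ate the donut", not one of the s1/s2 sentences.
# Each Tree(label, children) call builds one node; children are strings
# (words) or other Tree objects (phrases).
np_homer = Tree('NP', [Tree('N', ['Homer'])])
np_donut = Tree('NP', [Tree('DET', ['the']), Tree('N', ['donut'])])

# Larger phrases reuse the smaller Tree objects as their children.
vp_ate = Tree('VP', [Tree('V', ['ate']), np_donut])
s_example = Tree('S', [np_homer, vp_ate])

print(s_example)           # one-line bracketed form
s_example.pretty_print()   # ASCII drawing of the same tree
```

Reusing a Tree object does not copy it. s_example[1][1] is the very same object as np_donut, so changing np_donut in place would also change s_example. If two sentence trees share a phrase, np.copy(deep=True) gives you an independent copy when you need one.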
Build a tree object named s3 for the following sentence, using its full-sentence string representation.
(s3) Homer ate the donut on the table
First, build the tree string representation. It is easier to write it across multiple lines with proper indentation, using triple quotes ("""). Be careful with the closing brackets. Then, create the tree from the tree string.
>>> tree_str = """
(S (NP (N Homer))
(VP (V ate)
(NP (DET the)
(N donut))))
"""
>>> t = nltk.Tree.fromstring(tree_str)
>>> print(t)
(S (NP (N Homer)) (VP (V ate) (NP (DET the) (N donut))))
>>> t.pretty_print()
           S
   ________|___
  |            VP
  |     _______|___
  NP   |           NP
  |    |        ___|____
  N    V      DET       N
  |    |       |        |
Homer ate     the     donut
>>>
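The same steps can go into your script. This sketch parses the string from the session above and checks that the one-line form shown by print() parses back into an equal tree. It then indexes into the tree. A Tree behaves like a list of its children, so t[1] is the VP and t[1][1] is the object NP. The name one_line is only illustrative.

```python
from nltk import Tree

# Multi-line bracketed string; whitespace and line breaks are ignored when parsing.
tree_str = """
(S (NP (N Homer))
   (VP (V ate)
       (NP (DET the)
           (N donut))))
"""
t = Tree.fromstring(tree_str)

# str(t) is the one-line form that print(t) shows; parsing it again
# yields a tree equal to t (same labels, same children).
one_line = str(t)
print(Tree.fromstring(one_line) == t)   # True

# Indexing walks down the tree: t[1] is the VP, t[1][1] is its NP child.
print(t.label(), t[1].label(), t[1][1].label())   # S VP NP
print(t[1][1])                                    # (NP (DET the) (N donut))
print(t.leaves())                                 # ['Homer', 'ate', 'the', 'donut']
```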
Build tree objects named s4 and s5 for the following sentences.
(s4) my old cat died on Tuesday
(s5) children must play in the park with their friends
Once a tree is built, you can extract from it a list of context-free (CF) rules, generally called production rules, using the .productions() method. Each CF rule in the list is either lexical, i.e., it contains a lexical word on its right-hand side, or not:
>>> print(vp)
(VP (V ate) (NP (DET the) (N donut)))
>>> vp_rules = vp.productions()   # list of all CF rules used in the tree
>>> vp_rules
[VP -> V NP, V -> 'ate', NP -> DET N, DET -> 'the', N -> 'donut']
>>> vp_rules[0]
VP -> V NP
>>> vp_rules[1]
V -> 'ate'
>>> vp_rules[0].is_lexical()   # VP -> V NP is not a lexical rule
False
>>> vp_rules[1].is_lexical()   # V -> 'ate' is a lexical rule
True
Explore the CF rules of s5. Include in your script the answers to the following:
How many CF rules are used in s5?
How many unique CF rules are used in s5?
How many of them are lexical?
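Here is one way to compute such counts. It is sketched on a made-up tree for "the cat saw the dog" rather than on s5. The method relies on two facts. First, .productions() returns one rule per use, with duplicates included. Second, NLTK's Production objects are hashable, so set() keeps one copy of each. The sketch prints lexical counts both with and without duplicates.

```python
from nltk import Tree

# Illustrative tree, not s5: "the" and NP -> DET N each occur twice,
# so the total and unique counts differ.
t = Tree.fromstring(
    "(S (NP (DET the) (N cat)) (VP (V saw) (NP (DET the) (N dog))))"
)

rules = t.productions()                    # every rule use, duplicates included
unique_rules = set(rules)                  # one copy of each distinct rule
lexical = [r for r in rules if r.is_lexical()]
unique_lexical = {r for r in unique_rules if r.is_lexical()}

print(len(rules))            # 9
print(len(unique_rules))     # 7
print(len(lexical))          # 5
print(len(unique_lexical))   # 4
```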
NLTK's Penn Treebank corpus represents its syntactic trees in this same formalism. Load the corpus and explore. Hint: sentences 0, 1, 7, 9, 45, 96, etc. are shorter and more manageable.
>>> from nltk.corpus import treebank
>>> tb_psents = treebank.parsed_sents()
>>> tb_psents[45]
Tree('S', [Tree('NP-SBJ', [Tree('DT', ['The']), Tree('JJ', ['top']),
Tree('NN', ['money']), Tree('NNS', ['funds'])]), Tree('VP', [Tree('VBP',
['are']), Tree('ADVP-TMP', [Tree('RB', ['currently'])]), Tree('VP', [Tree('VBG',
['yielding']), Tree('NP', [Tree('QP', [Tree('RB', ['well']), Tree('IN', ['over']),
Tree('CD', ['9'])]), Tree('NN', ['%'])])])]), Tree('.', ['.'])])
>>> tb_psents[45].pprint()   # same thing as print(tree)
(S
  (NP-SBJ (DT The) (JJ top) (NN money) (NNS funds))
  (VP
    (VBP are)
    (ADVP-TMP (RB currently))
    (VP (VBG yielding) (NP (QP (RB well) (IN over) (CD 9)) (NN %))))
  (. .))
>>> tb_psents[45].pretty_print()   # easier to grasp, maybe?
                               S
       ________________________|_____________________________________
      |                                        VP                    |
      |                  ______________________|____                 |
      |                 |      |                    VP               |
      |                 |      |         ___________|____            |
      |                 |      |        |                NP          |
      |                 |      |        |            ____|_______    |
    NP-SBJ              |   ADVP-TMP    |           QP           |   |
  ____|____________     |      |        |       ____|________    |   |
 DT   JJ     NN   NNS  VBP     RB      VBG     RB   IN       CD  NN  .
 |    |      |     |    |      |        |      |    |        |   |   |
The  top   money funds are currently yielding well over      9   %   .
>>> tb_psents[45].draw() # A window pops up
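To explore without downloading the corpus first, you can parse a Penn Treebank-style string directly. This sketch assumes the bracketed .pprint() output of tb_psents[45] shown above. It lists the words, the (word, tag) pairs, the phrases whose label starts with NP (function tags such as -SBJ stay part of the label), and the rule counts. With the corpus loaded, the same calls work on tb_psents[45] itself.

```python
from nltk import Tree

# Bracketed string copied from the .pprint() output of tb_psents[45];
# parsing it directly avoids needing the corpus download.
ptb_str = """
(S
  (NP-SBJ (DT The) (JJ top) (NN money) (NNS funds))
  (VP
    (VBP are)
    (ADVP-TMP (RB currently))
    (VP (VBG yielding) (NP (QP (RB well) (IN over) (CD 9)) (NN %))))
  (. .))
"""
t = Tree.fromstring(ptb_str)

print(t.leaves())    # words, punctuation included
print(t.pos())       # (word, tag) pairs
print(t.height())    # 7

# Labels of all phrases whose label starts with 'NP', in top-down order.
np_labels = [st.label() for st in t.subtrees(lambda st: st.label().startswith('NP'))]
print(np_labels)     # ['NP-SBJ', 'NP']

# Total rule uses, and how many of them are lexical (one per word here).
rules = t.productions()
print(len(rules), sum(r.is_lexical() for r in rules))   # 19 12
```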
SUBMIT:
Your Python script and saved IDLE shell output (.txt file)
Alternatively, a Jupyter Notebook (.ipynb file) if you prefer. In-cell tree rendering will likely need additional configuration, such as installing Ghostscript.