Lecture 21: TreesCh.7.4 Recursion in Linguistic Structure: Trees, Tree Traversal | ||||||||||||||||||||||||||
Computational Representation of Syntactic TreesA tree is a set of connected labeled nodes. In linguistics, a sentence or phrase is typically represented as a syntactic tree; "Homer ate the donut" may be syntactically represented as:
How can we formally define a syntactic tree? It is important to note that a tree is recursive: a top-level tree is typically composed of multiple subtrees, which are also a tree themselves. (Question: How many subtrees constitute the whole tree above?) A smallest tree, then, looks like the following, which consists of the parent node and a list of child nodes. The individual trees correspond to a single context-free grammar rule: 'NP -> DET N' (left) and 'N -> donut' (right).
A tree can also be represented linearly as a text string. In a commonly adopted tree representation scheme below, the label immediately following the opening parenthesis is understood to be the parent node, and the rest its children until the tree is completed by the closing parenthesis. (NP DET N) (N donut) Below are complex trees receiving the same linear representation: (NP DET (N donut)) (NP (DET the) (N donut)) (S (NP (N Homer)) (VP (V ate) (NP (DET the) (N donut)))) For the sake of readability, large trees are often broken down across multiple lines with indentation indicating dominance relation: (S (NP (N Homer)) (VP (V ate) (NP (DET the) (N donut))))
The Tree Object in NLTK(Refer to these NLTK sections: Ch.7.4 Recursion in Linguistic Structure: Trees, Tree Traversal)NLTK has its own implementation of syntactic tree objects called nltk.Tree (same as nltk.tree.tree.Tree). Using its .fromstring() method, you can construct a Tree object by encapsulating a tree string representation. The resulting object is an NLTK Tree object, which is defined as a pair of (1) a single parent node 'NP' and (2) a list of child nodes ['DET', 'N'].
You can print the tree in the form of its string representation using the print command, and as ASCII-style tree drawing with .pretty_print(). In addition, you can call a handy applet .draw() that displays a graphical representation:
Constructing a more complex syntactic tree can be done in the same way. Note that each of the embedded local trees is formatted as a Tree object:
Tree objects can also be constructed directly by specifying the parent and the children as two arguments. The former is a single string ('NP'), and the latter is the list data type (['DET', 'N']):
Using this method, you can build up a tree object with smaller component trees:
Below, both tree-building methods (1. from a string representation using .fromstring(), 2. direct construction) are used.
NLTK's nltk.Tree is an object class, and it offers its own tree operation methods. Executing dir() on it displays the available methods:
|