Let's build a Finite-State Transducer lexicon for English. Mans Hulden's foma site ("Morphological analysis tutorial" under Documentation) gives you everything you need to get started; you will follow along with it first, and then introduce some additional enhancements.
PART 1: The basics [20 points]
As the first step, follow the tutorial on Hulden's page. Complete the first three sections, which has you stop right before the section called "Grammar Tweaking".
BUG ALERT! In the third section, the "english.foma" file reads in the lexc file twice, as shown in this screenshot. You can keep either portion, but make sure there is a ";" after "Lexicon" (that is, the line should read "define Lexicon;").
One note about the flookup utility. Windows folks, I've got bad news for you: it won't work within cmd, Windows' default console. It will work in cmd's beefed-up variant, PowerShell. You can ignore flookup for now, since it is not essential for this homework. (Cmd and PowerShell are both fairly limiting. At some point, you might want to cross over to the wonderful world of bash via Git Bash.)
When you're done, print out your FST through the "print dot" command. Then, in your console, convert the file into an image file called "english-fst-graph-part1.png". Examine the file and the FST network. You will be submitting this file later.
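The export step can be sketched as follows. This is a minimal example, assuming your script is named "english.foma" and that Graphviz is installed for the conversion; the intermediate file name "english.dot" is just a placeholder of my choosing.

```foma
# Inside foma: compile the grammar, then write the top network in dot format.
source english.foma
print dot > english.dot

# Then, back in your console (not inside foma), convert it with Graphviz:
#   dot -Tpng english.dot -o english-fst-graph-part1.png
```

The last line is commented out because it is a console command, not a foma command.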
PART 2: Enhancements [30 points]
It's time to expand the grammar and introduce enhancements. By now, you have two script files: "english.lexc" and "english.foma". Continue to modify them.
A. Adding new lexical items
We currently have some nouns and verbs, but we could use a whole lot more. Add the following into your lexicon. At every step of the way, make sure your FST is producing the correct surface forms. Adjust your rules if necessary.
- Add three new nouns into your lexicon: panda, bus, and library.
- Add three new verbs into your lexicon: walk, debate, and study.
- Add this new verb: stop.
- Add these new verbs: nod and fold. Make sure you don't get *noded or *foldded.
One last note. This point should be obvious, but do NOT hard-code fully suffixed words such as 'studied', 'finer', or 'drinkable' into LEXC. They should start out as 'study', 'fine', and 'drink', and then go through a proper affixation process.
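One way to check your surface forms after each change is to recompile and query the network directly in foma. The commands below are a sketch; the specific lookups are only examples, and no expected output is shown because it depends on how far along your rules are.

```foma
# Recompile the grammar from your scripts.
source english.foma

# Generation: lexical form in, surface form(s) out.
apply down bus+N+Pl
apply down study+V+Past
apply down nod+V+Past
apply down fold+V+Past

# Analysis: surface form in, lexical form(s) out.
apply up stopped
```

If a lookup prints ???, the network has no path for that input, which usually points to a missing lexicon entry or tag.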
B. Exceptions, parallel forms
That make+V+PastPart mapping to *maked has been an absolute bother. Let's take care of irregular forms and exceptions here. We will also introduce some parallel forms.
- Follow the fourth section, "Grammar tweaking: adding exceptions (irregular forms)", using the 2nd option with Priority Union.
- You finally have made instead of *maked.
- While at it, add these irregular verbs: write and drink.
- Add these irregular plurals too: mouse/mice, child/children.
- Follow the fifth section, "Grammar tweaking: adding exceptions (parallel forms)".
- You now have both cactuses and cacti as legitimate plural forms.
- While at it, add octopuses/octopi for octopus.
- What's the past/past participle form of sneak? Some say snuck, and others sneaked. Add both.
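The difference between the two techniques can be sketched in foma as below. This is only an illustration using the tutorial's own make and cactus examples: it assumes Grammar is already defined as in the tutorial, and the names Exceptions and ParallelForms are placeholders. The tutorial's exact wording may differ.

```foma
# Assumes Grammar has already been defined as in the tutorial
# (Lexicon composed with the spelling rules and Cleanup).

# Exceptions: the irregular form replaces the regular one.
define Exceptions [m a k e "+V" "+PastPart" .x. m a d e] |
                  [m a k e "+V" "+Past"     .x. m a d e] ;

# Parallel forms: the extra form is added alongside the regular one.
define ParallelForms [c a c t u s "+N" "+Pl" .x. c a c t i] ;

# .P. is priority union: wherever Exceptions has an entry for a
# lexical form, that entry overrides what Grammar would produce.
# Plain union (|) keeps both mappings.
regex [Exceptions .P. Grammar] | ParallelForms ;
```

Priority union is what removes *maked, while plain union is what lets cactuses and cacti coexist. For a verb like sneak, where both forms are acceptable, plain union is the one to reach for.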
C. New part-of-speech: adjective
Next, let's branch out -- we will add adjectives as a new part-of-speech. We want the comparative and superlative forms too, so we will need three additional multi-character symbols: +ADJ (adjective POS), +Comp (comparative), +Supl (superlative). Under this scheme, the underlying form of slowest will be slow+ADJ+Supl.
- Add adjectives slow and cool.
- Add adjective fine. Avoid bad forms *fineer and *fineest.
- Now, add adjectives beautiful and excellent.
- Note that these forms do not take further inflectional suffixes: *beautifulest. That means you can't treat them like slow, cool, and fine. Make sure to set them up with an appropriate continuation class.
- On the other hand, don't try to have the FST produce "more beautiful", "most beautiful" etc. The system only models individual words; it should rightfully lack the comparative and superlative versions of the two adjectives altogether. Looking down beautiful+ADJ+Supl should therefore simply fail with ???.
D. Derivational morphology
So far, we have only modeled inflectional morphology. Let's add a couple of derivational morphemes: -ly which turns an adjective into an adverb, and -able which turns a verb into an adjective.
First, model slowly, whose underlying form is slow^ly+ADV. Note that the derivational suffix -ly is visible on the upper side too as ^ly. This is different from inflectional suffixes: in making, the suffix -ing is only represented as tags in the underlying form make+V+PresPart.
- You will need a new multi-character symbol +ADV.
- To have -ly present in both upper and lower levels, you will need something like:
^ly CONTINUATION-CLASS;
Note that there is only one arc label and no ":", meaning ^ly represents both upper and lower labels. The ^ marker, which indicates the morpheme boundary, is present on both sides. The continuation class then takes care of supplying the +ADV tag on the upper side:
+ADV:0 #;
- Make sure you get beautifully and excellently as well as slowly, coolly and finely.
Next, model watchable, whose underlying form is watch^able+ADJ.
- Treat all verbs as capable of -able suffixation. This means we will also get some dubious words such as foxable and beggable; let's just tolerate them for this homework.
- As with -ly, you will need the suffix to be present on both levels; use ^able:^able or simply ^able, which is the same thing.
- Notice that watchable as an adjective behaves like beautiful in that it does not undergo comparative and superlative suffixation. That means it should share the same continuation class as that of beautiful.
- Make sure you get correct forms: no *stopable, *makeable, *tryable.
- Just like beautiful, watchable should be further allowed to take on -ly. There is a wrinkle though, as you should be getting watchably instead of *watchablely. Create a rule called "LYMerge" to take care of this. Ultimately, watchably should get analyzed as watch^able^ly+ADV.
Word checklist
PART 1:
cat, city, fox, panic, try, watch (nouns)
beg, fox, make, panic, try, watch (verbs)
Added in PART 2:
panda, bus, library, mouse, child, cactus, octopus (nouns)
walk, debate, study, stop, nod, fold, write, drink, sneak (verbs)
slow, cool, fine, beautiful, excellent (adjectives)
adverbs ending in -ly, adjectives ending in -able
SUBMIT:
- PART 1: Upload the image file "english-fst-graph-part1.png".
- PART 2: Upload two script files: "english.lexc" and "english.foma".
You may add a couple of "by lines" at the beginning with your name, but be careful: the LEXC and FOMA formats use different comment prefix characters, ! and # respectively. Examples:
! Na-Rae Han, Oct 28, naraehan@... <-- LEXC
# Na-Rae Han, Oct 28, naraehan@... <-- FOMA