Lab Homework Assignment #2

One of the main goals of this homework is to practice using regular expressions, both in your terminal environment and in AntConc. It is all too easy to fall back on the few regular expression syntax rules you already know; please make an effort to use as many varieties of regular expression syntax as possible.


  1. Explore the Gutenberg corpus within your terminal environment, using unix commands and regular expression search patterns. For each item, provide: your search syntax, the top 20-30 lines of your result, and a short analysis.

    1. Search the corpus (original text files) for a linguistic expression of your choice. Use regular expression syntax.
    2. Try another search of the same kind, with a different expression.
    3. Search any N-gram (unigram, bigram, trigram, 4-gram) file for an expression/pattern of your choice.
    4. Try another one of the above; this time, process the result further to produce a frequency list.
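
As a rough illustration of the "search, then build a frequency list" workflow, the sketch below runs a regular expression search and counts the matches. The sample sentences and the temporary file are made up for this example; they stand in for whichever corpus or N-gram file you actually search, and the pattern (words ending in "ing") is only one possibility among many.

```shell
# Hypothetical sample text standing in for a corpus file (not part of the assignment data).
tmp=$(mktemp)
printf '%s\n' \
  'She was walking and talking.' \
  'He kept walking home, singing.' \
  'Nothing stopped the walking.' > "$tmp"

# grep -E : use extended regular expression syntax
#      -o : print only the matched text, one match per line
#      -w : accept a match only if it is a whole word
# tr      : fold matches to lowercase so "Nothing" and "nothing" count together
# sort | uniq -c : group identical matches and count each group
# sort -rn       : order the counts from highest to lowest
# head -n 20     : keep the top 20 lines
freq=$(grep -Eow '[[:alpha:]]+ing' "$tmp" \
  | tr '[:upper:]' '[:lower:]' \
  | sort | uniq -c | sort -rn | head -n 20)

printf '%s\n' "$freq"
rm -f "$tmp"
```

The same `sort | uniq -c | sort -rn` tail can follow the output of any search command, which is why it is kept separate from the `grep` step here.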


  2. Explore the Gutenberg corpus again, this time using AntConc. For each item, present a screenshot of your result, along with your own analysis of the result.

    1. Look up concordances of an expression of your choice. This one does not have to involve regular expressions.
    2. Look up concordances of an expression of your choice. Make sure to utilize regular expression syntax in your search.
    3. Look up frequent clusters involving a search item of your choice.
    4. Find collocates of a word of your own choice.

    NOTE: How to take a screenshot
    * PCs: press Alt+PrtSc to capture the current window, then Ctrl+V to paste it into your document.
    * Macs: see http://guides.macrumors.com/Taking_Screenshots_in_Mac_OS_X.