
Lab 1

Objectives: navigating through directories and files in a terminal environment; displaying file contents
Reference: http://osxfaq.com/Tutorials/LearningCenter/
Pages covering this lab session: here, here, here, and here.
  1. [Graphical Desktop Environment] From the NLTK Corpora page, download these two corpora: Australian Broadcasting 2006 (item 2) and Project Gutenberg Selections (item 20). Unzip them, place the two corpus directories somewhere convenient to access, and note their locations.

  2. Open your terminal application. Move into one of the two corpus directories and examine its contents. Then move into the other and do the same. You will need these commands:
    pwd              display the current directory
    cd dir           change the current directory to dir
    cd ..            change the current directory to the parent directory
    cd c:            [cygwin only] move into C:
    cd /cygdrive/c   [cygwin only] move into C:
    cd               change the current directory to your home (default) directory
    cd ~             same as above ('~' refers to your home directory)
    cd ~/dir         move into the dir directory under your home directory
    echo $HOME       display your home directory (likely the directory your terminal starts in)
    ls               list the files and directories in the current directory
    ls .             same as above ('.' refers to the current directory)
    ls dir           list the files and directories in dir
    ls -l            list files with details on each (permissions, owner, size, modification time)
    ls -a            list all files, including hidden ones (names starting with ".", as in .bashrc)
    ls -F            mark file types (/ after directories, * after executables)
    ls /             list the contents of the root directory (cygwin: C:/cygwin; OS X: root of the system disk)
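To try these commands without touching your real files, here is a minimal practice session. It assumes a POSIX-style shell such as bash; the directory names corpora/, abc/, gutenberg/ and the file sample.txt are hypothetical stand-ins for the corpora you downloaded, created in a scratch directory made by mktemp.

```shell
#!/bin/sh
# Practice the navigation commands in a throwaway directory tree.
# corpora/, abc/, gutenberg/ and sample.txt are hypothetical stand-ins.
practice=$(mktemp -d)                          # scratch directory; its path varies by system
mkdir -p "$practice/corpora/abc" "$practice/corpora/gutenberg"
touch "$practice/corpora/gutenberg/sample.txt" # an empty placeholder file

cd "$practice/corpora/gutenberg" || exit 1
pwd            # full path, ending in /corpora/gutenberg
ls -F          # shows sample.txt
cd ..          # up one level, to .../corpora
ls -F          # shows abc/ and gutenberg/ (a trailing / marks a directory)
cd abc         # relative path: abc must be inside the current directory
pwd            # full path, ending in /corpora/abc
cd             # no argument: back to your home directory
pwd            # now prints the same path as: echo $HOME
rm -r "$practice"   # delete the scratch tree when done
```

Replace the scratch paths with the real corpus locations you noted in step 1 to do the actual exercise.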

    Helpful tips:

    • If a directory name contains a space, either (1) enclose the name in double quotes ("), as in: cd "Documents and Settings", or (2) precede each space with a backslash (\), as in: cd Documents\ and\ Settings

    • [cygwin] Windows uses the backslash as the path separator, as in C:\WINDOWS\Fonts. Unix and Cygwin use the forward slash instead: C:/WINDOWS/Fonts

    • [cygwin] Windows does not distinguish uppercase and lowercase letters in file and directory names, and neither does Cygwin. As a result, cd C:/WINDOWS and cd c:/windows do the same thing. However, Cygwin is case-sensitive for command names and options: ls -F and ls -f behave differently.

    • Absolute vs. relative path: An absolute path starts from the root directory (/), as in /home/narae/documents and /cygdrive/c/windows. A relative path is interpreted starting from your current directory, so the directory name it begins with must exist within your current directory (it may also begin with .. to go up one level). Therefore, cd documents only works when documents is directly under your current directory, while cd /home/narae/documents works from anywhere.

    • [cygwin] Your home directory is typically /home/user_name. Since the root (/) is c:/cygwin, this home directory can actually be found here: c:/cygwin/home/user_name

    • Tab completion: You will be relieved to know that you don't have to type out entire file names. Pressing TAB partway through a name triggers auto-completion. If there are two or more matches, the system beeps; type a couple more characters and press TAB again (in bash, pressing TAB twice lists the matches).

    • Command history: You can scroll through your previous commands by pressing the up and down arrow keys. Press ENTER when you find the one you want.

    • man page: Unix systems include "manual pages" for most commands. Try man ls to find out how to use ls; press q to leave the manual page.
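The tips on spaces in names and on absolute vs. relative paths can be checked with a short sketch. It assumes a bash-like shell; "Documents and Settings" and notes are hypothetical names created inside a scratch directory, not your real Windows folders.

```shell
#!/bin/sh
# Quoting names that contain spaces, and absolute vs. relative paths.
# "Documents and Settings" and notes are hypothetical names in a scratch area.
base=$(mktemp -d)
mkdir -p "$base/Documents and Settings/notes"

cd "$base" || exit 1
cd "Documents and Settings"    # (1) quote the whole name
cd ..
cd Documents\ and\ Settings    # (2) or put a backslash before each space
cd notes                       # relative path: notes must be under the current directory
abs=$(pwd)                     # remember the absolute path of this directory

cd /                           # move somewhere unrelated
cd "$abs"                      # an absolute path works from anywhere
pwd                            # same path as before
```

Without the quotes or backslashes, cd would receive three separate arguments (Documents, and, Settings) instead of one directory name.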

  3. Move into the Gutenberg corpus directory. Examine the text of the file carroll-alice.txt. What do the first few lines of the file look like? What about the end of the file? You will need these commands:
    cat file          print file to Standard Output (i.e., the terminal window)
    more file         print file one screenful at a time (SPACE to page forward, q to quit)
    less file         print file one screenful at a time (SPACE/PageDown to page forward, b/PageUp to page back, q to quit)
    head file         print the first 10 lines of file
    head -m file      print the first m lines of file (e.g., head -20 file)
    tail file         print the last 10 lines of file
    tail -m file      print the last m lines of file
    tail -n +m file   print file starting from line m
    *NOTE: The old syntax tail +m is not supported by newer versions of tail.
    To print from the 6th line on, use the new syntax tail -n +6 instead of the old tail +6.
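Here is a minimal sketch of head and tail on a small file whose contents are easy to predict. It assumes a bash-like shell with the seq command (available on Linux, OS X, and Cygwin); sample.txt is a hypothetical stand-in for carroll-alice.txt, with line k holding the text "line k".

```shell
#!/bin/sh
# head and tail on a small numbered file.
# sample.txt is a hypothetical stand-in for carroll-alice.txt.
cd "$(mktemp -d)" || exit 1
seq 1 20 | sed 's/^/line /' > sample.txt   # 20 lines: "line 1" ... "line 20"

head sample.txt          # line 1 through line 10
head -3 sample.txt       # line 1 through line 3
tail -3 sample.txt       # line 18 through line 20
tail -n +18 sample.txt   # from line 18 to the end: also line 18 through line 20
```

On carroll-alice.txt, the same commands show the title lines at the top and the closing lines at the bottom.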

    The commands above can be combined to carry out more complex tasks. How can we display lines 50-100? How can we extract certain lines and save them into a different file? Try:
    head -200 file | more                  print the first 200 lines of file, one screenful at a time
    head -200 file | tail -10              print lines 191-200 of file
    tail -n +191 file | head -10           same as above!
    head -200 file > temp.txt              save the first 200 lines of file into a new file called temp.txt (overwriting any existing temp.txt)
    head -200 file | tail -10 > temp.txt   save lines 191-200 of file into a new file called temp.txt
    rm temp.txt                            delete the file temp.txt
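One way to answer "How can we display lines 50-100?" is sketched below, assuming a bash-like shell with seq. nums.txt is a hypothetical 150-line file whose line k holds the number k, so the output is easy to check; the range1.txt and range2.txt names are also just for illustration.

```shell
#!/bin/sh
# Display lines 50-100 of a file by piping head into tail (or tail into head).
# nums.txt is a hypothetical 150-line file; line k holds the number k.
cd "$(mktemp -d)" || exit 1
seq 1 150 > nums.txt

# Lines 50-100 inclusive are 100 - 50 + 1 = 51 lines:
# keep the first 100 lines, then the last 51 of those.
head -100 nums.txt | tail -51 > range1.txt

# Alternative: start at line 50, then keep the next 51 lines.
tail -n +50 nums.txt | head -51 > range2.txt

head -3 range1.txt                                # 50, 51, 52 (one per line)
cmp -s range1.txt range2.txt && echo "identical"  # both pipelines give the same lines
```

The off-by-one detail is the main thing to watch: a range from line a to line b contains b - a + 1 lines.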

    So how big is this file? How many characters, words, and lines are in this file? These are the commands to use:
    wc file         print the number of lines, words, and characters (strictly, bytes) in file, in that order
    wc -l file      print the number of lines only
    wc -w file      print the number of words only
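A quick sketch of wc on a file small enough to count by hand, assuming a bash-like shell. tiny.txt is a hypothetical two-line file: 3 + 4 = 7 words, and 20 + 18 = 38 characters once each line's newline is counted.

```shell
#!/bin/sh
# Counting lines, words, and characters with wc.
# tiny.txt is a hypothetical two-line file.
cd "$(mktemp -d)" || exit 1
printf 'Alice was beginning\nto get very tired\n' > tiny.txt

wc tiny.txt        # the counts 2, 7, and 38, followed by the file name
wc -l tiny.txt     # 2, followed by the file name
wc -w tiny.txt     # 7, followed by the file name
wc -l < tiny.txt   # reading from standard input: the count without the file name
```

The exact spacing between the numbers differs between systems, so read the columns rather than relying on their alignment.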

  4. When you are done, you can close your terminal by typing: