Goals of the lab class:
You will learn:
- How to navigate a Unix-like command-line interface
- The basics of Unix commands
- The basics of text processing
You will be able to perform these tasks:
- extract words from a text
- extract word frequency tables
- sort these lists by frequency or in alphabetical order
- use regular expressions in your searches
- perform string-replacement operations on text
- use ready-made text-processing software to process your corpus for tokenization, part-of-speech (POS) tagging, and syntactic parsing
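As a preview of the tasks listed above, here is a minimal sketch of a classic Unix word-frequency pipeline built only from standard tools (tr, sort, uniq, grep, sed). The function name word_freq and the file name sample.txt are hypothetical, chosen for illustration; the lab's own exercises may use different commands. LC_ALL=C is set so that sorting uses plain byte order and gives the same result on Mac and Cygwin.

```shell
#!/bin/sh
# Preview sketch; word_freq and sample.txt are hypothetical names.

# Print "count word" lines for the text on standard input,
# most frequent first; ties are broken alphabetically.
word_freq() {
  LC_ALL=C tr -cs 'A-Za-z' '\n' |  # one word per line (runs of non-letters become one newline)
    tr 'A-Z' 'a-z' |               # fold to lowercase so "The" and "the" match
    sed '/^$/d' |                  # drop an empty line left by leading punctuation
    LC_ALL=C sort |                # bring identical words together
    uniq -c |                      # count each run of identical words
    LC_ALL=C sort -k1,1nr -k2,2    # by count (descending), then by word (A to Z)
}

# Create a tiny sample corpus to work on.
printf 'The cat saw the dog.\nThe dog ran!\n' > sample.txt

word_freq < sample.txt                          # frequency table
word_freq < sample.txt | LC_ALL=C sort -k2,2    # same table, alphabetical order
grep -E -i 'cat|dog' sample.txt                 # regular-expression search
sed 's/dog/fox/g' sample.txt                    # string replacement (screen only; file unchanged)
```

On this sample, the first line of the frequency table counts 3 occurrences of "the", followed by 2 of "dog". The exact spacing of the counts printed by uniq -c can differ slightly between systems.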
If you own a Mac, you will be using its Terminal application:
- OS X is built on Unix, so you will not need to install any terminal software.
- The Terminal application can be found in: your-system-disk > Applications > Utilities. The exact location might vary depending on your version of OS X.
- Learn about the OS X Terminal here.
If you own a PC, you will be using Cygwin terminals (and later Cygwin/X terminals):
- Cygwin/X: a Unix-like environment for Windows (Cygwin) with an X Window System server
- Project home page: here
- Wikipedia entry: here
- You will need to download and install Cygwin/X.
A step-by-step installation guide is available on this page. Follow its steps, with two crucial differences:
- The guide downloads the Cygwin installation package and installs it in a single pass. This works in principle, but it is likely to fail if your Internet connection is interrupted. Instead, we recommend splitting the installation into two stages:
(1) download the entire installation package, then
(2) install from the local directory that contains the downloaded package.
In step 5 of the guide, the second and third options correspond to these two stages, respectively. In the first stage, choose the "Download Without Installing" option, specify a local directory in which to save the files, and exit setup when the download is complete. In the second stage, run setup.exe again; this time, choose "Install from Local Directory" in step 5, select the saved directory when prompted, and proceed with the installation.
* Note: In this class, DVDs containing a pre-downloaded installation package were distributed for your convenience. Installing from the DVD amounts to skipping stage (1) and going directly to stage (2).
- The installation guide keeps the default selection of software packages. Instead, we will install all packages. To do so, in step 15, click "Default" at the top of the package tree (see this image). It takes a while, but "Default" eventually changes to "Install" for all packages (see this image). Do this in both stages (downloading and installing). You will need 4.5 GB of free space on your hard drive.
- After installation is complete, launch Cygwin by clicking the Cygwin icon on your desktop. Type ls -la and press ENTER. You should see output similar to that in the screenshot above. For some users, these three files might be missing: .bash_profile, .bashrc, and .inputrc. If so, type cp /etc/skel/.* ~/ (pay attention to the period before the asterisk!) and press ENTER. Confirm that the three files are now present by typing ls -la again. (Don't worry about .bash_history; it is created automatically.)
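The missing-file check described above can also be scripted. This is a minimal sketch assuming the skeleton directory /etc/skel used in the instructions; restore_dotfiles is a hypothetical helper name, not part of Cygwin. Unlike cp /etc/skel/.* ~/, which copies every dot-file in /etc/skel and can overwrite copies you already have, it copies only the three files that are missing, so settings you have already changed are left alone.

```shell
#!/bin/sh
# Sketch: copy any missing startup files from a skeleton directory.
# restore_dotfiles is a hypothetical helper; /etc/skel is the assumed default source.
restore_dotfiles() {
  skel=${1:-/etc/skel}    # where the template dot-files live
  dest=${2:-$HOME}        # where they should end up (your home directory)
  for f in .bash_profile .bashrc .inputrc; do
    if [ -e "$dest/$f" ]; then
      echo "present: $f"                          # already there: leave it alone
    elif [ -e "$skel/$f" ]; then
      cp "$skel/$f" "$dest/$f" && echo "copied:  $f"
    else
      echo "not found in $skel: $f" >&2           # nothing to copy from
    fi
  done
}

restore_dotfiles   # check your home directory against /etc/skel
ls -la ~           # confirm the files are now present
```

The two optional arguments (source and destination directories) exist only so the helper can be tried safely on scratch directories before it touches your real home directory.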