Class Schedule (Revised)
W | Date | Topic | Readings |
1 | 1/06 (W) | Introduction [pdf] | [1]A1 |
2 | 1/11 (M) | Corpus basics [pdf] | [1]A1,A2 |
1/13 (W) | Corpus basics [pdf] | [1]A3 |
3 | 1/20 (W) | Corpus annotation [pdf] | [1]A4,A5 |
4 | 1/25 (M) | Survey of available corpora [pdf] | [1]A7 |
1/27 (W) | Survey of available corpora [pdf]; Lab 1: Navigating in terminals; displaying files | |
5 | 2/01 (M) | Survey of available corpora [pdf] | [1]A7 |
2/03 (W) | Lab 2: Managing files; Searching text file contents [Exercise] | |
6 | 2/08 (M) | Snow day: class canceled* |
2/10 (W) | Snow day: class canceled* |
7 | 2/15 (M) | Collocation, frequency, corpus statistics [pdf] | [1]A6,C1 [b], [c] |
2/17 (W) | Lab 3: Text transformation; Word lists and type frequency tables [HW] | [a] |
8 | 2/22 (M) | Lexical studies: collocation, phraseology, semantic prosody | [1]A10.2, (1) |
2/24 (W) | Lab 4: Compiling N-grams; Regular Expressions | [a] |
9 | 3/01 (M) | Grammatical studies | [1]A10.3, (2) |
3/03 (W) | Lab 5: Regular expressions | |
Spring break |
10 | 3/15 (M) | Language variation studies | [1]A10.4, A10.5, (3) |
3/17 (W) | Lab 6: CLAN, AntConc | [d], [e] |
11 | 3/22 (M) | Contrastive and diachronic studies | [1]A10.6, 10.7, (4) |
3/24 (W) | Lab 7: NLTK | [f], [g] |
12 | 3/29 (M) | Corpora in language education: Issues of language description; General and specific applications | [2]Ch.6,7,8; (5) |
3/31 (W) | Lab 8: NLTK | [f], [g] |
13 | 4/05 (M) | Corpora in language education: Studies | [1]A10.8, (5) |
4/07 (W) | Lab 9: R | |
14 | 4/12 (M) | Stylistics, stylometry, translation studies | [1]A10.13, (6) |
4/14 (W) | Lab 10: R | |
15 | 4/19 (M) | Term project presentation |
4/21 (W) | Term project presentation |
16 | 4/30 (F) | Project paper due |
*A make-up class will be scheduled.
Assignment Schedule:
- 1/20 -- 2/01 Class presentation: survey of two corpora of your own choice
- 2/17 Homework assignment: processing text files
- 3/03 Homework assignment: regular expression search, N-gram stats
- 3/31 Homework assignment: corpus processing using NLTK
- 3/01 -- 4/12 Class presentation: literature reviews and case studies
- Additionally, a few take-home exercises will be given for lab classes.
Main textbook:
[1] Corpus-Based Language Studies: An Advanced Resource Book. Tony McEnery et al. Routledge, 2006.
*See References section below for articles covered in the B and C units of this book.
Supplementary books (a couple of chapters will be used):
[2] Corpora in Applied Linguistics. Susan Hunston. Cambridge, 2002.
Ch.6,7,8 Corpora and language teaching
[3] From Corpus to Classroom. O'Keeffe et al. Cambridge, 2007.
[4] Corpus Linguistics. McEnery & Wilson, Edinburgh Univ. Press, 2001.
References:
All topics
- Martin Wynne (editor). 2005. Developing Linguistic Corpora: a Guide to Good Practice. Oxford: Oxbow Books. Available online from this [link]
- John Sinclair. 2005. "How to Build a Corpus" in Developing Linguistic Corpora: a Guide to Good Practice, ed. M. Wynne. Oxford: Oxbow Books: 1-16. Available online from this [link]
- [b] Christopher D. Manning and Hinrich Schütze. 1999. Foundations of Statistical Natural Language Processing. MIT Press. [link] Ch.5 Collocations [pdf]
- Bird, S., E. Klein and E. Loper. 2009. Natural Language Processing with Python. O'Reilly Media. [home][e-book][book]
- Gries, S. 2009. Quantitative Corpus Linguistics with R: A Practical Introduction. Routledge. [link]
- Baayen, R. H. 2008. Analyzing Linguistic Data: A Practical Introduction to Statistics using R. Cambridge University Press. [link]
- [c] Zipf's law [link]
- MacWhinney, B. 'The CHILDES System'. [pdf]
(1) Lexical studies: collocation, phraseology, semantic prosody
- [1]B3: Partington, A. 2004. '"Utterly content in each other's company": semantic prosody and semantic preference'. International Journal of Corpus Linguistics 9:1.
- Hunston, S. 2007. 'Semantic prosody revisited'. In Moon, Rosamund (ed.), Words, grammar, text: revisiting the work of John Sinclair: Special issue of International Journal of Corpus Linguistics 12:2. [link]
- Biber, D. 2009. 'A corpus-driven approach to formulaic language in English: Multi-word patterns in speech and writing'. International Journal of Corpus Linguistics 14:3. [link]
- Forchini, P. and Murphy, A. 2008. 'N-grams in comparable specialized corpora: Perspectives on phraseology, translation, and pedagogy'. In Römer, Ute and Rainer Schulze (eds.), Patterns, meaningful units and specialized discourses: Special issue of International Journal of Corpus Linguistics 13:3. [link]
(2) Grammatical studies
- [1]B3: Carter, R. and McCarthy, M. 1999. 'The English get-passive in spoken discourse: description and implications for an interpersonal grammar'. English Language and Linguistics 3:1.
- [1]B3: Kreyer, R. 2003. 'Genitive and of-construction in modern written English: processability and human involvement'. International Journal of Corpus Linguistics 8:1.
- Nesselhauf, N. and Ute Römer. 2007. 'Lexical-grammatical patterns in spoken English: The case of the progressive with future time reference'. International Journal of Corpus Linguistics 12:3. [link]
(3) Register and language variation
- [1]B4: Biber, D. 1995a. 'On the role of computational, statistical, and interpretive techniques in multi-dimensional analysis of register variation'. Text 15:3.
- [1]B4,C5: Biber, D. 1988. Variation Across Speech and Writing. Cambridge: Cambridge University Press.
- Biber, D. 1993. 'Using register-diversified corpora for general language studies'. Computational Linguistics 19:2. [link]
- [1]C4: McEnery, A. and Xiao, Z. 2004. 'Swearing in modern British English: the case of FUCK in the BNC'. Language and Literature 13:3.
- [1]B4: Lehmann, H. 2002. 'Zero subject relative constructions in American and British English'. New Frontiers in Corpus Research, pp. 163-177. Amsterdam: Rodopi.
- [1]B4: Kachru, Y. 2003. 'On definite reference in world Englishes'. World Englishes 22:4.
- Peters, P. 2007. 'A study of backchannels in regional varieties of English, using corpus mark-up as the means of identification'. International Journal of Corpus Linguistics 12:4. [link]
(4) Contrastive and diachronic studies
- [1]B5: Altenberg, B. and Granger, S. 2002. 'Recent trends in cross-linguistic lexical studies' in B. Altenberg and S. Granger (eds) Lexis in Contrast, pp. 3-48. Amsterdam: John Benjamins.
- [1]B5: McEnery, A., Xiao, Z. and Mo, L. 2003. 'Aspect marking in English and Chinese'. Literary and Linguistic Computing 18:4.
- [1]B5: Kilpiö, M. 1997. 'On the forms and functions of the verb to be from Old to Modern English.' In M. Rissanen, M. Kytö and K. Heikkonen (eds.), English in Transition: Corpus-Based Studies in Linguistic Variation and Genre Styles. 87-120. Berlin: Mouton de Gruyter.
- Millar, N. 2009. 'Modal verbs in TIME: Frequency changes 1923-2006'. International Journal of Corpus Linguistics 14:2. [link]
(5) Acquisition, SLA
- [1]B6: Gavioli, L. and Aston, G. 2001. 'Enriching reality: language corpora in language pedagogy'. ELT Journal 55:3.
- [1]B6: Thurstun, J. and Candlin, C. 1998. 'Concordancing and teaching of the vocabulary of academic English'. English for Specific Purposes 17: 267-280.
- Flowerdew, L. 2009. 'Applying corpus linguistics to pedagogy: A critical evaluation'. International Journal of Corpus Linguistics 14:3. [link]
- Mahlberg, M. 2006. 'Lexical cohesion: Corpus linguistic theory and its application in English language teaching'. In Flowerdew, John and Michaela Mahlberg (eds.), Lexical Cohesion and Corpus Linguistics: Special issue of International Journal of Corpus Linguistics 11:3. [link]
- Lu, X. 2009. 'Automatic measurement of syntactic complexity in child language acquisition'. International Journal of Corpus Linguistics 14:1. [link]
(6) Stylistics, stylometry, translation studies
- Fischer-Starcke, B. 2009. 'Keywords and frequent phrases of Jane Austen's Pride and Prejudice: A corpus-stylistic analysis'. International Journal of Corpus Linguistics 14:4. [link]
- Grieve, J. 2007. 'Quantitative authorship attribution: an evaluation of techniques'. Literary and Linguistic Computing 22(3). [link]
- Dayrell, C. 2007. 'A quantitative approach to compare collocational patterns in translated and non-translated texts'. International Journal of Corpus Linguistics 12:3. [link]
Corpus Resource Pages:
Corpora [Page]
- Web-searchable corpora
- Other easy-access corpora
- For-fee/limited-access corpora
- Corpus archives
- Corpora in other languages
Tools and More [Page]
- Organizations
- Tools and software
- Other corpus resource pages
Lab Pages:
- Lab 0: Setting up your computing environment
- Lab 1: Navigating in terminal environment; displaying file contents
- Lab 2: Managing multiple files; basics of searching text file contents [Exercise]
- Lab 3: Basics of text transformation; Extracting word lists and type frequency tables [Homework 1]
- Lab 4: N-grams; Regular Expressions
- Lab 5: More Regular Expressions; Perl
- Lab 6: Using AntConc [Homework 2]
- Lab 7: Installing Python and NLTK; using NLTK corpora
Lab Help Pages:
- Configuring your terminal environment [OS-X][Cygwin]
- Unix command reference sheet [Page]
Lab Forum is located on Pitt's CourseWeb class page.
Lab References: