Corpus Linguistics Resources: Corpora

Web-Searchable Corpora

  • [link] BYU Corpora
    • [link] Corpus of Contemporary American English (COCA)
    • [link] Corpus of Historical American English (COHA): not yet released
    • [link] BYU-BNC: British National Corpus (cf. the official version below)
    • [link] TIME Corpus of American English
  • [link] Pitt ELI Online Data Search System (permission needed)
  • [link][pdf] Child Language Data Exchange System (CHILDES)
  • [link] Michigan Corpus Linguistics Home
    • [link] Michigan Corpus of Academic Spoken English (MICASE)
    • [link] Michigan Corpus of Upper-Level Student Papers (MICUSP)
    • [link] The John Swales Conference Corpus (JSCC): no online interface; downloadable transcripts
  • [link] Collins Wordbanks Online English Corpus Concordance Sampler (part of the Collins-COBUILD Corpus / Bank of English)

Downloadable/Easy-Access Corpora

  • [link] The John Swales Conference Corpus (JSCC), hosted by the University of Michigan
  • [link] The Lancaster-Oslo/Bergen Corpus (LOB)
  • [link] The Brown Corpus
  • [link] The Santa Barbara Corpus of Spoken American English
  • [link] International Corpus of English (ICE)
    * See also the Corpus Archives and Indexes section below.

For-Fee/Limited-Access Corpora

  • [link] British National Corpus (BNC) by BNC Consortium (cf. BYU's online version above)
  • [home][catalog] American National Corpus (ANC)
  • [home] The Penn Treebank Project
  • [home][catalog] LDC Catalog by Type And Source
  • [blog][link] Web 1T 5-Gram Corpus by Google
  • [link] Cambridge International Corpus
    • [link] Cambridge and Nottingham Corpus of Discourse in English (CANCODE)
    • [link] Cambridge Learner Corpus (CLC)
    • [link] Cambridge and Nottingham Spoken Business English Corpus (CANBEC)
    • ... and many more.
  • [link] The Collins-COBUILD Corpus / The Bank of English Corpus
  • [link] Penn Parsed Corpora of Historical English
  • [link][book] International Corpus of Learner English (ICLE)
  • [link] Louvain Corpus of Native English Essays (LOCNESS)
  • [link] Longman Learners' Corpus

Corpus Archives and Indexes

  • [home][catalogue] The Oxford Text Archive
  • [link] NLTK (Natural Language Toolkit) Corpora
  • [home][index] Corpus Resource Database (CoRD) at the University of Helsinki
  • [home][catalog] LDC Catalog by Type And Source
  • [home] LinguistList Texts and Corpora page

Corpora in Other Languages

(Web-searchable ones only)
  • [link] Corpus del Español (at BYU)
  • [link] Corpus do Português (at BYU)
  • [link] COMPARA: a bidirectional parallel corpus of English and Portuguese
  • [link] Corpus de Referencia del Español Actual (CREA)
  • [link][ENG] The Russian Reference Corpus
  • [link] Hungarian National Corpus (Registration required)
  • [link] CORIS/CODIS: Corpus of Written Italian (Registration required)
  • [link] The German National Corpus
  • [link] The Hellenic National Corpus
  • [link] FLLOC: French Learner Language Oral Corpora
  • [link] SPOLLOC: Spanish Learner Language Oral Corpora
  • [home][concordancer] The Lancaster Corpus of Mandarin Chinese (LCMC)