Other International Corpora

  • English Corpora in Corpora4Learning (Braun, S., 2006)

    This page, produced by Dr Sabine Braun at the University of Surrey, lists the most widely known English-language corpora in the world, with a short description of each.

  • The International Dialects of English Archive (Meier, P., 1997)

    The International Dialects of English Archive (IDEA) was established by Paul Meier in 1997. It is the first online archive of primary-source recordings of English dialects and accents as heard around the world. All of IDEA's recordings are in English, including both English dialects and English spoken with other accents. The archive currently houses more than 1000 recordings by native speakers from different countries. Each recording includes both a reading and some unscripted speech, totaling about four minutes.

  • Speech Accent Archive (Weinberger, S., 2011)

    The Speech Accent Archive was established by Prof. Steven H. Weinberger at George Mason University in 2011. This speech corpus provides a large set of speech samples from a variety of language backgrounds. Both native and non-native speakers of English read the same paragraph, and their readings are carefully transcribed. Users can compare and analyze the accents of different English speakers using this corpus.

Other Corpora of Asian English

  • A Corpus of Spoken PRC English (Deterding, D., 2005)

    The corpus aims to provide high-quality recordings of speakers from the People's Republic of China. All recordings were made directly onto the computer in the Phonetics Laboratory at NIE in Singapore. Thirteen students have been recorded so far. The language tasks include a passage reading (“The North Wind and the Sun”) and a short interview of two minutes.

  • Hong Kong Corpus of Spoken English (Cheng, W., Greaves, C., & Warren, M., 2005)

    The HKCSE is a large collection of texts representing spoken English in Hong Kong. There are currently 907,657 words in the HKCSE. Users can search for a word and find examples of its use in context. They can also search for an additional word in combination with their search word. The Hong Kong Corpus of Spoken English comprises four sub-corpora (academic, business, conversation and public).

  • The NIE Corpus of Spoken Singapore English (Deterding, D. & Low, E. L., 2001)

    The NIE Corpus of Spoken Singapore English aims to provide high-quality recordings of Singaporean speakers and to facilitate acoustic/phonetic analysis of Singapore English. To eliminate background noise and thereby facilitate acoustic/phonetic measurement, all recordings were made directly onto the computer in the NIE Phonetics Laboratory. The corpus has three major parts: a passage reading of “The North Wind and the Sun”, an interview, and some examples of extra final consonants.

  • The Spoken English Corpus of Chinese and Non-Chinese learners in Hong Kong (Chen, H. C., 2020)

    The Spoken English Corpus of Chinese and Non-Chinese learners in Hong Kong aims to provide learners, teachers and researchers with high-quality authentic recordings. The corpus contains 20 sets of speech data collected from local Hong Kong learners, 96 sets of data produced by English learners from 9 dialect groups in mainland China and 20 sets of data provided by immigrant English learners from 5 ethnic groups in South Asia and Southeast Asia.

Useful Websites on Corpus Linguistics