KWICGrouper in CLiC
A new feature in the CLiC corpus tool for literary text announced at #ICAME38

h/t CLiC Dickens @CLiC_Dickens

see also:
CLiC literary text corpus

Announcement of a DIY concordancer at #ICAME38. Nice.

h/t Stefan Evert @schtepf

Is Intuition as Good as Corpus Frequency for Selecting Vocabulary?
An ELT Research Bites summary by +Anthony Schmidt. To corpus Kool-Aid drinkers it may be surprising that intuition can match corpus frequency : ). Research in this area seems to have been saying this for a while, though.

h/t ELT Research Bites @ResearchBites

A Preposition Corpus
A very neat and useful tool. Filters such as complement properties and attachment properties are available at the bottom of each results page.

h/t +Sketch Engine

Domain of One’s Own: A Corpus Study, Part 1 – Words and Voices
This is shaping up to be a nice series, and as a bonus the author has made the code available.

h/t ℳąhą Bąℓi مها بالي @Bali_Maha

Using YouTube as a corpus of spoken English
One for your diary on Saturday 10 December 2016 at 3PM GMT
by +Olya Sergeeva

Hi all,

I was trying to obtain a keyword list by comparing a corpus I have (around 8650 words) against the Brown corpus as a reference corpus, using WordSmith 6.0, but I got this notice: "text doesn't seem to be valid file for WordList". Could you please help with this?

Thanks in advance
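
This does not address the WordSmith notice itself, but as a cross-check the same kind of keyword list can be sketched in a few lines of Python. The sketch below scores words with log-likelihood, one commonly used keyness measure, which is not necessarily what WordSmith computes with your settings. The function names and the simple tokenizer are assumptions made for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase ASCII words and straight apostrophes only.
    return re.findall(r"[a-z']+", text.lower())


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness score.

    a, b: frequency of the word in the study / reference corpus
    c, d: total tokens in the study / reference corpus
    """
    expected_study = c * (a + b) / (c + d)
    expected_ref = d * (a + b) / (c + d)
    score = 0.0
    if a > 0:
        score += a * math.log(a / expected_study)
    if b > 0:
        score += b * math.log(b / expected_ref)
    return 2 * score


def keywords(study_text, reference_text, top=10):
    """Return the top words that are relatively more frequent in the study text."""
    study = Counter(tokenize(study_text))
    ref = Counter(tokenize(reference_text))
    c, d = sum(study.values()), sum(ref.values())
    if c == 0 or d == 0:
        raise ValueError("both corpora must contain at least one token")
    scored = []
    for word, a in study.items():
        b = ref.get(word, 0)
        # Keep only positive keywords (overused relative to the reference).
        if a / c > b / d:
            scored.append((word, log_likelihood(a, b, c, d)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top]
```

To use it, read your corpus and the reference corpus into two strings and call `keywords(my_text, brown_text)`. With only around 8650 words in the study corpus, treat low-frequency items with caution.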

Exploring audio clip tools using the BNC spoken corpus

There are now a number of audio corpus interfaces available that allow you to get examples of spoken English. These range from movies and TV shows, such as Playphrase and Yarn (h/t Sandy Millin @sandymillin), to YouTube videos, such as Youglish and Divii.

A question arises as to what search terms to use, i.e. which frequent terms could we use in class? To get a list of frequent terms one can use the Spokes interface to the Spoken BNC (British National Corpus).

It has a feature that lists all the common formulae such as the following top 3:
thank you very much
i don't know
isn't it

One now has a nice base from which to explore the tools mentioned initially. Of course, one must bear in mind that the BNC is dated, so current conversational English will not be well represented.
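
If you have more recent transcripts of your own, a comparable list of frequent formulae can be roughly approximated by counting n-grams. The following is a minimal sketch and not how Spokes builds its list; `frequent_ngrams` and its tokenizer are hypothetical names and choices made for illustration.

```python
import re
from collections import Counter


def frequent_ngrams(text, n=3, top=3):
    """Return the `top` most frequent n-word sequences in `text`."""
    # Assumption: lowercase ASCII words with straight apostrophes;
    # punctuation is dropped, so n-grams may cross sentence boundaries.
    tokens = re.findall(r"[a-z']+", text.lower())
    ngrams = zip(*(tokens[i:] for i in range(n)))
    counts = Counter(" ".join(gram) for gram in ngrams)
    return counts.most_common(top)
```

For example, `frequent_ngrams(transcript, n=3, top=20)` would list the twenty most common three-word sequences in a string called `transcript`.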

Thanks to Cara Leopold @eltinfrance for prompting this note.

CrowdED Corpus
Great to see a publicly available speech corpus, nice one +Andrew Caines and team : )

What code can I use to remove all quoted text from a corpus or a single long text?
Does anyone know a line of code, in any programming language, that omits quoted text (marked by quotation marks) from a long text or a corpus of long texts? I would be really grateful for the help!
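
One possible approach, sketched in Python with a regular expression. It assumes that quotations are marked with double quotation marks (straight or curly), that the marks are balanced, and that quotations do not nest. Single quotation marks are left alone because they are hard to tell apart from apostrophes. The name `remove_quoted` is just an illustrative choice.

```python
import re

# Text between straight double quotes, or between curly double quotes.
# The negated character classes also match newlines, so a quotation
# that runs over several lines is removed as a whole.
QUOTED = re.compile(r'"[^"]*"|“[^”]*”')


def remove_quoted(text):
    """Return `text` with double-quoted passages removed."""
    stripped = QUOTED.sub("", text)
    # Collapse the runs of spaces or tabs left behind by the removal.
    return re.sub(r"[ \t]{2,}", " ", stripped).strip()
```

For a corpus of several files, the function can be applied to each file's contents in turn. An unbalanced quotation mark would make the pattern pair up the wrong marks, so it is worth spot-checking the output.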