KWICGrouper in CLiC
A new feature in the CLiC corpus tool for literary text announced at #ICAME38

see also:
CLiC literary text corpus

Announcement of a DIY concordancer at #ICAME38. Nice.

Is Intuition as Good as Corpus Frequency for Selecting Vocabulary?
An ELT Research Bites summary by +Anthony Schmidt. Corpus Kool-Aid drinkers may be surprised that intuition can match corpus frequency : ), though research in this area seems to have been saying this for a while.

A Preposition Corpus
Very neat tool; filters such as complement properties | attachment properties are available at the bottom of a results page. Very useful.

Domain of One’s Own: A Corpus Study, Part 1 – Words and Voices
This is shaping up to be a nice series; a nice bonus is that the author has made the code available.

Using YouTube as a corpus of spoken English
One for your diary: Saturday 10 December 2016 at 3 pm GMT.

I was trying to obtain a keyword list by comparing a corpus I have (around 8,650 words) against the Brown corpus as a reference corpus using WordSmith 6.0, but I got this notice: "text doesn't seem to be valid file for WordList". Could you please help with this?

Exploring audio clip tools using the BNC spoken corpus

There are now a number of audio corpus interfaces available that let you find examples of spoken English. These range from movies and TV shows, such as Playphrase and Yarn (h/t Sandy Millin @sandymillin), to YouTube videos, such as Youglish and Divii.

A question arises as to what search terms to use, i.e. which frequent terms could we use in class? To get a list of frequent terms, one can use the Spokes interface to the Spoken BNC (British National Corpus).

It has a feature that lists all the common formulae such as the following top 3:
thank you very much
i don't know
isn't it

This gives a nice base from which to explore the tools mentioned at the start. Of course, one must bear in mind that the BNC is dated, so current conversational English will not be well represented.
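
If you have your own transcript rather than the Spoken BNC, a rough list of frequent formulae can be approximated by counting repeated word sequences. The sketch below is only an illustration in Python: the function name `frequent_ngrams` and the sample text are made up for the example, and it is not how Spokes builds its own list. It also counts sequences that run across sentence boundaries, which a proper tool would likely avoid.

```python
import re
from collections import Counter


def frequent_ngrams(text: str, n: int = 3, top: int = 3):
    """Return the `top` most frequent n-word sequences in `text`."""
    # Lowercase and keep apostrophes so "don't" stays a single token.
    tokens = re.findall(r"[a-z']+", text.lower())
    # Slide a window of n tokens across the text.
    ngrams = (" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return Counter(ngrams).most_common(top)


# Hypothetical sample transcript, invented for this example.
sample = "I don't know. Well, I don't know really. Thank you very much."
print(frequent_ngrams(sample, n=3, top=1))
```

The resulting phrases can then be pasted as search terms into the audio clip tools.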

CrowdED Corpus
Great to see a publicly available speech corpus; nice one +Andrew Caines and team : )

What code to use to remove all quoted texts from a corpus / or a long single text?
Does anyone know a line of code, in any programming language, that removes quoted text (marked by quotation marks) from a long text or a corpus of long texts? I would be really grateful for the help!
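
One possible approach, sketched here in Python with a regular expression. It assumes quotations are marked with straight or curly double quotes, do not nest, and are always closed; an unclosed quotation mark would cause everything up to the next mark to be removed. Single quotes are deliberately left alone because they are hard to tell apart from apostrophes. The function name `strip_quoted` is just for illustration.

```python
import re

# Matches "..." (straight) and “...” (curly) double-quoted passages,
# including ones that span several lines.
QUOTED = re.compile(r'"[^"]*"|“[^”]*”')


def strip_quoted(text: str) -> str:
    """Return `text` with double-quoted passages removed and spacing tidied."""
    without_quotes = QUOTED.sub("", text)
    # Collapse runs of spaces or tabs left behind by the removal.
    collapsed = re.sub(r"[ \t]{2,}", " ", without_quotes)
    # Remove a space stranded before punctuation, then trim the ends.
    return re.sub(r" +([,.;:!?])", r"\1", collapsed).strip()


print(strip_quoted('He said "hello there" and left.'))
```

For a corpus of long texts, the same function can be applied to each file's contents in a loop.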
