Stream


Mura Nava
owner

Tools  - 
 
Email corpora
1) The most famous is no doubt the ENRON corpus e.g. [https://www.cs.cmu.edu/~enron/] or the ENRON Sent corpus [http://savethevowels.org/enronsent/]

2) There is the Business Letter Corpus [http://www.someya-net.com/concordancer/]; it's a shame the interface to this is limited.

3) British Columbia Conversation Corpora (BC3): Email corpus [https://www.cs.ubc.ca/cs-research/lci/research-groups/natural-language-processing/bc3.html#email]

4) SPAM and non-SPAM email datasets [http://csmining.org/index.php/data.html]

5) Hillary Clinton email archive by Wikileaks [https://wikileaks.org/clinton-emails/].

6) US Democratic National Committee email archive [https://wikileaks.org/dnc-emails/].

Thanks to +Laura Adele Soracco for prompting this post.

related:
Enron corpus primer tutorial [https://plus.google.com/+MuraNava/posts/eMuCSNRPVRt]

Mura Nava
owner

Tools  - 
 
Netcollo
Looks like a new improved version of [netcollo.stringnet.org]. Allows you to search for collocations and results indicate whether your search collocation is good or not.
Help page - [http://www.netcollo.info/netcollo/manual.php]

h/t Pérez-Paredes @perezparedes

Mura Nava
owner

Tools  - 
 
Rebeats.tv
Beta version of a program that uses pop lyrics & video to practice English.
Based on corpus data and analysis. You can set up an account using a made-up email address.

Had a quick play.
It needs to be possible to fast-forward and rewind the videos.

More info - http://www.tellop.eu/tell-op-workshop/

h/t Pérez-Paredes @perezparedes
Learn through what you love
yes it is not as well developed as lyricstraining

Mura Nava
owner

Tools  - 
 
BootCat Top Tip
If you use BootCat here is a command to help you separate the collected corpus into individual files using the CURRENT URL line as a separator in a regex:

awk '/CURRENT URL/{g++} { print $0 > (g ".txt") }' corpus.txt

Be careful when copying and pasting this command into your command line that the apostrophes ' are straight and not curly.
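
If awk is not to hand, the same idea can be sketched in Python. This is only a rough equivalent under a few assumptions of mine, not part of BootCat: the function names split_corpus and write_chunks are made up for illustration, the marker is assumed to be the literal text CURRENT URL, and output files are numbered 0001.txt, 0002.txt and so on rather than 1.txt, 2.txt. Unlike the awk command, any text before the first marker line ends up in the first numbered file.

```python
from pathlib import Path


def split_corpus(text, marker="CURRENT URL"):
    """Split text into chunks; a new chunk starts at each line containing marker."""
    chunks = []
    current = []
    for line in text.splitlines(keepends=True):
        # Close the chunk in progress when the next marker line is reached.
        if marker in line and current:
            chunks.append("".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("".join(current))
    return chunks


def write_chunks(chunks, out_dir):
    """Write each chunk to a zero-padded, numbered .txt file in out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, chunk in enumerate(chunks, start=1):
        path = out / f"{i:04d}.txt"
        path.write_text(chunk, encoding="utf-8")
        paths.append(path)
    return paths
```

To use it on a collected corpus, something like write_chunks(split_corpus(Path("corpus.txt").read_text(encoding="utf-8")), "pages") should do, where "pages" is just an example folder name.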

related:
BootCat custom URL: [https://eflnotes.wordpress.com/2014/10/08/building-your-own-corpus-bootcat/]
BootCat seeding: [https://plus.google.com/+MuraNava/posts/8JjUVKyA8FE]
hi, please do Peter : )

Lingua F
moderator

Tools  - 
 
Thanks to +Atlanta Bill .

Mura Nava
owner

Reading  - 
 
A blog post by +Viola Wiegand on the University of Birmingham CL Summer School 2016, with links to the author's presentation on creating specialised corpora using various tools, one of which is the new BYU CORE corpus of online registers.

Its virtual corpora feature is a great way to build custom corpora.

see also [https://plus.google.com/+MuraNava/posts/gvXu6N3mwuf]

Mura Nava
owner

Tools  - 
 
Exploring audio clip tools using the BNC spoken corpus

There are now a number of audio corpus interfaces available that let you get examples of spoken English. These range from movies and TV shows, such as Playphrase (http://playphrase.me) and Yarn (http://getyarn.io/yarn-popular), h/t Sandy Millin @sandymillin, to YouTube videos, such as Youglish (http://youglish.com/) and Divii (http://www.divii.org/).

A question arises as to what search terms to use, i.e. which frequent terms could we use in class? To get a list of frequent terms one can use the Spokes interface to the Spoken BNC (British National Corpus) [http://pelcra.clarin-pl.eu/SpokesBNC/#explore/kp/0/100].

It has a feature that lists all the common formulae such as the following top 3:
thank you very much
i don't know
isn't it

This gives a nice base from which to explore the tools mentioned at the start. Of course, one must bear in mind that the BNC is dated, so current conversational English will not be well represented.

Thanks to Cara Leopold @eltinfrance for prompting this note.

Mura Nava
owner

Reading  - 
 
Summary pdf of conference "Enhancing and extending corpora and corpora tools for learning and teaching" at Université Grenoble Alpes, France

h/t CorpusCALL FB group

Mura Nava
owner

corpusMOOC  - 
 
The #corpusmooc starts again on 26 September 2016.
Do peruse the corpusmooc section here to get a flavour of the course. +Michael Brown also links to some related reading here - https://corpling4efl.wordpress.com/2016/06/08/corpusmooc-16/

h/t Tony McEnery @TonyMcEnery
Offers a practical introduction to the methodology of corpus linguistics for researchers in social sciences and humanities

Mura Nava
owner

Tools  - 
 
CrowdED Corpus
Great to see a publicly available speech corpus, nice one +Andrew Caines and team : )

Mura Nava
owner

Tools  - 
 
Intriguing announcement of a Turkish DIY corpus platform. There is some English explanation on the FB post but I am none the wiser.

Anyone here who knows Turkish and can point out features of this platform?

h/t FB CL group

About this community

A place to share, discuss, question etc. anything and everything related to corpus linguistics and language teaching and language learning. If as a language teacher you are wondering what the point is, have a look at the first link below. And if you are looking for more technical discussions I would direct you to the Facebook corpus linguistics group. That group is worth checking, as is signing up to Corpora List and/or reading its archives. The Lextutor Facebook group should also be on your reading list. CorpusCALL is a new FB group worth checking. The image is of Sue Atkins, a lexicographer and one of the pioneers of the BNC & CoBuild projects. Read more about her in the link below.

Mura Nava
owner

Reading  - 
 
A brief history of computer concordances
A nice find by +Michael Brown which describes more of the humanities side of concordancing.

The Language Log blog [http://languagelog.ldc.upenn.edu/nll/?p=26769], which links to this piece, also links to a hilariously tragic David Lodge extract on the dangers of corpus linguistics for writers [http://itre.cis.upenn.edu/~myl/languagelog/archives/000361.html].

Which reminds me: can anyone identify the corpus David Lodge uses in one of his later books? [https://plus.google.com/+MuraNava/posts/HF8hWXAUZfh]

related:
Corpus Linguistics: The past [https://plus.google.com/+MuraNava/posts/FBWcemupNMu]
A Brief History of Computer Concordances. Michael Preston. Computer-assisted study of folklore and literature was initiated shortly after World War II by Roberto Busa, S.J., who began preparing a concordance to the works of Thomas Aquinas in 1948, and Bertrand Bronson, who made use of the ...

Mura Nava
owner

Reading  - 
 
Corpus Linguistics: The past
For those interested in the history of CL, Charlotte Taylor's (@_ctaylor) slides from Lancaster CL Summer School 2016 are available.

related:
A (brief) History of Computerised Corpus Tools - [http://timemapper.okfnlabs.org/muranava/history-of-computerised-corpus-tools]

A brief history of computer concordances - [https://plus.google.com/+MuraNava/posts/YkrMrcyN3Ju]

Mura Nava
owner

Tutorials  - 
 
Mini CL course
If you are looking for a mini-course in CL, check out The corpus and Oxford Dictionaries [http://www.oxforddictionaries.com/words/the-oxford-english-corpus]. In particular, see the Using the Corpus section [http://www.oxforddictionaries.com/words/using-the-corpus]
Neato!
h/t Rudy Loock @RudyLoock

Mura Nava
owner

Tools  - 
 
GAVAGAI LIVING LEXICON
very neat tool, like the results display

h/t williamjturkel @williamjturkel
LIKE WHAT YOU SEE? Try our Chrome Extension to be able to look up words that you find when you are browsing the web. Want to implement our word knowledge in your own applications? Sign up for a free trial of our API. Looking for in the lexicon ...

Mura Nava
owner

Tools  - 
 
Florent Perek from the University of Birmingham gave a talk at the 2016 corpus linguistics summer school on using the (new) BYU-COCA interface. His handout is linked here. H/t Viola Wiegand @violawiegand

Also, +Michael Brown writes about using the new interface for a language question that arose in his class - https://corpling4efl.wordpress.com/2016/06/22/playing-by-ear-and-verbing-by-body-parts-using-coca-to-discover-usage/

see also [https://plus.google.com/+MuraNava/posts/BuusJWJEUhN]

Mura Nava
owner

Reading  - 
 
Sinclair Lecture 2016 by Professor Michaela Mahlberg (video).
This lecture series is always worth checking.

h/t Kim-Sue Kreischer @kimsuekreischer

Mura Nava
owner

Tools  - 
 
A corpus of 16,081 pairs of matched arguments from the Web with convincingness ratings

h/t corporalist
acl2016-convincing-arguments - Code and data for ACL2016 article "Which argument is more convincing? Analyzing and predicting convincingness of Web arguments using bidirectional LSTM" by ...

Mura Nava
owner

Discussion  - 
 
Some reasons why +Michael Brown uses corpora
James Thomas has uploaded his slides from IATEFL ’16 (which I learned of by checking the always useful G+ CL group). I’m not reviewing them or discussing the finer details of his work h…

Mura Nava
owner

Reading  - 
 
Free access to all articles in the journal Corpora in May. Get them asap : )
h/t Olcay Sert @SertOlcay