10 November 2008

Over two billion words in the Oxford English Corpus, and some of them are rubbish

The Oxford English Corpus is at the heart of dictionary-making in Oxford in the 21st century and ensures that we can track and record the very latest developments in language today. By analysing the corpus and using special software, we can see words in context and find out how new words and senses are emerging, as well as spotting other trends in usage, spelling, world English, and so on. Using the corpus enables lexicographers to examine one word in detail by looking at all the different contexts in which it occurs.
The OEC now contains over two billion words. Some of these are just rubbish. One misuse that particularly irritates me is the following.
Could of and would of

The Oxford English Corpus contains about 1,000 instances of "could of" and "would of", as in "I would of stopped her". About 850 of these occur in representations of direct speech (mostly from the Fiction domain, but also from interviews and courtroom transcripts). This leaves 150 instances of "could of" and "would of" as a genuine written form, compared with 4 million instances of the standard English "would have" and "could have". However willing we may be to convert "have" to "of" in spoken English, the corpus shows that the habit has not spread into written English.
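To put those figures side by side, here is a small Python sketch of the arithmetic. The numbers are the approximate counts quoted above; the variable names are my own labels, not anything taken from the corpus software.

```python
# Approximate counts quoted in the post (not exact corpus output).
total_of_forms = 1_000        # all instances of "could of" / "would of"
in_direct_speech = 850        # instances inside represented direct speech
standard_forms = 4_000_000    # instances of "could have" / "would have"

# What remains once represented speech is set aside.
genuine_written = total_of_forms - in_direct_speech

# Share of the "of" forms that are really just reported speech.
share_in_speech = in_direct_speech / total_of_forms

# How many standard forms there are for each genuine written "of" form.
standard_per_of_form = standard_forms / genuine_written

print(f"Genuine written instances: {genuine_written}")
print(f"Share found in direct speech: {share_in_speech:.0%}")
print(f"Standard forms per genuine 'of' form: about {standard_per_of_form:,.0f}")
```

On these rounded figures, the standard forms outnumber the genuine written "of" forms by roughly 27,000 to one.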

Jeremy Butterfield has collected annoying words in the book Damp Squid: The English Language Laid Bare.

**********
I was up very early this morning.
