
Collocations vs. cliches

12 Oct 2008 05:37 am

Written yesterday:

I'm on the plane to London, reading a review copy of a book with the - I have to say - unappealing title A Damp Squid. (Thereby hangs a tale, of course, which you can read when the book is published, in December.) It contains a lot of thought-provoking stuff about how dictionaries - in particular, the OED - are now made and what else we can learn from the tools that lexicographers use to make them.

Case in point: "collocates," words that go together naturally and relatively commonly, and "collocations," combinations of such words - for instance, "eccentric behavior" but "quirky perspective." Today's lexicographers can generate statistics about how often a given word appears before, after, and in the vicinity of other particular words. This helps them zero in on precise definitions, but the idea is interesting to me for other reasons.
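For the curious, here is a minimal sketch in Python of the kind of counting described above: how often other words show up just before, just after, and in the vicinity of a given word. It is not how any dictionary publisher actually does it; the function name `collocate_counts`, the three-word window, and the tiny sample text are all assumptions made up for illustration.

```python
import re
from collections import Counter


def collocate_counts(text, target, window=3):
    """Count words found just before, just after, and near `target`.

    `window` (an illustrative default) is how many tokens on either side
    count as "in the vicinity".
    """
    # Crude tokenizer: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    before, after, nearby = Counter(), Counter(), Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        if i > 0:
            before[tokens[i - 1]] += 1
        if i + 1 < len(tokens):
            after[tokens[i + 1]] += 1
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                nearby[tokens[j]] += 1
    return before, after, nearby


# Made-up sample text, standing in for a real corpus.
sample = (
    "His eccentric behavior puzzled the neighbors. "
    "She offered a quirky perspective on the news. "
    "Such eccentric behavior is rare."
)
before, after, nearby = collocate_counts(sample, "eccentric")
print(after.most_common(3))
```

On a real corpus the same counts would run over millions of sentences, and the most frequent entries in `after` would be the candidate collocates.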

It brings to mind clichés and the puzzle of how these differ from good collocations. Writers are constantly being told and telling themselves to use "fresh" language. If instead of "case in point," above, I'd written "case at issue," would that be fresh language, or would it just be weird? (I'd say the latter.) Is "fresh language" itself a cliché, or is it a desirable collocation? (In between.)

I'd love to be turned loose on the "corpora" - vast collections of text and speech - from which lexicographers generate those statistics. It would be fun to find out whether "fresh language" is stale and "eccentric perspective" quirky. My suspicion is that I'd just be quantifying what's known as an ear for language, and the project would be about as useful as, and useful in a similar way to, figuring out the differences in molecular composition between good and mediocre food. But let's see if I get a chance to ask the Oxonians about collocations. Information often contains surprises.

PS: My computer's power ran out before I got to the end of the book. The material toward the end is on subjects I know well, like "style wars" and "usages people hate." It furthered my suspicion that quantification has its limits.

Comments (4)

The British National Corpus is available online.

An examination of collocations could be much more fun than you think. As an example, have some adjectives evolved to become parasitic? Is, say, "ample" now found in a novel combination so rarely that it will die when "bosom" dies? Or will it adapt from its environment in romance novels to find a new home in the less ornate milieu of the event announcement by sprouting more stubbornly from "parking"?

RE: Ear for language

I was thinking that before I scrolled.

Maybe related:

Not too long ago I heard something on the radio about subvocalization while writing, and how, in combination with the task of typing, it can "lock up" the writer's brain. My family finds me infuriatingly unreachable when I am typing. It's not that I can't hear them, but there's something about fingers moving and mouth not moving, all while hearing the "music" of the words as clear as a bell in my head; pardon the hopelessly mixed metaphors, but it's like tunnel vision for my entire consciousness. (Yes, I'll get you a cup of milk just as soon as I finish this post.)

Anyways, yes, quantification has its limits. I won't bore your readers by paraphrasing Potter Stewart...

There are several quick and easy online tools for discovering the frequency and significance of collocations. Here are two of the best:

1. BYU Corpora

2. Cobuild Concordance and Collocations Sampler
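To give a feel for what "significance" can mean here, as opposed to raw frequency, this is a rough sketch of pointwise mutual information (PMI), one common association measure for word pairs. I am not claiming either tool above computes exactly this; the function name `pmi` and the sample text are made up for illustration.

```python
import math
import re
from collections import Counter


def pmi(text, w1, w2):
    """Pointwise mutual information (base 2) for the adjacent pair (w1, w2).

    Returns None if the pair never occurs, since log(0) is undefined.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_tokens = len(tokens)
    n_pairs = n_tokens - 1
    pair_count = bigrams[(w1, w2)]
    if n_pairs <= 0 or pair_count == 0:
        return None
    p_pair = pair_count / n_pairs
    p_w1 = unigrams[w1] / n_tokens
    p_w2 = unigrams[w2] / n_tokens
    # Positive PMI: the pair occurs more often than chance would predict.
    return math.log2(p_pair / (p_w1 * p_w2))


# Made-up sample text, standing in for a real corpus.
sample = (
    "His eccentric behavior puzzled the neighbors. "
    "She offered a quirky perspective on the news. "
    "Such eccentric behavior is rare."
)
score = pmi(sample, "eccentric", "behavior")
missing = pmi(sample, "quirky", "behavior")
print(score, missing)
```

A pair scores high when the two words turn up together more often than their individual frequencies would predict, which is roughly the difference between a collocation and two merely common words.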


These days we all have access to a massive corpus known as the World Wide Web. Do you want to know the relative popularity of "eccentric behavior" vs "quirky behavior"? Just punch the phrases into Google and get the page counts.