Alex Steer

How not to improve social mobility

764 words | ~4 min

(Note: Links updated, April 2014)

The government has a new white paper on social mobility. It's largely fairly sensible and well researched, but the recommendation that's caused the biggest splash is for a commission, to be led by Alan Milburn, to tackle perceived barriers preventing children from poor families getting a fair crack of the whip at entering top professions: law, medicine, that sort of thing. Predictably, the Guardian loves it, and the Telegraph hates it.

I think it's hugely misconceived and will be damaging to children from poor families. Here's why.

On intergenerational mobility, for a long time the standard claim has been that it is declining in Britain. This comes from comparison of the data on the links between parents' and children's income/educational attainment from two longitudinal studies, the National Child Development Study (started 1958) and the British Cohort Study (started 1970). The later study showed lower intergenerational mobility (aka social mobility) than the earlier.

Problem is, that may no longer be true: longitudinal studies are, by their nature, always based on pretty old data. Some recent boffinry has suggested that the decline is not ongoing. However, it hasn't reversed.

In short, the big determiner of your earning power is your level of education, and that's pretty consistent between studies. What's changed is the relationship between your parents' earning power and your likely educational attainment, which has become stronger - if your parents are rich, you're more likely to do well at school, more likely to enter higher education and more likely to get a degree, and that means you're more likely to be rich than a comparably intelligent child from a poorer family. The expansion of higher education over the last thirty years has been of disproportionate benefit to affluent middle-class families, with relatively little increase in uptake by young people from poor families.

The evidence suggests that the best way to increase social mobility (or, to put it another way, to decrease the strength of the intergenerational income/education correlation) would be to target investment to improve the likelihood of young people from poor families entering and staying in higher education. This is a huge job, as it involves tackling some pretty entrenched anti-achievement cultures (one of the defining features of poverty traps) and trying to give kids from poor families whose parents don't care about education the same will to succeed as kids from rich families with parents who really care. This, bluntly, is why spending more public money on some children's educations than on others is fair, and not unfair as it seems. It's not a declaration of war on middle-class families; it's trying to give poor kids the same support that middle-class kids get at home anyway. This is not exactly the rampage of socialism it's made out to be. Yes, it's a shame that there are some families who don't give a damn about their kids' futures so the government has to bail them out, but the roots of the problem usually go back through generations of neglect, and the alternative is to admit that you're happy to see some kids fail because of who their parents are. And if you admit that, you might as well just build a wall round the sink estates and turn on the hose.

Which is where the new white paper comes in. The commission idea is half-naked electioneering, basically, and clearly an underlying bugbear of the Prime Minister's: people are calling it, not unfairly, the 'Laura Spence agenda'. The problem is, haranguing the professions because a lot of their working practices look particularly alien to kids from council estates will only annoy the professions (as it annoys Oxbridge, who put phenomenal effort into targeting kids from non-traditional backgrounds) and not get you very far. Top professions will continue to employ people with top qualifications, and these are disproportionately white, from middle-class parentage, and often privately-educated. All it does is remind people that the Shadow Cabinet is stuffed with Etonians, barristers and other assorted Tory Boys. And maybe that's all it's meant to do.

# Alex Steer (22/01/2009)

The story behind the headline (isn't as good)

282 words | ~1 min

Here are three of this week's more luminous education headlines:

Scared pupils wear stab vests in school (Independent)
School pupils wear stab vests to protect themselves from gangs, report says (Daily Telegraph)
Terrified kids 'wear stab vests at school' (Daily Mirror)

Obviously not good if true. Delving into the Telegraph article, though, we find this:

The report said teachers at one inner city comprehensive were "aware of young people wearing bullet-proof/stab-proof vests in school". One pupil told researchers he wore body armour because he "needed to".

One confirmation, and one school where teachers are 'aware of young people' doing it. This leaves us with the strong possibility that there are fewer kids wearing stab vests in schools than there are headlines about kids wearing stab vests in schools.

Sadly the report (written by Perpetuity Consulting for the NASUWT) doesn't seem to be available online, so we can't evaluate the data to see if there really is an epidemic of ironclad yoof or whether the authors are reaching a bit too far to get their hard work into the papers.

# Alex Steer (22/01/2009)

The promised land again

301 words | ~2 min

There's been no shortage of analysis of Barack Obama's victory speech in blogs, broadcasts and newspapers today. As far as I can see, though, this aspect of the speech hasn't received much attention.

Speaking of the very immediate challenges facing America - 'two wars, a planet in peril, the worst financial crisis in a century' - Obama said this:

The road ahead will be long. Our climb will be steep. We may not get there in one year or even one term, but America – I have never been more hopeful than I am tonight that we will get there. I promise you – we as a people will get there.

This, on the surface, refers to the economic, social and diplomatic challenges of the future - and it's to the future that this campaign seems always to have looked. But the image, and slight aspects of the language, of this line from the first African-American president-elect have their roots in 1968, and these words:

I just want to do God’s will. And He’s allowed me to go up to the mountain. And I’ve looked over. And I’ve seen the promised land. I may not get there with you. But I want you to know tonight that we, as a people, will get to the promised land!

They are from Martin Luther King's final speech, given on April 3rd, 1968 in Memphis, Tennessee, the day before he was assassinated. Last night's speech reminds us that this outcome did not come from nowhere, and pays a quiet but powerful tribute to the civil rights movement by putting its ethos at the heart of Obama's articulation of what the American future means.

In short, a brilliant piece of speech-writing.

# Alex Steer (05/11/2008)

The articulation of your differences

444 words | ~2 min

This article from The Age begins with the most incredibly mangled sentence:

John McCain finally succeeded in articulating clear differences in how he would tackle the economic crisis to that of rival Democrat Barack Obama.

This is made up of an impressive number of syntactic and stylistic wrong turns.

'Differences in' instead of 'differences between', making it seem like there's been a change in a single thing (e.g. 'I've noticed such a difference in you since you gave up smoking') rather than a distinction between several ('There's a big difference between lemons and melons').
The use of an indirect question as a noun phrase ('how he would tackle the economic crisis'). There's nothing wrong with this - it's fine to say 'there's a difference between how you make toast and how I make toast' - but it's followed by...
'To' as a differentiating preposition. There's a long-standing convention, which may be a bit fuddy-duddy and is often more a hindrance than a help, that one should use 'from' here: 'I am different from you', not 'I am different to you'. However, that's not the big problem. The problem is that this differentiating construction only really works after the adjective 'different'. Here it's used to follow up the phrase 'articulating clear differences in how he would tackle the economic crisis'.
'Rival Democrat Barack Obama'. This is a problem of framing. If I were a stuntman, and there were another stuntman called Bob Evans with whom I were competing for work, I could describe him as 'my rival stuntman Bob Evans'. This is pretty common, and implies that we are both stuntmen, and he is my rival. However, if I were a sewage worker, and I were in a competition to find the best person whose job title began with 'S', and Bob was in that competition too, I'd describe him as 'my rival, stuntman Bob Evans', or as 'my stuntman rival Bob Evans'. Why? Because, in 'my rival stuntman Bob Evans', 'rival' is an adjective; in the other examples, it's a noun denoting Bob. Likewise, 'rival Democrat Barack Obama' makes 'rival' look like an adjective, and so implies that McCain is a Democrat too, which he's not. 'His Democrat rival Barack Obama', or 'his rival, Democrat Barack Obama' would fit better.

In short: top work, The Age.

PS - no, this hasn't turned into a 'ranting about bad writing' blog, but this one was hilarious, and hopefully a bit instructive.

# Alex Steer (17/10/2008)

Nobody's guide to linguistic marketing

931 words | ~5 min

I'm looking at the moment for research into the application of linguistics to marketing. Compared to the work on applying neuroscience to marketing (known, imaginatively, as neuromarketing), there doesn't seem to be much. If you've seen any, please let me know.

One I have turned up is:

Zhang, Shi, Bernd H. Schmitt, and Hillary Haley (2003), 'Language and Culture: Linguistic Effects on Consumer Behavior in International Marketing Research,' pp. 228-242 in Handbook of Research in International Marketing, ed. S. C. Jain (Edward Elgar Publishing).

It's written by two marketing/business academics and a Psychology PhD student, and describes two sets of experiments performed on a group of native Chinese speakers and a group of native English speakers (none of them English-Chinese bilingual), to see whether linguistic features present in one of those languages but not the other could affect consumer behaviour. I'll only deal with the first of these experiments here, but it's instructive as a model of how not to do research into language, cognition and behaviour.

The paper starts by drawing on the Sapir-Whorf hypothesis, which is never a good sign. For those not familiar, the hypothesis (in its usual strong form) holds that the language a person speaks affects his/her thought about and understanding of the world, and behaviour in it. This sounds like it might be sensible, but, when you think about it, it isn't. Steven Pinker summarises the arguments against well in The Language Instinct, but in brief the main problem is that if you think the hypothesis is true you have to show that language is affecting thought, not just reflecting it. None of these criticisms of the hypothesis (and they've been going round for decades now) make it into this paper. That, frankly, rings alarm bells.

Now, the experiment. Chinese has a lexical class called the classifier, which doesn't exist in English. Classifiers assign nouns to semantic classes: so, for example, the classifier 'ba' denotes things that can be grasped; the classifier 'tai' is used for electrical and mechanical equipment. The authors take the Sapir-Whorf hypothesis and make a classic prediction: that native Chinese speakers will be more likely to see similarities between nouns that have the same classifier than native English speakers will be to see similarities between the equivalent nouns in English.

There's an obvious and horrible problem with this prediction: even a well-designed experiment that showed such an increased likelihood among Chinese speakers would not be showing that the Chinese language was affecting their thoughts. The most it could show would be that the Chinese language reflects certain cognitive categorizations that Chinese speakers are capable of performing. And, of course, we already know that - if the Chinese language couldn't make semantic classifications, classifiers wouldn't exist in Chinese! The logic is perfectly circular.

Nor does it imply, by the way, that because a language doesn't have classifiers its speakers are incapable of grouping nouns semantically. English doesn't have classifiers, but the authors of this paper have no real difficulty in explaining the meaning of various Chinese classifiers in English to readers who obviously have a knowledge of English! We can see that TVs, radios and computers have similarities without the classifier 'tai' to help us out.

So let's throw the Sapir-Whorf hypothesis out. Is there still the potential for a useful experiment here? Absolutely. It could still show that the classifier system makes Chinese speakers more likely to make associations between words in the same class than their English-speaking counterparts. This needn't be language affecting thought in any meaningful sense, but it could be the case that the classifiers cause a priming effect.

However, the experiment is not well designed. The participants were given pairs of nouns and asked to rate their similarity. The finding, unsurprisingly, was that Chinese speakers were more likely to rate words in the same class as similar. If this tells us anything, it tells us simply that people, given two objects whose similarity is asserted every time they are mentioned (by virtue of the classifier), will come to think of those things as similar. But this is simply priming, not language affecting thought. Worse, no attempt seems to have been made to filter out phonological effects. The very fact that both nouns in a presented pair, when included in full sentences, always have the sound 'ba' right before them, might also cause a speaker to see a connection between them that need have nothing to do with semantics.

The other experiment in the set is rather better designed, by the way, since it uses photographs instead of words, thus limiting the potential for priming. However, the potential is still there, and no attempt is made to account for it or eliminate it, nor is there anything to make a convincing case for the Sapir-Whorf hypothesis. The conclusion - that advertising targeted at native Chinese speakers may be better received if it uses words or images belonging to the same semantic class as whatever it's trying to promote - is interesting, but tells us everything about marketing and nothing much about language.

# Alex Steer (30/09/2008)

Learning linguistics with Dizzee Rascal

593 words | ~3 min

Linguistic matters can crop up in all sorts of places. Here's one from the lyrics to a song that's hanging around in the Top 10, Dance Wiv Me, by Dizzee Rascal vs Calvin Harris:

Get away from the bar Tell your boyfriend hold your jar And dance wiv me.

What might sound a bit odd here is the sentence, Tell your boyfriend hold your jar.

Tell, like other verbs of command or request, takes a dative object and the infinitive with to when it's used to form phrases of indirect command. Indirect command is a kind of indirect speech. I told Bob to clean his room is an indirect command; the direct command that it occasions or implies is Clean your room (to Bob). Other kinds of indirect speech include indirect questions (Ask Tim if he wants a pint occasions Do you want a pint?) and indirect statements (Remind Nell that her dog needs feeding occasions Your dog needs feeding).

Assuming we know that jar is slang for 'drink', there would be nothing odd about the phrase Tell your boyfriend to hold your jar, just as there is nothing odd about Ask your friend to phone the police, or Order the troops to clean their boots. The line from the song is odd because it uses the base form of the verb: the infinitive, minus the 'to'.

There are verbs in English that take an object plus the base form. Most of them are causative verbs: verbs which, unlike tell or ask, make things happen, rather than just requesting or commanding that they happen. These verbs include let, have, and make: I let him sleep on the floor, I had him weed the patio, I made him sell the car. All of these take the base form, not the infinitive: it would be odd to say I made him to sell the car in modern English. (In Middle English and Early Modern English it was acceptable.)

(This can't, by the way, be an under-punctuated bit of direct speech. It's not 'Tell your boyfriend, "Hold your jar."' That would imply that the jar belonged to the boyfriend, not the person being asked to dance, and would make no real sense.)

It's probably unwise to assume that Dizzee Rascal and Calvin Harris are showing off an instance of a widespread change in the usage of tell. In another Dizzee Rascal song, You Were Always, you'll find the line:

You were always Telling me to do this Telling me to do that.

This suggests (to go a bit Chomskyan) an oddity in performance, not in competence. Rascal and Harris clearly don't have different rules for forming indirect command phrases from the rest of us. Nor is this an error in performance (since any song is necessarily fairly deliberate), nor apparently a deliberate sociolinguistic trick (they're not impersonating any kind of language or discourse). It's an arbitrary variation of a grammatical rule by an individual language-user, which is interesting enough to send us all scurrying to Google Scholar to find out whether existing theories of syntax allow for deliberate intra-speaker syntactic variability.

Or you could just go and listen to some other music instead.

# Alex Steer (13/08/2008)

The plausibility effect; or, if it sounds true, it must be true

809 words | ~4 min

Ben Goldacre, in his excellent Bad Science column in the Guardian, recently examined what he calls the plausibility effect: the tendency for people to believe statements based on the perceived credibility of the person making them, unconnected to any evident truth value. If this is true in various branches of science and medicine, it is just as true when it comes to people's reactions to words - and, in particular, to opinions on the origins of words and their correct usage.

Sometimes people develop unusually strong attachments to their favourite stories about the origins of words, to the point that they don't much like being told, for example, that posh is not an acronym for 'port out, starboard home', the f-word never stood for 'for unlawful carnal knowledge' (or 'fornication under consent of the king'), and the window tax did not give rise to the term daylight robbery. I don't make a habit of this sort of mythbusting, by the way, unless people ask: that's probably the third quickest way to use language to lose friends and alienate people. (The second is to go around correcting apostrophes; the first is to correct people's grammar while they're talking. Or possibly instituting a language policy to marginalize a whole section of society. I can't decide.)

Even so, when people do show off their favourite etymology stories to linguists and lexicographers, and those linguists and lexicographers sigh a bit wearily and say that it's an old story, but there's no data, people often give the same reply. This is: but you can see how it could be true, or some variation thereof.

This is the plausibility effect at work in a rather strong way, being used as a kind of defence. There is good reason, the plausibility defence says, for thinking that an explanation of a word's origin is true, because it sounds like it could be true. That defence, in itself, sounds superficially plausible. But, when you think about it, it doesn't make sense.

Most explanations for the origins of things will, in the absence of evidence and context, seem plausible. That's the whole point of explanations: they allow us to make sense of things, so they must at some level be sensible. The acronym explanation of the 'f-word' is a much better explanation of its origin than the explanation which says that the word appeared in the late Middle English period of the language and its etymology can only be guessed at. The latter is a terrible explanation. Unfortunately, it's the one that's supported by the evidence.

Evidence is, after all, annoying: it gets in the way of a good explanation. That's why false explanations are neater: they can be comprehended without recourse to the messy details of the evidence, and so their narratives are cleaner and, superficially, do a better job of passing the plausibility test. This does not, it should be added, make them correct. It makes the plausibility test a very bad way of assessing the origins of words. This is a variant of the same 'easy narrative vs. evidence base' argument which, in much more controversial form, keeps the debate over intelligent design rumbling on. Intelligent design not only lacks any evidence in its favour, it is methodologically flawed to the point that it cannot be considered science. It relies entirely upon the plausibility effect, and the ease with which this can be applied in contexts - like biology, like historical linguistics - where seeking out and analysing the evidence takes a lot more time and effort than believing a good story. (An important caveat is that the ID debate, to a greater extent than etymological controversies, is fuelled by arguments in which some 'evidence', often quite detailed, is marshalled. Nonetheless, the distinction between good science and bad lies in the willingness to have one's evidence overturned by other, better evidence. This is good science. Bad science frequently involves being so convinced of the plausibility of one's initial explanation that one discounts any good evidence to the contrary instead of revising one's theory.)

If you're interested in the relationship between where words actually come from and where we like to think they do, start with Michael Quinion's excellent Port Out, Starboard Home, which should be enough to make you wary about plausible etymologies for the rest of your life. Which means, in turn, that you will be saving the sighs of linguists and lexicographers by not passing around etymological fairy stories. Which means that you will not make enemies of linguists and lexicographers. Which is good: they're not a powerful lobby, but they write very long letters.

# Alex Steer (28/07/2008)

Scottish dictionaries need money

602 words | ~3 min

Scottish lexicographers may be the most hard-to-sell charitable cause in the world, but here goes.

Scottish Language Dictionaries, a registered charity based at Edinburgh University, has had its funding cut by the Scottish Arts Council. This is very bad news for contemporary and historical Scots lexicography. SLD is responsible for producing the Dictionary of the Older Scottish Tongue, a historical dictionary of Scots up to 1700, and the Scottish National Dictionary, which covers the period from 1700 to the present. It also produces a range of more concise dictionaries. Its website, http://www.dsl.ac.uk, receives between 15,000 and 23,000 hits per day. For anyone who believes that Scots is worth studying, SLD's output is the best resource, doing for Scots what the OED does for English. It's also regularly consulted by lexicographers of English (there being no absolute boundary between English and Scots) and by historical linguists and literary historians (and historians generally). If it goes, a lot of people (and an entire country) lose a first-class linguistic resource.

SLD needs core unrestricted donor funding to replace that cut by the SAC. It's come up with a few novel ways to raise funds. It's holding a Scottish-celebrity-endorsed auction on eBay, hoping to stir up a bit of fervour for the Scots language. Dedicated word-fanatics can sponsor a word in the Concise Scots Dictionary for £20. These kinds of donation are useful, but they are also one-off. Charities need sustained income to allow longer-term financial planning. Since lexicography is a slow process, long-term funding is essential. If you want to do your bit to support Scottish lexicography, consider membership: it's £20 a year, or £9 if you're unwaged or a student.

Of course, it takes a lot of new membership applications to make up a government funding shortfall. What SLD really needs is support from major donors in the form of unrestricted funding to allow it to keep doing what it does. You can read the assessment of its unsuccessful but highly-recommended grant application to the SAC here. It requested £315,556 over the next two years, which gives an indication of the level of donation required to keep it doing what it's doing. It had a matched funding agreement in place: it's not clear from SLD's website whether this matched funding will still be provided, so it's possible SLD may need twice the requested amount. The SAC has issued a joint statement with SLD affirming its wish to find a solution to this funding problem. This is good news in the longer term, but for the immediate future SLD's operating reserves will only take it to about November before it has to start closing down parts of its operation.

So, if you're a potential donor who's interested in Scots lexicography, get in touch with SLD and ask them how you can help.

[Via: Grant Barrett, Language Log]

# Alex Steer (14/07/2008)

We do love dummy verbs. We are loving them right now.

568 words | ~3 min

Walking through London Bridge tube station the other morning, I heard the following pre-recorded announcement, or something like it:

Ladies and gentlemen, we do apologize for the reduced gate entry to the Northern Line.

I've heard similar announcements on the tube. Am I right in thinking that the emphatic do-insertion is fairly new? I don't recall having heard it before quite recently. Of course, the pre-recorded messages may vary from station to station.

Before this heads too far into trainspotter territory, it's worth pausing to think about the effect of that dummy verb. As well as being emphatic, it renders the rather dry 'we apologize for the reduced gate entry' into something perhaps slightly less formal, as well as more heartfelt. Compare 'I apologize' with 'I do apologize' when someone's in your seat on a train. (You could research this, but you'd spend a lot on tickets.) This may be because going round the houses a bit with one's language is a way of seeming unguarded, and of taking the edge off what could otherwise seem very pointed. Oddly, in English, it seems the emphatic insertion can have two levels of intensity. 'I do apologize' can be a politeness that suggests you really mean it if it comes after 'Excuse me, you're in my seat', and something much more heartfelt if it comes after 'The problem with you is you never apologize!' (Here, though, the emphasis is a result of repeating the main verb from the previous statement in contradiction of its assertion.)

Does the same blunting effect apply to the use of the present continuous with verbs expressing emotional states? 'I'm loving your work', 'I'm hating this pub': here, perhaps the circumlocution is a kind of circumspection, a way of saying something but not too directly. Or too permanently: the present continuous differs from the do/am-insertion because it also places the expressed emotion within a timeframe, one that may become discontinuous. 'I hate this pub' is fairly final. 'I am hating this pub' can refer more openly to this pub as it is now, or simply to how you're feeling about it now. If the crowds disperse, the music on the jukebox gets better, and the prices drop for happy hour, you can be loving this pub in no time. But consider your disappointment if someone you had your eye on finally turned to you and admitted, 'I'm loving you'.

All this, and I haven't mentioned McDonald's once.

Update: Or Facebook, either. Some commentators have credited their long-standing insistence that status updates start with 'is' with the popularization of this time-limited emphatic use of the present continuous. My sense is that it's been very common for longer than that, though. A possible contender for its upsurge is Do You Really Like It?, a 2001 garage classic from one-hit wonder DJ Pied Piper and the Masters of Ceremonies, with its call-and-response refrain:

Do you really like it? Is it, is it wicked? We're lovin' it, lovin' it, lovin' it! We're lovin' it like this!

Though this is total guesswork, and really just an attempt to get more strange lyrics onto this blog under the feeble cover of sociolinguistics.

# Alex Steer (09/07/2008)

Another slice of dictionary, Mr Shakespeare?

1194 words | ~6 min

Book history tells us that books are the results of processes, and that what we might think of as the 'final form' of a book might, in fact, not be.

Consider this post, for example. The activity of writing it is, of course, a process: its producer (that's me, though it's unfashionable to say so) writes text, a process which takes time. He can also delete and rewrite text. Nor is the time taken necessarily continuous: the producer can pause for lunch, go to work, go on holiday. He can even go back, in the case of a blog post, and change things after he's saved it. It's even possible that he might get bored, abandon the work (as Coleridge supposedly did when interrupted during the writing of Kubla Khan), even die, and be replaced as producer by someone else who adds more material, and gives more time to the work. (This is what happened, very notably, in the case of the medieval French poem the Roman de la Rose. It's what didn't happen to Dickens's final novel, The Mystery of Edwin Drood: it is unfinished.) I'm not dead, for the record - but you've only got my word that I'm the same person who began writing this post. Writing is a process; it takes time. We tend to think of the 'final form' as the last possible snapshot in a series, the point after which the text does not change.

But form is not just about language. You might view the 'final form' of this post, once I've hit 'publish', on a computer screen. But your 'final form' is different from someone else's, as it's on a different screen. And what if you print the post out? Then the 'final form' takes on a very different material form, no longer pixels on a screen but ink on paper. Your Penguin paperback copy of Hard Times may contain the same words as the serialization of the novel in Household Words in the 1850s, but its form is not the same. In Household Words it was surrounded by periodical journalism; in your Penguin paperback, by critical notes. In serialized form it was packaged for reading in small chunks; in paperback form, as a whole novel. Such differences in material context might affect your reading of Hard Times, just as watching a box set of the first season of The West Wing might make you react differently from someone who watched it unfold over an entire season for an hour a week. Republication is all part of the process of making and remaking texts.

As well-adjusted literary critics, we're getting used to this, but sometimes it's easy to forget, especially when the form of a work seems more or less fixed. At the extreme, consider the reaction to early modern humanist textual critiques of biblical texts, or more recent critical studies of the Koran. Holy books can sometimes resist being read as processes (or, at least, some of their interpreters can be quite resistant to this idea). Less extremely (and here we get to the point), consider the OED.

It's easy to think of the Dictionary as something immutable - or, even, to think that historically it has been so, even though it is now under revision. This can lead to attempts to treat the OED either as if it is locked permanently in the past or as if it belongs entirely to the present. The latter approach can lead to unfair accusations. One of the most common, which is my example here, is the idea that the OED's editors have been biased towards Shakespeare because they attribute to him so many first uses for words which a little light searching can antedate.

This has been a curiously long-standing accusation. (For one of its most sustained developments, see John Willinsky's book Empire of Words: The Reign of the OED.) As soon as you start to think of the OED as a process and not an object, though, it unravels.

For starters, the OED was not published en bloc. The 'first edition' was published in alphabetical fascicles (rather like serializations, but less racy) between 1884 and 1928. Supplements to OED1 were published in 1933 and in the late 1970s/early 1980s. In 1989 the texts of OED1 and the Supplement volumes were compiled to form OED2. The current revision, OED3, is adding lots of new material and revising and improving upon the old.

It's now 2008, 124 years since the first fascicle of OED1 was published. Thinking about it like that, it's very clear that the OED is a process, and that's why it's faulty thinking to accuse OED1 text of being biased towards Shakespeare because you can find an antedating for a word on the internet. The internet (and here's a shocker) did not exist at any point between 1884 and 1928. Indeed, research resources for the history of English were pretty basic compared to today, especially for the medieval and early modern periods. The OED's lexicographers used what material was available to them at the time. Shakespeare, unlike many since-rediscovered Elizabethan authors, was well represented in critical editions, concordances, etc. Put simply, if you were looking for evidence of a word in use in early modern England, and both Shakespeare and an obscure pamphleteer had used it, you might find the Shakespeare usage and not the pamphleteer because you had a Shakespeare concordance or a text of Shakespeare to hand, and the pamphlet was in some remote library somewhere, unknown. Of course, the editors and their assistants did read lots of obscure early modern texts, but they couldn't catch as much as a full-text search through Early English Books Online can today. Seen today, this can look like a pro-Shakespeare bias, but it's a selection bias, not a genuine editorial one. To try to apply the 'bias' line to first usages, in particular, is ridiculous: the OED editors always looked for the earliest uses of words they could find, and they always included them. They did not hide earlier quotations to make Shakespeare look good; they simply did not know the earlier quotations existed. That would not be pardonable today, but they are not writing it today.

The OED is being written today, though. The difference between what could be done then and what can be done now is one of the motivators for the ongoing revision: the process of making and remaking the OED needs to be resumed. New editors have picked up where old ones left off, and they change things. The OED's form now, consulted online, is no more its final form than the first fascicle was in 1884 or the completed first edition was in 1928. It keeps going, like the language it records.

# Alex Steer (01/07/2008)