Language models, truth and logic
Much has been made of large language models and their struggle with truth. Specifically, LLMs like ChatGPT have a tendency to turn out statements that are plausible but demonstrably false. Some of these are logical (e.g. incorrect calculations or conversions), others ontological (e.g. making up academic references).
This probably won't matter much for very long. Computers are already good at logic and calculation (e.g. working out what 9 x 5 is) and search (e.g. finding articles about Norwegian beekeeping). Things that are hard for an LLM can be done easily by another type of algorithm. These are such fundamental computing tasks that most of us barely think about them any more. Assume that at some point fairly soon we'll see progress in the development of 'supervisory' systems that can identify what kind of computation is being asked for, and triage accordingly.
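To make the idea of triage concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the function names, the crude pattern-matching rules and the three route labels are hypothetical, and a real supervisory system would need to be far more sophisticated. It simply shows that once a request is recognised as arithmetic or search, it can be handed to an algorithm that is good at that job rather than to a language model.

```python
import ast
import operator
import re

# Hypothetical illustration only: the routing rules and names are assumptions.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

# Requests made up only of digits, spaces, brackets and basic operators
# (with 'x' accepted as a multiplication sign) are treated as arithmetic.
_ARITHMETIC = re.compile(r"^[\d\s.+\-*/()x]+$")


def evaluate_arithmetic(expression):
    """Evaluate a simple arithmetic expression without using eval()."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError("unsupported expression")

    return walk(ast.parse(expression, mode="eval").body)


def triage(request):
    """Return a (route, payload) pair saying which kind of system should answer."""
    text = request.strip()
    if _ARITHMETIC.match(text) and any(ch.isdigit() for ch in text):
        try:
            return ("calculator", evaluate_arithmetic(text.replace("x", "*")))
        except (ValueError, SyntaxError, ZeroDivisionError):
            pass  # Not something the calculator can handle; fall through.
    if text.lower().startswith(("find ", "search ")):
        return ("search", text.split(" ", 1)[1])
    return ("language_model", text)
```

Under these assumptions, triage("9 x 5") returns ("calculator", 45), a request beginning "find articles about Norwegian beekeeping" is routed to search, and anything else is passed to the language model.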
Despite crowing from some corners, and legitimate concern from others (e.g. those worried about the spread of algorithmic misinformation), the flaws are interesting because they remind us what LLMs are optimised for. Beyond the usual constraints of speed and energy use, logical algorithms are optimised for accuracy; search algorithms are optimised for completeness then relevance. LLMs are optimised for plausibility, based on their resemblance to things written by people.
I think that may explain what I'd call the jittery enthusiasm towards them in professional services (e.g. marketing, consulting, law, PR). Go onto LinkedIn and you can't move for breathless commentary about the latest thing that ChatGPT has managed to do, much of it written by people who are paid to do that thing in real life. These sudden outpourings of just-in-time thought leadership, of human beings writing about machines that can write, have a distinctly 'I for one welcome our new insect overlords' feel to them. It's remarkable how many of them take the view that, actually, one shouldn't be scared of these innovations, because they'll help us work smarter. I would suggest that so many people suddenly shouting this out loud to strangers on the internet is not a symptom of mass calm.
I think LLMs represent such an affront to the professional services because those industries have previously treated plausibility as a moat against automation. Fifty years or so ago, computation began to bite into large sectors of the service economy. Many tasks like filing, document retrieval, calculation, measurement, bookkeeping and so on turned out to be trivially easy for computers to do. It bears remembering that in the older sense of the word a 'computer' was a person who did calculations for a living.
The professional services responded in two ways. First, of course, by embracing computerisation and automation of these kinds of tasks. (The 2021 US Census occupation data lists vast numbers of people who work with computers, but no longer any who are computers.) But they also responded by changing what they valued. They progressively but insistently made the case that the bits of their work that could be done by machines were not, really, the bits that truly mattered. (The process of doing this also marginalised and demeaned the people who did that work as it was progressively computerised - see Jennifer S Light’s paper ‘When computers were women’ for a survey of this.)
Take, for example, advertising media planning, a subject the maths of which are close to my heart. The foundations of this discipline are not hard to describe: they consist of identifying where to place advertisements (or other branded assets, content, etc) to maximise the attention received from a defined group of people, usually in defined circumstances (e.g. ‘when they are searching for a hammer’, ‘before next Wednesday’); and to do this for the best possible price for a given amount of attention and commercial return. As an outsider you would imagine that most of this consists of matters of fact, things that can be established one way or the other with sufficient effort. There may not be one single best way of delivering a given media budget against specified objectives, but there will certainly be one or more best ways. While any media plan is a prediction based on past results, it is a testable, measurable prediction; in other words, it is computable. Since $820 billion was spent worldwide on paid media last year, you would be right to imagine that even small improvements in these computations could be hugely commercially valuable.
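As a rough illustration of what 'computable' means here, the following Python sketch splits a budget across two channels to maximise predicted reach. It is a toy under loudly stated assumptions: the channel names, the saturation and maximum-reach figures and the exponential diminishing-returns curve are all invented for the example, not drawn from any real media data, and real planning models are considerably richer. The point is only that, given a response model fitted to past results, the allocation question has an answer that can be calculated and then tested.

```python
import math

# Invented figures for illustration only: (saturation spend, maximum reach).
EXAMPLE_CHANNELS = {
    "channel_a": (50_000, 1_000_000),
    "channel_b": (10_000, 200_000),
}


def expected_reach(spend, saturation, max_reach):
    """Assumed diminishing-returns curve: reach approaches max_reach as spend grows."""
    return max_reach * (1 - math.exp(-spend / saturation))


def total_reach(plan, channels):
    """Sum the predicted reach of every channel in a spending plan."""
    return sum(expected_reach(plan[name], *channels[name]) for name in channels)


def allocate(budget, channels, step=1_000):
    """Give each budget increment to the channel with the best marginal reach."""
    spend = {name: 0 for name in channels}
    remaining = budget
    while remaining >= step:
        best = max(
            channels,
            key=lambda name: expected_reach(spend[name] + step, *channels[name])
            - expected_reach(spend[name], *channels[name]),
        )
        spend[best] += step
        remaining -= step
    return spend
```

Calling allocate(100_000, EXAMPLE_CHANNELS) returns a spend per channel, and total_reach can then be used to compare that plan against any alternative, such as putting the whole budget into one channel.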
Yet this represents a tiny fraction of what the media services industry talks about, and what it values. When media professionals talk to themselves and to their clients, more time and effort is spent on innovation, creative uses of media, consumer insights, future trends, and other things that tend to be topics of intelligent discussion but not matters of fact. The tendency of media procurement processes to dwell on these areas suggests that clients buying media services share the belief that the computable parts are not the important parts. Now, this might all be entirely reasonable if the fundamental computation questions mentioned above were solved problems with generally-agreed solutions, and the greatest source of differentiation between media organisations were their points of view on these more speculative and forward-looking matters. However, this is not true. Good analysis and optimisation routinely unlock double-digit improvements in media delivery and return even for large advertisers.
Measurement and analysis are not everything; they are often hard; and ease of measurement can too often be mistaken for importance. But in professions whose main role is to advise businesses how to spend their money for growth, the matters of fact are too often systematically downplayed by both the buyers and the sellers of these services. Despite the enthusiasm for various exotic data techniques, the basic computable facts get too little attention. I suspect this is because people in many professional services industries have conditioned themselves to believe that the bits that are done by humans are the important bits... and many of the most human tasks in these industries are those that involve being plausible and convincing in the absence of matters of fact.
Being plausible, interesting and convincing are important, of course, when decisions need to be made and actions taken under conditions of uncertainty. We have to be able to weigh evidence, make judgement calls, build consensus, encourage and inspire others, and so on. But these are supplementary drivers of value in many professional services industries. When planning a national advertising campaign to sell dog food, the most important thing your media planner can do is not give you a point of view on emerging trends in immersive gaming; it is to tell you where, when and how to get the attention of the greatest number of eligible dog owners for your budget. Of course, a point of view on immersive gaming may help you do this better - but again, this is a testable prediction and requires an understanding of the existing facts. We tend not to like to dwell on this, because it reminds us that in many of the core value-driving parts of the professional services, computers have long since proved to be faster and more accurate than people.
And this is why, I think, generative AI represents such an affront, even if we’re all pretending it doesn’t. We have finally programmed computers to sound plausible: to come up with cogent-sounding rationales and summaries on the fly; even to produce writing and imagery that looks like it took original thought and care. In short, all the bits that we told ourselves were the important parts because they couldn’t be done by mere counting and sorting machines. Well, now many of them can - for now under supervision, perhaps soon without it. The parts that looked entirely safe turn out not to be at all.
The people claiming to be unconcerned have all the credibility of a horse salesman laughing at a Model T Ford. The expansion of AI will transform professional services the same way that the expansion of automated calculation and search did, and probably more so. We all have work to do, and we owe our colleagues and clients better than to brush it off.
# Alex Steer (05/02/2023)