Alex Steer

Better communication through data

Vine, Instagram and fake digital trends

876 words | ~4 min

Three recent posts from the Marketing Land blog tell a morality tale on the perils of believing your own hype when it comes to digital trends.

On June 8th, the blog published an article titled 'Vine passes Instagram in total Twitter shares'. It hinged on this chart, from Topsy Analytics, apparently showing volumes of tweets posted to Twitter containing links to Instagram and Vine:

So far, so interesting. Plucky little Vine overtaking its giant competitor.

Then less than three weeks later, another article: 'A Week After Instagram’s Video Launch, Vine Sharing Tanks On Twitter'. And with it, an update of the same chart:

Taken together, these two would suggest that Instagram and Vine were locked in a head-to-head struggle, with some dramatic reversals of fortune. Very exciting stuff for those who love a good trend. First, Vine is the underdog that races into the lead. Next, Instagram knocks it back by introducing video sharing. A rollercoaster of trend-based action.

But hang on. Because this should give us all reason to raise an eyebrow. A quick bit of Googling tells us that Instagram has 130 million active users, while Vine has only 13 million. Are we really to believe that the two services are neck-and-neck?

To Topsy's credit, they saw the story brewing and posted a response. They explained:

The free Topsy service generates trend charts using a sample of the most influential people and tweets. This allows users to see emerging trends among influencers in real time... while the free service gives you a high-level snapshot of the momentum and direction of the social conversation around a topic or domain, Topsy Pro gives you the complete and unfiltered picture that accounts for every single tweet. Because influencers tend to move faster than the general social media population to try the newest things—which is part of what makes them influential—new trends or changes in the direction of trends can appear amplified in charts generated by our free service.
And they published this chart, showing the real ratio of Instagram to Vine shares:

![Instagram and Vine trends chart](http://farm4.staticflickr.com/3735/9167979252_dbf718b04d_o.png)

In short, Vine has grown a lot recently, and has suffered a recent dip. But the neck-and-neck story was based on nonsense.

In fairness, Marketing Land also published a clarifying post, and added a 'postscript' to the two original articles (both still up) pointing out that the analysis was wrong; but I don't think either correction is given due prominence. Ideally they should take those two posts down and replace them with a retraction.

Topsy, on the other hand, are to be congratulated for handling this well - even though they should probably have a much bigger warning sign on their free product.

But mainly, this is a story about the dangers of unchecked trends. This sort of story can be hugely influential. If you're a digital publisher and you only read the first one, you might decide to shift your efforts from Instagram to Vine; or from Vine to Instagram, if you read the second. As it is, you'd be crazy to act on the basis of either piece of 'information'. But you wouldn't necessarily know that.

Marketers are bombarded with trends by an increasing number of digital marketing and analytics companies. If the marketers don't know the right questions to ask, they are vulnerable to plausibility bias: the tendency to believe that stories constitute evidence, just because they contain numbers and are told by people who sound like they know what they're talking about.

So if you're a marketer, and someone shows you a trend, here are three questions you should always ask:

  1. How are these metrics calculated? Always ask this for metrics that appear to be something other than simple counting - especially scores like 'sentiment', 'influence' or 'customer value'.
  2. Is the data from a single source? Always ask this if someone is asking you to make a comparison. If I show you data on smartphone penetration in the UK vs Botswana, how do you know the data's been collected using the same method and represents a fair comparison?
  3. Is the data sampled? Always ask this. Just always. And then find out how it's sampled and how you can be sure the sample is representative. In the case of the data above, it definitely wasn't.
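The sampling problem behind the Topsy charts can be sketched in a few lines. The numbers below are invented purely for illustration (they are not Topsy's data): the full population favours one service heavily, while a small 'influencer' subset leans the other way, so a chart built only on the subset inverts the comparison.

```python
# Invented numbers for illustration only; not Topsy's actual data.
population = ["instagram"] * 100_000 + ["vine"] * 10_000   # all tweets
influencers = ["instagram"] * 500 + ["vine"] * 800         # sampled subset

def share(sample, service):
    """Fraction of link shares in a sample that point to the given service."""
    return sample.count(service) / len(sample)

print(f"Whole population, Vine share: {share(population, 'vine'):.0%}")    # 9%
print(f"Influencer sample, Vine share: {share(influencers, 'vine'):.0%}")  # 62%
```

The same counting function, run on two differently drawn samples, tells two opposite stories; which is why 'how is it sampled?' comes before any interpretation.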

And if you don't get sensible answers to those three, be very careful indeed.

# Alex Steer (29/06/2013)


'Big data' in 1980

326 words | ~2 min

The paragraph below contains the earliest known usage of 'big data', which is now cited in the OED's recently added entry for the term.

Written in 1980, it's from Charles Tilly's The Old New Social History and the New Old Social History. It merits reading in full. (I've added a couple of links.)

The cliometricians "specialize in the assembling of vast quantities of data by teams of assistants, the use of the electronic computer to process it all, and the application of highly sophisticated mathematical procedures to the results obtained" (Stone 1979: 11). Against these procedures, Stone lodges the objections that historical data are too unreliable, that research assistants cannot be trusted with the application of ostensibly uniform rules, that coding loses crucial details, that mathematical results are incomprehensible to the historians they are meant to persuade, that the storage of evidence on computer tapes blocks the verification of conclusions by other historians, that the investigators tend to lose their wit, grace, and sense of proportion in the pursuit of statistical results, that none of the big questions has actually yielded to the bludgeoning of the big-data people, that "in general the sophistication of the methodology has tended to exceed the reliability of the data, while the usefulness of the results seems -- up to a point -- to be in inverse correlation to the mathematical complexity of the methodology and the grandiose scale of data-collection" (Stone 1979: 13). For this eminent European social historian, the large enterprises which took shape in the 1960s have obviously lost their attractions.

Over-complicated, opaque, too proud of itself, and not useful enough. I leave it to you to judge how much has changed in the last 33 years.

# Alex Steer (22/06/2013)


Less than you assume, more than you imagine: Futureproofing online privacy

998 words | ~5 min

Bit of a long read, this. Blame cross-country rail travel.

Henry Porter, writing in the Guardian, is apoplectic about alleged efforts by GCHQ and the NSA to collect vast quantities of internet data direct from the fibreoptic cables that form the backbone of the net. He writes:

The story... must surely shake that complacency and demand a review of the profit-and-loss account in the safety versus liberty debate. And that must take in the effect the actions and views of a generation of middle-aged politicians, journalists and spies will have on people aged under 25, who may have to live with total surveillance under regimes that may be much less benign than the ones we know.

Despite this being a classic case of slippery slope rhetoric, I tend to agree. But since plenty of people will be writing about this story (as they have already) in terms of liberty vs security, I'm going to talk instead about expectations of privacy online.

What does it mean to have privacy online? In one sense, not much. Online activity is activity in a domain which is defined by communication: the transmission of information between parts of a network. By communicating over the network you are inviting third parties, not just to overhear your communication, but to be part of it. Asking for privacy in the classic sense of not being overheard is a little like asking for privacy in a game of Chinese Whispers.

But obviously this isn't satisfactory, so it can't really be what we mean when we talk informally about online privacy. Imagine that we are playing Chinese Whispers. You want to get a message to me, so you pass it through a chain of other people. They, rather obviously, know what your message is. You do not expect privacy of communication from them. But you do, reasonably, expect that they will keep your message confidential and not pass it on to others who are not in the chain.

So if I send a message from Machine A to Machine Z, and it passes through Machines B, C, D, and so on, can I reasonably expect that a stored copy of it will not be read by any person or machine not involved in its direct transmission to its destination? Or should I expect this to happen and adjust my behaviour accordingly?

Most of us think probabilistically about this, at least informally. Rather than talking in terms of absolute permission or inhibition, you figure out the probability that someone will access your message, and weigh that against the downside risk of them doing so. In other words: how likely is it that my message will go public, and how much damage would it do?
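That informal weighing-up amounts to a one-line expected-harm calculation. The function and the figures below are mine, invented purely to make the trade-off concrete:

```python
def expected_harm(p_exposure: float, damage: float) -> float:
    """Informal privacy calculus: chance of exposure times cost if exposed."""
    return p_exposure * damage

# Invented figures: a grumble in a private email is very unlikely to be
# dug out of the pile (say 1 in 100,000) but costly if it is; the same
# grumble posted publicly is near-certain to be seen.
private_email = expected_harm(p_exposure=1e-5, damage=1_000)
public_post = expected_harm(p_exposure=0.9, damage=1_000)
print(private_email, public_post)  # 0.01 vs 900
```

The damage term is the same in both cases; only the judgement about the probability of exposure changes the answer, which is exactly where the reasoning goes wrong.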

We are all aware that our online activity is part of a vast amount of similar, almost identical activity by others. So we modify our behaviour to some extent, but not as if we were being broadcast to the nation. Suppose I live in an authoritarian society and hold a critical opinion of the president. I may express this in an email to a like-minded friend, because I make a judgement that the effort required by some secret policeman to dig it out from the whole pile of online communications would be too high to make the risk of being caught badmouthing the great leader all that high.

The problem is, we're rubbish at judging risk.

The Guardian piece at the top demonstrates this in one direction. Henry Porter writes:

The two countries [Britain and the US] are rapidly perfecting a surveillance system that will allow them to capture and analyse a large quantity of international traffic consisting of emails, texts, phone calls, internet searches, chat, photographs, blogposts, videos and the many uses of Google.

Are they? Are they both capturing and analysing it? Because while capturing it may be easy (if expensive), analysing it is much harder. In particular, analysing it down to the level of individual users' individual behaviours is extremely hard, since you're effectively trying to run very granular searches on some of the largest datasets you could imagine. I suspect the author is overestimating the risk to individual liberty by underestimating the cost and complexity of the kind of operations he imagines. This is the conflation of what's plausible with what's possible.

And yet... when it comes to making this sort of judgement we also underestimate the risk, because we tend to think in terms of what we believe is possible now. Which is unwise when we're talking about permanent records of our online activity. Given time, it's perfectly legitimate to worry about what's merely plausible (any logically feasible kind of analysis), because it may become possible (thanks, Moore's Law). We also need to be aware of the fact that there are whole categories of data analysis that are possible now that were impossible a few years ago. I started my career as a dictionary editor, and when the dictionary I edited was first published in the late 19th century, there was no way to search its text except by the alphabetical ordering of its headwords. Now you can call up the results for any word in the dictionary; run regular expression queries to find words and phrases that contain fragments that interest you; and even mine the whole structure of the text for patterns.
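The shift from alphabetical lookup to pattern search can be shown with Python's `re` module. The headword list here is a toy stand-in for the dictionary's full text:

```python
import re

# A toy headword list standing in for a dictionary's text
# (the real dictionary is vastly larger; these words are illustrative).
headwords = ["fireproof", "futureproof", "proofread", "waterproof",
             "prove", "approve", "proof"]

# Alphabetical ordering was once the only access path; a regular
# expression finds every headword containing a fragment, wherever it sits.
pattern = re.compile(r"proof")
matches = [w for w in headwords if pattern.search(w)]
print(matches)  # ['fireproof', 'futureproof', 'proofread', 'waterproof', 'proof']
```

A query like this was simply impossible against the printed text: the category of analysis, not just its speed, is new.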

In short: people with access to your data can probably do less with it now than you assume, but will probably be able to do more with it in future than you imagine. Any serious debate about online privacy should include that assumption.

# Alex Steer (21/06/2013)


The 'fallacy' fallacy

746 words | ~4 min

I've mentioned a couple of times here recently that there are plenty of people in the marketing industry who try to sound smart by trying to make other people's smart pronouncements sound dumb. (Try saying that three times quickly.)

Sadly this piece on Digiday is a textbook example of shoddy thinking in this genre. It's called 'The "Big Data" Fallacy', which obviously drew my attention. Early on in the article, we find this:

Investing in a DMP is something of a credibility test, with advertisers under pressure to make this “big data” technology the central component of all marketing strategy with other pieces, including multiple DSPs and networks, plugging in to the DMP. The problem with this strategy is that it is based on a fallacy — big data is just regular data, and its something every business should already be built on.

Whoa, nelly.

'Its [sic] something every business should already be built on.' That would make it not a fallacy, then. In fact, it would make the statement - advertisers should make data a central component of their activity - a truism.

Just like if I come out and say, 'the sky is blue' (when it is), that's not a fallacy. It's just the bleeding obvious.

It rolls on:

Successful online advertising is not about accumulating data, but actually doing something with it. The DMP and DSP are elements of a larger solution. Marketers don’t need a single data platform – they need a comprehensive marketing operating system.

It would be neat if I could demonstrate that this were a fallacy. It's not - just an unsubstantiated claim. If the problem is that data is already there and not being used, the exact constitution of whatever makes it usable doesn't matter, as long as it does the job. So you immediately start to wonder what the agenda is here.

I'm going to try to avoid quoting the whole article, so I'll stop here:

The age of big data is really no different from what businesses should have been doing all along. Data is prevalent in every organization... This is important data for marketing, but it represents just one component in a well-balanced strategy. Data only provides value when matched with media, so advertisers actually need access to media operating in tandem with their data.

Okay, once again, I've no idea where this comes from. There's no premise in the argument that justifies the leap to 'media'. So one can safely assume that the author has a vested interest in connecting data to media operations. And the author works for MediaMath, who do precisely this.

There's nothing wrong with a sales pitch. In many ways this blog is itself a kind of sales pitch (albeit an odd, roundabout, geeky one), since I work with data and information in the marketing industry. But at the moment the whole domain of 'big data' (and yes, I'll do a post on that term at some point) is full of people trying to demonstrate that their way of doing things is brilliant and everybody else's is wrong. Which would be fine, if it were true.

But it isn't - or, at least, it's not demonstrably true. Big data in marketing is more or less the wild west, with lots of models going round being more or less unproven. Be wary of anyone who tells you otherwise. There are 'proven' models and there are proven models, but to my knowledge nobody has the kind of rigorously tested normative information yet that you'll find in market research, let alone the actual hard sciences after which a lot of big data practitioners are modelling themselves. (You remember, just like lots of financial engineers used to, before they accidentally blew up the economy.)

All of which means that when you see a blog post, article, conference presentation etc. that opens by dissecting someone else's 'fallacy', you should be aware that there are huge vested interests at play. And there's a good chance that the victim of the dissection wasn't fallacious at all.

I call it the fallacy fallacy: the erroneous tendency to assume that, because someone is in competition with you, he or she is wrong. It's lazy and we all need to stop using it.

# Alex Steer (20/06/2013)


Internet Explorer: makes ads about Do Not Track; tracks you on its website

425 words | ~2 min

I'd like to show you an ad. It's the latest for Microsoft Internet Explorer, and it tries to persuade us that Microsoft is on the side of consumers and their privacy concerns:

Do you see what they did there? Tugged at the heartstrings with an obvious human truth: we all have things we love to share, but we all have things we don't want people to know about us. And that, says the ad, is why Internet Explorer comes with 'Do Not Track' switched on by default, meaning that websites can't set cookies that remember users' online behaviours and preferences.

Firstly, Do Not Track has absolutely nothing to do with the kind of personal information that the ad talks about. That's the kind of information you pick up by snooping on people's personally identifiable accounts or by listening to what they say on Twitter.

If you come to a website that sets a tracking cookie, here's what that cookie can do:

  • Record which pages you visit
  • Record which actions you take (e.g. buying, adding to basket)
  • Remember your machine if it visits the site again
  • Serve up recommendations (like Amazon does) about things you may find interesting or valuable

And here's what it can't do:

  • Tell who you are
  • Tie your behaviours back to your identity
  • Tell anyone anything about you as a person
  • Harm your reputation in any way

Unless of course you choose to share your identity with it by creating an account, giving your personal details and logging in. In that case, the cookie is irrelevant as all your behaviours are logged against your account, just as they would be if you were a registered customer of an old-fashioned mail order business.
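To make that concrete, here's roughly what an anonymous tracking cookie amounts to. The cookie name and lifetime below are invented, loosely modelled on common analytics cookies, and are not Google Analytics' actual format:

```python
import uuid

def new_visitor_cookie() -> str:
    """Mint a first-party analytics cookie: an opaque random token, not a name."""
    visitor_id = uuid.uuid4().hex  # random machine identifier
    return f"visitor_id={visitor_id}; Max-Age=63072000; Path=/"

cookie = new_visitor_cookie()
# The server can recognise this browser on a return visit and log page
# views against the token, but the token itself says nothing about who
# the person behind the browser is.
```

Identity only enters the picture at the point where you log in and the site ties that token to an account.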

But that's not the best thing about the ad. The best thing is that if you go to Internet Explorer's website and take a look at the source code, you'll see that it's running Google Analytics.

Let me say that again. Internet Explorer's website is running Google Analytics.

Which drops a cookie on your computer. To track your behaviours.

Well played.

# Alex Steer (16/06/2013)


Big data and small ambitions

415 words | ~2 min

Having a drink last night with an old friend who spends a lot of time working with data and statistics, we realised we'd both heard too many talks this year that say one of two things.

The first is, Big data is going to change everything.

The second is, It's not the data that counts, it's what you do with it.

Both of which prove mainly that people like sounding smart. (Yes, I'm saying this on a blog. I know, sorry...)

More and more, my response to both of these is:

If you want to know which companies will succeed using big data, look at which ones succeed using small data.

Not exactly world-shaking, is it? But since about halfway through 2012 people have been talking about data as if it never existed before. Suddenly it's as if everyone is deeply committed to the idea that you can use information about people to find out more about their lives, needs, attitudes, values and behaviours. And it's as if this is somehow new.

It's not new, and many of the people who are suddenly banging the big data drum are to be doubted in their convictions, and in the scale of their ambition. Some (mainly on the IT side) have been gathering data for years and not making any serious use of it. Others (mainly on the advertising side) have been so consistently rude about the data that has been made available to them - by media agencies, by market research companies, by clients themselves - that you'd be forgiven for thinking that they considered data to be an impediment.

By contrast to both of these, there are people and there are organisations who for years have made every effort to make the most of any piece of information they could scrape together - using it, thinking laterally with it, taking it past the obvious and using it to keep themselves honest, and to push themselves to be more ambitious, and more creative. Trust them, not the latecomers.

Data can be an impediment to creativity if it is used badly, just as it can be used to find insights that unlock a new perspective on people. On that basis, having more data is valuable, in much the same way that having lots of bullets is useful in a fight. You just need someone who knows how to fire the gun.

# Alex Steer (15/06/2013)


How the Guardian reads the Daily Mail

442 words | ~2 min

The data team at the Guardian have created a word-tree visualisation tool which lets us query what commenters on the Daily Mail website have to say on various topics. The logic of the tool is pretty simple, as they explain:

It uses the most recent ten comments from over 100 stories featuring the words "young offenders institution" posted by the MailOnline since 2009. To use it just put in any word and it will say what comes after in any of the comments in the database. For example, if you put in the word "scum" then you can see that many users are happy to throw that word around to describe offenders. "Scum and scummer" was one inventive way that a user got their point across.

The article is a cracking read - as you'd expect, it provides example after example of crazy Daily Mail comment-bait.

But that's where I have a problem with the application of this tool - both as a linguist and as someone who works a lot on the fair and balanced use of data. Word-tree visualisation tools are useful for evaluating usage by spotting frequency patterns above the word level.
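The tool's logic, as the Guardian describe it, can be sketched in a few lines: index every word in the corpus against the words that follow it. The comments below are invented for illustration:

```python
from collections import defaultdict

# Invented comments standing in for the scraped MailOnline corpus.
comments = [
    "bring back national service",
    "bring back proper discipline",
    "this country has gone soft",
]

# Map each word to the words that follow it anywhere in the corpus.
successors = defaultdict(list)
for comment in comments:
    words = comment.split()
    for first, second in zip(words, words[1:]):
        successors[first].append(second)

print(successors["bring"])  # ['back', 'back']
print(successors["back"])   # ['national', 'proper']
```

Note that the index can only answer the questions you put to it: if you seed it with 'scum', you will get a tree of 'scum' usages, which is the heart of the problem discussed below.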

But here, the Guardian aren't using the tool to evaluate usage, so much as to judge users.

Anyone who takes at face value the Guardian's (genuine, serious and long-standing) commitment to data-driven journalism should question the approach they've taken in this piece. A quick look suggests that they've picked lemmas (words and phrases) that are geared towards providing a quick thrill for the Guardian's readers, who (I rather suspect) enjoy taking a dim view of the parochial opinions of Mail readers. Lemmas they pick for analysis include:

  • scum
  • bring back
  • this country
  • parents

The piece continues:

Some of the old bugbears of the Daily Mail such as "human rights", "Labour", "jail", "prison", "tax" and "the judge" also make for fun reads.

I can't say this strongly enough. If you go into a piece of analysis with strongly-held prejudices, you will tend to find things that confirm those prejudices, because that's what you'll be looking for. The bigger the data set, the more you will find to confirm what you already believe. (It's one of the reasons why big data analysis in data-rich but complex domains like economics is so fraught with error.) That kind of lazy poke-the-monkey analysis is, I'm sad to say, exactly what the Guardian is guilty of here. They should know better, and should do better.

# Alex Steer (27/05/2013)


Microsoft: Work From Here. Until you drop.

324 words | ~2 min

If you travel by train in the UK you might have seen Microsoft's new ad campaign for Office 365. It's called 'Work From Here' and it's a really smart piece of media thinking. Creative Review has the details - they've taken over train stations, railway carriages, ticket halls etc., to reinforce the point that with Office 365 you can work from anywhere.

Thing is, while I like the media (and the creative), I really don't like the thought.

Read the above: 'Here's where Lisa finalised the figures and posted them just after eight.'

Get a life, Lisa.

There are plenty more in the same vein. I saw one about a guy called Ben getting some documents to his angry boss in the nick of time. Poor Ben.

If 'work from here' is supposed to be liberating, I think it has the opposite effect. Office 365 comes across as the successor to the Crackberry, a tool that makes sure you can never blame the mere fact of being away from your desk - or it being night-time, or a weekend, or anything else - for your failure to deliver some unspecified documentation to someone, somewhere. Because whichever 'eight' Lisa is sending her documents at, morning or evening, it's probably a time she should be seeing friends or loved ones instead of Excel-jockeying her life away.

So thanks, Microsoft, for reminding us that wherever we are, whatever we're doing, in any railway station throughout the land, we shouldn't be buying a coffee or playing Angry Birds or browsing the weird souvenirs in WH Smiths. We should be working.

Image via Creative Review, used with thanks.

# Alex Steer (14/04/2013)


Traders, guardians and big data

698 words | ~3 min

I find it amazing that people can talk seriously about the idea that big data might spell the end of theory. After a decade in which data analysis failed to provide a credible risk model for the global financial system, you'd think we'd be a bit more circumspect about off-model risk.

But then I'm a strategist by inclination, so I would say that, wouldn't I? I tend to believe that analysis of the available data should be balanced with thought, imagination, speculation. Rather than prolonging the argument, it's worth looking at why certain claims are made about big data, and what they tell us about the competing visions of the future that they represent.

The emerging field of big data is an odd cultural mix at the moment, like the world of high finance was in the 1990s. It has put C.P. Snow's two cultures - in this case, advertising and technology - in the same room to see what happens. As we've seen, that can be a productive but very unstable mix. We need a way to spot each other's assumptions - and our own.

I'm grateful to Andrew Curry for introducing me to Jane Jacobs' model of Traders and Guardians, from her book Systems of Survival. It offers a nice way of thinking about how different mindsets approach the problem of progress (social, technological, etc.). I quote from Mary Ann Glendon's review of the book:

Because traders’ prosperity depends on making reliable deals, they set great store by policies that tend to create or reinforce honesty and trust: respect contracts; come to voluntary agreements; shun force; be tolerant and courteous; collaborate easily with strangers. Because producers for trade thrive on improved products and methods they also value inventiveness, and attitudes that foster creativity, such as "dissent for the sake of the task"...
Guardians prize such qualities as discipline, obedience, prowess, respect for tradition and hierarchy, show of strength, ostentation, largesse, and "deception for the sake of the task." The bedrock of guardian systems is loyalty. It not only promotes their common objectives, but it keeps them from preying on one another. They are wary of, even hostile to, trade, for the reason that loyalty and secrets of the group must not be for sale.

Guardians and traders fulfil different roles. Guardians give continuity, protection, standards, quality. Traders give disruption, innovation, growth, opportunity. According to Jacobs, problems arise when the two systems of behaviour get mixed up or imbalanced. When you have too many traders, you end up with the kind of overconfidence that generates instability and risk. When you have too many guardians, you end up with the kind of overconfidence that generates complacency and stagnation.

Big data contains plenty of traders - doing deals, chasing the new new thing. But also plenty of guardians - applying models and processes, promoting standards and best practice. The two need to be kept in balance. Too many traders and we'll end up selling way ahead of our capabilities, cutting corners, disregarding expectations of privacy. Too many guardians and we'll build white elephants that don't move with the times.

Traders tend to incline to the view that big data will kill theory - like all the other 'this changes everything' tech and financial fads of the last twenty years. They stand to be disappointed. Guardians tend to think it won't, that it's a useful accelerator of traditional research and analysis techniques. Both are likely to be surprised, and both need to pay close attention to the other's point of view.

If the guardians win the day, big data will stagnate into a world of heavy-duty IT platforms. If the traders win, we'll find ourselves in a bubble that may burst with damaging consequences. If the two learn to balance each other's demands, they might create something of lasting value.

# Alex Steer (14/04/2013)


Startup strategy and 'framing contests'

381 words | ~2 min

By some roundabout route I found myself reading this paper today, by Sarah Kaplan of the Wharton Business School. It introduces the idea that when people get together to set strategy for a business, they end up in a 'framing contest' - a fight between competing individual worldviews, all trying to turn themselves into a consensus:

Frames are the means by which managers make sense of ambiguous information from their environments. Actors each had cognitive frames about the direction the market was taking and about what kinds of solutions would be appropriate. Where frames about a strategic choice were not congruent, actors engaged in highly political framing practices to make their frames resonate and to mobilize action in their favor. Those actors who most skillfully engaged in these practices shaped the frame which prevailed in the organization.

It's a good read, and made me think about startups - especially those in new sectors (like big data).

Startups get a lot of criticism for being willing to sell anything to anyone, often long before their strategy has been defined. And of course for selling slideware, vapourware, Photoshopware, and various other kinds of -ware which mean you don't yet have a product/service/offer and are trying to get your clients to fund your R&D.

Seen through Kaplan's lens, this becomes less a fault than an early-stage survival tactic. Rather than studying the market, setting a strategy, building a roadmap then going to the market to attract clients, startups are outsourcing their framing contests. You could call them 'framing auctions': define a minimum viable strategy, then have a lot of conversations with a lot of different clients and see which ones lead to a sale, deal, pilot project, etc. Put this way, the definitional slipperiness of most startups becomes a way of getting prospective clients to take part in your framing contest.

This can cause sleepless nights for the pure-minded, but may be a pretty smart way to start building a brand. There's a bigger question about whether strategic planning works for startups, and even for emerging sectors, or whether a broader-based foresight model is the best way to start preparing for the different ways a market might go.

# Alex Steer (13/04/2013)