Monday, January 23, 2012

"Open Data" Needs to Die

Amongst all the UK GovCamp 2012 buzz, point #18 from Tom Sprints' write-up caught me as being one of the more curious:
18. A lot of “open data” sessions just seemed to me to be variations on a theme, and didn’t sell themselves to me at all. I am therefore worried that some of those discussions are either very esoteric, or insufficiently informed by people who understand the issues rather than the tech.

Where have we come from?

As a data geek (I like the word "mechanic" myself), it's been intriguing to see the conversation around "open data" change over successive GovCamps. A few years back, the question was heartily "How can we get hold of data?" - Tim Berners-Lee was starting out on his comeback tour, and mySociety were beginning to show that data could be made useful with some clever tools.

As I remember it (likely in a fairly biased narrative way), the conversation then switched fairly rapidly into "What's the best way to open up data?" - in terms of what data and what platforms were most useful to developers. Suddenly data stores had (experimental) APIs, and the public realm had massive amounts of spending data. There was some loose rhetoric about transparency and accountability, while developers picked things apart with fine Excel toothcombs.

Then things got more interesting, as it turned out everything that had happened so far didn't automagically lead to Amazing Stuff Happening. The question became a necessary "So what?" - as if transparency and accountability weren't enough by themselves! The topic turned to users and reasons and (more often) to interesting examples. Surely, somebody was clamouring for this stuff after all this?

I'm kind of hoping this explains something about why "open data" sessions are a bit fumbly-jumbly now.

Open data got complicated, quickly. Because data is complicated. Jump to the present, and conversations rapidly flit between all of the above, either because everybody is involved at the same time, or because the people who should be involved aren't.

"Open Data" is harmful

Or both. The paradox is that it's become difficult to talk about open data firstly because those who were talking about it from one point of view are now talking about it from many points of view. And secondly because those who weren't talking about it before aren't talking about it now. Data silos still exist. Most people still use Excel. Statisticians still output reports.

The term "open data" is meaningless now. Not just meaningless - actively harmful. If you're used to talking about it, then the conversation has begun to fragment and coalesce around more subtle outcrops. And if you're not used to talking about it, then you're put off because nobody can explain what it means - and more importantly, what it means to you. So you carry on as normal.

My session at GovCamp on Data Engagement was, in retrospect, an attempt to get back to the previous question of "So what?". What I really want to do is fence the conversation off from the technical, economic and political aspects of data (although I'm still into all these things) and focus on the why. I desperately tried not to use the term "open data" because I think it would have distracted the discussion. (To be honest, I wanted to find something better than "data engagement" too, hence the phrase "Everyday data".)

And I'm really glad that some of the ideas got taken up on day 2 by Tim Davies and others. A "Charter" for engaging with data really starts to delve into how we think about how to make data useful.

I admit I'm a little afraid that the term "Open Data Engagement" just makes the discussion even more vague. What does that mean to you if you have no idea what it is, or what Open Data is supposed to be? Is it all at risk of becoming another buzzword? What about "Data Usability", or "Public Data Engagement"? I'm still aware just how much I hate the terms "Public Understanding of Science" and "Public Engagement with Science". Are we going round in circles?

Should we call a Stats Spade a Stats Spade?

Many people with useful, everyday data and databases really don't think in terms of data. Because the data is about stuff they know, they think of it as "information". Maybe even a "resource". But ask them what "data" they have and they'll probably give you a back-up of their website.

One of the interesting points coming out of the Data Engagement session was that people deal with data all the time - think football, Formula 1, house prices, etc. But do people even refer to this as "data"? Or - more likely - do they call them "stats"? Mention "stats" and people think of tables, averages, and counts.

In a way, "stats" makes sense where "data" doesn't. "Information" makes sense where "data" doesn't. "Data" is tricky because it's all of this and more. It's figures, it's formats, it's visualisations. No wonder even those who understand this get confused when talking to each other. The more you try to take "Data" into the real world, the less the term applies.

Should the "open data" moniker be scrapped in favour of more "useful" terms like these? Would this make talking about implementing it more difficult, or easier? After all, any conversation on how to make data useful quickly turns away from talk of even databases and on to other issues (standards, protocols, best practice, comprehension).

Maybe if we talk about our bus times as "public information", and spending figures as "spending figures", then people will be interested in them, and we can stop trying to work out what "open" means.

Maybe.

3 comments:

Tim Davies said...

Good questions and reflections.

Whether or not open data terminology is useful is, I suspect, dependent upon audience.

If we think about 'open data' as a shift from government holding onto data, to providing it to the public, then we do, I think, still need to be able to talk about "Open Data Initiatives" (and the engagement charter is specifically targeted at the people involved in these initiatives) - but these should be thought of as similar to any government initiative (Best Value; Customer Contact Centre Initiatives etc.): things a small group of people talk to each other about, but not really public names for stuff. It's also likely that we should think of "Open Data Initiatives" not so much in isolation, but more as part of a more general class of Open Government initiatives.

The bits of open data initiatives most people will come into contact with should definitely be given much more user-friendly labels ('Bus information' etc.), and whether or not people know they have anything to do with openly licensed, machine readable, standardised data should be inconsequential.

However, there is a wider element of public awareness of the shift to open data that's needed, but perhaps it's better framed as part of public awareness of 'Right to Information and Data' - rather than 'open data' as some technical project of government.

Gareth said...

I really like this post. It makes a lot of sense for me at the moment. I've just spent 4 months working on a review with our Cllrs about whether and how the authority should publish 'open data'. These aren't retired generals, they're very switched on elected Members, but they're not that fussed about open data. They're even slightly suspicious of the whole agenda. Talk to them about open data and they glaze over, talk to them about 'the digital divide' and they're really interested.

I'm not sure what the answer is but I agree that we probably do need different ways to talk about this.

James Hendler's anecdote about Trader Joe's in Albany was a really good example. I'll try to summarise it somewhere and link back.

@grthwll

Unknown said...

Interesting questions! German computer scientists, at least, like to differentiate between "data", "information" and "knowledge" in their 101 classes. I could never really grasp the essence of these distinctions, and I suspect even those people soon forget about them again.

Add to this the term "Raw Data" that is used a lot by TBL and the scene. I have tried to introduce social scientists and philosophers to this topic and usually get the argument that there is no such thing as "raw" data, as any measurement is already determined by a priori categories or assumptions about the world.

On the one hand I like "quantification" or "quantitative" as terms, since that's what most of the data being talked about is. On the other hand, "mundanisation" is a term I think about ("Veralltäglichung" in my tongue, German): that is, trying to make this data/information/knowledge/quantification part of everyone's life or of civil society.