One provocation for Big Data

I’ve started thinking a lot about Big Data and what it could mean for museums in a time when, as Danah Boyd and Kate Crawford write, “The era of Big Data has begun.”

The two have put forward an excellent and provocative paper about some of the weaknesses and problematics associated with the use of Big Data, titled Six Provocations for Big Data. Chief amongst these is the idea that Big Data is changing the very way we research. They write:

Big Data not only refers to very large data sets and the tools and procedures used to manipulate and analyze them, but also to a computational turn in thought and research (Burkholder 1992). Just as Ford changed the way we made cars – and then transformed work itself – Big Data has emerged a system of knowledge that is already changing the objects of knowledge, while also having the power to inform how we understand human networks and community…

We would argue that Big Data creates a radical shift in how we think about research. Commenting on computational social science, Lazer et al argue that it offers ‘the capacity to collect and analyze data with an unprecedented breadth and depth and scale’ (2009, p. 722). But it is not just a matter of scale. Neither is enough to consider it in terms of proximity, or what Moretti (2007) refers to as distant or close analysis of texts. Rather, it is a profound change at the levels of epistemology and ethics. It reframes key questions about the constitution of knowledge, the processes of research, how we should engage with information, and the nature and the categorization of reality. Just as du Gay and Pryke note that ‘accounting tools…do not simply aid the measurement of economic activity, they shape the reality they measure’ (2002, pp. 12-13), so Big Data stakes out new terrains of objects, methods of knowing, and definitions of social life.

This is merely one of the fascinating propositions that the two put forward, as they argue for a serious interrogation of the way Big Data will shape research, and problematise both the nature of the data and the way it is used in analysis.

It is a very interesting paper, and one that discusses a very real issue that I think museums will more and more have to confront in coming years, vested as we are in “the nature and the categorization of reality.” Museum collection databases are a significant cultural resource – and a knowledge asset in their own right. However, to date, they have rarely been treated as such. Museum collection data is still generally considered as secondary to the object itself as an interpretive tool. It merely supports the object.

However, as we move further into this new era – an era when data can be related, mined and aggregated with new viscosity, and when the value of data for knowledge production increases – museums need to address this issue. We need to think about the quality of our data, and how we want people to be able to access and use it. We need to ask who should manage and take care of our data, and what data should be included. If it has the potential to be as valuable (maybe even more so?) to society as our objects, then surely it needs to be taken care of with the same level of priority.

In my recent post on whether museums should still be treating the physical space as the main one, Mia Ridge made the following comment:

And to play devil’s advocate… there are probably lots of people who can do more interesting things with museum content online than your average museum can currently manage. That might be because of resourcing or recruitment issues, a lack of imagination, because the organisation doesn’t know how to value or get excited about online content, whatever… but maybe if they’re not going to do digital well, then museums should just open up their data and let other people get on with creating the next wave of museums online.

This too raises interesting issues for museums about how best to make their data available for others to use, because effective data modelling is often complex. As Daniel W. Rasmus writes in his article Why Big Data Won’t Make You Smart, Rich, Or Pretty:

Combining models full of nuance and obscurity increases complexity. Organizations that plan complex uses of Big Data and the algorithms that analyze the data need to think about continuity and succession planning in order to maintain the accuracy and relevance of their models over time, and they need to be very cautious about the time it will take to integrate, and the value of results achieved, from data and models that border on the cryptic.

So, if Big Data is becoming increasingly important in research and the constitution of knowledge, and yet museums are not themselves necessarily likely to be the ones using it internally (assuming that our expertise lies elsewhere), how can we then think of continuity and succession planning for our data, to ensure it is useful for other researchers? Is this something we can even achieve?

The Linked Open Data movement is obviously going to be a part of this, but I wonder how much further we need to go. Surely the notion of moving from object-based knowledge to knowledge that integrates Big Data starts to essentially change the very core of how museums function as knowledge institutions? And if it does, what does that mean? Is it even possible for museums to tackle this without knowing what an anticipated end result might be? Or is this something that is too complex to be dealt with for all but a very few institutions (if any)? And if so, do we just withdraw from what some believe will be the fifth wave in the technology revolution?

This zippy little article shows what 100 million calls to 311 revealed about New York. What patterns could emerge from our collections if we could analyse information about them on such a scale? Would it become feasible to see both the trees and the forest of the museum collection – the objects, and the large-scale contexts in which they exist? Could utilising museum collections data in this way recomplexify museum objects and collections, adding new layers of meaning and reconnecting them back to the wider world of information?

I have no answers here. These are still ideas in sketch, and there is much more to be discussed as my ideas evolve on this subject. But I think it is something we should be talking about.