One provocation for Big Data

I’ve started thinking a lot about Big Data and what it could mean for museums at a time when, as Danah Boyd and Kate Crawford write, “The era of Big Data has begun.”

The two have written an excellent and provocative paper, titled Six Provocations for Big Data, about some of the weaknesses and problems associated with the use of Big Data. Chief amongst their provocations is the idea that Big Data is changing the very way we research. They write:

Big Data not only refers to very large data sets and the tools and procedures used to manipulate and analyze them, but also to a computational turn in thought and research (Burkholder 1992). Just as Ford changed the way we made cars – and then transformed work itself – Big Data has emerged [as] a system of knowledge that is already changing the objects of knowledge, while also having the power to inform how we understand human networks and community…

We would argue that Big Data creates a radical shift in how we think about research. Commenting on computational social science, Lazer et al argue that it offers ‘the capacity to collect and analyze data with an unprecedented breadth and depth and scale’ (2009, p. 722). But it is not just a matter of scale. Neither is [it] enough to consider it in terms of proximity, or what Moretti (2007) refers to as distant or close analysis of texts. Rather, it is a profound change at the levels of epistemology and ethics. It reframes key questions about the constitution of knowledge, the processes of research, how we should engage with information, and the nature and the categorization of reality. Just as du Gay and Pryke note that ‘accounting tools…do not simply aid the measurement of economic activity, they shape the reality they measure’ (2002, pp. 12-13), so Big Data stakes out new terrains of objects, methods of knowing, and definitions of social life.

This is merely one of the fascinating propositions the two put forward as they argue for a serious interrogation of the way Big Data will shape research, and problematise both the nature of the data and the way it is used in analysis.

It is a very interesting paper, and it discusses a very real issue that I think museums will increasingly have to confront in coming years, invested as we are in “the nature and the categorization of reality.” Museum collection databases are a significant cultural resource, and a knowledge asset in their own right. However, to date, they have rarely been treated as such. As an interpretive tool, museum collection data is still generally considered secondary to the object itself: it merely supports the object.

However, as we move further into this new era, an era in which data can be related, mined and aggregated with unprecedented ease and in which the value of data for knowledge production keeps increasing, museums need to address this issue. We need to think about the quality of our data, and about how we want people to be able to access and use it. We need to ask who should manage and take care of our data, and what data should be included. If our data has the potential to be as valuable to society as our objects (maybe even more so?), then surely it needs to be cared for with the same level of priority.

On my recent post about whether museums should still treat the physical space as their primary one, Mia Ridge made the following comment:

And to play devil’s advocate… there are probably lots of people who can do more interesting things with museum content online than your average museum can currently manage. That might be because of resourcing or recruitment issues, a lack of imagination, because the organisation doesn’t know how to value or get excited about online content, whatever… but maybe if they’re not going to do digital well, then museums should just open up their data and let other people get on with creating the next wave of museums online.

This too raises interesting issues for museums, however, about how best to make their data available for others to use, because effective data modelling is often complex. As Daniel W. Rasmus writes in his article Why Big Data Won’t Make You Smart, Rich, Or Pretty:

Combining models full of nuance and obscurity increases complexity. Organizations that plan complex uses of Big Data and the algorithms that analyze the data need to think about continuity and succession planning in order to maintain the accuracy and relevance of their models over time, and they need to be very cautious about the time it will take to integrate, and the value of results achieved, from data and models that border on the cryptic.

So, if Big Data is becoming increasingly important in research and in the constitution of knowledge, and yet museums are not necessarily likely to be the ones using it internally (assuming that our expertise lies elsewhere), how can we think about continuity and succession planning for our data, to ensure it remains useful for other researchers? Is this something we can even achieve?

The Linked Open Data movement is obviously going to be a part of this, but I wonder how much further we need to go. Surely moving from object-based knowledge to knowledge that integrates Big Data begins to change the very core of how museums function as knowledge institutions? And if it does, what does that mean? Is it even possible for museums to tackle this without knowing what the end result might be? Or is it too complex for all but a very few institutions (if any) to deal with? And if so, do we simply withdraw from what some believe will be the fifth wave of the technology revolution?

This zippy little article shows what 100 million calls to 311 revealed about New York. What patterns could emerge if we could analyse information about our collections on such a scale? Would it become feasible to see both the trees and the forest of the museum collection: the objects, and the large-scale contexts in which they exist? Could utilising museum collections data in this way recomplexify museum objects and collections, adding new layers of meaning and reconnecting them to the wider world of information?

I have no answers here. These are still ideas in sketch, and there is much more to be discussed as my ideas evolve on this subject. But I think it is something we should be talking about.

The Internet, GLAMs and the production of new knowledge

Through my involvement in the Digital Culture Public Sphere over the last week, one major question has surfaced time and time again during the discussions: how do we pitch GLAM organisations as being for the future, rather than simply about old things and nostalgia? Or, in other words, how can we make GLAMs sexy to politicians?

Museums are often thought of as being about “old stuff” and stories. Much of our publicly recognised value still seems to lie in the arena of nostalgia or memory. We can clearly see this in the language used in the National Culture Policy Discussion Paper, in which cultural institutions are given the following “pitch”:

The Government also funds national collecting institutions which perform a central role in preserving and making Australia’s art and culture accessible. These institutions have traditionally centred their activities on collections management which includes documentation, conservation and exhibition. However, changing community expectations of access and service have created additional areas of common interest, including education, interpretation, regional delivery and digitisation of collections.

In this policy language, cultural collecting institutions are framed almost entirely in terms of preserving art and culture and making them accessible. The value of our collections is seen to extend only as far as education and interpretation.

But right now, GLAMs have far greater potential to create new knowledge, particularly through the incredibly rich data held within and around our collections. In a data economy, we are rich in precisely the sort of data that no one else has.

Ben Goldacre published an article in the Guardian on Friday arguing for the incredible value of everyday government data. He writes:

Amazing things happen when you pull individual pieces of information together into larger linked datasets: meaning emerges, as you produce facts from figures. If you’ve ever wished you were born in the 19th century, when there were so many obvious inventions and ideas to hook for yourself, then I seriously recommend you become a coder, because future nerds will look back on this time with the exact same envy. But that leap forward will be tediously retarded if we don’t make the government allow us to use the pavements.

This is the same argument that I’ve started making with regard to GLAM collections. As I said in my Public Sphere presentation:

We cannot now even imagine the full possibilities that might come from the uploading of our collections to the Internet… Who knows what possibilities for new discovery, new knowledge and new insight lie hidden in the collections of our museums, galleries, libraries and archives? Digitising our collections and making them available online in usable forms… will lead to incredible new opportunities for cultural institutions to gain new relevance in the global knowledge economy.

GLAM collecting institutions hold extraordinary information resources that can tell incredible, and hitherto hidden, stories about the development of society and of the natural world. We should be partnering with researchers, scientists and data visualisation specialists. Although we may hold expertise on our collections at the object level, or even the collection level, there is new knowledge held within our collections that will only be liberated when we can pull together the individual pieces of information and find new meanings in them.

The Internet, and Linked Open Data, really do liberate our cultural institutions to be more than just the sum of their parts. Now might be the time when GLAMs come into their own, as public institutions that truly serve the public both offline and online.