While attending the Digital Humanities Summer School at Oxford University this summer, I had the chance to hear a variety of lectures. The first of these was by Chris Lintott (Department of Physics, University of Oxford), who has been involved in the development of what has been termed Citizen Science – the communal engagement with science research – and runs one of the most notable of these projects, Zooniverse. My apologies if this is somewhat after the event, but here is the gist of Chris’s talk.
Chris started with the example of the data produced by large-scale scientific research: CERN, for example, produces hundreds of terabytes of data per second during its experiments (a terabyte is roughly 1,000 GB). This is ‘Big Data’ indeed, and it pushes at the limits both of computing and of the funding of such research. As an answer to the twin problems of processing and funding the mining of such large amounts of data, crowdsourcing produces a very rich dataset: by involving multiple readers of the same data, it enables a high level of crosschecking, and it has been generating original knowledge and insight.
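To make the crosschecking idea concrete, here is a minimal sketch (in Python) of how multiple volunteer readings of a single item might be combined by simple majority vote. The labels, the 0.8 agreement threshold, and the consensus function are all invented for illustration; Zooniverse’s real aggregation pipeline is considerably more sophisticated.

```python
from collections import Counter

def consensus(classifications, threshold=0.8):
    """Return the majority label for one item, or None if the
    volunteers disagree too much and the item needs more eyes."""
    votes = Counter(classifications)
    label, count = votes.most_common(1)[0]
    if count / len(classifications) >= threshold:
        return label
    return None  # flag for expert review or further classification

# Hypothetical classifications of one galaxy image by six volunteers
print(consensus(["spiral", "spiral", "spiral",
                 "elliptical", "spiral", "spiral"]))
# -> 'spiral' (5/6 agreement clears the 0.8 threshold)
```

The point of such a scheme is that no single reader has to be right: agreement across many independent readers is what makes the resulting dataset trustworthy.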
Chris then gave a number of examples of science-related projects that use communal collaboration to mine data, the first of which was Galaxy Zoo, which analyses data from the Hubble Space Telescope. Galaxy Zoo makes it easy for non-academics to take part: as you can see on the page that asks for your help classifying types of galaxies, it is as easy as clicking a button. This is a very important feature of winning communal participation: make the first step too difficult and you’ve lost your potential researcher. Chris argued that the key to people’s participation in crowdsourcing research like this is motivation: when a survey asked what kind of involvement people preferred, the largest proportion voted to ‘contribute’. This reflected, he suggested, a powerful desire for people to own their research. Indeed, that first step led on to people forming their own specialised communities (and their own online forums) within the larger Galaxy Zoo community. In most areas of new research there are typically a number of known unknowns, so it is also key to frame task-specific fields of enquiry, managing the kind of questions you want crowdsourced.
The extension of Galaxy Zoo to encompass a number of new large-scale projects resulted in the umbrella project Zooniverse. Chris warned not to ignore the problems of scale, and specifically not to underestimate the potential number of contributors: across its various projects Zooniverse currently has 666,074 people taking part (Galaxy Zoo on its own has had around 250,000 people involved so far). While the project is dominated by astrophysics (five projects based on data supplied by space telescopes and satellites), it also includes humanities-orientated projects: transcribing papyrus documents in Ancient Lives, interpreting whale song in Whalefm, and analysing historical climate data in Old Weather. Old Weather uses the meticulously recorded weather data contained in Royal Navy ships’ logs dating back to the eighteenth century. What’s particularly interesting in this project is that the ships’ logs also include a huge variety of the day-to-day details of shipboard life – anything, in fact, that the particular duty officer chose to write down. This material is also included in the project’s database and is fully searchable, so the community is engaging with research well beyond the confines of climatology.
Chris then moved on to discuss a variety of other humanities-focused crowdsourced projects, including the Bodleian Library’s project on musical scores, What’s the Score. Commenting again on the issue of building motivation, Chris noted that the most successful crowdsourcing projects do not confront users with tutorials but use mini-help boxes supplying context as they go along: ‘dump them into the deep end’, he suggested! The New York Public Library’s project to transcribe the thousands of dishes recorded in its huge collection of historical menus is a good example. Participation in the What’s on the Menu project starts with the click of a single button (they’re up to over a million dishes). Crucial, then, is to ensure that results are immediately obvious and tangible and that engagement with the wider community is easy. The Ancient Lives project (under the Zooniverse umbrella) involves transcribing ancient papyri and uses a basic on-screen transcription keyboard. It also includes a feature called ‘Talk’: one click from the interface lets you discuss the particular image you are working on.
This led Chris to argue that ‘crowdsourcing’ may not be the right way of conceptualising the kind of work done by such communal research; gaming, he suggested, might be a more applicable model for certain projects: an alternative way to imagine the motivation and rewards of crowdsourcing. Examples here include Foldit, a game for researching protein structures, which is, it has to be said, complex and expensive. Similar, but much more user-friendly and addictive-looking, is Digitalkoot. At first glance this involves two games, ‘Mole Bridge’ and ‘Mole Hunt’, but they are in fact programs designed by the National Library of Finland to transcribe nineteenth-century Finnish-language newspapers: as you play, you transcribe. Turning analysis into gaming is obviously attractive and involves a shift in motivation. Similarly, the communal arm of the SETI project (the Search for Extra-Terrestrial Intelligence) offers various badges depending on what you have found, from an interesting signal up to an actual alien. This, however, exemplifies the potential problems of gaming and motivation: unsurprisingly, no one has yet earned the top badge in SETI. In short, Chris argued, don’t replace authentic experience and meaningful participation with goals. Instead, if we want to design projects around crowdsourcing, he reminded us that the people who want to get involved in such communal research are specialists in something: build on that.