So in March, I was invited to my first hack. Me, an English Literature lecturer was going to have to produce something with computers in one day? Now read on …
This was the EEBO-TCP hackfest, an event designed to launch the release into the public online domain of over 25,000 texts from the fifteenth to the seventeenth century. These texts have been curated and encoded by the Text Creation Partnership, a collaborative project between the University of Michigan, the Bodleian Library University of Oxford, and Proquest, the publishers of online database Early English Books Online. The idea of the hackfest was that humanities researchers and scholars would come together with digital researchers and technologists and create – in a day – innovative and imaginative ways of exploring, analysing, and developing this huge corpus. Now, while I’ve been tinkering with digital humanities approaches myself, I’m no programmer. Moreover, I’m an eighteenth-century-ist so I was stepping a little outside my normal safety zone. So it was with some trepidation, yet also with considerable excitement, that I dipped a toe into my first digital hack. The setting was the new Bodleian Weston library: appropriately for a day building things, it was still under construction.
It started with a speed-date. Over plenty of coffee thirty-or-so of us circulated around telling our stories and plans to anyone we could button-hole. Given humanists seem to be in the majority, most people were looking for a tech person to help out, and my case, slightly desperately so. My idea was to analyse some of the structural features of pre-eighteenth-century fiction, such as dedications, prefaces, letters to the reader, chapters, illustrations etc. But what I didn’t know was how to bring out that data from a large corpus and produce something potentially meaningful.
I needn’t have worried. Everyone was incredibly receptive and eager to make our plans work, so I found my geek (I know he’s happy with that epithet!): the extraordinarily energetic Dan Q from the Bodleian’s digital team. Together with a couple of people
working with formal features of seventeenth-century alchemy texts, we found ourselves a table and began to work out how we might visualize this structural data. And this is the part that I found really exciting: within a couple of hours I had created a sub-corpus of fiction from the total of 25,000 texts, Dan had written some code to identify and count all the structural features I could think of (with some advice from Simon Charles from the TCP project about the TEI markup), and it had started producing some figures. With the knowledge that we all had to present our work at the end of the day, I had to think of ways to set out the results to suggest some kind of point to all this: in short, the ‘so what? question. (The crude but quick answer: by putting the texts in chronological order and colour-codingour Excel sheet, a hint of a pattern emerged).
Meanwhile, others in the room were experimenting with identifying the frequency of colour words, the use of Latin, simulating the shelves of the St Paul’s book-sellers, and even creating a game based on witch-trials (this by Sarah Cole, using Twine), and a team thinking about how to make the archive user-friendly to a more diverse audience (see Sjoerd Levelt’s prize entry to the EEBO-TCP Ideas Hack competition). Given my idea was conceived off-the-cuff, it was rather splendid to share third prize with our colleagues working on the same table.
What impressed me was the advantages offered by scale of the corpus and the rigour of its markup. Both of these features of the TCP project enabled me and Dan to produce – with surprising speed – a set of results for a question that would otherwise be much more difficult to answer. But what really blew my mind was how my tech guy took my simple question to another level: Dan wondered ‘how the structural differences between fiction and non-fiction might be usable as a training data set for an artificial intelligence that could learn to differentiate between the two’ (see his own blog post on the event).
I came away a slightly different academic, no longer intimidated by big data, enthused by digital collaboration, and now a big fan of the day-long hack.
I was extremely pleased with such a positive response to my workshop on digital editing, ECCO, 18thConnect, the Oxford Text Archive, and EEBO-TCP (whew!). Thanks to all who attended and especially for the fascinating discussion that ensued. As promised here is the PDF of the slide show liberate-bsecs-2015.pdf (thanks to Laura Mandell for some of the images).
I’m already thinking about next year at BSECS – maybe a session that really is a workshop, called ‘Bring your laptop’?
This sounds very interesting indeed: the Folger library will be running a new summer institute in July 2013 on Early Modern digital humanities. I quote the announcement (on the Early Modern Digital Agendas website):
In July 2013, the Folger Institute will offer “Early Modern Digital Agendas”under the direction of Jonathan Hope, Professor of Literary Linguistics at the University of Strathclyde. It is an NEH-funded, three-week institute that will explore the robust set of digital tools with period-specific challenges and limitations that early modern literary scholars now have at hand. “Early Modern Digital Agendas” will create a forum in which twenty faculty participants can historicize, theorize, and critically evaluate current and future digital approaches to early modern literary studies—from Early English Books Online-Text Creation Partnership (EEBO-TCP) to advanced corpus linguistics, semantic searching, and visualization theory—with discussion growing out of, and feeding back into, their own projects (current and envisaged). With the guidance of expert visiting faculty, attention will be paid to the ways new technologies are shaping the very nature of early modern research and the means by which scholars interpret texts, teach their students, and present their findings to other scholars.