I think it’s fair to say that this year’s annual meeting attracted more panels on digital humanities than ever before (and that doesn’t even include the pre-meeting THATCamp workshops: for a good review of those, see Lisa Maruca’s post on Early Modern Online Bibliography). I’ve posted already on the use of digital technology in teaching 18thC culture, but there were still quite a large number of panels that included discussions of digital humanities – whether explicitly labelled ‘digital humanities’ or not. What interested me were the issues that kept cropping up about how digital archives design data to be searched and how they are actually searched.
I was especially intrigued, in the roundtable ‘Digital Humanities and the Archives’, by Randall Cream’s (West Chester) call for digital archives to try to mimic the joyful moment of “serendipitous discovery” in traditional archives: such “interpretive moments”, produced through unexpected answers to “unthought” problems, may be difficult to reproduce in digital archives, which depend so much upon naming, cataloguing, and tagging. Michael Gavin addressed how one manages the digitization of plays, given the special nature of a play as both a text and a theatrical performance. For Gavin, this dual nature is not addressed in the current tagging models of TEI, and he outlined how he modified the tagging to produce an archive whose searches can be sensitive to these two play-contexts. Clearly, all were agreed that the move towards semantic tagging would enable a more human and sustainable interaction with digital data. (Semantic tagging, using XML for example, has the ability to describe concepts and meanings, as opposed to HTML, which describes the nature of the document and its relation to other documents. This is a very rough definition, and I’m perfectly willing to be corrected on it.) In the ‘Poetry and the Archive’ roundtable, questions of use and searchability were again implicit. Jennifer Batt’s (Oxford) description of how the Digital Miscellanies Index could be searched was a good example of a digital resource that, perhaps paradoxically, is a more open-ended research tool: since this is an index of first and last lines and not a digital archive of texts, researchers are perhaps left to their own intuition.
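To make the rough definition of semantic tagging a little more concrete, here is a minimal sketch in Python. It assumes an invented, much simplified TEI-style fragment: the element names `sp`, `speaker`, `l` and `stage` follow TEI's conventions for drama, but the play text, the sample data and the helper names `lines_by_speaker` and `stage_directions` are made up for illustration. It is not a reconstruction of Michael Gavin's modified tagging. The point is only that, because the markup says what each piece of text is, a search can separate what is spoken from what is staged.

```python
import xml.etree.ElementTree as ET

# Invented, simplified TEI-style fragment (no namespace, no header).
# The element names follow TEI drama conventions; the play text itself
# is made up for this sketch.
SAMPLE = """
<div type="scene">
  <stage>Enter Lovewit, reading a letter.</stage>
  <sp who="Lovewit">
    <speaker>Love.</speaker>
    <l>What news is this that travels from the town?</l>
    <l>I'll read it twice before I trust a word.</l>
  </sp>
  <stage>Enter Pert.</stage>
  <sp who="Pert">
    <speaker>Pert.</speaker>
    <l>Your coach is ready, sir, and so am I.</l>
  </sp>
</div>
"""


def lines_by_speaker(xml_text, who):
    """Return the verse lines spoken by one character (the play as text)."""
    root = ET.fromstring(xml_text.strip())
    return [
        line.text
        for speech in root.iter("sp")
        if speech.get("who") == who
        for line in speech.iter("l")
    ]


def stage_directions(xml_text):
    """Return the stage directions only (the play as performance)."""
    root = ET.fromstring(xml_text.strip())
    return [direction.text for direction in root.iter("stage")]


print(lines_by_speaker(SAMPLE, "Pert"))
print(stage_directions(SAMPLE))
```

A search built on presentational markup alone would see all of this as undifferentiated paragraphs; here the two queries pull apart the two play-contexts because the tags name them.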
Whether that open-endedness is a virtue is, of course, arguable: both Andreas Mueller (Worcester, UK) and Kyle Roberts (Loyola, Chicago), in the panel ‘Digital Approaches to Library History’, outlined digital archives that were, in effect, archives with a thesis and so imagined ways of searching that would be directed towards research problems specific to their archives (in this case, library collections that are extant or are now dispersed). Roberts, on the Dissenting Academies Online project, aimed to create a “virtual library” system able to comprehend multiform library catalogues and records, including author catalogues, short list catalogues, and borrowing registers for 12,000 titles, 45,000 borrowings and over 600 borrowers. What was described was a process of tagging that enables the user to track borrowing by individual “borrower profiles” and the borrowing of individual books; to profile the development and use of a particular library collection over time; and to reveal shelving habits and systems. Mueller’s collaboration with the Hurd Library (the still-extant library of Bishop Richard Hurd (1720-1808)) also aimed at a “virtual” library, but through digital visualization. Shelving catalogues and the few surviving original shelf marks, together with digital images of the shelves and a digital schematic loaded with data, may enable users to research how this man of letters interacted not only with the books in his collection but also with the space of his library. The data mapped into the visualization would be garnered from Hurd’s annotations, letters and entries in his commonplace books. While I have to declare an interest in the Hurd Library collaboration, it seems to me that these two projects have an important contribution to make in rethinking library history.
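The idea of tracking borrowing both by borrower and by book can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Borrowing` record, the sample register and the functions `borrower_profiles` and `title_histories` are all hypothetical, and nothing here reflects how Dissenting Academies Online actually stores or tags its data.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Borrowing:
    """One line of a (hypothetical) borrowing register."""
    borrower: str
    title: str
    borrowed_on: date


# Invented records for illustration only.
REGISTER = [
    Borrowing("Borrower A", "Sermons, vol. 1", date(1795, 3, 2)),
    Borrowing("Borrower B", "Sermons, vol. 1", date(1795, 1, 15)),
    Borrowing("Borrower A", "Natural History", date(1795, 2, 10)),
]


def borrower_profiles(register):
    """Map each borrower to the titles they borrowed, in date order."""
    profiles = defaultdict(list)
    for entry in sorted(register, key=lambda e: e.borrowed_on):
        profiles[entry.borrower].append(entry.title)
    return dict(profiles)


def title_histories(register):
    """Map each title to its (date, borrower) loans, in date order."""
    histories = defaultdict(list)
    for entry in sorted(register, key=lambda e: e.borrowed_on):
        histories[entry.title].append((entry.borrowed_on, entry.borrower))
    return dict(histories)


print(borrower_profiles(REGISTER))
print(title_histories(REGISTER)["Sermons, vol. 1"])
```

The same records answer two different questions depending on which key they are grouped by, which is roughly what a “borrower profile” and the borrowing history of an individual book amount to.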
But design is only one half of the process, and while designing digital archives involves thinking carefully about the questions a user asks of the archive, two panellists on the ‘Digital Humanities and the Archives’ roundtable raised interesting questions about the ways and results of searching a digital archive from the user’s perspective (in both cases, the archive was ECCO). Bill Blake (NYU) asked “what makes a good keyword search”, and produced a list of popular search terms (“slavery” coming top). He suggested that many users had an impulse to “retrieve” rather than “search” and that the poorest keyword search terms effectively reproduced what was in the archive (“slavery”, one of the most popular search terms, was a good example of this). He argued that the best searches operated on a conceptual level. Indeed, that is what I’ve been training my own students to do. Many of them first tried ECCO with a broad topic-based search term: they discover that the results of such search terms are useless and relatively quickly begin to think about the processes involved in deciding on a better search term (a factor I thought Bill Blake’s paper rather underplayed). Sayre Greenfield (Pittsburgh) posed a rather different problem with search results: what about “interpreting lack of results”? He argued that one can only “confirm the validity of negative results” by comparison to positive results elsewhere. His example was a phrase search for “Ay, there’s the rub”, which resulted in only two (!) hits in ECCO; searching the Burney Collection resulted in a much larger number of hits, evidence that in the eighteenth century this particular phrase of Shakespeare’s inhabited the “cultural micro-climate” of journalism and not literary discourse (ECCO doesn’t include journals and newspapers).
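Greenfield's point about negative results can be put as a small sketch. The two tiny corpora below are invented in-memory lists, not ECCO or the Burney Collection, and the function names `phrase_hits`, `compare_hits` and `absence_is_informative` are hypothetical. The sketch only illustrates the logic that an absence in one collection becomes interpretable once the same search finds something elsewhere.

```python
def phrase_hits(documents, phrase):
    """Count the documents that contain the phrase, ignoring case."""
    needle = phrase.casefold()
    return sum(1 for doc in documents if needle in doc.casefold())


def compare_hits(corpora, phrase):
    """Run the same phrase search against every named corpus."""
    return {name: phrase_hits(docs, phrase) for name, docs in corpora.items()}


def absence_is_informative(hits, name):
    """A zero count means something only if another corpus has hits."""
    others_have_hits = any(
        count > 0 for other, count in hits.items() if other != name
    )
    return hits[name] == 0 and others_have_hits


# Invented sample texts for illustration only.
CORPORA = {
    "books": [
        "A treatise upon the passions and their causes.",
        "Essays moral and political.",
    ],
    "newspapers": [
        "Ay, there's the rub, as the saying goes.",
        "The minister hesitates, and there's the rub.",
        "Shipping news from Bristol.",
    ],
}

hits = compare_hits(CORPORA, "there's the rub")
print(hits)
print(absence_is_informative(hits, "books"))
```

If every corpus returned zero, the sketch would report the absence as uninformative: the search itself (spelling, OCR, phrasing) might be at fault, which is exactly the worry about unconfirmed negative results.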
Managed serendipity, anyone?