I think it’s fair to say that this year’s annual meeting attracted more panels on digital humanities than ever before (and that doesn’t even include the pre-meeting THATCamp workshops: for a good review of those, see Lisa Maruca’s post on Early Modern Online Bibliography). I’ve posted already on the use of digital technology in teaching 18thC culture, but there were still a large number of panels that included discussions of digital humanities – whether explicitly labelled ‘digital humanities’ or not. What interested me were the issues that kept cropping up about how digital archives design data to be searched and how they are actually searched.

I was especially intrigued, in the roundtable ‘Digital Humanities and the Archives’, by Randall Cream’s (West Chester) call for digital archives to try to mimic the joyful moment of “serendipitous discovery” in traditional archives: such “interpretive moments”, produced through unexpected answers to “unthought” problems, may be difficult to reproduce in digital archives, which depend so much upon naming, cataloguing, and tagging. Michael Gavin addressed how one manages the digitization of plays, given the special nature of a play as both a text and a theatrical performance. For Gavin, this dual nature is not addressed in the current tagging models of TEI, and he outlined how he modified the tagging to produce an archive whose searches can be sensitive to these two play-contexts. Clearly, all were agreed that the move towards semantic tagging would enable a more human and sustainable interaction with digital data (semantic tagging, using XML for example, has the ability to describe concepts and meanings, as opposed to HTML, which describes the nature of the document and its relation to other documents. If anybody wants to, I’m perfectly willing to be corrected on this very rough definition). In the ‘Poetry and the Archive’ roundtable, questions of use and searchability were again implicit. Jennifer Batt’s (Oxford) description of how the Digital Miscellanies Index could be searched was a good example of a digital resource that, perhaps paradoxically, is a more open-ended research tool: since this is an index of first and last lines and not a digital archive of texts, researchers are perhaps left to their own intuition.
It is, of course, arguable: both Andreas Mueller (Worcester, UK) and Kyle Roberts (Loyola, Chicago), in the panel ‘Digital Approaches to Library History’, outlined digital archives that were, in effect, archives with a thesis and so imagined ways of searching that would be directed towards research problems specific to their archives (in this case, library collections that are extant or are now dispersed). Roberts, on the Dissenting Academies Online project, aimed to create a “virtual library” system able to comprehend multiform library catalogues and records including author catalogues, short list catalogues, and borrowing registers for 12,000 titles, 45,000 borrowings and over 600 borrowers. What was described was a process of tagging that enables the user to track borrowing by individual “borrower profiles” and the borrowing of individual books; to profile the development and use of a particular library collection over time; and to reveal shelving habits and systems. Mueller’s collaboration with the Hurd Library (the still-extant library of Bishop Richard Hurd (1720-1808)) also aimed at a “virtual” library, but through digital visualization. Combining shelving catalogues and the few surviving original shelf marks with digital images of the shelves and a digital schematic loaded with data may enable users to research how this man of letters interacted, not only with the books in his collection, but also with the space of his library. The data mapped into the visualization would be garnered from Hurd’s annotations, letters and entries in his commonplace books. While I have to declare an interest in the Hurd Library collaboration, it seems to me that these two projects have an important contribution to make in rethinking library history.
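
For readers who like to see such things concretely, here is a very rough sketch in Python of the kind of record that could sit behind “borrower profiles” and the borrowing history of an individual book. It is my own illustration and not the Dissenting Academies Online data model: the `Borrowing` record, its fields, the two functions and the sample entries are all invented for the purpose.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Borrowing:
    """One line of a borrowing register (hypothetical fields)."""
    borrower: str
    title: str
    year: int


def borrower_profiles(records):
    """Group borrowed titles by borrower, in chronological order."""
    profiles = defaultdict(list)
    for rec in sorted(records, key=lambda r: r.year):
        profiles[rec.borrower].append(rec.title)
    return dict(profiles)


def title_history(records, title):
    """List (year, borrower) pairs for one title, oldest first."""
    return sorted((r.year, r.borrower) for r in records if r.title == title)


# Invented sample entries, not taken from any real register.
records = [
    Borrowing("Borrower A", "Title X", 1771),
    Borrowing("Borrower B", "Title X", 1769),
    Borrowing("Borrower A", "Title Y", 1770),
]

profiles = borrower_profiles(records)
history = title_history(records, "Title X")
```

The point of the sketch is only that the same tagged records can be read in two directions, by person or by book, which is what makes such an archive an archive with a thesis.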

But design is only one half of the process, and while designing digital archives involves thinking carefully about the questions a user asks of the archive, two panellists on the ‘Digital Humanities and the Archives’ roundtable raised interesting questions about the ways and results of searching a digital archive from the user’s perspective (in both cases here, this was ECCO). Bill Blake (NYU) asked “what makes a good keyword search”, and produced a list of popular search terms (“slavery” coming top). He suggested that many users had an impulse to “retrieve” rather than “search” and that the poorest keyword search terms effectively reproduced what was in the archive (“slavery”, one of the most popular search terms, was a good example of this). He argued that the best searches operated on a conceptual level. Indeed, that is what I’ve been training my own students to do, many of whose first try at ECCO was using a broad topic-based search term: they discover that the results of such search terms are useless and relatively quickly begin to think about the processes involved in deciding on a better search term (a factor I thought Bill Blake’s paper rather underplayed). Sayre Greenfield (Pittsburgh) posed a rather different problem with search results: what about “interpreting lack of results”? He argued that one can only “confirm the validity of negative results” by comparison to positive results elsewhere. His example was a phrase search: “Ay, there’s the rub” resulted in only two (!) hits in ECCO, whereas searching the Burney Collection resulted in a much larger number of hits, evidence that in the eighteenth century this particular phrase of Shakespeare’s inhabited the “cultural micro-climate” of journalism and not literary discourse (ECCO doesn’t include journals and newspapers).
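
Greenfield’s point about negative results can be put in miniature. The following Python sketch is my own toy illustration, not anything shown at the roundtable: the function names and the handful of “texts” are invented, and it does not touch ECCO or the Burney Collection. It simply runs the same phrase search over two collections and reports hits alongside the number of texts searched, so that a low count in one place can be read against a higher count in another.

```python
def count_phrase(texts, phrase):
    """Count how many texts contain the phrase, ignoring case."""
    needle = phrase.casefold()
    return sum(1 for text in texts if needle in text.casefold())


def compare_collections(collections, phrase):
    """Return {collection name: (hits, texts searched)} for one phrase."""
    return {
        name: (count_phrase(texts, phrase), len(texts))
        for name, texts in collections.items()
    }


# Invented toy collections standing in for a book archive and a newspaper archive.
collections = {
    "books": [
        "A treatise upon the passions and their government.",
        "Ay, there's the rub, cried the Captain.",
        "Sermons preached on several occasions.",
    ],
    "newspapers": [
        "The ministry must raise the supply; ay, there's the rub.",
        "AY, THERE'S THE RUB: a letter to the printer.",
        "Arrived at Portsmouth, three ships from Jamaica.",
    ],
}

result = compare_collections(collections, "Ay, there's the rub")
```

In a real archive the counts would also depend on the quality of the underlying transcription, which is one more reason to treat a lack of hits with caution.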

    1. I think Bill Blake’s distinction is a good one and alludes to the kind of thinking behind the search (and I think we’re talking about inexperienced users here). Simply putting in a search term and hoping that it will yield answers in and of itself, with little or no further research, is what I think he meant by a process of ‘retrieval’ – for example, the notion that each hit will make sense without context. ‘Searching’ suggests to me an approach that involves, at least, some reflection on the question you’re asking and some idea, however general, of what might come up and what you want to do with those results. In short, it would involve some conceptual or thesis-driven thinking. Getting my Eng. lit. undergraduate students to do good searches involves, first of all, some narrowing of the parameters – and the rule ‘never use the simple search’: begin by limiting it to a genre and/or a span of a few decades (narrowing the genre can sometimes help students judge the tone of the text). I also try to get them to think laterally, metaphorically or historically: so if they are investigating the representation of women, they might try “delicacy” instead of just “women”; for men it might be “affectation”. Of course, this doesn’t happen straight away and there’s a two-way process of them getting to know the period and its way of thinking.


    Searching is also an issue that greatly interests me. Rather than retrieval instruments, I view databases as discovery aids, and the process of using them as such changes the queries we pose and the ways those queries are articulated. Yet the databases we are using in the U.S. do not operate on semantic or meaning-based searches: their search architecture operates via traditional keyword searching. Thus, the search engines for ECCO, EEBO, and the like in the U.S. lack the capability of computer-driven conceptual searching. Have you been using the new JISC Historic Books collection to do conceptual searching? I was fascinated by this development (see EMOB post), and I had been seeking someone who had used this tool in the UK to offer some insights as well as comparisons between searches conducted on this platform and ones performed through the publishers’ sites.


  3. I, too, was intrigued by these notions of semantic searching, but I wonder how differently they would work from what we do with keywords. Would you do a search and receive an image of it within a semantic “web” of related terms? (I’ve used search platforms in the NLS that work this way). Or how? What basic information literacy should tell you is that novice users need to make inferences from initial results that allow them to refine the search in more productive directions. So how would we speed up or reinforce this process of refinement with semantic searching?


    1. @David and @Eleanor. I’m on the Advisory Panel for JISC Historic Books and their use of concept clouds was, as you point out in your informative post, Eleanor, an attempt to better enable conceptual thinking when carrying out searches. I’ve tried it myself: the most interesting conceptual linkages generally occur when you try the smaller words in the cloud. Of course, that’s maybe just me. I confess I’m not sure how the parameters of such concept clouds are designed, but I presume they are generated by your initial search, so if you’ve put in an obvious term, the concept cloud won’t necessarily give you many great leaps / levels / steps in thinking. I also think that it’s perhaps less useful for experienced researchers, who are generally several steps ahead of the cloud (and how’s that for a suggestive phrase?).


