Digital Editing Project outline and Digital editions criteria
In 2012 I started supervising an English undergraduate dissertation: this was an online digital edition, and it was my first experience of supervising a student’s digital project. What follows is a joint blog post in two parts – one from me and the other from Jess McCarthy (the student) – that reflects upon our experiences. You can see the final online edition here:
Thoughts from me, the supervisor
A couple of years ago, I decided to learn a little more about the back end of digitized primary resources. I attended a boot camp on the why and how of encoding, using XML and the protocols of the TEI, at the Digital Humanities Summer School at Oxford University. Just over a year ago (late Spring 2012) I decided that the best way to learn is to teach. At the same time, I wanted to trial the production of a digital edition of a Defoe text that used up-to-date protocols of digital editing as well as the open-access ethos of the great majority of current digitization projects. So I asked our third-year English undergraduates whether anybody would be willing to do this for their dissertation project. Luckily, I had a volunteer, Jessica McCarthy.
I left it up to Jess to decide which Defoe texts she would like to work on: as with any large-scale project, sustaining enthusiasm is essential. It also meant that Jess would find out a lot for herself about Defoe’s writings. However, an important factor was that I was not expecting Jess to spend time transcribing the text, and so we had to source a reliable electronic copy in plain text. This would give Jess the freedom to decide how she wanted to encode it and how it would be presented online. It also occurred to me that the question of a ‘reliable’ electronic copy in plain text was an interesting issue for discussion in itself: what different kinds of texts and what kind of reliability are offered by, for example, Project Gutenberg, Google Books, Jack Lynch’s Eighteenth-Century Resources, or Romantic Circles? Examples that directly raised other questions were close by: at Bath Spa University we are lucky enough to have access to the large-scale digital resources of EEBO and ECCO. Texts accessed via these different resources come in various forms: digital facsimiles, plain text transcriptions from post-1800 print editions, hyperlinked and encoded texts, or a combination of plain text and facsimile texts. So this first stage of the project actually involved a deeper understanding of the nature of existing electronic resources, databases and archives, and would more effectively immerse Jess in important questions concerning the format and usability of, and access to, historical literary texts. How are issues of access related to the kind of texts one is accessing? What does the format of these texts have to say about how they can be used and who is using them? What processes are involved in producing the type of text available on these resources? What is a ‘text’ in a digital context anyway?
Such questions are important, first, because undergraduate students do not often understand why different online resources look and feel the way they do. So I try to make explicit to students the differences between a facsimile, an edition, and an encoded text, and the significance of those differences for how the text is to be used and by whom. The facsimile is usually easy to understand; although, in the case of ECCO for example, the relation between the image and the text (which is unseen, and is what one actually searches) is not fully grasped by many undergraduates, which provokes some interesting discussion. Second, this contextual understanding is essential for students to decide what kind of edition they are going to create. Here I ask students to consider their readership or, as Dan Cohen put it in ‘The Social Contract of Scholarly Publishing’, the ‘demand side’ of scholarly publishing. Cohen argued that the print model has built-in assumptions about value and audience: ‘The book and article have an abundance of these value triggers from generations of use, but we are just beginning to understand equivalent value triggers online.’ Jess, for her own project – as you can see – decided to provide two editions to appeal to a variety of readerships: one an online edition with hyperlinked notes and a textual commentary; the other an encoding of that text. (In this, we looked to an edition on Romantic Circles as our model.)
So, back to an earlier stage of decision-making. If we were after plain text copies of eighteenth-century editions, and not texts that were edited at some point later, that left two options for sources: the Oxford Text Archive (OTA) and 18thConnect. There are currently 728 texts attributed to Defoe available via 18thConnect and 121 via OTA. Despite the ease with which one can download texts in a variety of file formats from OTA, I deliberately steered Jess towards 18thConnect because of its use of TypeWright. This software enables users to correct a number of individual eighteenth-century texts released to 18thConnect by ECCO. (As frequent users of ECCO will know, the text that users are able to search is a rather mangled version, the product of now-dated OCR software trying to decipher eighteenth-century typography via microfilm.)
I may well continue to use TypeWright in future projects, since the advantage for any student is not only the knowledge gained about the workings and limitations of large-scale digital resources like ECCO, which might normally be taken for granted, but also the added perspective gained on the processes of transformation from material document to electronic text.
Why encode and why TEI/XML?
Most databases allow one to perform searches based on a variety of categories (author, place of publication, title, date etc.) because the texts have been ordered and sorted according to these categories. One can also perform ‘all text’ searches. But I struggled, at first, to explain the limitations of this kind of markup to my students. So I’ll give you an example similar to the one I gave Jess in relation to ECCO. Let’s imagine I’m searching some works by Defoe and I want to find references to the High Church clergyman Henry Sacheverell (bap. 1674, d. 1724). Unsurprisingly there are quite a few hits, but the search misses a number of important Defoe poems. Now I happen to know Sacheverell is mentioned in More Reformation and in The Double Welcome, but ECCO didn’t find these. Why? Because in The Double Welcome his name is spelt ‘Sachevrel’, and in More Reformation it is ‘Sachavrell’. We could of course put in alternative spellings or use fuzzy searching. But this wouldn’t find more oblique references such as the one in Hymn to the Pillory, where his name is pseudo-anonymously presented as ‘S———ll’. A machine does not know this is Henry Sacheverell. Similarly, it would not correctly identify him if Defoe had ever called him ‘Henry’ or ‘old Sacha,’ or something more figurative like ‘the Devil in a pulpit’ that we human readers would be able to interpret. More importantly, what if we didn’t know how Defoe alluded to Sacheverell at all?
A machine searches for strings of symbols and cannot recognise that one string of symbols represents another, different string of symbols unless we tell it that each of those particular combinations of symbols represents the same named entity. As Lou Burnard put it, “only that which is explicit can be digitally processed,” or to put it another way, to encode is to “make explicit (for a machine) what is implicit (to a person)”.
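The point about strings and named entities can be made concrete with a minimal Python sketch. It is only an illustration under stated assumptions: the spellings are the ones quoted above, but the passage text, the titles used as labels, and the entity key `sacheverell_henry` are invented for the example and are not taken from ECCO or from Jess’s encoding. The mapping stands in for what a TEI encoder does when tagging each form of a name as a reference to the same person.

```python
# The dashed form quoted in the post, kept in one constant so both tables agree.
DASHED = "S———ll"

# Toy corpus: the spellings come from the post, but the surrounding words
# are invented placeholders, NOT quotations from Defoe.
PASSAGES = {
    "Hypothetical text A": "a sermon by Sacheverell",
    "The Double Welcome": "a sermon by Sachevrel",
    "More Reformation": "a sermon by Sachavrell",
    "A Hymn to the Pillory": "a sermon by " + DASHED,
}


def plain_search(term, passages):
    """Return the titles whose text contains the exact string `term`."""
    return [title for title, text in passages.items() if term in text]


# Explicit encoding: every surface form is mapped to one entity key.
# The key "sacheverell_henry" is a hypothetical identifier.
ENTITY_FORMS = {
    "Sacheverell": "sacheverell_henry",
    "Sachevrel": "sacheverell_henry",
    "Sachavrell": "sacheverell_henry",
    DASHED: "sacheverell_henry",
}


def entity_search(entity, passages, forms):
    """Return the titles containing any surface form mapped to `entity`."""
    wanted = [form for form, key in forms.items() if key == entity]
    return [
        title
        for title, text in passages.items()
        if any(form in text for form in wanted)
    ]


if __name__ == "__main__":
    print(plain_search("Sacheverell", PASSAGES))
    print(entity_search("sacheverell_henry", PASSAGES, ENTITY_FORMS))
```

The plain search only finds the passage that happens to use the modern spelling; the entity search finds all four because a person has already recorded that the four strings name the same man. The machine still interprets nothing: the interpretation has simply been written down where it can be processed.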
For me, then, the project has enabled me to reflect upon strategies for teaching digital technology and identifying – or beginning to – what issues are essential to introduce to students: the how and why of digital editing.
Jess McCarthy’s perspective: decentering authority?
I’m going to take a slightly different track; I’ll be talking about how in some ways my edition decentres some of the authority of a traditional printed edition of a text.
It wasn’t until I’d started researching my reflective essay that I realised that my edition achieves this, to an extent, through my encoding of variants in the XML version. Most modern scholarly editions of texts work on the basis of editorial interpretation and intervention in creating a definitive edition which most closely presents the editor’s understanding of the author’s intentions. These editions are usually created through extensive use of textual apparatus, such as tables of variants and considered reasoning supporting the inclusion of one variant and the exclusion of another. Digital methods of presenting texts have brought into sharper focus how this approach to assembling an edition is based largely on the limitations of its publication medium. Marilyn Deegan and Kathryn Sutherland pointed out that,
for some the new technology has prompted the recognition of the prescriptive reasoning behind such editions as no more than a function of the technological limits of the book, less desirable and less persuasive now that the computer makes other possibilities available; namely, multiple distinct textual witnesses assembled in a virtual archive or library of forms. 
I aimed to achieve a presentation of multiple textual witnesses in my own edition by encoding variant readings into my XML document. This made it possible to present the different states of the text without privileging one state over another. This approach questions the idea of an ideal or more representative version of the text by presenting each state as equally valid and as existing simultaneously. Although I was able to present variants within my encoding without making any claims as to which witness was more authoritative, this was only really achievable within the encoded document. For example:
<l n="19">The undistinguish’d Fury of the Street,</l>
</app> Mob and Malice Mankind Greet:</l>
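To show how encoded variants can sit side by side without one being privileged, here is a minimal Python sketch that rebuilds a line as each witness reads it. It rests on loudly stated assumptions: the witness sigla `#A` and `#B` and the two readings in line 20 are invented placeholders, not the readings recorded in the edition, and the TEI namespace and header are omitted for brevity. The `<app>` and `<rdg>` elements are the standard TEI critical apparatus elements; the helper `reading_for` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Simplified fragment: no TEI namespace, no header.
# The sigla (#A, #B) and both readings in line 20 are INVENTED placeholders.
TEI_LINES = """
<lg>
  <l n="19">The undistinguish’d Fury of the Street,</l>
  <l n="20"><app><rdg wit="#A">With</rdg><rdg wit="#B">Which</rdg></app> Mob and Malice Mankind Greet:</l>
</lg>
""".strip()


def reading_for(line, witness):
    """Rebuild the text of one <l> element as the given witness reads it."""
    parts = [line.text or ""]
    for child in line:
        if child.tag == "app":
            # Keep only the reading(s) attested by the requested witness.
            for rdg in child.findall("rdg"):
                if witness in (rdg.get("wit") or "").split():
                    parts.append("".join(rdg.itertext()))
        # Text following the child element belongs to every witness.
        parts.append(child.tail or "")
    return "".join(parts).strip()


root = ET.fromstring(TEI_LINES)
line19 = root.find("l[@n='19']")
line20 = root.find("l[@n='20']")

if __name__ == "__main__":
    for witness in ("#A", "#B"):
        print(witness, reading_for(line20, witness))
```

Because there is no `<lem>` element, neither reading is marked as the preferred one: the choice of which witness to display is left to whoever processes the file.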
To present the text on the website I had to choose a copy text based on what I considered to be the most complete representation of Daniel Defoe’s intentions in A Hymn to the Pillory. I based my edition of the text on the second edition, corrected with additions. This decision was reached early in the project and it was based on the logic that this was the earliest edition available that presented a fuller version of the text. Given the common editorial practice of selecting either the first available edition or the last edition known to have been produced by the author, I would reconsider my choice of copy text were I to start again. However, despite being an unorthodox approach to a copy text, contemporary editions of A Hymn to the Pillory based on the first edition include the later additions found in the second edition, and given that variants between the two texts have been included, I don’t think that my earlier decision undermines the authority of the text presented in a significantly damaging way.
This concern might seem to conflict with my encoding of variants. There I have deliberately not identified a lemma and chosen instead to present multiple, simultaneous witnesses that destabilise the assumption that some readings are more valid than others. This approach works well if you are concerned with textual criticism or data mining to create distant readings of texts. However, I wanted my edition to be as useful as possible to the widest possible audience, so I had to consider the traditional concern of the humanities with close reading and interpretation, which depend on a stable text. Marilyn Deegan and Kathryn Sutherland acknowledge this, pointing out that ‘the editor’s exercise of proper expertise may be more liberating for more readers than seemingly total freedom of choice.’ Although digital technologies are highlighting how text can be treated differently in electronic formats, the primary concern for most readers of literature is still the meaning of the text (rather than how it was composed or its variant states); and to interpret the meaning rather than the textual history, readers need a stable edition.
I wanted to support the authority of my edition as a serious scholarly work so I included all of the textual apparatus that you would expect to find in a scholarly print edition. C. M. Sperberg-McQueen argues that ‘electronic editions without apparatus, without documentation of editorial principles, and without decent provisions for suitable display are unacceptable for serious scholarly work.’ While this doesn’t necessarily mean that apparatus for digital editions has to work in the same way or with the same concerns as print editions, it situates intellectual integrity as remaining a key concern for supporting the authority of an online edition.
I used hyperlinks as a way to discreetly point to textual annotations from A Hymn to the Pillory and also to direct readers to further online points of interest, either from the annotations themselves, or from further reading. Phillip Doss argues that ‘by allowing escape from the context of a single documentary sequence, hypertext allows a reader to escape the linearity imposed by print media.’ There are positive and negative implications to the use of hypertext links that I tried to consider within my edition. An obvious limitation of using hypertext is exactly that it allows readers to escape the linearity of the text. On the other hand, by using hyperlinks I have been able to provide easy access to extra-textual material that would not be possible to include in a print edition. For instance, where I have been able to find them, I have included works by people who are mentioned in A Hymn to the Pillory. This has meant that intertextual relationships can be explicitly explored, rather than simply acknowledged. In this way the text is shown to be the product of many influences in a way that is more difficult to achieve using physical means of publication, and although the text is still the main focus of the edition, it is presented less in isolation.
Lisa Spiro’s essay ‘“This Is Why We Fight”: Defining the Values of the Digital Humanities’ argues that ‘for the Digital Humanities, information is not a commodity to be controlled but a social good to be shared and reused.’ This is very much an attitude that I adopted in my approach to this project. My website is open access, making it freely available to anyone who wants to use the information presented. However, although this project is not formally associated with Bath Spa University, as an undergraduate studying there I had the privilege of institutional access to specialist resources that I would not otherwise have been able to use to support my research. Access to services such as the Dictionary of National Biography (DNB) and Eighteenth Century Collections Online (ECCO) allowed me to work using facsimiles of the copy text and to research biographical annotations with confidence in the reliability and authority of my sources. I chose to hyperlink these sites where I have relied on them for my research to maintain the integrity of my sources. Although this means that some users may not be able to access the sites at the end of the hyperlinks, I believe that being able to present information based on what these resources provide goes a small way to democratising the information that they contain. Working with the knowledge that not all users will be able to reference my sources, I tried to make my annotations as comprehensive as possible while still maintaining a focus on how they are relevant to the text.
At its core this project has an engaged interest in making specialist information freely available in the most useful, reliable form possible. It has supported ongoing work to make other scholarly resources more reliable by using 18thConnect’s TypeWright, and it hopes to engage with the widest possible audience not only by providing what is traditionally expected from an authoritative edition of a text but also by incorporating the formats that digital encoding supports for more specialist pursuits and longevity.
Marilyn Deegan and Kathryn Sutherland, Transferred Illusions: Digital Technology and the Forms of Print (Farnham: Ashgate, 2009), p. 87.
Transferred Illusions, p. 71.
C. M. Sperberg-McQueen, ‘Textual Criticism and the Text Encoding Initiative’, in The Literary Text in the Digital Age, ed. Richard J. Finneran (Michigan: University of Michigan Press, 1999), p. 41.
Phillip E. Doss, ‘Traditional Theory and Innovative Practice: The Electronic Editor as Poststructuralist Reader’, in The Literary Text in the Digital Age, p. 218.