My talk concerned the practice of teaching digital literary studies to undergraduate students (slides and audio recording are here). I wanted to discuss the English literature student’s experience of technology in the classroom. I also talked about the meaning of digital humanities as it is deployed by both scholars and university managers, and about how the relationship between a discipline and the digital – from both an academic’s and a student’s point of view – is very different from the kind of learning technology that tends to manage students rather than enable them to become creators. Finally, I argued for a tactical pedagogy that focuses on small-scale praxis and on building and enabling connections between academic colleagues, between academics and students, and between students and the world beyond the institution.
I was 9 in 1969 and a child of the Apollo moon-landings, the film 2001: A Space Odyssey (more of which later), and TV. The time is significant because between then and my early teenage years my TV viewing was filled with the sci-fi fantasies (and re-runs) of Star Trek, the Gerry Anderson productions Thunderbirds, Stingray, Captain Scarlet, Joe 90, UFO, and Space: 1999, and the Irwin Allen productions The Time Tunnel, Lost in Space, and Voyage to the Bottom of the Sea. In the background, lights blinking in seemingly meaningful patterns, were the computers, banked in serried rows behind the human actors. (I recently found out that all the computers in the Allen films were the same ex-US Air Force air-defence computer.)
These images also found their echo in my brief flirtation with the electronic-prog-rock group Tangerine Dream: when I saw them live I gazed, not at the performers, but at the rhythmic lights of the Moog sequencers.
Such was my fascination that I remember asking my bewildered dad, who was a wood-carver and joiner by trade, for help in building a computer (I don’t remember if we ever did build a replica, although I do remember us building a satellite out of wood and tin foil).
It is perhaps telling that the most magnificent and startling computer did not have tapes or blinking lights. In Stanley Kubrick’s 1968 film 2001: A Space Odyssey (Arthur C. Clarke wrote the novel at the same time), the single camera eye of the initially benevolent computer HAL came to symbolize its terrifying implacability when it murders most of the space crew. The film has, in all sorts of ways, left its mark on me. But it is striking that HAL was not left as a homicidal nemesis. In a scene of considerable poignancy, the surviving crew-member, Dave Bowman, pulls out HAL’s circuit boards. As more and more boards are pulled out, HAL says “Dave. Stop. … I’m afraid. … My mind is going. I can feel it” and, as it slowly sings a song it had been taught, it regresses to a kind of childhood.
Unlike more recent films and TV series about artificial intelligence and consciousness (Artificial Intelligence, I, Robot, Her, Ex_Machina, Humans), this scene does not depend upon humanoid features to enable a leap of empathy. It’s a difficult trick to pull off, generating that connection for a machine that is both terrifying and awe-inspiring (and just look at my own anthropomorphic language). There is also an uneasy feeling that we are being pulled ever-so-slightly off-centre – but it is a hallmark of Clarke’s fictions, and Kubrick’s film, that such human decentring is accompanied by feelings of the sublime.
After becoming a radio amateur and working as a telecommunications engineer, the arc of my life swung away from electronica to embrace acting, punk, forming a band, going to University, and becoming a lecturer in English literature. Now embedded in the humanities and fascinated by the impact of the digital humanities, and almost permanently glued to my very own computer, I find the old interests coming back, the arc re-connecting my fascination with humans and technology. I’m not sure, even now, where it will lead, but I feel again that peculiar off-centred-ness and excitement.
Not so long ago I was reviewing a lecture I regularly present to students studying Samuel Richardson’s Clarissa. Looking back, I had no idea that this would lead me to speculate about how bibliographic data relating to English literary history is recorded in electronic databases.
It was a lecture that aimed to give some context about the ‘rise of the novel’ and I always had fun by reminding them just how illegitimate the novel was in the first half of the eighteenth century, and how even literary works formed a tiny proportion of what was published. But this time around, I thought I would actually present them with evidence. Some actual quantities. I came up with the idea of homing in on the year Clarissa was first published: 1748. My first attempt was quick and dirty.
Using the English Short Title Catalogue (ESTC) I typed in the year 1748, left everything else blank, and noted how many publications it returned (2550). I then thought it would be instructive to see how many of these were literary (in the loosest sense), so I went to Eighteenth-Century Collections Online (ECCO) and narrowed the 1748 search down via their category ‘Literature and Language’ (c.250 hits). Now to find how many of those were novels. Both databases yielded results with the subject ‘fiction’ and I then – to ram home the point – narrowed that list down to new titles published that year. Only 0.5% of all works produced in 1748 could be classified as new fiction. In a culture which perceives imaginative writing as practically synonymous with the novel, the result was a gratifying gasp of surprise from my student audience.
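The arithmetic behind those proportions can be sketched in a few lines. This is only a rough illustration using the figures quoted above (2550 ESTC hits, about 250 ECCO hits, and the 0.5% share); the variable names are my own, and because the totals come from two different databases the shares are indicative at best.

```python
# Figures quoted in the post; everything else here is illustrative.
total_1748 = 2550        # ESTC hits for the year 1748
literature_1748 = 250    # approximate ECCO hits under 'Literature and Language'
new_fiction_share = 0.005  # the 0.5% reported for new fiction

# Share of the year's output that was 'literary' in the loosest sense.
# Note: this divides an ECCO count by an ESTC total, so it is a rough guide only.
literature_share = literature_1748 / total_1748

# Roughly how many titles a 0.5% share of the ESTC total implies.
implied_new_fiction = total_1748 * new_fiction_share

print(f"Literature and Language: {literature_share:.1%} of 1748 titles")
print(f"New fiction: about {round(implied_new_fiction)} titles")
```

On these numbers, literary works in the broadest sense come to just under a tenth of the year's recorded output, and new fiction to about a dozen titles.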
However, this rough-and-ready exercise set me on a different path, and made me think about how these databases, upon which we rely so trustingly, categorise our literary heritage. The simple exercise above revealed clear disparities between these databases in both the numbers and the titles returned, and some odd things about the way ESTC and ECCO had tagged these works. For a detailed breakdown of tags and titles I found, see my spread-sheet here.
A quick bit of history. The page images available in ECCO are digital scans of microfilm photographs of the original physical copies; in other words, as Ben Pauley has pointedly remarked, a remediation of a remediation. The original microfilming was contracted out by the British Library in the mid- to late-twentieth century. These were then purchased and sold to research libraries by a US company called ‘Research Publications’ (I have a sudden flash of memory from my postgraduate days, seeing that name on the microfilm boxes as I painstakingly loaded a film into the reader). In the 1990s that company was then bought by Gale. By 2002 the microfilms had been scanned, and ECCO was launched as a commercial database in 2003. A second tranche of material (ECCO Part II) was published in 2009.
ECCO got its bibliographic meta-data (for example, details about printers, publishing history, physical description, holding libraries) from the ESTC. However, the ESTC itself has a tangled history. It began life as the Eighteenth-century Short Title Catalogue in 1977. In 1987 it extended its remit to include material from c.1472 to 1700 (incorporating data from the Short Title Catalogue of books printed in England, Scotland, and Ireland, and of English books printed abroad, 1475–1640 and the Wing catalogue which covered the period 1641–1700), and was then renamed the English Short Title Catalogue. Indeed, the precise relationship between ECCO and the palimpsest that is the ESTC is an obscure one, echoing (if you’ll forgive the pun) that between ProQuest’s database Early English Books Online, the ESTC and the Short Title Catalogue, as Bonnie Mak has elegantly pointed out.
When it comes to the question of how subject headings were assigned, there are few hard facts. However, Gale-Cengage gives some clues about this metadata on their website FAQs. At some point around 2009, just before the second tranche of digitized texts was published, the MARC (Machine-Readable Cataloging) records for ECCO were ‘enhanced’ by adding Library of Congress (LoC) subject headings. These were obtained from ‘existing’ records at libraries which held the physical copy. However, where this was not possible, ‘ESTC licensed the work of adding LoC headings.’ This process resulted in ‘[o]ver 274,000 subject headings’ being added; Gale notes that these ‘were added through the combination of harvesting and manual assignment.’
It seems there was at least considerable potential for divergence between these two systems of gathering and assigning subject headings, driven as they were by different organisations and groups of people. This might well have led the bibliographers or cataloguers at Gale to adopt a different way of tagging and searching for subject headings.
Returning to the oddities I encountered in preparing my lecture: ESTC enables a search via ‘Subject (genre)’ and ‘Subject;’ ECCO has a drop-down option for ‘Subject.’ However, while ESTC tagged the genre field with ‘novel’ or ‘fiction’ and its subject field ‘fiction’, ECCO tagged the subject fields as ‘fiction’ and/or ‘English fiction’ (note the ‘and/or’ for further confusion). In all, this yields five different sets of results. Moreover, just looking at the widest set of results for the subject heading of ‘fiction’ (including reprints and new editions), the most striking aspect was the far larger number of results returned by the ESTC than by ECCO. There are no instances where ECCO identifies a work as ‘fiction’ that the ESTC does not. Even when ECCO tags A spy on Mother Midnight: or, the Templar Metamorphos’d as ‘fiction’ and the ESTC does not, the ESTC nevertheless tags it as ‘novels.’ However, there are some notable instances where ECCO does not follow the ESTC’s lead. For example, where the ESTC rightly categorises Henry Carey’s Cupid and Hymen: a voyage to the isles of love and matrimony as ‘fiction’ it is not listed as such in ECCO. Even more obviously missing as ‘fiction’ in ECCO is Henry Fielding’s canonical novel The History of the Adventures of Joseph Andrews! Conversely, someone at ECCO must have thought tagging Ovid’s Heroides. English Ovid’s epistles … Translated into English verse as ‘fiction’ – as did the ESTC – was, at best, misleading.
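For anyone wanting to repeat this kind of comparison more systematically, here is a minimal sketch of how the two sets of tags might be lined up. The three records are a toy sample hand-copied from the examples discussed above (with titles shortened), not an export from either database, and the names `estc_tags`, `ecco_tags`, `FICTION_LABELS` and `compare` are my own assumptions for illustration.

```python
# Toy sample based on the examples in the post; NOT real exported records.
estc_tags = {
    "A spy on Mother Midnight": {"novels"},
    "Cupid and Hymen": {"fiction"},
    "Ovid's epistles": {"fiction"},
}
ecco_tags = {
    "A spy on Mother Midnight": {"fiction"},
    "Cupid and Hymen": set(),
    "Ovid's epistles": {"fiction"},
}

# Assumption: treat any of these headings as marking a work as fiction.
FICTION_LABELS = {"fiction", "novels", "English fiction"}


def is_fiction(tags):
    """True if any tag on the record is one of the fiction-like labels."""
    return bool(tags & FICTION_LABELS)


def compare(estc, ecco):
    """Sort titles into those both catalogues mark as fiction and those only one does."""
    report = {"both": [], "estc_only": [], "ecco_only": []}
    for title in sorted(estc.keys() | ecco.keys()):
        in_estc = is_fiction(estc.get(title, set()))
        in_ecco = is_fiction(ecco.get(title, set()))
        if in_estc and in_ecco:
            report["both"].append(title)
        elif in_estc:
            report["estc_only"].append(title)
        elif in_ecco:
            report["ecco_only"].append(title)
    return report


report = compare(estc_tags, ecco_tags)
for group, titles in report.items():
    print(group, titles)
```

Grouping ‘novels’ with ‘fiction’ is itself an interpretive decision, which is rather the point: change `FICTION_LABELS` and the disagreements between the catalogues change with it.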
Perhaps this goes beyond the issue of the management of data? It is intriguing to speculate on the human intelligence behind the original LoC headings and how they were assigned. Are we talking about individuals who were re-interpreting the nature of the actual texts themselves? How else to account for some of these idiosyncrasies?
Let’s go back to Fielding’s The History of the Adventures of Joseph Andrews (first pub. 1742; 4th ed. 1748) which is tagged by the ESTC as ‘Tobacco-fiction,’ a subject heading that is at least consistent across the ESTC and ECCO. But this is assigned to just three texts in the whole catalogue; the other two are novels by Tobias Smollett: The Adventures of Peregrine Pickle (1751) and The Expedition of Humphry Clinker (1773). Now, it’s true that there are people who smoke in these novels; but there are plenty of other protagonists from the fiction of the period who smoke too and it’s not as if tobacco is a significant plot-device. To take one more example, the anonymous Suite des lettres d’une Peruvienne. Again the subject heading is consistent across the two databases: ‘Epistolary fiction, French-18th century;’ but it is the only title from the entire database that is associated with this subject heading.
More interesting still is what happens to the two variants of Nehemiah How’s A narrative of the captivity of Nehemiah How. For the first on my list (ESTC Number W014008) ECCO seems to agree with its status as fictional, although its ESTC category ‘novels’ has been changed to the less contentious ‘fiction.’ Was someone working for Gale more astute in their reading of eighteenth-century narrative form? Human interpretation in the database is also evident when it comes to the other variant (ESTC number W34168), which looks to have been added later since it appears in ECCO Part II. Notably, any tags formally declaring its fictionality have gone: in the ESTC they are replaced with the more precise genre tag of ‘captivity narrative.’ However, ECCO ignores even this slight hint of narrative and instead opts to follow the ESTC’s more historical-sounding subject tag of ‘Indian captivities.’
More anomalies could be found (help yourself!) but these few examples are intriguing. How this metadata has been assigned seems to have been the result of a tangled history of cataloguing and bibliography, machines and human agency, and the messy process of translation between academic projects and commercial digital publishing. It’s a warning – just in case we need another – about how we use the meta-data available to us via resources like ECCO, EEBO and the ESTC. These resources are invaluable, but using them carefully also requires knowledge of the historical processes behind their creation. We might also say that human database bibliographers faced the same problems of interpreting and categorising the eighteenth-century novel as literary scholars do, and as critics in the eighteenth century clearly did. So one more thing: it’s easy to forget that behind the search interface on your computer screen, that black box of the database, what we are looking at is evidence everywhere of human intelligence, diligence, error, and above all, interpretation.
 Characteristically, ECCO returns slightly different numbers even when the same search is repeated. See Joseph Dane, What is a Book? The Study of Early Printed Books (University of Notre Dame Press, 2012), pp.224-7.
 In this essay I make no claim for a comprehensive list of fiction published in 1748 or even to define what fiction is or was. For example, Jerry Beasley’s Check List of Prose Fiction Published in England 1740-1749 (University Press of Virginia, 1972), might also be a good place to start. But would we want to include, for example, chapbooks as fiction? Quite possibly, but neither Beasley’s checklist nor ECCO includes them, and the ESTC’s coverage of this genre is unclear.
 Thanks to Ben Pauley; also to Scott Gibbons, Giles Bergel, and Elizabeth Grumbach for helpful conversations.
 See Laura Mandell, ‘The Business of Digital Humanities: Capitalism and Enlightenment’, Scholarly and Research Communication, 6.4 (2015). http://www.src-online.ca
This is the text of a talk I gave at the panel session for ‘Opening the book: reading and the evolving technology(ies) of the book’ for Academic Book Week, at the Institute of Historical Research, School of Advanced Study, London. 10th November, 2016. This post first appeared on the IHR blog.
I want to talk about the undergraduate perspective on a particular kind of academic book – the edition. In fact my starting point is that, from the student perspective (and according to some scholars), there is no longer a clear idea of what that is.
The place and perceived value of the printed critical edition seems to be still firmly established. I once asked my students to identify and compare value markers of their printed text in front of them and of an online version of the same text, and they made a pretty good case for the printed text, citing everything from the name of the publisher, to modes of reading, navigation, and interaction, and even pointing to the durability of its medium. And this in a digital humanities module. However, asking them to tell me how and why either of these versions look the way they do was a far more tricky question. So my polemic will be a plea for teaching in a way that puts students themselves in the position of editors and curators of literary texts: and that the best way of doing this is an engagement with digital editing and curating.
But first, I’m going to begin by outlining how a dramatic rise in the online availability of our literary heritage drives certain changes in reading and studying practices. When a lot of academics are running to catch up with the accelerating process of disseminating the world’s literary heritage online – even in their own field, and I include myself – is it any wonder that our students, stepping off the path of the printed set text, also find themselves slightly taken aback and click on the top hit in Google? Because there is indeed a chaotic mass of types of texts they can find. In addition to catalogue entries and Amazon hits, there are texts from web sites and web ventures that essentially depend upon some form of commercial revenue or profit (e.g. Google, Luminarium, editions via Kindle, and even apps), non-profit web organisations (e.g. Project Gutenberg, Poemhunter, Internet Archive, Hathi Trust), nationally-supported or privately-endowed institutions (e.g. Folger digital texts, British Library Shakespeare Folios), University libraries (e.g. SCETI, Virginia, Adelaide, Bodleian), a whole host of academic projects (e.g. Rossetti Archive, EEBO-TCP, the Correspondence of William Godwin, the Walt Whitman Archive) and, of course, via institutionally-accessed and pay-walled commercial publishers (like Cengage or ProQuest). My essential point is that there is a blurring of the definition of the ‘edition’. What we see – for sometimes good reasons – are projects that describe themselves as digital archives, databases, digital library collections, social editions (like Transcribe Bentham), and apps (e.g. Touchpress’s The Waste Land). And texts that come via these platforms look, feel and function very differently.
Between the printed and digital text, there’s a two-way process happening. The easy and quick availability of texts online drives a certain kind of reading of printed editions which makes invisible ‘the history of their own making’ (D. F. McKenzie). At the same time, undergraduates don’t often spot the distinction between the kinds of texts they find online and the one in their printed critical editions. This is partly because they see only the text in their editions, and not the ‘edition’ (introduction, textual note, annotations, etc.): the actual edition becomes invisible. I don’t want to denigrate undergraduates’ skills and this isn’t entirely the students’ fault: it’s partly that English literary studies – at least in many seminar rooms – is still running with the idea of the literary text as an immaterial abstraction (despite the influence of various kinds of historicization). It’s this that renders invisible the processes that shape the form of the book in their hands. So I guess my rant is partly a plea for a serious consideration of the materiality of the book and a bigger role for the history of the book in English Studies.
But I’m also thinking about the lack of attention (at undergraduate level) paid to how editions and texts end up on the web in the ways they do. Formats vary hugely, from poorly catalogued page facsimiles, to unattributed HTML editing of dodgy nineteenth-century editions, to scholarly high-standard editing with XML/TEI encoding. But there are still plenty of these digital versions and collections that make it very difficult to see who these resources are for and how they got to look and function the way they do. And, as I’ve hinted at earlier, issues of format and accessibility are linked to how the various sites and projects are funded. In significant ways a lot of texts available digitally do much worse than the print edition at signalling ‘The history of their own making.’
So, the second half of my polemic is about how we should be making our students more aware of how the edition is remediated, based on an understanding of the limits and affordances of digital technology and of how the internet works. Because this is where digital technology can open their books in a vital way. I’ve found it intensely interesting that the digital humanities community has been using a variety of material and haptic metaphors to describe what it is they are doing – ‘making’ or ‘building.’ For me, this is wonderfully suggestive. Asking my students to understand the processes involved in transforming a material book into a printed edition and then a digital edition is a necessarily haptic experience. This experience – a process that involves decisions about audience, purpose, authority, and technological affordances and restraints – enables a student to understand their literary object of study in a vital and transformative way. It might seem odd that I’m emphasising materiality in a debate thinking through the effects of what is, ostensibly, an immaterial medium, but technology is material and digital editing should involve the material aspects of the book and material work. My undergraduate dissertation student is producing a digital edition of a work by Henry Fielding: she will be going to the British Library to see the source text as an essential part of her learning. In a few weeks’ time, my students will be building a digital scanner partly out of cardboard; after that even our training in digital markup will start with pencil and a printed sheet of paper.
So I’m arguing that we give students the opportunity to be academic editors of books, and not just in theory but in practice; to enable them to be creators and not merely consumers of texts, because the electronic editions of the future should be powered by an early and vital experience of digital making.
 D. F. McKenzie, quoted in Jerome McGann, ‘Coda. Why digital textual scholarship matters; or, philology in a new key,’ in The Cambridge Companion to Textual Scholarship, eds, Neil Fraistat and Julia Flanders (Cambridge: Cambridge University Press, 2013), pp. 274-88 (p.274).
 I’m always reminded of internet hacktivist Aaron Swartz’s maxim: ‘It’s not OK not to understand the internet anymore.’
In our MA in Literature, Landscape & Environment my students and I have been looking at the influence of Virgil’s Georgics in eighteenth-century literature, and the theme of change and decay came up fairly frequently in our discussions. Indeed, the ‘Preface’ to Daniel Defoe’s A Tour thro’ the Whole Island of Great Britain emphasises this aspect as key to understanding Britain in the 1720s:
The Fate of Things gives a new Face to Things … plants and supplants Families, … Great Towns decay, and small Towns rise; … great Rivers and good Harbours dry up, and grow useless; again, new Ports are open’d, Brooks are made Rivers, … navigable Ports and Harbours are made where none were before, and the like.
Defoe’s particular emphasis on change in the British nation can be seen by the simple expedient of counting up how many times he uses the word ‘new’ in the Preface (thirteen times). Even more striking is to see this visualisation of a word frequency analysis (using Voyant):
On the face of it, Defoe pays equal attention to rise and decay, but – like Virgil’s Georgics – the aspect of dynamism in the nation’s landscape that Defoe gets most excited about is one of vital newness. (For another reading of mutability in the Tour and the city of London, see my post ‘Defoe, Google, cities and Mr Penumbra’s 24-Hour Bookstore’.)
 Defoe, A Tour Thro’ the Whole Island of Great Britain (London: printed, and sold by G. Strahan, … MDCCXXIV ), p. iv. [ECCO, 5/12/15].
In July 1703, Daniel Defoe was convicted of sedition and on July 29th began the first of three appearances in the pillory: the first day in Cornhill, near the Royal Exchange; the second day at Cheapside; the third day at Fleet Street near Temple Bar. However, Defoe’s appearances were far from humiliating – at least on the first day. According to contemporary, and hostile, reports, on the 29th Defoe was surrounded by supportive crowds, including City big-wigs as well as ‘the rabble.’ Moreover, Defoe’s works were being ‘Hauk’d and Publickly Sold’ (including the very work he was convicted for, The Shortest Way with the Dissenters, as well as his Hymn to the Pillory) while he ‘Glory’d’ in the experience. But there is also a tradition that flowers were strewn around Defoe as he stood in the pillory. The image of Defoe standing nobly in the pillory whilst the populace of the City lay down flowers in admiration was most memorably captured in the 1862 painting by Eyre Crowe (see here for more details); it was also engraved in the same year by James Charles Armytage (see here at the National Portrait Gallery). The painting’s caption is worth quoting:
July 31, 1703, Daniel Foe, alias De Foe, this day stood in the pillory at Temple Bar in pursuance of his sentence, given against him at the last sessions at the Old Bailey for writing and publishing a seditious libel, entitled The Shortest way with the Dissenters. During his exhibition he was protected by the same friends from the missiles of his enemies: and the mob, instead of pelting him, resorted to the unmannerly act of drinking his health, etc.
This depiction also appears a few years later in William Lee’s 1869 biography of Defoe, The Life and Recently Discovered Writings of Daniel Defoe. We also know that Lee had been writing on and researching Defoe since at least 1860, so would have likely seen the Crowe painting. We can go back further: Walter Wilson, in his 1830 biography, Memoirs of the Life and Times of Daniel De Foe, had this to say:
Tradition reports, that the machine, which was graced with one of the keenest wits of the day; was adorned with garlands, it being in the midst of summer. The same authority states, that refreshments were provided for him after his exhibition.
But Wilson doesn’t cite his ‘authority.’ Recent biographers have been more reticent: while John Richetti recounts the story of his works being sold, he considers the flower-throwing as ‘a less likely tradition’ and Maximillian E. Novak chooses not to mention it at all.
It may be a ‘tradition’ but the shakiness of its foundations can be glimpsed in a few ways. Crowe’s painting sets Defoe’s pillorying at Fleet Street, Temple Bar (which can be seen in the background). There might have been a nice piece of bookish irony in placing Defoe’s triumph at the heart of the eighteenth-century publishing industry. However, his appearance at Fleet Street was on the third day, July 31st: the surviving contemporary reports place the scene of a supportive crowd only at his first appearance on July 29th at Cornhill. Moreover, Wilson’s evocation of a summer’s day complete with drinking and flowers is given the lie by another contemporaneous account. The diarist John Evelyn noted that on July 31st and August 1st there was ‘Thunder & lightning & raine’ [my emphasis]. The scene of a summer’s day, with a pillory strewn with flowers and surrounded by merriment, seems the stuff of myth.
 Contemporary accounts quoted in Paula Backscheider, Daniel Defoe: His Life (Baltimore: Johns Hopkins University Press, 1989), p.118; Maximillian E. Novak, Daniel Defoe: Master of Fictions (Oxford: Oxford University Press, 2001), p.191.
 Lee (3 vols), 1:73. Crowe’s painting is reproduced between pages 74 and 75.
 Furbank, P. N., and W. R. Owens, The Canonisation of Daniel Defoe (New Haven: Yale University Press, 1988), p.64.
 Paula Backscheider, in one of the most detailed accounts of Defoe’s imprisonment, questioning and pillorying, replicates this scene without giving a source: ‘By all accounts, … the only things thrown at him were flowers,’ p.118; P. N. Furbank and W. R. Owens also repeat the claim of flower-pelting by ‘contemporary accounts’, but without citing their authority. A Political Biography of Daniel Defoe (London: Pickering & Chatto, 2007), p.24.
 John Richetti, The Life of Daniel Defoe (Oxford: Blackwell, 2005), p.24
 Evelyn cited in F. Bastian, Defoe’s Early Life (London and Basingstoke: Macmillan, 1981), p.300.
The Digital Humanities Caucus invites paper proposals for two panels (see below) for ASECS 2016, Pittsburgh. Deadline for proposals to be sent to panel organisers: 15th September.
1. “Small-Scale Digital Humanities” (Roundtable) (Digital Humanities Caucus). Stephen H. Gregg, Department of English and Cultural Studies, Bath Spa University, Newton St. Loe, Bath. BA2 9BN, UK; Tel: (044) 7771702912; E-mail: firstname.lastname@example.org
A large, but largely unreported, amount of digital humanities work occurs outside of big research centres or well-funded collaborative projects. Such work might be undertaken by a scholar who is the sole academic in their Faculty – or one of a small handful of academics in their University – engaged in the digital humanities. They might also be working on a highly focused or a relatively small-scale digital project. This is a roundtable panel that seeks to share the experiences of small-scale digital humanities work and the lone digital humanist. It seeks to engage with the challenges facing such scholars, such as:
· building value and recognition at home
· creating networks and collegial support at home
· networking outside the home University (regional, national, international)
· finding funding
· issues of technical support and training
2. “Building an Eighteenth-Century Corpus” (Digital Humanities Caucus) Scott Enderle, Skidmore College AND Mark Vareschi, University of Wisconsin-Madison, 600 N. Park St. Madison, WI 53706; Tel: (908) 420-1396; E-mail: email@example.com and Vareschi@wisc.edu
The Digital Humanities Caucus invites proposals on the politics, possibilities, and practicalities of building an Eighteenth-century corpus. While much focus in the digital humanities has been on the analyses of corpora, this panel considers the selection and construction of corpora necessary and prior to such analyses. How possible is it to create a “complete” or “representative” corpus? As we build corpora, how should we address the problem of archival silences? Further questions this panel may explore: What processes might we use to select works in a corpus? (Selection by “hand”? By some algorithm? Based on this or that metadata? What kinds of arguments are these different methods useful for?)
How should we think about the disjoint temporality of corpora? (An unplanned corpus — the books on a bookshelf — may include works from many different periods. A planned corpus built using temporal constraints may include just those texts from a given period, but only if they have been preserved by successive generations.) What could, for example, an eighteenth-century corpus tell us about the Victorian era or the seventeenth century? Might histories of reading help us build corpora? (How accessible were different kinds of documents? What reading habits did they invite?). This panel invites interdisciplinary perspectives and innovative presentation formats.
Last week I volunteered to chair some sessions for the MIX: Writing Digital conference at Bath Spa University. The conference brought together a wonderful and eclectic mix of creative digital writing and trans-media publishing projects. As perhaps the only literary-critical scholar at the conference (and an eighteenth-centurist to boot), I was on the borders of a lot of the discussions taking place – enjoyable and intriguing though they were. It was perhaps this that led me to play around with my engagement with the conference. So below are some visualisations of the conference programme – in this case the programme also included bios of the delegates and abstracts of the presentations, so it’s reasonably representative of the conference’s themes . The first is a word-frequency analysis of the conference programme using Voyant Cirrus, but with some of the obvious large-frequency words – like ‘University’ ‘Writing’ and ‘Digital’ – edited out, a move that I think brings out some of the finer detail of the conference’s themes.
Given so many of the projects and writings discussed during the conference were thinking through the possibilities offered by asychronous engagements with text, it seemed apposite that this kind of playing around with various analysers offered another way of engaging with the various texts of conference.
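The effect of editing out the obvious high-frequency words can be shown with a small sketch. This is not Voyant’s own code, and the sample sentence is invented for the purpose (it is not taken from the conference programme); the function name `top_words` and its small built-in stop-word list are likewise my own assumptions.

```python
import re
from collections import Counter


def top_words(text, extra_stopwords=(), n=3):
    """Return the n most frequent words, ignoring common and user-chosen words."""
    # A deliberately tiny stop-word list, extended by whatever the user edits out.
    stop = {"the", "a", "an", "and", "of", "to", "in", "at", "is"}
    stop |= {word.lower() for word in extra_stopwords}
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in stop).most_common(n)


# Invented sample text, standing in for a conference programme.
sample = (
    "Digital writing at the University: writing digital stories, "
    "digital poetry and interactive stories in the archive."
)

print(top_words(sample, n=1))
print(top_words(sample, ["University", "Writing", "Digital"], n=1))
```

With the three dominant words removed, the word that was hidden behind them rises to the top, which is the same move made for the visualisation described above.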
So in March, I was invited to my first hack. Me, an English Literature lecturer, was going to have to produce something with computers in one day? Now read on …
This was the EEBO-TCP hackfest, an event designed to launch the release into the public online domain of over 25,000 texts from the fifteenth to the seventeenth century. These texts have been curated and encoded by the Text Creation Partnership, a collaborative project between the University of Michigan, the Bodleian Library, University of Oxford, and ProQuest, the publishers of the online database Early English Books Online. The idea of the hackfest was that humanities researchers and scholars would come together with digital researchers and technologists and create – in a day – innovative and imaginative ways of exploring, analysing, and developing this huge corpus. Now, while I’ve been tinkering with digital humanities approaches myself, I’m no programmer. Moreover, I’m an eighteenth-century-ist so I was stepping a little outside my normal safety zone. So it was with some trepidation, yet also with considerable excitement, that I dipped a toe into my first digital hack. The setting was the new Bodleian Weston library: appropriately for a day building things, it was still under construction.
It started with a speed-date. Over plenty of coffee thirty-or-so of us circulated around telling our stories and plans to anyone we could button-hole. Given that humanists seemed to be in the majority, most people were looking for a tech person to help out, and in my case, slightly desperately so. My idea was to analyse some of the structural features of pre-eighteenth-century fiction, such as dedications, prefaces, letters to the reader, chapters, illustrations etc. But what I didn’t know was how to bring out that data from a large corpus and produce something potentially meaningful.
I needn’t have worried. Everyone was incredibly receptive and eager to make our plans work, so I found my geek (I know he’s happy with that epithet!): the extraordinarily energetic Dan Q from the Bodleian’s digital team. Together with a couple of people working with formal features of seventeenth-century alchemy texts, we found ourselves a table and began to work out how we might visualize this structural data. And this is the part that I found really exciting: within a couple of hours I had created a sub-corpus of fiction from the total of 25,000 texts, Dan had written some code to identify and count all the structural features I could think of (with some advice from Simon Charles from the TCP project about the TEI markup), and it had started producing some figures. With the knowledge that we all had to present our work at the end of the day, I had to think of ways to set out the results to suggest some kind of point to all this: in short, the ‘so what?’ question. (The crude but quick answer: by putting the texts in chronological order and colour-coding our Excel sheet, a hint of a pattern emerged).
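To give a flavour of what counting structural features in marked-up text can look like, here is a rough sketch in Python. It is emphatically not Dan's code, which I am not reproducing here. It assumes that structural divisions are encoded as `div` elements (or numbered variants such as `div1`) carrying a `type` attribute, and that illustrations appear as `figure` elements; the sample document and its `type` values are invented for illustration, and real EEBO-TCP files may label things differently.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def count_structural_features(xml_text):
    """Count typed divisions and figures in one TEI-style document."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for element in root.iter():
        name = local_name(element.tag).lower()
        if name.startswith("div"):
            # Assumption: the kind of division is recorded in a type attribute.
            div_type = element.get("type") or element.get("TYPE")
            if div_type:
                counts[div_type.lower()] += 1
        elif name == "figure":
            counts["figure"] += 1
    return counts


# Invented sample document; the type values are hypothetical.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <front>
      <div type="dedication"><p>...</p></div>
      <div type="to_the_reader"><p>...</p></div>
    </front>
    <body>
      <div type="chapter"><p>...</p><figure/></div>
      <div type="chapter"><p>...</p></div>
    </body>
  </text>
</TEI>"""

for feature, total in sorted(count_structural_features(SAMPLE).items()):
    print(feature, total)
```

Run over every file in a sub-corpus, with one row per text sorted by date, counts like these are the sort of figures that can be dropped into a spreadsheet and colour-coded.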
Meanwhile, others in the room were experimenting with identifying the frequency of colour words and the use of Latin, simulating the shelves of the St Paul’s book-sellers, and even creating a game based on witch-trials (this by Sarah Cole, using Twine); another team was thinking about how to make the archive user-friendly to a more diverse audience (see Sjoerd Levelt’s prize entry to the EEBO-TCP Ideas Hack competition). Given that my idea was conceived off-the-cuff, it was rather splendid to share third prize with our colleagues working on the same table.
What impressed me were the advantages offered by the scale of the corpus and the rigour of its markup. Both of these features of the TCP project enabled Dan and me to produce – with surprising speed – a set of results for a question that would otherwise be much more difficult to answer. But what really blew my mind was how my tech guy took my simple question to another level: Dan wondered ‘how the structural differences between fiction and non-fiction might be usable as a training data set for an artificial intelligence that could learn to differentiate between the two’ (see his own blog post on the event).
I came away a slightly different academic, no longer intimidated by big data, enthused by digital collaboration, and now a big fan of the day-long hack.