Hacking the Early Modern: the EEBO-TCP hackfest

[The original version of this post was first published by ABO Public: An Interactive Forum for Women and the Arts 1640-1830].

So in March, I was invited to my first hack. Me, an English Literature lecturer, produce something with computers in a single day? Now read on …

Hunched over our laptops in the Weston library

This was the EEBO-TCP hackfest, an event designed to launch the release into the public online domain of over 25,000 texts from the fifteenth to the seventeenth century. These texts have been curated and encoded by the Text Creation Partnership, a collaborative project between the University of Michigan, the Bodleian Library at the University of Oxford, and ProQuest, publisher of the online database Early English Books Online. The idea of the hackfest was that humanities researchers and scholars would come together with digital researchers and technologists and create – in a day – innovative and imaginative ways of exploring, analysing, and developing this huge corpus. Now, while I’ve been tinkering with digital humanities approaches myself, I’m no programmer. Moreover, I’m an eighteenth-centuryist, so I was stepping a little outside my normal safety zone. So it was with some trepidation, yet also with considerable excitement, that I dipped a toe into my first digital hack. The setting was the new Bodleian Weston Library: appropriately for a day spent building things, it was still under construction.

It started with a speed-date. Over plenty of coffee, thirty-or-so of us circulated, telling our stories and plans to anyone we could buttonhole. Given that humanists seemed to be in the majority, most people were looking for a tech person to help out – in my case, slightly desperately so. My idea was to analyse some of the structural features of pre-eighteenth-century fiction, such as dedications, prefaces, letters to the reader, chapters, and illustrations. But what I didn’t know was how to extract that data from a large corpus and produce something potentially meaningful.

Detail of the XML file of Gabriel de Brémond, Hattige: or The amours of the king of Tamaran A novel. 1683.

I needn’t have worried. Everyone was incredibly receptive and eager to make our plans work, so I found my geek (I know he’s happy with that epithet!): the extraordinarily energetic Dan Q from the Bodleian’s digital team. Together with a couple of people

Dan Q leaning over my trusty mac

working with formal features of seventeenth-century alchemy texts, we found ourselves a table and began to work out how we might visualize this structural data. And this is the part that I found really exciting: within a couple of hours I had created a sub-corpus of fiction from the total of 25,000 texts, Dan had written some code to identify and count all the structural features I could think of (with some advice from Simon Charles from the TCP project about the TEI markup), and it had started producing some figures. With the knowledge that we all had to present our work at the end of the day, I had to think of ways to set out the results to suggest some kind of point to all this: in short, the ‘so what?’ question. (The crude but quick answer: by putting the texts in chronological order and colour-coding our Excel sheet, a hint of a pattern emerged.)
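For the curious, the counting step can be sketched in a few lines of Python. This is not Dan’s actual code, just a minimal illustration of the idea: walk a TEI file and tally the `type` values on `<div>` elements. The element and attribute names follow common TEI conventions, but the sample document below is invented for illustration:

```python
import xml.etree.ElementTree as ET
from collections import Counter

def count_structural_features(tei_xml: str) -> Counter:
    """Tally <div type="..."> values (dedication, preface, chapter, ...)."""
    counts = Counter()
    for elem in ET.fromstring(tei_xml).iter():
        # TCP files use a TEI namespace; strip any '{ns}' prefix before comparing.
        if elem.tag.rsplit('}', 1)[-1] == 'div':
            counts[elem.get('type', 'untyped')] += 1
    return counts

sample = """<TEI>
  <text>
    <front>
      <div type="dedication"><p>To the Right Honourable ...</p></div>
      <div type="preface"><p>The Preface.</p></div>
    </front>
    <body>
      <div type="chapter"><p>CHAP. I.</p></div>
      <div type="chapter"><p>CHAP. II.</p></div>
    </body>
  </text>
</TEI>"""

print(count_structural_features(sample))
# Counter({'chapter': 2, 'dedication': 1, 'preface': 1})
```

Run over a whole sub-corpus, a tally like this is the kind of raw material that fed our colour-coded spreadsheet.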

Meanwhile, others in the room were experimenting with identifying the frequency of colour words, the use of Latin, simulating the shelves of the St Paul’s booksellers, and even creating a game based on witch-trials (this by Sarah Cole, using Twine), while another team thought about how to make the archive user-friendly to a more diverse audience (see Sjoerd Levelt’s prize entry to the EEBO-TCP Ideas Hack competition). Given that my idea was conceived off-the-cuff, it was rather splendid to share third prize with our colleagues working at the same table.

What impressed me were the advantages offered by the scale of the corpus and the rigour of its markup. Both of these features of the TCP project enabled Dan and me to produce – with surprising speed – a set of results for a question that would otherwise be much more difficult to answer. But what really blew my mind was how my tech guy took my simple question to another level: Dan wondered ‘how the structural differences between fiction and non-fiction might be usable as a training data set for an artificial intelligence that could learn to differentiate between the two’ (see his own blog post on the event).

‘Nice work Stephen’ ‘Nice work Dan’

I came away a slightly different academic, no longer intimidated by big data, enthused by digital collaboration, and now a big fan of the day-long hack.

Why you shouldn’t call yourself a True-Born Englishman.

The belief that ancient family lineage enables a person to claim a superior legitimacy of national belonging has been given a shocking airing recently. So it’s worth remembering that Daniel Defoe punctured this poisonous myth over 300 years ago.

coo.31924013179399-14The True-Born Englishman. A Satyr was initially a counter-response to John Tutchin’s The Foreigners: an attack on William III’s rule by focusing on his Dutch origins. Yet it catalysed a much wider-ranging satire on xenophobia and the idea of ethnic purity. Defoe’s poem starts with the idea of ingratitude towards what he views as the nation’s saviour (William III) and accuses the English nation of pride. He aims to prick this ‘bubbled Nation’ (27):

To Englishmen their own beginnings show,

And ask them why they slight their neighbours so.

Go back to elder times, and ages past,

And nations into long oblivion cast;

To old Britannia’s youthful days retire,

And there for true-born Englishmen enquire.

Britannia freely will disown the name,

And hardly knows herself from whence they came:

Wonders that they of all men should pretend

To birth and blood, and for a name to contend. (43-52)

National pride based on lineage gets a rough ride. Defoe delivers a scorching reminder that England’s history is one of continual invasion by Romans, Picts, Scots, and Normans:

From whose mixed relics our compounded breed,

By spurious generation does succeed;

Making a race uncertain and unev’n,

Derived from all the nations under Heav’n. (171-75)

The English, then, are an illegitimate race whose claim to ‘ancient pedigree’ is based on nothing more than this:

’Tis that from some French trooper they derive,

Who with the Norman Bastard did arrive:

The trophies of the families appear;

Some show the sword, the bow, and some the spear,

Which their great ancestor, forsooth, did wear. (212-18)

Defoe’s energy is focused on undermining pride in status and lineage: with each repetition, the phrase ‘true-born Englishman’ becomes emptier. To this end, one of the most repeated ideas driving Defoe’s satire links illegitimacy and mixture:

Thus from a mixture of all kinds began,

That het’rogenous thing, an Englishman:

In eager rapes, and furious lust begot,

Betwixt a painted Briton and a Scot. (334-37)

This, Defoe scornfully cries, is the source of the ‘well-extracted blood of Englishmen’ (347). He is incredulous, then, to hear such people attack the non-English:

The wonder which remains is at our pride,

To value that which all wise men deride,

For Englishmen to boast of generation,

Cancels their knowledge, and lampoons the nation.

A True-Born Englishman’s a contradiction,

In speech an irony, in fact a fiction. (368-73)

So, next time you begin to argue about what it is to be English (or indeed what being British means), just think on Defoe’s poem.

Play, experiment, and digital pedagogy

First of all, a hat-tip to Willard McCarty: during a talk at Bath Spa University in March of this year, he quoted the early-twentieth-century English critic I. A. Richards, and it was this that crystallised my scattered thoughts on my students’ encounter with digital approaches to English literature. Richards prefaced his book Principles of Literary Criticism with the highly suggestive notion that ‘[a] book is a machine to think with’. Richards’ image was not an idle one: an ardent believer in the interplay between the arts and sciences, he saw both his book and the book in the abstract – like any piece of technology from the automated looms of the late eighteenth century onwards – as embodying human-designed creative procedures. Through the book, by bringing to bear those same human processes of thought, we are able to examine civilization and what it is to be human: the very task the book was designed to ‘re-weave’.[1] In the digital age it is hard to avoid the resonances: the preeminent machine of our age – the computer – is also governed by human procedures (programming), and ‘processing’ has now become almost entirely associated with computers. Yet we forget that books are, as Richards implies, an invitation to be (re)processed by humans. What I want to emphasise is that this re-processing – what we less starkly call literary criticism – can be envisioned as a series of procedural building blocks.

What I’m also drawing upon has been defined by Ian Bogost as ‘procedural literacy’. Developing the idea that computer programming is a kind of literacy, Bogost proposed that ‘any activity that encourages active experimentation with basic building blocks in new combinations contributes to procedural literacy.’ Such a literacy in processes and procedures (such as I have described) becomes a foundation that can be applied elsewhere: ‘[e]ngendering true procedural literacy means creating multiple opportunities for learners—children and adults—to understand and experiment with reconfigurations of basic building blocks of all kinds.’[2]

This movement between play, experimentation, and a critical awareness of the processes of interpretation was evident during a session on my undergraduate module Digital Literary Studies. Students were introduced to distant reading and invited to work with Voyant Cirrus on eighteenth-century novels. It was apparent in the workshops that the preliminary results of this analysis were not immediately significant or meaningful. So the next stage involved playing with word choices, selecting synonyms to create clusters of meaning, or choosing antonyms to gain critical leverage. Given that these were historical texts, another step involved researching historical inflections using the OED. Some students wanted to add another interpretative layer: using Google’s N-Gram Viewer (with caution), they zoomed out even further. It was interesting to watch. The movement between these steps was not linear: some students moved back into the print copy of the novel for a close reading; some shuttled back and forth between a few key procedures.
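The synonym-clustering step is easy to reproduce outside Voyant. The sketch below, in Python, shows the underlying procedure: the reader chooses the clusters, and the machine merely counts. The word lists and the sample sentence are invented for illustration, not drawn from any of the novels we studied:

```python
import re
from collections import Counter

def cluster_frequencies(text: str, clusters: dict) -> dict:
    """Sum the frequencies of each reader-chosen cluster of words."""
    words = Counter(re.findall(r"[a-z']+", text.lower()))
    return {name: sum(words[w] for w in members)
            for name, members in clusters.items()}

passage = ("Her virtue was her fortune; yet fortune, riches, and wealth "
           "could not secure her honour against vice.")

clusters = {
    'money': ['fortune', 'riches', 'wealth'],
    'morality': ['virtue', 'honour', 'vice'],
}

print(cluster_frequencies(passage, clusters))
# {'money': 4, 'morality': 3}
```

The point of the exercise is precisely that the clusters are hypotheses: changing the word lists changes the results, which is where the interpretative discussion begins.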

The initial surprise that textual visualization did not produce an immediate interpretation was a useful warning about the technological lure of instant answers. Instead, results became merely a first step in a series of experiments: each set of word choices – let’s call them hypotheses – required us to re-think our interpretative assumptions about the text(s). Moreover, the significance of the results was also subject to constant discussion, as if the text itself were changing shape. What my students discovered via this experimentation is the fascinating tension between different processes of interpretation: between what I. A. Richards might call re-weaving and what Lisa Samuels and Jerome McGann termed ‘deformance.’[3] The aim of the session was to generate some analyses of the literary history of the novel between 1660 and 1799; but the session also enabled students to slow down and reflect on their processes of interpretation: it trained them to be procedurally literate.

I started by citing I. A. Richards, part of a group of critics and intellectuals who in the early twentieth century placed close reading at the heart of English Studies. Despite its varied fortunes, it is still there. What is most resonant for me and my students is the interplay between close reading, digital reading, and procedural literacy. Experimentation puts both students and tutor at the very edge of their knowledge, but it is a place that is productively challenging. In helping students to see their learning as a series of processes that can be modified and reiterated, we also equip them with a critical and creative self-awareness that fits them for the rapidly changing twenty-first-century world.

[1] I. A. Richards, Principles of Literary Criticism. 3rd ed. London: Kegan Paul, 1926, vii.

[2] Ian Bogost, ‘Procedural Literacy: Problem Solving with Programming, Systems, & Play,’ 52:1&2 (Winter/Spring 2005), 32-36.

[3] Lisa Samuels and Jerome McGann, ‘Deformance and Interpretation.’ New Literary History 30:1 (1999), 25-56.

 

21stC web activist and 18thC feminist in one speech …


Dame Martha Lane Fox, who is championing the setting up of an institute – Dot Everyone – to drive digital knowledge in the UK, quoted the late internet activist Aaron Swartz in her BBC Dimbleby Lecture: “It’s not OK not to understand the internet anymore.” In the talk, ‘Dot Everyone: Power, the Internet, and You’, she outlined three areas in which the UK needs to develop its digital skills:

  • to educate and understand the history of the internet;
  • to put women at the centre of digital skills and address the current gender imbalance;
  • to take a lead in examining the moral and ethical challenges posed by the internet.

Throughout, she also name-checked pioneers in computer technology such as Ada Lovelace, Alan Turing, and Sir Tim Berners-Lee. In calling for a revolution in the government’s thinking towards digital skills, she finished her talk by quoting someone we students of eighteenth-century English writing are very aware of: the pioneering feminist Mary Wollstonecraft: “the beginning is always today”. I was impressed …

Encoding with English Literature undergrads

This is an overview and reflection on a two-hour workshop I ran for English Literature undergraduates introducing XML/TEI. The ‘Encoding worksheet’ (Word doc) is here.

Previously I had taught XML/TEI in one-to-one tutorials, so this was the first time I had tried a group workshop, comprising two students whom I was supervising (their final-year dissertation projects were digital editions) and two students whose projects concerned print editing (from a module on Early Modern book history run by Prof. Ian Gadd). The knowledge base of these students was very varied: some had no experience of coding or markup; at the other end of the spectrum, one was already competent with HTML. What, then, was the best way into encoding given this varied cohort?

My answer was to start with the skills they already had (as @TEIConsortium emphasised), and to stress the continuum between digital encoding and the traditional literary-critical analysis students use when preparing any text. After all, we are so frequently concerned with the relationship between form and meaning. And it is the particular capability of XML/TEI to render this relationship between form and meaning that distinguishes it from other kinds of electronic encoding.

So the first part of the workshop started with pencil-and-paper tasks. We first annotated a photocopy of a poem. Then I gave them a print-out of the transcribed poem stripped of some of its features – title, line spaces, peculiar line breaks, italicisation. I then asked them to annotate, or mark up, this version with a set of instructions to make it look like the ‘original’. The result was that the students not only marked up formal features, but clearly had a sense that these features also carried meaning. For example, I asked, “why was it important to render a line space?” I also pointed out that none of them had inserted the missing title in the plain-text version, which raised some eyebrows: “Is it part of the text?” “Well, how do you define the text?”, I replied. These questions were important for several reasons. I wanted to make the point that markup is a set of editorial and interpretative decisions about what the ‘text’ is, how it might be rendered, and for what purpose. I also wanted to emphasise that both practices – whether pencil notes in the margin or encoding on a screen – involve very similar processes.

I next wanted to translate these points into an electronic context, by illustrating the differences between HTML as, essentially, a markup for how a text looks, to XML as a markup for describing that text. I did this by using my WordPress editor: by inserting a few HTML tags in the text editor mode then switching to the ‘visual’ mode they could see these features reproduced.[1]

At this point we moved to the computers and got down to some encoding in an XML editor (Oxygen). My main aim here was to enable them to mark up the same poem in an XML editor, to see how easily their literary-critical procedure could be transferred to this medium. In this, I was very gratified: all the students were able to create an XML file and mark up the poem remarkably easily.[2] I spent the last section of the workshop answering the implicit question: “you can’t read XML, so what is this for?” Given the restrictions on time, I had to engage only briefly with some very broad issues of digitization and preservation and of analysing big data. Putting it simply, I remarked “computers are stupid” (my mantra), “but if we mark up our texts cleverly, we can get computers to look at large bodies of knowledge with precision.” Demonstrating this was tricky given the time restrictions, but I had a go by exemplifying the more complex encoding of meaning possible in XML/TEI. I used a former student’s markup of Defoe’s Hymn to the Pillory and an XML file of A Journal of the Plague Year. The former demonstrated the encoding of names; for example, I asked “how would a computer know that ‘S—ll’ is Dr Henry Sacheverell unless you have a way of encoding that?” The Journal was useful for demonstrating the highly structured nature of TEI and our ability to mark up structural features of texts in precise ways: features that a computer can then process.
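The kind of markup I had in mind looks something like the following sketch. The element names (`persName`, `div`) follow TEI conventions, but the attribute values and surrounding prose are my own illustrations rather than the student’s actual encoding; the second fragment quotes only the Journal’s famous opening words:

```xml
<!-- An obscured name, encoded so that a computer can recover the
     reference (the ref value points to a fuller record elsewhere): -->
<p>... <persName ref="#HenrySacheverell">S—ll</persName> ...</p>

<!-- A structural feature, marked up so that it can be processed
     precisely, as in the Journal: -->
<div type="narrative">
  <p>It was about the beginning of September, 1664 ...</p>
</div>
```

Once names and structures are encoded like this, a computer no longer has to guess: it can find every reference to Sacheverell, or every narrative division, across thousands of texts.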


I also demonstrated the flexibility of TEI: when you type a new ‘<’ after a closing tag, the XML editor automatically shows a dropdown list of possible markup elements and attributes. But my key point was that deciding which features to encode – out of all the possible features of a text – is an interpretative and editorial decision.

My aim for the workshop was modest: to enable students to make the leap from so-called ‘traditional’ literary-critical skills to the basics of encoding in XML, and in this I think the session was successful. On reflection, I think there were two points I hadn’t judged quite right. I hadn’t anticipated how quickly they could mark up a poem in XML; I think that was because the transition from pencil annotations to coding on screen worked very well. The last section – on the bigger point of getting computers to read literary texts – turned out to be more important than I had presumed, and I would do this differently if I were to run the workshop again. This might involve a follow-up session that, given the success of the hands-on tasks in the first part, would ask students to mark up some more complex textual issues with TEI. This could be combined with a demo showing not only some well-encoded texts but also the results of some data-mining of a medium-sized XML/TEI corpus.

I’ll keep you posted …

[1] There are probably better ways to demonstrate this, given the limitations of the WP text editor, but it was very much to hand.

[2] I acknowledge here my use of teaching materials from the Digital Humanities Oxford Summer School (the very same ones from which I had learnt TEI).

What is a novel in the eighteenth century? Some numbers …

Some of my undergraduates playing with data…

Digital Literary Studies

Students Ben Franks and Alice Creswell share their charts on some keyword searches conducted via the ‘Genre’ filter in the ESTC across 1660-1799. The first chart breaks down the 2,880 hits from the genre term ‘Fiction’ into various title keywords:

Fiction

This second pie-chart breaks down the 1,434 hits from the search term ‘Novels’:

Novels

We wondered about the ways in which the ESTC catalogue had tagged these genres and the extent to which they overlapped (meta-metadata questions?). But these results were given additional context and meaning by setting them against the same keyword searches on Google’s N-Gram Viewer and some more granular searches of the metadata of the 1,000 novels in the Early Novels Database.

Ben and Alice’s favourite titles? The Devil Turn’d Hermit (check that full title!) and Adventures of a Bank-Note.


Report on CRECS Fight Club, 3 Feb 2015

A lovely event with some great questions from the floor about the canon!

CRECS//

by Alison Harvey

Tuesday night saw the launch of the Cardiff Romanticism and Eighteenth Century Seminar series, which kicked off in style with Fight Club: a no-holds-barred, trash-talking, dirty-fighting academic debate between six of English Literature’s finest. There was standing room only in Special Collections and Archives, with a superb turnout of over 60 undergraduates, postgraduates and staff. Each speaker had just 5 minutes to convince the audience that their chosen author was a true Romantic Genius. 


Liberate the Text @BSECS conference 2015

I was extremely pleased with such a positive response to my workshop on digital editing, ECCO, 18thConnect, the Oxford Text Archive, and EEBO-TCP (whew!). Thanks to all who attended, and especially for the fascinating discussion that ensued. As promised, here is the PDF of the slide show: liberate-bsecs-2015.pdf (thanks to Laura Mandell for some of the images).

I’m already thinking about next year at BSECS – maybe a session that really is a workshop, called ‘Bring your laptop’?

Eighteenth-century literature and the digital undergraduate

[This is a slightly amended version of a post that originally appeared on the blog of the North American Conference of British Studies]

Over the past couple of years I’ve been guiding some final year undergraduate students to create online digital editions of literary texts from the eighteenth century (see here, here, and here). To me, getting students to work with digital technology alongside eighteenth-century British Literature is now an exciting, but also essential, facet of my teaching. So I thought I would share how I got here with a brief overview of some developments, exercises and courses I’ve picked up in my own browsing over the past few years that teach eighteenth-century literature and are inspired by digital humanities.[1]

Digitisation

The huge acceleration in the digitisation of historical texts over the past decade and a half has been the catalyst for a trickle-down effect from research to teaching practices. Released in 2003, and one of the biggest databases of eighteenth-century material, Eighteenth Century Collections Online (ECCO) arguably generated some of the first reflections on using digital resources to teach eighteenth-century literature at undergraduate level: see my own 2007 paper and the many posts on teaching with ECCO on Anna Battigelli’s Early Modern Online Bibliography blog. The issue of cost and accessibility aside, the exponential rise of such resources – such as the Burney Newspapers database, English Broadside Ballads, and Old Bailey Online – has enabled students to enrich their knowledge of eighteenth-century literary culture: they have been able to see unusual and non-canonical texts, to examine literary works in the light of historical or cultural ideas specific to the period or even decade, and to pose invigorating questions about literary value.

Blogging and wikis

This initial phase crossed over with tutors and professors experimenting with writing assignments and the different engagement with literary texts that might be enabled by digital platforms such as the wiki or the blog post. See, for example, the work of Tonya Howe (Marymount University); the course run by Emily M. N. Kugler (Colby College), Histories and Theories of the 18thC British Novel; and Prison Voices 1700-1900, which has, for example, this piece on Daniel Defoe’s Moll Flanders (via Helen Rogers, Liverpool John Moores University). Adrianne Wadewitz (now sadly deceased) was also a leading experimenter in using Wikipedia as a teaching tool. In this vein, Ula Klein has also recently written about her summer course on eighteenth-century women poets that involves the creation of wikis (here).

Beyond the blog

Sharon Alker (Whitman College) and Benjamin Pauley (Eastern Connecticut SU) reflected on using a variety of tools to teach Defoe, including Second Life and Google Maps. Laura Linker (High Point University) asks her Gothic-novel students to use Google Earth to map narrative journeys, and even Second Life as a way of entering into characterization. In a course entitled ‘Remediating Samuel Johnson’, John O’Brien (University of Virginia) set up a collaborative digital anthology of Samuel Johnson’s works using texts accessed via 18thConnect (significantly, a platform that begins to deal with the problem of access). John’s aim was explicitly student-centred: ‘[m]y hunch is that students will have a good idea of what students like themselves need to know to make sense of challenging eighteenth-century texts.’ Students of Rachel Sagner Buurma (Swarthmore College) get hands-on experience with a wonderful digital resource, the Early Novels Database – see the students’ own blogs here. In a different course, Rachel asks students to create experimental and imaginative bibliographical descriptions of unusual and non-canonical eighteenth-century novels: see here.

Media shifts

Also fascinating are those courses and projects that use the very medium of digital technology to enable students to grasp the eighteenth century’s own preoccupation with changing forms and media. As Rachael Scarborough King (New York University) suggests: ‘[d]rawing such connections between the experimentation and advances of eighteenth-century print culture and our own period of media transformation can offer a crucial foothold for students encountering eighteenth-century texts for the first time.’ Rachael asks students to write blog posts incorporating different adaptations of English literature as a way of getting a sense of these texts’ original meaning, form, and transmission. In a course devised by Mark Vareschi (Wisconsin-Madison), he set an ‘experimental assignment in digital composition and adaptation’ tasking students to tweet Samuel Richardson’s Pamela, 140 characters at a time, as they were reading the novel. The course designed by Evan C. Davis (Hampden-Sydney College), Gutenberg to Google: Authorship and the Literature of Technology, also pays close attention to the form of literature in this period. Its ‘Friday assignments’ include intriguing tasks such as comparing how we read via print and via e-readers, and using online resources about typography and the Letter M Press app to enable students to re-create and reflect upon the physicality of print in the hand-press era.

I’m about to run my own digital literary studies course focusing on the eighteenth century this coming academic year, and I’ve found the work of others in this field fascinating and tremendously inspiring.[2] My thanks to everyone for letting me link to their courses and students’ projects.

[1] See Rachel Schneider’s blog post Eighteenth-Century Literature meets Twenty-First Century Tech, which reviewed the SHARP roundtable at ASECS 2014, organised by Katherine M. Quinsey, ‘Wormius in the Land of Tweets: Archival Studies, Textual Editing, and the Wiki-trained Undergraduate.’ Quotations in this post are from the authors’ proposals for the Digital Humanities Caucus panel ‘Digital Pedagogies’, organised by Benjamin Pauley and Stephen H. Gregg. The phrase ‘inspired by digital humanities’ is my deliberately broad definition that covers the wide variety of uses of digital technology and digital resources across the courses I’ve found. Since my particular interest is in eighteenth-century literature, if you are interested in syllabi that are focused on digital humanities beyond literature, or beyond the eighteenth century, then there are superb bibliographies here. Because I’m most interested in how these tools have been brought into the undergraduate classroom, I’ve not discussed here the (impressive and exemplary) graduate work in courses run by Lisa Maruca (see Mechanick Exercises), or Allison Muri’s Grub Street Project. For an excellent set of tips and examples see Adeline Koh’s essay ‘Introducing Digital Humanities Work to Undergraduates.’

[2] In this context I should acknowledge my debt to George Williams (University of South Carolina Upstate). George’s own course – despite his being an eighteenth-centuryist – is focused on an earlier media shift, and is organized around Sir Gawain and the Green Knight.

Tweet as Alexander Pope Day (#tweetaspopeday)

Conrad Brunstrom’s adroit commentary on the formal qualities of Pope’s poetry and Twitter!

conradbrunstrom


Once a year, just once a year I like to relax and stop pretending that I can actually express myself with any degree of verve and finesse using only 140 characters and instead give the whole day over to Alexander Pope.

Now there was someone calling themselves Alexander Pope who was tweeting away – but they were making up their own couplets and trying to be topical. I can’t be doing with that. Far preferable was Samuel Pepys, who used to send daily nuggets from the 1660s of the “… and then to Vauxhall where did ogle Lady Castlemaine mightily” variety. Haven’t heard from Pepys for a while, though he did say his eyesight was getting bad. No, for “tweet as Pope day” (#tweetaspopeday) all I want to do is send actual couplets from actual Pope poems, pretty much at random, at intervals through the day and see if the…

View original post 352 more words