All posts by shgregg

About shgregg

Lecturer in English, Bath Spa University.

Reading the changing platforms of Eighteenth Century Collections Online.

In Old Books and Digital Publishing: Eighteenth Century Collections Online, I analysed the various interfaces to Gale’s Eighteenth Century Collections Online (ECCO), including Jisc’s Historical Texts as well as Gale’s own platforms. However, since the book was published, both Gale and Jisc have modified these interfaces, so I wanted to reflect on some of the arguments I originally made about these platforms.[1]

Detail from the ESTC record of Patrick Browne, The Civil and Natural History of Jamaica (1789)

A key strand of my argument was that the digital representation of bibliographic metadata, such as format, variations in printing, if and how a book was illustrated, errors, and pagination (essentially, data ingested from the English Short Title Catalogue; see, for example, above), was a significant factor in how we apprehended each book’s material presence. Because each interface represented that record of their materiality differently, our perception of that presence, the ‘bookishness’ of these books, depended on which interface we were using. I argued that these differences reflected some wider changes. First, publishers such as Gale and ProQuest have moved towards packaging their digital products to enable cross-collection access and searching; however, given that different collections were built on different metadata standards, it became very difficult to represent metadata language consistently.

Second, the influence of computational analysis or text-mining was reflected in Gale’s new platforms for ECCO from around 2014. As I argued: ‘the increased focus on the text of the books in ECCO risks bypassing the bookishness of books, those eccentricities introduced by the handmade processes of book production and transmission’ (92). Originally, such a lack of material evidence was particularly felt in the beta release of the Gale Digital Scholar Lab platform, where the results lists and the book viewer included no bibliographic information at all. However, between December 2021 and March 2022 Gale updated that interface. The book viewer is now based on the same architecture as the Gale Primary Sources (GPS) interface, and bibliographic metadata is now available in the same way as in the standard GPS interface.

Figure 1 Gale Digital Scholar Lab, ‘Explore’ view, detail (31.03.2022)

Figure 2 Gale Digital Scholar Lab, ‘Full Citation’ view, detail (31.03.2022)

As you can see, bibliographic metadata is still rather awkwardly presented across two different viewing options (‘Explore’ and ‘Full Citation’). I won’t rehearse the arguments between bibliography and text-mining here, but it does seem a shame to make certain ways of reading less accessible on Gale’s now standardised platform.

Figure 3 Jisc Historical Texts, ‘Details’ view, detail (31.03.2022)

By comparison, Historical Texts, Jisc’s platform for accessing ECCO, presents bibliographic data in a slightly more user-friendly way, in a single pane. However, that metadata is not consistent across all the collections accessed via the platform, and, as I discussed in the book, the data was still rather attenuated. That said, as of July 2021 Jisc has ingested more bibliographic data from the ESTC, naming the source library and listing holding libraries for items in ECCO.[2]

Setting aside my point about the bookishness of books, one of the other central points in Old Books and Digital Publishing was that digital archives are far from static entities. For the moment, the collection itself remains stable, but how we interface with those old books – how we apprehend their material presence – is subject to change. At the end of my book, I said that ‘ECCO is still changing’ (104). So, indeed, it turns out. But the rapidity with which commercial interfaces are updated underlines how essential it is that we learn how to read these platforms and trace their histories.

[1] These arguments were also made in an earlier blog post, ‘Paratexts and metadata: the interfaces of Eighteenth-Century Collections Online’ (July 2020).

[2] Jisc Historical Texts – Development Roadmap (accessed 21/02/2022).


CFP: Isolation and Eighteenth-Century Studies (Defoe Society panel ASECS 2022)

Isolation is arguably the zeitgeist of the year of COVID-19. Remote working, online learning, shielding, stay-at-home orders, social distancing: all involve some form of isolation, whether enforced or self-imposed. This inescapable theme, then, seems particularly appropriate for an author whose works insistently probe the meanings of isolation. Defoe’s fiction, for example, obsessively returns to the relationship between individuation, civil community, and isolation well beyond Robinson Crusoe: Roxana longed for isolation; Captain Singleton made halting attempts to overcome it; and, as evidenced by many journalistic and mass media pieces, A Journal of the Plague Year resonates with our current pandemic. Moreover, the differences among isolation, solitude, and loneliness also have a political dimension. As Hannah Arendt argued, isolation is the prerequisite for totalitarianism; by creating division and destroying the “public realm of life,” isolation radically disempowers collective action and communal agency. Defoe’s works also examine the politics of isolation, whether articulated via national culture or party politics (think of the anti-isolationist True-Born Englishman, or Legion’s Memorial). This panel seeks short papers or other explorations of isolation in eighteenth-century writing and culture: what it means, its costs, its benefits, its resonance today.

Please send abstracts to Dr Stephen H. Gregg and Professor Laura Stevens.

See the conference website for more information on ASECS 2022.

Paratext and Metadata: the Interfaces of Eighteenth-Century Collections Online

This post was originally written for and published by the Eighteenth-Century Paratext Research Network in August 2020. My thanks to Corinna Readioff for permission to re-blog it via Manicule.

Eighteenth-Century Paratext Research Network

by Stephen Gregg (Bath Spa University)

Gérard Genette’s theory of the paratext is usually applied to some form of manuscript or printed material, but why not digital material?[1] In this post I want to explore how we might think about paratextuality in relation to Eighteenth-Century Collections Online (ECCO). Briefly, ECCO is an online database published by Gale-Cengage. First published in 2003, it gives access via subscribing libraries to 184,536 titles of material printed between 1700 and 1800, comprising the searchable text and digital page images. It is currently accessible via four different user interfaces (UIs):

  • Gale’s original, standalone, interface (2003)
  • Gale’s two cross-collection platforms, Gale Primary Sources (2016), and Gale Digital Scholar Lab (2019)
  • JISC’s platform Historical Texts (UK only, 2014)

The UI has obvious affinities with Genette’s notion of paratext as a space wherein reader and text interact: ‘a zone not just of transition, but of transaction; the…


Pandemics, plagues, and literature

Detail from the cover of Camus, La Peste. Public Domain, Wikimedia

I’m sure there will be more, but I thought it timely to pull together in one place the various blogs and articles that have been drawing parallels between the coronavirus pandemic and the plagues of the eighteenth century (this is an admittedly very ‘long’ eighteenth century, but I rely on your mercy). Most of these drew my attention because they mention Daniel Defoe’s A Journal of the Plague Year, published in 1722 and dealing with the Great Plague of 1664-65 (but implicitly addressing an outbreak in France in 1721).

First (I think) is Marina Hyde’s masterfully biting parallel between the politics and public reactions of Britain in March 2020 and Defoe’s depiction of the populace of 1664-65.

This can be expanded by reading Adam James Smith’s blog post (with Jo Waugh) on social distancing in the Journal.

Next up was Marcel Theroux’s fascinating round-up of plague literature, including Defoe’s Journal, Boccaccio’s The Decameron, Thomas Mann’s Death in Venice, and ending with Camus’ 1947 novel, The Plague.

In response to someone who pointed out that this round-up didn’t include Mary Shelley’s underrated The Last Man (1826), Olivia Murphy’s blog post discusses the novel’s disturbing prophecy of global catastrophe.

Last, but not least, this op-ed in The Washington Post on the Journal, with the eye-catching title ‘The author of Robinson Crusoe was the Anthony Fauci of his age.’

Screens, microfilms, and books: dreams and reality

In 1935 scholar-technologist Robert C. Binkley imagined how, with the aid of microfilm publishing, ‘the scholar in a small town can have resources of great metropolitan libraries at his disposal’; similarly, a 1994 brochure for the microfilm collection The Eighteenth Century imagines the archive coming ‘to your library.’[1] In 1981, announcing the filming of the 18thC Short Title Catalogue, its editor Robin Alston imagined scholars being able to move seamlessly between microfilms and a computer-based catalogue: ‘[e]ach text selected for filming will be keyed to the machine-readable record, … this means that users – whether libraries or scholars – will be given a unique opportunity to acquire access to both bibliographical records and whole-text reproductions.’[2] In 1995, riffing on Jorge Luis Borges’ story ‘The Library of Babel’, whose fictional universe comprises a vast library of all knowledge, Kevin Kelly imagined the digital universe of knowledge: ‘[p]ages from the books appear on the screen one after another without delay. To search Borges’s Library of all possible books, past, present, and future, one needs only to sit down (the modern solution) and click the mouse.’[3]

Matthew Kirschenbaum, discussing commentaries on computing in the 1980s and 90s, conceptualised such visions as a ‘medial ideology’; and Nanna Thylstrup identified such dreams as key ‘spatial tropes’ characteristic of mass digitization projects from the 1990s to the present.[4]

When I was researching eighteenth-century books and their various remediated versions, my experience came close to many of these ideas. But dreams or theory didn’t really capture the experiential reality of analysing texts across and between different media. Analysing the fortunes of Patrick Browne’s The Civil and Natural History of Jamaica (1756, 1789), for instance, involved a complex dance of technologies and embodied experience. While a trackpad has replaced the mouse, I did indeed ‘sit down’ at a screen and work from home, looking at online images on my laptop using ECCO (via JISC Historical Texts); these images were then downloaded and stored there. In addition, bibliographical records and locations had to be found using an online bibliography, the ESTC, in conjunction with the British Library online catalogue and ordering service. However, in one important way the archive couldn’t come to me, so I had to go to the archive. Because ECCO is based on 2D representations of books, and because catalogue records are to an extent an abstraction, I travelled to the British Library in order to examine this particular book-copy in all its 3D particularities. While at the library I navigated between my laptop (digitized images, online bibliography), the book, and a microfilm reader. It was not seamless. Not only did I have to re-learn how to load and use a microfilm reader, but the reader and its control box took up most of the table space, so it was impossible to have all three items on the desk at once.


The distributed nature of transnational commercial publishing, however, really came home when I discovered that the particular book-copy I was reading on the microfilm reader was physically present on another continent. Luckily, I was able to converse with librarians across the Atlantic. Their generous help in being my eyes on the physical book underlined the limitations of online collections of texts. Of course it wasn’t luck: my scholarship was supported by their work, enabled by the visibility of institutions on the web, and by access to the global infrastructure of the internet, including the shared, privileged access to a paywalled online collection of texts known as ECCO. How that came into being is another, longer story.


[1] Robert C. Binkley, ‘New Tools for Men of Letters’, Yale Review n.s. 24 (1935): 519–537; reprinted in Selected Papers of Robert C. Binkley, ed. Max H. Fisch (Cambridge, Mass.: Harvard University Press, 1948), pp. 179–197 (p. 184). The Eighteenth Century (Reading: Research Publications International, 1994).

[2] Robin Alston, ‘ESTC texts on microfilm’, Factotum: The Newsletter of the XVIIIth century STC, no.12, July 1981, p.2.

[3] Quoted in Nanna Bonde Thylstrup, The Politics of Mass Digitization (Cambridge, Mass.: MIT Press, 2018), p.103.

[4] Matthew Kirschenbaum, Mechanisms: New Media and the Forensic Imagination (Cambridge, Mass.: MIT Press, 2008), p.43; Thylstrup, Mass Digitization, p.107.

Remembering Robinson Crusoe: 1719, 1970, 2019

April 2019 and I’m thinking about Daniel Defoe’s The Life and Strange Surprizing Adventures of Robinson Crusoe, first published in April 1719 (it was entered in the Stationers’ Register on April 23rd).

Engraving and title page. Credits: Public Domain.

However, I’m reminded of my first encounter. This was the black-and-white Anglo-French TV series first produced in 1964 and shown on BBC TV in the late 60s and early 70s, usually in an afternoon slot during my school holidays (see the image at the top of the post). The memory of this is suffused with an aura of contentment – my own, that is – lazily watching TV on an afternoon. And my memory of it is selective, since the dominant images that come from the series also construct the time of Crusoe’s shipwreck on the island rather like my own school holidays: exciting and yet boring, carefree and occasionally, perhaps unintentionally, comic. What really sticks in my mind is Crusoe’s building and making (see episode 5). I’m not sure why this should be, since I’m no DIY-er. But there were still places near the suburbs where I lived as a child in Leeds that were uncultivated and undeveloped: places where I could go on my own or with friends among weeds growing to shoulder height and explore woods (one with a derelict WWII bunker). So the TV series evoked something of my own solitary rambles: the isolation, the freedom, and making things with sticks. Yet seeing these episodes again, I realise I had completely forgotten the flashbacks to Crusoe’s time with his father in (a strangely rural) York. Was it because, sitting in front of the TV, I had no need to know about fathers and parents and home? Or was it that what Crusoe himself called his ‘rambling’ impulse promised precisely the opposite of the world of home and contentment, where men ‘went silently and smoothly thro’ the World’, as Crusoe’s father puts it?

Memories of my own life as a child, images from the Crusoe TV series, and my memory of the effect of these images move and shift around one another in peculiar ways. Now, as a Defoe scholar and a father of boys, I find that inspecting my memory becomes a far more complicated task. Certainly, my nostalgia for childhood ‘rambling’ owes much to a projection of present-day loss: “would I let my own children now do the kind of solitary adventuring I did then?” That myth of adventure passed on to me through this TV series is now tempered by my recognition that it is a sanitised version of Defoe’s 1719 novel: the TV version is somewhat emptied of Defoe’s religious and moral rhetoric. More troubling is how the issue of Friday’s subservience is reduced to a kind of friendship, which now makes for very uncomfortable viewing. And I even mentioned that I remembered watching the series with contentment …

Maybe it’s the musician in me, but the strongest memory I have is of the series’ music: it is this that most precisely captures the mixed images of Crusoe’s poignant isolation and my nostalgia for carefree adventure.

The opening theme’s grand, rolling strings evoke the crashing of seas and waves and suggest the epic nature of escape, journey, and adventure. But it’s one of the incidental scores that has the most powerful place in my memory, since it concentrates solely on giving shape to the underside of adventure, poignantly evoking the tedium and loneliness of shipwreck (this runs from about 0:45 to 5:18).

(Note: first version of this post published 2013; updated 2019).

Finding ECCO-TCP texts

This is a list of the places you can access the corpus of texts from the Eighteenth-Century Collections Online Text Creation Partnership (ECCO-TCP).

[updated 8th October 2019]

[Image “Magnifying Glass” by nathanmac87 is licensed under CC BY 2.0]

A quick bit of history. The Text Creation Partnership started, in 1999, as a collaboration between the university libraries of Michigan and Oxford, the Council on Library and Information Resources, and the publisher of Early English Books Online, ProQuest. The aim was to create high-quality ‘standardized, digitally-encoded electronic text editions’, starting with 25,000 titles from Early English Books Online.

In 2005 the project expanded to include Gale-Cengage’s Eighteenth-Century Collections Online (as well as Evans Early American Imprints by Newsbank). However, while the EEBO-TCP project flourishes (with around 40,000 texts transcribed so far), the work on ECCO-TCP stagnated at around 2,000 texts. As well as the main partner institutions of Michigan and Oxford that offer access to the ECCO-TCP corpus, there have been a variety of spin-off projects. The result is a rather confusing jumble of access to the eighteenth-century TCP texts, so I’ve listed them below, with a few comments. [1]

Via the ECCO-TCP page at the Text Creation Partnership main site

From the Text Creation Partnership main page there’s an ‘About the texts’ tab, under which you can select ECCO-TCP. Here there are a variety of links that enable you to search the corpus, view the full text of a title, and/or download files of the texts. The links are available below, with some comments.

University of Michigan library. A useful variety of ways of searching the corpus. You are able to view the full text of individual works online.

ARTFL. A collaborative project between the University of Chicago and the French Government to provide access to a variety of digitized resources. This search engine enables some very useful types of search, including KWIC views and terms by frequency and/or year, as well as letting you view the texts online (I haven’t had much time to seriously play with this yet):

‘Download the original SGML/XML files’ (in several batches), available as zip files

The Oxford Text Archive. A gateway to various TCP-transcribed collections, sorted by date range, originating collection (e.g. EEBO, ECCO, Evans), and even subject. IMHO, it’s great if you’re looking to find and download individual works, since they are available in a useful variety of file formats: HTML, XML, ePub, mobi (for Kindle), and plain text (UTF-8). [URL updated 8 Oct. 2019]

Corpus of Late Modern Early English Medical Texts, a project run by the University of Helsinki that uses ECCO-TCP files (though the title of the Helsinki project is the ‘Corpus of Early English Medical Texts’).

Other Places!

18thConnect. It offers a searchable catalogue of TCP texts. Each title includes a link giving free access to facsimile page images of the texts via Eighteenth-Century Collections Online. The website also allows users to curate online collections of texts via its ‘exhibit’ builder.

Early Modern OCR project (eMOP). Matt Christy has uploaded to GitHub 2,188 individual plain text files. Plus, in ‘file finder’ search for ‘ECCOTCPcombined’ and you’ll find these combined and split into three tranches each available as a file to download.

John Levin has curated a very large set of TCP corpora on his GitHub pages, including ECCO-TCP in either SGML or XML:

John has also created a list of 2,188 ECCO-TCP titles on an open Zotero group:

Visualizing Early Print is a recent project to better enable the large-scale analysis of Early Modern texts and is a collaboration between University of Wisconsin-Madison, the University of Strathclyde, and the Folger Shakespeare Library. It makes available a ‘corpus of plain text files extracted from ECCO-TCP XML files, offered in both standardized spelling and original spelling versions.’ Each zip file includes a metadata file (csv). The site also includes some exciting tools.

Also, a single file of all 2,188 texts (plain text) combined can be downloaded from my own Google Drive (it will warn you that it cannot preview the file or scan it for viruses).
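For anyone downloading the individual plain-text files (for example from eMOP), the first step in any analysis is simply loading them. The following is a minimal Python sketch, not a tool offered by any of these projects: it assumes the files have been unzipped into a single folder as UTF-8 `.txt` files, one per title, and the folder name `ecco-tcp-plaintext` and the function `load_corpus` are hypothetical. Counting what loads is also a quick way to check a download against figures such as the 2,188 texts cited for these files.

```python
from pathlib import Path


def load_corpus(folder, pattern="*.txt"):
    """Read each matching file in `folder` into a {file stem: text} dict.

    Assumes one UTF-8 plain-text file per title; errors="replace" swaps any
    undecodable bytes for U+FFFD instead of aborting the whole load.
    """
    corpus = {}
    # sorted() gives a stable, alphabetical order of files
    for path in sorted(Path(folder).glob(pattern)):
        corpus[path.stem] = path.read_text(encoding="utf-8", errors="replace")
    return corpus


if __name__ == "__main__":
    # Hypothetical folder name: point this at wherever the files were unzipped.
    texts = load_corpus("ecco-tcp-plaintext")
    print(len(texts), "files loaded")
```

Keying each text by its file name keeps it tied to the file it came from, which should make it easier to match texts back to whatever metadata list accompanies a given download.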

[1] Numbers: TCP and Visualizing Early Print cite 2,473 texts; the GitHub repository (and lists derived from it) cite 2,188 texts. I have yet to find out why the numbers of texts differ.

Spiralling: teaching undergraduate digital literary studies

Corporal Trim’s spiral, from Tristram Shandy. Courtesy

It was a privilege to be invited to deliver a keynote talk at the Digital Humanities Congress 2016, hosted by the Humanities Research Institute, University of Sheffield. My sincere thanks to the organiser Michael Pidd for both the invite and a vibrant and supportive conference.

My talk concerned the practice of teaching digital literary studies to undergraduate students (slides and audio recording are here). I wanted to discuss the English literature student’s experience of technology in the classroom. I also talked about the meaning of digital humanities as it is deployed by both scholars and university managers, and about how the relationship between a discipline and the digital, from both an academic’s and a student’s point of view, is very different from the kind of learning technology that tends to manage students rather than enabling them to become creators. Finally, I argued for a tactical pedagogy that focuses on small-scale praxis and on building and enabling connections: between academic colleagues, between academics and students, and between students and the world beyond the institution.

My childhood and computers

I was 9 in 1969 and a child of the Apollo moon-landings, the film 2001: A Space Odyssey (more of which later), and TV. The time is significant because between then and my early teenage years my TV viewing was filled with the sci-fi fantasies (and re-runs) of Star Trek, the Gerry Anderson productions Thunderbirds, Stingray, Captain Scarlet, Joe 90, UFO, and Space: 1999, and the Irwin Allen productions The Time Tunnel, Lost in Space, and Voyage to the Bottom of the Sea. In the background, lights blinking in seemingly meaningful patterns, were the computers, banked in serried rows behind the human actors. (I recently found out that all the computers in the Allen productions were the same ex-US Air Force air-defence computer.)

Lee Meriwether in front of computer, from The Time Tunnel (1966-67). Via Wikimedia Commons

These images also found their echo in my brief flirtation with the electronic-prog-rock group Tangerine Dream: when I saw them live I gazed, not at the performers, but at the rhythmic lights of the Moog sequencers.

Such was my fascination that I remember asking my bewildered dad, who was a wood-carver and joiner by trade, for help in building a computer (I don’t remember if we ever did build a replica, although I do remember us building a satellite out of wood and tin foil).

It is perhaps telling that the most magnificent and startling computer did not have tapes or blinking lights. In Stanley Kubrick’s 1968 film 2001: A Space Odyssey (Arthur C. Clarke wrote the novel at the same time), the single camera eye of the initially benevolent computer HAL comes to symbolize its terrifying implacability when it murders most of the crew. The film has, in all sorts of ways, left its mark on me. But it is striking that HAL was not left as a homicidal nemesis. In a scene of considerable poignancy, the surviving crew-member, Dave Bowman, pulls out HAL’s circuit boards. As more and more boards are pulled out, HAL says “Dave. Stop. … I’m afraid. … My mind is going. I can feel it”, and when it slowly sings a song it had been taught, it regresses to a kind of childhood.

Unlike the more recent films and TV series about artificial intelligence and consciousness (Artificial Intelligence, I, Robot, Her, Ex_Machina, Humans), this scene does not depend upon humanoid features to enable a leap of empathy. It’s a difficult trick to pull off, generating that connection for a machine that is both terrifying and awe-inspiring (and just look at my own anthropomorphic language). There is also an uneasy feeling that we are being pulled ever-so-slightly off-centre, but it is a hallmark of Clarke’s fictions, and of Kubrick’s film, that such human decentring is accompanied by feelings of the sublime.

After I had become a radio amateur and worked as a telecommunications engineer, the arc of my life swung away from electronica to embrace acting, punk, forming a band, going to university, and becoming a lecturer in English literature. Now embedded in the humanities and fascinated by the impact of the digital humanities, and almost permanently glued to my very own computer, I find the old interests coming back, the arc re-connecting my fascination with humans and technology. I’m not sure, even now, where it will lead, but I feel again that peculiar off-centred-ness and excitement.



1748: ‘Fiction’ in the Database:

Not so long ago I was reviewing a lecture I regularly present to students studying Samuel Richardson’s Clarissa. Looking back, I had no idea that this would lead me to speculate about how bibliographic data relating to English literary history is recorded in electronic databases.

It was a lecture that aimed to give some context about the ‘rise of the novel’ and I always had fun by reminding them just how illegitimate the novel was in the first half of the eighteenth century, and how even literary works formed a tiny proportion of what was published. But this time around, I thought I would actually present them with evidence. Some actual quantities. I came up with the idea of homing in on the year Clarissa was first published: 1748. My first attempt was quick and dirty.


Using the English Short Title Catalogue (ESTC) I typed in the year 1748, left everything else blank, and noted how many publications it returned (2550). I then thought it would be instructive to see how many of these were literary (in the loosest sense), so I went to Eighteenth-Century Collections Online (ECCO) and narrowed the 1748 search down via their category ‘Literature and Language’ (c.250 hits).[1] Now to find how many of those were novels. Both databases yielded results with the subject ‘fiction’ and I then – to ram home the point – narrowed that list down to new titles published that year. Only 0.5% of all works produced in 1748 could be classified as new fiction.[2] In a culture which perceives imaginative writing as practically synonymous with the novel, the result was a gratifying gasp of surprise from my student audience.
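For a sense of scale, the figures above can be turned into rough proportions. This is only a back-of-envelope check: the c.250 ‘Literature and Language’ hits come from ECCO while the 2,550 total comes from the ESTC, so the first ratio mixes two databases, and the 0.5% figure is itself rounded, so the implied count of new fiction titles is approximate.

```latex
\frac{250}{2550} \approx 0.098 \approx 9.8\%,
\qquad
0.005 \times 2550 = 12.75 \approx 13 \text{ new titles of fiction}
```

In other words, roughly a dozen new works of fiction out of some two and a half thousand publications.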

However, this rough-and-ready exercise set me on a different path, and made me think about how these databases, upon which we rely so trustingly, categorise our literary heritage. The simple exercise above revealed clear disparities between these databases in both the numbers and the titles returned, and some odd things about the way ESTC and ECCO had tagged these works. For a detailed breakdown of the tags and titles I found, see my spreadsheet here.

A quick bit of history. The page images available in ECCO are digital scans of microfilm photographs of the original physical copies; in other words, as Ben Pauley has pointedly remarked, a remediation of a remediation.[3] The original microfilming was contracted out by the British Library in the mid- to late-twentieth century. These microfilms were then purchased and sold to research libraries by a US company called ‘Research Publications’ (I have a sudden flash of memory from my postgraduate days, seeing that name on the microfilm boxes as I painstakingly loaded a film into the reader). In the 1990s that company was bought by Gale.[4] By 2002 the microfilms had been scanned, and ECCO was launched as a commercial database in 2003. A second tranche of material (ECCO Part II) was published in 2009.

ECCO got its bibliographic metadata (for example, details about printers, publishing history, physical description, holding libraries) from the ESTC. However, the ESTC itself has a tangled history. It began life as the Eighteenth-century Short Title Catalogue in 1977. In 1987 it extended its remit to include material from c.1472 to 1700 (incorporating data from the Short-Title Catalogue of Books Printed in England, Scotland, and Ireland, and of English Books Printed Abroad, 1475–1640 and the Wing catalogue, which covered the period 1641–1700), and was then renamed the English Short Title Catalogue.[5] Indeed, the precise relationship between ECCO and the palimpsest that is the ESTC is an obscure one, echoing (if you’ll forgive the pun) that between ProQuest’s database Early English Books Online, the ESTC and the Short Title Catalogue, as Bonnie Mak has elegantly pointed out.[6]

When it comes to the question of how subject headings were assigned, there are few hard facts. However, Gale-Cengage gives some clues about this metadata in their website FAQs. At some point around 2009, just before the second tranche of digitized texts was published, the MARC (Machine Readable Catalogue) records for ECCO were ‘enhanced’ by adding Library of Congress (LoC) subject headings.[7] These were obtained from ‘existing’ records of libraries which held the physical copy. However, where this was not possible, ‘ESTC licensed the work of adding LoC headings.’ This process resulted in ‘[o]ver 274,000 subject headings’ being added; Gale notes that these ‘were added through the combination of harvesting and manual assignment.’[8]

It seems there was at least considerable potential for divergence between these two systems of gathering and assigning subject headings, driven as they were by different organisations and groups of people. This might well have led the bibliographers or cataloguers at Gale to adopt a different way of tagging and searching for subject headings.

Returning to the oddities I encountered in preparing my lecture: ESTC enables a search via ‘Subject (genre)’ and ‘Subject’; ECCO has a drop-down option for ‘Subject.’ However, while ESTC tagged the genre field with ‘novel’ or ‘fiction’ and its subject field with ‘fiction’, ECCO tagged the subject fields as ‘fiction’ and/or ‘English fiction’ (note the ‘and/or’ for further confusion). In all, this yields five different sets of results. Moreover, looking just at the widest set of results for the subject heading ‘fiction’ (including reprints and new editions), the most striking aspect was the far larger number of results returned by the ESTC than by ECCO. There are no instances where ECCO identifies a work as ‘fiction’ that the ESTC does not. Even when ECCO tags A spy on Mother Midnight: or, the Templar Metamorphos’d as ‘fiction’ and the ESTC does not, the ESTC nevertheless tags it as ‘novels.’ However, there are some notable instances where ECCO does not follow the ESTC’s lead.[9] For example, while the ESTC rightly categorises Henry Carey’s Cupid and Hymen: a voyage to the isles of love and matrimony as ‘fiction’, it is not listed as such in ECCO. Even more obviously missing as ‘fiction’ in ECCO is Henry Fielding’s canonical novel The History of the Adventures of Joseph Andrews! Conversely, someone at ECCO must have thought that tagging Ovid’s Heroides. English Ovid’s epistles … Translated into English verse as ‘fiction’, as the ESTC did, was at best misleading.
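The comparison described above amounts to a set difference between two catalogues, once one decides which tags count as ‘fiction’. Here is a minimal Python sketch of that logic. The records are a hypothetical, hand-copied sample based only on the examples in this post (not an export from the ESTC or ECCO, and not my spreadsheet), the function names are illustrative, and treating ‘novels’, ‘fiction’ and ‘English fiction’ as equivalent is an assumption.

```python
"""Toy comparison of 'fiction' tags across two catalogues (hypothetical data)."""

# Hypothetical sample: title -> subject/genre tags, hand-copied from the
# examples discussed in this post; NOT an export from the ESTC or ECCO.
ESTC = {
    "Joseph Andrews": {"fiction", "Tobacco-fiction"},
    "Cupid and Hymen": {"fiction"},
    "Ovid's epistles": {"fiction"},
    "A spy on Mother Midnight": {"novels"},
}
ECCO = {
    "Joseph Andrews": {"Tobacco-fiction"},
    "Cupid and Hymen": set(),
    "Ovid's epistles": set(),
    "A spy on Mother Midnight": {"fiction"},
}

# Assumption: these tags are treated as equivalent markers of fictionality.
FICTION_TAGS = {"fiction", "English fiction", "novels"}


def is_fiction(tags, fiction_tags=FICTION_TAGS):
    """True if the record carries at least one fiction-like tag (exact match)."""
    return bool(tags & fiction_tags)


def disagreements(a, b, fiction_tags=FICTION_TAGS):
    """Return (titles fictional only in a, titles fictional only in b)."""
    only_a = sorted(
        t for t in a
        if is_fiction(a[t], fiction_tags)
        and not is_fiction(b.get(t, set()), fiction_tags)
    )
    only_b = sorted(
        t for t in b
        if is_fiction(b[t], fiction_tags)
        and not is_fiction(a.get(t, set()), fiction_tags)
    )
    return only_a, only_b


estc_only, ecco_only = disagreements(ESTC, ECCO)
print("Fiction in ESTC only:", estc_only)
print("Fiction in ECCO only:", ecco_only)
```

Counting ‘novels’ as fiction leaves the ECCO-only list empty, which matches the observation that ECCO tags no work as fiction that the ESTC does not; calling `disagreements(ESTC, ECCO, {"fiction"})` instead moves A spy on Mother Midnight into the ECCO-only list. Because the match is exact, ‘Tobacco-fiction’ does not count as ‘fiction’, so how the equivalences are defined shapes the result as much as the data does.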

Perhaps this goes beyond the issue of the management of data? It is intriguing to speculate on the human intelligence behind the original LoC headings and how they were assigned. Are we talking about individuals who were re-interpreting the nature of the actual texts themselves? How else to account for some of these idiosyncrasies?

Let’s go back to Fielding’s The History of the Adventures of Joseph Andrews (first pub. 1742; 4th ed. 1748), which is tagged by the ESTC as ‘Tobacco-fiction,’ a subject heading that is at least consistent across the ESTC and ECCO. But this heading is assigned to just three texts in the whole catalogue; the other two are novels by Tobias Smollett: The Adventures of Peregrine Pickle (1751) and The Expedition of Humphry Clinker (1773). Now, it’s true that there are characters who smoke in these novels, but plenty of other characters in the fiction of the period smoke too, and it’s not as if tobacco is a significant plot device. To take one more example, consider the anonymous Suite des lettres d’une Peruvienne. Again the subject heading is consistent across the two databases, ‘Epistolary fiction, French-18th century’, but it is the only title in the entire database associated with this subject heading.

More interesting still is what happens to the two variants of Nehemiah How’s A narrative of the captivity of Nehemiah How. For the first on my list (ESTC number W014008), ECCO seems to agree with its status as fictional, although its ESTC category ‘novels’ has been changed to the less contentious ‘fiction.’ Was someone working for Gale more astute in their reading of eighteenth-century narrative form? Human interpretation in the database is also evident in the other variant (ESTC number W34168), which looks to have been added later since it appears in ECCO Part II. Notably, any tags formally declaring its fictionality have gone: in the ESTC they are replaced by the more precise genre tag ‘captivity narrative.’ ECCO, however, ignores even this slight hint of narrative and instead follows the ESTC’s more historical-sounding subject tag, ‘Indian captivities.’

More anomalies could be found (help yourself!), but these few examples are intriguing. How this metadata was assigned seems to have been the result of a tangled history of cataloguing and bibliography, machines and human agency, and the messy process of translation between academic projects and commercial digital publishing. It’s a warning (just in case we need another) about how we use the metadata available to us via resources like ECCO, EEBO and the ESTC. These resources are invaluable, but using them carefully also requires knowledge of the historical processes behind the creation of these databases. We might also say that the human bibliographers behind these databases faced the same problems of interpreting and categorising the eighteenth-century novel as literary scholars do, and as critics in the eighteenth century clearly did. So, one more thing: it’s easy to forget that behind the search interface on your computer screen, in that black box of the database, what we are looking at is evidence everywhere of human intelligence, diligence, error and, above all, interpretation.



[1] Characteristically, ECCO returns slightly different numbers even when the same search is repeated. See Joseph Dane, What is a Book? The Study of Early Printed Books (University of Notre Dame Press, 2012), pp. 224-7.

[2] In this essay I make no claim to offer a comprehensive list of fiction published in 1748, or even to define what fiction is or was. For a list of this kind, Jerry Beasley’s Check List of Prose Fiction Published in England 1740-1749 (University Press of Virginia, 1972) might be a good place to start. But would we want to include chapbooks as fiction, for example? Quite possibly, but neither Beasley’s checklist nor ECCO includes them, and the ESTC’s coverage of this genre is unclear.

[3] Thanks to Ben Pauley; also to Scott Gibbons, Giles Bergel, and Elizabeth Grumbach for helpful conversations.

[4] See Laura Mandell, ‘The Business of Digital Humanities: Capitalism and Enlightenment’, Scholarly and Research Communication, 6.4 (2015).


[6] Bonnie Mak, ‘Archeology of a Digitization.’ Pre-print, pp. 10-11.

[7] For the Library of Congress subject headings and genre terms see


[9] As well as a number of texts which do not exist in ECCO at all.