Monthly Archives: August 2017

Finding ECCO-TCP texts

This is a list of the places you can access the corpus of texts from the Eighteenth-Century Collections Online Text Creation Partnership (ECCO-TCP).

[updated 8th October 2019]

[Image “Magnifying Glass” by nathanmac87 is licensed under CC BY 2.0]

A quick bit of history. The Text Creation Partnership started in 1999 as a collaboration between the university libraries of Michigan and Oxford, the Council on Library and Information Resources, and ProQuest, the publisher of Early English Books Online. The aim was to create high-quality ‘standardized, digitally-encoded electronic text editions’, starting with 25,000 titles from Early English Books Online.

In 2005 the project expanded to include Gale-Cengage’s Eighteenth-Century Collections Online (as well as Evans Early American Imprints by Newsbank). However, while the EEBO-TCP project flourishes (with around 40,000 texts transcribed so far), the work on ECCO-TCP stagnated at around 2,000 texts. As well as the main partner institutions of Michigan and Oxford that offer access to the ECCO-TCP corpus, there have been a variety of spin-off projects. The result is a rather confusing jumble of access to the eighteenth-century TCP texts, so I’ve listed them below, with a few comments. [1]

Via the ECCO-TCP page at the Text Creation Partnership main site

On the Text Creation Partnership main page there’s an ‘About the texts’ tab, under which you can select ECCO-TCP. Here there are a variety of links that enable you to search the corpus, view the full text of a title, and/or download files of the texts. The links are listed below, with some comments.

University of Michigan library. A useful variety of ways of searching the corpus. You are able to view the full text of individual works online.

ARTFL. A collaborative project between the University of Chicago and the French Government to provide access to a variety of digitized resources. This search engine enables some very useful types of search, including KWIC views and terms by frequency and/or year, and you can also view the texts online (I haven’t had much time to seriously play with this yet):

‘Download the original SGML/XML files’. The files are available in several batches, each as a zip file.

The Oxford Text Archive. A gateway to various TCP-transcribed collections, sorted by date range, originating collection (e.g. EEBO, ECCO, Evans), and even subject. IMHO, it’s great if you’re looking to find and download individual works, since they are available in a useful variety of file formats: HTML, XML, ePub, mobi (for Kindle), and plain text (UTF-8). [URL updated 8 Oct. 2019]

Corpus of Late Modern Early English Medical Texts, a project run by Helsinki that uses ECCO-TCP files (though the title of the Helsinki project is the ‘Corpus of Early English Medical Texts’).

Other Places!

18thConnect. Offers a searchable catalogue of TCP texts. Each title includes a link that gives free access to facsimile page images of the text via Eighteenth-Century Collections Online. The website also allows users to curate online collections of texts via its ‘exhibit’ builder.

Early Modern OCR project (eMOP). Matt Christy has uploaded 2,188 individual plain text files to GitHub. In addition, if you search for ‘ECCOTCPcombined’ in the ‘file finder’, you’ll find these texts combined and split into three tranches, each available as a single file to download.

John Levin has curated a very large set of TCP corpora on his GitHub pages, including ECCO-TCP in either SGML or XML:

John has also created a list of 2,188 ECCO-TCP titles on an open Zotero group:

Visualizing Early Print is a recent project to better enable the large-scale analysis of Early Modern texts and is a collaboration between University of Wisconsin-Madison, the University of Strathclyde, and the Folger Shakespeare Library. It makes available a ‘corpus of plain text files extracted from ECCO-TCP XML files, offered in both standardized spelling and original spelling versions.’ Each zip file includes a metadata file (csv). The site also includes some exciting tools.

Also, a single file of all 2,188 texts (plain text) combined can be downloaded from my own Google Drive (it will warn you that it cannot preview the file or scan it for viruses).
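If you download one of the zip files that bundle plain text files with a metadata file (csv), as described for Visualizing Early Print above, a few lines of Python are enough to get started. This is only a minimal sketch: the metadata file name (`metadata.csv`) and the `.txt` extension are my assumptions, not something the project documents here, so check the contents of the zip you actually download and adjust accordingly.

```python
import csv
import io
import zipfile


def read_corpus_zip(zip_source, metadata_name="metadata.csv"):
    """Return (metadata_rows, texts) from a zip of plain text files plus one CSV.

    zip_source may be a file path or a file-like object.
    metadata_name is an assumed file name; change it to match the real zip.
    """
    with zipfile.ZipFile(zip_source) as archive:
        # Parse the metadata CSV into a list of dicts, one per row.
        with archive.open(metadata_name) as raw:
            wrapper = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            metadata_rows = list(csv.DictReader(wrapper))

        # Read every .txt member into memory, keyed by its name inside the zip.
        texts = {
            name: archive.read(name).decode("utf-8")
            for name in archive.namelist()
            if name.endswith(".txt")
        }
    return metadata_rows, texts
```

Calling `read_corpus_zip("some-download.zip")` (a placeholder file name) would give you the metadata rows and a dictionary of texts to count, search, or pass on to other tools.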

[1] Numbers: TCP and Visualizing Early Print cite 2,473 texts; the GitHub repository (and lists derived from it) cite 2,188 texts. I have yet to find out why the numbers of texts differ.
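One way to investigate the discrepancy would be to compare the lists of text identifiers from two sources and see which titles appear in only one of them. The sketch below assumes you have already extracted an identifier for each text from each source (from file names or a metadata file, say); the function name and the identifiers in the usage note are made up for illustration.

```python
def compare_id_lists(ids_a, ids_b):
    """Return (only_in_a, only_in_b) as sorted lists of identifiers."""
    set_a, set_b = set(ids_a), set(ids_b)
    # Set difference keeps the identifiers missing from the other source.
    return sorted(set_a - set_b), sorted(set_b - set_a)
```

For example, `compare_id_lists(["A1", "A2", "A3"], ["A2", "A3"])` reports that the hypothetical identifier `A1` is present only in the first list.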