
Finding ECCO-TCP texts

This is a list of the places you can access the corpus of texts from the Eighteenth-Century Collections Online Text Creation Partnership (ECCO-TCP).

[Image “Magnifying Glass” by nathanmac87 is licensed under CC BY 2.0]

A quick bit of history. The Text Creation Partnership started in 1999 as a collaboration between the university libraries of Michigan and Oxford, the Council on Library and Information Resources, and ProQuest, the publisher of Early English Books Online. The aim was to create high-quality ‘standardized, digitally-encoded electronic text editions’, starting with 25,000 titles from Early English Books Online.

In 2005 the project expanded to include Gale-Cengage’s Eighteenth-Century Collections Online (as well as Evans Early American Imprints by NewsBank). However, while the EEBO-TCP project has flourished (with around 40,000 texts transcribed so far), work on ECCO-TCP stagnated at around 2,000 texts. The main partner institutions, Michigan and Oxford, offer access to the ECCO-TCP corpus, and there have also been a variety of spin-off projects. The result is a rather confusing jumble of access points to the eighteenth-century TCP texts, so I’ve listed them below, with a few comments. [1]

Via ECCO-TCP main page

From the ECCO-TCP main page, there are a variety of links that enable you to search the corpus, view the full text of a title, and/or download files of the texts. The links are available below, with some comments.

University of Michigan library. A useful variety of ways of searching the corpus. You are able to view the full text of individual works online.

ARTFL. A collaborative project between the University of Chicago and the French government to provide access to a variety of digitized resources. This search engine enables some very useful types of search, including KWIC (keyword in context) views and term counts by frequency and/or year, and you can also view the texts online (I haven’t had much time to play with this seriously yet):

‘Download the original SGML/XML files’. These are available in several batches as zip files.

The Oxford Text Archive. There are in fact two catalogues that give you access (searchable by author, title word or genre). The ECCO-TCP site links to the first one, which includes material from a variety of digital transcription projects as well as the TCP, so although it is largely made up of pre-1800 material, there is a smattering of texts from a wider chronological range. IMHO, it’s great if you’re looking to find and download individual works, since they are available in a useful variety of file formats: HTML, XML, ePub, mobi (for Kindle), and plain text (UTF-8).

The second, newer catalogue is just for the TCP corpora (ECCO, EEBO, and Evans). Individual works can be viewed online (‘web’) or downloaded as either an ePub or an XML (‘source’) file.

‘Download plain text files … from DataHub.’ These files no longer seem to be available at this source.

‘Bibliographical information … available as Open Linked Data.’ This site doesn’t seem to exist anymore.

Other Places!

18thConnect. Offers a searchable catalogue of TCP texts. Each title includes a link that gives free access to facsimile page images of the text via Eighteenth-Century Collections Online. The website also allows users to curate online collections of texts via its ‘exhibit’ builder.

Early Modern OCR project (eMOP). Matt Christy has uploaded 2,188 individual plain text files to GitHub. Plus, if you search for ‘ECCOTCPcombined’ in ‘file finder’, you’ll find these combined and split into three tranches, each available as a file to download.

John Levin has curated a very large set of TCP corpora on his GitHub pages, including ECCO-TCP in either SGML or XML:

John has also created a list of 2,188 ECCO-TCP titles on an open Zotero group:

Visualizing English Print is a recent project to better enable the large-scale analysis of Early Modern texts, and is a collaboration between the University of Wisconsin-Madison, the University of Strathclyde, and the Folger Shakespeare Library. It makes available a ‘corpus of plain text files extracted from ECCO-TCP XML files, offered in both standardized spelling and original spelling versions.’ Each zip file includes a metadata file (csv). The site also includes some exciting tools.
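If you download one of these zip files, a quick sanity check is to compare the number of text files with the number of rows in the metadata file. The following is only a minimal sketch in Python: it assumes the archive holds plain text files ending in `.txt` and a single `.csv` metadata file in UTF-8, which I have not verified against the actual downloads, and the function name `summarize_corpus_zip` is my own invention.

```python
import csv
import io
import zipfile


def summarize_corpus_zip(zip_source):
    """Return (number of .txt files, list of metadata rows) for a corpus zip.

    Assumption: the archive contains plain text files ending in .txt and
    one .csv metadata file with a header row. Adjust if the layout differs.
    """
    with zipfile.ZipFile(zip_source) as archive:
        names = archive.namelist()
        text_files = [n for n in names if n.lower().endswith(".txt")]
        csv_files = [n for n in names if n.lower().endswith(".csv")]
        rows = []
        if csv_files:
            # Read the first CSV found; each row becomes a dict keyed by header.
            with archive.open(csv_files[0]) as handle:
                wrapper = io.TextIOWrapper(handle, encoding="utf-8", newline="")
                rows = list(csv.DictReader(wrapper))
    return len(text_files), rows
```

If the two numbers differ, that is a hint that the metadata and the texts were compiled from slightly different lists.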

Also, a single file of all 2,188 texts (plain text) combined can be downloaded from my own Google Drive (it will warn you that it cannot preview the file or scan it for viruses).
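Once you have a plain text file on your own machine, you can run simple searches locally. As an illustration, here is a very rough keyword-in-context (KWIC) sketch in Python. It is not how ARTFL or any of the sites above implement their searches; the function name `kwic` and the file name in the usage comment are hypothetical.

```python
import re


def kwic(text, keyword, width=30):
    """Return one keyword-in-context line per whole-word match of keyword."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    flat = " ".join(text.split())  # collapse line breaks and runs of spaces
    lines = []
    for match in pattern.finditer(flat):
        left = flat[max(0, match.start() - width):match.start()]
        right = flat[match.end():match.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines


# Usage (hypothetical file name):
# from pathlib import Path
# text = Path("ecco_tcp_combined.txt").read_text(encoding="utf-8")
# for line in kwic(text, "liberty")[:20]:
#     print(line)
```

For a file as large as the combined corpus, reading everything into memory at once may be slow, so it is probably more comfortable to try this on one of the individual text files first.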

[1] I have yet to find out why the number of texts varies. The TCP website cites 2,231; the GitHub repository and the Zotero list have 2,188; 18thConnect lists 2,172; Visualizing English Print cites 2,473.