Improving ECCO part 2

Part of the excitement is the further option to create – and be credited as editor of – an entire text from your corrected OCR text. Gale’s release of the texts through 18thConnect, to be corrected with TypeWright, is meant to have those texts re-imported into Gale’s database. But it seems Gale is also offering the chance for those corrected texts to be published, either as digital editions (possibly via 18thConnect, or at least peer-reviewed by them) or as print texts via Gale.

Now this is the odd point – what does Gale get out of releasing its texts into the wilds of the open-access world? ECCO isn’t cheap and a number of universities have spent a considerable amount of money on it; even JISC’s one-stop interface for both EEBO and ECCO isn’t much cheaper. Gale’s income would presumably suffer. One might be tempted to think that both of these moves to wider access suggest Gale’s anxiety over the continuing authority of ECCO (with its old OCR software, its reliance on microfilmed texts and its small images) and over the sustainability of this kind of database publishing model. One need only look at databases such as London Lives, William Godwin’s Diaries or the Digital Miscellanies Index to see where digital resources are going. It looks as if Gale is trying to maintain ECCO’s relevance by opening it up to wider access, paradoxically undermining potential income. Perhaps they figure that the market for ECCO is saturated and that there is nothing more to lose: they would reap the kudos of keeping up with the general thrust of more recent digital resources towards open access (there’s probably a buzzier-sounding phrase than that, I’m sure). As for those texts that would be released for publication outside of ECCO, they might figure that this would amount to only selected areas or authors, and that the vast majority of texts on ECCO (non-canonical and found only through specialist searching) would be unaffected and so would continue to be the USP of ECCO.

Interesting times.

Improving ECCO

I’m quite excited by a recent development that aims to improve the OCR-generated texts of ECCO.

As my colleague Ian Gadd pointed out (in an as yet unpublished note), searching ECCO is a hit-and-miss affair, especially since the original OCR-generated text often failed to read the ‘long s’ correctly. The weakness of ECCO’s OCR text was also, inadvertently, revealed by JISC’s latest project, ‘Historic Books’. This is a single interface for both EEBO and ECCO (and soon some 19thC collections) and can give users direct access to the OCR text of ECCO – and it’s easy to see that this is often in poor shape. Sayre Greenfield has helpfully noted some shortcuts to help get around this in a piece for Early Modern Online Bibliographies.
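To make the ‘long s’ problem concrete, here is a minimal Python sketch of one way a searcher might cope with it by hand. It rests on an assumption of mine, not on anything documented by Gale or JISC: that the OCR tends to read a long s as an ‘f’, and that the long s appears anywhere in a word except as the final letter. The function name long_s_variants is purely illustrative, and this is not a description of Greenfield’s shortcuts.

```python
from itertools import product


def long_s_variants(word):
    """Return every spelling of `word` in which each non-final lowercase 's'
    is either kept or swapped for 'f' (assumed OCR misreading of long s)."""
    last = len(word) - 1
    options = []
    for i, ch in enumerate(word):
        if ch == "s" and i != last:
            # A non-final 's' may have been printed as a long s and read as 'f'.
            options.append(("s", "f"))
        else:
            # Every other character, including a final 's', is left alone.
            options.append((ch,))
    # Build all combinations and return them de-duplicated and sorted.
    return sorted({"".join(combo) for combo in product(*options)})


# Candidate search strings for a single word.
print(long_s_variants("possess"))
```

Searching for each variant in turn (or joining them with OR, where the interface allows it) should catch more hits than the modern spelling alone, though the number of variants doubles with every non-final ‘s’.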

More radically, Gale have teamed up with the good people of 18thConnect to offer users the chance to use TypeWright software to correct the original OCR text line by line. I’m a heavy user of ECCO and always try to get my students to use it as much as possible too. I have an un-researched hunch that searching is of increasing importance as readers – in parallel with or in some cases replacing ‘normal’ linear reading – navigate and research etexts with key terms, looking for images, tropes, themes, specific turns of phrase or peculiarities of language. So I’m quite excited by the possibility that searching on ECCO may eventually be much more reliable.