Improving ECCO

I’m quite excited by a recent development that aims to improve the OCR-generated texts of ECCO.

As my colleague Ian Gadd pointed out (in an as yet unpublished note), searching ECCO is a hit-and-miss affair, especially since the original OCR-generated text often failed to read the ‘long s’ correctly. The weakness of ECCO’s OCR text was also, inadvertently, revealed by JISC’s latest project, ‘Historic Books’. This is a single interface for both EEBO and ECCO (and soon some 19thC collections) that can give users direct access to the OCR text of ECCO, and it’s easy to see that this text is often in poor shape. Sayre Greenfield has helpfully noted some shortcuts for getting around this in a piece for Early Modern Online Bibliographies.

More radically, Gale have teamed up with the good people of 18thConnect to offer users the chance to use TypeWright software to correct the original OCR text line by line. I’m a heavy user of ECCO and always try to get my students to use it as much as possible too. I have an unresearched hunch that searching is of increasing importance: readers, in parallel with or in some cases instead of ‘normal’ linear reading, navigate and research etexts with key terms, looking for images, tropes, themes, specific turns of phrase or peculiarities of language. So I’m quite excited by the possibility that searching on ECCO may eventually be much more reliable.
