Part of the excitement is the further option to create – and be credited as editor of – an entire text from your corrected OCR text. Gale’s release of the texts through 18thConnect to be corrected by TypeWright aims to have those texts re-imported into Gale’s database. But it seems Gale is also offering the chance for those corrected texts to be published either as digital editions (possibly via 18thConnect, or at least peer-reviewed by them) or via Gale as a print text.
Now this is the odd point – what does Gale get out of releasing its texts into the wilds of the open-access world? ECCO isn’t cheap and a number of universities have spent a considerable amount of money on it; even JISC’s one-stop interface for both EEBO and ECCO isn’t much cheaper. Gale’s income would presumably suffer. One might be tempted to think that both of those moves to wider access suggest Gale’s anxiety over the continuing authority of ECCO (with its old OCR software, its reliance on microfilmed texts and small images) and the sustainability of this kind of database publishing model. One need only look at databases such as London Lives, William Godwin’s Diary or the Digital Miscellanies Index to see where digital resources are going. It looks as if Gale is trying to maintain ECCO’s relevance by opening it up to wider access, paradoxically undermining potential income. Perhaps they figure that the market for ECCO is saturated and that there is nothing more to lose: they would reap the kudos from keeping up with the general thrust of more recent digital resources towards open access (there’s probably a buzzier-sounding phrase than that, I’m sure). As for those texts that would be released for publication outside of ECCO, they might figure that this would amount to only selected areas or authors and that the vast majority of texts on ECCO (non-canonical and found only through specialist searching) would be unaffected and so would continue to be the USP of ECCO.
1 thought on “Improving ECCO part 2”
I think Gale’s position is an interesting one. I’m not privy to their thinking, certainly, but things I’ve heard suggest that Google Books caught them a bit off guard, and understandably so, since what Google did would have made no sense for any other company. To date, Google doesn’t offer ads against book searching. The explanation I’ve heard is that they feel that book search pays dividends for them in improving their other search services, so they’re willing to forgo direct income from book search. But that means that the profit/loss calculations for Google Books must be really weird, not the kind of thing most businesses could contemplate. (I’m not sure how Google’s relatively recent foray into e-book sales changes those calculations.)
If the struggle (for lack of a better word) between ECCO and Google were to hinge solely on who can algorithmically eke out better OCR, I think the advantage would doubtless be with Google—they simply have more resources to throw at the problem (financial, technical, and computational). By embracing an opportunity for getting human-corrected text, Gale doesn’t have to play that game. If the 18thConnect project can deliver Gale (practically) unimpeachable text for searching without costing them actual money (as opposed to notional lost subscription fees), then that might be worthwhile.
The other thing I suspect Gale is counting on, though (again, just a hunch), is the fetishization of the page image (even when it’s a scan of a mediocre black-and-white microfilm made 40 or more years ago). At some level, it may not matter if the text of the books seeps out of ECCO, so long as people desire the page images as well: go ahead and make the text open access; the page images will maintain a kind of aura of authenticity and authority, and only Gale will have both the page images and the high-quality searchable text that users provided them (for free). I could be badly wrong about this, but my hunch is that Gale wouldn’t feel too threatened by the availability of isolated digital editions (minus page images) so long as ECCO remained the largest source in aggregate of eighteenth-century texts. How 18thConnect might alter that landscape in its role as an aggregator of digital editions is, I think, a really interesting unknown.