Recently, when Google announced it was getting into the newspaper digitization business, many of us already digitizing newspapers took note. And who wouldn't? It's Google: they do a lot of great things, and they've got a lot of money to do more. They've tried their hand at books, so it's only natural that newspapers should follow. Their announcement wasn't unexpected.
(Google sample newspaper page)
Nevertheless, it gave us pause to consider the impact this might have on our own digitization efforts in three key areas:
- the long-term preservation of the digital data (image quality notwithstanding; see below)
- our livelihoods, for those of us funded by grants
- and, most importantly, title selection
I can't imagine Google would have financial worries about maintaining the enormous amount of data these newspapers generate. Even if they save their master files in a compressed format like JPG, JP2 or, God forbid, some lesser format, they're still faced with loads of material to save in perpetuity. Choosing the right format and thinking in forever terms are but two of the issues involved in digital preservation, all of which are beyond the scope of this post.
As to our livelihoods - between Google and the current economic crisis, it feels kind of silly to even talk about. Let's just be thankful to have jobs and leave it at that for now.
But title selection is a different animal altogether. If you're an NDNP awardee, as we are here at the University of Kentucky, then you're bound by the NEH rules. Of particular importance here is that we cannot digitize titles that have already been digitized by another party, whether a commercial vendor or an organization like Google that may make them freely available.
Some argue that there's plenty to go around, and that's a reasonable enough argument. There are millions, if not billions, of historic newspaper pages waiting to be digitized. So, yes, there's plenty to do in that respect. But what happens to "collections"? What happens to their preservation? And who is responsible for those two things?
Picture this: what would you think if you, as a researcher - professional or layperson - landed on a website with tons of newspaper pages, only to find that just a few newspaper titles were available? Would you feel cheated? Would you feel like you'd wasted your time because, now, you have to keep looking for what you need? Or would you feel satisfied?
Take Chronicling America or our own Kentuckiana Digital Library. How strange would it be to look at Kentucky's newspapers at the end of NDNP's 20-year cycle and find we have every historic Kentucky newspaper except, for instance, Louisa's Big Sandy News or the Kentucky Reporter? Wouldn't it seem odd for the University of Kentucky - the state's flagship university and Kentucky's sole NDNP content provider - to have everything except those two titles? Would you feel cheated? Would you feel like you'd wasted your time because, now, you have to keep looking for what you need? Or would you feel satisfied?
And what would we say, as an arbiter of the state's historic collections and digital preservation, to those newspapers that may have opted to have their titles digitized by Google or some other outfit instead of UK, when their stuff comes up missing, corrupt, distorted, or otherwise unusable? "Since you didn't let us preserve the material, it's just lost. Sorry about your luck, Mr. Publisher."
In fact, it's not the publisher who stands to lose, but all of us - Kentuckians, Americans, and global citizens alike. Newspapers are a shared history and should be free to everyone. Further, it seems childish to want anything but the best preservation standards applied to every single page, no matter what your role may be. After all, who are we making this stuff for if not our children, or our children's children? Is it simply to glorify ourselves, or is it really because this stuff matters?
I'd like to think it's the latter.