Justin du Coeur (jducoeur) wrote,

Short (well, mostly) Takes

Books: Today's interesting item on Project Gutenberg: Old Cookery Books and Ancient Cuisine (1902), by William Carew Hazlitt. Largely about rather than of period cookbooks: there is only one original source transcribed herein, and it's post-period (1730s). Still, an interesting-looking if scattershot discussion of the history of cooking and eating in England.

Politics: editrx pointed out that the contractors controlling soldiers' access to the Internet in Iraq have just clamped down on it, in perhaps the most suspicious piece of timing I've seen to date. The more I look at the government, the contractors and the way the war is going, the more it looks like a situation that is spinning utterly out of control, with the people in charge desperately trying to cover it up by any means necessary. Creepy as hell -- we're living through one of those periods that is going to have an entire chapter in the history books a century from now, and I have no idea how close to the end of that chapter we are...

Progress: Believe it or not, I am making progress on my half of Tabula Rasa III. I think I have enough character archetypes, plenty of major factions, and a good fraction of the plots. Time to start boiling it down: combining archetypes to make fully-rounded characters, drawing lines on the web and seeing what plots fall out of it. It's still a long ways off, but it's beginning to look like a game.

Facsimiles: In conversation with msmemory yesterday, it occurred to me that it's now terribly easy for me to produce PDFs from scanned images. Hmm. I think it's time for me to scan all of those period books that I have in public-domain facsimile (mostly off of microfilm), so that I don't depend on the paper copies. If I do this, I'll post as I go, and can probably give copies of the files to folks who want them.

More Scanning: Related to that: it's wonderful to have the right toys for the job. Twice this month, I've tried to OCR books using the software provided with my scanner, and neither attempt has gone all that well. First, I was scanning in Spavento's "Brags" for ladysprite (she's translating them, so that I Sebastiani can have better period source material to work from) -- the Italian text scanned in okay, but the formatting got utterly scragged. Then I tried to OCR The Description of Pembrokshire, a 19th-century transcription of a period source containing a long description of Knappan (essentially hurley) -- that was a complete disaster, with the results coming out as pure garbage.

Anyway, I finally knuckled under and bought a copy of ABBYY FineReader Pro 5.0. This is an old and obsolete (but decently cheap) version of what seems to be the state-of-the-art OCR software -- specifically, it's the program recommended by the Distributed Proofreaders project, and the version that they say is good enough.

I'm pretty impressed. The recognition right out of the box is a good deal better than the HP stuff. It has recognition built in for a *zillion* languages (and spelling checking for many of them, although sadly not Latin). Best of all, it has a built-in training mode, so that you can teach it how to cope with fancy or obsolete orthography. It's still not perfect -- the combination of the faded text of Pembrokshire and the period spelling results in it usually seeing "e" as "c" (since the crossbars have mostly faded, and it can't use the dictionary to figure that out). But I've taught it to deal with the tall period "s", which is cool in and of itself, and I've got the accuracy of the period text up to around 95%: pretty impressive, given that it's basically flying without a dictionary. (The accuracy in the modern-English footnotes is far higher: almost perfect, despite the small and faded print.) And the "spelling check" mode is really quite a nice side-by-side editor for fixing the transcription errors.
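
For anyone who wants to put a number on "around 95%": here is a rough sketch of one common way to score OCR output, assuming you have a hand-corrected copy of a page to compare against. It counts the character edits (insertions, deletions, substitutions) needed to turn the OCR text into the corrected text, and divides by the length of the corrected text. This is not how FineReader itself reports anything, and my own figure is an eyeball estimate rather than a measurement; the function names and the sample strings are made up for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character edits to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def char_accuracy(ocr_text: str, reference: str) -> float:
    """Fraction of the reference reproduced correctly, clamped to [0, 1]."""
    if not reference:
        return 1.0 if not ocr_text else 0.0
    errors = edit_distance(ocr_text, reference)
    return max(0.0, 1.0 - errors / len(reference))


if __name__ == "__main__":
    # Hypothetical sample showing the faded-crossbar problem: "e" read as "c".
    reference = "the people of these partes"
    ocr_text = "thc pcoplc of thcsc partcs"
    print(f"{char_accuracy(ocr_text, reference):.1%}")
```

Scoring by characters rather than by words is the kinder measure: a single misread "e" spoils a whole word, so word-level accuracy on the same page would come out noticeably lower.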

I'm pleased enough that I might yet upgrade to a current version of the software -- while my copy of 5.0 is "new" and legal, I'm fairly sure that it's been sitting in a retailer storeroom for several years, and my shareware instincts encourage me to actually give some money to the creators of this excellent piece of software. Need to figure out how much more an upgrade would cost...