Justin du Coeur (jducoeur) wrote,

Please ignore the sound of that exploding head...

So as I've mentioned before, my current programming project is the OP Compiler: taking the existing Order of Precedence and taming it with code, so that it can get fed into a nice, neat, vastly easier-to-maintain database going forward. I figured it would be a meaty but reasonably straightforward project -- after all, Caitlin was inhumanly good with data, and so the old HTML files should be at least *reasonably* consistent, right?

I am beginning to realize that my assumptions were incorrect. Caitlin *was* fabulous with data, and the flat files are perhaps more consistent than any other person could possibly have managed. But even she was human, and she was dealing with data from a zillion sources, with nothing automated checking the details.

So now I'm up to the point where I am successfully "compiling" a fair number of chronological court-report files (around the past eight years' worth), and all of "A" in the alphas, and I'm finding just how impossible the job had been. Everything *looks* great, and I don't think one person in 100 would catch more than a tiny number of errors. But besides the structural irregularities that I've been pulling my hair out over (mind, those huge files are completely hand-edited HTML, and the format isn't even remotely as consistent as it looks on the screen), it turns out that there are tons of *tiny* data bugs.

Let's just take the King, for example. Now that I'm actually able to print out what the compiler thinks is going on, I find that "Kenric Burn of Northampton" has a Valiant Tyger; "Kenric of Warwick" was a Rattan Champion, has a King's Cypher, and was named Crown Prince; and "Kenrick of Warwick" was Queen's Champion a couple of times, and got the Shield of Chivalry a couple of times. And when I get to parsing the K's, I'm going to have to rewrite his entry so that it cross-references all of these properly.
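For the curious, here is roughly the kind of check that surfaces this sort of thing. To be clear, this is a minimal illustrative sketch and not the actual OP Compiler code: `AwardEntry`, `candidateKey`, and `suspects` are all made-up names, and the matching rule (lowercased given name, with "ck" folded to "c") is a deliberately crude assumption. It would over-match badly on a real Order of Precedence, so the most it can do is flag candidates for a human to look at.

```scala
// Hypothetical record type: one award line as a compiler might see it.
final case class AwardEntry(name: String, award: String)

object NameVariants {
  // Crude grouping key (an assumption for illustration only):
  // the first word of the name, lowercased, with "ck" folded to "c".
  def candidateKey(name: String): String =
    name.trim.split("\\s+").head.toLowerCase.replace("ck", "c")

  // Group entries by key and keep only the keys that show up
  // under more than one distinct spelling of the full name.
  def suspects(entries: Seq[AwardEntry]): Map[String, Set[String]] =
    entries
      .groupBy(e => candidateKey(e.name))
      .map { case (key, es) => key -> es.map(_.name).toSet }
      .filter { case (_, names) => names.size > 1 }

  def main(args: Array[String]): Unit = {
    // Sample data taken from the three spellings mentioned above.
    val entries = Seq(
      AwardEntry("Kenric Burn of Northampton", "Valiant Tyger"),
      AwardEntry("Kenric of Warwick", "Rattan Champion"),
      AwardEntry("Kenric of Warwick", "King's Cypher"),
      AwardEntry("Kenrick of Warwick", "Queen's Champion"),
      AwardEntry("Kenrick of Warwick", "Shield of Chivalry")
    )
    // Print each suspicious key with the spellings found under it.
    suspects(entries).foreach { case (key, names) =>
      println(s"$key: ${names.mkString("; ")}")
    }
  }
}
```

With the sample data, all three spellings land under the same key, which is the cue to go merge them by hand. Deciding whether they really *are* the same person is still a job for someone who knows the people involved.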

Mind, none of this is to fault Caitlin -- by and large, she was typing in what she was given by the heralds, and I'm 99% certain that she screened out 90% of the errors that were handed to her. But it is all demonstrating that the job of Shepherd's Crook really *is* impossible to do by hand, and it's miraculous that she managed to make it work as well as she did for as long as she did.

Anyway, the end result of all of this is going to be quite a substantial chunk of code. I am likely to open-source it, more as a way for me to kick the tires of GitHub than because I expect it to ever be used a second time (yes, I'm spending a solid two months writing a program that will, in the end, be run exactly once). But if anyone wants to see a medium-sized body of decently structured and not *excessively* cryptic Scala code, just pipe up and I'll be happy to point you at it and discuss what's going on in it...
Tags: jane, programming, sca