Log in

No account? Create an account
Previous Entry Share Next Entry
Extensible enumerations
(This one's just for the programmers, but it's not as opaque a topic as usual.)

So in the course of thinking about my last post (about Querki), I happened to look at my desired feature list. And I was reminded of one of those bugbears of most programming languages: enumerations.

Here's the thing: what I want, much of the time, is the ability for subclasses to *extend* enumerations. The most common use case is return codes. I might have a base class that implements the concept of return codes, and includes an enumeration of the basic codes for success, warning, failure -- generic stuff. As I add subclasses with more precise functionality, I want those subclasses to be able to add more-precise return codes, that describe the results in more detail.

But I can't do that, at least not safely, in any language that I'm thinking of offhand. The thing is, I wind up with two possibilities. Either the return code is something like a raw integer under the hood, so it's totally unsafe and not really type-checked -- this allows for risky code. Or the enumeration *is* strongly typed -- in which case, I can't add values. Why not? Because adding values is effectively *extending* the type, rather than *specializing* it, and that's a no-no in any decent OO system. It implies that a subclass could add values that don't exist in the parent, so code written for the parent doesn't necessarily work with the subclass -- a violation of the basic principles of OO.

When I think about it, though, it occurs to me that it's easy to fix this: just redefine my terms. What I really want is the idea of *hierarchical* enumerations. That is, the base class defines "success" and "failure", but no details. But this is precisely what OO is good at: what I want is to think of those success and failure values as *classes*, which I can then subclass. So I want to be able to define a "subclass" of the enumeration that adds specializations to the existing values. So instead of tacking "Null pointer" or "Not authorized" to the end of my enumeration, I should be thinking of them as specialized values of "Failure". That follows good OO principles, and seems like it would work well.

Does anyone know any languages that formally do this? I *can* do it in Scala decently well using case classes, but that's not quite optimal: I'd prefer a syntactic construct that is more explicit about this notion of a hierarchical, extensible enumeration. But I don't think I've ever come across one.

Anyway, I need to chew on this further. It's directly relevant to Querki, because I want to add a formal concept of enumerations there, and it needs to be exceptionally rich in order to cope with the use cases. (Since Querki is going to be used for world-building, I need a lot of schema flexibility.) I may well apply this concept there...

This was originally posted via DreamWidth, at http://jducoeur.dreamwidth.org/975.html. Feel free to comment either here or there.

  • 1
(Deleted comment)
That's what I'm thinking of doing if I add this to Adder. I'd have a syntax like:

(enum Fred Success Failure)

(enum (Barney Fred) (Created Success) (Fetched Success) (DoesNotExist Failure) (NotPermitted Failure))

...which would create classes named Fred, Barney, Success, Failure, Created, Fetched, DoesNotExist, and NotPermitted.

The interesting bit is that the enum values can actually be the classes, rather than instances. That comports with the intuition that enums should be cheap, and should be comparable for identity; and it allows a simple isinstance check to determine whether a value is a Success or a Failure.

(Deleted comment)
Thanks. It's obviously only for languages where classes are first-class values, like Python and Java; I wouldn't try it in C++.

Of course, in C++, there wouldn't be the need, because objects aren't always heap-allocated, it's easy to define operator==(), and it'd be possible to write something at least as fast as isinstance (the Python reason for using isinstance is that it should be faster, since it's implemented in C).

Good observation. That distinction hadn't occurred to me, mostly because I was thinking about it in the context of Querki -- which is prototype-based OO, so there's little distinction between class and instance.

(I do sometimes worry about that decision, but prototype-OO tends to be optimal for representing extremely complex semantics, so it suits worldbuilding really, really well.)

Oh, yeah, prototypes are good for worldbuilding. If you have an operator to test whether object A descends from object B, you can use it the same way I was thinking of using classes.

In fact, the class approach takes advantage of the fact that there's a prototype mechanism lying around; it's just that the objects are called classes.

Well, I'm just using return codes as an example, so I don't want to get *too* mired in those details. But there are enough domains in which you don't want to rely on Exceptions for it to be an interesting example. Among other things, while Exceptions tend to be available, they are *ferociously* expensive in many implementations -- enough so that you really want to reserve them for genuinely exceptional situations, not any routine ones.

But yes: that sort of sub-classing is exactly the case class model I was describing above. (Scala does this much better than Java does.) My issue is simply that the hierarchy is rather indirect and opaque -- it lacks the concise clarity of an enumeration. So what I'm pondering here is the idea of a middle ground: something that has roughly the semantics of a class tree, but is deliberately simpler. Scala case classes *may* actually be the best you can get, but I'm not assuming that...

I've added it to Adder; here's a sample, which defines an enum for HTTP status codes...which suggests that it might be a good idea for enums to have associated values, in this case the integer status codes.

which suggests that it might be a good idea for enums to have associated values, in this case the integer status codes.


I like this. This sort of thing is really why I created Adder: to be able to use macros to add completely new language features. This took fewer than 100 lines of code, and all of it in the prelude, not the core compiler.

Yeah, when you follow that logic through, I think you wind up with a desire for parameterized enumerations, which have more or less precisely the same semantics as Scala's case classes. I still want better syntax (and I think the Adder implementation looks closer to what I was thinking than anything I've seen heretofore), but I'm amused to find that Scala has all the semantics I was looking for already...

  • 1