Typeclasses
jducoeur
[For programmers only]

Yesterday was another of those programming epiphanies, in which I finally not only understand what a functional-programming concept means, but (more importantly) grok *why* it works the way it does. As I did with Monads a few months ago, let's take a stab at explaining the concept, while it's still fresh in my brain. Today's concept is "Typeclasses", another Haskellism that has made its way into Scala. (The functional programmers with more experience than I are encouraged to correct any misconceptions below.)

That said, my first try at writing this started getting horribly long. So let's see if I can keep this to a sane length:

The name seems a bit weird to those of us from an OO background, but it's literally correct: a Type is (or more accurately, can have) an "instance" of a Typeclass in much the same way that an Object is an instance of a Class. That is, a Typeclass describes a somewhat abstract bit of functionality, and you can define an instance that maps from a Type to that functionality.

Why is it useful? The usual reason: once you've got the idea of this Typeclass, you can then write higher-order functions that use it, and thereby reduce duplication and boilerplate.

A simple example in pseudo-code: say I have an "Addable" Typeclass:
typeclass Addable[T] {
  def +(a:T, b:T):T
}
That is, an Addable is a Type for which there exists a binary function "+", which adds two values of this Type together, and returns another value of this Type. Using that, I can define sum:
def sum(elems:List[Addable]) = {
  if (elems.length == 0)
    0
  else
    elems.head + sum(elems.tail)
}
At this point, the sharp-eyed OO programmers are saying, "So, it's exactly the same as an interface, right?". Not quite. The thing is, a Typeclass instance is, like I said, a mapping from the Type to the Typeclass -- and that implies that you can have *multiple* such mappings.

The classic illustration seems to be the Monoid Typeclass, which is basically the abstraction of our Addable above. It is a Typeclass that defines an "append" operation, which takes two values, combines them, and produces a new value of the same Type. (With the added constraints that the operation must be associative, and that you must have an "identity" or "zero" value.) So Monoid looks roughly like this:
typeclass Monoid[T] {
  def append(a:T, b:T):T
  def identity:T
}
The equivalent of our sum() function above is "mconcat" in Haskell-speak:
def concat(elems:List[Monoid]) = {
  if (elems.length == 0)
    identity
  else
    elems.head append concat(elems.tail)
}
Obviously, we can define a SumMonoid -- the Typeclass instance that represents the concept of "summing" -- by saying that append is (a+b), and identity is 0; that becomes exactly Addable, above. But you can also define a ProductMonoid -- representing the concept of "multiplying" -- by saying that append is (a*b), and identity is 1. There are at least two distinct and *separate* Monoids for Int, because "Monoid" isn't a Type -- it's a Type *plus* some operations.

(The same is true for Boolean and Monoids: the concept of the "conjunction" of a List of Booleans is the Monoid of "and" with the identity of true, and "disjunction" is the Monoid of "or" with the identity of false.)
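To make this concrete, here's a rough sketch of those instances in real Scala (worksheet-style, with my own names; I'm passing the instance in explicitly here, rather than using the implicits discussed in the comments below):
trait Monoid[T] {
  def append(a: T, b: T): T
  def identity: T
}
// Two distinct and *separate* Monoids for Int...
val sumMonoid = new Monoid[Int] {
  def append(a: Int, b: Int) = a + b
  def identity = 0
}
val productMonoid = new Monoid[Int] {
  def append(a: Int, b: Int) = a * b
  def identity = 1
}
// ...and two for Boolean:
val andMonoid = new Monoid[Boolean] {
  def append(a: Boolean, b: Boolean) = a && b
  def identity = true
}
val orMonoid = new Monoid[Boolean] {
  def append(a: Boolean, b: Boolean) = a || b
  def identity = false
}
// concat folds with whichever instance you hand it:
def concat[T](elems: List[T], m: Monoid[T]): T =
  elems.foldLeft(m.identity)(m.append)
concat(List(1, 2, 3, 4), sumMonoid)         // 10
concat(List(1, 2, 3, 4), productMonoid)     // 24
concat(List(true, true, false), andMonoid)  // false
concat(List(true, true, false), orMonoid)   // true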

In other words, Monoid is (roughly) the abstract concept of "an operation that you can perform on a couple of Foos, and get another Foo". That's a Typeclass.

How much does this matter in real-world programming? I'm honestly not certain yet, but I'm slowly starting to appreciate that it helps you separate concerns, even without the multiple-definition thing. It's very un-OO, but thinking of a Typeclass instance as being something separate from the Type itself allows you to add functionality without endlessly complicating your inheritance trees -- it's not *part* of your class, it's a kind of thing that you can do *with* your class. (Of course, carried to its extreme, this logic gets very anti-OO; I suspect it helps explain the visceral functional-vs-OO wars I sometimes see.)

And of course, I wind up asking: is any of this going to get into Querki? I suspect so eventually, although not soon. I am likely to add the concept of Typeclasses internally at first, for some of the numeric operations (eg, I might actually implement sum() in terms of Monoids, as described above, to prove out the concept). Eventually it'll likely become user-visible, although only around the edges for high-end power-user use.

Questions? The above is largely me getting the ideas straight in my head, so discussion is welcome...

I work with SQL on a regular basis, which brings up the following question: Are the elements nullable? How would that affect your "Haskell-speak" declarations for the operations? I assume it would just be an expansion on the if-then block structure, but you would first need a good concept of your rules for how each of the instances would handle that case. (Seems like an obvious statement, but misunderstanding of handling of nullable values in expressions in SQL has often caused us grief.)



Are the elements nullable?

There's no general answer. Nullability is basically a quality of a Type, and modern programming languages are beginning to be more explicit about that.

For example, "null" is basically a dirty word in Scala -- it's built on the JVM, so it's impossible to completely avoid null pointers, but as a rule of thumb the word should never show up in well-formed, self-contained Scala. Instead, if you want your variable x of type Foo to be nullable, you declare it as Option[Foo]. This means that x can either have the value Some(someFoo), or None. (Which is conceptually a strongly-typed version of null.) In other words, your Type isn't Foo in this case -- it's Option[Foo], a higher-level concept.
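A minimal illustration, using Int for concreteness:
val present: Option[Int] = Some(3)  // there is a value here
val absent: Option[Int] = None      // no value -- but still strongly typed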

The details vary from language to language. I don't think Haskell even *has* a concept of raw "null" (although it has its own version of Option, called Maybe). Even C# now has a concept along the lines of Option, IIRC, although it's been a few years since I last played with that. Even SQL is usually explicit about whether nullability is allowed or not, although too many amateur DBAs don't really think it through. Basically, everyone has begun to realize that you need to take nullability seriously, and should be explicit about whether it is allowed or not.

(Querki is *exceptionally* strict about this: you have to explicitly declare "how many" values you can have in a given Property: Exactly One, Optional, List or Set. The concept of "null" doesn't exist -- there is only the concept of "empty collection". If it is declared as Exactly One, it can never be empty, by definition.)

So basically, the elements are nullable if and only if you declare the Type to have nullable elements. The implication is that, if you wanted to be able to have "nulls" in the list, your function signature would be def concat(elems:List[Option[Monoid]]), and yes -- you'd need to handle that inside your blocks.

(If this was Scala, the compiler would complain at you if you didn't do so: the reason to use Option instead of raw nulls is that it allows the compiler to detect such cases, and give you warnings if you're not handling None.)

In practice, though, List[Option[X]] almost never happens -- it just isn't something you want except in rare circumstances. Instead, if your inputs are Option[X], you would usually do a filter(_.isDefined) or something like that first, to strip out the empty values...
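For instance (isDefined and flatten are both real methods here):
val elems: List[Option[Int]] = List(Some(1), None, Some(3))
elems.filter(_.isDefined)  // List(Some(1), Some(3))
elems.flatten              // List(1, 3) -- drops the Nones and unwraps the rest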

It is, perhaps, a question of semantics and the same word being used for multiple concepts.

Yes, SQL is usually explicit about whether a value is required or not (NULL is effectively no value, rather than a null pointer as with the JVM), but not in all cases. If you don't specify, the default can vary, even within the same database engine, depending on whether or not ANSI SQL is specified. For example, in MS SQL Server's extensions to SQL, declared variables are always nullable, meaning values are always optional, while columns in table declarations may default to either required (not nullable) or optional (nullable), depending on the ANSI SQL setting.

The usual interpretation of any expression that involves a NULL (no value) element is also NULL. If you're working with a Boolean interpretation of that result, the interpretation is NOT TRUE (effectively FALSE, but NOT TRUE is often used to express that the inverse of the expression is also NOT TRUE). The results of using null values in expressions are probably the source of most of the problems I have to troubleshoot, especially as the database engines I work with vary in how they handle NULL values in expressions.

I also have to deal with the different interpretations of zero-length strings (both as literals and variables) in different database engines. In Oracle 12c, for instance, a zero-length string is interpreted as a NULL value (as it formerly was for Sybase), whereas in Informix Dynamic Server 11.70 it is interpreted as a string that happens to have zero length (as it is for MS SQL Server and PostgreSQL). Testing that string, appending it to another, storing it in a required column, and so on, all behave differently because of that.

So I'm kind of sensitized to the question, even when it may not apply. ;-)

To some degree it's a DB-vs-compiler thing, but there are some crucial details here. The most important one is how you *use* these values. When I say that, in Scala, you use None instead of null, that isn't just a semantic difference -- the *access* patterns are wildly different.

In a traditional nullable variable, you simply say
var foo;
and that's initially a null value. Similarly, in a DB, you declare a column nullable, and you don't put anything into it, so it's null.

But the important thing is, how do you *use* those values? In the traditional language, there is nothing preventing me from saying
result = foo + 1;
And *kaboom*, I get a null pointer exception. In your DB case, the null "infects" the expressions that it is being used in. In both cases, the problem is that the nullability is unbounded, and there isn't anything preventing the coder from failing to take the nullity into account.

It's very different with Option. As I mentioned before, if I define my variable as:
var maybeFoo:Option[Foo]
That means that I have two options: Some(fooVal) or None. I do *not* simply have fooVal, though, and I can't use it that way! That is, saying:
result = maybeFoo + 1
will fail to compile, because maybeFoo isn't a Foo, it's Option[Foo], which doesn't know anything about this "+" operation.

Instead, I have to say (in Scala, anyway):
result = maybeFoo match {
  case Some(actualFoo) => actualFoo + 1
  case None => myDefaultValue
}
And again, if you fail to provide a "case None", the compiler will complain at you that you aren't taking all the possibilities into account.

There are ways around this (there exists no language in which it's actually hard to shoot yourself in the foot), but this is the proper idiomatic-Scala way to handle the equivalent of nulls. And the result is that None is *always* in your face: the compiler tells you if you're not dealing with it, and it's usually easier to do so properly than to work around it.
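(For completeness: the same logic can also be written more compactly with Option's standard combinators, equivalent to the match above:
result = maybeFoo.map(_ + 1).getOrElse(myDefaultValue)
map transforms the value if there is one, and getOrElse supplies the None case.)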

So that's the heart of the difference. "null" is essentially an empty *version* of an existing Type, and you are allowed to use it anywhere you use that Type; decades of experience have taught that that's a great way to get into trouble. Option[Foo] is *not* a version of Foo -- it's a completely different and unrelated Type, a container that wraps *around* a value of Foo that may or may not exist. That allows the compiler to be much, much stricter, and forces you to think more precisely about what you're doing.

Not that Querki actually does this, mind: the usage of Querki's Optional is, in practice, closer to that of SQL. (For many of the same reasons SQL works that way: it's a sensible semantic for DB access.) But this conversation is useful food for thought, and I'm going to have to ponder ways to make it easier to understand the sorts of problems you're describing...

Caveats: I'm way behind you in understanding Scala; I'm only partway through Odersky's MOOC on Scala, and he hasn't covered typeclasses yet: maybe he won't.

So, these are statements, but also perhaps questions.



I'm not sure that I see typeclasses quite as you do.

I see typeclasses as a way for a type to nominate itself for polymorphism of a type T while accepting certain default behaviors or code for that type T. In that sense, it is a step toward making types just as first-class as classes and objects.

Classic OO polymorphism means that you must be a subclass of a common ancestor to be polymorphic. The addition of multiple inheritance (and later, its replacement by interfaces) added the ability of classes to be multiply polymorphic.

Scala traits add the ability to be multiply polymorphic in your class and (new!) inherit code behaviors. (And override them only on purpose).
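For instance, a minimal sketch of a trait carrying default code behavior:
trait Logger {
  def prefix: String
  def log(msg: String): Unit = println(s"[$prefix] " + msg)  // inherited behavior, overridable on purpose
}
class Service extends Logger {
  val prefix = "service"  // picks up log() for free
}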

Types fail to have that multiple polymorphism. But typeclasses move some of that multiple class polymorphism into the type structure. (And add default behaviors, 'cause Scala that's why).

Does that make sense?

It does, and it's not an unreasonable way to describe the Scala viewpoint on typeclasses. (Indeed, I think it represents the way Scala *implements* typeclasses fairly well.)

I'm not sure that it's quite how Haskell and other hardcore functional folks would describe the problem, but again, I'm fairly new to this viewpoint. I've never worried about pure-functional programming much, but as I gradually begin to realize that QL *is* a pure functional programming language (more or less by accident, but I think I'm likely to keep it that way), it behooves me to better understand the implications.

I *suspect* that Martin covers typeclasses in his followup course (which IIRC is specifically on functional programming), but I haven't done either one yet, so I'm not sure...

The follow-up class, I'm told, is largely Akka. This class is about functional programming, using his language as the class language. He makes free with references to others (at times).

I'm finding the artificiality of a class schedule to be less than helpful: my real life doesn't always give me the same amount of free time each week. So, the class is hard to schedule.

The class is challenging on a mental level: not in the sense of "I can't do this", but literally: functional programming is striking at many of the reflexive assumptions that constitute my daily tools and career. It's fun, and it's tickling bits of my brain that have been quiescent for a long time.

I doubt I'll have a chance to, professionally, take advantage of Scala, but I'd like to.

So far, I've managed a perfect score on all the programming assignments, which is nice. :-)

The follow-up class, I'm told, is largely Akka.

Ah, right -- that's "Principles of Reactive Programming". Makes sense that that would be Akka-heavy, although I wonder how much that one is led by Martin. While Scala is his baby (and nowadays he is focused on Dotty, the experimental language that will likely grow up to be Scala 3), he's always been quite arm's-length with Akka. That project was created by Jonas Boner, and is nowadays run by Roland Kuhn, so I suspect Roland is the driving force behind the second course.

I'm finding the artificiality of a class schedule to be less than helpful: my real life doesn't always give me the same amount of free time each week. So, the class is hard to schedule.

Yeah, that's part of why I haven't tried to take them. OTOH, I have the advantage that I've been following both Scala and Akka since 2007, so I am mostly just filling in the gaps these days.

(My problem isn't Scala, it's the Scalaz functional-programming library, which is still a gigantic and ill-documented ball of Dark Art. Far as I can tell, the answer to every Scalaz question is, "well, that works exactly the same way it does in Haskell, of course".)

The class is challenging on a mental level: not in the sense of "I can't do this", but literally: functional programming is striking at many of the reflexive assumptions that constitute my daily tools and career. It's fun, and it's tickling bits of my brain that have been quiescent for a long time.

Yep -- it's a paradigm shift, at least as great a mental leap as from simple imperative to OO was. By this point I'm fairly good at it, but I'm still figuring out the details, after using this paradigm increasingly heavily for half a dozen years.

I doubt I'll have a chance to, professionally, take advantage of Scala, but I'd like to.

In the near term, probably not, but you've still got a decent amount of career left, and I am still of the opinion that Scala will entrench itself as one of the major languages. Wouldn't surprise me if you find yourself doing something with it down the road.

And regardless, it's worth learning the concepts, since they are gradually spreading everywhere. C# has had enough of the machinery to do basic functional programming for a good ten years now (indeed, I taught a couple of classes on "Intro to Functional Programming in C#" at Memento), and even Java is now beginning to catch up to the rest of the world with the enhancements in Java 8. So even if you don't wind up programming in Scala per se, I think it's likely that you'll find some of the techniques helpful -- I believe the concept of "functional" will wind up as ubiquitous (if as poorly-defined and variable) as "object-oriented" has become.

Do please poke at me if you have questions, or need pointers to stuff. I can't answer everything, but I know the lay of the land pretty well...

Thank you for the offer: it will not be forgotten. :-)

The Akka class, according to my co-worker who has taken it, is taught by a number of people and is a bit less well organized. I don't know.

Functional techniques will be everywhere, soon enough, if by soon enough you mean "a decade" or so. F# is being adopted, Ruby has functional features (I am told). I doubt Haskell will be widely adopted, but one never knows. I don't care that JavaScript is technically qualified as a functional language, but it is I guess...

As a quality assurance engineer, sometimes I code, but as often as not I don't. My particular value-add is the ability to think just like a programmer, and to be conversant enough in programming that I can find a bug "in the code", or code review as well as test. People don't tend to expect that skill set in someone with "quality" in their title, but they are happy to exploit it when they get it. :-)

Plus, keeping my brain amused, challenged and expanding is a critical part of my "this is how I have fun in life" definition. :-) Same thing, again and again, every day, is my definition of Hell.

Functional techniques will be everywhere, soon enough, if by soon enough you mean "a decade" or so.

Yep. The rollout seems to be much like OO's was: gradual and *extremely* uneven, but fairly steady. And like OO, I expect that there will always be some holdouts against it, but the definition of "mainstream" will eventually change.

Ruby has functional features (I am told)

The basic ones, certainly -- indeed, Ruby was really my introduction to functional techniques. Even as of a dozen years ago, Ruby considered functional blocks (more or less closures, IIRC) to be *the* main way you did many things, and treated them fairly casually.

I don't care that JavaScript is technically qualified as a functional language, but it is I guess

Like Ruby, it mostly has the basics of functional programming, but again, they live *deep* in the core of the language, and are actually quite crucial. Most JS OO libraries actually use the functional-programming techniques to make it into a vaguely adequate OO environment. (That is to say, most implementations of JS "classes" involve enormous, multi-level closures.) JS really only took off as a semi-serious language when Doug Crockford taught everyone the crazy things you could do with its FP capabilities.

If you do ever wind up stuck using JS for something, I recommend at least picking up the Underscore library, which adds a bunch of the table-stakes functional methods that you need for FP patterns. It doesn't make JS beautiful, but it at least reduces the suck.

Plus, keeping my brain amused, challenged and expanding is a critical part of my "this is how I have fun in life" definition.

Oh, heck yes. And I'm very explicit that I consider this sort of technical self-education to be part of the job for any kind of engineer. Most of it won't be useful today, but much of it will prove useful at *some* point, and you rarely know when. So it's well worthwhile to stay well-rounded.

(I am half-tempted to write an hour a day of random, undirected technical self-education into the job description for Querki's engineers, if and when I get to the point of hiring. It's the sort of thing that pays off in the long run, IMO...)

Around here, if you are "between projects" you are expected to use that time to self-educate. Most people around here do so all the time - but we are a consulting organization, so it's "after you get your billing done".

Part of what I like about the place is that most people take the challenge seriously.

I've lined up my next "amusement" - once the Scala class is complete, a co-worker who is very up on mobile suggested a particular Android programming book. I bought it, and it is "stacked and waiting".

Interestingly, one company I recently encountered offered a different "random time" thing. Every employee is REQUIRED to put in 3 hours a week of some kind of community service, paid by the employer.

I thought that was very exciting, actually.

Hmm -- yeah. Many places pay some lip service to the community, but that's putting their money where their mouth is, much more sincerely than the usual "everyone can have 8 hours a quarter"...

(Replying rather belatedly)

I think I kind of see the conceptual distinction you're making, but in terms of the functionality provided it still seems like you get about the same thing with interfaces using generics and inheritance. E.g.
public interface Monoid<T> {
  T append(T a, T b);
  T identity();
}
public interface Addable<T> extends Monoid<T> {}

Or for something that uses our typeclass/interface directly (though admittedly I haven't thought much about possible categories of typeclass yet, so I don't know if this is conceptually that useful):
public interface Composable<T extends Composable<T,R>,R> {
  T compose(T a, T b);
  R apply(T a, R b);
}
public interface Filter<R> extends Composable<Filter<R>, R> {}

And this gives us a huge amount of flexibility in terms of chaining such definitions - languages with interfaces generally allow them to multiple-inherit as well, so Filter could be Composable and Applicable if we wanted those to be separate typeclasses.

Now, admittedly, Filter is not considered to be an instance of the Composable typeclass, but rather a child typeclass... but since that allows us to have an arbitrary number of levels of ancestry, I'm not sure that's a bad thing.

Am I missing something? - the concept is brand new to me.

Mind, I'm still a newbie to many of these concepts myself, but a key difference seems to be between an is-a relationship and a can-be-used-as-a one. At least as used in Scala, you typically wouldn't declare your object as an instance of Addable; instead, you would have an implicit converter that lets you use your existing class *as* an Addable.

The advantage here is avoiding a combinatoric explosion of base interfaces, which can happen easily in many circumstances. By putting the interface "outside" the classes that it is working with, you can sometimes get a better separation of concerns: it's a way of adding functionality to classes without requiring the direct complicity of those classes.

Mind, I think your stated interfaces look fine -- the difference as I understand typical Scala idiom is how that interface gets applied to a given class. When folks talk about typeclasses in Scala, I gather that they generally mean the combination of a typeclass interface as you describe it above, and a set of typeclass converters -- implicit functions that describe how to view the class as the desired typeclass. And then the higher-order function takes a generic type parameter of "any class that can be viewed as an Addable".
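A rough sketch of one common encoding of that (my names throughout; Money stands in for a class you don't control, and I'm using an implicit evidence parameter rather than a literal conversion):
final class Money(val cents: Long)  // imagine this comes from a third-party library
trait Addable[T] {
  def plus(a: T, b: T): T
}
object Addable {
  // The typeclass instance lives *outside* Money -- no inheritance required:
  implicit val moneyAddable: Addable[Money] = new Addable[Money] {
    def plus(a: Money, b: Money) = new Money(a.cents + b.cents)
  }
}
def sumAll[T](elems: List[T])(implicit ev: Addable[T]): T =
  elems.reduceLeft(ev.plus)  // assumes a non-empty list, since Addable has no identity
sumAll(List(new Money(100), new Money(250)))  // the compiler finds moneyAddable for us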

Of course, Scala has all the traditional inheritance mechanisms and then some -- Scala's traits are quite a bit more powerful than Java interfaces -- and Querki uses them almost exclusively so far. But I have a few places where that interface-explosion problem is starting to bite me on the ass, and I'm pondering how I might be able to use typeclasses to address that...

Ah, yes, having it be external to the type does change things. Of course, the level of abstraction here also means that the inference path to do the implicit conversion can get very hairy indeed.

While I have worked in languages with implicit conversion on occasion, I've usually found it one of the finickiest features to use and maintain. Of course, most of the same effect you're trying to achieve here should work just as well with explicit conversion, which is less migraine-prone.

Yaas. Implicit conversion is both the blessing and bane of Scala: it's incredibly powerful when used judiciously, and produces some truly beautiful and easy-to-use libraries, but if used unwisely the results can be truly hellish. Some folks love implicits, some hate them; I think most of us treat them as mighty beasts that are never quite tame.

Personally, I've used it in a couple of places, and it's been lovely for producing really expressive code. But my rule is to be careful with it.

(That said, I use implicit *parameters* quite frequently. They're less dangerous, and more frequently an easy boilerplate-reduction strategy.)
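For instance, an invented minimal sketch of the implicit-parameter pattern:
case class RequestContext(userId: String)
object Demo {
  // Callers don't have to thread ctx through every call by hand:
  def loadThing(id: Int)(implicit ctx: RequestContext): String =
    s"thing $id for user ${ctx.userId}"
  implicit val ctx: RequestContext = RequestContext("joe")
  val thing = loadThing(42)  // the compiler supplies ctx automatically
}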
