?

Log in

No account? Create an account
Previous Entry Share Next Entry
My first experience with Monadism
querki
jducoeur
[Yes, I know, it sounds sordid. I couldn't resist, because even programming virginities are intimidating the first time. The truth is, this is a deeply geeky programming post, for the serious engineers. This is a fairly long burble -- I encourage the curious to ask questions. The experienced functional programmers will mostly just nod sagely, recognizing a newbie getting past a hurdle, but those who are still trying to get a handle on Monads may be interested in how I got through it.

NB: this is getting into fairly advanced Scala -- levels A3/L2 in the standard rankings. Note that you don't need this sort of stuff to *use* Scala most of the time, just if you want to write really powerful libraries and frameworks. I'm painfully aware that posts like this tend to lead to the common "But Scala is *hard*" whinging online, and it isn't actually true. Scala is *powerful*, and using that power to its fullest *is* a bit hard. But if you just want to write routine application code, it's as easy or easier than Java. This post is basically an example of how an Architect *makes* it that easy for the application engineers.]


Querki is chock-full of Functions -- that is, the code defines a lot of Functions that users can use in QL to transform their data. There are dozens of them now; there will be hundreds in the medium term, and it wouldn't astonish me if that eventually climbs into the thousands as we add more specialized App Libraries to mix in.

To date, writing those Functions has been laborious, and rather boilerplate-filled, which has irked me for a long time. I realized quite some time ago, in a vague way, that this was the sort of problem where Monads are supposed to help, but for a very long time I didn't grok them well enough to make that meaningful. But today I crossed the Rubicon, and wrote my first Monad. To my great satisfaction, it worked correctly right off the bat. So I think I'm finally starting to get it.

The thing I finally got into my head was to stop *worrying* so much about all the damned category theory, and instead focus on what I wanted to accomplish.

In particular, I've come to appreciate that for comprehensions are Scala's number one tool for boilerplate reduction -- the way that they can combine executing a function, *checking* the result and *extracting* the result into one line makes things far more concise. (I use them to eliminate Option all the time nowadays.) And Scala's particular flavor of Monads are how you build something you can use in a for comprehension.

So for my test case, I deliberately wrote the new _hasProperty Function out longhand a few days ago, so that I would understand the problem nice and clearly. The original version looked more or less like this [Example 1]:
override def qlApply(context:QLContext, paramsOpt:Option[Seq[QLPhrase]] = None):QValue = {
  implicit val state = context.state
      
  if (paramsOpt.isEmpty || paramsOpt.get.length < 1)
    throw new PublicException("Func.missingParam", displayName)
  val params = paramsOpt.get
   
  val contextValOpt = context.value.firstAs(LinkType)
  if (contextValOpt.isEmpty)
    throw new PublicException("Func.notThing", displayName)
  val contextVal = contextValOpt.get
      
  val thingOpt = state.anything(contextVal)
  if (thingOpt.isEmpty)
    throw new PublicException("Func.unknownThing", displayName)
  val thing = thingOpt.get
      
  val processed = context.parser.get.processPhrase(params(0).ops, context).value
  val paramValOpt = processed.firstAs(LinkType)
  if (paramValOpt.isEmpty)
    throw new PublicException("Func.paramNotThing", displayName, "1")
  val propOid = paramValOpt.get

  ExactlyOne(thing.props.contains(propOid))
}
What I *wanted* it to look like was more or less what I ended up with [Example 2]:
override def qlApply(inv:Invocation):QValue = {
  for (
    thing <- inv.contextFirstThing;
    propOid <- inv.processParamFirstAs(0, LinkType)
      )
    yield ExactlyOne(thing.props.contains(propOid))
}
In other words, the common boilerplate operations should get wrapped up into a standard "Invocation" class, which is the representation of a specific call to a specific Function.

Just to make it extra-fancy, I didn't want to be throwing actual Exceptions for routine code if I could avoid it -- instead, I'd like to detect the error and *carry* the error back to the top, short-circuiting the later operations. (One legitimate kind of QValue is a WarningValue -- a message to show to the user. If you throw a PublicException in this code, it eventually gets wrapped up as a WarningValue, but I wanted to do without the throw, for functional elegance if nothing else.)

I knew this was possible, because I do it with the Option Monad all the time. The original code *could* have been more concise if I'd just used a for comprehension over Option [Example 3]:
override def qlApply(context:QLContext, paramsOpt:Option[Seq[QLPhrase]] = None):QValue = {
  for (
    params <- paramsOpt;
    if (params.length >= 1);
    link <- context.value.firstAs(LinkType);
    thing <- state.anything(link);
    processed = context.parser.get.processPhrase(params(0).ops, context).value;
    propOid <- processed.firstAs(LinkType)
  )
    yield ExactlyOne(thing.props.contains(propOid))
}
In other words, a Scala for comprehension "flattens" the values it is dealing with. You can think of the "<-" operator as "unwrap" in a certain sense (I usually pronounce it "from") -- it expects the right-hand side to be some sort of wrapper around some values, and iterates over those values, assigning them to the name on the left. (In this code, the "wrapper" is Option[T].) If any stage winds up empty, the value None simply propagates through, and in practice nothing further happens. If the stages are the same Monad, it just squishes them together, so you end up with an Option[something] instead of Option[Option[Option[something]]].

Anyway, using Option would work, but wouldn't achieve my goal of returning nicely informative errors -- None doesn't really tell me much. So I would need a richer Monad to pull this off. (In practice, I believe I semi-reinvented scalaz' \/ Monad, but mine is a bit more tightly typed, and the process was educational.)

A key part of figuring out how to get there was remembering that for comprehensions are nothing but syntactic sugar. The compiler would take the above [Example 2], and turn it into [Example 4]:
inv.contextFirstThing.flatMap { thing =>
  inv.processParamFirstAs(0, LinkType).map { propOid =>
    ExactlyOne(thing.props.contains(propOid))
  }
}
All of which raises the interesting question: what am I flatmapping *over*? A Monad is always M[T] -- it wraps *around* something, more or less by definition. So what is this wrapping around?

The clue came when I realized that the T is what we're assigning to the left-hand side of each line in the for comprehension. So for example:
thing <- inv.contextFirstThing;
means that inv.contextFirstThing must be returning M[Thing] -- whatever the heck the M is.

Moreover, since I want to be preserving errors when they are found, and propagating them through, that implies that M *also* is holding on to the current error value, if there is one.

At this point, I finally began to grok that the Monad I wanted was *not* the Invocation itself, a point on which I'd been much confused. Rather, I wanted a type that is produced *by* the various methods on Invocation, which contains the value from that method (or the error), and which can combine with map/flatMap. So I started out with the signatures it had to contain, and wound up with roughly this (simplified, with comments added) [Example 5]:
// The class clearly needs to contain the value or the error, and the Invocation itself is useful:
case class InvocationValueImpl[T](inv:Invocation, vOpt:Option[T] = None, errOpt:Option[PublicException] = None)
  extends InvocationValue[T]
{
  def map[R](f:T => R):InvocationValue[R] = {
    errOpt match {
      // If there has already been an error, just propagate that and ignore the remaining functions:
      case Some(err) => InvocationValueImpl[R](inv, None, errOpt)
      // Otherwise, actually call f:
      case None => {
        vOpt match {
          // Invoke f() on our current value, and wrap the result as an InvocationValue[R]:
          case Some(v) => InvocationValueImpl(inv, Some(f(v)), None)
          case None => {
            QLog.error("Somehow got an InvocationValueImpl with no initial value!")
            InvocationValueImpl(inv, None, Some(UnexpectedPublicException))
          }
        }
      }
    }
  }
  
  // Basically the same as map(), but slightly different signature and handling:
  def flatMap[R](f:T => InvocationValue[R]):InvocationValue[R] = {
    errOpt match {
      case Some(err) => InvocationValueImpl[R](inv, None, errOpt)
      case None => {
        vOpt match {
          // Note that we already get an InvocationValue[R] from f(v), so we don't need to wrap it:
          case Some(v) => f(v)
          case None => {
            QLog.error("Somehow got an InvocationValueImpl with no initial value!")
            InvocationValueImpl(inv, None, Some(UnexpectedPublicException))
          }
        }
      }
    }
  }
 
  // For final processing at the end of the for comprehension -- see below    
  def get:T = vOpt.get
  def getError:Option[QValue] = errOpt.map { ex =>
    val msg = ex.display(Some(inv.context.request))
    QL.WarningValue(msg) 
  }
}
That turned out to be pretty much it. The key is the way map() and flatMap() work. If we've already hit an error, then we *ignore* the mapped functions, and just pass along the error (tweaking the type parameter as needed). If not, call the next stage in the processing. Note that none of this has *anything* to do with the actual details of the invocation, or the values produced by its various functions -- it just wraps *around* those values, in a way that lets us combine them into a pipeline. It just knows that this stage has a value of type T, and the next one is of type R, and it has a function that goes from T to R. (In practice, the compiler figures out T and R from context, so you don't have to write them out.)

The final trick is how we get the QValue at the end, given that our for comprehension is going to return an InvocationValue[T]. We handle this with the usual Scala magic of an implicit conversion [Example 6]:
  implicit def inv2QValue(inv:InvocationValue[QValue]):QValue = {
    inv.getError.getOrElse(inv.get)
  }
In other words, "To turn an InvocationValue[QValue] into a QValue, first try to get its error value; if that is None, then get its actual value". So if the for comprehension is correctly built, and ends with an InvocationValue[QValue], it will magically become simply a QValue on the way out the door. And if not, we'll get a compiler error telling us that we're not returning a QValue -- just the sort of strong typing we want.

In the end, it's surprisingly little code, given how many months I've been staring at this problem. Once you grok that a Monad is really just a combineable wrapper with map and flatMap (and possibly foreach and filter, depending on how you are using it), it's a snap to code. I can't say that I have completely gotten my intuitions around it, but this approach -- asking, "What do I want the resulting for comprehension to look like?", and then working backwards from what those signatures imply -- seems to be a good way to get past that hurdle...