The Ecology Pattern [LONG]
jducoeur
[This one is strictly for the programmers.]

As mentioned this morning, I spent the day reimplementing the Ecology Pattern, my preferred way to manage dependency injection in my programs. It's a tried and true pattern that I've been using for a solid 15 years now. I first learned it from Tom Leonard, my boss at both Looking Glass and Buzzpad -- he had evolved it to keep Dependency Hell at bay in C++/COM applications, but I quickly found that it's almost always appropriate for programs that are of more-than-modest size. It isn't necessarily the One True Answer to dependency injection, but I find that it consistently works well, and frankly, it's easy enough that I find it usually better to just reroll it for each application, instead of using a canned library.

Most of the concepts will be familiar to folks who are used to dependency injection, but there's an additional focus here: managing initialization and termination of the system in an organized and sufficiently-predictable way. Most programs pay too little attention to initialization. In the better cases, they simply have the top level of the program choose what order to initialize the components. (Which is difficult to maintain, and produces ugly coupling.) Most often, folks just use the Singleton Pattern in some fashion, initializing a system when it is first invoked -- which is great until you hit a dependency cycle, and it abruptly crashes in a hard-to-debug way. By contrast, Ecology treats initialization and termination as first-class problems, to be dealt with properly.

NOTE: in the following, I'm not going to deal with truly complex initialization problems, such as multi-threaded initialization (eg, when you have to initialize a subsystem that *must* live on a single master GUI thread), or asynchronous initialization (eg, when a component needs to work with a remote dependency before it can be considered fully initialized), or changing the Ecology while the system is running, or encapsulating subsystems in child Ecologies. I've dealt with all of these in previous projects, and none are *terribly* hard to solve, but I'm not going to muddy the waters with them here. Feel free to ask about them in the comments.

The key concepts of Ecology go as follows:

First, there is the Ecology itself. (The name is questionable -- previous projects have argued for "Ecosystem" as being more correct, but I am set in my ways.) This is the master wrapper for the whole world -- a single point of reference from which you can access all the major system components. It provides access to all of the Interfaces that have been registered in it. It keeps track of what has been initialized, and throws an exception if you try to access anything before it has been initialized. (There is also usually a side-interface called EcologyManager, that is used during setup and shutdown.)

The Ecology is composed of Ecots. Yes, this is a horrible piece of made-up jargon, but it's less ambiguous than a wishy-washy term like "Module". An Ecot is a self-contained system singleton -- anything from Logging to Configuration to Database Access. In the case of Querki, Ecots are required to be stateless (to keep threading clean), but that's not inherent in the concept -- many of my previous projects have involved stateful Ecots.

An Ecot may implement any number of Interfaces. The Ecot is private, not visible to the rest of the world; the Interfaces are public, and can be queried from anywhere once they are initialized. In particular, Ecots refer to each other via Interfaces.

Each Ecot declares the Interfaces that it depends upon in order to initialize. These dependencies are how you get clean initialization. System startup works like this:
  • First, the top level of the system creates all of the Ecots, passing the Ecology into each one. Each Ecot registers itself in the Ecology. During construction, Ecots are *absolutely forbidden* to refer to anything else -- they only do their own internal construction.

  • After all Ecots are registered, the top level calls Ecology.init(). This (effectively) does a topological sort of the Ecots, by their dependsUpon declarations. If it finds any dependency loops in those declarations, it immediately fails and reports the loop. Otherwise, it initializes in sorted order, starting with the Ecots that depend on nothing, and gradually working its way outward as the required dependencies are available.

  • Once that is finished, the system is up and running. Anybody can use any other system from this point forward, by fetching the needed interfaces from the Ecology.

  • At shutdown time, you terminate each Ecot, in the reverse order of how you initialized them. (This isn't strictly correct, but in my experience generally works as desired.)

Note that initialization order is *not* strictly deterministic, and doesn't try to be. Instead, it focuses on the important part: making sure that the world is ready before each element is initialized.
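
The startup and shutdown steps above can be sketched as a small, self-contained toy. This is only an illustration under simplifying assumptions, not the Querki code: `MiniEcology`, `MiniEcot`, and the `events` log are hypothetical names, and dependencies are declared by Ecot name rather than by Interface to keep it short.

```scala
import scala.collection.mutable

object LifecycleSketch {
  // Records lifecycle events so the resulting order can be inspected afterwards.
  val events = mutable.ListBuffer.empty[String]

  class MiniEcology {
    private val registered = mutable.ListBuffer.empty[MiniEcot]
    private var termOrder = List.empty[MiniEcot]

    def register(ecot: MiniEcot): Unit = registered += ecot

    // Repeatedly initialize any Ecot whose dependencies are all initialized already.
    def init(): Unit = {
      var remaining = registered.toList
      var ready = Set.empty[String]
      while (remaining.nonEmpty) {
        val next = remaining.find(_.dependsUpon.subsetOf(ready)).getOrElse(
          throw new IllegalStateException(
            "Missing dependency or loop among: " + remaining.map(_.name).mkString(", ")))
        next.init()
        ready += next.name
        termOrder = next :: termOrder // most recently initialized goes first
        remaining = remaining.filterNot(_ eq next)
      }
    }

    // Terminate in the reverse of the initialization order.
    def term(): Unit = {
      termOrder.foreach(_.term())
      termOrder = Nil
    }
  }

  // Construction only registers; it never touches another Ecot.
  class MiniEcot(val name: String, val dependsUpon: Set[String], ecology: MiniEcology) {
    ecology.register(this)
    def init(): Unit = events += s"init $name"
    def term(): Unit = events += s"term $name"
  }

  def run(): List[String] = {
    events.clear()
    val ecology = new MiniEcology
    // Deliberately created "backwards": creation order does not drive init order.
    new MiniEcot("Database", Set("Config", "Logging"), ecology)
    new MiniEcot("Config", Set("Logging"), ecology)
    new MiniEcot("Logging", Set.empty, ecology)
    ecology.init()
    ecology.term()
    events.toList
  }
}
```

Because this example is a strict chain, only one init order is possible. With independent Ecots, several orders would be equally valid, which is the non-determinism mentioned above.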

That's pretty much it. It is *not* rocket science -- I implemented the whole system, including unit tests, today. But a surprisingly large number of projects don't even go to this much effort -- they simply leave initialization and the inter-relation of subsystems up to the Singleton pattern, and eventually find themselves in all sorts of hell as a result, only after the code has gotten truly complex. By *starting* with Ecology, you can avoid those hells from the beginning, and have an architecture that is solidly scalable from a code POV.

Here are some simplified versions of the main traits (what most languages call "interfaces") from the Querki version of Ecology, to give you an idea. Questions welcomed...

import scala.reflect.runtime.universe.TypeTag

trait Ecology {
  // Get the Manager for setting up and shutting down this Ecology
  def manager:EcologyManager
 
  def api[T <: EcologyInterface : TypeTag]:T
}
 
trait EcologyManager {
  // Gets the Ecology that this is managing.
  def ecology:Ecology
 
  // Adds the specified Ecot to this Ecology.
  def register(ecot:Ecot):Unit
  
  // Initializes the world.
  def init():Unit
  
  // Terminates the world.
  def term():Unit
}
 
/**
 * This is a pure marker trait. All "interfaces" exposed through the Ecology *must* have this as their
 * first trait linearly. (Usually, it will be the only thing that an exposed interface extends, but
 * that is not required.)
 */
trait EcologyInterface
 
case class InterfaceWrapper[T <: EcologyInterface](ecology:Ecology)(implicit tag:TypeTag[T]) {
  lazy val get:T = ecology.api[T]
}
 
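trait Ecot {
  // The Ecology that this Ecot lives in. (Assumed member: the registration call below needs
  // something to refer to, and this simplified listing doesn't show where it comes from.)
  def ecology:Ecology

  def dependsUpon:Set[Class[_]]
 
  // This is messy, but is the method you actually call inside the Ecot, to get an init-time reference
  // to an external Interface.  This populates dependsUpon().
  def initRequires[T <: EcologyInterface](implicit tag:TypeTag[T]):InterfaceWrapper[T]  
 
  // Called in dependency order at startup, and in reverse order at shutdown.
  def init():Unit = {}
  def term():Unit = {}
  
  /**
   * Note that registration takes place during construction.
   */
  ecology.manager.register(this)
 
  // This is the set of all EcologyInterfaces that this Ecot implements.
  def implements:Set[Class[_]]
}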

There's a bunch of implementation, but honestly, it's not hard -- like I said, I wrote pretty much the whole thing today. (Yay for Scala.) I strongly recommend going to the effort of setting up something like this at the beginning of any major project: it'll save you lots of hassle down the road...
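
To make the `initRequires` idea concrete, here is one way the pieces might fit together. It is a sketch under stated assumptions rather than the Querki implementation: it uses `ClassTag` instead of `TypeTag` so it needs nothing beyond the standard library, it makes `Ecot` an abstract class so the dependency set exists before subclass fields run, and `provide`, `LoggingEcot`, and `DatabaseEcot` are hypothetical names.

```scala
import scala.collection.mutable
import scala.reflect.ClassTag

object InitRequiresSketch {
  trait EcologyInterface

  class Ecology {
    private val apis = mutable.Map.empty[Class[_], EcologyInterface]

    // Hypothetical hook: marks an Interface as initialized and available.
    def provide(cls: Class[_], impl: EcologyInterface): Unit = apis(cls) = impl

    // Fails loudly if the Interface is fetched before it has been initialized.
    def api[T <: EcologyInterface](implicit tag: ClassTag[T]): T =
      apis.get(tag.runtimeClass) match {
        case Some(impl) => impl.asInstanceOf[T]
        case None =>
          throw new IllegalStateException(s"${tag.runtimeClass.getName} is not initialized yet")
      }
  }

  // Defers the actual lookup until first use, which should be during init or later.
  class InterfaceWrapper[T <: EcologyInterface](ecology: Ecology)(implicit tag: ClassTag[T]) {
    lazy val get: T = ecology.api[T]
  }

  abstract class Ecot(val ecology: Ecology) {
    private var _dependsUpon = Set.empty[Class[_]]
    def dependsUpon: Set[Class[_]] = _dependsUpon

    // Declaring the requirement and obtaining the handle are one and the same call.
    def initRequires[T <: EcologyInterface](implicit tag: ClassTag[T]): InterfaceWrapper[T] = {
      _dependsUpon += tag.runtimeClass
      new InterfaceWrapper[T](ecology)
    }
  }

  trait Logging extends EcologyInterface { def log(msg: String): Unit }

  class LoggingEcot(e: Ecology) extends Ecot(e) with Logging {
    val lines = mutable.ListBuffer.empty[String]
    def log(msg: String): Unit = lines += msg
  }

  class DatabaseEcot(e: Ecology) extends Ecot(e) {
    // Runs during construction: records the dependency and touches nothing else.
    val logging = initRequires[Logging]
    def init(): Unit = logging.get.log("database ready")
  }
}
```

The point of the wrapper is that an Ecot can hold the handle as a plain field from construction onward, while the Ecology still gets an accurate `dependsUpon` set before any `init` runs.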

Interesting, but a challenge to read because you have some embedded < (and maybe >) that are messing with the markup.

Are there other references to this pattern in other sources, because I don't think I've ever seen it before? I'm on the edge of wrestling with a big initialization/shutdown mess, and this sounds like the solution I'm going to need.

Edited at 2014-01-04 03:56 am (UTC)

Interesting, but a challenge to read because you have some embedded < (and maybe >) that are messing with the markup.

Grr. I'm too used to Querki, which (being Markdown-based) is mostly pretty smart about this sort of thing. Fixed -- thanks for pointing it out.

Are there other references to this pattern in other sources, because I don't think I've ever seen it before?

To my constant surprise, not that I know of. Granted, it's more complex than most patterns, but it's so broadly useful that I would really expect it to catch on.

Note that the terminology, including the "Ecology" name, is entirely mine -- made it up all the way back at my bubble-company Trenza and have held onto it since. IIRC, Tom didn't have a name for it: he simply had evolved it as a common-sense way to keep C++ #includes from turning into a complete nightmare.

It's a variation of Dependency Injection, and there are a bunch of variants of that, but this one's unusual in treating init/term as part and parcel of the same problem. I can't imagine it's unique, but it does seem to be surprisingly unusual.

Feel free to ask questions about details. I've found that every big app is a bit different, so I reimplement and tweak it each time, but this is at least the sixth or seventh time I've done so -- I have a lot of tricks of how to make it work. (Including a few versions of the init algorithm, which can range in complexity depending on how hard the problems are. It is *conceptually* a topological sort, but I rarely bother to implement it that way.)
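
For comparison, a textbook topological sort (Kahn's algorithm) over a dependency map might look like the following. This is a generic sketch, not something taken from any of the Ecology implementations; `topoSort` and the string node names are assumptions for illustration.

```scala
import scala.annotation.tailrec

object TopoSortSketch {
  // deps maps each node to the set of nodes it depends upon.
  // Right(order) lists dependencies first; Left(stuck) holds the nodes that could
  // never become ready (they sit in a loop or wait on an unknown node).
  def topoSort(deps: Map[String, Set[String]]): Either[Set[String], List[String]] = {
    // Invert the edges: for each node, which nodes are waiting on it?
    val dependents: Map[String, Set[String]] =
      deps.toList
        .flatMap { case (node, ds) => ds.map(d => d -> node) }
        .groupBy(_._1)
        .map { case (d, pairs) => d -> pairs.map(_._2).toSet }

    @tailrec
    def loop(ready: List[String],
             pending: Map[String, Int],
             acc: List[String]): Either[Set[String], List[String]] =
      ready match {
        case Nil =>
          if (pending.isEmpty) Right(acc.reverse) else Left(pending.keySet)
        case node :: rest =>
          val waiting = dependents.getOrElse(node, Set.empty[String]).toList.sorted
          // Decrement the unmet-dependency count of everything waiting on this node.
          val (newPending, newlyReady) =
            waiting.foldLeft((pending, List.empty[String])) { case ((p, nr), w) =>
              val count = p(w) - 1
              if (count == 0) (p - w, w :: nr) else (p.updated(w, count), nr)
            }
          loop(rest ++ newlyReady.reverse, newPending, node :: acc)
      }

    // Nodes with no dependencies are ready immediately; the rest carry a count.
    val counts = deps.map { case (node, ds) => node -> ds.size }
    val (start, pending) = counts.partition(_._2 == 0)
    loop(start.keys.toList.sorted, pending, Nil)
  }
}
```

The bookkeeping (inverted edges, per-node counts) is what the simpler pass-based approach avoids, at the cost of rescanning the remaining set on every step.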

Agreed... I think it's the <: inheritance-looking statements in the code that are messing things up.

Interesting way to lay things out. What do you use for tree-stitching? (to create the partial ordering)

Have you ever written visualization tools for ecology? There's a similar pattern at play at work, where the challenge is getting an enormous legacy codebase to comply with basic things like "no loops" and "every bit of code goes in only one ecot" (which are approximately like what we call Components), and I spend a lot of time working on tools to help people understand the rather complex resulting structure--especially when trying to untangle loops, or figure out where new code goes.

What do you use for tree-stitching? (to create the partial ordering)

In fact, I usually don't, because most apps only require the effect of the topological sort, not the reality. The actual code for Querki goes like this:

  def init(initialSpaceState:SpaceState):SpaceState = {
    initializeRemainingEcots(_registeredEcots, initialSpaceState)
  }

  private def initializeRemainingEcots(remaining:Set[Ecot], currentState:SpaceState):SpaceState = {
    if (remaining.isEmpty) {
      println("Ecology initialization complete")
      currentState
    } else {
      remaining.find(_.dependsUpon.forall(_initializedInterfaces.contains(_))) match {
        case Some(readyEcot) => {
          val newState = readyEcot.addSystemObjects(currentState)
          readyEcot.init
          _initializedEcots += readyEcot
          _termOrder = readyEcot :: _termOrder
          readyEcot.implements.foreach(interface =>_initializedInterfaces += (interface -> readyEcot))
          initializeRemainingEcots(remaining - readyEcot, newState)
        }
        case None => {
          remaining.foreach { ecot =>
            ecot.dependsUpon.foreach { dependency =>
              if (!_registeredInterfaces.contains(dependency))
                throw new InitMissingInterfaceException(dependency, ecot)
            }
          }
          
          throw new InitDependencyLoopException(remaining)
        }
      }
    }
  }

In other words, I start with the list of registered Ecots, and do a bunch of passes through the list (recursively in this case, because Scala). Each time, I knock off an Ecot that has no unresolved dependencies. If I ever find that I can't do that, it automatically means that there is either a *missing* Interface, or a dependency loop.
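
Stripped of the Querki-specific parts (the `SpaceState` threading and the mutable bookkeeping fields), the same knock-one-off loop can be written as a standalone function. The names here (`EcotSpec`, `initOrder`, and the two error cases) are hypothetical stand-ins, and Interfaces are plain strings, so treat it as a sketch of the idea rather than a drop-in replacement.

```scala
import scala.annotation.tailrec

object InitOrderSketch {
  sealed trait InitError
  case class MissingInterface(interface: String, neededBy: String) extends InitError
  case class DependencyLoop(stuck: Set[String]) extends InitError

  // A stripped-down Ecot: a name, the Interfaces it implements, and the ones it needs.
  case class EcotSpec(name: String, implements: Set[String], dependsUpon: Set[String])

  def initOrder(ecots: Set[EcotSpec]): Either[InitError, List[String]] = {
    // Every Interface that some registered Ecot claims to implement.
    val registered = ecots.flatMap(_.implements)

    @tailrec
    def go(remaining: Set[EcotSpec],
           initialized: Set[String],
           order: List[String]): Either[InitError, List[String]] =
      if (remaining.isEmpty) Right(order.reverse)
      else remaining.find(_.dependsUpon.subsetOf(initialized)) match {
        case Some(ready) =>
          go(remaining - ready, initialized ++ ready.implements, ready.name :: order)
        case None =>
          // Nothing can make progress: either an Interface is missing, or there is a loop.
          val missing = for {
            ecot <- remaining.toList
            dep <- ecot.dependsUpon.toList
            if !registered.contains(dep)
          } yield MissingInterface(dep, ecot.name)
          val error: InitError =
            missing.headOption.getOrElse(DependencyLoop(remaining.map(_.name)))
          Left(error)
      }

    go(ecots, Set.empty, Nil)
  }
}
```

Checking for a missing Interface before reporting a loop matters for diagnostics: an Ecot waiting on something nobody registered looks exactly like a loop member from the point of view of the main pass.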

This is the simplest version of the algorithm, which works nicely if all of the initialization can be synchronous and in a single thread. Asynchronous isn't *too* much harder so long as you put a requirement on the Ecots to be explicit about what they're doing. Truly multi-threaded is a pain in the ass, but I got that working at Trenza. (Admittedly a dozen+ years ago, so I don't remember all the details offhand.)

Have you ever written visualization tools for ecology? There's a similar pattern at play at work, where the challenge is getting an enormous legacy codebase

Haven't done so to date, no. Keep in mind that I'm not usually working with a codebase that is in any meaningful way "enormous", at least when I'm writing Ecology -- I usually implement it at company bootstrap or not long after. This is the latest I've ever done it, and in the grand scheme of things, Querki's codebase is still pretty small.

If Querki should prove successful, that might well change -- I can see the value of this sort of visualization tool. Not trivial to write, but given modern Scala's introspection capability, I expect that you could probably build a compiler plugin that can build both the initialization and full dependency trees. (Which are not necessarily the same: Ecology recognizes explicitly that init-time dependencies are separate from, and stricter than, ongoing ones.)

Edited at 2014-01-04 04:34 pm (UTC)

Each time, I knock off an Ecot that has no unresolved dependencies.

Cute, and obvious in retrospect!

(Deleted comment)
It might be interesting (and more generalizable) to permit an ECOT to specify its wind-down dependencies separately

In principle I agree; in practice, I've never found any reason to care enough to write that. I'm basically treating it as a solution in search of a problem until I encounter an actual motivating use case.

(If I was trying to write The One True Library I'd probably view this differently. But since I just code it up for each project, and customize to that project's needs, I try to avoid adding any bells and whistles that this project doesn't require. The simpler the Ecology code, the more I feel comfortable just coding it up and forgetting about it.)

Could your pattern be extended to handle that case, easily?

Not obviously so, and I haven't tried to do so. I'm usually working on fairly large systems, typically server-side, so I've come to view lazy instantiation/initialization as a design flaw, frankly -- in a large system, typically all the *major* modules will need to be initialized anyway, and therefore I prefer to do so in an organized fashion, at the beginning.

I'll note several caveats to that, though. First, this pattern is primarily aimed at "system singletons" -- the major services within the application -- hence the strong emphasis on fetching via the interface's identity. Most lazy-initialization situations I've come across aren't truly system singletons, at least from the system's POV; they're dynamically-loaded plugins that share the same top-level interface. Ecology isn't trying to solve that problem -- typically, I'll have the *manager* for those plug-ins in the Ecology, and use that to navigate to them.

Second -- in principle, you could dynamically load an entire interdependent subsystem after main system initialization. I've coded that up once, as a notion of "sub-Ecologies" that could be constructed and initialized separately after main system init. (I did this at Convoq: we had an airy theory that our corporate customers would be able to provide us with dynamically-loaded subsystems, which we wanted to be able to load and unload while the server was running. I don't think we ever actually *used* this capability, but I got it working.) This is basically the closest I've come to what you're describing.

But to your specific question: no, I really haven't ever tried to do lazy instantiation of Ecots that are referenced by other parts of the system. I don't love the idea in principle -- it adds an extra measure of non-determinism, and part of the point of Ecology is to make the system's setup *reasonably* deterministic. (So that, for example, dependency problems get caught early, in the functional tests.)

I can believe that this sort of lazy instantiation might be useful in, for example, a relatively short-lived service that didn't require a large number of the subsystems in a typical run. But I generally haven't found programs like that to be complex enough to warrant the Ecology in the first place -- for those, I *do* generally allow myself to just use Singleton.

My suspicion is that it could be done, if you separated the dependency information about an Ecot from the Ecot itself (so that the Ecology could detect dependency problems without instantiating the actual Ecot). Don't know that it would ever be worth the hassle, though...
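
A rough sketch of what that separation might look like, purely as a thought experiment: nothing like this exists in the Ecology code described here. `EcotDescriptor`, `LazyEcology`, `validate`, and `get` are all hypothetical, dependencies are by name, and the sketch is single-threaded and assumes `validate()` has passed before `get` is called.

```scala
import scala.annotation.tailrec

object LazyEcotSketch {
  // Dependency metadata lives apart from the (possibly expensive) Ecot itself.
  final case class EcotDescriptor(name: String, dependsUpon: Set[String], create: () => AnyRef)

  class LazyEcology(descriptors: Seq[EcotDescriptor]) {
    private val byName = descriptors.map(d => d.name -> d).toMap
    private var instances = Map.empty[String, AnyRef]

    // Checks the declared graph for unknown names and loops without calling any factory.
    def validate(): Either[String, Unit] = {
      @tailrec
      def go(remaining: Set[String], ok: Set[String]): Either[String, Unit] =
        if (remaining.isEmpty) Right(())
        else remaining.find(n => byName(n).dependsUpon.subsetOf(ok)) match {
          case Some(n) => go(remaining - n, ok + n)
          case None => Left("unresolvable: " + remaining.toList.sorted.mkString(", "))
        }
      go(byName.keySet, Set.empty)
    }

    // Instantiates an Ecot on first request, after first instantiating what it depends upon.
    // Would recurse forever on a loop, hence the assumption that validate() passed.
    def get(name: String): AnyRef = instances.getOrElse(name, {
      val descriptor = byName(name)
      descriptor.dependsUpon.foreach(get)
      val instance = descriptor.create()
      instances += name -> instance
      instance
    })
  }
}
```

This keeps the early-failure property (dependency problems surface at `validate()` time, before anything is built) while deferring the actual construction, which is the trade-off being discussed.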

(Deleted comment)
It may be asking too much of the concept. Ecology was created mainly as a way to tame complex but fundamentally *local* program management. Remoting was never really part of the equation, and I'm not sure that it really scales to that in any elegant way.

Indeed, it's notable that Querki has wound up with a sharp dichotomy. I'm using Ecology in a big way, but specifically for *stateless* code. That turns out to be a great deal more than I'd originally anticipated -- Scala has driven me towards surprisingly clean, nearly pure-functional code, with the result that the vast majority of the system is stateless-after-initialization. (That is, I allow the state within an Ecot to evolve during init, but after that it should be effectively immutable.)

I'm treating state, OTOH, as a problem that needs to scale arbitrarily, as must threading. So all state *must* live within Akka Actors, which buys me scalability and robustness in the face of threading more or less for free. (So long as I don't do anything stupid.) Those Actors are Ecology-aware, and treat the Ecology as essentially a gigantic internal library.

But "the Ecology" is specifically local to a single node. Remoting happens via Akka, and when an object is remoted to somewhere else, it works with that node's Ecology. Indeed, it occurred to me yesterday that, once I do get to clustering, I'm going to have to write custom serializers, in order to remove the old Ecology pointer before something goes on the wire, and to add the new one once it's received. That's a hassle, but I believe it'll be worth it.

So basically, I'm not trying to solve that problem. Part of how Ecology manages to be clean and easy is that it's fairly focused on one (extremely common) set of problems...
