Justin du Coeur (jducoeur) wrote,

Singletons vs. Ecologies

Having just written this up at work (we're starting a new code base, and I'm putting my foot down), it occurs to me that this makes for a good programming-journal entry. So here it is, only slightly edited.

This is pure programmer-speak -- non-engineers are going to find it *utterly* eye-glazing, and should simply skip it. I'm not sure whether engineers are going to like it or not, but I think it's an important topic for Good Program Structure, so if you care about programming *well*, especially in large serious programs, it's worth your time to read. I'm constantly astonished that this Ecology Pattern isn't more widespread.

Context: we've been discussing Singletons at work. Specifically, whether one should just use the Singleton pattern, or at least a Singleton-style static accessor, to get at the Configuration information. That set off the following rant, on why one should avoid Singleton, and instead use Ecologies.

First, some terminology. When I say “the Singleton pattern”, I’m referring to this common code pattern, and variants thereupon:
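class myClass
{
   // The one shared instance; stays null until somebody first asks for it.
   private static myClass m_Instance;

   public static myClass Instance 
   {
      get
      {
         // Lazy creation on first demand (not thread-safe as written).
         if (m_Instance == null)
            m_Instance = new myClass();
         return m_Instance;
      }
   }
}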
There are two aspects to this, both of which have serious problems. First, it lazily creates the object, on first demand, and returns that object forevermore. Second, it returns the object pointer from a static accessor.

My stake in the ground, based on many years of using both approaches, is this: the common Singleton pattern is broken. That’s not a terribly controversial opinion: many software researchers have agreed on the same point for a good decade or so now. It has some specific problems:
  • It provides miserable lifespan management. When a program is simple that’s not an issue, but complex programs tend to require very careful management of what gets started and terminated when. The Singleton pattern *totally* hoses this: since it starts things up implicitly, it’s easy to wind up accidentally firing up systems in a different order than you intended, causing all sorts of subtle bugs and unsubtle crashes. And it usually provides no mechanism for clean shutdown at all. Many programs simply live with this, and shut down poorly with resource leaks all over the place, but it’s lazy code, and again often leads to subtle bugs.

  • It couples interface and implementation in terrible ways. Even if you are using interfaces to separate your major code areas (and you *should* be doing so whenever there is the slightest possibility that it might matter), the Singleton pattern *always* breaks interface/implementation decoupling, by definition. The static accessor always returns the same thing, so in practice, you always have to use the same implementation.

  • This, in turn, is a big problem for testing. One of the most useful paradigms for unit testing is the Stub. You hook the class you are testing up to a bunch of stub implementations of the related classes, so that you can tightly constrain and control the environment of the class being tested. Doing this demands rigorous use of interfaces between classes – you use interfaces a *lot* (indeed, I recommend simply using them as a matter of habit unless the classes are intentionally intimately bound to each other), so that you can easily stub out any class when you need to for testing. But you can’t do that if the classes are talking to each other via the Singleton pattern – the coupling of interface and implementation destroys the ability to stub things out.

  • It hardcodes the notion of Singleton-ness, which again can be a problem for testing. Another frequently-useful testing mechanism is to pull together multiple “ecologies” of objects into a single process, so you can rigorously test how they interact. But these ecologies are often incompatible with each other at the Singleton level: they have different assumptions, and simply can’t work together. (For instance, they require different instantiations of a Singleton, because they require different data due to their different roles in the problem.) Again, because the Singleton pattern *enforces* Singleton-ness, you can’t work around it for testing.
Hence, the "Ecology" pattern, to replace it.

It is a general architecture for object-oriented programs, which I’ve used at several companies and have consistently found makes for a much more stable architecture for *any* program. It takes a very different approach, declaring that you *never* use the Singleton pattern. Instead, all of the top-level objects that might otherwise be Singletons get registered in a master Ecology collection, indexed by type. Each top-level object (which I’ll call a “Module”) has the following characteristics:
  • Its constructor takes the master Ecology collection, and stores that as a public member, so the object can easily access the Ecology. It automatically registers itself within the Ecology. There is a base class to make it more convenient to deal with this. The constructor *never* makes use of other Modules or Interfaces; usually, it just fills in some of its own members.

  • It has well-defined Init() and Term() methods for startup and shutdown. These are allowed to refer to Interfaces in the Ecology, and most of the real work happens in here. These are invoked automatically by the top-level Ecology.Init/Term() methods, which are called by the master method of the program.

  • It declares which Interfaces it is exporting to the rest of the system. While it’s legal to export a Class, it is strongly recommended that you instead export Interfaces, to promote better decoupling. A given Module may export any number of Interfaces (including zero, which is often useful for high-level workflow logic that consumes Interfaces and needs lifespan management, but doesn't provide anything to the rest of the system).

  • It declares its own Dependencies. That is, it says which other Interfaces it expects to be using. Specifically, a Module is required to declare a dependency on any Interface that it intends to use during Init() and Term(). This allows the Ecology to automatically calculate the optimal order in which to call the various Modules' Init()/Term() methods, and to automatically detect and report any reference loops, so you get told explicitly and immediately about bugs that would otherwise show up much more subtly.
Non-top-level objects are typically dependents of one top-level Module, and they usually are constructed with a pointer to their master; through that, they can get at the master Ecology. The same Init()/Term() pattern is strongly recommended for them; specifically, Term() should always drop the pointer to the parent, so that reference loops are broken.
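
To make the four Module characteristics concrete, here is a minimal sketch of how the pieces might fit together. It is not the original implementation: apart from Get<T>(), Init() and Term(), every name here (IEcology's Register(), MasterEcology, the Exports and Dependencies properties, and the two toy Modules) is a hypothetical stand-in, and the ordering logic is a plain depth-first topological sort chosen for brevity.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: only Get<T>(), Init() and Term() are named in the text.
public interface IEcology
{
    void Register(Module module);
    T Get<T>() where T : class;
}

public abstract class Module
{
    public IEcology Ecology { get; }
    // Interfaces this Module provides (may be none) and those it uses in Init()/Term().
    public virtual IEnumerable<Type> Exports => Type.EmptyTypes;
    public virtual IEnumerable<Type> Dependencies => Type.EmptyTypes;

    // The constructor stores the Ecology and self-registers; it never touches other Modules.
    protected Module(IEcology ecology) { Ecology = ecology; ecology.Register(this); }

    public virtual void Init() { }
    public virtual void Term() { }
}

public sealed class MasterEcology : IEcology
{
    private readonly List<Module> _modules = new List<Module>();
    private readonly Dictionary<Type, Module> _exports = new Dictionary<Type, Module>();
    private readonly List<Module> _initOrder = new List<Module>();

    public IReadOnlyList<Module> InitOrder => _initOrder;

    public void Register(Module module)
    {
        foreach (Type t in module.Exports)
        {
            // One provider per Interface within this Ecology.
            if (_exports.ContainsKey(t))
                throw new InvalidOperationException(t.Name + " is already registered");
            _exports.Add(t, module);
        }
        _modules.Add(module);
    }

    public T Get<T>() where T : class => (T)(object)_exports[typeof(T)];

    public void Init()
    {
        _initOrder.Clear();
        var done = new Dictionary<Module, bool>(); // false = in progress, true = ordered
        foreach (Module m in _modules) Visit(m, done);
        foreach (Module m in _initOrder) m.Init();
    }

    // Shut down in the reverse of startup order.
    public void Term()
    {
        for (int i = _initOrder.Count - 1; i >= 0; i--) _initOrder[i].Term();
    }

    // Depth-first topological sort; meeting an in-progress Module means a loop.
    private void Visit(Module m, Dictionary<Module, bool> done)
    {
        if (done.TryGetValue(m, out bool finished))
        {
            if (!finished)
                throw new InvalidOperationException("Dependency loop at " + m.GetType().Name);
            return;
        }
        done[m] = false;
        foreach (Type dep in m.Dependencies) Visit(_exports[dep], done);
        done[m] = true;
        _initOrder.Add(m);
    }
}

public interface IClock { int Now(); }
public interface IGreeter { string Greet(); }

public sealed class ClockModule : Module, IClock
{
    public ClockModule(IEcology e) : base(e) { }
    public override IEnumerable<Type> Exports => new[] { typeof(IClock) };
    public int Now() => 42;
}

public sealed class GreeterModule : Module, IGreeter
{
    private IClock _clock;
    public GreeterModule(IEcology e) : base(e) { }
    public override IEnumerable<Type> Exports => new[] { typeof(IGreeter) };
    public override IEnumerable<Type> Dependencies => new[] { typeof(IClock) };
    public override void Init() { _clock = Ecology.Get<IClock>(); }
    public override void Term() { _clock = null; } // drop the reference on shutdown
    public string Greet() => "tick " + _clock.Now();
}
```

In this sketch, constructing a GreeterModule before a ClockModule in the same MasterEcology still causes ClockModule.Init() to run first, because the order comes from the declared Dependencies rather than from construction order. Term() then walks the same list backwards.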

Access to these top-level objects goes through the master Ecology; you say which *kind* of object you want, and it gives it back to you. So in C#, you would have something like this:
IMyThing thingy = Ecology.Get<IMyThing>();
Ecology is a property on your object, that returns the master IEcology object; it is provided by the standard base class that most of the Modules derive from. It provides quick and easy access to the various top-level Interfaces. These fetched Interface pointers may also be stored as member variables, provided that you have declared a dependency on them. (An underlying assumption is that Modules exist for the full lifespan of the program. There are exceptions to this that have to be worked around, but in my experience they are rare.)

This architecture is specifically designed to avoid all of the problems described for Singletons. In particular:
  • Lifespan management is explicit, clear, and automatic. Since each Module is only responsible for declaring its own dependencies, no single bit of code needs to have a priori knowledge of the overall startup/shutdown order. This avoids many initialization bugs. Moreover, system shutdown is just as automatic and consistent, so it is easy to build a program that shuts down *completely* cleanly.

  • Interface and implementation are explicitly separated. Modules are the implementation, and they explicitly declare which Interfaces they are exporting. This provides excellent control over exactly what can be seen from the outside.

  • Stubbing is trivially easy. You simply write an alternate implementation that provides the desired Interface, and stick that into the Ecology instead of the real Module. This makes well-structured unit testing much easier.

  • While the Ecology provides implicit Singleton-ness (in that the Ecology can only contain a single instance of a given Interface, and will complain if you try to register another), that is *not* enforced at the process level. It is quite easy to stick two Ecologies side-by-side within a test framework, so that you are able to simulate complex multi-process tests in a single well-controlled process.
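
The last two points can be illustrated with a deliberately pared-down sketch. It assumes a hypothetical TestEcology that is nothing more than a per-instance registry indexed by Interface type (no lifespan management), plus an invented INetwork Interface, StubNetwork stub, and StatusChecker class under test. None of these names come from the real system.

```csharp
using System;
using System.Collections.Generic;

// Pared-down, hypothetical Ecology: a per-instance registry indexed by Interface type.
public sealed class TestEcology
{
    private readonly Dictionary<Type, object> _exports = new Dictionary<Type, object>();

    public void Register<T>(T impl) where T : class
    {
        // Singleton-ness is enforced per Ecology, not per process.
        if (_exports.ContainsKey(typeof(T)))
            throw new InvalidOperationException(typeof(T).Name + " is already registered");
        _exports.Add(typeof(T), impl);
    }

    public T Get<T>() where T : class => (T)_exports[typeof(T)];
}

// Hypothetical Interface that a real Module would normally export.
public interface INetwork { string Fetch(string url); }

// Stub implementation: a canned reply, no real I/O.
public sealed class StubNetwork : INetwork
{
    private readonly string _reply;
    public StubNetwork(string reply) { _reply = reply; }
    public string Fetch(string url) => _reply;
}

// Class under test: it knows only the Interface, never the implementation.
public sealed class StatusChecker
{
    private readonly TestEcology _ecology;
    public StatusChecker(TestEcology ecology) { _ecology = ecology; }
    public bool IsUp() => _ecology.Get<INetwork>().Fetch("status") == "OK";
}
```

A test can then create two TestEcology instances in one process, register a StubNetwork with a different reply in each, and hand each to its own StatusChecker. Neither checker sees the other's INetwork, which is exactly what a static accessor makes impossible.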
The Ecology architecture was originally designed for a C++ environment (I picked it up from Tom Leonard, my old boss at Looking Glass and possibly the best engineer I've ever worked with), and it provides even more important benefits there (relating to C++ compilation dependencies). But even without the problems of C++, it still provides enough benefit to be well worth following.

So I advocate a *strict* use of this architecture, eschewing the Singleton pattern entirely. Yes, it’s a smidgeon more work, but really – if you use sensible base classes and templates, the extra work is pretty modest. It enforces good program structure, and it gives you the flexibility to decide that something isn’t *quite* a Singleton at future times.

IMO, there is no such thing as a component that is *so* permanent and absolute that the Singleton pattern is appropriate. For instance, this discussion was started by the Config system, which reads in configuration information stored in external files and makes it accessible to the program; that's sufficiently universal and low-level that people often gravitate towards the ease-of-access that Singleton provides. But Config is a *fine* example of something that would benefit from the Ecology pattern instead. If you use the Ecology, it’s easy to precisely test the ways that Modules are dependent on Config values. Indeed, it allows you to come up with an alternate implementation of the Config system, that allows you to define a strict subset of the overall Config values at the test level *in code* – that allows the test code to set, say, the four Config values that you expect this Module to depend upon, and to throw an exception if it accesses any others. That removes the formal dependency on external Config files (which makes it hard to define really well-encapsulated tests), and allows you to test that Modules aren’t introducing unexpected dependencies.
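
As a rough illustration of that test-level Config, here is one way such a stub might look. The IConfig Interface, its Get(key) method, the StrictStubConfig class, and the RetryPolicy consumer are all assumptions made for this sketch. To keep it short, the stub is passed straight to Init() rather than being fetched from an Ecology.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical Interface that the real Config Module would export.
public interface IConfig { string Get(string key); }

// Test-only Config: the values are defined in code, and any other key is an error.
public sealed class StrictStubConfig : IConfig
{
    private readonly Dictionary<string, string> _values;

    public StrictStubConfig(IDictionary<string, string> values)
    {
        _values = new Dictionary<string, string>(values);
    }

    public string Get(string key)
    {
        // Reading a value the test did not declare means an unexpected dependency.
        if (!_values.TryGetValue(key, out string value))
            throw new KeyNotFoundException("Unexpected Config dependency: " + key);
        return value;
    }
}

// Hypothetical consumer; a real Module would call Ecology.Get<IConfig>() inside Init().
public sealed class RetryPolicy
{
    public int MaxRetries { get; private set; }

    public void Init(IConfig config)
    {
        MaxRetries = int.Parse(config.Get("retry.max"));
    }
}
```

A test that supplies only "retry.max" passes for as long as RetryPolicy reads nothing else. The moment someone adds a read of an undeclared key, the stub throws and the new dependency surfaces immediately.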

In short, it's a better architecture for large programs. Most programmers develop their habits from small programs, and simply keep using them as they move on to big stuff, but those habits often cause problems in the large scale, and this is a *fine* example of that. Instead, get into the habit of using a solid architecture like this (and I use variations of this approach in pretty much all of my work nowadays), and it will stand you in good stead regardless of the program size. It's the best pattern to use for overall program structure for most non-trivial programs.

Caveat: none of this dealt with threading. If you have a program with complex threading requirements, a number of the fine details of the Ecology system will get more complex. I've implemented this architecture with serious threading, and it's a tad tricky to get right -- but then, getting threading right is *always* the hardest part. If you need serious thread-safety, you may need a heavier-weight architecture, with true marshalling and proxy objects. In my experience, that can often be done tactically rather than as the heart of the architecture, but YMMV. At the least, the Ecology approach, with its strict separation of interface and implementation, brings you much closer to being able to deal with threading cleanly...
Tags: programming