Is there a good "prioritization annealing" algorithm?
jducoeur
This is a very long-term question -- there's no chance anything will be done about it in the next six months, and it's likely to be a year. But it occurred to me, and it's an interesting question, so I toss it out to the Lazyweb for pointers and/or thoughts.

Prioritization

In Agile development, Estimation is almost always a group project -- we figure out how much effort a given User Story is likely to take by playing Planning Poker or something similar. But Prioritization -- deciding what order we're going to tackle things in -- typically isn't: most often, it is a single Product Manager sweating bullets, rearranging the Story Stack every hour or two and getting tagged to describe the current state of the world at the beginning of each sprint.

(Note: I am assuming a proper prioritized Story Stack -- the User Stories are arranged in strict order of priority, defined however you like, and the team pulls the top stories off the stack for each sprint unless there is a good reason to do otherwise. I have never seen another system that actually works *well*, at least for the near-term stories. For more distant stuff you can be more coarse-grained, and simply mark things High/Medium/Low, or something like that.)

There are good reasons for this approach, and I think it's always the way to make the final decisions. But what about when you have a project like Querki, with literally hundreds of user stories (indeed, I suspect I'll go over a thousand before the end of the year), not nearly enough resources, and you want deep community involvement in the prioritization process?

I've seen sites that do something simplistic for this, with a simple thumbs-up, or maybe up/down voting. I don't think it actually works all that well, though, because it's prone to all sorts of systematic problems -- it's very easy to game, highly-voted items tend to attract still more up-votes, the newest items tend to become trendy, and so on. The Law of Large Numbers helps, but my sense is that it's a pretty weak system for making *good* decisions. (It's perfectly fine for making the community *feel* involved, but not as much for working with them to make good prioritization decisions.)

So it occurred to me that it might make more sense, instead of treating it as "voting", to literally have the community participate in prioritization -- just at a small enough grain to be practical.

A Possible Design

Here's a strawman scenario for your consideration:

As an active member of the community, I go to the "Prioritize" page. The page offers me five random User Stories in a drag-and-drop list. I can click on each to see more information about it, and I rearrange the list to match my *personal* opinion of what their priority order is. I press "Done" to record my vote, or maybe "Done and Give Me Another" to keep playing the game with a new list. My opinion then gets fed into an overall Querki-wide list, which combines all of these votes to come up with a current master state of things.

There are many possible tweaks -- tracking my personal "priority order" or not, being able to see and/or work with a larger list, letting me simply declare "Pass" on the entire list because I don't care about any of these -- but that's the core idea I'm pondering: taking lots of little easy-to-digest lists and combining them into a big one.
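
(To make the mechanics concrete, here's a minimal sketch of what the back end of that page might look like -- all of the names here are hypothetical, not real Querki code:)

    import random
    from dataclasses import dataclass

    @dataclass
    class RankingVote:
        # One user's ordering of a small random hand of stories, highest priority first.
        user_id: str
        ordered_story_ids: list

    def deal_stories(all_story_ids, hand_size=5):
        # Pick a random handful of stories to show on the Prioritize page.
        return random.sample(list(all_story_ids), min(hand_size, len(all_story_ids)))

    def record_vote(votes, user_id, ordered_story_ids):
        # Store the drag-and-drop ordering the user submitted as one more vote.
        votes.append(RankingVote(user_id, list(ordered_story_ids)))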

Personally, I think this would not only be a good way to get the community genuinely involved -- it would actually be kinda *fun*. I could see myself getting a lot more involved doing this than I would in the usual upvote systems.

The Problem

Plain and simple, the issue is: how does it actually *work*? What is the algorithm to take these individual snippets of prioritization and combine them into a single gigantic list that represents the gestalt feelings of the community?

I'm sure that I could come up with *some* sort of adequate approach if I sleep on it for a day or two. But I'm wondering whether there are any well-established algorithms for this sort of problem. In airy theory, the question is how to take a collection of inconsistent ordered sublists, and combine them into a single best-fit complete list. Intuitively, it feels kind of like a statistical problem, but despite three years at Memento, my statistics are quite weak. I'd love to find that there are already some well-established best practices, and maybe even a proof of a "best" algorithm within certain constraints.
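
(Purely for illustration, the sort of naive first cut I have in mind -- treat each ranked sublist as a pile of pairwise comparisons, then sort stories by the fraction of comparisons they win -- would look something like this; whether it has decent statistical properties is exactly what I don't know:)

    from collections import defaultdict
    from itertools import combinations

    def combine_sublists(sublists):
        # Each sublist is an ordered list of story ids, highest priority first.
        wins = defaultdict(int)
        appearances = defaultdict(int)
        for sub in sublists:
            for higher, lower in combinations(sub, 2):
                wins[higher] += 1
                appearances[higher] += 1
                appearances[lower] += 1
        # Score each story by the fraction of its pairwise comparisons it won,
        # then sort best-first; ties are broken arbitrarily.
        return sorted(appearances, key=lambda s: wins[s] / appearances[s], reverse=True)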

Suggestions? This is a case where, while I probably could reinvent the wheel, I suspect it's smarter not to...

If you're willing to split it into two problems, I have a pointer to a solution for the second half:

a) Collate preference information for each user: If you had enough data, this would be "create a fully ordered-by-preference list of items for each user". In practice, you're not going to have all this data; what you'd need instead is for every pair of use cases (A,B) to record whether A is preferred to B, B is preferred to A, or no information. (Which will be more than just the raw list answers, since there are transitive deductions to be made: (A>B ^ B>C --> A>C))

b) Merge individual user preferences into a global ordered list: You can aggregate those individual preferences into an ordered list via the methodology discussed in this thread on BGG. (May require some further research, but it gives the name of the model, and you can always create a BGG account and GeekMail one of the primary users in the discussion - the stats junkies on BGG tend to be friendly to those who are interested in the numbers. :)

Caveat: the algorithm seems calculation-intensive, particularly for large datasets (you're dealing with an N^2 matrix of preferences for every user in the system), so might be better suited for a nightly recalculation rather than realtime updates. It also has a hard time with items with very few rankings, though, so it's not like realtime would actually be a huge benefit - you'd probably want to exclude items which didn't have at least $THRESHOLD rankings. (But with random presentations, you'll probably have less of a problem with this than BGG does.)
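
A rough sketch of step (a), plus the threshold filter from the caveat, in hypothetical Python (THRESHOLD and all the names are made up; the actual merge model for step (b) is whatever that BGG discussion describes, so it's left out here):

    from itertools import combinations

    THRESHOLD = 10  # made-up minimum number of rankings before a story gets scored

    def pairwise_prefs(ranked_sublists):
        # Step (a): one user's (preferred, less_preferred) pairs from their own
        # ranked sublists, plus the transitive deductions (A>B ^ B>C --> A>C).
        # If a user is inconsistent across lists, contradictory pairs simply coexist.
        prefs = set()
        for sub in ranked_sublists:
            prefs.update(combinations(sub, 2))  # earlier in the list = preferred
        changed = True
        while changed:  # keep adding implied pairs until nothing new appears
            changed = False
            for a, b in list(prefs):
                for c, d in list(prefs):
                    if b == c and (a, d) not in prefs:
                        prefs.add((a, d))
                        changed = True
        return prefs

    def eligible_stories(all_sublists):
        # Caveat: only score stories that have been ranked at least THRESHOLD times.
        counts = {}
        for sub in all_sublists:
            for story in sub:
                counts[story] = counts.get(story, 0) + 1
        return {s for s, n in counts.items() if n >= THRESHOLD}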



Ah, very neat. I'd have to think about whether the algorithm actually requires collation per-user, or could simply treat each grouping on its own. I don't necessarily want to assume that a given user is actually consistent -- my suspicion is that most people would default to mild inconsistency, and I don't actually have any problem with that in principle.

But anyway -- thanks much. This is just the sort of pointer I was hoping for, to at least give me some idea of where to look...

Ah, very neat. I'd have to think about whether the algorithm actually requires collation per-user, or could simply treat each grouping on its own.

I'm 99.99% positive that you could just use raw grouping data (ie, not bother with the transitives) and it would work fine, albeit not quite as accurately.

I'm less sure about omitting per-user correlation. I suspect it would work if there were no duplicates, but if a given user (eg) ranks item A higher than item B in multiple lists, I'm pretty sure it would skew the results. How much of a problem this would be in practice, I'm not sure.

I suspect it would work if there were no duplicates, but if a given user (eg) ranks item A higher than item B in multiple lists, I'm pretty sure it would skew the results. How much of a problem this would be in practice, I'm not sure.

I would *guess* that that cancels out in practice, due to the large numbers, but yes, that bears examining...

(And I should note that I don't actually care about precision -- I just care that there are no serious systematic biases...)

(Deleted comment)
Makes sense, as a starting point for an algorithm even if it needs some tuning. Thanks!

Do it as simple instant runoff voting?

Though, honestly, this doesn't work, for a very, very simple reason: Business Value is the strongest determiner of prioritization, and only occasionally does "what the users think is the feature they want most" equate to "greatest business value". This is strongly coupled with the observed behavior that when you ask users what their problems are, they generally give you their desired solution, not the actual problem!

This is specifically why you have a product owner - someone who is supposed to be versed in the business needs and goals of the project, making the decisions. User requests are *supposed* to be strongly filtered.

Yes and no. Keep in mind that Querki is a rather weird "product", and the rules are different. I'm not building an application here, I'm building a *platform*. That means that I genuinely don't know the use cases very well -- I know *my* use cases, but those are likely to be a small fraction of the overall scope.

(Granted, most platforms don't engage in this sort of formal exercise. But most platforms spend several years thrashing around, building a lot of the wrong features and ignoring what is actually needed. I don't consider that a good model.)

So user input really is much more important here than it usually is. I have a pretty clear vision for Querki, but that vision is intentionally focused on *how* I am developing it, what the overall shape of the system is, and how I'm interpreting and managing the competing feature demands. It is quite likely that I don't have a correct idea of what, statistically, the users will actually care about, though, so this input is more genuinely important than usual.

In particular, while I have a great deal of vision of what this will eventually look like, I am massively resource-constrained, and likely to be so for at least a couple of years. That means that prioritization really does depend mainly on what users want to do.

Or in other words, the use cases that the users want, and the features needed to implement those, *are* the business goals, necessarily -- that's how a successful platform works. I'm not going to follow that user input in any kind of simplistic order (that would simply cause chaos), and I am absolutely going to interpret their desires in ways that make coherent sense for the platform. But the use cases and needs of what the users want to build *in* Querki are the main determinant of when everything gets built...

With respect - every single product owner I have ever worked with has at some point said, "But *my* project is special, and you should totally use my suggested rules for it!" It isn't like nobody else has ever built a platform using Agile methods, you know.

Not to mention that your current users are not exactly a representative sample of the users who will make your business fly - the group is too small, and too self-selected. Their vision is apt to be too limited to their own needs, at the very time when you need a better view of the big picture, so to speak.

But, if you must, my professional recommendation is below...

Not to mention that your current users are not exactly a representative sample of the users who will make your business fly - the group is too small, and too self-selected. Their vision is apt to be too limited to their own needs, at the very time when you need a better view of the big picture, so to speak.

The *big* picture I already have: I know to a pretty comprehensive degree where this wants to end up in the 3-5 year timespan. It's the *small* picture I don't have -- what is specifically useful to do *early* for the early adopters I actually have. The question isn't "what should I build?", it's "in what order?". There are enough orthogonal dimensions to this that that is *exceptionally* unobvious.

That's not a trivial consideration, and I'm not playing special snowflake here. The current, reasonably well-evolved wisdom in the startup biz is that you should focus quite hard on the users you actually *have*, and what they need now, rather than getting carried away by the big picture. Startups fail far more often by being excessively grandiose in their vision than by simply delivering what the users actually want.

Hence, I need to actually *ask* those users, rather than making guesses. Which is the point of this exercise...

As for instant-runoff -- is there a way to make that work with subsets? It doesn't make any sense to ask users to weigh a thousand different stories. Hence, I think the only approach that *does* make sense is to ask folks to give opinions within a limited subset, and have an algorithm to combine those results into something coherent...

Here's my recommendation: do it in two stages. First, prioritize at the use-case level ("Querki does wedding management", "Querki does larp content management", and so on). Then, develop the user stories within the individual use case, and get those prioritized in a second round. Following the "done is done" principle, you don't stop work on one use case until you've completed enough user stories to reach an effective point release that people can actually use. Only then do you go on to another Use Case, and repeat.

That's not a bad general approach, although a bit too simplified. In practice, since my users are people *developing* the Apps, and no two Apps, even for the same use case, are likely to be identical (by design -- customizability is the whole point here), it's less about filling out a precise use case like that, and more about focusing on a *category* of use cases, and giving it the features it most needs.

In other words, "done is done" is essentially meaningless here. There *intentionally* is no such thing as "done" on the level you're talking about, because no two users are going to agree on it. Within any given Feature Area, there is going to be a broad spectrum of details, ranging from broadly-useful stuff that is worth doing very early, to very-specialized parts that only a few users need. So instead, I am necessarily focusing on "good enough" within any Feature Area, at least for the first year or two -- otherwise, it's a recipe for gold-plating, which is a fatal error for any startup.

But yes, it is likely that focusing on one (or at most a couple) of Feature Areas at a time, addressing a particular category of use cases, is likely a good direction. I'll chew on how to manage that...

limited Borda count

Remember, instant-runoff is only one of several ways to aggregate ranking data from a bunch of "voters"; the other leading ones are Condorcet and Borda. IRV and Condorcet, it seems to me, sorta require global information, but Borda lends itself better to working locally with partial information.

How about this ridiculously simple algorithm: every time you present a random set of five stories and a user ranks them, the stories gain +2, +1, 0, -1, and -2 points respectively. (Expected value = 0, so how often a given story is randomly selected neither hurts nor helps its position, although infrequently-chosen stories are pulled towards the middle of the pack.) At any given time, which stories have the most points?
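
(A minimal sketch of that scheme, with made-up names:)

    from collections import defaultdict

    POINTS = [2, 1, 0, -1, -2]   # points for positions 1..5 in a ranked hand of five

    scores = defaultdict(int)

    def record_ranking(ranked_hand):
        # Apply the +2/+1/0/-1/-2 scheme to one user's ordering of five stories.
        for story, points in zip(ranked_hand, POINTS):
            scores[story] += points

    def current_priorities():
        # Stories sorted by accumulated points, highest first.
        return sorted(scores, key=scores.get, reverse=True)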


Re: limited Borda count

Hmm. Possible. It would probably want some sort of "aging" mechanism, or else older stories automatically have an advantage over newer ones. But something along these lines might work...

Re: limited Borda count

+2, +1, 0, -1, -2 is exactly what I was thinking.

For ageing, I had been going to suggest adjusting the numbers, eg, +20, +10, -1, -10, -20. But that only deals with older middling ones, not high or low priority ones.

How about every X period of time, the starting value increases slightly. I don't know what your time periods would be like, but say in week or month 0, the priority of all new stories (well, all stories) starts at 0; in week 1, they start at 5; week 2, 10; etc. Obviously the numbers will need adjusting depending on how active the voting and story creation is.
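
(A quick sketch of that, with completely made-up numbers:)

    WEEKLY_BUMP = 5   # how much the baseline rises per week; pure guesswork

    def starting_score(creation_week):
        # A story created in week N starts at N * WEEKLY_BUMP, so it isn't
        # permanently behind stories that have been collecting votes for longer.
        return creation_week * WEEKLY_BUMP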
