Stuff I'll forget if I don't write it down: Patterns

Related to my previous post on Approval Cycles in work-flows, here's a shorter and more patterns-based discussion of work-queues. Again, the GraphViz markup for the diagrams is available in this GitHub gist.

For some background on Design Patterns see Christopher Alexander's Pattern Language book, or the famous "Gang of Four" Design Patterns book.

Pattern-Name

Work-Queue

Intent

In a work-flow where tasks are handed over from one or more users to one or more other users, it is often the case that the person receiving the task cannot begin work on it right away, as they are busy with something else.

Sometimes we want a means of hand-over that lets person A (the producer) say "I'm done", so that they stop accumulating apparent time-spent on the task, without person B (the consumer, or next recipient of the task) immediately starting to accumulate time-spent before they are actually working on it.

Often the two sides of the queue are not balanced - there may be many more producers adding to the queue than consumers taking from it, or vice-versa.

Also known as

"Wait-State", "In-Tray", "To-Do's", "Async Hand-over", ... ?

Motivation

For the purposes of reporting, we want to isolate:

The time the producer spends working on the task.
The time at which the producer handed off the task (put it in the queue).
The time spent waiting for the task to be picked up and resumed by a consumer.
The time at which the consumer took the task from the queue.
The time spent by the consumer actually working on the next stage of the task.
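
As a rough illustration of the five measurements listed above, here is a minimal Python sketch. The TaskTimeline class, its field names and the sample timestamps are all made up for the example; they are not taken from our tool. It stores the two points in time (hand-off and pick-up) plus a start and finish, and derives the three durations from them.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class TaskTimeline:
    """Hypothetical record of the timestamps around one hand-over."""
    producer_started: datetime
    queued_at: datetime          # producer hands off (task enters the queue)
    dequeued_at: datetime        # consumer takes the task from the queue
    consumer_finished: datetime

    def producer_time(self) -> timedelta:
        # Time the producer spent working on the task.
        return self.queued_at - self.producer_started

    def wait_time(self) -> timedelta:
        # Time the task sat idle in the queue.
        return self.dequeued_at - self.queued_at

    def consumer_time(self) -> timedelta:
        # Time the consumer spent on the next stage of the task.
        return self.consumer_finished - self.dequeued_at


# Invented sample data, purely for illustration.
timeline = TaskTimeline(
    producer_started=datetime(2010, 12, 13, 9, 0),
    queued_at=datetime(2010, 12, 13, 10, 30),
    dequeued_at=datetime(2010, 12, 13, 14, 0),
    consumer_finished=datetime(2010, 12, 13, 14, 45),
)
```

With this sample data the producer worked for 1h30, the task waited 3h30 in the queue, and the consumer worked for 45 minutes, so the queue (not either person) would be the place to look first.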

There are many useful things we can do with the information thus captured. We can identify the true bottlenecks in a workflow (producer is too slow? consumer is too slow? consumer is doing other tasks so the queue-time is long, even though task processing time is low?) and in doing so give ourselves an opportunity to address those bottlenecks.

A typical reporting scenario might be to monitor the performance of various groups or individuals against SLAs - for example there might be SLAs on the maximum amount of time a task may sit idle in the queue, and/or on the amount of time each task step should take to complete, or an SLA on the complete task.
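
A sketch of what such an SLA check could look like in Python, assuming the three durations have already been measured. The limits, the function name sla_breaches and the breach labels are hypothetical; real thresholds would come from the agreement itself.

```python
from datetime import timedelta

# Hypothetical SLA limits, chosen only for this example.
MAX_QUEUE_WAIT = timedelta(hours=4)
MAX_TOTAL_TIME = timedelta(hours=8)


def sla_breaches(work_before: timedelta, wait: timedelta, work_after: timedelta) -> list:
    """Return the names of any SLAs breached by a single task."""
    breaches = []
    # SLA on how long a task may sit idle in the queue.
    if wait > MAX_QUEUE_WAIT:
        breaches.append("queue-wait")
    # SLA on the complete task: both work phases plus the wait.
    if work_before + wait + work_after > MAX_TOTAL_TIME:
        breaches.append("total-time")
    return breaches
```

A per-step SLA would be one more comparison of the same shape, applied to work_before or work_after.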

Implementation

At its simplest this pattern involves just 3 states, one after the other, such that the producer completes his/her part of the task and submits to the work-queue, from which the consumer takes the task when ready.

Fig1. Simple Work-Queue
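
One way to picture the simple three-state version in code is a small transition table with a history log. This is only a sketch: the state names ("producing", "queued", "consuming") and the Task class are assumptions made for the example, not the configuration of any real system.

```python
from datetime import datetime

# Allowed transitions for the simple three-state work-flow.
# State names are assumptions made for this sketch.
TRANSITIONS = {
    "producing": {"queued"},
    "queued": {"consuming"},
    "consuming": set(),
}


class Task:
    def __init__(self, started: datetime):
        self.state = "producing"
        # Each entry is (state, time the task entered that state).
        self.history = [("producing", started)]

    def move_to(self, new_state: str, when: datetime) -> None:
        """Move to new_state, refusing transitions the work-flow does not allow."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"cannot go from {self.state} to {new_state}")
        self.state = new_state
        self.history.append((new_state, when))
```

Because every state change is recorded with its timestamp, the working and waiting durations can be read straight off the history afterwards.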

Note that this says nothing about the number of users involved - there may be any number of producers and consumers, but in this configuration all tasks follow the same path to arrive at the queue. A slightly more complicated alternative entails tasks arriving in the queue having followed different paths through the work-flow prior to the queue, and likewise diverging again upon leaving the queue.

Fig2. Work-Queue with many paths in and out

The multiplicity of states either side of the queue may change the criteria required for reporting (depending on whether you are interested in all paths or just one of the combinations), but in other respects the pattern is unaffected.

A nice possibility arises from having separated the phases of work with a work-queue: We can force a state-change if a task has waited too long in the queue (for example, to bump its priority somehow). This goes beyond the simple pattern described here, and probably needs another name to indicate that it is a Work-Queue with additional constraints - perhaps "Escalating Work-Queue"?

Fig3. Work-Queue with maximum wait-time threshold
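
A minimal sketch of the escalation idea, assuming each queued task carries the time it entered the queue. The QueuedTask class, the "escalated" state name and the priority field are hypothetical; "bumping priority" could mean something quite different in a real system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class QueuedTask:
    name: str
    queued_at: datetime
    state: str = "queued"   # becomes "escalated" once the threshold is passed
    priority: int = 0


def escalate_overdue(tasks, now: datetime, max_wait: timedelta) -> list:
    """Force a state-change on tasks that have waited longer than max_wait.

    Returns the names of the tasks escalated by this call.
    """
    escalated = []
    for task in tasks:
        # Only tasks still in the plain "queued" state are eligible,
        # so running the check repeatedly does not bump a task twice.
        if task.state == "queued" and now - task.queued_at > max_wait:
            task.state = "escalated"
            task.priority += 1
            escalated.append(task.name)
    return escalated
```

Something like this would be run periodically against the queue. Making the escalation an explicit state, rather than only a changed priority number, keeps it visible in reports in the same way as the other states.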

Known Uses

Simply to mark a handover point, as an aid to configuration and visualisation of the steps in a work-flow.
Isolating time-spent working from time-spent waiting for reporting purposes.
Monitoring performance against Service-Level Agreements.
Preventing tasks from idling in the queue for too long by bumping their priority when a certain wait-threshold is exceeded.

Spent quite some time building a Work-flow/Reporting tool recently. Took an hour out from coding to think about a common use-case for the sorts of work-flows we typically encounter in our field: Approval cycles. Any excuse to take GraphViz for a spin :)

For the moment this is just looking at what kind of cycles we might encounter, rather than how to configure the system to handle them. I'm pretty confident we can deal with all of these scenarios without any further development.

All of the diagrams here were generated with GraphViz. The "dot" markup for all of the diagrams can be found in this github gist.

Let's start with the most basic approval cycle I can imagine: either your work gets approved and moves forward in the work-flow, or it doesn't and is returned to you for re-work:

Basic Approval Cycle
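
The basic cycle can also be written down as a tiny transition table. This is just a sketch: the event names ("submit", "approve", "reject") are my own labels for the arrows and are assumptions, as is the run helper.

```python
# (current state, event) -> next state.
# Event names are assumptions made for this sketch.
BASIC_CYCLE = {
    ("do", "submit"): "approval",
    ("approval", "approve"): "approved",
    ("approval", "reject"): "do",   # returned for re-work
}


def run(events, start="do"):
    """Replay a sequence of events and return every state visited, in order."""
    states = [start]
    for event in events:
        states.append(BASIC_CYCLE[(states[-1], event)])
    return states
```

For example, run(["submit", "reject", "submit", "approve"]) visits "do" twice before ending in "approved". An event that is not allowed in the current state raises a KeyError.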

Another simple work-flow might entail two phases of checking - possibly just double-checking for the same mistakes or whatever, but also perhaps because the first-pass can be done by someone less qualified, in order to weed out the obvious errors and save the valuable time of a more qualified person for more significant problems. Here's what that could look like:

Double-checked Approval Cycle

Nice. Recording these states (with any number of "do" and "check" steps) is a good start, but what if you want to make it easy to spot or report on cases where re-work was required?

Why would we want to do this? Well, maybe to find out what kind of tasks typically cause problems so that we can address those tasks with new approaches or tools, or to find out who amongst the work-force might be a candidate for re-training (lots of re-dos) or a bonus (few re-dos).

To achieve this you could just count the number of instances of "do" or "approval" in the life-cycle of a single task. Sure, that'll work, but another possibility (which I feel better about, and think is worth exploring more) is to introduce extra states in your work-flow, something like this:

Approval Cycle with Explicit Redo State

Great, now we have a specific state (re-do) that we can search for when reporting, or spot at a glance when looking at an overview of the life-cycle of a task in the work-flow, which enables us to pick out those tasks that did not pass muster at the first attempt.
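
To make the difference between the two reporting approaches concrete, here is a small Python sketch. The two histories are invented lists of state names for a single task that was sent back once; nothing here comes from real data.

```python
# Approach 1: no explicit re-do state, so count repeat visits to "do".
implicit_history = ["do", "approval", "do", "approval", "approved"]


def rework_count(history):
    """Number of times the task came back to "do" after the first attempt."""
    return max(history.count("do") - 1, 0)


# Approach 2: an explicit "re-do" state turns the question into a lookup.
explicit_history = ["do", "approval", "re-do", "approval", "approved"]


def needed_rework(history):
    """True if the task ever entered the explicit "re-do" state."""
    return "re-do" in history
```

Both give the same answer here. The explicit state mainly saves the reporting side from having to know that a second "do" means re-work.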

Going back to our earlier example of multiple levels of checking by less/more qualified persons, we can use the same technique to isolate re-do attempts, this time capturing the level at which the re-do takes place:

Double-checked Approval Cycle with Explicit Redo States

We can take this concept further: what if it's OK to make a mistake or two, but what you really want to know is which tasks are particularly tricky or badly handled? Again, you could do this just by counting the number of times the same task passes through the "do" or "approval" states, or you could make it explicit in your work-flow.

I'll try and show how this can be a good thing to do by the end of this post but, for now, nod, smile, and bear with me. I'll go back to just one level of checking (dropping the junior/senior distinction) to keep this from getting too complicated too quickly:

Approval Cycle with Redo State Tracking

OK, what's going on here? From "do" we shift to "approval", from which our work can be "approved" directly, or refused. Following the initial refusal from "approval" we shift to the "1st re-do" state, from which the only means of moving forward is to shift to "2nd approval".

We can use "1st re-do" to report on, say, tasks that did not get approved at the first go. At "2nd approval" we can jump directly to approved, or shift into "2+ re-dos". Reporting on "2nd approval" gives us a means to find (or filter out) tasks that took at least 2 attempts to get approved.

Finally, from "2+ re-dos" we shuttle back and forth with "2+ approvals", until eventually we get approved. We could introduce more levels of course, but my suspicion is that anything that requires 2+ re-dos is a big warning sign of a significant problem that more levels of recording won't help to resolve.
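
The walk-through above translates fairly directly into a transition table. In this sketch the state names are the ones used in the text, while the event names ("submit", "approve", "reject") and both helper functions are assumptions added for illustration.

```python
# (current state, event) -> next state, following the description above.
REDO_TRACKING = {
    ("do", "submit"): "approval",
    ("approval", "approve"): "approved",
    ("approval", "reject"): "1st re-do",
    ("1st re-do", "submit"): "2nd approval",
    ("2nd approval", "approve"): "approved",
    ("2nd approval", "reject"): "2+ re-dos",
    ("2+ re-dos", "submit"): "2+ approvals",
    ("2+ approvals", "approve"): "approved",
    ("2+ approvals", "reject"): "2+ re-dos",
}


def replay(events, start="do"):
    """Replay a sequence of events and return every state visited, in order."""
    states = [start]
    for event in events:
        states.append(REDO_TRACKING[(states[-1], event)])
    return states


def attempts_needed(states):
    """Classify an approved task by the re-do states it passed through."""
    if "2+ re-dos" in states:
        return "3 or more attempts"
    if "1st re-do" in states:
        return "2 attempts"
    return "1 attempt"
```

Note that the classification never counts anything: it only asks which states appear in the history, which is the point of making the re-do levels explicit.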

Where to go from here? Well, this is where some of the benefits of the "explicit in the work-flow" approach could be realized. For example, so far I've just had a single goal-state: "approved". Given that an approval cycle is probably, or at least potentially, just a small part of a larger overall work-flow, we might propagate to different parts of that larger work-flow depending on the path we take through the approval-cycle. I don't want to introduce too much complexity here, so let's keep it simple by having a set of different goal-states to illustrate the point ...

Approval Cycle with redo-directed alternate forward-path

Hopefully that's fairly self-explanatory. The important point to note here is that by designing the work-flow to capture these additional states separately we can follow a completely different forward path at various points if we need to. A more realistic/useful possibility that I wish I had diagrammed might be re-assigning the task to a different user if we fail at the 2nd approval.
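
As a sketch of how the forward path could depend on the route taken through the cycle: the approval state names below come from the earlier discussion, but the goal-state names and the goal_state function are invented for this example and do not necessarily match the labels in the diagram.

```python
# Which goal-state a task moves on to, keyed by the approval state
# it was finally approved from. Goal-state names are hypothetical.
GOAL_FOR_APPROVAL_STATE = {
    "approval": "approved first time",
    "2nd approval": "approved after one re-do",
    "2+ approvals": "approved after repeated re-dos",
}


def goal_state(history):
    """Pick the goal-state from the last approval state the task visited."""
    for state in reversed(history):
        if state in GOAL_FOR_APPROVAL_STATE:
            return GOAL_FOR_APPROVAL_STATE[state]
    raise ValueError("task never reached an approval state")
```

Re-assigning the task to a different user after a failed 2nd approval would be the same kind of change: one more entry that routes a particular state somewhere else.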

I want to mention at this point that this might look like it's getting pretty complicated already, but the people involved in the work-flow do not see any of this complexity. The person performing the task will perform the task, then submit it for approval. Occasionally a task might come back for re-work. The person doing the approving either approves or does not. The system takes care of the rest.

There's one other alternative that I drew up, which opens a whole 'nother topic (subject for a different post I think!). This one involves using a "queue" to do things like hand off tasks to any of a number of alternative users who can fulfil the next stage of a task's lifecycle, and/or to measure things like wait-times before the next stage of the lifecycle really begins. I think queues could be really useful as tools for identifying bottlenecks and reporting on SLA targets.

Here's what the diagram looks like for the case where a task is added to an approval-queue (I called it "checking queue" in the diagram, and now I'm too tired to change it). Here I've gone back to the multi-level approval cycle, chewing over the idea that there may be a large number of "junior" checkers, but far fewer senior checkers.

My aim in writing this was to discuss patterns (in the Christopher Alexander sense) for what the sub-work-flow for an approval cycle might look like. On reflection I think I could have done much better at laying these things out as patterns - i.e. giving them names and describing the forces and all that. I've gone back and captioned the graphs to add some flavour of pattern-name. Hopefully the discussion was useful, and the language and terminology sufficiently abstracted from our implementation to keep from getting embroiled in implementation details.

Monday, 13 December 2010

"Work-Queue" Pattern for Work-flows

Friday, 10 December 2010

Patterns for Approval Cycles in Work-flows