Getting A Grip: Special Report on the AWS Special Report

Late last week a seemingly comprehensive takedown of Amazon, titled "Amazon's extraordinary grip on British data", appeared in the Telegraph, written by Harry de Quetteville.

Read quickly, it would suggest that Amazon, through means perhaps fair and perhaps foul, has secured too great a share of the UK Government's cloud business, and that this poses an increasingly systemic risk to digital services and, inevitably, to consumer data.

Read more slowly, the article brings together some old allegations and some truths and joins them so as to get to the point where I ask "OK, so what do you want to do about it?", but it doesn't suggest any particular action. That's not to say that there's no need for action, just that this isn't the place to find the argument.

The main points of the Telegraph’s case are seemingly based on “figures leaked” (as far as I know, all of this data is public) to the newspaper:

  • Amazon pays little tax (figures from 2018 are quoted showing it paid £10m of tax on £1.9bn of revenues, using offshore (Luxembourg) vehicles). For comparison, the article says, AWS apparently sold £15m of cloud services to HMRC.
  • There is a “revolving door” where senior civil servants move to work for Amazon “within months of overseeing government cloud contracts.” Three people are referenced, Liam Maxwell (former Government deputy CIO and CTO), Norman Driskell (Home Office CDO) and Alex Holmes (DD Cyber at DCMS).
  • Amazon lowballs prices which then spiral … and "even become a bar to medical research." This is backed up by a beautifully done Amazon-smile graphic showing that DCLG signed a contract in 2017 estimated at £959,593 that turned out to cost £2,611,563 (an uplift of 172%).
  • There is a government bias towards AWS giving it "an unfair competitive advantage that has deprived British companies of contracts and cost job[s]".
  • A neat infographic says that "1/3 of government information is stored on AWS" (including sensitive biometric details and tax records) and that 80% of cloud contracts are "won by large firms like AWS".
  • Amazon's "leading position with … departments like the Home Office, DWP, Cabinet Office, NHS Digital and the NCA is also entrenched".
  • Figures obtained by the Sunday Telegraph suggest that AWS has captured more than a third of the UK public sector market with revenues of more than £100m in the last financial year.

Let’s start by setting out the wider context of the cloud market:

  • AWS is a fast growing business, roughly 13% of Amazon’s total sales (as of fiscal Q1 2019). Just 15 years old, it has quickly come to represent the bulk of Amazon’s profits (and is sometimes the only part of Amazon that is in profit – though Amazon would say that they choose not to make the retail business profitable, preferring to reinvest).
  • Microsoft's Azure is regularly referred to as a smaller but faster-growing business than AWS. Google is smaller still. It's hard to be sure, though – getting like-for-like comparisons is difficult. AWS' revenues in Q2 2019 were $7.7bn, and Microsoft's cloud (which includes Office 365 and other products) had $9.6bn in revenues. AWS' growth rate was 41%, Azure's was 73% – both rates are down year on year. Google's cloud (known as GCP) revenue isn't broken out separately but is included in a line that also covers G Suite, Google Play and Nest, totalling $5.45bn, up 25%.
  • Amazon, as first mover, has built quite the lead, with various published figures, including those in the Telegraph article, suggesting it has as much as 50% of the nascent cloud market. Other sources put Azure at between 22% and 30% and Google at less than 10%.

There's an almost "by the by" figure quoted that I can't source, where Lloyd's of London apparently said that "even a temporary shutdown at a major cloud provider like AWS could wreak almost $20bn in business losses." The Lloyd's report I downloaded says:

  • A cyber incident that took a “top three cloud provider” offline in the US for 3-6 days would cost between $6.9bn and $14.7bn (much of which is uninsured, with insured losses running $1.5-2.8bn)

What’s clear from all of the figures is that the cloud market is expanding quickly, that Amazon has seized a large share of that market but is under pressure from growing rivals, and that there is an increasing concentration of workloads deployed to the cloud.

It's also true that governments generally, but particularly the UK government, are a long way from a wholesale move to the cloud, with few front-line, transactional services deployed. Most of those services are still stuck in traditional data centres, anchored by legacy systems that are slow to change and that will resist, for years to come, a move to a cloud environment. Instead, work will likely be sliced away from them, a little at a time, as new applications are built and the various transformation projects see at least some success.

The Crux

When the move to cloud started, government was still clinging to the idea that its data somehow needed protection beyond that used by banks, supermarkets and retailers. There was a vast industry propping up the IL3 / Restricted classification (where perhaps 75-80% of government data sat, mostly emails asking "what's for lunch?"). This classification made cloud practically impossible – IL3 data could not sit on the same servers or storage as lower (or higher) classified data, it needed to be in the UK and secured in data centres that Tom Cruise and the rest of the Mission Impossible team couldn't get into. Let's not even get into IL4. And, yes, I recognise that the use of IL3 and IL4 in regard to data isn't quite right, but it was by far the most common way of referring to that data.

Then, in 2014, after some years of work, government made a relatively sudden, and dramatic, switch. 95% of data was “Official” and could be handled with commercial products and security. A small part was “Official Sensitive” which required additional handling controls, but no change in the technical environment.

And so the public cloud market became a viable option for government systems – all of them, not just websites and transactional front ends but potentially anything that government did (that didn't fall into the 5% of things that are secret and above).

Government was relatively slow to recognise this – after all, there was a vast army of people who had been brought up to think about data in terms of the "Restricted" classification, and such a seismic change would take time. There are still some departments that insist on a UK presence, but there are many who say "official is official" and anywhere in the UK is fine.

It was this, more than anything, that blew the doors off the G-Cloud market. You can see the rise in Lot 1/IaaS cloud spend from April 2014 onwards. That was not just broad awareness of cloud as an option, but the recognition that the old rules no longer applied.

The UK's small and medium companies had built infrastructures based around the IL3 model. It was more expensive, took longer, and forced them through the formal accreditation process. Few made it through; only those with strong engineering standards, good process discipline and, perhaps, relatively deep pockets. But once "Official" came along, much of that work was over the top, driving cost and overhead into the model, and it wasn't enough of a moat to keep the scale players out.

TL;DR

I’ve let contracts worth several hundred million pounds in total and worked with people who have done 5, 10 or 20x that amount. I’ve never met anyone in government who bought something because of a relationship with a former colleague or because of any bias for or against any supplier. Competition is fearsome. Big players can outspend small players. They can compete on price and features. Small players can still win. Small players can become big players. Skate where the puck is going, not where it was.

How does a government department choose a cloud provider?

Whilst the original aim of G-Cloud was to be able to type in a specification of what was wanted and have the system spit out some costs (along with iTunes-style reviews), the reality is that getting a quote is more complicated than that. The assumption, then, was perhaps that cloud services would be a true commodity, paid for by the minute, hour or day for servers, storage and networks. That largely isn't the case today.

There are three components to a typical evaluation:

1) How much will it cost?

2) What is the range of products that I can deploy, and how easily can I make that happen? Is the supplier seen by independent bodies as a leader or a laggard?

3) Do I, or my existing partners, already have the skills needed to manage this environment?

Most customers will likely start with (3), move to (2) and then evaluate (1) for the suppliers that make it through.

Is there a bias here? With AWS having close to 50% market share of the entire cloud market, the market will be full of people with AWS skills, followed closely by those with Azure skills (given the predominance of Microsoft environments for e.g. Active Directory, email etc in government). Departments will look at their existing staff, or that of their suppliers, or who they can recruit, and pick their strategy based on the available talent.

Departments will also look at Gartner, or Forrester, and see who is in the lead. They will talk to a range of supplier partners and see who is using what. They will consult their peers and see who is doing what.

But there’s no bias against, or for, any given supplier. We can see that when we read about companies who have been hauled over the coals by one department and the very next week they get a new contract from a different department. Don’t read conspiracy into anything government ever does; it’s far more likely to be cockup.

Is there a revolving door?

People come into government from the outside world and people leave government to go to the outside world. In the mid-2000s there was a large influx of very senior Accenture people joining government; did Accenture benefit? If anything, they probably lost out as the newcomers were overcautious rather than overzealous.

Government departments don’t choose a provider because a former colleague or Cabinet Office power broker is employed by the supplier. As anywhere, relationships persist for a period – not as long as you would think – and so some suppliers are better able to inform potential customers of the range of their offer, but this is not a simple relationship. Some people are well liked, some are well respected and some are neither. There are 17,000 people in government IT. They all play a role. Some will stay, some will go. Some make decisions, some don’t.

Also, a bid informed by a former colleague could be better written than one uninformed. This advantage doesn't last beyond a few weeks. I've worked on a lot of bids (both as buyer and seller) and I'm still amazed how many suppliers fail to answer the question, don't address the scoring criteria, or waffle away beyond the word count. If you've been a buyer, you will likely be able to teach a supplier how to write a bid; but there are any number of people who can do that.

There is little in the way of inside information about what government is or isn’t doing or what its strategy will look like. Spend a couple of hours with an architect or bid manager in any Systems Integrator that has worked for several departments and you will know as much about government IT strategy as anyone on the inside.

Do costs escalate (and are suppliers lowballing)?

Once a contract is signed, and proved to be working, it would be unusual if more work was not put through that same contract.

What's different about cloud is mostly a function of the shift from capex to opex. Servers largely sit there and rust. The cost is the cost. Maybe they're 10% used for most of their lives, with occasional higher spikes. But the cost for them doesn't change. Any fluctuations in power are wrapped into a giant overhead number that isn't probed too closely.

Cloud environments consume cash all the time though. Spin up a server and forget to spin it down, and it will cost you money. Fire up more capacity than you need, and it will cost you money. Set up a development environment for a project and, when the project start is delayed by governance questions, don't spin it down, and it will cost you money. Plan for more capacity than you need and don't dynamically adjust it, and it will cost you money. Need some more security? That's extra. Different products? That's more as well. If you don't know what you need when you set out, it will certainly cost more than you expected when you're done.
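As a concrete illustration of the kind of housekeeping that stops the meter running, here's a minimal sketch that flags forgotten development servers. It assumes, purely for illustration, that instances are tagged with Environment=dev and that a week is too long for a dev box to sit running – both the tag name and the threshold are my assumptions, not anything AWS or any department prescribes.

```python
# A sketch, not a prescription: list "dev" EC2 instances that have been running
# for more than a week. Tag name and threshold are assumptions for illustration.
from datetime import datetime, timedelta, timezone

import boto3

MAX_AGE = timedelta(days=7)

ec2 = boto3.client("ec2")
pages = ec2.get_paginator("describe_instances").paginate(
    Filters=[
        {"Name": "tag:Environment", "Values": ["dev"]},
        {"Name": "instance-state-name", "Values": ["running"]},
    ]
)

now = datetime.now(timezone.utc)
for page in pages:
    for reservation in page["Reservations"]:
        for instance in reservation["Instances"]:
            age = now - instance["LaunchTime"]
            if age > MAX_AGE:
                print(f"{instance['InstanceId']} running for {age.days} days - still needed?")
```

None of this is sophisticated; the point is that somebody has to own the question, because the bill arrives whether anyone asks it or not.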

Many departments will have woken up to this new cost model when they received their first bill and it was 3x or 5x what they expected. Cost disciplines will then have been imposed, probably unsuccessfully. Over time, these will improve, but there are still going to be plenty of cases of sticker shock, both for new and existing cloud customers, I'm sure.

But if the service is working, more projects will be put through the same vehicle, sometimes with additional procurement checks, sometimes without. The Inland Revenue’s original contract with EDS was valued, in 1992, at some £200m/year. 10 years later it was £400m and not long after that, with the addition of HMCE (to form HMRC), and the transition to CapGemini, it was easily £1bn.

Did EDS lowball the cost? Probably. And it probably hurt them for a while until new business began to flow through the contract – in 1992, the IR did not have a position on Internet services, but as it began to add them in the late 90s, its costs would have gone up, without offsetting reductions elsewhere.

Do suppliers lowball the cost today? Far less so, because the old adage "price it low and make it up on change control" is difficult to pull off now: with unit costs available, and many services or goods bought at a unit-cost rate, it would be difficult to pull the wool over the eyes of a buyer.

Is tax paid part of the evaluation?

For thirty years until the cloud came along, most big departments relied on their outsourced suppliers to handle technology – they bought servers, cabled them up, deployed products, patched them (sometimes) and fed and watered them. Many costs were capitalised and nearly everything was bought through a managed services deal because VAT could be reclaimed that way.

Existing contracts were used because it avoided new procurements and ensured that there was “one throat to choke”, i.e. one supplier on the hook for any problems. Most of these technology suppliers were (and are) based outside of the UK and their tax affairs are not considered in the evaluation of their offers.

HMRC, some will recall, did a deal with a property company registered in Bermuda, called Mapeley, that doesn’t pay tax in the UK.

Tax just isn’t part of the evaluation, for any kind of contract. Supplier finances are – that is, the ability of a company to scale to support a government customer, or to withstand the loss of a large customer.

Is 1/3rd of government information stored in AWS?

No. Next question.

IaaS expenditure is perhaps £10-12m/month (through the end of 2018), or under £150m/year. Total government IT spend, as I've covered here before, is somewhere between £7bn and £14bn/year. In the early days of the Crown Hosting business case, hosting costs were reckoned to be up to 25% of that cost, and some 70% of the spend is "keep the lights on" for existing systems. Set £150m a year against a hosting bill measured in billions and cloud is still a small slice – of the spend, let alone of the data.

Most government data is still stored on servers and storage owned by government or its integrators and sits in data centres, some owned by government, but most owned by those integrators. Web front ends, email, development and test environments are increasingly moving to the cloud, but the real data is still a long way from being cloud ready.

Are 80% of contracts won by large providers?

Historically, no. UKCloud's revenues over the life of G-Cloud are £86m, with AWS at around £63m (through the end of 2018). AWS' share is plainly growing fast though – because of skills in the marketplace, independent views of the range of products and supportability, and because of price.

Momentum suggests that existing contracts will get larger and it will be harder (and harder) for contracts to move between providers, because of the risk of disruption during transition, the lack of skill and the difficulty of making a benefits case for incurring the cost of transition when the savings probably won’t offset that cost.

So what should we do?

It’s easy to say “nothing.” Government doesn’t pick winners and has rarely been successful in trying to skew the market. The cloud market is still new, but growing fast, and it’s hard to say whether today’s winners will still be there tomorrow.

G-Cloud contracts last only two years and, in theory, there is an opportunity to recompete then – see what's new in the market, explore new pricing options and transition to the new best in class (or Most Economically Advantageous Tender, as it's known).

But transition is hard, as I wrote here in March 2014. And see this one, talking about mobile phones, from 2009 (with excerpts from a 2003 piece). If services aren’t designed to transition, then it’s unlikely to ever happen.

That suggests that we, as government customers, should:

1) Consciously design services to be portable, recognising that will likely increase costs up front (which will make the business case harder to get through), but that future payback could offset those costs; if the supplier knows you can’t transition, you’re in a worse position than if you have choices

2) Build tools and capabilities that support multiple cloud environments so that we can pick the right cloud for the problem we are trying to solve. If you have all of your workloads with one supplier and in one region, you are at risk if there is a problem there, be it fat fingers or a lightning strike. (A sketch of what such tooling might look like follows this list.)

3) Train our existing teams and keep them up to date with new technologies and services. Encourage them to be curious about what else is out there. Of course they will be more valuable to others, including cloud companies, when you do this, but that’s a fact of life. You will lose people (to other departments and to suppliers) and also gain people (from other departments and from suppliers).
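To make the second point a little more concrete, here is a minimal sketch of the sort of thin abstraction that keeps workloads portable: application code talks to a storage interface, and the choice of provider sits behind it. The class and method names are invented for illustration; real portability takes far more than this, but the principle – don't scatter provider-specific calls through your codebase – is the same.

```python
# Sketch: a thin storage interface so application code doesn't bake in one provider.
# Class and method names are illustrative, not from any department's codebase.
from abc import ABC, abstractmethod
from pathlib import Path

import boto3


class ObjectStore(ABC):
    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class S3ObjectStore(ObjectStore):
    """AWS-backed implementation."""

    def __init__(self, bucket: str):
        self._s3 = boto3.client("s3")
        self._bucket = bucket

    def put(self, key: str, data: bytes) -> None:
        self._s3.put_object(Bucket=self._bucket, Key=key, Body=data)

    def get(self, key: str) -> bytes:
        return self._s3.get_object(Bucket=self._bucket, Key=key)["Body"].read()


class LocalObjectStore(ObjectStore):
    """Stand-in for another provider, or for on-premise storage."""

    def __init__(self, root: Path):
        self._root = root

    def put(self, key: str, data: bytes) -> None:
        path = self._root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def get(self, key: str) -> bytes:
        return (self._root / key).read_bytes()
```

Swapping providers then becomes a configuration and migration exercise rather than a rewrite. It doesn't make transition free, but it keeps the option open – which is the point of (1) and (2) above.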

And, as government suppliers, we should:

1) Recognise that big players exist in big markets and that special treatment is rarely available. They may not pay tax in this jurisdiction, but that’s a matter for law, not procurement. They may hire people from government; you have already done the same and you will continue to look out for the opportunity. Don’t bleat, compete.

2) Go where the big players aren't going. Offer more, for less, or at least for the same. Provide products that compound your customers' investment – they're no longer buying assets out of capex, but they will want increased benefit for their spend, so offer new things.

3) Move up the stack. IaaS was always going to be a tough business to compete in. With big players able to sweat their assets 24/7, anyone not able to swap workloads between regions and attract customers from multiple sectors whose peak workloads offset one another is going to struggle. So don't go there; go where the bigger opportunities are. Government departments aren't often buying Dropbox, for instance, so what's your equivalent?

But, don't:

1) Expect government to intervene and give you preferential treatment because you are small and in the UK. Expect to be preferred only when you have a better product, at a better price, that gets closest to solving the specific problem that the customer has.

2) Expect government to break up a bigger business, or change its structure so that you can better compete. It might happen, sure, but your servers will have long since rusted away by the time that happens.

Laws From Before (the Internet)

Years ago, I spent a happy three years living in Paris. I’d moved there via Germany, then Austria. I didn’t take much with me and the one thing I was happiest to leave behind was my TV. I didn’t own a TV for perhaps a decade.

Each European country I lived in had some quirky laws – quirky, that is, when compared with the UK equivalents. For instance, shops in Vienna closed at lunchtime on Saturday and didn't open on Sunday. The one exception was a store that mostly sold CDs and DVDs, right near the Hofburg (the old royal palace), which had apparently earned the right to stay open from the days when it sold milk and other essentials direct to the royal family. It seemed that the law protected that right, even though there was no longer a royal family and the store no longer sold milk.

I was perhaps not surprised to read recently that there are plenty of anachronistic laws covering French TV. For instance:

  • National broadcasters can’t show films on Wednesday, Friday or Saturday
  • Those same broadcasters also can’t run ads for books, movies or sales at retailers
  • And they’re not allowed to focus any ads they do show on particular locations or demographics

The French government is considering changing these laws, but not until the end of 2020. Plainly the restrictions don't apply to YouTube, Netflix or Amazon Prime. Netflix alone has 5m users in France. TV is struggling already, and such laws hobble it further.

There are, of course, plenty of other more important issues going on that demand the attention of any country’s executive, and so perhaps it’s not a surprise that, even in 2019, laws such as these exist.

But in the digital world where, for instance, in the UK, we legislated for digital signatures to be valid as far back as 2000, it’s interesting to look at the barriers that other countries have in place, for historical reasons, to making progress in the next decade.

Balancing Capex and Opex

Government has a cash problem. It simply doesn’t have enough cash allocated to running costs for IT. Projects that were traditionally funded out of capital are, in the cloud world, funded out of operating budgets. This is going to hurt.

For many years, IT projects have been funded by capex (capital expenditure). Whatever came out of the project – servers, software licences, code, automation tools etc – sat on the balance sheet and was depreciated over an agreed period. For software, that period was usually thought to be too long, but given that many of our systems are still working 20, 30 or even 40 years after launch – long since depreciated to zero – we clearly underestimated the longevity of code. Similarly, we probably overestimated the life of laptops and mobile phones, where 5-7 years' depreciation is common but they have quickly become replaceable after 2 or maybe 3 years.

With the move to cloud, the entire infrastructure base switches from capex to opex – that is, it's funded out of day-to-day expenses and nothing is held on the balance sheet. Millions of pounds of servers (and all the switches, routers and other kit associated with them, as well as some software, where SaaS products are used) left the balance sheet.
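A toy comparison, with entirely made-up numbers, shows why that switch hurts the running-cost budget even when the totals are similar: a capitalised estate hits the resource account only through depreciation, while the cloud bill hits it in full, in year.

```python
# Toy numbers, purely illustrative: the annual hit to the running-cost budget
# from a capitalised server estate vs an equivalent pay-as-you-go cloud bill.
CAPEX_ESTATE = 5_000_000       # servers, switches, routers bought up front
DEPRECIATION_YEARS = 5         # straight-line depreciation period
ANNUAL_CLOUD_BILL = 1_800_000  # assumed equivalent cloud spend, all opex

on_prem_opex_hit = CAPEX_ESTATE / DEPRECIATION_YEARS  # 1,000,000 a year
cloud_opex_hit = ANNUAL_CLOUD_BILL                    # 1,800,000 a year

print(f"On-premise: £{on_prem_opex_hit:,.0f} a year through running costs")
print(f"Cloud:      £{cloud_opex_hit:,.0f} a year through running costs")
```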

Governments tend to be capital rich – there are few departments who complain about not having enough capex. Capex buys actual things – in IT terms, servers with flashing lights and spinning disks that can be looked at, making the spend tangible (hence the use of tangible and intangible assets for different kinds of IT assets).

This has created a challenge for some departments who want to spend their capital, but also want to move to the cloud. There was a similar challenge early in the cloud era when VAT was not recoverable, putting further pressure on strained opex budgets.

I'm seeing a change now, though, where even software development is run as an opex project – on the basis that the code is expected to turn over rapidly and be replaced through an iterative, agile approach. If a project goes wrong – at a micro or macro level – there's no write-off (which can be important to some). At the same time, treating everything as opex means that, in some cases, there's a growing, soon-to-be-legacy code base (because it's a fallacy to think that this code is iterated and replaced regularly) that is going unmaintained, meaning there's ever more spaghetti code that isn't being looked at or tweaked. Knowledge of that code base is held by a smaller and smaller set of people … and changes to it become more difficult as a result.

It’s a strange move – one that perhaps implies that there is less scrutiny over opex spend, or that the systems being built will not be in use for the long term and so don’t quite count as assets. But IT systems have a habit of surprising us and sticking around for far longer than expected – ask the developers, if you can find them, of the big systems that pay benefits, collect tax, monitor imports, check passports at border etc what the expected life of their system was when they built it and the answer will never (ever) be “oh, decades.”

That’s not to say that there isn’t a case for classifying some IT spend as opex. If you are a fast moving startup building products for a new market and striving to reach product/market fit, you might be crazy to think that it was worth having IT on the balance sheet. If you know that you are building a prototype and will throw it away in a few weeks or months, it would, again, be crazy to capitalise it. If you’re doing R&D work and you’re not sure what will come out of it, you might well classify it as opex initially and revisit later to see if assets were created and then re-classify it.

I suspect that the tensions between capex and opex in government still have more room to play out.

The Legacy Replacement Caveat

Yesterday I wrote about the difficulty of replacing existing systems, the challenges of meshing waterfall and agile (with reference to a currently running project) and proposed some options that could help move work forward. There is, though, one big caveat.

Some legacy systems are purely “of the moment” – they process a transaction and then, apart from for reporting or audit reasons, forget about it and move on to the next transaction.

But some, perhaps the majority, need to keep hold of that transaction and carry out actions far into the future. For instance:

– A student loan survives over 30 years (unless paid back early). The system needs to know the policy conditions under which that loan was made (interest rate, repayment terms, amount paid back to date, balance outstanding etc)

– Payments made to a farmer under Environmental Stewardship rules can extend up to a decade – the system retains what work has been agreed, how much will be paid (and when) and what the inspection regime looks like

In the latter case, the system that handles these payments (originally for Defra, then for Natural England and now, I believe, for the RPA) is called Genesis. It had a troubled existence but as of 2008 was working very well. The rules for the schemes that the system supports are set every 7 years by the EU; they are complicated and whilst there is early sight of the kind of changes that will be made, the final rules, and the precise implementation of them, only become clear close to the launch date.

Some years ago, in the run up to the next 7 year review, GDS took on the task, working with the RPA, of replacing Genesis by bundling it with the other (far larger in aggregate, but simpler in rules and shorter in duration) payments made by the RPA. As a result, Defra took the costs of running Genesis out of its budget from the new launch date (again, set by the EU and planned years in advance). Those with a long memory will remember how the launch of the RPA schemes, in the mid-2000s, went horribly wrong with many delays and a large fine levied by the EU on the UK.

The trouble was, the plan was to provide for the new rules, not the old ones. An agreement could be made with a farmer a week before the new rules were in place, and that agreement would survive for 10 years – so the new system would have to inherit the old agreements and keep paying. New agreements could have been stopped ahead of the transition to the new system, you might say. And, sure, that's right – but an agreement made a year before would still have 9 years to go; one made 2 years before would have 8 years to go. On being told about this, GDS stripped the Genesis functionality out of the scope of the new system, and so Genesis continues to run, processing new agreements, also with 10-year lives … and one day it will have to be replaced, by which time it will be knocking on 20 years old.

Those with good memories will also know that the new system also had its troubles, with many of the vaunted improvements not working, payments delayed, and manual processes put in place to compensate. And, of course, Defra is carrying the running costs of the old system as well as the new one, and not getting quite the anticipated benefits.

IT is hard. Always has been. It’s just that the stakes are often higher now.

When replacing legacy systems where the transactions have a long life, sometimes there is a pure data migration (as there might be, say, for people arriving in the UK, where what's important is the data describing the route that they took, their personal details and any observations – all of which could be moved from an old system to a new system and read by that new system, even if it collected additional data or carried out different processing from the old system). But sometimes, as described above, there's a need for the new system to inherit historic transactions – not just the data, but the rules and the process(es) by which those transactions are administered.

My sense is that this is one of the two main reasons why legacy systems survive (the other, by the by, is the tangled, even Gordian, knot of data exchanges, interfaces and connections to other systems).

There are still options, but none are easy:

– Can the new system be made flexible enough to handle old rules and new rules, without compromising the benefits that would accrue from having a completely new system and new processes? (A sketch of this idea follows the list.)

– Can the transactions be migrated and adjusted to reflect the new rules, without breaching legal (or other) obligations?

– Can the old system be maintained, and held static, with the portfolio of transactions it contains run down, perhaps with an acceleration from making new agreements with individuals or businesses under the new rules? This might involve “buying” people out of the old contracts, a little like those who choose to swap their defined benefit pension for a defined contribution deal, in return for a lump sum.

– Can a new version of the old system be created, in a modern way, that will allow it to run much more cheaply, perhaps on modern infrastructure, but also with modern code? This could help shave costs from the original system and keep it alive long enough for a safe transition to happen.
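On the first option, here is a minimal sketch of what "flexible enough to handle old rules and new rules" might look like in code: each agreement carries the rule set it was signed under, and the payment calculation dispatches on that. The scheme names, amounts and rules below are invented for illustration – they are not the actual Environmental Stewardship rules.

```python
# Sketch: agreements carry the rule set they were signed under, and payment
# calculations dispatch on that version. Scheme names and rules are invented.
from dataclasses import dataclass
from datetime import date


@dataclass
class Agreement:
    holder: str
    annual_amount: float
    start: date
    end: date
    rule_set: str  # e.g. "2007-2013" or "2014-2020"


def annual_payment(agreement: Agreement, year: int) -> float:
    if not (agreement.start.year <= year <= agreement.end.year):
        return 0.0
    if agreement.rule_set == "2007-2013":
        # old scheme: flat annual payment for the life of the agreement
        return agreement.annual_amount
    if agreement.rule_set == "2014-2020":
        # new scheme: an invented example of a different calculation
        return agreement.annual_amount * 0.95
    raise ValueError(f"Unknown rule set: {agreement.rule_set}")


legacy = Agreement("Hill Farm", 12_000, date(2013, 6, 1), date(2023, 5, 31), "2007-2013")
print(annual_payment(legacy, 2016))  # still paid under the rules it was signed under
```

The cost of that flexibility is that the new system now contains the old rules too, which is exactly the complexity the replacement was supposed to shed – hence none of these options being easy.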

Some of these will work; some won't. The important thing is to go in with eyes open about what you are trying to replace and to recognise that when you reach from the front end into the back end, things get much harder – forget that at your peril. Put another way, Discovery is not just about how it should be, but about how it was and how it is … so you can be sure you're not missing anything.

September Summary

I was away for the first few days of September so posted some pictures of what I’ve come to call Deergital Transformation, including this one:

Male fallow deer, aka bucks, a few days after losing their antlers

For much of the rest of the month I looked at the struggle to deliver projects, particularly ones that we sometimes mislabel as transformational, and how we might think about those in different ways:

  • We tend to approach projects as if they are always going to be successful. We go all in, often on giant projects. And yet real-world experience – in films, for instance, where only 2% of those made get to the cinema and only a third of those are profitable – suggests that most bets don't pay off.
  • Similarly, Venture Capital companies know that they are going to kiss a lot of frogs before they find their prince or princess. They back new companies in rounds – seed, series A, series B etc – putting in more money as the principles are proven and the company moves from concept to demo to beta to live and to scale. Bad bets are starved of funds, or “pivoted” where the team is backed to do something different.
  • We, all of us, are quick to suggest numbers – a project will cost £100m, or it will take 48 months, or it will save £1bn – but we are rarely open about the assumptions, and, yes, the pure and simple Wild Assed Guesses. In short, all numbers are made up, treat them with caution unless the rationale is published.
  • We all like to set targets, but we don’t always think about the things that have to be done to achieve that goal. By 2040 “we will climb Everest” is fine as an aim, but the extraordinary preparatory work to achieve it needs to be laid out, to avoid the “hockey stick” problem where you get close to the date when you expected to realise the aim, only to find there’s not enough time left. As a regular half and full marathon runner, I know that if I haven’t put the time in before the race, it’s going to hurt and I’m going to let myself down.
  • Replacing legacy systems is hard. The typical transformational project – where we take what we have had for the last 20+ years, replace it with something new and add lots more functionality (to catch up with all of the things that we haven't been able to do for the last couple of decades) – is fraught with risk and rarely pays off. The typical agile model of MVP and rapid iteration doesn't always align with the policy aspiration, or with what the users want, because, on day one, they get less than they have today. New models are needed – though, really, they're old models.

October has started on much the same path, though let’s hope that the real storms seen at the end of last month have gone and that the only October storms are of the digital kind.

Storm over Devon, September 28th

Dealing With Legacy Systems

Legacy systems – that is, systems that work (and in many cases have worked for a couple of decades or longer) – both do the lion's share of the transactional work in government and hold back the realisation of many policy aspirations.

Our legacy systems are so entwined in our overall architecture – with dozens (even hundreds) of interfaces and connections, and complicated code bases that few understand – that changes are carefully handled and shepherded through a rigorous process whenever work needs to be done. We've seen what goes wrong when this isn't handled with the utmost care: the problems at TSB, or at NatWest, RBS and Tesco Bank, for instance.

The big problem we are facing looks like this:

Our policy teams, and indeed our IT teams, have much bigger aspirations for what could be achieved than the current capability of systems.

We want to replace those systems, but trying to deliver everything that we can do today, as well as even more capability, is a high-risk, big-bang strategy. We've seen what goes wrong when we try to do everything in a single enormous project, whether that be the Emergency Services Network, the e-Borders programme or Universal Credit.

But we also know that the agile, iterative approach results in us getting very much less than we have today, with the promise that we will get more over future releases – though the delivery timetable could stretch out for some time, with some uncertainty.

The agile approach is an easy sell if you aren’t replacing anything that exists today. Monzo, the challenger bank, for instance, launched with a pre-paid debit card and then worked to add current accounts and other products. It didn’t try and open a full bank on day one – current accounts, debit cards, credit cards, loans, mortgages etc would have taken years to deliver, absorbed a fortune and delayed any chance to test the product in the market.

It’s not, then, either/or but somehow both/and. How do we deliver more policy capability whilst replacing some of what we have and do so at a risk that is low enough (or manageable enough) to make for a good chance of success?

Here's a slide that I put up at a conference in 2000 that looked at some ways we might achieve that. I think there's something here still – where we build, at the left-hand end, some thin horizontal layers that hide the complexity of government … and, at the right-hand end, we build some narrow, top-to-bottom capabilities and gradually build those out.

It’s certainly not easy. But it’s a way ahead.

Computer Says No

The FT has a front page story today saying that Ulster Bank is absorbing the cost of negative interest rates (on money it has deposited at the ECB) because its systems can’t handle a minus sign. Doubtless whoever wrote the code, maybe in the 80s, never thought rates would fall below zero.

We had a similar problem at a bank in the 90s when our COBOL-based general ledger couldn't handle the number of zeros in the Turkish lira; we wrote to their central bank and PM to see if they wouldn't mind lopping a couple off so that we could continue to process transactions. History does not record the answer, but I suspect there came none.
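For anyone who hasn't worked with fixed-format records, a toy illustration (not Ulster Bank's, or anyone else's, actual layout) of how a system ends up unable to handle a minus sign: the field was defined as unsigned digits, so a negative rate simply cannot be encoded.

```python
# Toy illustration (no bank's real record layout): an interest-rate field defined
# as five unsigned digits with three implied decimal places, so 1.25% -> "01250".
def encode_rate(rate_percent: float) -> str:
    scaled = round(rate_percent * 1000)
    if scaled < 0:
        raise ValueError("field has no sign position; negative rates cannot be represented")
    return f"{scaled:05d}"


print(encode_rate(1.25))   # "01250" - fine, for the world the coder knew in the 80s
try:
    print(encode_rate(-0.40))
except ValueError as err:
    print(f"cannot book a negative rate: {err}")
```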

Legacy systems were in the news in government IT this week as it was stated that there was no central register of such systems, that they are blocking data sharing and that there’s no plan to move off them. GDS, says Alison Pritchard, the interim leader, will be looking for money in the next spending review to deal with the problem.

This is, of course, an admirable aim. The trouble is, departments have been trying to deal with these systems for two decades – borders, immigration, farm payments, student loans, benefits, PAYE, customs etc all sit on systems coded in the 70s, 80s and early 90s. Legacy aka stuff that works. Just not the way we need it to work now.

Every department can point at one, and sometimes several, attempts to get off these systems … and yet the success rate is poor. Otherwise why would they still be around?

The agile world does not lend itself well to legacy replacement. Few businesses would accept the idea that their fully functional system would be replaced in a year or two with a less functional MVP. What would make the grade? How would everything else be handled? Could you run both in sync?

In the early 2000s a few of us tried to convince departments to adopt an “Egg” model and build a new business inside the existing business – one that was purely internet facing and that would have less capability than the existing systems but that would grow fast. Once someone (business or person) was inside the system, we would support them in that new system, whatever it took – but it would be a one way ticket. We would gradually migrate everyone into that system, adding functionality and moving ever more complicated customers as the capability grew.
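In today's terms it looks a lot like the strangler-fig pattern. A minimal sketch, with invented names, of the routing idea at its heart: migrated customers are served by the new system and never go back; everyone else stays on the legacy system until their turn comes.

```python
# Sketch of the "Egg" / one-way-migration idea, with invented names: migrated
# customers are routed to the new service and never return to the legacy one.
migrated: set[str] = set()


def legacy_handle(request: dict) -> dict:
    return {"served_by": "legacy", **request}   # stand-in for the old system


def new_handle(request: dict) -> dict:
    return {"served_by": "new", **request}      # stand-in for the new system


def migrate(customer_id: str) -> None:
    # copy the customer's records across, then flip the switch: a one-way ticket
    migrated.add(customer_id)


def handle(customer_id: str, request: dict) -> dict:
    return new_handle(request) if customer_id in migrated else legacy_handle(request)


migrate("customer-42")
print(handle("customer-42", {"action": "view_statement"}))   # served by the new system
print(handle("customer-99", {"action": "view_statement"}))   # still on legacy
```

The routing is trivial; the hard parts are the record migration behind it and the organisational commitment to the one-way ticket.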

It’s a challenging strategy. It would have been easier in the 2000s. Harder now. Much harder. But possible. With commitment. And a lot of planning.

Government Gateway At Nearly 19

Work on the new new Government Gateway started this time nearly 19 years ago. Here’s a picture from July 2000 showing how we thought it might all work – at the time the Inland Revenue was looking to extend an existing EDIFACT solution (the EDS EbX solution on the right). From the beginning the plan was to join everything up and become the traffic director for all transactions to and from government.

One of the oft-told stories of the development of the Government Gateway is that it took the team only 90 days, from flash to bang, to put the first version live (our MVP if you will). Remember that this was in 2000/2001, when servers had to be bought, installed and cabled up. When code was deployed on actual spinning disks that you could look at. When architects laboured in data centres, working long hours to make everything work.

Here’s another slide, from the same time, showing how we thought the Gateway would handle Self Assessment. Note the “*” in the bottom right that, again, recognises that the “app” (as we would call it today) could be from anyone.

It's roughly true. There had been an earlier, failed attempt at delivering a Government Gateway, with a contract let by the Cabinet Office. There was then a period when a sign entitled "Under New Management" hung on the office door (actually, in the Inland Revenue's Bush House office) and, with the IR providing funding for a replacement, we went looking for a supplier who could deliver what we wanted. We knocked on a lot of doors and were mostly laughed at: our ambition was too great, no products existed that could do what we wanted, we should stick to email and send forms back and forth, and so on.

We landed on Microsoft at about this time of year in 2000. Lots of people had to get involved in governing whether it could go ahead – all the way to Bill Gates at their end and all the way to the Minister of the Cabinet Office at our end. We picked the live date, 25th January 2001, largely because the MCO was Ian McCartney and we thought Burns Night was appropriate. For a month or so the project was even called “Caledonia.” Before that it had been called “Shark” on the basis that, to meet the timeline, it would need to keep moving and never sleep.

The live date was not entirely arbitrary – we were working back from needing to have PAYE live on April 6th 2001, and we knew we needed to launch the first part (registration and enrolment) by the end of January so as to give us time for the next release, the transaction engine which would process the tax forms.

And then, sometime in October, we got the go-ahead, after an independent OGC Gateway review by Andrew Pinder (who, at the time, was not yet the e-Envoy and was not even working in government).

Here’s what the homepage looked like when it was launched, on time and on budget, in January 2001.

I'm not writing about this for nostalgic reasons though. I'm writing because I've just seen another project launch in UK government that plans to take data from third-party software packages and websites and process/transform it (in the technical sense) so that it can be handled in new, yet-to-be-built government systems.

That’s what the Gateway was built to do. And it still does it, nearly 20 years later, for every PAYE form that is sent to Government. Until a few weeks ago, it did it for every VAT form too, though HMRC appears to have gone back to CSV files, abandoning the great work on GovTalk done by others in the Office of the e-Envoy when the Gateway was still a sketch on a piece of paper.

We are in some kind of endless loop where we keep building what’s already been built and proven, “because we’re special” or because “it doesn’t quite meet our needs” or “because it’s not open source” or “because we don’t want to be beholden to a supplier” … and so we don’t make any substantive progress or break any new ground. It’s a stairway to nowhere.

Digital Challenges

Over the last year I’ve worked in various different places, public and private, reviewing projects.

The most common line I’ve distilled from dozens of conversations is

“We are X years into our Y years transformation”

Which is greater, X or Y?

Obviously, X should be smaller than Y. Eg we are 1 year into our 3 year programme to digitise (whatever we are digitising).

Sadly, the opposite has been true in every case. We are 2 years into our 1 year plan to make change happen. Or even 3 years into our 1 year plan to turn the world on its head.

Things take longer than expected. Invalid assumptions are made. The specific outweighs the general. Every time.