Proposed policy: changes to the consensus rules / "hard forks"

Mike Hearn
 

Introduction

(a PDF of this email is attached, in case of formatting problems please read that instead)

This email deals with a topic that invariably makes everyone angry regardless of their positions, which is why I'm the sucker who volunteered to write it. As always, we're having all these discussions in the open, so James is seeing this email at the same time everyone else is :)

We will discuss a very specific and particular type of change to Corda, so it's worth reviewing what that change is. A "consensus rule" is a rule that everyone reading a block chain must follow in order to arrive at the same result. Typical blockchain systems have many consensus rules that cover things like how data is serialised, the structure of a transaction, how contract logic is executed and so on. Essentially anything you can find inside a transaction message will fall under consensus rules. But most changes to the software and protocol don't change consensus rules - it's OK for nodes to diverge in those ways. For example, none of the following changes would affect consensus rules:

  1. Changes to the transport layers, like upgrades to the flow session protocol
  2. Changes to flows, whether platform or app level
  3. Any change to app-level code
  4. Changes to how a node is implemented, how it's managed, how it's deployed or operated and so on.

Consensus rules are about things everyone needs to agree on or else nobody can use the new rule. They are things that in more traditional blockchain systems trigger "hard forks". Here are some examples of things that we've added in Corda 4 that change the consensus rules:

  1. Reference states
  2. Recording the network parameters that govern a transaction
  3. Signature constraints

All of these extend the transaction format. Because transactions can travel around the network in arbitrary and unpredictable ways (most typically, due to being combined with liquid tokens), using a new feature requires everyone to upgrade. If that wasn't the case then flows would appear to start randomly failing, due to a transaction in the dependency graph using a new feature that a counterparty didn't yet understand. This can happen even if none of the parties to a trade use that feature themselves.
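To make the dependency graph problem concrete, here is a minimal Python sketch. It is not Corda code: the Tx class, its required_version field and the ledger dictionary are all invented for illustration, and real transaction resolution is considerably more involved.

```python
from dataclasses import dataclass

# Hypothetical model: each transaction records the lowest platform version
# able to parse it, plus the ids of the transactions whose outputs it consumes.
@dataclass(frozen=True)
class Tx:
    tx_id: str
    required_version: int
    inputs: tuple = ()

def version_needed_to_verify(tx_id, ledger):
    """Return the highest platform version required anywhere in the back chain of tx_id."""
    seen = set()
    stack = [tx_id]
    needed = 0
    while stack:
        current = ledger[stack.pop()]
        if current.tx_id in seen:
            continue
        seen.add(current.tx_id)
        needed = max(needed, current.required_version)
        stack.extend(current.inputs)
    return needed

ledger = {
    "issue": Tx("issue", 3),
    # A third party used a version 4 feature (say, a reference state) here.
    "move": Tx("move", 4, ("issue",)),
    # The trade itself only uses version 3 features.
    "trade": Tx("trade", 3, ("move",)),
}

# Verifying the trade still needs a node that understands version 4,
# because the tokens being traded passed through the "move" transaction.
print(version_needed_to_verify("trade", ledger))
```

In this toy ledger neither party to "trade" chose to use a version 4 feature, yet a counterparty on version 3 could not verify the transaction's history.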

Corda doesn't have a block chain so the concept of a hard fork doesn't directly apply. Instead it has minPlatformVersion - a variable in the network parameters, which indicates the version of the protocol all nodes are expected to understand. A node that is too old to satisfy the minPlatformVersion will shut down and refuse to start up again.
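The startup behaviour described above amounts to a single comparison. The following Python sketch shows the idea only; the function name, the exception type and the way the values are passed in are assumptions made for illustration and do not reflect how a real node is implemented.

```python
class NodeTooOldError(Exception):
    """Raised when a node cannot satisfy the zone's minimum platform version (hypothetical)."""

def check_platform_version(node_platform_version: int, min_platform_version: int) -> None:
    # min_platform_version stands in for the minPlatformVersion network parameter.
    if node_platform_version < min_platform_version:
        raise NodeTooOldError(
            f"node implements platform version {node_platform_version}, "
            f"but the network requires at least {min_platform_version}"
        )

# A node on platform version 4 starts normally on a zone that requires version 4.
check_platform_version(node_platform_version=4, min_platform_version=4)

# A node on platform version 3 refuses to start on the same zone.
try:
    check_platform_version(node_platform_version=3, min_platform_version=4)
except NodeTooOldError as error:
    print(f"refusing to start: {error}")
```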

Proposed policy

The minPlatformVersion network parameter will be set to reflect a new platform version no more than 15 months after its release in open source Corda. All changes to the min required version are network parameter changes and thus require a vote by the Foundation board. The board may opt to increment earlier if the community is ready for it and demand for the new features exists, or it may choose to void this policy and keep the network on older versions indefinitely.

Regardless of what deadline is eventually used, the Foundation will endeavour to encourage all users to upgrade to new versions as quickly as feasible in order to unblock developers that wish to use the new features on the Corda Network. At the same time, avoidance of business disruption will be considered paramount: there should be "no app left behind".

After a min platform version is bumped, notaries may begin insisting that new transactions use the new features.

Due to the importance of the new features in Corda 4 specifically (in particular signature constraints), the Foundation will be aiming to complete the global upgrade to v4 within 6 months of the release of a Corda Enterprise version that supports the new platform version.

Rationale

Background

In the nearly 10 year long history of blockchain technology, there are no major projects that have successfully navigated network-wide changes to their consensus rules. Instead such changes have led to splits of the networks and communities, acrimonious fights, losses of money and even death threats. I trust that our community will do a better job!

It's a tricky and sensitive topic because the theoretical ideal for any software project is that you build it, it works, then you move on to something else. Meanwhile happy users remain on the system forever. Changing business requirements, technology obsolescence and many other things work against this ideal. Consensus rule changes are an especially tricky example of this, because they can involve local cost for global gain. I've put some thoughts on incentive alignment below.

We're under no illusions about how hard it will be to build a global network of businesses that stay in sync with each other's upgrades. Large companies are famously slow at upgrading to new versions of any platform, even business critical platforms like Windows. This is OK - the purpose of this policy is to get us through the early years of Corda's life and allow us to finish off the parts of the core protocol that are still missing. It's not intended that upgrades are routinely forced by policy - working with stakeholders to ensure good outcomes on acceptable timelines for everyone will be a major part of the Foundation's and R3's mission.

If Corda keeps growing, then eventually the network will become so large that no amount of carrots or sticks will work to get people upgraded. Changing the protocol will have become so hard that people won't bother proposing upgrades anymore, because the rollout time will pass the planning horizon of any reasonable organisation. At this point the core consensus rules will have ossified completely, although as an optimist I plan to use the word "done" rather than "ossified". Like any stagnant technology it will eventually be outcompeted and die, as will whatever rises to replace it. That's OK: we all understand the circle of IT life. But it's only OK if it happens because Corda has become very large and successful. If the protocol ossifies immediately because early adopters refuse to ever upgrade, it would represent a missed opportunity for everyone, and - given the limitations of the current protocol - an ongoing operational and development burden.

This tightrope is one we will all walk together. The success or failure of the Corda Network as a concept will ultimately depend on hundreds of individuals making intuitive decisions about the costs and benefits of upgrades.

Separability of upgrades

It's important to note that platform versions are not node versions. Today these things are linked: Corda OS/Enterprise 3 implements platform version 3, Corda OS/ENT 4 implements platform version 4. However that's not required. In theory there's no reason that new protocol features can't be backported to older versions. That is, Corda 3 could theoretically implement platform version 4. And in theory Corda 5 could still implement platform version 4, if it didn't add any new APIs or protocol features.

In practice this is unlikely because Corda currently combines the notion of API versions with the notion of consensus rule versions: apps say "I need platform version 4" and so does a compatibility zone. This is because new protocol features invariably imply new APIs for apps to use them. This complicates the notion of backports but doesn't make them impossible, just messy from a versioning perspective. The costs and benefits of doing this are not a concern of this policy document, which deals only with abstract "platform versions" unlinked to any particular release of any particular program. It's up to R3 to figure out the costs and benefits of backports with its own customers.

Ideally such backports will never be needed, because new versions will always be better than old versions for everyone (zero regressions). That's why we're working so hard on ....

Upgradeability

A key part of ensuring the consensus rules can evolve for a while longer is making upgrades as smooth as possible. This is hard!

Over time, upgrades between Corda versions have become easier and easier. In the beginning every monthly milestone changed the API in fundamental ways. Over the past three major releases, we've committed to backwards compatibility for more and more of the API and protocol. Corda 4 continues in this vein by introducing the notion of "target versioning", which allows apps to declare which version of the node they were tested against. Target versioning is used to enable or disable backwards compatibility modes and allow us to fix semantic bugs in the APIs without breaking old software. It also adds a new wire transaction field that records the network parameters that govern that particular transaction.
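As a rough illustration of how target versioning can gate a compatibility mode, consider the Python sketch below. The AppManifest class, its field names and the BEHAVIOUR_FIXED_IN constant are hypothetical and chosen only to show the shape of the decision; they are not the node's actual API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AppManifest:
    # Lowest platform version the app can run on at all (hypothetical field).
    min_platform_version: int
    # Platform version the app was tested against (hypothetical field).
    target_platform_version: int

# Assumed for this example: some API's semantics were corrected in platform version 4.
BEHAVIOUR_FIXED_IN = 4

def can_load(app: AppManifest, node_platform_version: int) -> bool:
    """An app cannot be loaded by a node older than the app's minimum version."""
    return node_platform_version >= app.min_platform_version

def uses_legacy_behaviour(app: AppManifest) -> bool:
    """Apps tested against an older platform keep the old semantics."""
    return app.target_platform_version < BEHAVIOUR_FIXED_IN

old_app = AppManifest(min_platform_version=3, target_platform_version=3)
new_app = AppManifest(min_platform_version=3, target_platform_version=4)

# Both apps load on a version 4 node, but only the old one runs in compatibility mode.
print(can_load(old_app, 4), uses_legacy_behaviour(old_app))
print(can_load(new_app, 4), uses_legacy_behaviour(new_app))
```

The point of the design is that upgrading the node does not change what an old app observes; the fixed behaviour only appears once the app's developer has retested and raised the declared target.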

R3 takes versioning and compatibility seriously. We have an entire staffed team who focus on nothing but versioning and modules, with a particular focus on how user-created data and code evolve. We've also been bulking up our testing infrastructure throughout 2018. We now have regular nightly stress tests and performance measurements, so performance regressions are found and fixed quickly. We have been developing chaos monkey tests, compatibility/version skew tests and more. But there's still a long way to go to reach the utopia of seamless rock solid upgrades. Beyond general improvements to testing, we are also working on running popular CorDapps in our own CI system, so regressions that might break apps are identified as soon as the change is made during development.

A reasonable person may ask, are totally trustworthy upgrades even feasible at all? We know that rock solid upgrades for complex platforms are possible because the Chrome team have managed to do this for the web. Chrome upgrades so smoothly and regularly that the vast majority of users have no idea what version they're on, and regressions are practically unheard of. This is despite the web platform being almost 30 years old and unfathomably complicated. When Chrome does break websites, it's always intentional and announced a long way in advance. So the investment in testing this requires is enormous, but it is feasible.

Timeframes

The initial timeframe of 15 months was selected because R3 supports major versions of Corda Enterprise for a year, but the start period is defined by the release of Corda open source (that being the reference implementation of the platform). This means there's a 3 month window after an OS release in which R3 can get an equivalent upgrade of CE out into the market, which should be sufficient.
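The arithmetic behind the 15 month figure, and the deadline it implies for a given release, can be sketched in Python as follows. The release date used here is purely hypothetical, and add_months is a helper written for this example rather than a standard library function.

```python
import calendar
from datetime import date

ENTERPRISE_SUPPORT_MONTHS = 12   # R3 supports a major Corda Enterprise version for a year
ENTERPRISE_RELEASE_WINDOW = 3    # time allowed for the Enterprise release to follow open source
POLICY_MONTHS = ENTERPRISE_SUPPORT_MONTHS + ENTERPRISE_RELEASE_WINDOW

def add_months(start: date, months: int) -> date:
    """Add calendar months, clamping the day to the length of the target month."""
    index = start.year * 12 + (start.month - 1) + months
    year, month_zero_based = divmod(index, 12)
    month = month_zero_based + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

def latest_bump_date(open_source_release: date) -> date:
    """Latest date by which the policy expects minPlatformVersion to be raised."""
    return add_months(open_source_release, POLICY_MONTHS)

# Hypothetical open source release date, for illustration only.
release = date(2019, 1, 1)
print(POLICY_MONTHS, latest_bump_date(release))
```

Remember that the board may vote to move earlier or later than this date, so the result is an upper bound under the policy rather than a schedule.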

15 months is quite a long period for developers to sit on their hands and not use new features. If people plan to use a new data model feature in their app, and then discover they can't deploy because the Corda Network is forbidding its use due to people running old versions, this is a potential source of problems and friction. The network operator in concert with the Foundation will act as a coordinator to bring all the stakeholders together. The hope is that developers who use new features will create apps that act as upgrade carrots, and the pressure to upgrade will be propagated through trading relationships (e.g. if MiniCorp is on an old version, and MegaCorp wishes to deploy an app that requires a new consensus rule, then MegaCorp should politely request MiniCorp to upgrade).

Corda 4 is special and we ambitiously aim to get everyone upgraded in only six months. This is because there are critical improvements we believe everyone will need and want in 4, and because the network will be small at the start so there are fewer people to coordinate:

  • Without signature constraints, publishing new versions of an app is either (a) not possible without a convoluted explicit upgrade process, if using hash constraints or (b) requires a flag day and network parameter change to alter the zone whitelist, if using zone whitelist constraints. Both approaches are painful. Signature constraints are the right way to handle app upgrades without the involvement of the zone operator.

  • There are data model security improvements in 4 which block certain kinds of attacks malicious parties could mount.

  • Reference states are a fundamental data model upgrade that makes many kinds of apps much easier to build, and we anticipate great demand to use them.

Thus 4 is not just about developer convenience but also about security.
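The difference between the three constraint types when an app is upgraded can be shown with a toy model. This Python sketch is an illustration under heavy simplification: the Attachment class and the three functions are invented here, hashes are plain strings, and a signer name stands in for real signature verification.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Attachment:
    jar_hash: str   # stands in for the hash of the app JAR
    signer: str     # stands in for the key that signed the JAR

def satisfies_hash(attachment: Attachment, pinned_hash: str) -> bool:
    # Hash constraint: only the exact JAR the state was created with is accepted.
    return attachment.jar_hash == pinned_hash

def satisfies_whitelist(attachment: Attachment, zone_whitelist: set) -> bool:
    # Zone whitelist constraint: the JAR must be listed in the network parameters.
    return attachment.jar_hash in zone_whitelist

def satisfies_signature(attachment: Attachment, trusted_signer: str) -> bool:
    # Signature constraint: any JAR signed by the expected key is accepted.
    return attachment.signer == trusted_signer

v1 = Attachment(jar_hash="hash-of-v1", signer="AppVendor")
v2 = Attachment(jar_hash="hash-of-v2", signer="AppVendor")
zone_whitelist = {"hash-of-v1"}

# The new version fails the hash constraint outright.
print(satisfies_hash(v2, "hash-of-v1"))
# It fails the whitelist until the zone operator changes the network parameters.
print(satisfies_whitelist(v2, zone_whitelist))
print(satisfies_whitelist(v2, zone_whitelist | {"hash-of-v2"}))
# It passes the signature constraint with no involvement from the zone operator.
print(satisfies_signature(v2, "AppVendor"))
```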

Flexibility

The goal of doing Corda is doing business. So whilst there must be a way to stop one straggler holding an entire network hostage, the goal is not to mindlessly enforce policy just for the sake of it. All stakeholders will be considered as individuals, every special case taken into account.

In particular if upgrading on time is difficult due to bugs or inadequacies in Corda releases, then R3 will advocate to the board for deadline extensions. Upgrades should not require app changes as long as apps have followed some basic rules like not using internal or experimental APIs (enforcement of these rules will get stricter over time, as the versions and modules team makes progress). The cost of an upgrade is therefore intended to be predictable and bounded by whatever level of testing each node operator feels is necessary.

Foundation control

Network parameter changes are controlled by the Foundation board, not R3. Thus R3 has no way to enforce any particular upgrade timeline, it can only use its own votes and evangelise the benefits of upgrades. This means all users of the Corda Network have a stake in when upgrades happen.

This is OK because R3's business model is about selling enterprise software and support, and the deals we're cutting often have a recurring revenue component to them. Therefore there's little immediate short-term financial risk if an upgrade takes longer than anticipated. The costs we bear are the same indirect costs all Corda users bear, of the platform being less capable, less attractive to developers and so on.

Ultimately we want all incentives to be aligned:

  • Developers want new versions because they make building apps easier.
  • Node operators (IT departments + individuals in consumer Corda) want new versions because they make running the node easier, and more secure.
  • R3 wants new versions because they increase the value of Corda and open new markets.
  • Business network operators want new versions because they want the ability to interop with other BNs that use apps built for newer versions (and may also improve their own operations).

Potential for opt-outs

It's possible that future versions of Corda will provide ways to locally override the zone's choices, as is already possible for the network map. This would allow a user to stop their node shutting down, even if a flag day has not been accepted. If this is done then the operator accepts that they have entered an unsupported configuration and may start to experience internal errors, exceptions and flow deaths at any time. It would be useful for cases where a node is going to just miss the deadline, or where the operators happen to know through additional contextual knowledge that they will never encounter compatibility issues and don't want to or can't upgrade at the right time.

This feature is not implemented and not promised.