THE BYZANTINE FAULT — ATTACKING A DISTRIBUTED SYSTEM

- Falk Borgmann

On the vulnerabilities of distributed systems

In this post and the next, we will cover a number of attacks that exploit Byzantine faults and look at their effects. The central question here is: under which circumstances can an attack on a distributed system succeed?

The theoretical problem behind this model is based on legends about the siege of Constantinople by the Ottoman Empire during the 15th century. The city was surrounded by various units of the Ottoman army, whose local commanders faced the challenge of coordinating among themselves the ideal time to begin their attack.

To give you a little spoiler: Byzantine fault tolerance in distributed systems requires that more than two thirds of the nodes participating in a cluster work correctly at any given time for the system to withstand an attack. Viewed from the opposite perspective, this means that fewer than one third of the involved nodes may be malicious. How this result was derived is not the subject of our current discussion; it is covered in considerable depth in many other places.
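To make the arithmetic behind that spoiler concrete, here is a minimal sketch of our own (the helper name and the sample cluster sizes are not from the post) applying the classic bound n >= 3f + 1: a cluster of n nodes can tolerate at most f = (n - 1) / 3 Byzantine nodes, rounded down.

```python
# Hypothetical helper illustrating the BFT bound n >= 3f + 1:
# an n-node cluster survives at most f = (n - 1) // 3 Byzantine nodes.

def max_byzantine_faults(n: int) -> int:
    """Largest number of malicious nodes an n-node cluster can tolerate."""
    return (n - 1) // 3

for n in (3, 4, 7, 10):
    print(f"{n} nodes tolerate {max_byzantine_faults(n)} malicious node(s)")
# 3 -> 0, 4 -> 1, 7 -> 2, 10 -> 3: a three-node cluster cannot survive a single attacker.
```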

Our focus is instead on the consequences of this model for the security of blockchain environments. According to Byzantine fault theory, a single successfully attacked node in a three-node cluster is enough to prevent consensus. And if you are thinking along the lines of parliamentary democracies, in which a two-thirds majority should win against a one-third opposition, then you would be mistaken, as the two graphics below illustrate.

Figure 1: An intact three-node cluster

Our greatly simplified model contains three nodes in its cluster. To further simplify the example, let’s assume that we are trying to agree on a meeting at a specific time. The node labeled A sends a message containing “4:30 pm” quasi-simultaneously to the nodes B and C. The nodes B and C in turn confirm to one another and to the sender A that they have received the information “4:30 pm.” The transaction is validated, the blockchain remains intact, and there is consensus on the meeting.
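The exchange behind Figure 1 can be written down as a small toy check. This is a sketch of our own; the node names, the quorum of two, and the report lists are assumptions mirroring the example above, not code from the post.

```python
# Toy rendering of the intact cluster in Figure 1: A sends "4:30 pm" to B and C,
# and B and C confirm it to each other and back to A. A node validates once a
# simple majority (2 of 3) of the reports it holds agree on the same value.

QUORUM = 2  # simple majority in a three-node cluster

# What each node has heard about A's message after the exchange:
reports = {
    "A": ["4:30 pm", "4:30 pm", "4:30 pm"],  # own message plus confirmations from B and C
    "B": ["4:30 pm", "4:30 pm"],             # received from A, confirmed by C
    "C": ["4:30 pm", "4:30 pm"],             # received from A, confirmed by B
}

validated = {
    node: max(heard.count(v) for v in set(heard)) >= QUORUM
    for node, heard in reports.items()
}

print(validated)                          # {'A': True, 'B': True, 'C': True}
print(sum(validated.values()) >= QUORUM)  # cluster-wide consensus: True
```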

Figure 2: Three-node cluster after successful one-node attack

In this illustration, node B has been compromised and is under the control of an attacker. We also assume that within the cluster, a simple majority (quorum) — in this case, two nodes — is sufficient to decide whether a piece of information or status in the cluster is defined as correct.

The intact node A sends the message “4:30 pm” to its peers and receives the confirmation for “4:30 pm” from both the intact node C and the compromised node B. The same happens between nodes A and C. Everything seems fine at first, because node A now has a valid confirmation in its local storage. But the validity of the complete transaction has not yet been established, because it is only valid for the entire cluster when two of the three nodes have both validated and locally recorded the transaction. So far, that is only the case for A.

C, meanwhile, receives from the hacked node B the information that A did not transmit “4:30 pm” but rather “6:45 pm”. Node C thus holds two contradictory pieces of information about the content of the message and will not label the message as validated. The compromised node B, of course, does not validate the original message either. And so, despite a two-thirds majority of intact nodes, no consensus is possible. The planned meeting is a bust, and the Byzantine attack has succeeded across the entire cluster.
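Running the same toy check for Figure 2 makes the asymmetry visible (again a sketch of our own, with the same assumed names and quorum as above): B keeps confirming “4:30 pm” towards A but tells C that A announced “6:45 pm”, so only A ends up with a quorum of matching reports and the cluster as a whole never reaches consensus.

```python
# Toy rendering of the attacked cluster in Figure 2: compromised B equivocates,
# confirming "4:30 pm" towards A while claiming "6:45 pm" towards C.

QUORUM = 2  # simple majority in a three-node cluster

reports = {
    "A": ["4:30 pm", "4:30 pm", "4:30 pm"],  # B still confirms the true value to A
    "C": ["4:30 pm", "6:45 pm"],             # A's message vs. B's fabricated version
}

validated = {
    node: max(heard.count(v) for v in set(heard)) >= QUORUM
    for node, heard in reports.items()
}
validated["B"] = False  # the compromised node simply withholds its validation

print(validated)                          # {'A': True, 'C': False, 'B': False}
print(sum(validated.values()) >= QUORUM)  # cluster-wide consensus: False
```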
