How a 1-in-3 BFT bug led me to wall-clock-bucketed DAG rounds

DEV Community·Andrea Cadamuro·19 days ago

#rust #distributedsystems #round #consensus #group #validator

About a year ago, the consensus runtime I'd been building started doing something annoying. The setup was straightforward: a Tendermint-style chained BFT with five masternodes finalising blocks proposed by a rotating set of lightnodes, partitioned into committees of 5 to 10 nodes each (we call them "groups"). The design was textbook. The implementation worked fine on a single machine, fine on two machines in the same datacenter, and fine on three machines across two regions. Then we put it on a real testbed (four VMs across three geographic regions: US-East, EU-Central, and EU-North; 26 masternodes; 115 lightnodes) and started pushing realistic, sustained load through it: about 10³ transactions per second, distributed across four RPC endpoints. And about every third group-formation transition, the BFT certificate would stall.…
