Coinbase thought it could fail over. It didn't count on a bug in its Amazon MSK deployment.
Crypto exchange insists "we had an appropriate RF to survive a zone outage" as an AWS incident knocks it down, again.
The timing was rather unfortunate. Just days after CEO Brian Armstrong spoke of non-technical teams "shipping production code", Coinbase went down, with major disruptions to trade for some five hours.
But it was neither vibe-coded foundations nor entirely an incident in a single availability zone (AZ) in AWS's most storied region, US-EAST-1, that took Coinbase offline for hours – even though the trouble started in an AWS data centre. Rather, it was a perfect storm of issues.
Among them: Coinbase failed to fail over smoothly because of a bug in its deployment of Amazon Managed Streaming for Apache Kafka (MSK), Coinbase's head of platform, Rob Witoff, told The Stack this week.