
Coinbase thought it could fail over. It didn't count on a bug in its Amazon MSK deployment.

Crypto exchange insists "we had an appropriate RF to survive a zone outage" as an AWS incident knocks it down, again.
Photo by William Warby on Unsplash

The timing was rather unfortunate. Just days after CEO Brian Armstrong spoke of non-technical teams "shipping production code", Coinbase went down, with major disruptions to trade for some five hours.

But it was neither vibe-coded foundations nor entirely an incident in a single availability zone (AZ) in AWS's most storied region, US-EAST-1, that took Coinbase offline for hours – even though the trouble started in an AWS data centre. Rather, it was a perfect storm of issues.

Among them was a bug in Coinbase's deployment of Amazon Managed Streaming for Apache Kafka, better known as Amazon MSK, which stopped the exchange from failing over smoothly, its head of platform, Rob Witoff, told The Stack this week.
