Brian Puccio wrote:
Hmm, I thought they took a similar approach to Google's and pretty much expected things to fail at the hardware level (e.g., the power supplies), balancing that out with redundant nodes so that when (not if) something failed in a node, they could fix it, but it didn't have to be fixed right away since there'd still be no downtime.
Not quite that simple: you need a good deal of scale to make that kind of system viable in a high-availability environment. Google and Backblaze have it. Linode probably isn't that big.
fukawi2 wrote:
That was a pretty impressive little setup they made, but there are some interesting holes in it. I love the way you have to power up one PSU at a time to prevent overloading the mains.

That part is hardly unusual. Even when you're just dealing with a typical 1U server, it's not a good idea to power everything up at once; you're liable to trip the breaker. They're just externalizing logic that is usually contained within the big SAN units.
Stick it on a good smart PDU and stagger the startup per-outlet and you're good to go.
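To make the staggering concrete, here's a minimal sketch of the idea: delay each outlet's power-on by a fixed interval so the inrush current is spread out instead of hitting the circuit all at once. The function names and the command callback are my own illustration, not any particular PDU's API; real smart PDUs typically do this via an SNMP write or a built-in per-outlet startup delay setting.

```python
import time

def stagger_schedule(outlets, delay_s):
    """Return (outlet, offset-in-seconds) pairs so each outlet is
    powered on delay_s after the previous one, spreading inrush
    current across the startup window."""
    return [(outlet, i * delay_s) for i, outlet in enumerate(outlets)]

def power_on_staggered(outlets, delay_s=5.0, send=None):
    """Power on outlets in order, sleeping between commands.

    `send` is a hypothetical callback that issues the actual
    power-on command for one outlet (e.g. wrapping an snmpset
    call against the PDU's outlet-control OID)."""
    for i, outlet in enumerate(outlets):
        if i:
            time.sleep(delay_s)
        if send:
            send(outlet)
```

With a 5-second stagger, an 8-outlet strip spreads its startup over about 35 seconds, which is usually enough for each supply's inrush spike to settle before the next one hits.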
You'd be amazed how much time people spent manually staggering server power-up on loaded circuits before those PDUs became widespread.