pclissold wrote:
Let’s wait for the RFO before calling them morons.
Sorry Peter, I failed to make myself clear. I'm not suggesting that the specific folks responsible for the design at HE are more or less morons due to this one incident; I'm saying that decades of experience with datacentres that (if you believe their owners) are as unsinkable as the Titanic says that they simply aren't that good, and their designers are - as a group - morons.
The mistake that all the big datacentres I've seen make is assuming that there is an economy of scale in power systems, when really there isn't; to get the so-called economy-of-scale cost efficiencies, sacrifices to system integrity are made, and availability suffers as a result.
At the datacentre at my place of work (OK, it's not a big facility, it's about 400 kVA, basically a tier 2 facility with tier 3 power, but only a single genset), on the first Wednesday of every month the supply to the datacentre is pulled (from upstream distribution, not even in the datacentre building) for an hour or two, just to see what happens.
This little datacentre has suffered from the "economy of scale" problem I mentioned above; it was originally commissioned with 200 kVA UPSs and 400 kVA infrastructure, but the UPSs were upgraded to 400 kVA by paralleling another 200 kVA set. Paralleled UPSs are less reliable than single UPSs, so additional risk has been accepted for a lower-cost upgrade. Only time will tell if this has a deleterious effect on availability.
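To put rough numbers on the paralleling point, here is a minimal sketch. It assumes the load at full capacity needs both 200 kVA sets running, that the two sets fail independently, and that each set has the same availability; the 0.9999 figure is purely illustrative, not a measurement from this site. It also ignores the shared paralleling controls, which in practice add another point of failure.

```python
def series_availability(unit_availability: float, units: int) -> float:
    """Availability of a system that needs every one of `units`
    independent, identical units to be working at the same time."""
    return unit_availability ** units


# Hypothetical per-unit availability, chosen only for illustration.
UNIT_AVAILABILITY = 0.9999

single = series_availability(UNIT_AVAILABILITY, 1)      # one UPS carrying the load
paralleled = series_availability(UNIT_AVAILABILITY, 2)  # two sets, both required

print(f"single UPS:      {single:.8f}")
print(f"paralleled pair: {paralleled:.8f}")
print(f"expected downtime ratio: {(1 - paralleled) / (1 - single):.2f}x")
```

Under those assumptions the paralleled pair is down roughly twice as often as a single unit of the full rating would be, which is the extra risk being traded for the cheaper upgrade.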
The power protection has been needed in anger twice (the two big earthquakes that damaged Christchurch in New Zealand), with widespread and prolonged utility outages, and both times the datacentre (and all the IT services) didn't miss a beat.
I'm reasonably convinced the datacentre will survive a lightning strike to the distribution; it is an anticipated possible event (even though it has never happened historically) and the protection is in place in case it should.
But even this little datacentre, which was designed by guys (and a girl!) with many years in the high availability power field, responsible for many facilities in London, still has unfixed flaws. In the early days there was a flaw (now fixed) which caused the cooling systems to shut down and an internal power outage to some systems. Despite the fact that these engineers are really nice people, and seem really competent, and have bags of experience and history under their collective belts, they still made design errors I was seeing twenty years ago.
And that is largely the reason I call this group collectively morons. They aren't learning from history; to this day they are building systems with the same shortcomings that we discovered 20 years ago and that we know will lead to outages.
/rant