Completely unacceptable situation ongoing in @EquinixUK #LD8 (HEX 8/9) right now. Reports of a fire alarm, but whatever happened has taken down both of our A+B diverse power feeds to our main rack since 04:24. The lack of communication is abysmal. @Equinix need to sort the basics out.
This is a major, critical data centre used by many network operators/ISPs. HEX/LD8 is probably the second most important site in the UK for ISP interconnection after the Telehouse campus. So why can't @EquinixUK provide prompt updates at least? I'd prefer a fix ASAP though...
Latest update here: https://twitter.com/giganet_status/status/1295628653376634880
This is pure speculation of course, but there is/was scheduled power maintenance tonight in LD8. It sounds like they are bringing it forward ASAP. The UPS died 17hrs too early? Coincidence?? đŸ€”
https://status.giga.net.uk/incidents/d13072pxgc2x
Finally at 12:04 PM, some 7 1/2 hours after the incident started, the first sign of Equinix on social media. It's 2020! đŸ€Ș https://twitter.com/EquinixUK/status/1295677974944129024
Here's an update from our side: we're still down, and there's no estimated fix time. Just completely unacceptable 😡. One of my big pet peeves is a lack of communication at times like this, so we're doing our best to keep ours flowing. Here's our latest extensive summary of where we're at: https://status.giga.net.uk/incidents/3zcfz8s8g43h?u=vcsx3jk49864
No visible fire 😅. Quite a queue to get in 😒.
The queue is shorter now. It seems to be due not to Covid but to Equinix's access-control systems in the DC being offline, so everything is running manually over two-way radio and then phoned through somewhere else. Crazy times. This is a hell of an MBORC (Matter Beyond Our Reasonable Control).
Latest update from #Equinix #LD8: “targeting all racks within LD8 to have power restored by or before 21:00 UK local time.”
It seems a lottery as to who gets their rack powered next. Reports coming in that many are back online 😊. The 3rd floor seems to be particularly ‘dark’ still 😔. We have one rack on the 3rd floor online, which was migrated to their new power system last week. The adjacent one is still offline. đŸ€·đŸŒâ€â™‚ïž
We have power! đŸŸ Of sorts. Shame it took a few hours to be let in. Our core router is powered via a temporary feed from our adjacent rack, which never lost power as it had already been migrated onto the new power feed. Some carriers are still offline, but most are online, and most affected customers are back on too. 😊
It's easy for me to tweet my emotions today, but I do respect the techs at Equinix working their nuts off to fix this right now 👏. There are so many techs here trying to effectively 're-wire' the DC in a day! How we've ended up in this situation, idk. The RFO (Reason For Outage) will help.
Managed to get some Equinix engineers to help get our rack re-energised. We’re now fully on the new infrastructure. â˜ș All systems online. Been a long day. đŸ„±
I’ve been assured by an engineer that the new A+B feeds are fed via independent A+B PDUs, which then route to separate UPS systems. He mentioned 4 UPSs, so I'm not sure if that's N+1 on each PDU or something else. But he seemed to suggest that previously A+B went to a single UPS! 😧
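To make that SPOF point concrete, here's a minimal sketch. This is my own model, not Equinix's actual topology; component names like ups_shared are hypothetical. It shows why 'diverse' A+B feeds that converge on one UPS still share a single point of failure, and how the separate-UPS design removes it.

```python
# Hypothetical sketch of the before/after UPS topology described above.
# Component names are assumptions; the logic just finds components
# shared by every supposedly diverse power path.

def single_points_of_failure(paths):
    """Return the set of components common to all diverse paths (SPOFs)."""
    common = set(paths[0])
    for path in paths[1:]:
        common &= set(path)
    return common

# Old design (as the engineer implied): A and B both traverse one UPS.
old_a = ["grid", "ups_shared", "pdu_a", "rack_feed_a"]
old_b = ["grid", "ups_shared", "pdu_b", "rack_feed_b"]
print(single_points_of_failure([old_a, old_b]))  # {'grid', 'ups_shared'}

# New design: independent A+B PDUs, each routed to separate UPS systems.
new_a = ["grid", "ups_a1", "pdu_a", "rack_feed_a"]
new_b = ["grid", "ups_b1", "pdu_b", "rack_feed_b"]
print(single_points_of_failure([new_a, new_b]))  # {'grid'} only
```

On this toy model, the only shared element left in the new design is the utility grid itself, which is exactly what the UPSs and generators exist to cover.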
Equinix's global social media account claiming that 'all services have now been restored' in #LD8. We're back online, but is everyone else?? We have reports @vmbusiness are still offline in LD8 and customers down on their circuits. https://twitter.com/Equinix/status/1295826590677504001
Final VMB and a few TTB circuits back online as of 22:24. They must have their rack powered then! We’re seeing all customers back online. â˜ș
It’s public knowledge that there was a SPOF in the power system on Tuesday. The good news is that this SPOF has been removed by the remediation work engineers carried out, and we now have a Tier 4-grade setup on the UPS side (higher numbers are better in DC tiers).
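For context on what a Tier 4 grade implies, a quick back-of-the-envelope using the commonly cited Uptime Institute availability targets (these figures are general industry numbers I'm adding for illustration, not anything Equinix has published about LD8):

```python
# Commonly cited Uptime Institute availability targets per tier.
TIER_AVAILABILITY = {1: 0.99671, 2: 0.99741, 3: 0.99982, 4: 0.99995}

MINUTES_PER_YEAR = 365.25 * 24 * 60

for tier, avail in TIER_AVAILABILITY.items():
    downtime_min = (1 - avail) * MINUTES_PER_YEAR
    print(f"Tier {tier}: {avail:.3%} uptime ~= {downtime_min:,.0f} min/year downtime")
```

By those numbers, Tier IV budgets roughly 26 minutes of downtime a year; a single outage like Tuesday's consumes most of even Tier I's annual allowance of ~29 hours.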
They inherited the flawed SPOF UPS system through their acquisition of HEX from Telecity. But it took them 4+ years to get around to improving it, and, regrettably and coincidentally, the old system failed while they were in the middle of the migration project to replace it. Intriguing...
I appreciate the investment didn’t come cheap. But I fail to accept that it takes a 450% increase in cross-connect fees to make this investment happen. Equinix have certainly adopted the Apple model in this regard: ecosystem lock-in and high fees with no alternatives.
For our part, we didn’t do the due diligence on the power infrastructure. We wrongly assumed that, given the significance of this facility and the carriers present, it was sure to have great redundancy and that the A+B feeds were totally resilient. We’ll be more curious and sceptical in the future.
The first bit of curiosity is finding out what tripped the UPS and caused it to fail at 04:23. Although a SPOF design, it worked fine for years and only failed when Equinix were early into their migration project to the new infra. Suspicious.
https://twitter.com/matthewskipsey/status/1297146760440184833