bfgeek 9 hours ago

> "At this point, the engineers in Australia decided that a brute-force approach to their safe problem was warranted and applied a power drill to the task. An hour later, the safe was open—but even the newly retrieved cards triggered the same error message."

What happened here (from what I recall) was far funnier than this gives it credit for.

The SREs first attempted to use a mallet (hammer) on the safe. They had to buy the mallet from the local hardware store first (don't worry, it got expensed later). After multiple rounds of "persuasion" they eventually called in a professional (a locksmith), who used a drill and crowbar to finally liberate the keycard.

The postmortem had fun step-by-step photos of the safe in various stages of disassembly.

mtlynch 7 hours ago

Sorry for the off-topic comment, but it's bizarre to me that Google is hosting their book on GitHub with a github.io domain. Their previous two SRE books are hosted at https://sre.google on Google-owned IPs.[0]

What was that decision process? "We're Google, and we're literally writing a book about how good we are at hosting services. But hosting some static HTML files that are almost entirely text? That's a tough one. We'd better outsource that to one of our competitors."

[0] https://sre.google/books/

  • nashashmi 6 hours ago

    I think one is a portal for GitHub developers, while the other is a polished public site. I reminisce about early Google's forthright attitude, which made life so simple and human.

Vexs 27 minutes ago

> restart required a hardware security module (HSM) smart card.

Out of curiosity, does anyone know why? My guess would be the PW DB would be encrypted with some token generated from this card.

I've had lots of "I have a secret and the server needs it" type problems, but I've never been very happy with my solutions. Smart cards seem like a potentially elegant solution.
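
To make that guess concrete, here is a toy sketch of the "card unlocks the DB" idea as key wrapping. This is not how Google's system works (the source doesn't say), and every name in it (`derive_kek`, `wrap_key`, `unwrap_key`, the card secret) is hypothetical. It also uses a homemade HMAC-based cipher only so it runs on the Python standard library. A real setup would use a vetted AEAD, and a real HSM or smart card would normally do the unwrap on-card so the secret never leaves it.

```python
import hashlib
import hmac
import secrets


def derive_kek(card_secret: bytes, salt: bytes) -> bytes:
    """Stretch the (hypothetical) card secret into a 32-byte key-encryption key."""
    return hashlib.pbkdf2_hmac("sha256", card_secret, salt, 100_000)


def _subkey(kek: bytes, label: bytes) -> bytes:
    # Separate keys for encryption and authentication, derived from the KEK.
    return hmac.new(kek, label, hashlib.sha256).digest()


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy stream cipher: HMAC-SHA256 in counter mode. Illustration only.
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = hmac.new(key, nonce + counter.to_bytes(4, "big"), hashlib.sha256)
        out += block.digest()
        counter += 1
    return bytes(out[:length])


def wrap_key(kek: bytes, data_key: bytes) -> bytes:
    """Encrypt the DB's data key under the KEK; returns nonce + ciphertext + tag."""
    nonce = secrets.token_bytes(16)
    stream = _keystream(_subkey(kek, b"enc"), nonce, len(data_key))
    ciphertext = bytes(a ^ b for a, b in zip(data_key, stream))
    tag = hmac.new(_subkey(kek, b"mac"), nonce + ciphertext, hashlib.sha256).digest()
    return nonce + ciphertext + tag


def unwrap_key(kek: bytes, blob: bytes) -> bytes:
    """Recover the data key, or raise ValueError if the KEK (card) is wrong."""
    nonce, ciphertext, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(_subkey(kek, b"mac"), nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong card or corrupted blob")
    stream = _keystream(_subkey(kek, b"enc"), nonce, len(ciphertext))
    return bytes(a ^ b for a, b in zip(ciphertext, stream))


if __name__ == "__main__":
    salt = secrets.token_bytes(16)
    data_key = secrets.token_bytes(32)          # the key that encrypts the DB
    blob = wrap_key(derive_kek(b"secret-on-card", salt), data_key)
    # On restart, the server only has `salt` and `blob` on disk;
    # it needs the card to get the data key back.
    assert unwrap_key(derive_kek(b"secret-on-card", salt), blob) == data_key
```

The appeal is that the server stores only the wrapped blob, so a restart can't proceed without the card. The flip side is exactly the failure in the story: lose access to the card and you lose access to everything behind it.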

kmoser 8 hours ago

> It took an additional hour for the team to realize that the green light on the smart card reader did not, in fact, indicate that the card had been inserted correctly.

I'm not sure which is worse: bad UI/UX use of lights, or inadequately trained engineers who misunderstood the lights.

  • GuB-42 7 hours ago

    I'd go with bad UI/UX.

    A lot of progress has been made by acknowledging that people are idiots and that the system has to work around that. Toyota, which went from one of the worst to one of the most reliable automakers, is known for formalizing idiot-proofing.

    If the reader had been able to read the card both ways, there wouldn't have been a problem and no training would have been required. The next best thing would be for the card to not fit upside down, or for the reader to show a clear message like "try flipping the card". It is not something you should train people for; it should be obvious.

    I also suspect the reader was in an unusual configuration, because everyone knows how to use smart cards. They probably did what they always do instinctively, and it didn't work. In the thousands of times I've done it, I don't remember ever inserting my credit card the wrong way, and I don't remember anyone else who did; it is just so instinctive. For an entire team to miss that, there must be something wrong with how the reader is set up.

    • chasing0entropy 4 hours ago

      Agree.

      The fundamental lesson of at least half my information systems undergraduate courses was that you adapt the system to observed user behavior; you do not expect the user to adapt their behavior to the system.

  • numpad0 3 hours ago

    If it's not obvious to multiple Google SREs and no instruction sticker was present, that's a bad UI.

kingforaday 5 hours ago

What I really like about this story is that Google, for all that they are, still has normal, fallible people just like us behind the scenes.

netsharc 9 hours ago

What is this, sitcom slapstick? The slapstick of storing the security combination to the safe on the system that is locked by the card which is inside the safe; and the slapstick of "You're inserting it wrong"...

lanthade 9 hours ago

The power drill mention in the headline is a bit click-baity because, in the end, while a power drill was used, it was unnecessary and was not the solution to the problem. Had they known how to properly use the hardware security devices they had, the power drill wouldn't have been deployed at all.

  • jumhyn 8 hours ago

    But the additional cards may very well have been necessary to understand “there is something wrong with our usage of the cards, this error is not a one-off failure due to corrupted data or broken hardware or other problem local to the California card(s)”. Having multiple independent reproductions of an issue helps you narrow down what the commonalities are!

  • Thorrez 9 hours ago

    Well, it would have been necessary if they hadn't managed to find the employee in California who had the password for the California safe memorized.

    • hshdhdhehd 9 hours ago

      There are lots of alternative ways this could have played out. Yes.

  • Daviey 7 hours ago

    Sorry, but someone happening to remember the combination also cannot be considered an adequate solution.

    • reader9274 7 hours ago

      He clearly had the combination written down

      • Noumenon72 6 hours ago

        The text says "Fortunately, another colleague in California had memorized the combination to the on-site safe". You might think that's unlikely and he probably wrote it down, but it's not "clear" from the text.