Google (NSDQ: GOOG) engineer Steve Yegge was trying to start a robust internal discussion, not post a viral hit, when he published a 4,570-word self-styled rant about what he sees as the company’s greatest flaw to Google+. Unfortunately for Yegge, he didn’t check the settings and shared his view on Google’s failure to grasp platforms over products — including Google + — with everyone.
He later pulled it down on his own accord but he and Google aren’t asking that the copies already spread across the net be deleted. You can read the full post here and here, among other locations — and you should to get the real flavor about why Yegge thinks the company that does nearly everything right gets this fundamental so wrong. But a large chunk is also about his former employer Amazon (NSDQ: AMZN), what it does wrong and how Jeff Bezos — Steve Jobs “without the fashion or design sense” — got it so right. We’re posting a lengthy excerpt here for those of you interested in one engineer’s version of makes Amazon such a formidable presence:
Micro-managing isn’t that third thing that Amazon does better than us, by the way. I mean, yeah, they micro-manage really well, but I wouldn’t list it as a strength or anything. I’m just trying to set the context here, to help you understand what happened. We’re talking about a guy who in all seriousness has said on many public occasions that people should be paying him to work at Amazon. He hands out little yellow stickies with his name on them, reminding people “who runs the company” when they disagree with him. The guy is a regular… well, Steve Jobs, I guess. Except without the fashion or design sense. Bezos is super smart; don’t get me wrong. He just makes ordinary control freaks look like stoned hippies.
So one day Jeff Bezos issued a mandate. He’s doing that all the time, of course, and people scramble like ants being pounded with a rubber mallet whenever it happens. But on one occasion — back around 2002 I think, plus or minus a year — he issued a mandate that was so out there, so huge and eye-bulgingly ponderous, that it made all of his other mandates look like unsolicited peer bonuses.
His Big Mandate went something along these lines:
1) All teams will henceforth expose their data and functionality through service interfaces.
2) Teams must communicate with each other through these interfaces.
3) There will be no other form of interprocess communication allowed: no direct linking, no direct reads of another team’s data store, no shared-memory model, no back-doors whatsoever. The only communication allowed is via service interface calls over the network.
4) It doesn’t matter what technology they use. HTTP, Corba, Pubsub, custom protocols — doesn’t matter. Bezos doesn’t care.
5) All service interfaces, without exception, must be designed from the ground up to be externalizable. That is to say, the team must plan and design to be able to expose the interface to developers in the outside world. No exceptions.
6) Anyone who doesn’t do this will be fired.
7) Thank you; have a nice day!
Ha, ha! You 150-odd ex-Amazon folks here will of course realize immediately that #7 was a little joke I threw in, because Bezos most definitely does not give a shit about your day.
#6, however, was quite real, so people went to work. Bezos assigned a couple of Chief Bulldogs to oversee the effort and ensure forward progress, headed up by Uber-Chief Bear Bulldog Rick Dalzell. Rick is an ex-Armgy Ranger, West Point Academy graduate, ex-boxer, ex-Chief Torturer slash CIO at Wal*Mart, and is a big genial scary man who used the word “hardened interface” a lot. Rick was a walking, talking hardened interface himself, so needless to say, everyone made LOTS of forward progress and made sure Rick knew about it.
Over the next couple of years, Amazon transformed internally into a service-oriented architecture. They learned a tremendous amount while effecting this transformation. There was lots of existing documentation and lore about SOAs, but at Amazon’s vast scale it was about as useful as telling Indiana Jones to look both ways before crossing the street. Amazon’s dev staff made a lot of discoveries along the way. A teeny tiny sampling of these discoveries included:
- every single one of your peer teams suddenly becomes a potential DOS attacker. Nobody can make any real forward progress until very serious quotas and throttling are put in place in every single service.
- monitoring and QA are the same thing. You’d never think so until you try doing a big SOA. But when your service says “oh yes, I’m fine”, it may well be the case that the only thing still functioning in the server is the little component that knows how to say “I’m fine, roger roger, over and out” in a cheery droid voice. In order to tell whether the service is actually responding, you have to make individual calls. The problem continues recursively until your monitoring is doing comprehensive semantics checking of your entire range of services and data, at which point it’s indistinguishable from automated QA. So they’re a continuum.
- if you have hundreds of services, and your code MUST communicate with other groups’ code via these services, then you won’t be able to find any of them without a service-discovery mechanism. And you can’t have that without a service registration mechanism, which itself is another service. So Amazon has a universal service registry where you can find out reflectively (programmatically) about every service, what its APIs are, and also whether it is currently up, and where.
- debugging problems with someone else’s code gets a LOT harder, and is basically impossible unless there is a universal standard way to run every service in a debuggable sandbox.
That’s just a very small sample. There are dozens, maybe hundreds of individual learnings like these that Amazon had to discover organically. There were a lot of wacky ones around externalizing services, but not as many as you might think. Organizing into services taught teams not to trust each other in most of the same ways they’re not supposed to trust external developers.
This effort was still underway when I left to join Google (NSDQ: GOOG) in mid-2005, but it was pretty far advanced. From the time Bezos issued his edict through the time I left, Amazon had transformed culturally into a company that thinks about everything in a services-first fashion. It is now fundamental to how they approach all designs, including internal designs for stuff that might never see the light of day externally.
At this point they don’t even do it out of fear of being fired. I mean, they’re still afraid of that; it’s pretty much part of daily life there, working for the Dread Pirate Bezos and all. But they do services because they’ve come to understand that it’s the Right Thing. There are without question pros and cons to the SOA approach, and some of the cons are pretty long. But overall it’s the right thing because SOA-driven design enables Platforms.
That’s what Bezos was up to with his edict, of course. He didn’t (and doesn’t) care even a tiny bit about the well-being of the teams, nor about what technologies they use, nor in fact any detail whatsoever about how they go about their business unless they happen to be screwing up. But Bezos realized long before the vast majority of Amazonians that Amazon needs to be a platform.
You wouldn’t really think that an online bookstore needs to be an extensible, programmable platform. Would you?
Well, the first big thing Bezos realized is that the infrastructure they’d built for selling and shipping books and sundry could be transformed an excellent repurposable computing platform. So now they have the Amazon Elastic Compute Cloud, and the Amazon Elastic MapReduce, and the Amazon Relational Database Service, and a whole passel’ o’ other services browsable at aws.amazon.com. These services host the backends for some pretty successful companies, reddit being my personal favorite of the bunch.
The other big realization he had was that he can’t always build the right thing. I think Larry Tesler might have struck some kind of chord in Bezos when he said his mom couldn’t use the goddamn website. It’s not even super clear whose mom he was talking about, and doesn’t really matter, because nobody’s mom can use the goddamn website. In fact I myself find the website disturbingly daunting, and I worked there for over half a decade. I’ve just learned to kinda defocus my eyes and concentrate on the million or so pixels near the center of the page above the fold.
I’m not really sure how Bezos came to this realization — the insight that he can’t build one product and have it be right for everyone. But it doesn’t matter, because he gets it. There’s actually a formal name for this phenomenon. It’s called Accessibility, and it’s the most important thing in the computing world.
The. Most. Important. Thing.
If you’re sorta thinking, “huh? You mean like, blind and deaf people Accessibility?” then you’re not alone, because I’ve come to understand that there are lots and LOTS of people just like you: people for whom this idea does not have the right Accessibility, so it hasn’t been able to get through to you yet. It’s not your fault for not understanding, any more than it would be your fault for being blind or deaf or motion-restricted or living with any other disability. When software — or idea-ware for that matter — fails to be accessible to anyone for any reason, it is the fault of the software or of the messaging of the idea. It is an Accessibility failure.
Like anything else big and important in life, Accessibility has an evil twin who, jilted by the unbalanced affection displayed by their parents in their youth, has grown into an equally powerful Arch-Nemesis (yes, there’s more than one nemesis to accessibility) named Security. And boy howdy are the two ever at odds.
But I’ll argue that Accessibility is actually more important than Security because dialing Accessibility to zero means you have no product at all, whereas dialing Security to zero can still get you a reasonably successful product such as the Playstation Network.